Visualize CSV Mini-App

  • Jupyter: File -> Make a copy; Colab: File -> Save a copy in Drive

  • Run notebook cells by pressing Shift+Enter

  • Either edit and run the top cells one by one, or edit and run the self-contained version at the bottom

[1]:
#!pip install graphistry -q
[3]:
import pandas as pd
import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

1. Upload CSV

Load a file either by uploading it or from a URL.

Run help(pd.read_csv) for more options.

File Upload: Jupyter Notebooks

  • If the kernel status circle at the top right is not green, click Kernel -> Reconnect

  • Go to the file directory (/tree) by clicking the Jupyter logo

  • Navigate to the directory containing your notebook

  • Press the Upload button at the top right

File Upload: Google Colab

  • Open the left sidebar by pressing the right arrow on the left

  • Go to the Files tab

  • Press UPLOAD

  • Make sure the file goes into /content

File Upload: URL

  • Set file_path in the cell below to the data URL instead of a local path

  • Run help(pd.read_csv) for more options
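pd.read_csv accepts a local path, a URL string, or any file-like object, so the same call covers all three upload routes. A minimal sketch, assuming pandas is installed; io.StringIO stands in for the uploaded file, and the inline rows are copied from the honeypot sample printed in this notebook. The example.com URL is a placeholder, not a real dataset.

```python
import io

import pandas as pd

# Inline stand-in for an uploaded file or URL; rows copied from the
# honeypot sample printed in this notebook (subset of columns).
csv_text = """attackerIP,victimIP,victimPort,vulnName,count
41.230.211.128,172.31.14.66,445.0,MS08067 (NetAPI),2
122.121.202.157,172.31.14.66,445.0,MS08067 (NetAPI),8
"""

# read_csv takes a path, a URL string, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
# For a remote file, pass the URL instead (placeholder URL):
# df = pd.read_csv('https://example.com/data.csv')

print('# rows', len(df))
print(list(df.columns))
```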

[4]:
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)

print('# rows', len(df))
df.sample(min(len(df), 3))
('# rows', 220)
[4]:
          attackerIP      victimIP  victimPort          vulnName  count     time(max)     time(min)
145   41.230.211.128  172.31.14.66       445.0  MS08067 (NetAPI)      2  1.421730e+09  1.421729e+09
 25  122.121.202.157  172.31.14.66       445.0  MS08067 (NetAPI)      8  1.423612e+09  1.423611e+09
 75   182.68.160.230  172.31.14.66       445.0  MS08067 (NetAPI)      9  1.417438e+09  1.417436e+09

2. Optional: Clean up CSV

[5]:
# Uncomment entries to rename columns; later config must then use the new names
df = df.rename(columns={
#    'attackerIP': 'src_ip',
#    'victimIP': 'dest_ip'
})

df.sample(min(len(df), 3))
[5]:
         attackerIP      victimIP  victimPort          vulnName  count     time(max)     time(min)
 70  182.161.224.84  172.31.14.66       139.0  MS08067 (NetAPI)      4  1.419954e+09  1.419952e+09
 10  115.115.227.82  172.31.14.66       445.0  MS08067 (NetAPI)      2  1.413569e+09  1.413569e+09
152    46.130.76.13  172.31.14.66       445.0  MS08067 (NetAPI)      7  1.421093e+09  1.421092e+09
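df.rename only touches the columns named in the mapping, so the cleanup cell is a no-op until an entry is uncommented. A small sketch of the effect, assuming pandas and a hypothetical one-row frame that uses the honeypot column names:

```python
import pandas as pd

# Hypothetical one-row frame using the honeypot column names.
df = pd.DataFrame({
    'attackerIP': ['41.230.211.128'],
    'victimIP': ['172.31.14.66'],
    'victimPort': [445.0],
})

# An empty mapping leaves the columns unchanged.
same = df.rename(columns={})

# Mapped columns get new names; unmapped ones ('victimPort') keep theirs.
renamed = df.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip'})
print(list(renamed.columns))
```

Once a rename is applied, every later setting (my_src_col, node_cols, categories, edges) has to refer to the new names.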

3. Configure: Visualize with 3 kinds of graphs

Set mode to 'A', 'B', or 'C' and fill in the corresponding values:

Mode “A”. See graph from a table of (src, dst) edges
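In mode A each row becomes one edge from the source column's value to the destination column's value. A plain-Python sketch of that shape (an illustration, not Graphistry's implementation), using hypothetical rows:

```python
# Hypothetical rows with the honeypot column names.
rows = [
    {'attackerIP': '41.230.211.128', 'victimIP': '172.31.14.66'},
    {'attackerIP': '122.121.202.157', 'victimIP': '172.31.14.66'},
]
src_col, dst_col = 'attackerIP', 'victimIP'

# One (source, destination) edge per row.
edges = [(row[src_col], row[dst_col]) for row in rows]

# Nodes are the distinct values appearing at either end of an edge.
nodes = {value for edge in edges for value in edge}
print(edges)
print(sorted(nodes))
```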

Mode “B”. See hypergraph: draw each row as a node and connect it to the entities in the same row

  • Pick which columns to turn into nodes

  • If multiple columns share the same type (e.g., “src_ip” and “dest_ip” are both “ip”), unify them under one category
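In mode B each row becomes its own event node, linked to one entity node per selected column. Categories decide node identity: columns in the same category share a namespace, so an IP seen as both attacker and victim collapses into a single node. A plain-Python sketch of the idea with hypothetical rows; the `type::value` ID scheme here is illustrative and not a statement about Graphistry's internal format.

```python
# Hypothetical rows; 10.0.0.1 attacks in row 0 and is a victim in row 1.
rows = [
    {'attackerIP': '10.0.0.1', 'victimIP': '10.0.0.2', 'vulnName': 'MS08067 (NetAPI)'},
    {'attackerIP': '10.0.0.3', 'victimIP': '10.0.0.1', 'vulnName': 'MS08067 (NetAPI)'},
]
node_cols = ['attackerIP', 'victimIP', 'vulnName']
categories = {'ip': ['attackerIP', 'victimIP']}

# Map each column to its type; uncategorized columns are their own type.
col_type = {col: col for col in node_cols}
for category, cols in categories.items():
    for col in cols:
        col_type[col] = category

# One event node per row, with an edge to each entity in that row.
edges = []
for i, row in enumerate(rows):
    event = f'event::{i}'
    for col in node_cols:
        edges.append((event, f'{col_type[col]}::{row[col]}'))

entities = {dst for _, dst in edges}
print(len(edges), sorted(entities))
```

Without the 'ip' category, 10.0.0.1 would appear as two separate nodes (attackerIP::10.0.0.1 and victimIP::10.0.0.1).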

Mode “C”. See graph by creating multiple nodes and edges per row

  • Pick which column values link to which other column values

  • If multiple columns share the same type (e.g., “src_ip” and “dest_ip” are both “ip”), unify them under one category
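In mode C there is no event node: the edges mapping says which column's value links directly to which other column's value, giving one edge per (source column, destination column) pair in every row. A plain-Python sketch using the mapping from the config cell below and one hypothetical row; categories are left out for brevity, and the `column::value` IDs are illustrative only.

```python
edges_spec = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP'],
}
# Hypothetical row using the honeypot column names.
row = {'attackerIP': '41.230.211.128', 'victimIP': '172.31.14.66',
       'victimPort': 445.0, 'vulnName': 'MS08067 (NetAPI)'}

# One directed edge per (source column, destination column) pair.
row_edges = [
    (f'{src}::{row[src]}', f'{dst}::{row[dst]}')
    for src, dsts in edges_spec.items()
    for dst in dsts
]
for edge in row_edges:
    print(edge)
```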

[6]:
# Pick 'A', 'B', or 'C'
mode = 'B'
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'



### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'vulnName']
categories = {  # optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}



### 'C' == mode
# Note: this reassigns categories, overriding the mode 'B' value above
edges = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP']
}
categories = {  # optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}
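A misspelled column name (for example 'attacker_IP' instead of 'attackerIP') is easy to miss in the config cell. The sketch below is a hypothetical helper, check_config, that is not part of PyGraphistry; it lists every column the config refers to that is missing from the table, so typos surface before plotting.

```python
def check_config(columns, node_cols=(), categories=None, edges=None):
    """Return the sorted config column names that are not in `columns`."""
    referenced = set(node_cols)
    for cols in (categories or {}).values():
        referenced.update(cols)
    for src, dsts in (edges or {}).items():
        referenced.add(src)
        referenced.update(dsts)
    return sorted(referenced - set(columns))


# Column names of the honeypot CSV, as printed in this notebook.
columns = ['attackerIP', 'victimIP', 'victimPort', 'vulnName',
           'count', 'time(max)', 'time(min)']

# A misspelled category entry is reported; a correct config yields [].
print(check_config(columns, ['attackerIP', 'victimIP'],
                   {'ip': ['attacker_IP', 'victimIP']}))
print(check_config(columns, edges={'attackerIP': ['victimIP', 'vulnName']}))
```

In the notebook it could be called as check_config(df.columns, node_cols, categories, edges) right after the config cell.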

4. Plot: Upload & render!

[75]:
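g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    # One edge per row, from my_src_col to my_dest_col
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    # One event node per row, linked to a node for each column in node_cols
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    # Node columns = every column used as a source or destination in edges
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    # direct=True links entity nodes to each other (as listed in edges) instead of via event nodes
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']
else:
    raise ValueError("mode must be 'A', 'B', or 'C'")

#hg
print(len(g._edges))  # number of edges in the graph

g.plot()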
('# links', 1100)
('# events', 220)
('# attrib entities', 221)
1100
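In mode C, the plotting cell collects node_cols from both the keys and the values of edges with a loop. The same set can be built in one expression, and sorting it makes the column order deterministic, unlike list(set(...)). The mapping also predicts the edge count: each row yields one edge per listed pair, so 3 + 1 + 1 = 5 pairs over the 220-row honeypot table gives the 1100 links reported by the mode C run in the combined version, assuming no rows or null values are dropped.

```python
edges = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP'],
}

# Every column that appears as a source or a destination becomes a node column.
node_cols = sorted(set(edges) | {dst for dsts in edges.values() for dst in dsts})

# One edge per (source, destination) pair per row.
pairs_per_row = sum(len(dsts) for dsts in edges.values())
num_rows = 220  # rows in the honeypot CSV
print(node_cols)
print(pairs_per_row * num_rows)
```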

Alternative: Combined

The same workflow as two cells: one for data loading, and one for cleaning, configuring, and plotting.

[59]:
#!pip install graphistry -q
import pandas as pd
import graphistry
#graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')


##########
#1. Load
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)

print(df.columns)
print('rows:', len(df))
print(df.sample(min(len(df),3)))
Index([u'attackerIP', u'victimIP', u'victimPort', u'vulnName', u'count',
       u'time(max)', u'time(min)'],
      dtype='object')
('rows:', 220)
         attackerIP      victimIP  victimPort             vulnName  count  \
81  187.143.247.231  172.31.14.66       445.0      MS04011 (LSASS)      1
47   151.252.204.92  172.31.14.66       139.0     MS08067 (NetAPI)      1
41     125.64.35.68  172.31.14.66      9999.0  MaxDB Vulnerability      6

       time(max)     time(min)
81  1.420657e+09  1.420657e+09
47  1.422929e+09  1.422929e+09
41  1.420915e+09  1.417479e+09
[79]:
##########
#2. Clean
#df = df.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip', 'victimPort': 'protocol'})


##########
#3. Config - Pick 'A', 'B', or 'C'
mode = 'C'
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'

### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'victimPort', 'vulnName']
categories = { #optional
    'ip': ['attackerIP', 'victimIP']  # use ['src_ip', 'dest_ip'] if the rename above is applied
    #, 'user': ['owner', 'seller'],
}

### 'C' == mode
edges = {
    'attackerIP': [ 'victimIP', 'victimPort', 'vulnName'],
    'victimIP': [ 'victimPort' ],
    'vulnName': ['victimIP' ]
}
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}

##########
#4. Plot
g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']
else:
    raise ValueError("mode must be 'A', 'B', or 'C'")


g.plot()
('# links', 1100)
('# events', 220)
('# attrib entities', 221)