Visualize CSV Mini-App#

Jupyter: File -> Make a copy
Colab: File -> Save a copy in Drive
Run notebook cells by pressing shift-enter
Either edit and run the top cells one by one, or edit and run the self-contained version at the bottom

[1]:

#!pip install graphistry -q

[3]:

import pandas as pd
import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options: https://pygraphistry.readthedocs.io/en/latest/server/register.html

1. Upload csv#

Use a file by uploading it or via URL.

Run help(pd.read_csv) for more options.
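
As a rough illustration of the kinds of options help(pd.read_csv) lists, the sketch below parses a small semicolon-separated sample. The inline data is made up for the demo (it borrows the honeypot column names); only sep, usecols, nrows, and dtype are shown, and your file may need different ones.

```python
import io
import pandas as pd

# Made-up sample standing in for an uploaded file (not the real honeypot.csv)
raw = io.StringIO(
    "attackerIP;victimIP;victimPort;vulnName\n"
    "41.230.211.128;172.31.14.66;445;MS08067 (NetAPI)\n"
    "125.64.35.68;172.31.14.66;9999;MaxDB Vulnerability\n"
    "182.68.160.230;172.31.14.66;445;MS08067 (NetAPI)\n"
)

df_demo = pd.read_csv(
    raw,
    sep=';',                                           # non-default delimiter
    usecols=['attackerIP', 'victimIP', 'victimPort'],  # load only some columns
    nrows=2,                                           # stop after the first 2 data rows
    dtype={'victimPort': 'int64'},                     # force an integer port column
)
print(df_demo)
```

Limiting nrows while exploring a large file keeps the first load fast; drop it once the other options look right.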

File Upload: Jupyter Notebooks#

If the circle on the top right is not green, click Kernel -> Reconnect
Go to file directory (/tree) by clicking the Jupyter logo
Navigate to the directory page containing your notebook
Press the upload button on the top right

File Upload: Google Colab#

Open the left sidebar by pressing the right arrow on the left
Go to the Files tab
Press UPLOAD
Make sure the file goes into /content

File Upload: URL#

Set file_path in the cell below to the actual data URL
Run help(pd.read_csv) for more options
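
pd.read_csv accepts a local path, a URL string, or a file-like object, so the same call covers all three upload routes. A minimal sketch, where load_csv is a hypothetical helper and the example.com URL is a placeholder rather than a real dataset:

```python
import io
import pandas as pd

def load_csv(source, max_rows=None):
    """Hypothetical helper: read a CSV from a path, URL, or file-like object."""
    return pd.read_csv(source, nrows=max_rows)

# Placeholder URL, replace with the actual data URL before uncommenting:
# df = load_csv('https://example.com/honeypot.csv')

# Offline stand-in so the sketch runs without network access
sample = io.StringIO(
    "attackerIP,victimIP,victimPort\n"
    "41.230.211.128,172.31.14.66,445\n"
)
df_demo = load_csv(sample)
print('# rows', len(df_demo))
```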

[4]:

file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)

print('# rows', len(df))
df.sample(min(len(df), 3))

('# rows', 220)

[4]:

	attackerIP	victimIP	victimPort	vulnName	count	time(max)	time(min)
145	41.230.211.128	172.31.14.66	445.0	MS08067 (NetAPI)	2	1.421730e+09	1.421729e+09
25	122.121.202.157	172.31.14.66	445.0	MS08067 (NetAPI)	8	1.423612e+09	1.423611e+09
75	182.68.160.230	172.31.14.66	445.0	MS08067 (NetAPI)	9	1.417438e+09	1.417436e+09

2. Optional: Clean up CSV#

[5]:

df = df.rename(columns={
#    'attackerIP': 'src_ip',
#    'victimIP': 'dest_ip'
})

df.sample(min(len(df), 3))

[5]:

	attackerIP	victimIP	victimPort	vulnName	count	time(max)	time(min)
70	182.161.224.84	172.31.14.66	139.0	MS08067 (NetAPI)	4	1.419954e+09	1.419952e+09
10	115.115.227.82	172.31.14.66	445.0	MS08067 (NetAPI)	2	1.413569e+09	1.413569e+09
152	46.130.76.13	172.31.14.66	445.0	MS08067 (NetAPI)	7	1.421093e+09	1.421092e+09
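
Beyond renaming, two cleanups often help with data shaped like the table above: ports that loaded as floats (445.0) and numeric timestamps. The sketch below works on a made-up two-row frame and assumes the time columns are Unix epoch seconds, which the values suggest but the dataset does not state.

```python
import pandas as pd

# Made-up rows mirroring the column types shown above
df_demo = pd.DataFrame({
    'attackerIP': ['41.230.211.128', '182.161.224.84'],
    'victimIP': ['172.31.14.66', '172.31.14.66'],
    'victimPort': [445.0, 139.0],
    'time(max)': [1.421730e+09, 1.419954e+09],
})

clean = df_demo.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip'})

# Nullable integer dtype: whole-number floats become ints, missing values stay missing
clean['victimPort'] = clean['victimPort'].astype('Int64')

# Assumption: timestamps are seconds since the Unix epoch
clean['time(max)'] = pd.to_datetime(clean['time(max)'], unit='s')

print(clean.dtypes)
```

If you rename columns this way, remember to use the new names in the configuration step as well.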

3. Configure: Visualize with 3 kinds of graphs#

Set mode and the corresponding values:

Mode “A”. See graph from table of (src,dst) edges#

Pick which cols hold each edge's source and destination

Mode “B”. See hypergraph: Draw row as node and connect it to entities in same row#

Pick which cols to make nodes
If multiple cols share same type (e.g., “src_ip”, “dest_ip” are both “ip”), unify them

Mode “C”. See by creating multiple nodes, edges per row#

Pick how different column values point to other column values
If multiple cols share same type (e.g., “src_ip”, “dest_ip” are both “ip”), unify them
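
To make the three modes concrete, here is a pandas-only sketch of the edge lists each one roughly corresponds to. It is not PyGraphistry's implementation: node_id, hyper_edges, and direct_edges are hypothetical helpers, the `::` id format is an assumption, and missing values are not handled.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'attackerIP': ['41.230.211.128', '125.64.35.68'],
    'victimIP': ['172.31.14.66', '172.31.14.66'],
    'vulnName': ['MS08067 (NetAPI)', 'MaxDB Vulnerability'],
})

def node_id(col, values, categories=None):
    """Prefix values with their category if the column has one, else the column name."""
    col_to_cat = {c: cat for cat, cols in (categories or {}).items() for c in cols}
    return col_to_cat.get(col, col) + '::' + values.astype(str)

# Mode A: the table already is the edge list
edges_a = df_demo[['attackerIP', 'victimIP']]

# Mode B: one event node per row, linked to every entity in that row
def hyper_edges(df, node_cols, categories=None):
    event_ids = pd.Series(['event::%d' % i for i in range(len(df))], index=df.index)
    parts = [pd.DataFrame({'src': event_ids, 'dst': node_id(col, df[col], categories)})
             for col in node_cols]
    return pd.concat(parts, ignore_index=True)

# Mode C: no event nodes; column values point directly at other column values
def direct_edges(df, edges, categories=None):
    parts = []
    for src_col, dst_cols in edges.items():
        for dst_col in dst_cols:
            parts.append(pd.DataFrame({
                'src': node_id(src_col, df[src_col], categories),
                'dst': node_id(dst_col, df[dst_col], categories),
            }))
    return pd.concat(parts, ignore_index=True)

cats = {'ip': ['attackerIP', 'victimIP']}
edges_b = hyper_edges(df_demo, ['attackerIP', 'victimIP', 'vulnName'], cats)
edges_c = direct_edges(df_demo, {'attackerIP': ['victimIP', 'vulnName']}, cats)
print(len(edges_a), len(edges_b), len(edges_c))
```

Unifying both IP columns under 'ip' means an address that appears as an attacker in one row and a victim in another becomes a single node instead of two.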

[6]:

#Pick 'A', 'B', or 'C'
mode = 'B'
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'



### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'vulnName']
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}



### 'C' == mode
edges = {
      'attackerIP': [ 'victimIP', 'victimPort', 'vulnName'],
      'victimIP': [ 'victimPort'],
      'vulnName': [ 'victimIP' ]
}
categories = { #optional
      'ip': ['attackerIP', 'victimIP']
       #, 'user': ['owner', 'seller'], ...
}
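
A misspelled column name in this configuration, such as 'attacker_IP' for 'attackerIP', can be hard to spot. One possible guard is to compare every configured name against df.columns before plotting. missing_columns below is a hypothetical helper, not part of PyGraphistry.

```python
import pandas as pd

def missing_columns(df, node_cols=(), categories=None, edges=None):
    """Return configured column names that do not exist in df, sorted."""
    wanted = set(node_cols)
    for cols in (categories or {}).values():
        wanted.update(cols)
    for src, dsts in (edges or {}).items():
        wanted.add(src)
        wanted.update(dsts)
    return sorted(wanted - set(df.columns))

# Empty frame with the honeypot column names, enough to check the config
df_demo = pd.DataFrame(columns=['attackerIP', 'victimIP', 'victimPort', 'vulnName'])

problems = missing_columns(
    df_demo,
    node_cols=['attackerIP', 'victimIP', 'vulnName'],
    categories={'ip': ['attacker_IP', 'victimIP']},  # deliberate typo for the demo
)
print(problems)  # ['attacker_IP']
```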

4. Plot: Upload & render!#

See UI guide

[75]:

g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']

#hg
print(len(g._edges))

g.plot()

('# links', 1100)
('# events', 220)
('# attrib entities', 221)
1100
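
Two details of the plotting cell are worth a closer look. df.sample raises a ValueError when asked for more rows than exist (without replacement), which is why num_rows is capped with min. And list(set(...)) does not preserve order when deduplicating node columns. The sketch below uses a made-up frame; random_state and dict.fromkeys are optional tweaks for repeatable results, not something the notebook requires.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'attackerIP': ['10.0.0.%d' % i for i in range(5)],
    'victimIP': ['172.31.14.66'] * 5,
})

max_rows = 1000
num_rows = min(max_rows, len(df_demo))  # never request more rows than exist

# A fixed seed makes the sample, and so the plotted graph, repeatable
sample_a = df_demo.sample(num_rows, random_state=42)
sample_b = df_demo.sample(num_rows, random_state=42)

edges = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimIP': ['victimPort'],
    'vulnName': ['victimIP'],
}
# Order-preserving dedupe of every column named in the edge config
all_cols = list(edges) + [dst for dsts in edges.values() for dst in dsts]
node_cols = list(dict.fromkeys(all_cols))
print(node_cols)
```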

[75]:

Alternative: Combined#

Split into data loading and cleaning/configuring/plotting.

[59]:

#!pip install graphistry -q
import pandas as pd
import graphistry
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')


##########
#1. Load
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)

print(df.columns)
print('rows:', len(df))
print(df.sample(min(len(df),3)))

Index([u'attackerIP', u'victimIP', u'victimPort', u'vulnName', u'count',
       u'time(max)', u'time(min)'],
      dtype='object')
('rows:', 220)
         attackerIP      victimIP  victimPort             vulnName  count  \
81  187.143.247.231  172.31.14.66       445.0      MS04011 (LSASS)      1
47   151.252.204.92  172.31.14.66       139.0     MS08067 (NetAPI)      1
41     125.64.35.68  172.31.14.66      9999.0  MaxDB Vulnerability      6

       time(max)     time(min)
81  1.420657e+09  1.420657e+09
47  1.422929e+09  1.422929e+09
41  1.420915e+09  1.417479e+09

[79]:

##########
#2. Clean
#df = df.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip', 'victimPort': 'protocol'})


##########
#3. Config - Pick 'A', 'B', or 'C'
mode = 'C'
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'

### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'victimPort', 'vulnName']
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}

### 'C' == mode
edges = {
    'attackerIP': [ 'victimIP', 'victimPort', 'vulnName'],
    'victimIP': [ 'victimPort' ],
    'vulnName': ['victimIP' ]
}
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}

##########
#4. Plot
g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']


g.plot()

('# links', 1100)
('# events', 220)
('# attrib entities', 221)

[79]:
