Tutorial: Data Analysis in Graphistry#
Register
Load table
Plot:
Simple: input is a list of edges
Arbitrary: input is a table (hypergraph transform)
Advanced plotting
Further reading
1. Register#
[101]:
import graphistry
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
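One way to keep credentials out of the notebook is to read them from environment variables and pass the result to graphistry.register. The sketch below only builds the keyword arguments shown in the comment above. The helper name register_kwargs and the variable names GRAPHISTRY_USERNAME, GRAPHISTRY_PASSWORD, and GRAPHISTRY_SERVER are assumptions made for illustration, not names defined by Graphistry.

```python
import os


def register_kwargs(env=os.environ):
    """Build keyword arguments for graphistry.register from a mapping.

    The environment variable names are hypothetical, chosen for this sketch.
    """
    return {
        'api': 3,
        'username': env['GRAPHISTRY_USERNAME'],
        'password': env['GRAPHISTRY_PASSWORD'],
        'protocol': 'https',
        # Fall back to the public hub when no server is configured
        'server': env.get('GRAPHISTRY_SERVER', 'hub.graphistry.com'),
    }


# Usage, once the variables are set in the environment:
# graphistry.register(**register_kwargs())
```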
2. Load table#
Graphistry works with dataframes such as pandas and GPU-accelerated RAPIDS cuDF.
[94]:
import pandas as pd
df = pd.read_csv('./data/honeypot.csv')
df.sample(3)
[94]:
| | attackerIP | victimIP | victimPort | vulnName | count | time(max) | time(min) |
|---|---|---|---|---|---|---|---|
| 64 | 178.77.190.33 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 6 | 1.419968e+09 | 1.419967e+09 |
| 7 | 112.209.78.240 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 10 | 1.414516e+09 | 1.414514e+09 |
| 182 | 79.140.174.193 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 2 | 1.422062e+09 | 1.422062e+09 |
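The time(max) and time(min) values look like Unix epoch seconds, although the table itself does not say so. Under that assumption, a minimal sketch for turning one of them into a readable UTC timestamp is below. The helper name epoch_to_utc is made up for this example.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime.

    Assumes the value is in seconds, which the source table does not confirm.
    """
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# time(max) from the first sampled row above
first_seen = epoch_to_utc(1.419968e+09)
print(first_seen.isoformat())
```

If the assumption holds, the sampled rows fall in late 2014 and early 2015.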
3. Plot#
A. Simple graphs#
Build up a set of bindings. A simple graph has:
Required: an edge table with source and destination ID columns, plus optional property columns
Optional: a node table with a matching node ID column
See the UI Guide for in-tool activity.
Demo graph schema:
Input table: the alerts table df above, with columns attackerIP and victimIP
Edges: link df's columns attackerIP -> victimIP
Nodes: unspecified; Graphistry defaults to generating them from the edges
Node colors: Graphistry defaults to coloring by inferred community
Node sizes: Graphistry defaults to the number of edges ("degree")
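To make the defaults above concrete, here is a minimal pure-Python sketch of how a node set and per-node degree can be derived from an edge list alone. It illustrates the idea only and is not Graphistry's implementation. The helper name infer_nodes is an assumption for this example.

```python
from collections import Counter


def infer_nodes(edges):
    """Return {node_id: degree} for an iterable of (src, dst) pairs."""
    degree = Counter()
    for src, dst in edges:
        # Each edge contributes one to the degree of both endpoints
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


# attackerIP -> victimIP pairs taken from the sample rows above
edges = [
    ('178.77.190.33', '172.31.14.66'),
    ('112.209.78.240', '172.31.14.66'),
    ('79.140.174.193', '172.31.14.66'),
]
nodes = infer_nodes(edges)
print(nodes)
```

Every ID that appears in either column becomes a node, so the shared victim ends up with the highest degree.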
[16]:
g = graphistry.edges(df, 'attackerIP', 'victimIP')
[17]:
g.plot()
[17]:
B. Hypergraphs – Plot arbitrary tables#
The hypergraph transform is a convenient method to transform tables into graphs:
It extracts entities from the table and links them together
Entities get linked together when they are from the same row
Approach 1: Treat each row as a node, and link it to each cell value in it#
Demo graph schema:
Edges: row -> attackerIP, row -> victimIP, row -> victimPort, row -> vulnName
Nodes: row, attackerIP, victimIP, victimPort, vulnName
Node colors: automatic, based on inferred community
Node sizes: number of edges
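The row-to-cell scheme can be sketched in plain Python as follows. This is an illustration of the idea under simplifying assumptions (no missing values, no extra properties), not the library's code. The helper name row_to_cell_edges is made up, and the EventID:: and ip:: prefixes follow the ID format visible in the hypergraph tables later in this tutorial.

```python
def row_to_cell_edges(rows, entity_types, categories=None):
    """Link each row node to one entity node per selected column."""
    # Map each column to its merged category, e.g. attackerIP -> ip
    col_to_cat = {
        col: cat
        for cat, cols in (categories or {}).items()
        for col in cols
    }
    edges = []
    for i, row in enumerate(rows):
        for col in entity_types:
            # Columns without a category keep their own name as prefix
            prefix = col_to_cat.get(col, col)
            edges.append((f'EventID::{i}', f'{prefix}::{row[col]}'))
    return edges


rows = [
    {'attackerIP': '178.77.190.33', 'victimIP': '172.31.14.66',
     'victimPort': 445.0, 'vulnName': 'MS08067 (NetAPI)'},
    {'attackerIP': '112.209.78.240', 'victimIP': '172.31.14.66',
     'victimPort': 445.0, 'vulnName': 'MS08067 (NetAPI)'},
]
edges = row_to_cell_edges(
    rows,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    categories={'ip': ['attackerIP', 'victimIP']},
)
entity_nodes = {dst for _, dst in edges}
print(len(edges), len(entity_nodes))
```

Each row contributes one edge per entity column, so 220 rows and 4 columns give 220 × 4 = 880 edges, consistent with the link count printed by the hypergraph call in the next cell.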
[93]:
hg1 = graphistry.hypergraph(
    df,
    # Optional: Subset of columns to turn into nodes; defaults to all
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    # Optional: merge nodes when their IDs appear in multiple columns
    # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
    # ... with just one node ip::1.1.1.1
    opts={
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })
hg1_g = hg1['graph']
hg1_g.plot()
# links 880
# events 220
# attrib entities 221
[93]:
Approach 2: Link values from column entries#
For more advanced hypergraph control, we can skip the row node and control which edges are generated by enabling direct=True.
Demo graph schema:
Edges:
  attackerIP -> victimIP, attackerIP -> victimPort, attackerIP -> vulnName
  victimPort -> victimIP
  vulnName -> victimIP
Nodes: attackerIP, victimIP, victimPort, vulnName
Default colors: automatic, based on inferred community
Default node size: number of edges
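The direct scheme can be sketched the same way: instead of a row node, each row yields one edge per source/destination column pair in the edge specification. This is a simplified illustration, not the library's implementation, and the helper name direct_edges is an assumption for this example.

```python
def direct_edges(rows, edge_spec, categories=None):
    """Link column values within a row according to edge_spec.

    edge_spec maps a source column to a list of destination columns.
    """
    col_to_cat = {
        col: cat
        for cat, cols in (categories or {}).items()
        for col in cols
    }

    def node_id(col, value):
        # Merged categories share a prefix, e.g. ip::1.1.1.1
        return f'{col_to_cat.get(col, col)}::{value}'

    edges = []
    for row in rows:
        for src_col, dst_cols in edge_spec.items():
            for dst_col in dst_cols:
                edges.append((node_id(src_col, row[src_col]),
                              node_id(dst_col, row[dst_col])))
    return edges


rows = [
    {'attackerIP': '178.77.190.33', 'victimIP': '172.31.14.66',
     'victimPort': 445.0, 'vulnName': 'MS08067 (NetAPI)'},
]
edge_spec = {
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimPort': ['victimIP'],
    'vulnName': ['victimIP'],
}
edges = direct_edges(rows, edge_spec, {'ip': ['attackerIP', 'victimIP']})
print(len(edges))
```

The specification lists five column pairs, so 220 rows give 220 × 5 = 1100 edges, consistent with the link count printed by the hypergraph call in the next cell.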
[102]:
hg2 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    direct=True,
    opts={
        # Optional: Without, creates edges that are all-to-all for each row
        'EDGES': {
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimPort': ['victimIP'],
            'vulnName': ['victimIP']
        },
        # Optional: merge nodes when their IDs appear in multiple columns
        # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
        # ... with just one node ip::1.1.1.1
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })
hg2_g = hg2['graph']
hg2_g.plot()
# links 1100
# events 220
# attrib entities 221
[102]:
4. Advanced plotting#
You can then drive visual styles based on node and edge attributes.
This demo starts by computing a node table. By default, you do not need to explicitly provide a table of nodes, but without one you may lack data for node properties:
Regular inferred graph nodes will only have id and degree
Hypergraph edges and row nodes will have many properties, but hypergraph entity nodes will only have id, type/category, and degree
Demo schema:
Node table:
| node_id | type | attacks |
Point size: number of attacks
Point icon & color: attacker vs victim
Edge color: based on first attack
[62]:
# Cell:
# Compute nodes_df by combining entities in attackerIP and victimIP
# As part of this, compute attack counts for each node
targets_df = (
    df
    [['victimIP']]
    .drop_duplicates()
    .rename(columns={'victimIP': 'node_id'})
    .assign(type='victim')
)
attackers_df = (
    df
    .groupby(['attackerIP'])
    .agg(attacks=pd.NamedAgg(column="attackerIP", aggfunc="count"))
    .reset_index()
    .rename(columns={'attackerIP': 'node_id'}).assign(type='attacker')
)
nodes_df = pd.concat([targets_df, attackers_df])
nodes_df.sort_values(by='attacks', ascending=False)[:5]
[62]:
| | node_id | type | attacks |
|---|---|---|---|
| 31 | 125.64.35.67 | attacker | 6.0 |
| 32 | 125.64.35.68 | attacker | 4.0 |
| 95 | 198.204.253.101 | attacker | 2.0 |
| 78 | 188.225.73.153 | attacker | 2.0 |
| 79 | 188.44.107.239 | attacker | 2.0 |
[86]:
# Cell:
# Attach the node table and add visual encodings
# The newer encoding features require api=3: `graphistry.register(api=3, username='...', password='...')`
g2 = (g
    .nodes(nodes_df, 'node_id')
    # Colors can be names or hex codes: 'red', '#f00', '#ff0000'
    .encode_point_color('type', categorical_mapping={
        'attacker': 'red',
        'victim': 'white'
    }, default_mapping='gray')
    # Icons: https://fontawesome.com/v4.7/cheatsheet/
    .encode_point_icon('type', categorical_mapping={
        'attacker': 'bomb',
        'victim': 'laptop'
    })
    # Gradient
    .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)
    .encode_point_size('attacks')
    .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})
    # Options: https://hub.graphistry.com/docs/api/1/rest/url/
    .settings(url_params={'play': 1000, 'pointSize': 0.5})
)
g2.plot(as_files=False)
[86]:
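The categorical encodings above pair a lookup table with a fallback value. The pure-Python sketch below shows that lookup-with-default behavior on a plain list. It is a conceptual illustration rather than Graphistry's code, and the helper name map_categories is an assumption for this example.

```python
def map_categories(values, categorical_mapping, default_mapping=None):
    """Look up each value in the mapping, falling back to the default."""
    return [categorical_mapping.get(v, default_mapping) for v in values]


colors = map_categories(
    ['attacker', 'victim', 'unknown'],
    categorical_mapping={'attacker': 'red', 'victim': 'white'},
    default_mapping='gray',
)
print(colors)
```

Values missing from the mapping, such as 'unknown' here, receive the default.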
Advanced bindings work with hypergraphs too#
Hypergraphs precompute many values on nodes and edges, which we can use to drive clearer visualizations.
[104]:
hg2_g._nodes.sample(3)
[104]:
| | attackerIP | nodeTitle | type | category | nodeID | victimIP | victimPort | vulnName | EventID |
|---|---|---|---|---|---|---|---|---|---|
| 159 | 77.52.11.94 | 77.52.11.94 | attackerIP | ip | ip::77.52.11.94 | NaN | NaN | NaN | NaN |
| 162 | 78.187.242.78 | 78.187.242.78 | attackerIP | ip | ip::78.187.242.78 | NaN | NaN | NaN | NaN |
| 170 | 81.47.128.144 | 81.47.128.144 | attackerIP | ip | ip::81.47.128.144 | NaN | NaN | NaN | NaN |
[103]:
hg2_g._edges.sample(3)
[103]:
| | edgeType | category | vulnName | dst | time(max) | time(min) | src | victimPort | victimIP | EventID | count | attackerIP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 516 | ip::vulnName | attackerIP::vulnName | MS08067 (NetAPI) | vulnName::MS08067 (NetAPI) | 1.416885e+09 | 1.416881e+09 | ip::186.149.87.94 | 445.0 | 172.31.14.66 | EventID::76 | 3 | 186.149.87.94 |
| 932 | vulnName::ip | vulnName::victimIP | MS08067 (NetAPI) | ip::172.31.14.66 | 1.423515e+09 | 1.423515e+09 | vulnName::MS08067 (NetAPI) | 445.0 | 172.31.14.66 | EventID::52 | 1 | 176.119.227.9 |
| 983 | vulnName::ip | vulnName::victimIP | MS08067 (NetAPI) | ip::172.31.14.66 | 1.423932e+09 | 1.423932e+09 | vulnName::MS08067 (NetAPI) | 445.0 | 172.31.14.66 | EventID::103 | 2 | 192.110.160.227 |
[113]:
(hg2_g
    .encode_point_color('type', categorical_mapping={
        'attackerIP': 'yellow',
        'victimIP': 'blue'
    }, default_mapping='gray')
    .encode_point_icon('type', categorical_mapping={
        'attackerIP': 'bomb',
        'victimIP': 'laptop'
    }, default_mapping='')
    .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)
    .settings(url_params={'pointsOfInterestMax': 10})
).plot()
[113]: