Categorical ring layout tutorial#
Graphs where nodes have a categorical attribute may be laid out radially with the new categorical ring layout.
Example values might be names, IDs, colors, and small multiples.
The tutorial overviews:
Continuous coloring
Automated use with smart defaults given just the ring_col: str value dimension
order: Optional[List[Any]]: Sort the axis
drop_empty, combine_unhandled, append_unhandled: Handle missing values and axis
min_r, max_r: Control the ring radius ranges
axis: Optional[Dict[Any,str]]: Pass in axis labels
format_axis: Callable, format_labels: Callable: Changing the labels
reverse: bool: Reversing the axis
For larger graphs, we also describe automatic GPU acceleration support
Setup#
[1]:
import os
os.environ['LOG_LEVEL'] = 'INFO'
[2]:
from typing import Any, Dict, List
import numpy as np
import pandas as pd
import graphistry
graphistry.register(
    api=3,
    username=FILL_ME_IN,
    password=FILL_ME_IN,
    protocol='https',
    server='hub.graphistry.com',
    client_protocol_hostname='https://hub.graphistry.com'
)
Data#
Edges: Load a table of IDS network events for our edges
Nodes: IP addresses, computing for each IP the time of the first and last events it was seen in
[3]:
df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/honeypot.csv')
df = df.assign(t= pd.Series(pd.to_datetime(df['time(max)'] * 1000000000)))
print(df.dtypes)
print(len(df))
df.sample(3)
attackerIP object
victimIP object
victimPort float64
vulnName object
count int64
time(max) float64
time(min) float64
t datetime64[ns]
dtype: object
220
[3]:
| | attackerIP | victimIP | victimPort | vulnName | count | time(max) | time(min) | t |
---|---|---|---|---|---|---|---|---|
179 | 78.140.56.110 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 12 | 1.414243e+09 | 1.414241e+09 | 2014-10-25 13:08:50 |
79 | 186.23.87.31 | 172.31.14.66 | 445.0 | MS08067 (NetAPI) | 11 | 1.420662e+09 | 1.420661e+09 | 2015-01-07 20:24:19 |
90 | 188.225.73.153 | 172.31.14.66 | 443.0 | IIS Vulnerability | 1 | 1.418287e+09 | 1.418287e+09 | 2014-12-11 08:42:18 |
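The cell above turns epoch seconds into timestamps by scaling them to nanoseconds. As a minimal sketch on made-up values (not rows from honeypot.csv), the same conversion can also be expressed with the `unit='s'` argument of `pd.to_datetime`, which states the intent more directly:

```python
import pandas as pd

# Made-up epoch-second values for illustration (not taken from honeypot.csv)
seconds = pd.Series([0.0, 86400.0, 1_000_000_000.0])

# Approach used in the tutorial: scale seconds to nanoseconds, then parse
t_scaled = pd.to_datetime(seconds * 1_000_000_000)

# Alternative: tell pandas the unit of the numbers directly
t_unit = pd.to_datetime(seconds, unit='s')

print(t_unit.tolist())
print((t_scaled == t_unit).all())
```

Both forms agree on whole-second inputs like these; which one to use is a matter of readability.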
[4]:
ip_times = pd.concat([
    df[['attackerIP', 't', 'count', 'time(min)', 'vulnName']].rename(columns={'attackerIP': 'ip'}),
    df[['victimIP', 't', 'count', 'time(min)', 'vulnName']].rename(columns={'victimIP': 'ip'})
])

def most_frequent(series):
    return series.mode().iloc[0] if not series.mode().empty else None

ip_times = ip_times.groupby('ip').agg({
    't': ['min', 'max'],
    'count': ['sum'],
    'time(min)': ['min'],
    'vulnName': [most_frequent, lambda x: str(list(x.unique()))]
}).reset_index()
ip_times.columns = ['ip', 't_min', 't_max', 'count', 'time_min', 'vuln_top', 'vuln_all']
print(ip_times.dtypes)
print(ip_times.shape)
ip_times.sample(5)
ip object
t_min datetime64[ns]
t_max datetime64[ns]
count int64
time_min float64
vuln_top object
vuln_all object
dtype: object
(203, 7)
[4]:
| | ip | t_min | t_max | count | time_min | vuln_top | vuln_all |
---|---|---|---|---|---|---|---|
160 | 77.232.152.116 | 2015-02-08 05:48:25 | 2015-02-09 00:17:13 | 2 | 1.423375e+09 | IIS Vulnerability | ['IIS Vulnerability', 'MaxDB Vulnerability'] |
146 | 49.149.168.197 | 2014-12-05 10:46:55 | 2014-12-05 10:46:55 | 8 | 1.417775e+09 | MS08067 (NetAPI) | ['MS08067 (NetAPI)'] |
42 | 176.103.22.19 | 2014-12-06 19:23:06 | 2014-12-06 19:23:06 | 13 | 1.417892e+09 | MS08067 (NetAPI) | ['MS08067 (NetAPI)'] |
23 | 119.157.215.18 | 2014-12-19 20:50:11 | 2014-12-19 20:50:11 | 4 | 1.419021e+09 | MS08067 (NetAPI) | ['MS08067 (NetAPI)'] |
112 | 220.128.136.237 | 2014-09-30 08:43:16 | 2014-09-30 08:43:16 | 2 | 1.412066e+09 | MS08067 (NetAPI) | ['MS08067 (NetAPI)'] |
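To make the per-IP aggregation easier to follow, here is a small sketch on a made-up event table (the IPs and the vulnerability names `A` and `B` are hypothetical). It uses pandas named aggregation, an alternative to the dict-plus-column-rename form in the cell above, so the output columns are named in a single step:

```python
import pandas as pd

# Tiny made-up event table; values are hypothetical
events = pd.DataFrame({
    'ip': ['10.0.0.1', '10.0.0.1', '10.0.0.1', '10.0.0.2'],
    'count': [1, 2, 3, 4],
    'vulnName': ['A', 'B', 'A', 'B'],
})

def most_frequent(series: pd.Series):
    """Return the most common value, or None for an empty series."""
    modes = series.mode()
    return modes.iloc[0] if not modes.empty else None

# Named aggregation: output_column=(input_column, aggregation)
per_ip = events.groupby('ip').agg(
    count=('count', 'sum'),
    vuln_top=('vulnName', most_frequent),
    vuln_all=('vulnName', lambda x: str(list(x.unique()))),
).reset_index()

print(per_ip)
```

Here `10.0.0.1` gets a total count of 6 and `vuln_top` of `A`, since `A` appears twice and `B` once.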
[5]:
g = graphistry.edges(df, 'attackerIP', 'victimIP').nodes(ip_times, 'ip')
Visualization#
Default#
Given just the categorical column name, the default layout places each unique value on its own ring and tries to infer reasonable layout settings
[6]:
g.ring_categorical_layout('vuln_top').plot(render=False)
[6]:
'https://hub.graphistry.com/graph/graph.html?dataset=6cf9007b825e47eab9b7855cd407c7d3&type=arrow&viztoken=1c6e0c33-2c29-4a00-870a-1ddeaeb2b623&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028603&info=true&play=0&lockedR=True&bg=%23E2E2E2'
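As a rough mental model of what a categorical ring layout produces, the sketch below assigns every node a radius from its category and converts it to x/y coordinates. This is not PyGraphistry's implementation: `place_on_rings` is a hypothetical helper, the radii are passed in by hand, and the even angular spacing is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

def place_on_rings(nodes: pd.DataFrame, ring_col: str, radius_of: dict) -> pd.DataFrame:
    """Hypothetical helper: put each node on the ring of its category."""
    # Look up the ring radius for each node's category
    r = nodes[ring_col].map(radius_of).to_numpy(dtype=float)
    # Assumption for illustration: spread nodes evenly by row position
    theta = 2 * np.pi * np.arange(len(nodes)) / max(len(nodes), 1)
    # Polar -> Cartesian
    return nodes.assign(r=r, x=r * np.cos(theta), y=r * np.sin(theta))

toy_nodes = pd.DataFrame({
    'ip': ['a', 'b', 'c', 'd'],
    'vuln_top': ['IIS Vulnerability', 'MS08067 (NetAPI)',
                 'IIS Vulnerability', 'MS08067 (NetAPI)'],
})
placed = place_on_rings(
    toy_nodes, 'vuln_top',
    {'IIS Vulnerability': 100.0, 'MS08067 (NetAPI)': 200.0},
)
print(placed[['ip', 'r', 'x', 'y']])
```

Every node's distance from the origin equals its ring radius, which is the property the layout relies on.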
Control axis order#
[8]:
order = sorted(list(g._nodes['vuln_top'].unique()))
g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order,
    reverse=True
).plot(render=False)
[8]:
'https://hub.graphistry.com/graph/graph.html?dataset=77130f55096c4a71a4226e4557e1ac6c&type=arrow&viztoken=2c585098-cd4f-43e6-b2a6-8cc0c2097451&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027155&info=true&play=0&lockedR=True&bg=%23E2E2E2'
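One way to think about `order` and `reverse` is as a mapping from category to ring number. The sketch below uses a hypothetical helper, `ring_index`, and assumes that ring 0 is the innermost ring and that `reverse=True` simply flips the sequence; the library's exact conventions may differ.

```python
from typing import Any, Dict, List

def ring_index(order: List[Any], reverse: bool = False) -> Dict[Any, int]:
    """Hypothetical helper: map each category to a ring number (0 = first ring)."""
    ordered = list(reversed(order)) if reverse else list(order)
    return {value: i for i, value in enumerate(ordered)}

order = sorted(['IIS Vulnerability', 'DCOM Vulnerability', 'HTTP Vulnerability'])
forward = ring_index(order)
backward = ring_index(order, reverse=True)
print(forward)
print(backward)
```

With alphabetical order, `DCOM Vulnerability` lands on ring 0; with `reverse=True` it moves to the last ring instead.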
Handle missing values and axis labels#
When the passed-in axis values do not cover all values observed in the data, we can:
Put all unexpected values in one ring, “Other”, or give each unique value its own ring
Put the new rings before or after the other rings
[10]:
order = sorted(list(g._nodes['vuln_top'].unique()))
order = order[:3] + order[6:]
missing_labels = set(g._nodes['vuln_top'].unique()) - set(order)
print('showing', order)
print('combining into Other', missing_labels)
g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order,
    reverse=True,
    combine_unhandled=True,  # put into 1 ring
    append_unhandled=False,  # put before other items
).plot(render=False)
showing ['DCOM Vulnerability', 'HTTP Vulnerability', 'IIS Vulnerability', 'MaxDB Vulnerability', 'SYMANTEC Vulnerability', 'TIVOLI Vulnerability']
combining into Other {'MYDOOM Vulnerability', 'MS04011 (LSASS)', 'MS08067 (NetAPI)'}
[10]:
'https://hub.graphistry.com/graph/graph.html?dataset=e897ebdfa58e4229afaad3a768400b26&type=arrow&viztoken=79ba2bd2-1ab9-442e-b6aa-534caadebd8a&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027203&info=true&play=0&lockedR=True&bg=%23E2E2E2'
[12]:
g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order,
    reverse=True,
    combine_unhandled=True,  # put into 1 ring
    append_unhandled=True,  # put after other items
).plot(render=False)
[12]:
'https://hub.graphistry.com/graph/graph.html?dataset=9e63c4b30bde4555a314fc0e4953553e&type=arrow&viztoken=e5e38bef-3992-4cd4-b521-b7f632a671c4&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027264&info=true&play=0&lockedR=True&bg=%23E2E2E2'
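The two flags can be summarized as a small decision procedure. The sketch below is an illustration of the behavior described above, not the library's code: `resolve_rings` and its `other_label` parameter are hypothetical, and the default values chosen here are arbitrary.

```python
from typing import Any, Iterable, List

def resolve_rings(
    observed: Iterable[Any],
    order: List[Any],
    combine_unhandled: bool = False,
    append_unhandled: bool = True,
    other_label: str = 'Other',
) -> List[Any]:
    """Hypothetical helper: final ring sequence given values missing from `order`."""
    known = set(order)
    # dict.fromkeys de-duplicates while keeping first-seen order
    unhandled = [v for v in dict.fromkeys(observed) if v not in known]
    if not unhandled:
        return list(order)
    # One shared ring, or one ring per unexpected value
    extra = [other_label] if combine_unhandled else unhandled
    # Place the extra rings after or before the requested ones
    return list(order) + extra if append_unhandled else extra + list(order)

observed = ['A', 'B', 'C', 'D', 'B']
print(resolve_rings(observed, ['A', 'C'], combine_unhandled=True, append_unhandled=False))
print(resolve_rings(observed, ['A', 'C'], combine_unhandled=False, append_unhandled=True))
```

The first call yields `['Other', 'A', 'C']` and the second `['A', 'C', 'B', 'D']`.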
When the axis includes values not seen in the data, we can drop them (default) or keep them as unpopulated rings
[15]:
order_excessive = list(g._nodes['vuln_top'].unique()) + ['a value that never occurs']
g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order_excessive,
    drop_empty=False,
).plot(render=False)
[15]:
'https://hub.graphistry.com/graph/graph.html?dataset=e562ab9d9df94593be7e75632d10989e&type=arrow&viztoken=5c835399-d4c3-45b5-821b-cd37ff9f64f4&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027656&info=true&play=0&lockedR=True&bg=%23E2E2E2'
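The effect of `drop_empty` can be pictured as an optional filter over the requested order. The helper below, `prune_order`, is hypothetical and only illustrates the idea on made-up values:

```python
from typing import Any, Iterable, List

def prune_order(order: List[Any], observed: Iterable[Any], drop_empty: bool = True) -> List[Any]:
    """Hypothetical helper: optionally remove rings that no node would occupy."""
    if not drop_empty:
        return list(order)
    seen = set(observed)
    return [v for v in order if v in seen]

requested = ['A', 'B', 'a value that never occurs']
kept = prune_order(requested, observed=['A', 'B', 'A'])
kept_all = prune_order(requested, observed=['A', 'B', 'A'], drop_empty=False)
print(kept)
print(kept_all)
```

With `drop_empty=False` the unused value stays in the sequence, which corresponds to the unpopulated ring in the plot above.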
Control sizes#
Control the radii of the first and last rings via min_r and max_r
[16]:
g.ring_categorical_layout(
    ring_col='vuln_top',
    min_r=500,
    max_r=1000,
).plot(render=False)
[16]:
'https://hub.graphistry.com/graph/graph.html?dataset=18611f4354ae4c68b87b073d1e2fc916&type=arrow&viztoken=2f2844e2-a154-4e29-9ff8-0008b2dbe7d7&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027856&info=true&play=0&lockedR=True&bg=%23E2E2E2'
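As a worked example of what these two settings imply, the sketch below computes ring radii for the nine unique `vuln_top` values seen in this tutorial. It assumes the rings are evenly spaced between `min_r` and `max_r`; that spacing rule is an assumption made for illustration, not a documented guarantee.

```python
import numpy as np

min_r, max_r = 500.0, 1000.0
n_rings = 9  # unique vuln_top values in this tutorial's data

# Assumption: rings are evenly spaced from min_r (first) to max_r (last)
radii = np.linspace(min_r, max_r, num=n_rings)
step = (max_r - min_r) / (n_rings - 1)

print(radii)
print(step)  # 62.5
```

Under that assumption, narrowing the gap between `min_r` and `max_r` packs the rings closer together, while raising `min_r` enlarges the empty area at the center.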
Control axis labels#
Label each axis as "ring: <lower case value>"
[20]:
axis: Dict[Any, str] = {
    v: f'ring: {v.lower()}'
    for v in list(g._nodes['vuln_top'].unique())
}
print('axis', axis)
g.ring_categorical_layout(
    ring_col='vuln_top',
    axis=axis
).plot(render=False)
axis {'MS08067 (NetAPI)': 'ring: ms08067 (netapi)', 'MS04011 (LSASS)': 'ring: ms04011 (lsass)', 'MaxDB Vulnerability': 'ring: maxdb vulnerability', 'IIS Vulnerability': 'ring: iis vulnerability', 'SYMANTEC Vulnerability': 'ring: symantec vulnerability', 'MYDOOM Vulnerability': 'ring: mydoom vulnerability', 'TIVOLI Vulnerability': 'ring: tivoli vulnerability', 'DCOM Vulnerability': 'ring: dcom vulnerability', 'HTTP Vulnerability': 'ring: http vulnerability'}
[20]:
'https://hub.graphistry.com/graph/graph.html?dataset=27e512bb757c41a4ae8f1037bd934969&type=arrow&viztoken=bec9ee71-fa6b-4122-b553-f20d41899d6f&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027971&info=true&play=0&lockedR=True&bg=%23E2E2E2'
Compute a custom label based on the value
[23]:
def axis_to_title(v: str, step: int, radius: float) -> str:
    lbl = f'ring: {v.lower()}'
    return lbl

g.ring_categorical_layout(
    ring_col='vuln_top',
    format_labels=axis_to_title
).plot(render=False)
[23]:
'https://hub.graphistry.com/graph/graph.html?dataset=6b910e61013d40f5a6238ab2acb71f00&type=arrow&viztoken=b678d9bb-8d6b-49bb-b509-2af9b45d6de0&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028107&info=true&play=0&lockedR=True&bg=%23E2E2E2'
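The label callback also receives `step` and `radius`, which the example above ignores. A variant that uses all three arguments might look like the sketch below. It assumes `step` is a zero-based ring position and `radius` is that ring's radius; the function name `numbered_title` is hypothetical.

```python
def numbered_title(v: str, step: int, radius: float) -> str:
    """Hypothetical formatter: number the ring and show its radius."""
    # Assumption: step counts rings from 0, so add 1 for display
    return f'{step + 1}. {v.lower()} (r={radius:.0f})'

example = numbered_title('MS08067 (NetAPI)', 0, 400.0)
print(example)  # 1. ms08067 (netapi) (r=400)
```

A function with this signature could be passed as `format_labels=numbered_title` in the same way as `axis_to_title`.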
Control more aspects of the axis, like border style
[24]:
def fancy_axis_transform(axis: List[Dict]) -> List[Dict]:
    """
    - same radii
    - add "Ring ..." to labels
    - color radial axis based on ring number
        * ring 3: internal (blue axis style)
        * ring 6: external (orange axis style)
        * other rings: space (default gray axis style)
    """
    out = []
    print('sample input axis[0]:', axis[0])
    for i, ring in enumerate(axis):
        out.append({
            'r': ring['r'],
            'label': f'Ring {ring["label"]}',
            'internal': i == 3,  # blue
            'external': i == 6,  # orange
            'space': i != 3 and i != 6  # gray
        })
    print('sample output axis[0]:', out[0])
    return out

g.ring_categorical_layout(
    ring_col='vuln_top',
    min_r=400,
    max_r=1000,
    format_axis=fancy_axis_transform
).plot(render=False)
sample input axis[0]: {'label': 'MS08067 (NetAPI)', 'r': 400.0, 'internal': True}
sample output axis[0]: {'r': 400.0, 'label': 'Ring MS08067 (NetAPI)', 'internal': False, 'external': False, 'space': True}
[24]:
'https://hub.graphistry.com/graph/graph.html?dataset=14effed027694ed59e2ac56114b23edf&type=arrow&viztoken=8c566e91-36ae-4ea2-b49d-0598260edbea&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028166&info=true&play=0&lockedR=True&bg=%23E2E2E2'
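Styling by ring position is brittle if the ring order changes. As an alternative sketch, the transform below picks the style from the ring's label instead. It reuses the axis dictionary keys shown in the sample output above (`r`, `label`, `internal`, `external`, `space`); the function `highlight_by_label` and its parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List

def highlight_by_label(
    axis: List[Dict],
    internal_labels: FrozenSet[str] = frozenset(),
    external_labels: FrozenSet[str] = frozenset(),
) -> List[Dict]:
    """Hypothetical transform: choose each ring's style from its label."""
    out = []
    for ring in axis:
        is_internal = ring['label'] in internal_labels
        # A label listed in both sets is treated as internal
        is_external = ring['label'] in external_labels and not is_internal
        out.append({
            'r': ring['r'],
            'label': ring['label'],
            'internal': is_internal,
            'external': is_external,
            'space': not (is_internal or is_external),
        })
    return out

sample_axis = [
    {'label': 'MS08067 (NetAPI)', 'r': 400.0, 'internal': True},
    {'label': 'IIS Vulnerability', 'r': 475.0, 'internal': True},
]
styled = highlight_by_label(sample_axis, internal_labels=frozenset({'IIS Vulnerability'}))
print(styled)
```

Because `format_axis` takes a single-argument callable, the extra parameters would need to be bound first, for example with `functools.partial` or a lambda.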
GPU Acceleration#
For larger graphs, automatic GPU acceleration triggers when g._nodes is a cudf.DataFrame.
To ensure GPU acceleration is used, set engine='cudf'
[7]:
import cudf
(g
    .nodes(cudf.from_pandas(g._nodes))
    .ring_categorical_layout('vuln_top', engine='cudf')
).plot(render=False)
[7]:
'https://hub.graphistry.com/graph/graph.html?dataset=a337d895c28e4fd2b84bd57475942c5a&type=arrow&viztoken=427551fa-b454-4325-b6df-fc038fc685c2&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028625&info=true&play=0&lockedR=True&bg=%23E2E2E2'
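On machines without a GPU stack, `import cudf` fails, so a notebook shared across environments may want to choose the engine at runtime. The sketch below checks whether the `cudf` package is installed without importing it. The helper `pick_engine` is hypothetical, and using `'pandas'` as the CPU engine name is an assumption to verify against the installed PyGraphistry version.

```python
import importlib.util

def pick_engine() -> str:
    """Hypothetical helper: prefer cudf when installed, else fall back to pandas."""
    # find_spec returns None when the package is not installed
    has_cudf = importlib.util.find_spec('cudf') is not None
    return 'cudf' if has_cudf else 'pandas'  # 'pandas' name is an assumption

engine = pick_engine()
print(engine)
```

The result could then be passed as `engine=engine`, converting `g._nodes` with `cudf.from_pandas` only when the value is `'cudf'`.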