Categorical ring layout tutorial#

Graphs whose nodes have a categorical attribute can be laid out radially with the categorical ring layout.

Example values might be names, IDs, colors, and small multiples.

This tutorial covers:

  • Continuous coloring

  • Automated use with smart defaults given just the ring_col: str value dimension

  • order: Optional[List[Any]]: Sort the axis

  • drop_empty, combine_unhandled, append_unhandled: Handle missing values and axis labels

  • min_r, max_r: Control the ring radius range

  • axis: Optional[Dict[Any,str]]: Pass in axis labels

  • format_axis: Callable, format_labels: Callable: Change the labels

  • reverse: bool: Reversing the axis

For larger graphs, we also describe automatic GPU acceleration support.

Setup#

[1]:
import os
os.environ['LOG_LEVEL'] = 'INFO'
[2]:
from typing import Any, Dict, List
import numpy as np
import pandas as pd
import graphistry

graphistry.register(
    api=3,
    username=FILL_ME_IN,
    password=FILL_ME_IN,
    protocol='https',
    server='hub.graphistry.com',
    client_protocol_hostname='https://hub.graphistry.com'
)

Data#

  • Edges: Load a table of IDS network events for our edges

  • Nodes: IP addresses, computing for each IP the time of the first and last events it was seen in

[3]:
df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/honeypot.csv')
df = df.assign(t= pd.Series(pd.to_datetime(df['time(max)'] * 1000000000)))
print(df.dtypes)
print(len(df))
df.sample(3)
attackerIP            object
victimIP              object
victimPort           float64
vulnName              object
count                  int64
time(max)            float64
time(min)            float64
t             datetime64[ns]
dtype: object
220
[3]:
attackerIP victimIP victimPort vulnName count time(max) time(min) t
179 78.140.56.110 172.31.14.66 445.0 MS08067 (NetAPI) 12 1.414243e+09 1.414241e+09 2014-10-25 13:08:50
79 186.23.87.31 172.31.14.66 445.0 MS08067 (NetAPI) 11 1.420662e+09 1.420661e+09 2015-01-07 20:24:19
90 188.225.73.153 172.31.14.66 443.0 IIS Vulnerability 1 1.418287e+09 1.418287e+09 2014-12-11 08:42:18
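
The `* 1000000000` in the cell above is there because `time(max)` holds epoch seconds, while `pd.to_datetime` reads bare numbers as nanoseconds by default. The sketch below repeats that conversion with only the standard library. The helper name `epoch_seconds_to_ns` and the sample value are illustrative assumptions; the value was picked to reproduce the `t` shown for row 179.

```python
from datetime import datetime, timezone

NS_PER_S = 1_000_000_000  # nanoseconds per second


def epoch_seconds_to_ns(seconds: int) -> int:
    """Scale whole epoch seconds to integer nanoseconds."""
    return seconds * NS_PER_S


# Illustrative value: consistent with the rounded 1.414243e+09 shown for row 179
epoch_s = 1414242530
ns = epoch_seconds_to_ns(epoch_s)

# Convert back to a timezone-aware datetime to check the result
as_utc = datetime.fromtimestamp(ns / NS_PER_S, tz=timezone.utc)
print(ns, as_utc.isoformat())
```

In pandas itself, `pd.to_datetime(df['time(max)'], unit='s')` expresses the same intent without the manual scaling.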
[4]:
ip_times = pd.concat([
    df[['attackerIP', 't', 'count', 'time(min)', 'vulnName']].rename(columns={'attackerIP': 'ip'}),
    df[['victimIP', 't', 'count', 'time(min)', 'vulnName']].rename(columns={'victimIP': 'ip'})
])

def most_frequent(series):
    return series.mode().iloc[0] if not series.mode().empty else None

ip_times = ip_times.groupby('ip').agg({
    't': ['min', 'max'],
    'count': ['sum'],
    'time(min)': ['min'],
    'vulnName': [most_frequent, lambda x: str(list(x.unique()))]
}).reset_index()
ip_times.columns = ['ip', 't_min', 't_max', 'count', 'time_min', 'vuln_top', 'vuln_all']

print(ip_times.dtypes)
print(ip_times.shape)
ip_times.sample(5)
ip                  object
t_min       datetime64[ns]
t_max       datetime64[ns]
count                int64
time_min           float64
vuln_top            object
vuln_all            object
dtype: object
(203, 7)
[4]:
ip t_min t_max count time_min vuln_top vuln_all
160 77.232.152.116 2015-02-08 05:48:25 2015-02-09 00:17:13 2 1.423375e+09 IIS Vulnerability ['IIS Vulnerability', 'MaxDB Vulnerability']
146 49.149.168.197 2014-12-05 10:46:55 2014-12-05 10:46:55 8 1.417775e+09 MS08067 (NetAPI) ['MS08067 (NetAPI)']
42 176.103.22.19 2014-12-06 19:23:06 2014-12-06 19:23:06 13 1.417892e+09 MS08067 (NetAPI) ['MS08067 (NetAPI)']
23 119.157.215.18 2014-12-19 20:50:11 2014-12-19 20:50:11 4 1.419021e+09 MS08067 (NetAPI) ['MS08067 (NetAPI)']
112 220.128.136.237 2014-09-30 08:43:16 2014-09-30 08:43:16 2 1.412066e+09 MS08067 (NetAPI) ['MS08067 (NetAPI)']
[5]:
g = graphistry.edges(df, 'attackerIP', 'victimIP').nodes(ip_times, 'ip')

Visualization#

Default#

Given only the name of a categorical column, the default layout infers reasonable layout settings.

[6]:
g.ring_categorical_layout('vuln_top').plot(render=False)
[6]:
'https://hub.graphistry.com/graph/graph.html?dataset=6cf9007b825e47eab9b7855cd407c7d3&type=arrow&viztoken=1c6e0c33-2c29-4a00-870a-1ddeaeb2b623&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028603&info=true&play=0&lockedR=True&bg=%23E2E2E2'

Control axis order#

[8]:
order = sorted(list(g._nodes['vuln_top'].unique()))

g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order,
    reverse=True
).plot(render=False)
[8]:
'https://hub.graphistry.com/graph/graph.html?dataset=77130f55096c4a71a4226e4557e1ac6c&type=arrow&viztoken=2c585098-cd4f-43e6-b2a6-8cc0c2097451&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027155&info=true&play=0&lockedR=True&bg=%23E2E2E2'
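
Because `order` is a plain list, it does not have to be alphabetical. As a minimal sketch, the helper below (a hypothetical `order_by_frequency`, not part of PyGraphistry) builds an order that lists the most common category first. The sample values are a small hand-made list, not the real node table.

```python
from collections import Counter
from typing import Any, List


def order_by_frequency(values: List[Any]) -> List[Any]:
    """Return the unique values, most frequent first.

    Ties are broken by the string form of the value so the result is stable.
    """
    counts = Counter(values)
    return sorted(counts, key=lambda v: (-counts[v], str(v)))


# Hand-made sample, standing in for g._nodes['vuln_top'].tolist()
sample = [
    'MS08067 (NetAPI)', 'IIS Vulnerability', 'MS08067 (NetAPI)',
    'MaxDB Vulnerability', 'IIS Vulnerability', 'MS08067 (NetAPI)',
]
freq_order = order_by_frequency(sample)
print(freq_order)
```

The resulting list could then be passed as `order=freq_order` in place of the sorted list used above.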

Handle missing values and axis labels#

When the passed-in axis labels do not cover all values observed in the data, we can:

  • Put all unexpected values in one ring “Other”, or a ring per unique value

  • Put the new rings before or after the other rings

[10]:
order = sorted(list(g._nodes['vuln_top'].unique()))
order = order[:3] + order[6:]
missing_labels = set(g._nodes['vuln_top'].unique()) - set(order)

print('showing', order)
print('combining into Other', missing_labels)

g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order,
    reverse=True,
    combine_unhandled=True,  # put into 1 ring
    append_unhandled=False,  # put before other items
).plot(render=False)
showing ['DCOM Vulnerability', 'HTTP Vulnerability', 'IIS Vulnerability', 'MaxDB Vulnerability', 'SYMANTEC Vulnerability', 'TIVOLI Vulnerability']
combining into Other {'MYDOOM Vulnerability', 'MS04011 (LSASS)', 'MS08067 (NetAPI)'}
[10]:
'https://hub.graphistry.com/graph/graph.html?dataset=e897ebdfa58e4229afaad3a768400b26&type=arrow&viztoken=79ba2bd2-1ab9-442e-b6aa-534caadebd8a&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027203&info=true&play=0&lockedR=True&bg=%23E2E2E2'
[12]:
g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order,
    reverse=True,
    combine_unhandled=True,  # put into 1 ring
    append_unhandled=True,  # put after other items
).plot(render=False)
[12]:
'https://hub.graphistry.com/graph/graph.html?dataset=9e63c4b30bde4555a314fc0e4953553e&type=arrow&viztoken=e5e38bef-3992-4cd4-b521-b7f632a671c4&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027264&info=true&play=0&lockedR=True&bg=%23E2E2E2'
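
The two options can be read as a small list-building rule. The sketch below is one possible reading of the behavior described in this section, not PyGraphistry's implementation. The function name `resolve_rings` and the `other_label` parameter are assumptions made for illustration.

```python
from typing import Any, List


def resolve_rings(
    observed: List[Any],
    order: List[Any],
    *,
    combine_unhandled: bool,
    append_unhandled: bool,
    other_label: str = 'Other',  # hypothetical name for the combined ring
) -> List[Any]:
    """Return the ring sequence for `order` plus any values it does not list."""
    seen = set(order)
    unhandled: List[Any] = []
    for v in observed:
        if v not in seen:  # keep first-seen order, skip repeats
            seen.add(v)
            unhandled.append(v)
    if not unhandled:
        return list(order)
    # One shared ring, or one ring per unexpected value
    extra = [other_label] if combine_unhandled else unhandled
    # Place the extra rings after or before the requested ones
    return list(order) + extra if append_unhandled else extra + list(order)


observed = ['A', 'B', 'C', 'D', 'B']
before = resolve_rings(observed, ['B', 'A'],
                       combine_unhandled=True, append_unhandled=False)
after = resolve_rings(observed, ['B', 'A'],
                      combine_unhandled=False, append_unhandled=True)
print(before, after)
```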

When the axis covers values that never occur in the data, we can drop them (default) or keep them as unpopulated rings.

[15]:
order_excessive = list(g._nodes['vuln_top'].unique()) + ['a value that never occurs']
g.ring_categorical_layout(
    ring_col='vuln_top',
    order=order_excessive,
    drop_empty=False,
).plot(render=False)
[15]:
'https://hub.graphistry.com/graph/graph.html?dataset=e562ab9d9df94593be7e75632d10989e&type=arrow&viztoken=5c835399-d4c3-45b5-821b-cd37ff9f64f4&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027656&info=true&play=0&lockedR=True&bg=%23E2E2E2'
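
As a rough sketch of the choice `drop_empty` makes, the hypothetical helper below filters a requested order down to values that actually occur, or leaves it untouched. It illustrates the idea only and is not taken from the library.

```python
from typing import Any, List


def apply_drop_empty(order: List[Any], observed: List[Any], drop_empty: bool) -> List[Any]:
    """Keep every requested ring, or only those with at least one node."""
    if not drop_empty:
        return list(order)
    present = set(observed)
    return [v for v in order if v in present]


requested = ['IIS Vulnerability', 'a value that never occurs']
observed_values = ['IIS Vulnerability', 'IIS Vulnerability']
kept = apply_drop_empty(requested, observed_values, drop_empty=True)
kept_all = apply_drop_empty(requested, observed_values, drop_empty=False)
print(kept, kept_all)
```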

Control sizes#

  • Control the radii of the first and last rings via min_r and max_r

[16]:
g.ring_categorical_layout(
    ring_col='vuln_top',
    min_r=500,
    max_r=1000,
).plot(render=False)
[16]:
'https://hub.graphistry.com/graph/graph.html?dataset=18611f4354ae4c68b87b073d1e2fc916&type=arrow&viztoken=2f2844e2-a154-4e29-9ff8-0008b2dbe7d7&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027856&info=true&play=0&lockedR=True&bg=%23E2E2E2'
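
To make the effect of `min_r` and `max_r` concrete, the sketch below places the first ring at `min_r`, the last at `max_r`, and spaces the rest evenly in between. The even spacing is an assumption made for illustration; the tutorial only states that the two parameters control the first and last rings. `ring_radii` is a hypothetical helper.

```python
from typing import Any, Dict, List


def ring_radii(rings: List[Any], min_r: float, max_r: float) -> Dict[Any, float]:
    """Map each ring to a radius, evenly spaced from min_r to max_r (assumed)."""
    if not rings:
        return {}
    if len(rings) == 1:
        return {rings[0]: float(min_r)}
    step = (max_r - min_r) / (len(rings) - 1)
    return {ring: min_r + i * step for i, ring in enumerate(rings)}


radii = ring_radii(['a', 'b', 'c'], min_r=500, max_r=1000)
print(radii)
```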

Control axis labels#

Label each axis as "ring: <lower case value>"

[20]:
axis: Dict[Any, str] = {
    v: f'ring: {v.lower()}'
    for v in list(g._nodes['vuln_top'].unique())
}
print('axis', axis)

g.ring_categorical_layout(
    ring_col='vuln_top',
    axis=axis
).plot(render=False)
axis {'MS08067 (NetAPI)': 'ring: ms08067 (netapi)', 'MS04011 (LSASS)': 'ring: ms04011 (lsass)', 'MaxDB Vulnerability': 'ring: maxdb vulnerability', 'IIS Vulnerability': 'ring: iis vulnerability', 'SYMANTEC Vulnerability': 'ring: symantec vulnerability', 'MYDOOM Vulnerability': 'ring: mydoom vulnerability', 'TIVOLI Vulnerability': 'ring: tivoli vulnerability', 'DCOM Vulnerability': 'ring: dcom vulnerability', 'HTTP Vulnerability': 'ring: http vulnerability'}
[20]:
'https://hub.graphistry.com/graph/graph.html?dataset=27e512bb757c41a4ae8f1037bd934969&type=arrow&viztoken=bec9ee71-fa6b-4122-b553-f20d41899d6f&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721027971&info=true&play=0&lockedR=True&bg=%23E2E2E2'

Compute a custom label based on the value

[23]:
def axis_to_title(v: str, step: int, radius: float) -> str:
    lbl = f'ring: {v.lower()}'
    return lbl

g.ring_categorical_layout(
    ring_col='vuln_top',
    format_labels=axis_to_title
).plot(render=False)
[23]:
'https://hub.graphistry.com/graph/graph.html?dataset=6b910e61013d40f5a6238ab2acb71f00&type=arrow&viztoken=b678d9bb-8d6b-49bb-b509-2af9b45d6de0&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028107&info=true&play=0&lockedR=True&bg=%23E2E2E2'
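
The callback above ignores its `step` and `radius` arguments. The sketch below shows how a callback with that three-argument shape can use all of them, driven by a small stand-in loop rather than the library. It assumes `step` is the zero-based ring index; `label_rings` is hypothetical, and the input lists are hand-made.

```python
from typing import Any, Callable, Dict, List


def label_rings(
    rings: List[Any],
    radii: List[float],
    format_labels: Callable[[Any, int, float], str],
) -> List[Dict[str, Any]]:
    """Call format_labels(value, step, radius) once per ring (stand-in driver)."""
    return [
        {'label': format_labels(v, step, r), 'r': r}
        for step, (v, r) in enumerate(zip(rings, radii))
    ]


def verbose_title(v: str, step: int, radius: float) -> str:
    # Uses all three arguments, unlike axis_to_title above
    return f'ring {step}: {v.lower()} (r={radius:.0f})'


labeled = label_rings(
    ['DCOM Vulnerability', 'HTTP Vulnerability'],
    [400.0, 1000.0],
    verbose_title,
)
print(labeled)
```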

Control more aspects of the axis, like border style

[24]:
def fancy_axis_transform(axis: List[Dict]) -> List[Dict]:
    """
      - same radii
      - add "Ring ..." to labels
      - color radial axis based on ring number
          * ring 3: internal (blue axis style)
          * ring 6: external (orange axis style)
          * other rings: space (default gray axis style)
    """
    out = []
    print('sample input axis[0]:', axis[0])
    for i, ring in enumerate(axis):
        out.append({
            'r': ring['r'],
            'label': f'Ring {ring["label"]}',
            'internal': i == 3,  # blue
            'external': i == 6,  # orange
            'space': i != 3 and i != 6  # gray
        })
    print('sample output axis[0]:', out[0])
    return out

g.ring_categorical_layout(
    ring_col='vuln_top',
    min_r=400,
    max_r=1000,
    format_axis=fancy_axis_transform
).plot(render=False)
sample input axis[0]: {'label': 'MS08067 (NetAPI)', 'r': 400.0, 'internal': True}
sample output axis[0]: {'r': 400.0, 'label': 'Ring MS08067 (NetAPI)', 'internal': False, 'external': False, 'space': True}
[24]:
'https://hub.graphistry.com/graph/graph.html?dataset=14effed027694ed59e2ac56114b23edf&type=arrow&viztoken=8c566e91-36ae-4ea2-b49d-0598260edbea&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028166&info=true&play=0&lockedR=True&bg=%23E2E2E2'
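
Since a `format_axis` function is just a list-to-list transform, it can be checked locally before any plot is uploaded. The sketch below runs a simpler, hypothetical transform (`prefix_labels`) on a hand-built list shaped like the sample input printed above. The second entry is invented for the test.

```python
from typing import Any, Dict, List


def prefix_labels(axis: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy each ring dict unchanged except for a 'Ring ' label prefix."""
    return [{**ring, 'label': f'Ring {ring["label"]}'} for ring in axis]


sample_axis = [
    {'label': 'MS08067 (NetAPI)', 'r': 400.0, 'internal': True},
    {'label': 'IIS Vulnerability', 'r': 700.0, 'internal': True},  # invented entry
]
transformed = prefix_labels(sample_axis)

# Sanity checks: same number of rings, radii untouched, input not mutated
assert len(transformed) == len(sample_axis)
assert [ring['r'] for ring in transformed] == [ring['r'] for ring in sample_axis]
assert sample_axis[0]['label'] == 'MS08067 (NetAPI)'
print(transformed[0])
```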

GPU Acceleration#

For larger graphs, GPU acceleration triggers automatically when g._nodes is a cudf.DataFrame.

To ensure GPU acceleration is used, set engine='cudf'.

[7]:
import cudf

(g
 .nodes(cudf.from_pandas(g._nodes))
 .ring_categorical_layout('vuln_top', engine='cudf')
).plot(render=False)

[7]:
'https://hub.graphistry.com/graph/graph.html?dataset=a337d895c28e4fd2b84bd57475942c5a&type=arrow&viztoken=427551fa-b454-4325-b6df-fc038fc685c2&usertag=6c2f6dc1-pygraphistry-0+unknown&splashAfter=1721028625&info=true&play=0&lockedR=True&bg=%23E2E2E2'
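
On machines without a GPU stack, `import cudf` fails. A notebook meant to run in both environments could guard the choice as sketched below. The helper `pick_engine` is hypothetical, and treating `'pandas'` as the CPU engine name is an assumption to verify against the PyGraphistry version in use.

```python
import importlib.util


def pick_engine() -> str:
    """Return 'cudf' when the cudf package is importable, else 'pandas' (assumed name)."""
    return 'cudf' if importlib.util.find_spec('cudf') is not None else 'pandas'


engine = pick_engine()
print('using engine:', engine)
```

The node table would then be converted with `cudf.from_pandas` only when `engine == 'cudf'`, and the same `engine` value passed to the layout call.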