Tutorial: Data Analysis in Graphistry

Register
Load table
Plot:
- Simple: input is a list of edges
- Arbitrary: input is a table (hypergraph transform)
Advanced plotting
Further reading

1. Register

[101]:

import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options: https://pygraphistry.readthedocs.io/en/latest/server/register.html
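To avoid hard-coding credentials in a notebook, one option is to read them from environment variables. This is a minimal sketch. The variable names `GRAPHISTRY_USERNAME`, `GRAPHISTRY_PASSWORD` and `GRAPHISTRY_SERVER` and both helper functions are hypothetical conventions, not part of Graphistry. The keyword arguments mirror the `register` call shown in the comment above.

```python
import os


def register_kwargs_from_env(env=None):
    """Build graphistry.register() keyword arguments from environment variables.

    The variable names used here are hypothetical conventions for this sketch.
    """
    env = os.environ if env is None else env
    return {
        'api': 3,
        'username': env['GRAPHISTRY_USERNAME'],
        'password': env['GRAPHISTRY_PASSWORD'],
        'protocol': 'https',
        # Fall back to the hosted server when no override is set
        'server': env.get('GRAPHISTRY_SERVER', 'hub.graphistry.com'),
    }


def register_from_env():
    # Imported lazily so the helper above works even without graphistry installed
    import graphistry
    graphistry.register(**register_kwargs_from_env())
```

Keeping the dictionary-building step separate from the actual `register` call makes the credential lookup easy to check without contacting a server.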

2. Load table

Graphistry works seamlessly with dataframes such as pandas and GPU-accelerated RAPIDS cuDF.
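Because both libraries are supported, a notebook can load the same CSV on GPU when RAPIDS is present and fall back to pandas otherwise. This is a minimal sketch, and `load_table` is a hypothetical helper name. It relies only on `cudf.read_csv` and `pandas.read_csv`.

```python
import pandas as pd


def load_table(path):
    """Load a CSV with cuDF when it is installed, else with pandas (hypothetical helper)."""
    try:
        import cudf  # GPU dataframe library from RAPIDS; optional
    except ImportError:
        return pd.read_csv(path)
    return cudf.read_csv(path)
```

For example, `df = load_table('./data/honeypot.csv')` would replace the `pd.read_csv` call in the next cell.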

[94]:

import pandas as pd

df = pd.read_csv('./data/honeypot.csv')

df.sample(3)

[94]:

	attackerIP	victimIP	victimPort	vulnName	count	time(max)	time(min)
64	178.77.190.33	172.31.14.66	445.0	MS08067 (NetAPI)	6	1.419968e+09	1.419967e+09
7	112.209.78.240	172.31.14.66	445.0	MS08067 (NetAPI)	10	1.414516e+09	1.414514e+09
182	79.140.174.193	172.31.14.66	445.0	MS08067 (NetAPI)	2	1.422062e+09	1.422062e+09

3. Plot

A. Simple graphs

Build up a set of bindings. Simple graphs consist of:
- required: an edge table with source and destination ID columns, plus any additional property columns
- optional: a node table with a matching node ID column

See the UI Guide for working with the graph inside the tool.
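If you do supply a node table, every source and destination value in the edge table should appear in the node table's ID column. Below is a quick pandas check. The IP addresses are toy values from documentation ranges, and `missing_node_ids` is a hypothetical helper, not a Graphistry function.

```python
import pandas as pd


def missing_node_ids(edges, nodes, src, dst, node_id):
    """Return edge endpoint IDs that have no row in the node table (hypothetical helper)."""
    endpoints = pd.concat([edges[src], edges[dst]]).unique()
    return set(endpoints) - set(nodes[node_id])


edges = pd.DataFrame({'attackerIP': ['203.0.113.5', '203.0.113.9'],
                      'victimIP': ['198.51.100.7', '198.51.100.7']})
nodes = pd.DataFrame({'node_id': ['203.0.113.5', '198.51.100.7']})
print(missing_node_ids(edges, nodes, 'attackerIP', 'victimIP', 'node_id'))  # {'203.0.113.9'}
```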

Demo graph schema:

- Input table: the alerts df above, using columns attackerIP and victimIP
- Edges: link df’s columns attackerIP -> victimIP
- Nodes: unspecified; by default, Graphistry generates them from the edges
- Node colors: by default, Graphistry infers communities and colors nodes by community
- Node sizes: by default, the number of edges on each node (“degree”)
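Graphistry derives the default node table and degree on its own. For intuition only, here is a rough pandas approximation of that idea, not Graphistry's implementation. Every edge contributes one to the degree of each of its two endpoints. The IPs are toy values.

```python
import pandas as pd

edges = pd.DataFrame({
    'attackerIP': ['203.0.113.5', '203.0.113.5', '203.0.113.9'],
    'victimIP':   ['198.51.100.7', '198.51.100.8', '198.51.100.7'],
})

# Stack both endpoint columns so every edge counts once for each of its ends
endpoints = pd.concat([edges['attackerIP'], edges['victimIP']], ignore_index=True)

# One row per distinct node ID, with its edge count as the degree
nodes = (
    endpoints
    .value_counts()
    .rename_axis('node_id')
    .reset_index(name='degree')
)
print(nodes)
```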

[16]:

g = graphistry.edges(df, 'attackerIP', 'victimIP')

[17]:

g.plot()


B. Hypergraphs – Plot arbitrary tables

The hypergraph transform is a convenient method to transform tables into graphs:

- It extracts entities from the table and links them together
- Entities get linked together when they are from the same row

Approach 1: Treat each row as a node, and link it to each cell value in it

Demo graph schema:
- Edges: row -> attackerIP, row -> victimIP, row -> victimPort, row -> vulnName
- Nodes: row, attackerIP, victimIP, victimPort, vulnName
- Node colors: automatic, based on inferred community
- Node sizes: number of edges
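Conceptually, Approach 1 is a wide-to-long reshape. Each row becomes an event node, and each selected cell becomes an edge from that event to a `column::value` entity. Assuming no missing values, the 220 rows and 4 entity columns here give 220 × 4 = 880 links, which matches the count the next cell prints. The pandas sketch below illustrates the idea on toy data and is not the library's implementation. The `EventID::` prefix only mirrors the node IDs that appear in the hypergraph output later in this tutorial. The `categories` dict plays the role of `opts['CATEGORIES']`. It collapses attackerIP and victimIP into one `ip` namespace, so an IP seen in both columns becomes a single node.

```python
import pandas as pd

df = pd.DataFrame({
    'attackerIP': ['203.0.113.5', '198.51.100.7'],
    'victimIP': ['198.51.100.7', '198.51.100.8'],
    'victimPort': [445.0, 445.0],
})
entity_types = ['attackerIP', 'victimIP', 'victimPort']
categories = {'ip': ['attackerIP', 'victimIP']}  # same shape as opts['CATEGORIES']

# Map each column to the category used in node IDs (defaults to the column name)
col_to_cat = {col: cat for cat, cols in categories.items() for col in cols}

# One output row per (row, column) cell: the row -> entity links
links = (
    df[entity_types]
    .rename_axis('row')
    .reset_index()
    .melt(id_vars='row', var_name='column', value_name='value')
    .dropna(subset=['value'])
)
links['src'] = 'EventID::' + links['row'].astype(str)
links['dst'] = (links['column'].map(lambda c: col_to_cat.get(c, c))
                + '::' + links['value'].astype(str))

print(len(links))             # 2 rows x 3 entity columns = 6 links
print(links['dst'].nunique())  # 198.51.100.7 appears in both IP columns but is one node
```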

[93]:

hg1 = graphistry.hypergraph(
    df,

    # Optional: Subset of columns to turn into nodes; defaults to all
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],

    # Optional: merge nodes when their IDs appear in multiple columns
    # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
    # ... with just one node ip::1.1.1.1
    opts={
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })

hg1_g = hg1['graph']
hg1_g.plot()

# links 880
# events 220
# attrib entities 221


Approach 2: Link values from column entries

For more control, enable direct=True. This skips the row node and links entity values to each other directly. The EDGES option then controls which column pairs generate edges.

Demo graph schema:
- Edges:
  - attackerIP -> victimIP, attackerIP -> victimPort, attackerIP -> vulnName
  - victimPort -> victimIP
  - vulnName -> victimIP
- Nodes: attackerIP, victimIP, victimPort, vulnName
- Default colors: automatic, based on inferred community
- Default node size: number of edges
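With direct=True, each row contributes one edge per (source column, destination column) pair listed in EDGES. Here that is 3 + 1 + 1 = 5 pairs, so 220 rows give the 1100 links the next cell reports. The toy pandas sketch below shows that expansion and is not the library's code. The `edge_spec` and `categories` dicts have the same shape as `opts['EDGES']` and `opts['CATEGORIES']`, and `node_id` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({
    'attackerIP': ['203.0.113.5', '203.0.113.9'],
    'victimIP': ['198.51.100.7', '198.51.100.7'],
    'victimPort': [445.0, 445.0],
    'vulnName': ['MS08067 (NetAPI)', 'MS08067 (NetAPI)'],
})
edge_spec = {  # same shape as opts['EDGES']
    'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
    'victimPort': ['victimIP'],
    'vulnName': ['victimIP'],
}
categories = {'ip': ['attackerIP', 'victimIP']}  # same shape as opts['CATEGORIES']
col_to_cat = {col: cat for cat, cols in categories.items() for col in cols}


def node_id(col, value):
    # Prefix each value with its category (or column name) to form a node ID
    return f'{col_to_cat.get(col, col)}::{value}'


# One edge per row per (source column, destination column) pair
edges = pd.DataFrame([
    {'src': node_id(s, row[s]), 'dst': node_id(d, row[d])}
    for _, row in df.iterrows()
    for s, dsts in edge_spec.items()
    for d in dsts
])
print(len(edges))  # 2 rows x 5 pairs = 10 edges
```

Repeated value pairs across rows, such as victimPort 445 -> 198.51.100.7 here, yield one edge per row. That per-row behavior is why the link count scales with the number of rows.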

[102]:

hg2 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    direct=True,
    opts={
        # Optional: without EDGES, direct mode creates all-to-all edges between the entity columns of each row
        'EDGES': {
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimPort': ['victimIP'],
            'vulnName': ['victimIP']
        },

        # Optional: merge nodes when their IDs appear in multiple columns
        # ... so replace nodes attackerIP::1.1.1.1 and victimIP::1.1.1.1
        # ... with just one node ip::1.1.1.1
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']
        }
    })

hg2_g = hg2['graph']
hg2_g.plot()

# links 1100
# events 220
# attrib entities 221


4. Advanced plotting

You can then drive visual styles from node and edge attributes.

This demo starts by computing a node table. You do not need to explicitly provide one, but without it, nodes may lack the properties you want to style by:

- Nodes inferred from a regular edge list only have an ID and a degree
- Hypergraph edges and row nodes carry many properties, but hypergraph entity nodes only have an ID, type/category, and degree

Demo schema:

- Node table: columns node_id, type, attacks
- Point size: number of attacks
- Point icon & color: attacker vs. victim
- Edge color: time of first attack (time(min))
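The categorical encodings used below map each value of a column to a style, and default_mapping covers any value the mapping does not list. The pandas lines here only emulate that lookup to show the semantics. Graphistry applies the real encoding itself, and the 'unknown' type is a made-up value for illustration.

```python
import pandas as pd

nodes = pd.DataFrame({
    'node_id': ['203.0.113.5', '198.51.100.7', '192.0.2.1'],
    'type': ['attacker', 'victim', 'unknown'],
})
categorical_mapping = {'attacker': 'red', 'victim': 'white'}
default_mapping = 'gray'

# Values missing from the mapping come back as NaN and fall back to the default
nodes['color'] = nodes['type'].map(categorical_mapping).fillna(default_mapping)
print(nodes['color'].tolist())  # ['red', 'white', 'gray']
```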

[62]:

# Cell:
# Compute nodes_df by combining entities in attackerIP and victimIP
# As part of this, compute attack counts for each node

targets_df = (
    df
    [['victimIP']]
    .drop_duplicates()
    .rename(columns={'victimIP': 'node_id'})
    .assign(type='victim')
)

attackers_df = (
    df
    .groupby(['attackerIP'])
    .agg(attacks=pd.NamedAgg(column="attackerIP", aggfunc="count"))
    .reset_index()
    .rename(columns={'attackerIP': 'node_id'}).assign(type='attacker')
)

nodes_df = pd.concat([targets_df, attackers_df])

nodes_df.sort_values(by='attacks', ascending=False)[:5]

[62]:

	node_id	type	attacks
31	125.64.35.67	attacker	6.0
32	125.64.35.68	attacker	4.0
95	198.204.253.101	attacker	2.0
78	188.225.73.153	attacker	2.0
79	188.44.107.239	attacker	2.0
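The attacks column prints as floats (6.0) because victim rows have no count. When the tables are concatenated, pandas fills those cells with NaN and upcasts the column. The sketch below runs on toy rows and assumes you would rather show victims as 0 attacks. It also checks for duplicate IDs, since the concat above would list an IP twice if it appeared as both attacker and victim.

```python
import pandas as pd

targets_df = pd.DataFrame({'node_id': ['198.51.100.7'], 'type': ['victim']})
attackers_df = pd.DataFrame({'node_id': ['203.0.113.5', '203.0.113.9'],
                             'type': ['attacker', 'attacker'],
                             'attacks': [6, 2]})
nodes_df = pd.concat([targets_df, attackers_df], ignore_index=True)
print(nodes_df['attacks'].dtype)  # float64: the victim row has NaN attacks

# Victims launched no attacks: use 0 instead of NaN and restore integers
nodes_df['attacks'] = nodes_df['attacks'].fillna(0).astype(int)

# An IP that is both attacker and victim would appear twice; list such rows
dupes = nodes_df[nodes_df['node_id'].duplicated(keep=False)]
print(len(dupes))  # 0 for this toy data
```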

[86]:

# Cell:
# Add the node table and visual encodings to g

# New encoding features require api=3: graphistry.register(api=3, username='...', password='...')

g2 = (g
      .nodes(nodes_df, 'node_id')

      # Colors can be names or hex codes: 'red', '#f00', '#ff0000'
      .encode_point_color('type', categorical_mapping={
          'attacker': 'red',
          'victim': 'white'
      }, default_mapping='gray')

      # Icons: https://fontawesome.com/v4.7/cheatsheet/
      .encode_point_icon('type', categorical_mapping={
          'attacker': 'bomb',
          'victim': 'laptop'
      })

      # Continuous color gradient over the first-attack time
      .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

      .encode_point_size('attacks')

      .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})

      # Options: https://hub.graphistry.com/docs/api/1/rest/url/
      .settings(url_params={'play': 1000, 'pointSize': 0.5})
)

g2.plot(as_files=False)
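With as_continuous=True, encode_edge_color spreads the palette across the range of time(min). A rough pandas approximation follows. It scales the times to [0, 1] and bins them into as many equal-width buckets as there are palette colors. Graphistry's actual interpolation may blend between colors rather than bucket them, so treat this as intuition only. The three timestamps are taken from the sample rows shown earlier.

```python
import pandas as pd

palette = ['blue', 'purple', 'red']
edges = pd.DataFrame({'time(min)': [1.414514e9, 1.419967e9, 1.422062e9]})

t = edges['time(min)']
# Scale to [0, 1]: earliest attack -> 0, latest -> 1
scaled = (t - t.min()) / (t.max() - t.min())
# Equal-width buckets, one per palette color; clip so the max lands in the last bucket
bucket = (scaled * len(palette)).astype(int).clip(upper=len(palette) - 1)
edges['color'] = bucket.map(dict(enumerate(palette)))
print(edges['color'].tolist())  # ['blue', 'red', 'red']
```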


Advanced bindings work with hypergraphs too

Hypergraphs precompute many node and edge attributes, which can drive clearer visualizations.

[104]:

hg2_g._nodes.sample(3)

[104]:

	attackerIP	nodeTitle	type	category	nodeID	victimIP	victimPort	vulnName	EventID
159	77.52.11.94	77.52.11.94	attackerIP	ip	ip::77.52.11.94	NaN	NaN	NaN	NaN
162	78.187.242.78	78.187.242.78	attackerIP	ip	ip::78.187.242.78	NaN	NaN	NaN	NaN
170	81.47.128.144	81.47.128.144	attackerIP	ip	ip::81.47.128.144	NaN	NaN	NaN	NaN

[103]:

hg2_g._edges.sample(3)

[103]:

	edgeType	category	vulnName	dst	time(max)	time(min)	src	victimPort	victimIP	EventID	count	attackerIP
516	ip::vulnName	attackerIP::vulnName	MS08067 (NetAPI)	vulnName::MS08067 (NetAPI)	1.416885e+09	1.416881e+09	ip::186.149.87.94	445.0	172.31.14.66	EventID::76	3	186.149.87.94
932	vulnName::ip	vulnName::victimIP	MS08067 (NetAPI)	ip::172.31.14.66	1.423515e+09	1.423515e+09	vulnName::MS08067 (NetAPI)	445.0	172.31.14.66	EventID::52	1	176.119.227.9
983	vulnName::ip	vulnName::victimIP	MS08067 (NetAPI)	ip::172.31.14.66	1.423932e+09	1.423932e+09	vulnName::MS08067 (NetAPI)	445.0	172.31.14.66	EventID::103	2	192.110.160.227

[113]:

(hg2_g

 .encode_point_color('type', categorical_mapping={
     'attackerIP': 'yellow',
     'victimIP': 'blue'
 }, default_mapping='gray')

 .encode_point_icon('type', categorical_mapping={
     'attackerIP': 'bomb',
     'victimIP': 'laptop'
 }, default_mapping='')

 .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

 .settings(url_params={'pointsOfInterestMax': 10})

).plot()
