Tutorial: Data Analysis in Graphistry#

  1. Register

  2. Load table

  3. Plot:

    • Simple: input is a list of edges

    • Arbitrary: input is a table (hypergraph transform)

  4. Advanced plotting

  5. Further reading

1. Register#

[101]:
import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

2. Load table#

Graphistry works directly with dataframes such as pandas and GPU-based RAPIDS cuDF

[94]:
import pandas as pd

df = pd.read_csv('./data/honeypot.csv')

df.sample(3)
[94]:
         attackerIP      victimIP  victimPort          vulnName  count     time(max)     time(min)
64    178.77.190.33  172.31.14.66       445.0  MS08067 (NetAPI)      6  1.419968e+09  1.419967e+09
7    112.209.78.240  172.31.14.66       445.0  MS08067 (NetAPI)     10  1.414516e+09  1.414514e+09
182  79.140.174.193  172.31.14.66       445.0  MS08067 (NetAPI)      2  1.422062e+09  1.422062e+09

3. Plot#

A. Simple graphs#

  • Build up a set of bindings. A simple graph takes:

    • required: an edge table with source and destination ID columns, plus any additional property columns

    • optional: a node table with a matching node ID column

  • See UI Guide for in-tool activity

Demo graph schema:

  • Input table: The alerts dataframe df loaded above, with columns | attackerIP | victimIP |

  • Edges: Link df’s columns attackerIP -> victimIP

  • Nodes: Unspecified; Graphistry defaults to generating them from the edges

  • Node colors: Graphistry defaults to inferring the community

  • Node sizes: Graphistry defaults to the number of edges (“degree”)
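To make those defaults concrete, the sketch below derives a node list and a per-node degree from a plain list of edges. The helper name `infer_nodes` and the sample IPs are made up for illustration, and this is only a model of what "generating nodes from the edges" and "degree" mean, not Graphistry's actual implementation.

```python
from collections import Counter

def infer_nodes(edges):
    """Return {node_id: degree} for an iterable of (src, dst) pairs.

    Hypothetical helper: models the default of deriving nodes from edges,
    not Graphistry's real code.
    """
    degree = Counter()
    for src, dst in edges:
        # Each edge adds one to the degree of both of its endpoints
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)

# Illustrative (attackerIP, victimIP) pairs, not taken from honeypot.csv
sample_edges = [
    ('10.0.0.1', '172.31.14.66'),
    ('10.0.0.2', '172.31.14.66'),
    ('10.0.0.1', '172.31.14.66'),
]

node_degrees = infer_nodes(sample_edges)
print(node_degrees)
```

In this model, every ID that appears in either edge column becomes a node, and a victim hit by many alerts ends up with a large degree and therefore a large default size.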

[16]:
g = graphistry.edges(df, 'attackerIP', 'victimIP')
[17]:
g.plot()
[17]:

B. Hypergraphs – Plot arbitrary tables#

The hypergraph transform is a convenient method to transform tables into graphs:

  • It extracts entities from the table and links them together

  • Entities get linked together when they are from the same row
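The two bullets above can be sketched in plain Python. This is a simplified model under stated assumptions, not the library's implementation: the helper `hypergraph_sketch` is hypothetical, the rows are invented, and the `column::value` and `EventID::i` ID formats simply echo the style of the sample outputs shown later in this tutorial.

```python
def hypergraph_sketch(rows, entity_columns):
    """Turn table rows into hypergraph-style nodes and edges.

    Hypothetical helper: one node per row, one node per distinct
    (column, value) entity, and one edge from each row node to each
    entity found in that row. Not Graphistry's real implementation.
    """
    nodes = {}   # node id -> node type; dict keys deduplicate entities
    edges = []   # (row node id, entity node id) pairs
    for i, row in enumerate(rows):
        row_id = f'EventID::{i}'
        nodes[row_id] = 'EventID'
        for col in entity_columns:
            value = row.get(col)
            if value is None:
                continue  # skip missing cells
            entity_id = f'{col}::{value}'
            nodes[entity_id] = col
            edges.append((row_id, entity_id))
    return nodes, edges

# Illustrative rows, not taken from honeypot.csv
rows = [
    {'attackerIP': '10.0.0.1', 'victimIP': '172.31.14.66', 'vulnName': 'MS08067 (NetAPI)'},
    {'attackerIP': '10.0.0.2', 'victimIP': '172.31.14.66', 'vulnName': 'MS08067 (NetAPI)'},
]

nodes, edges = hypergraph_sketch(rows, ['attackerIP', 'victimIP', 'vulnName'])
print(len(nodes), len(edges))
```

Because both rows share the same victim and vulnerability, their row nodes connect to the same entity nodes, which is how entities from different rows become linked in the resulting graph.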

4. Advanced plotting#

You can then drive visual styles based on node and edge attributes

This demo starts by computing a node table. By default, you do not need to explicitly provide a table of nodes, but without one you may lack data for node properties:

  • Regular inferred graph nodes will only have id and degree

  • Hypergraph edges and row nodes will have many properties, but hypergraph entity nodes will only have id, type/category, and degree

Demo schema:

  • Node table: | node_id | type | attacks |

  • Point size: number of attacks

  • Point icon & color: attacker vs victim

  • Edge color: based on first attack

[62]:
# Cell:
# Compute nodes_df by combining entities in attackerIP and victimIP
# As part of this, compute attack counts for each node

targets_df = (
    df
    [['victimIP']]
    .drop_duplicates()
    .rename(columns={'victimIP': 'node_id'})
    .assign(type='victim')
)

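# attacks = number of rows in df for each attackerIP
attackers_df = (
    df
    .groupby(['attackerIP'])
    .agg(attacks=pd.NamedAgg(column="attackerIP", aggfunc="count"))
    .reset_index()
    .rename(columns={'attackerIP': 'node_id'})
    .assign(type='attacker')
)

# targets_df has no 'attacks' column, so victim rows get NaN after concat
nodes_df = pd.concat([targets_df, attackers_df])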

nodes_df.sort_values(by='attacks', ascending=False)[:5]
[62]:
            node_id      type  attacks
31     125.64.35.67  attacker      6.0
32     125.64.35.68  attacker      4.0
95  198.204.253.101  attacker      2.0
78   188.225.73.153  attacker      2.0
79   188.44.107.239  attacker      2.0
[86]:
# Cell:
# Add the node table and visual encodings

# The encodings features require api=3: `graphistry.register(api=3, username='...', password='...')`

g2 = (g
      .nodes(nodes_df, 'node_id')

      # Colors can be names or hex codes: 'red', '#f00', '#ff0000'
      .encode_point_color('type', categorical_mapping={
          'attacker': 'red',
          'victim': 'white'
      }, default_mapping='gray')

      # Icons: https://fontawesome.com/v4.7/cheatsheet/
      .encode_point_icon('type', categorical_mapping={
          'attacker': 'bomb',
          'victim': 'laptop'
      })

      # Gradient
      .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

      .encode_point_size('attacks')

      .addStyle(bg={'color': '#eee'}, page={'title': 'My Graph'})

      # Options: https://hub.graphistry.com/docs/api/1/rest/url/
      .settings(url_params={'play': 1000, 'pointSize': 0.5})
)

g2.plot(as_files=False)
[86]:
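As a rough mental model of the two kinds of encodings used above, the sketch below shows a categorical lookup with a default and a continuous palette applied by linear interpolation. Both helper names are hypothetical, and the linear blend between RGB stops is an assumption made for illustration; Graphistry's own color handling may differ.

```python
def categorical_color(value, mapping, default):
    """Look up a color for a category, falling back to a default."""
    return mapping.get(value, default)

def gradient_color(value, lo, hi, palette):
    """Linearly interpolate an RGB color for value within [lo, hi].

    palette is a list of (r, g, b) tuples with at least two stops.
    Assumed model only, not Graphistry's real color pipeline.
    """
    if hi == lo:
        return palette[0]
    # Position of value along the palette, clamped to [0, 1]
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    # Scale to the number of segments between stops
    scaled = t * (len(palette) - 1)
    i = min(int(scaled), len(palette) - 2)
    frac = scaled - i
    a, b = palette[i], palette[i + 1]
    return tuple(round(a[k] + (b[k] - a[k]) * frac) for k in range(3))

type_colors = {'attacker': 'red', 'victim': 'white'}
print(categorical_color('attacker', type_colors, 'gray'))
print(categorical_color('router', type_colors, 'gray'))

# blue -> purple -> red as RGB stops (purple taken as the CSS value 128, 0, 128)
stops = [(0, 0, 255), (128, 0, 128), (255, 0, 0)]
for v in (0.0, 5.0, 10.0):
    print(v, gradient_color(v, 0.0, 10.0, stops))
```

Under this model, a node whose type is missing from the mapping falls back to the default color, and an edge with an early time(min) lands near the first palette stop while a late one lands near the last.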

Advanced bindings work with hypergraphs too#

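The hypergraph transform precomputes many node and edge attributes, such as type, category, and edgeType, which we can use to drive clearer visualizations. Here, hg2_g is the graph produced by a hypergraph transform of df.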

[104]:
hg2_g._nodes.sample(3)
[104]:
        attackerIP      nodeTitle        type  category             nodeID  victimIP  victimPort  vulnName  EventID
159    77.52.11.94    77.52.11.94  attackerIP        ip    ip::77.52.11.94       NaN         NaN       NaN      NaN
162  78.187.242.78  78.187.242.78  attackerIP        ip  ip::78.187.242.78       NaN         NaN       NaN      NaN
170  81.47.128.144  81.47.128.144  attackerIP        ip  ip::81.47.128.144       NaN         NaN       NaN      NaN
[103]:
hg2_g._edges.sample(3)
[103]:
         edgeType              category          vulnName                         dst     time(max)     time(min)                         src  victimPort      victimIP       EventID  count       attackerIP
516  ip::vulnName  attackerIP::vulnName  MS08067 (NetAPI)  vulnName::MS08067 (NetAPI)  1.416885e+09  1.416881e+09           ip::186.149.87.94       445.0  172.31.14.66   EventID::76      3    186.149.87.94
932  vulnName::ip    vulnName::victimIP  MS08067 (NetAPI)            ip::172.31.14.66  1.423515e+09  1.423515e+09  vulnName::MS08067 (NetAPI)       445.0  172.31.14.66   EventID::52      1    176.119.227.9
983  vulnName::ip    vulnName::victimIP  MS08067 (NetAPI)            ip::172.31.14.66  1.423932e+09  1.423932e+09  vulnName::MS08067 (NetAPI)       445.0  172.31.14.66  EventID::103      2  192.110.160.227
[113]:
(hg2_g

 .encode_point_color('type', categorical_mapping={
     'attackerIP': 'yellow',
     'victimIP': 'blue'
 }, default_mapping='gray')

 .encode_point_icon('type', categorical_mapping={
     'attackerIP': 'bomb',
     'victimIP': 'laptop'
 }, default_mapping='')

 .encode_edge_color('time(min)', palette=['blue', 'purple', 'red'], as_continuous=True)

 .settings(url_params={'pointsOfInterestMax': 10})

).plot()
[113]:

5. Further reading#