Graphistry Tutorial: Notebooks + TigerGraph via raw REST calls#

  • Connect to Graphistry, TigerGraph

  • Load data from TigerGraph into a Pandas Dataframes

  • Plot in Graphistry as a Graph and Hypergraph

  • Explore in Graphistry

  • Advanced notebooks

Configuration#

[ ]:
TIGER_CONFIG = {
    'fqdn': 'http://MY_TIGER_SERVER:9000'
}

Connect to Graphistry + Test#

[ ]:
#!pip install graphistry
[ ]:
import pandas as pd
import requests
[ ]:
### COMMON ISSUES: wrong server, wrong key, wrong protocol, network notebook->graphistry firewall permissions

import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

graphistry.__version__
[ ]:
### EXPECTED RESULT: Visualization of a curved triangle
### COMMON ISSUES: Blank box as HTTPS not configured on Graphistry server so browser disallows iframe. Try plot(render=False)

g = graphistry\
  .edges(pd.DataFrame({'s': [0,1,2], 'd': [1,2,0], 'a': ['quick', 'brown', 'fox'] }))\
  .bind(source='s', destination='d')

g.plot() #g.plot(render=False)

Connect to TigerGraph and Test#

[ ]:
### EXPECTED RESULT: {'GET /statistics': ...}
### COMMON ISSUES: returns '{}' (may need to run a few times); wrong fqdn; firewall issues; ...
requests.get(TIGER_CONFIG['fqdn'] + '/statistics?seconds=60').json()

Query Tigergraph#

[ ]:
# string -> dict
def query_raw(query_string):
    url = TIGER_CONFIG['fqdn'] + "/query/" + query_string
    r = requests.get(url)
    return r.json()


def flatten (lst_of_lst):
    try:
        if type(lst_of_lst[0]) == list:
            return [item for sublist in lst_of_lst for item in sublist]
        else:
            return lst_of_lst
    except:
      print('fail', lst_of_lst)
      return lst_of_lst

#str * dict -> dict
def named_edge_to_record(name, edge):
    record = {k: edge[k] for k in edge.keys() if not (type(edge[k]) == dict) }
    record['type'] = name
    nested = [k for k in edge.keys() if type(edge[k]) == dict]
    if len(nested) == 1:
        for k in edge[nested[0]].keys():
            record[k] = edge[nested[0]][k]
    else:
        for prefix in nested:
            for k in edge[nested[prefix]].keys():
                record[prefix + "_" + k] = edge[nested[prefix]][k]
    return record


def query(query_string):
    results = query_raw(query_string)['results']
    out = {}
    for o in results:
        for k in o.keys():
            if type(o[k]) == list:
                out[k] = flatten(o[k])
    out = flatten([[named_edge_to_record(k,v) for v in out[k]] for k in out.keys()])
    print('# results', len(out))
    return pd.DataFrame(out)



def graph_edges(edges):
  return graphistry.bind(source='from_id', destination='to_id').edges(edges)
[ ]:
df = query("connection_mining?A=1&B=10&k=1000")
print('rows: ', len(df))
df.sample(3)

Visualize result of TigerGraph query#

[ ]:
### EXPECTED RESULT: GRAPH VISUALIZATION
### COMMON ISSUES: try inspecting  query_raw('connection_mining?A=1&B=10&k=2')


graph_edges(query("connection_mining?A=1&B=10&k=1000")).plot()

In-Tool UI Walkthrough#

1. Clustering, Pan/Zoom, Data Table + Data Brush#

Open Visual guide in a separate tab

  1. Toggle visual clustering: Click to start, click to stop. (Edges invisible during clustering.)

  2. Pan/zoom: Just like Google maps

  3. Autocenter button when lost

  4. Click node or edge to see details.

  5. Data Table with Nodes, Edges, (Events) tabs

  6. Use Data brush mode to click-drag to select region and filter data table

Challenge: What node has the most edges? What do its edges have in common?#

2. Histograms and Using data for sizes & colors#

  • For point:degree histogram on bottom right, press each button and see what it does

  • Set node size based on attribute. Then, Scene settings -> Point size slider.

  • Make histogram log scale in case of an extreme distribution

  • Pick any color. If UI doesn’t update, try running clustering for one tick.

  • Add a histogram for point:_title

  • Try coloring via a categorical vs gradient : What is the difference?

3. Filtering#

  • Add histogram edge:from_type

  • Click-drag the degree histogram to filter for multiple bins

  • Open/close filter panel and toggle on/off the filter

  • Toggle cull isolated nodes to remove noisey nodes with no edges left

  • Click filter on histogram to remove

  • You can manually create SQL WHERE clauses here. filters -> edge:e_type -> edge:e_type ilike "%phone%"

  • Toggle visual clustering and then off when stablized

Challenge: How many distinct phone networks are there?#

4. Data table#

  • Search points, e.g., 135 area code

  • Export CSV (currently returns filtered as well)

Advanced Notebooks#

Hypergraph#

If you have a CSV and not a graph, hypergraphs are a quick way to analyze the data as a graph. They turn each entity into a node, and link them together if they are in the same row of the CSV. E.g., link together a phone and address. It does so indirectly – it creates a node for the row, and connects the row to each entity mentioned.

Challenge: What was the last tainted transaction, and the amount on it?#

[ ]:
df = pd.read_csv('https://github.com/graphistry/pygraphistry/raw/master/demos/data/transactions.csv')
df.sample(10)
[ ]:
hg = graphistry.hypergraph(df[:1000], entity_types=['Source', 'Destination', 'Transaction ID'])
print('Hypergraph parts', hg.keys())
hg['graph'].plot()
[ ]:
help(graphistry.hypergraph)

Adding Graphs#

[ ]:
df1 = query("connection_mining?A=1&B=10&k=1000").assign(data_source='query1')
df2 = query("connection_mining?A=1&B=12&k=1000").assign(data_source='query2')
edges2 = pd.concat([df1, df2], ignore_index=True)

graph_edges(edges2).plot()

Custom Nodes and Attributes + Saving Sessions#

[ ]:
conn = query("connection_mining?A=1&B=10&k=1000")

froms = conn.rename(columns={'from_id': 'id', 'from_type': 'node_type'})[['id', 'node_type']]
tos = conn.rename(columns={'to_id': 'id', 'to_type': 'node_type'})[['id', 'node_type']]
nodes = pd.concat([froms, tos], ignore_index=True).drop_duplicates().dropna()
nodes.sample(3)
[ ]:
nodes['node_type'].unique()
[ ]:
#https://hub.graphistry.com/docs/api/api-color-palettes/

type2color = {
    'phone_call': 0,
    'citizen': 1,
    'bank_account': 2,
    'phone_number': 3,
    'bank_transfer_event': 4,
    'hotel_room_event': 5
}

nodes['color'] = nodes['node_type'].apply(lambda type_str: type2color[type_str])

nodes.sample(3)
[ ]:
g = graphistry.bind(source='from_id', destination='to_id').edges(conn)

#updating colors
g = g.bind(node='id', point_color='color').nodes(nodes)

#saving sessions
g = g.settings(url_params={'workbook': 'my_workbook1'})


g.plot()
[ ]: