SQL Example#

Get & plot data from the first table in a SQL DB

  • Uses SQL ODBC drivers prepacked with Graphistry

  • Default: visualizes the schema migration table used within Graphistry

  • Shows several viz modes + a convenience function for sql->interactive viz

  • Try: Modify the indicated lines to change to visualize any other table

Further reading: - UI Guide - CSV upload notebook app

Setup#

Graphistry#

[ ]:
import graphistry

#pip install graphistry -q

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure

SQL connection string#

  • Modify with your own db connection strig

  • For heavier use and sharing, see sample for hiding creds from the notebook while still reusing them across sessions

[ ]:
user = "graphistry"
pwd = "password"
server = "postgres:5432"

##OPTIONAL: Mount in installation's ${PWD}/.notebooks/db_secrets.json and read in
#import json
#with open('/home/graphistry/notebooks/db_secrets.json') as json_file:
#    cfg = json.load(json_file)
#    user = cfg['user']
#    pwd = cfg['pwd']
#    server = cfg['server']
#
## .. The first time you run this notebook, save the secret cfg to the system's persistent notebook folder:
#import json
#with open('/home/graphistry/notebooks/db_secrets.json', 'w') as outfile:
#    json.dump({
#        "user": "graphistry",
#        "pwd": "password",
#        "server": "postgres:5432"
#    }, outfile)
## Delete ^^^ after use

db_string = "postgres://" + user + ":" + pwd + "@" + server
### Take care not to save a print of the result
[ ]:
# OPTIONAL: Install ODBC drivers in other environments:
# ! apt-get update
# ! apt-get install -y g++ unixodbc unixodbc-dev
# ! conda install -c anaconda pyodbc=4.0.26 sqlalchemy=1.3.5

Connect to DB#

[ ]:
import pandas as pd
import sqlalchemy as db
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float, String
from sqlalchemy.orm import sessionmaker
[ ]:
engine = create_engine(db_string)
Session = sessionmaker(bind=engine)
session = Session()

Inspect available tables#

[ ]:
table_names = engine.table_names()
", ".join(table_names)

Optional: Modify to pick your own table!#

[ ]:
if 'django_migrations' in table_names:
    table = 'django_migrations'
else:
    table = table_names[0]

Initialize viz: Get data#

[ ]:
result = engine.execute("SELECT * FROM \"" + table + "\" LIMIT 1000")
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print("table", table, '# rows', len(df))
df.sample(min(3, len(df)))

Plot#

Several variants: 1. Treat each row & cell value as a node, and connect row<>cell values 2. Treat each cell value as a node, and connect all cell values together when they occur on the same row 3. Treat each cell value as an edge, and specify which columns to to connect values together on 4. Use explict node/edge tables

1. Treat each row & cell value as a node, and connect row<>cell values#

[ ]:
graphistry.hypergraph(df)['graph'].plot()

2. Treat each cell value as a node, and connect all cell values together when they occur on the same row#

[ ]:
graphistry.hypergraph(df, direct=True)['graph'].plot()

3. Treat each cell value as an edge, and specify which columns to to connect values together on#

[ ]:
graphistry.hypergraph(df, direct=True,
    opts={
        'EDGES': {
            'id': ['name'],
            'applied': ['name'],
            'name': ['app']
        }
    })['graph'].plot()

4. Use explict node/edge tables#

[ ]:
g = graphistry.bind(source='name', destination='app').edges(df.assign(name=df['name'].apply(lambda x: 'id_' + x)))
g.plot()
[ ]:
# Add node bindings..
nodes_df = pd.concat([
    df[['name', 'id', 'applied']],
    df[['app']].drop_duplicates().assign(\
          id = df[['app']].drop_duplicates()['app'], \
        name = df[['app']].drop_duplicates()['app'])
], ignore_index=True, sort=False)

g = g.bind(node='id', point_title='name').nodes(nodes_df)

g.plot()

Convenience function#

[ ]:
def explore(sql, *args, **kvargs):
    result = engine.execute(sql)
    df = pd.DataFrame(result.fetchall(), columns=result.keys())
    print('# rows', len(df))
    g = graphistry.hypergraph(df, *args, **kvargs)['graph']
    return g

Simple use#

[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000").plot()

Pass in graphistry.hypergraph() options#

[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000", direct=True).plot()

Get data back#

[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000")._nodes

Further docs#

[ ]: