SQL Example#
Get & plot data from the first table in a SQL DB
Uses SQL ODBC drivers prepacked with Graphistry
Default: visualizes the schema migration table used within Graphistry
Shows several viz modes + a convenience function for sql->interactive viz
Try: Modify the indicated lines to change to visualize any other table
Further reading: - UI Guide - CSV upload notebook app
Setup#
Graphistry#
[ ]:
import graphistry
#pip install graphistry -q
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
SQL connection string#
Modify with your own db connection strig
For heavier use and sharing, see sample for hiding creds from the notebook while still reusing them across sessions
[ ]:
user = "graphistry"
pwd = "password"
server = "postgres:5432"
##OPTIONAL: Mount in installation's ${PWD}/.notebooks/db_secrets.json and read in
#import json
#with open('/home/graphistry/notebooks/db_secrets.json') as json_file:
# cfg = json.load(json_file)
# user = cfg['user']
# pwd = cfg['pwd']
# server = cfg['server']
#
## .. The first time you run this notebook, save the secret cfg to the system's persistent notebook folder:
#import json
#with open('/home/graphistry/notebooks/db_secrets.json', 'w') as outfile:
# json.dump({
# "user": "graphistry",
# "pwd": "password",
# "server": "postgres:5432"
# }, outfile)
## Delete ^^^ after use
db_string = "postgres://" + user + ":" + pwd + "@" + server
### Take care not to save a print of the result
[ ]:
# OPTIONAL: Install ODBC drivers in other environments:
# ! apt-get update
# ! apt-get install -y g++ unixodbc unixodbc-dev
# ! conda install -c anaconda pyodbc=4.0.26 sqlalchemy=1.3.5
Connect to DB#
[ ]:
import pandas as pd
import sqlalchemy as db
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float, String
from sqlalchemy.orm import sessionmaker
[ ]:
engine = create_engine(db_string)
Session = sessionmaker(bind=engine)
session = Session()
Inspect available tables#
[ ]:
table_names = engine.table_names()
", ".join(table_names)
Optional: Modify to pick your own table!#
[ ]:
if 'django_migrations' in table_names:
table = 'django_migrations'
else:
table = table_names[0]
Initialize viz: Get data#
[ ]:
result = engine.execute("SELECT * FROM \"" + table + "\" LIMIT 1000")
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print("table", table, '# rows', len(df))
df.sample(min(3, len(df)))
Plot#
Several variants: 1. Treat each row & cell value as a node, and connect row<>cell values 2. Treat each cell value as a node, and connect all cell values together when they occur on the same row 3. Treat each cell value as an edge, and specify which columns to to connect values together on 4. Use explict node/edge tables
1. Treat each row & cell value as a node, and connect row<>cell values#
[ ]:
graphistry.hypergraph(df)['graph'].plot()
2. Treat each cell value as a node, and connect all cell values together when they occur on the same row#
[ ]:
graphistry.hypergraph(df, direct=True)['graph'].plot()
3. Treat each cell value as an edge, and specify which columns to to connect values together on#
[ ]:
graphistry.hypergraph(df, direct=True,
opts={
'EDGES': {
'id': ['name'],
'applied': ['name'],
'name': ['app']
}
})['graph'].plot()
4. Use explict node/edge tables#
[ ]:
g = graphistry.bind(source='name', destination='app').edges(df.assign(name=df['name'].apply(lambda x: 'id_' + x)))
g.plot()
[ ]:
# Add node bindings..
nodes_df = pd.concat([
df[['name', 'id', 'applied']],
df[['app']].drop_duplicates().assign(\
id = df[['app']].drop_duplicates()['app'], \
name = df[['app']].drop_duplicates()['app'])
], ignore_index=True, sort=False)
g = g.bind(node='id', point_title='name').nodes(nodes_df)
g.plot()
Convenience function#
[ ]:
def explore(sql, *args, **kvargs):
result = engine.execute(sql)
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print('# rows', len(df))
g = graphistry.hypergraph(df, *args, **kvargs)['graph']
return g
Simple use#
[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000").plot()
Pass in graphistry.hypergraph() options#
[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000", direct=True).plot()
Get data back#
[ ]:
explore("SELECT * FROM django_migrations LIMIT 1000")._nodes
Further docs#
[ ]: