Demo Notebook - Graphistry and Google Spanner Graph#
Graphistry is a cutting-edge platform for large-scale visual graph exploration and analysis. It enables users to intuitively investigate complex relationships, patterns, and anomalies across vast datasets through highly interactive, GPU-accelerated visualizations. Google Cloud Spanner, on the other hand, is a globally distributed, horizontally scalable, and strongly consistent database ideal for managing large, interconnected datasets.
This interactive guide demonstrates how to combine the power of Graphistry’s visual graph analytics and AI with the robust data capabilities of Google Cloud Spanner Graph.
Together, these technologies empower you to:
Visualize Complex Graphs: Easily explore relationships and uncover insights in your data through rich visual representations.
Handle Large Datasets: Leverage Cloud Spanner’s ability to manage vast amounts of interconnected information with strong consistency and scalability.
Perform Advanced Analytics: Apply graph-based algorithms and clustering techniques to extract actionable insights from structured data.
This demo is designed for:
Data Scientists: Interested in adding visual graph analytics to their toolkit.
Database Engineers: Looking to integrate graph capabilities into their Cloud Spanner workflows.
Application Developers: Prototyping applications built using Graphistry and Google Spanner.
This notebook showcases:
Connecting to Cloud Spanner: How to retrieve and preprocess data from Cloud Spanner for graph processing.
Graph Visualization with Graphistry: Turning raw data into meaningful visualizations to explore relationships and clusters.
Real-World Use Cases: Applying these tools to solve practical problems.
Prerequisites#
A Google Cloud account with access to Cloud Spanner.
A Graphistry Enterprise Server or free-tier Graphistry Hub account.
Python environment with Graphistry and gcloud spanner support (see pip install below).
This demo is based on FinGraph sample graph.
Let’s Get Started!#
Dive in and see how the synergy of Graphistry and Google Cloud Spanner can transform your data exploration and analysis workflows.
[ ]:
# intall required dependecies for graphistry + google spanner
!pip install --upgrade graphistry[spanner]
[ ]:
import graphistry
graphistry.__version__
Settings#
[ ]:
PROJECT_ID = "my_project"
INSTANCE_ID = "my_instance"
DATABASE_ID = "finance-graph-db"
# optional setting to limit the number of records returned
LIMIT_CLAUSE = ""
# or use:
# LIMIT_CLAUSE = "limit 1000"
# google settings, option 1: interactive login using gcloud auth application-default login (below)
SPANNER_CONF = { "project_id": PROJECT_ID,
"instance_id": INSTANCE_ID,
"database_id": DATABASE_ID }
# google settings, option 2: use a service account key:
# KEY_FILE="/path/to/credentials.json"
# SPANNER_CONF = { "project_id": PROJECT_ID,
# "instance_id": INSTANCE_ID,
# "database_id": DATABASE_ID,
# "credentials_file": KEY_FILE}
Graphistry register and gcloud init#
[ ]:
# graphistry register
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://pygraphistry.readthedocs.io/en/latest/server/register.html
import os
graphistry.register(api=3,
protocol = "https",
server = os.getenv("GRAPHISTRY_SERVER"),
username = os.getenv("GRAPHISTRY_USERNAME"),
password = os.getenv("GRAPHISTRY_PASSWORD"),
spanner_config=SPANNER_CONF
)
[ ]:
# Set the google project id
!gcloud config set project {PROJECT_ID}
%env GOOGLE_CLOUD_PROJECT={PROJECT_ID}
Google web-based auth below, only required if not using a credentials json file:#
[ ]:
#!gcloud auth application-default login
Example 1: GQL Path Query to Graphistry Visualization of all nodes and edges (LIMIT optional)#
to extract the data from Spanner Graph as a graph with nodes and edges in a single object, a GQL path query is required.
The format of a path query is as follows, note the p= at the start of the MATCH clause, and the SAFE_TO_JSON(p) without these, the query will not produce the results needed to properly load a graphistry graph. LIMIT is optional, but for large graphs with millions of edges or more, it’s best to filter either in the query or use LIMIT so as not to exhaust GPU memory.
GRAPH FinGraph
MATCH p = (a)-[b]->(c) where 1=1 LIMIT 10000 return SAFE_TO_JSON(p) as path
[ ]:
query=f'''GRAPH FinGraph
MATCH p = (a)-[b]->(c) where 1=1 {LIMIT_CLAUSE} return SAFE_TO_JSON(p) as path'''
g = graphistry.spanner_gql_to_g(query)
[10]:
g.plot()
[10]:
[ ]:
Example 1.1 - inspect contents of graphistry graph (nodes and edges):#
[11]:
len(g._nodes), len(g._edges)
[11]:
(1312, 1700)
[12]:
g._nodes.head(3)
[12]:
label | identifier | create_time | id | is_blocked | type_ | balance | interest_rate | loan_amount | name | type | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Account | mUZpbkdyYXBoLkFjY291bnQAeJICZA== | 2021-05-23T17:49:47.709Z | 306 | False | trust account | NaN | NaN | NaN | NaN | Account |
1 | Loan | mUZpbkdyYXBoLkxvYW4AeJEC | 2020-03-18T23:42:57.719Z | 1 | NaN | NaN | 123359.0 | 0.064 | 2022278.5 | NaN | Loan |
2 | Account | mUZpbkdyYXBoLkFjY291bnQAeJGw | 2020-10-22T12:03:40.496Z | 88 | False | merchant account | NaN | NaN | NaN | NaN | Account |
[13]:
g._edges.head(3)
[13]:
label | identifier | source | destination | amount | create_time | id | loan_id | to_id | account_id | type | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Repays | mUZpbkdyYXBoLkFjY291bnRSZXBheUxvYW4AeJICZJECmY... | mUZpbkdyYXBoLkFjY291bnQAeJICZA== | mUZpbkdyYXBoLkxvYW4AeJEC | 16260.3 | 2022-02-19T00:04:45.871Z | 306 | 1.0 | NaN | NaN | Repays |
1 | Repays | mUZpbkdyYXBoLkFjY291bnRSZXBheUxvYW4AeJGwkQSZgA... | mUZpbkdyYXBoLkFjY291bnQAeJGw | mUZpbkdyYXBoLkxvYW4AeJEE | 21892.5 | 2022-03-01T12:31:53.977Z | 88 | 2.0 | NaN | NaN | Repays |
2 | Repays | mUZpbkdyYXBoLkFjY291bnRSZXBheUxvYW4AeJGwkQSZgA... | mUZpbkdyYXBoLkFjY291bnQAeJGw | mUZpbkdyYXBoLkxvYW4AeJEE | 56432.7 | 2022-04-16T05:03:05.325Z | 88 | 2.0 | NaN | NaN | Repays |
Example 2: Quantified path traversal#
(slightly modified from example to use a path query for visualization)
from: spanner-graph-getting-started
Query 2 - Quantified path traversal and return graph elements
The following query matches all account money transfers starting from a source account with id=75 within 3 to 6 hops, to reach a destination account with id=199. The {3,6} syntax is used to represent a quantified 3 to 6 hop path traversal between src_accnt and dst_accnt.
GRAPH FinGraph
MATCH
p = (src_accnt:Account {id:75})-[transfers:Transfers]->{3,6}
(dst_accnt:Account {id:199})
RETURN SAFE_TO_JSON(p) as path
Visually, you can think of the quantified edge traversal like below: it starts from a src_account node, and fetches all possible account transfer paths between 3 to 6 hops, to reach dst_account.
The highlighted path at the bottom below, for example, is a 6-hop query.
source: https://codelabs.developers.google.com/static/codelabs/spanner-graph-getting-started/#6
[ ]:
query2='''GRAPH FinGraph
MATCH
p = (src_accnt:Account {id:75})-[transfers:Transfers]->{3,6}
(dst_accnt:Account {id:199}) where 1=1
RETURN SAFE_TO_JSON(p) as path
'''
[15]:
g2 = graphistry.spanner_gql_to_g(query2)
g2.plot()
[15]:
[16]:
# now run again and retrive all the paths
query2a='''GRAPH FinGraph
MATCH
p = (src_accnt:Account )-[transfers:Transfers]->{3,6}
(dst_accnt:Account ) where 1=1
RETURN SAFE_TO_JSON(p) as path
'''
g2a = graphistry.spanner_gql_to_g(query2a)
g2a.plot()
[16]:
Example 3: Spanner GQL Tabular Query to pandas dataframe (LIMIT optional)#
This example shows a non-path query that returns tabular results, which are then convered to a dataframe for easy manipulation and inspection of the results.
GRAPH FinGraph MATCH (p:Person)-[]-()->(l:Loan) RETURN p.id as ID, p.name AS Name, SUM(l.loan_amount) AS TotalBorrowed ORDER BY TotalBorrowed DESC LIMIT 10
[ ]:
query_top10='''GRAPH FinGraph
MATCH (p:Person)-[:Owns]-(:Account)->(l:Loan) WHERE 1=1
RETURN p.id as ID, p.name AS Name, SUM(l.loan_amount) AS TotalBorrowed
ORDER BY TotalBorrowed DESC
LIMIT 10'''
[ ]:
Top10_Borrowers_df = graphistry.spanner_query_to_df(query_top10)
[19]:
Top10_Borrowers_df.head(10)
[19]:
ID | Name | TotalBorrowed | |
---|---|---|---|
0 | 337 | Tutmarc | 15269003.6 |
1 | 484 | Greiner | 14098853.6 |
2 | 370 | Morrisseau | 13912146.2 |
3 | 113 | Paakkonen | 13022928.4 |
4 | 416 | Greif | 12713990.0 |
5 | 68 | Cabon | 12256398.8 |
6 | 66 | Stinson | 11462716.8 |
7 | 46 | Riby | 10772732.5 |
8 | 406 | Jöhncke | 10470230.8 |
9 | 169 | Gubenko | 10330528.5 |
Example 4: Spanner SQL Query to pandas dataframe#
This example shows a SQL query to Spanner that returns tabular results, which are then convered to a dataframe for easy manipulation and inspection of the results.
Query:#
SELECT * from Account
source: https://cloud.google.com/blog/products/databases/announcing-spanner-graph
[ ]:
accounts_df = graphistry.spanner_query_to_df('SELECT * from Account')
[21]:
accounts_df.head(10)
[21]:
id | create_time | is_blocked | type | |
---|---|---|---|---|
0 | 1 | 2020-01-10 14:22:20.222000+00:00 | False | brokerage account |
1 | 2 | 2020-01-28 01:55:09.206000+00:00 | False | prepaid card |
2 | 3 | 2020-02-18 13:44:20.655000+00:00 | False | brokerage account |
3 | 4 | 2020-02-29 16:49:53.902000+00:00 | False | debit card |
4 | 5 | 2020-03-02 20:47:18.726000+00:00 | False | brokerage account |
5 | 6 | 2020-03-21 22:25:34.327000+00:00 | False | custodial account |
6 | 7 | 2020-04-14 00:53:48.932000+00:00 | False | brokerage account |
7 | 8 | 2020-04-15 03:08:15.427000+00:00 | True | trust account |
8 | 9 | 2020-04-20 13:20:25.717000+00:00 | False | certificate of deposit |
9 | 10 | 2020-04-26 00:12:17.773000+00:00 | False | debit card |
Example 5: Spanner SQL Query to inspect the database schema#
This example shows a SQL query to Spanner that retrieves the tables, columns and types from the information schema in Spanner. This can be helpful for seeing what’s available in the database or using this data as part of a workflow.
SELECT table_name, column_name, spanner_type FROM information_schema.columns
[22]:
columns_df = graphistry.spanner_query_to_df('SELECT table_name, column_name, spanner_type FROM information_schema.columns')
columns_df.head(10)
[22]:
table_name | column_name | spanner_type | |
---|---|---|---|
0 | Account | id | INT64 |
1 | Account | create_time | TIMESTAMP |
2 | Account | is_blocked | BOOL |
3 | Account | type | STRING(MAX) |
4 | AccountAudits | id | INT64 |
5 | AccountAudits | audit_timestamp | TIMESTAMP |
6 | AccountAudits | audit_details | STRING(MAX) |
7 | AccountRepayLoan | id | INT64 |
8 | AccountRepayLoan | loan_id | INT64 |
9 | AccountRepayLoan | amount | FLOAT64 |
[23]:
len(columns_df.table_name.unique())
[23]:
94
[24]:
query_tables='''
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_catalog = ''
AND table_schema = ''
AND table_type IN ('BASE TABLE', 'VIEW');
'''
tables_df = graphistry.spanner_query_to_df(query_tables)
tables_df
[24]:
table_name | table_type | |
---|---|---|
0 | Account | BASE TABLE |
1 | AccountAudits | BASE TABLE |
2 | AccountRepayLoan | BASE TABLE |
3 | AccountTransferAccount | BASE TABLE |
4 | Loan | BASE TABLE |
5 | Person | BASE TABLE |
6 | PersonOwnAccount | BASE TABLE |
Continue your Graph and AI journey…#
Graphistry#
Get started with PyGraphistry and explore AI-powered visual analytics:
Graphistry: Get Started - Learn how to leverage Graphistry for your projects.
Louie AI - AI-powered data insights and visualization.
Join our Community Slack - Get support and insights from experts.
Google Spanner#
Explore Google Spanner graph features, documentation, and use cases:
Google Spanner Python Documentation - Official Python client library for Google Spanner.
Spanner Graph Codelab - Hands-on tutorial for getting started with Spanner graph features.
Google Spanner Graph Overview - Learn about Spanner’s graph processing capabilities.
Graph Queries Overview - Understand how to query graph data in Spanner.
Graph Query Statements - Reference guide for writing graph queries in Spanner.