Tutorial: Visualizing the Silk Road Blockchain with Graphistry and Neo4j#
Investigating large datasets becomes easier by directly visualizing cypher (BOLT) query results with Graphistry. This tutorial walks through querying Neo4j, visualizing the results, and additional configurations and queries.
This analysis is based on a blockchain data extract the Graphistry team performed around court proceedings from when Carl Force, the key DEA agent in the Silk Road bust, was sentenced for embezzling money from Ross Ulbricht (Dread Pirate Roberts). We explore to how to recreate the analysis, and determine where Carl’s money went after he performed the initial embezzling.
Instructions * Read along the various cells * Click the prebuilt visualizations to start them, and interact with them just like Google Maps * To try on your own, setup your own Neo4j instance & get a Graphistry API key, and run the data loading cells
Further reading
Python client tutorials & demos: graphistry/pygraphistry
Graphistry API Key: https://www.graphistry.com/api-request
Neo4j-as-a-service: http://graphstory.com
DEA incident: https://arstechnica.com/tech-policy/2016/08/stealing-bitcoins-with-badges-how-silk-roads-dirty-cops-got-caught/
Config#
Install dependencies
On first run of a non-Graphistry notebook server:
Uncomment and run the first two lines
Restart your Python kernel runtime from the top menue
For advanced alternate installs, see subsequent commented lines
[ ]:
#!pip install --user pandas
#!pip install --user graphistry[bolt]
### ADVANCED:
### If you already have the neo4j python driver, you can leave out '[bolt]':
### !pip install --user graphistry
### If you already have graphistry but not neo4j, you can reuse your existing graphistry:
### !pip install --user neo4j
Import & test
[ ]:
import pandas as pd
import neo4j # just for testing
from neo4j import GraphDatabase # for data loader
import graphistry
print('neo4j', neo4j.__version__)
print('graphistry', graphistry.__version__)
Connect
You may need to reconnect if your Neo4j connection closes
Uncomment the below section for non-Graphistry notebook servers
[ ]:
NEO4J = {
'uri': "bolt://my.site.COM:7687",
'auth': ("neo4j", "myalphapwd1")
}
graphistry.register(bolt=NEO4J)
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options, see https://github.com/graphistry/pygraphistry#configure
Optional: Load tainted transactions into your own Neo4j DB#
To populate your own Neo4j instance, set one or both of the top commands to True
[ ]:
DELETE_EXISTING_DATABASE=False
POPULATE_DATABASE=False
if DELETE_EXISTING_DATABASE:
driver = GraphDatabase.driver(**NEO4J)
with driver.session() as session:
# split into 2 transancations case of memory limit errors
print('Deleting existing transactions')
tx = session.begin_transaction()
tx.run("""MATCH (a:Account)-[r]->(b) DELETE r""")
tx.commit()
print('Deleting existing accounts')
tx = session.begin_transaction()
tx.run("""MATCH (a:Account) DELETE a""")
tx.commit()
print('Delete successful')
if POPULATE_DATABASE:
edges = pd.read_csv('https://www.dropbox.com/s/q1daa707y99ind9/edges.csv?dl=1')
edges = edges.rename(columns={'Amount $': "USD", 'Transaction ID': 'Transaction'})[['USD', 'Date', 'Source', 'Destination', 'Transaction']]
id_len = len(edges['Source'][0].split('...')[0]) #truncate IDs (dirty data)
edges = edges.assign(
Source=edges['Source'].apply(lambda id: id[:id_len]),
Destination=edges['Destination'].apply(lambda id: id[:id_len]))
ROSS_FULL='2a37b3bdca935152335c2097e5da367db24209cc'
ROSS = ROSS_FULL[:32]
CARL_FULL = 'b2233dd22ade4c9978ec1fd1fbb36eb7f9b4609e'
CARL = CARL_FULL[:32]
CARL_NICK = 'Carl Force (DEA)'
ROSS_NICK = 'Ross Ulbricht (SilkRoad)'
nodes = pd.read_csv('https://www.dropbox.com/s/nf796f1asow8tx7/nodes.csv?dl=1')
nodes = nodes.rename(columns={'Balance $': 'USD', 'Balance (avg) $': 'USD_avg', 'Balance (max) $': 'USD_max', 'Tainted Coins': 'Tainted_Coins'})[['Account', 'USD', 'USD_avg', 'USD_max', 'Tainted_Coins']]
nodes['Account'] = nodes['Account'].apply(lambda id: id[:id_len])
nodes['Account'] = nodes['Account'].apply(lambda id: CARL_NICK if id == CARL else ROSS_NICK if id == ROSS else id)
driver = GraphDatabase.driver(**NEO4J)
with driver.session() as session:
tx = session.begin_transaction()
print('Loading', len(nodes), 'accounts')
for index, row in nodes.iterrows():
if index % 2000 == 0:
print('Committing', index - 2000, '...', index)
tx.commit()
tx = session.begin_transaction()
tx.run("""
CREATE (a:Account {
Account: $Account,
USD: $USD, USD_avg: $USD_avg, USD_max: $USD_max, Tainted_Coins: $Tainted_Coins
})
RETURN id(a)
""", **row)
if index % 2000 == 0:
print(index)
print('Committing rest')
tx.commit()
tx = session.begin_transaction()
print('Creating index on Account')
tx.run(""" CREATE INDEX ON :Account(Account) """)
tx.commit()
STATUS=1000
BATCH=2000
driver = GraphDatabase.driver(**NEO4J)
with driver.session() as session:
tx = session.begin_transaction()
print('Loading', len(edges), 'transactions')
for index, row in edges.iterrows():
tx.run("""MATCH (a:Account),(b:Account)
WHERE a.Account = $Source AND b.Account = $Destination
CREATE (a)-[r:PAYMENT {
Source: $Source, Destination: $Destination, USD: $USD, Date: $Date, Transaction: $Transaction
}]->(b)
""", **row)
if index % STATUS == 0:
print(index)
if index % BATCH == 0 and index > 0:
print('sending batch out')
tx.commit()
print('... done')
tx = session.begin_transaction()
tx.commit()
Cypher Demos#
1a. Warmup: Visualize all $7K - $10K transactions#
Try panning and zooming (same touchpad/mouse controls as Google Maps), and clicking on individual wallets and transactions.
[ ]:
g = graphistry.cypher("""
MATCH (a)-[r:PAYMENT]->(b) WHERE r.USD > 7000 AND r.USD < 10000 RETURN a, r, b ORDER BY r.USD DESC
""")
[ ]:
g.plot()
Screenshot
1b. Cleanup: Configure node and edge titles to use amount fields#
Static config: We can preconfigure the visualization from directly within the notebook
Dynamic config: Try dynamically improving the visualization on-the-fly within the tool by
Do
add histogram for...
onedge:USD
andpoint:USD_MAX
Set edge/point coloring using them, and selecting a “Gradient (Spectral7 7)” blend, and toggling to reverse order (so cold to hot).
For
point:USD_MAX
, toggle it to controling point size, and in theScene settings
, increase the point size slider
[ ]:
g = g\
.bind(point_title='Account')\
.bind(edge_title='USD')
g.plot()
2. Look for all transactions 1-5 hops from embezzling DEA Agent Carl Force#
2a. Downstream#
Where did most of Carl’s money go? * Try setting up filters on edge:USD
to separate out small vs big money flows.
[ ]:
g.cypher("""
match (a)-[r:PAYMENT*1..20]->(b)
where a.Account = $root and ALL(transfer IN r WHERE transfer.USD > $min_amount and transfer.USD < $max_amount )
return a, r, b
""",
{'root': "Carl Force (DEA)",
'min_amount': 999,
'max_amount': 99999}).plot()
Screenshot:
2b. Upstream#
From where did Carl get most of his money?
[ ]:
g.cypher("""
match (a)-[r:PAYMENT*1..10]->(b)
where b.Account=$sink and ALL(transfer IN r WHERE transfer.USD > $min_amount and transfer.USD < $max_amount )
return r, a, b
""",
{'sink': "Carl Force (DEA)",
'min_amount': 1999,
'max_amount': 99999}).plot()
Screenshot:
3. Paths between Silk Road and Carl Force#
[ ]:
g.cypher("match (a)-[r:PAYMENT*1..10]->(b) where a.Account=$silk and b.Account=$dea return r, a, b",
{'dea': "Carl Force (DEA)", "silk": "Ross Ulbricht (SilkRoad)"}).plot()
Further Reading#
Python client tutorials & demos: graphistry/pygraphistry
DEA incident: https://arstechnica.com/tech-policy/2016/08/stealing-bitcoins-with-badges-how-silk-roads-dirty-cops-got-caught/