Graphistry <> graphviz integration quickstart#

The graphviz engine is popular for layout of small graphs and rendering to static images. The Graphistry Python bindings to graphviz enable using pygraphistry as usual for quickly loading and manipulating your data, and then benefiting from graphviz for layout, and optionally, rendering.

The example below shows laying out and rendering company ownership data that is in a tree and benefits from graphviz’s high-quality layout engine.

Setup#

Notes:

  • You must install the graphviz engine, as well as its pygraphviz Python bindings and pygraphistry

  • graphviz is most known for its "dot" layout engine, and it includes others as well

  • graphviz is generally not recommended for layout of graphs over 10,000 nodes and edges

[20]:
#!apt-get install graphviz graphviz-dev

#!pip install -q graphistry[pygraphviz]
  Preparing metadata (setup.py) ... done
  Building wheel for graphistry (setup.py) ... done

Imports#

[102]:
from typing import Any, Dict, Literal, Optional
import logging
try:
  import pygraphviz as pgv
except (ImportError, ModuleNotFoundError):
  logging.error("ImportError: Did you install pygraphviz and the supporting native packages?")
  raise

import pandas as pd
import graphistry
from graphistry import Plottable
graphistry.register(api=3, username=FILL_ME_IN, password=FILL_ME_IN)

graphistry.__version__
[102]:
'0.34.5+12.g4dba3e6'

Sample graph: HSBC Beneficial ownership graph#

Sample data from openownership.org. Corporate ownership graphs often have deeply tree structure, and for bigger conglomerates with numerous subsidaries, officers, board officers, suppliers, and lenders, can greatly benefit from higher-quality tree layouts.

[2]:
companies_df = pd.DataFrame([{'label': 'Hsbc Finance (Netherlands)', 'n': '1862294673469042014'},
 {'label': 'Hsbc Holdings Plc', 'n': '7622088245850069747'},
 {'label': 'Unknown person(s)', 'n': '7622088245850069747-unknown'},
 {'label': 'HSBC PROPERTY (UK) LIMITED', 'n': '16634236373777089526'},
 {'label': 'HSBC ALTERNATIVE INVESTMENTS LIMITED',
  'n': '18011320449780894329'},
 {'label': 'HSBC INVESTMENT COMPANY LIMITED', 'n': '9134577322728469115'},
 {'label': 'HSBC IM PENSION TRUST LIMITED', 'n': '1446072728533515665'},
 {'label': 'MERCANTILE COMPANY LIMITED', 'n': '6904185395252167658'},
 {'label': 'Mp Payments Group Limited', 'n': '13630126251685975826'},
 {'label': 'MP PAYMENTS OPERATIONS LIMITED', 'n': '11514603667851101425'},
 {'label': 'MP PAYMENTS UK LIMITED', 'n': '13417892994160273884'},
 {'label': 'Hsbc Asia Pacific Holdings (Uk) Limited',
  'n': '2173486047275631423'},
 {'label': 'HSBC SECURITIES (JAPAN) LIMITED', 'n': '18045747820524565803'}])

ownership_df = pd.DataFrame([{'s': '7622088245850069747', 'd': '1862294673469042014'},
 {'s': '7622088245850069747-unknown', 'd': '7622088245850069747'},
 {'s': '1862294673469042014', 'd': '16634236373777089526'},
 {'s': '1862294673469042014', 'd': '18011320449780894329'},
 {'s': '1862294673469042014', 'd': '9134577322728469115'},
 {'s': '9134577322728469115', 'd': '1446072728533515665'},
 {'s': '9134577322728469115', 'd': '6904185395252167658'},
 {'s': '9134577322728469115', 'd': '13630126251685975826'},
 {'s': '13630126251685975826', 'd': '11514603667851101425'},
 {'s': '13630126251685975826', 'd': '13417892994160273884'},
 {'s': '9134577322728469115', 'd': '2173486047275631423'},
 {'s': '2173486047275631423', 'd': '18045747820524565803'},
 {'s': '9134577322728469115', 'd': '16634236373777089526'}])
[19]:
g = graphistry.edges(ownership_df, 's', 'd').nodes(companies_df, 'n').bind(point_title='label')
[33]:
g = g.nodes(g._nodes.assign(sz=1)).encode_point_size('sz')

Minimal tree layout and graphviz layout engines#

Graphviz provides 15+ layout engines you can use. General guidance is to use for graphs up to 10,000 nodes and engines.

The "dot" layout engine is best known due to its beautiful hierarchical layouts for directed acycle graphs like trees.

[35]:
g2 = g.layout_graphviz('dot')
g2.plot()
[35]:

Additional layout engines beyond "dot" are below. See also the graphviz layout engines documents. The same documentation, and the below section on global graph attributes, describe options you can pass in to different layout engines.

[6]:
from graphistry.plugins_types.graphviz_types import PROGS
PROGS
[6]:
['acyclic',
 'ccomps',
 'circo',
 'dot',
 'fdp',
 'gc',
 'gvcolor',
 'gvpr',
 'neato',
 'nop',
 'osage',
 'patchwork',
 'sccmap',
 'sfdp',
 'tred',
 'twopi',
 'unflatten']
[36]:
g2b = g.layout_graphviz('neato')
g2b.plot()
[36]:
[ ]:
from graphistry.plugins_types.graphviz_types import PROGS

Global attributes#

You can set global attributes. Parameter `graph_attr <https://graphviz.org/docs/graph/>`__ generally refers to layout engine options, while `edge_attr <https://graphviz.org/docs/edges/>`__ and `node_attr <https://graphviz.org/docs/nodes/>`__ are generally for default colors, sizes, shapes, etc.

[39]:
g2b = g.layout_graphviz(
    'dot',
    graph_attr={'ratio': 10},
    edge_attr={},
    node_attr={}
)
g2b.plot()
[39]:
[1]:

from graphistry.plugins_types.graphviz_types import GRAPH_ATTRS GRAPH_ATTRS
[1]:
['_background',
 'bb',
 'beautify',
 'bgcolor',
 'center',
 'charset',
 'class',
 'clusterrank',
 'colorscheme',
 'comment',
 'compound',
 'concentrate',
 'Damping',
 'defaultdist',
 'dim',
 'dimen',
 'diredgeconstraints',
 'dpi',
 'epsilon',
 'esep',
 'fontcolor',
 'fontname',
 'fontnames',
 'fontpath',
 'fontsize',
 'forcelabels',
 'gradientangle',
 'href',
 'id',
 'imagepath',
 'inputscale',
 'K',
 'label',
 'label_scheme',
 'labeljust',
 'labelloc',
 'landscape',
 'layerlistsep',
 'layers',
 'layerselect',
 'layersep',
 'layout',
 'levels',
 'levelsgap',
 'lheight',
 'linelength',
 'lp',
 'lwidth',
 'margin',
 'maxiter',
 'mclimit',
 'mindist',
 'mode',
 'model',
 'newrank',
 'nodesep',
 'nojustify',
 'normalize',
 'notranslate',
 'nslimit',
 'nslimit1',
 'oneblock',
 'ordering',
 'orientation',
 'outputorder',
 'overlap',
 'overlap_scaling',
 'overlap_shrink',
 'pack',
 'packmode',
 'pad',
 'page',
 'pagedir',
 'quadtree',
 'quantum',
 'rankdir',
 'ranksep',
 'ratio',
 'remincross',
 'repulsiveforce',
 'resolution',
 'root',
 'rotate',
 'rotation',
 'scale',
 'searchsize',
 'sep',
 'showboxes',
 'size',
 'smoothing',
 'sortv',
 'splines',
 'start',
 'style',
 'stylesheet',
 'target',
 'TBbalance',
 'tooltip',
 'truecolor',
 'URL',
 'viewport',
 'voro_margin',
 'xdotversion']
[2]:

from graphistry.plugins_types.graphviz_types import EDGE_ATTRS EDGE_ATTRS
[2]:
['arrowhead',
 'arrowsize',
 'arrowtail',
 'class',
 'color',
 'colorscheme',
 'comment',
 'constraint',
 'decorate',
 'dir',
 'edgehref',
 'edgetarget',
 'edgetooltip',
 'edgeURL',
 'fillcolor',
 'fontcolor',
 'fontname',
 'fontsize',
 'head_lp',
 'headclip',
 'headhref',
 'headlabel',
 'headport',
 'headtarget',
 'headtooltip',
 'headURL',
 'href',
 'id',
 'label',
 'labelangle',
 'labeldistance',
 'labelfloat',
 'labelfontcolor',
 'labelfontname',
 'labelfontsize',
 'labelhref',
 'labeltarget',
 'labeltooltip',
 'labelURL',
 'layer',
 'len',
 'lhead',
 'lp',
 'ltail',
 'minlen',
 'nojustify',
 'penwidth',
 'pos',
 'samehead',
 'sametail',
 'showboxes',
 'style',
 'tail_lp',
 'tailclip',
 'tailhref',
 'taillabel',
 'tailport',
 'tailtarget',
 'tailtooltip',
 'tailURL',
 'target',
 'tooltip',
 'URL',
 'weight',
 'xlabel',
 'xlp']
[3]:

from graphistry.plugins_types.graphviz_types import NODE_ATTRS NODE_ATTRS
[3]:
['area',
 'class',
 'color',
 'colorscheme',
 'comment',
 'distortion',
 'fillcolor',
 'fixedsize',
 'fontcolor',
 'fontname',
 'fontsize',
 'gradientangle',
 'group',
 'height',
 'href',
 'id',
 'image',
 'imagepos',
 'imagescale',
 'label',
 'labelloc',
 'layer',
 'margin',
 'nojustify',
 'ordering',
 'orientation',
 'penwidth',
 'peripheries',
 'pin',
 'pos',
 'rects',
 'regular',
 'root',
 'samplepoints',
 'shape',
 'shapefile',
 'showboxes',
 'sides',
 'skew',
 'sortv',
 'style',
 'target',
 'tooltip',
 'URL',
 'vertices',
 'width',
 'xlabel',
 'xlp',
 'z']

Static image rendering and entity-level attributes#

graphviz suports rendering to a static file in various image formats such as png.

You can add graphviz-specific columns to your node and edge dataframes that configure per-row render settings. These use the same names as in the above global attribute guidance, such as color, shape, and label.

Adding a column for an attribute will typically disable the global attribute. For example, creating setting node column "shape" with values "star" and None, and global node attribute "shape" with value value "box". All the nodes with shape == "star" will render as a star in the static image, and the rows with value None will not default to the global node attribute "box", but to graphviz’s general default of an oval.

[76]:
g._nodes.apply(lambda row: row['n'], axis=1)
[76]:
0
0 1862294673469042014
1 7622088245850069747
2 7622088245850069747-unknown
3 16634236373777089526
4 18011320449780894329
5 9134577322728469115
6 1446072728533515665
7 6904185395252167658
8 13630126251685975826
9 11514603667851101425
10 13417892994160273884
11 2173486047275631423
12 18045747820524565803

[68]:
g._nodes.apply(lambda row: print('row', row['n']), 1)
row 1862294673469042014
row 7622088245850069747
row 7622088245850069747-unknown
row 16634236373777089526
row 18011320449780894329
row 9134577322728469115
row 1446072728533515665
row 6904185395252167658
row 13630126251685975826
row 11514603667851101425
row 13417892994160273884
row 2173486047275631423
row 18045747820524565803
[68]:
0
0 None
1 None
2 None
3 None
4 None
5 None
6 None
7 None
8 None
9 None
10 None
11 None
12 None

[99]:
# row-level attrs

root_id = '7622088245850069747-unknown'

g2c = g.nodes(g._nodes.assign(
    label=g._nodes.apply(lambda row: "ROOT: Unknown person(s)" if row['n'] == root_id else row['label'], axis=1),
    shape=g._nodes.n.apply(lambda n: "box" if n == root_id else None),
    color=g._nodes.n.apply(lambda n: "blue" if n == root_id else 'red')
)).edges(g._edges.assign(
    color=g._edges[g._source].apply(lambda n: 'blue' if n == root_id else None)
))


# Save a static graphviz render
g2c_positioned = g2c.layout_graphviz(
    "dot",
    render_to_disk=True,
    path=f'./graph.png',
    graph_attr={},
    edge_attr={},
    node_attr={'color': 'green'},  # ignored due to g2c._nodes.color
    format='png'
)

g2c_positioned._nodes.head()
[99]:
n x y label sz shape color
0 1862294673469042014 381.39 234.0 Hsbc Finance (Netherlands) 1 None red
1 16634236373777089526 140.39 90.0 HSBC PROPERTY (UK) LIMITED 1 None red
2 18011320449780894329 381.39 162.0 HSBC ALTERNATIVE INVESTMENTS LIMITED 1 None red
3 9134577322728469115 778.39 162.0 HSBC INVESTMENT COMPANY LIMITED 1 None red
4 1446072728533515665 454.39 90.0 HSBC IM PENSION TRUST LIMITED 1 None red
[98]:
from IPython.display import Image
Image(filename='./graph.png')
[98]:
../../../_images/demos_demos_databases_apis_graphviz_graphviz_24_0.png
[101]:
g2d = g.layout_graphviz('circo')
g2d.plot()
[101]:
[ ]: