GPU-Accelerated Group-in-a-Box Layout for Analyzing Large Graphs

Visualizing interactions within large, complex datasets can be challenging. The Group-in-a-Box layout simplifies this by placing each community in its own box and arranging the boxes in a grid, making it easier to analyze relationships both within and between groups. It is especially useful for community-level analysis, such as in social media data.
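To make the "boxes in a grid" idea concrete, here is a minimal sketch in plain Python. It assumes the communities are already known and are given as a mapping from a community label to its node count. The function `grid_boxes` is hypothetical and is not PyGraphistry's API. It uses equal-size cells in a near-square grid, whereas treemap-style group-in-a-box layouts generally size each box by its community's size.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in a unit square


def grid_boxes(community_sizes: Dict[str, int]) -> Dict[str, Box]:
    """Place communities into a near-square grid of equal cells inside the
    unit square, largest community first, filling cells row by row."""
    n = len(community_sizes)
    if n == 0:
        return {}
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    cell_w, cell_h = 1.0 / cols, 1.0 / rows
    # Stable sort: ties keep their original order.
    ordered: List[str] = sorted(community_sizes, key=lambda c: -community_sizes[c])
    boxes: Dict[str, Box] = {}
    for i, name in enumerate(ordered):
        row, col = divmod(i, cols)
        boxes[name] = (col * cell_w, row * cell_h, cell_w, cell_h)
    return boxes
```

For example, three communities yield a 2 x 2 grid: the largest community takes the first cell at `(0.0, 0.0)` and one cell stays empty.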

PyGraphistry extends the Group-in-a-box Layout for Multifaceted Analysis of Communities with high performance on both CPU and GPU, flexible customization, and integration into the broader PyGraphistry ecosystem.

Key Benefits

  • Faster Insights: GPU support accelerates commercial workloads, cutting the runtime for a social network with over 3M edges from 18 minutes to 26 seconds. CPU mode likewise benefits from algorithmic and vectorization optimizations. The result is rapid analysis iterations that unblock workflows and make previously out-of-reach analyses practical.

  • Customizable Layouts:

    • Flexible Partitioning: Choose a built-in partitioning algorithm or a custom partition key to align boxes with structure in your data, such as regional clusters or demographics.

    • Adaptive Community Layouts: Focus on tightly connected groups or highlight outliers to uncover hidden patterns.

  • Clear Visualization of Isolated Nodes: The PyGraphistry variant also arranges isolated nodes in circles around their connected counterparts, handling the common case of noisy, unconnected nodes within a partition.
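The exact placement rule PyGraphistry uses for isolated nodes is not spelled out here. As a hypothetical sketch of the idea, the helpers below compute the centroid of a partition's connected nodes, take a radius just beyond the farthest of them, and space the isolated nodes evenly on that circle. Both function names and the `margin` parameter are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def ring_positions(isolated: List[str], cx: float, cy: float, radius: float) -> Dict[str, Point]:
    """Space isolated nodes evenly on a circle centered at (cx, cy)."""
    n = len(isolated)
    return {
        node: (cx + radius * math.cos(2 * math.pi * i / n),
               cy + radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(isolated)
    }


def surround_with_ring(connected: Dict[str, Point], isolated: List[str],
                       margin: float = 0.1) -> Dict[str, Point]:
    """Place isolated nodes on a ring just outside one partition's connected nodes."""
    if not connected:
        # Nothing to surround: fall back to a small ring at the origin.
        return ring_positions(isolated, 0.0, 0.0, margin)
    cx = sum(x for x, _ in connected.values()) / len(connected)
    cy = sum(y for _, y in connected.values()) / len(connected)
    # Smallest circle around the centroid that contains every connected node.
    reach = max(math.hypot(x - cx, y - cy) for x, y in connected.values())
    return ring_positions(isolated, cx, cy, reach + margin)
```

A real layout would then fit the combined positions into the partition's box along with the connected nodes.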

Tutorial

Follow this tutorial to master PyGraphistry’s Group-in-a-Box layout:

  • Load a 45,000-edge blockchain transaction graph

  • Lay it out in seconds on CPU, or in about 1 second on GPU

  • Customize partitioning and per-group layouts

[31]:
import pandas as pd
import graphistry
graphistry.__version__
[31]:
'0+unknown'
[2]:
# API key page (free GPU account): https://hub.graphistry.com/users/personal/key/
graphistry.register(
    api=3,
    personal_key_id='FILL_ME_IN',     # replace with your personal key ID
    personal_key_secret='FILL_ME_IN'  # replace with your personal key secret
)
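Hard-coding keys in a notebook makes them easy to leak. A minimal alternative, assuming you store the key in environment variables, is to build the `register` arguments at runtime. The variable names below are placeholders chosen for this sketch, not names PyGraphistry reads on its own, and `graphistry_credentials_from_env` is a hypothetical helper.

```python
import os
from typing import Dict, Union


def graphistry_credentials_from_env(
    id_var: str = "GRAPHISTRY_PERSONAL_KEY_ID",          # placeholder name
    secret_var: str = "GRAPHISTRY_PERSONAL_KEY_SECRET",  # placeholder name
) -> Dict[str, Union[int, str]]:
    """Build keyword arguments for graphistry.register() from environment
    variables, failing loudly if either one is missing or empty."""
    missing = [v for v in (id_var, secret_var) if not os.environ.get(v)]
    if missing:
        raise KeyError(f"Set environment variable(s): {', '.join(missing)}")
    return {
        "api": 3,
        "personal_key_id": os.environ[id_var],
        "personal_key_secret": os.environ[secret_var],
    }
```

With the variables set, the registration cell becomes `graphistry.register(**graphistry_credentials_from_env())`.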

Data

[12]:
e_df = pd.read_csv(
    'https://raw.githubusercontent.com/graphistry/pygraphistry/refs/heads/master/demos/data/transactions.csv',
)

print(e_df.shape)
e_df.head()
(45117, 6)
[12]:
   Amount $       Date          Destination                          Source                               Transaction ID                                     isTainted
0  3223.975200    1.385240e+12  84a0b53e1ac008b8dd0fd6212d4b7fa2...  2dd13954e18508bb8b3a41d96a022be9...  b6eb8ba20df31fa74fbe7755f58c18f82a599d6bb5fa79...  0
1  3708.021600    1.401500e+12  3b62a891b99969042d4e6ac8158d0a18...  7c74d3afb41e536e26948a1d2455a7c7...  60df3c67063e136a0c9715edcd12ae717e6f9ed492afe2...  0
2  2.480000       1.398560e+12  3b62a891b99969042d4e6ac8158d0a18...  50dced19b8ee41114916bf3ca894f455...  a6aafd3d85600844536b8a5f2c255686c33dc4969e68a4...  0
3  991986.487600  1.385540e+12  e52baeb69fbd7a24f3ef825bc4e20973...  4289f81f7ce5dfba9bf6e20794c76a9f...  1cea981590e8a3ac67ba872f9411412c6b6f4dc7358071...  0
4  902.063544     1.387850e+12  a209cf1b3dc79338896ffa773b4249ff...  7f0e2244e41718b68e36ed0c810d084b...  a399e3920b1e6a1e487b0559821f823065970499e74cbc...  0
[13]:
g = graphistry.edges(e_df, 'Source', 'Destination')
[14]:
g._edges.shape
[14]:
(45117, 6)
[17]:
g.materialize_nodes()._nodes.shape
[17]:
(28832, 1)
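The edge table has 45,117 rows but only 28,832 distinct nodes because many addresses take part in several transactions. Conceptually, `materialize_nodes()` derives a one-column node table from the distinct IDs found in the source and destination columns. The plain-Python sketch below mirrors that idea on a list of `(source, destination)` pairs; `distinct_nodes` is a hypothetical helper, and the first-seen ordering it returns may differ from PyGraphistry's.

```python
from typing import Dict, Iterable, List, Tuple


def distinct_nodes(edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return every ID that appears as a source or destination, once each,
    in first-seen order."""
    seen: Dict[str, None] = {}  # dicts preserve insertion order
    for src, dst in edges:
        seen.setdefault(src, None)
        seen.setdefault(dst, None)
    return list(seen)
```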

Regular layout

[18]:
g.plot()
[18]:

Group-in-a-box: CPU Mode

Passing in a pandas dataframe defaults to running igraph's fr (Fruchterman-Reingold) layout on CPU within each partition. Even in CPU mode, it is significantly faster than published group-in-a-box algorithms, thanks to a combination of asymptotic and machine-oriented optimizations.
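Each partition is laid out on its own and its result is then mapped into the partition's box. The sketch below shows one way to do that mapping, assuming a partition's local coordinates (for example, the output of a Fruchterman-Reingold run) and a box given as `(x, y, width, height)`. `fit_into_box` and its `padding` parameter are hypothetical. It scales uniformly so the local drawing keeps its shape, then centers it in the box.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x, y, width, height)


def fit_into_box(local: Dict[str, Point], box: Box, padding: float = 0.05) -> Dict[str, Point]:
    """Uniformly scale and translate one partition's local layout into its box,
    leaving a padding fraction of the box free on every side."""
    if not local:
        return {}
    xs = [x for x, _ in local.values()]
    ys = [y for _, y in local.values()]
    min_x, min_y = min(xs), min(ys)
    span_x, span_y = max(xs) - min_x, max(ys) - min_y
    bx, by, bw, bh = box
    inner_w, inner_h = bw * (1 - 2 * padding), bh * (1 - 2 * padding)
    # Largest uniform scale that fits both dimensions; 0 if all nodes coincide.
    ratios = [inner / span for inner, span in ((inner_w, span_x), (inner_h, span_y)) if span > 0]
    scale = min(ratios) if ratios else 0.0
    # Center the scaled drawing inside the box.
    off_x = bx + (bw - span_x * scale) / 2
    off_y = by + (bh - span_y * scale) / 2
    return {node: (off_x + (x - min_x) * scale, off_y + (y - min_y) * scale)
            for node, (x, y) in local.items()}
```

Uniform scaling is a design choice: stretching each axis independently would fill the box more fully but distort distances inside the community.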

[19]:
g2 = g.group_in_a_box_layout()
edge index g._edge not set so using edge index as ID; set g._edge via g.edges(), or change merge_if_existing to False
edge index g._edge __edge_index__ missing as attribute in ig; using ig edge order for IDs
Pandas engine detected. FA2 falling back to igraph fr
edge index g._edge not set so using edge index as ID; set g._edge via g.edges(), or change merge_if_existing to False
edge index g._edge __edge_index__ missing as attribute in ig; using ig edge order for IDs
[20]:
g2.plot()
[20]:

GPU Mode

Switching the input to GPU dataframes automatically transitions execution to GPU mode, which is dramatically faster.

In GPU mode, the per-partition layout defaults to ForceAtlas2, which generally produces a better layout within each group when zoomed in.
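The switch between modes keys off the dataframe type. A hypothetical sketch of such a dispatch, matching the defaults described above (cuDF input gets ForceAtlas2 on GPU, anything else falls back to igraph's `fr` on CPU), can check the type's module name instead of importing cuDF, so it also runs on machines without a GPU. This is an illustration, not PyGraphistry's internal code.

```python
def default_layout_for(df: object) -> str:
    """Return the per-partition layout name to use for a given dataframe."""
    # Inspect the defining module by name so cuDF never has to be imported.
    top_level_module = (type(df).__module__ or "").split(".")[0]
    if top_level_module == "cudf":
        return "force_atlas2"  # GPU path
    return "fr"  # CPU fallback (igraph Fruchterman-Reingold)
```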

[26]:
g_gpu = g.to_cudf()

g2_gpu = g_gpu.group_in_a_box_layout()
[27]:
g2_gpu.plot()
[27]:

Configure: Precomputed partition and alternate layout

  • Use an existing node attribute to predetermine box membership

  • Control the layout algorithm and its parameters
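A precomputed partition key means that each distinct value of a node attribute becomes its own box. The plain-Python sketch below groups node IDs that way. `boxes_from_attribute` and its `fallback` box for nodes missing the attribute are assumptions for illustration, since how PyGraphistry treats missing partition values is not covered here.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


def boxes_from_attribute(nodes: List[Dict[str, Any]], key: str,
                         id_col: str = "id",
                         fallback: Hashable = "unassigned") -> Dict[Hashable, List[Any]]:
    """Group node IDs by a precomputed attribute such as a region or a
    community label; nodes without the attribute share a fallback box."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for row in nodes:
        label = row.get(key)
        groups[fallback if label is None else label].append(row[id_col])
    return dict(groups)
```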

[28]:
g_louvain = g.to_cudf().compute_cugraph('louvain', directed=False)
assert 'louvain' in g_louvain._nodes
[29]:
from graphistry.plugins.igraph import layout_algs as igraph_layouts
from graphistry.plugins.cugraph import layout_algs as cugraph_layouts

{
    'igraph_layout_algs': ', '.join(igraph_layouts),
    'cugraph_layout_algs': ', '.join(cugraph_layouts)
}
[29]:
{'igraph_layout_algs': 'auto, automatic, bipartite, circle, circular, dh, davidson_harel, drl, drl_3d, fr, fruchterman_reingold, fr_3d, fr3d, fruchterman_reingold_3d, grid, grid_3d, graphopt, kk, kamada_kawai, kk_3d, kk3d, kamada_kawai_3d, lgl, large, large_graph, mds, random, random_3d, rt, tree, reingold_tilford, rt_circular, reingold_tilford_circular, sphere, spherical, circle_3d, circular_3d, star, sugiyama',
 'cugraph_layout_algs': 'force_atlas2'}
[30]:
(g_louvain
     .group_in_a_box_layout(
         partition_key='louvain',
         layout_alg='force_atlas2',
         layout_params={
             'lin_log_mode': True
         }
     )
).plot()
[30]:
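`lin_log_mode` changes how ForceAtlas2 pulls connected nodes together. In the published ForceAtlas2 formulation, attraction along an edge grows linearly with distance by default and logarithmically, as log(1 + d), in lin-log mode, while repulsion between two nodes is proportional to (deg1 + 1)(deg2 + 1) / d. The functions below are a minimal sketch of those force magnitudes only, not cuGraph's implementation; `k_r` stands in for the repulsion scaling constant.

```python
import math


def fa2_attraction(distance: float, lin_log: bool = False) -> float:
    """Attraction along an edge: linear in distance, or log(1 + d) in lin-log mode."""
    return math.log1p(distance) if lin_log else distance


def fa2_repulsion(distance: float, deg1: int, deg2: int, k_r: float = 1.0) -> float:
    """Degree-weighted repulsion between two nodes at a positive distance."""
    return k_r * (deg1 + 1) * (deg2 + 1) / distance
```

At distance 100, the standard attraction is 100 while the lin-log attraction is about 4.6, so long edges pull far less; this tends to yield tighter, better-separated clusters inside each box.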