Hypergraphs

Hypergraphs are graphs where edges may connect more than two nodes, such as an event involving multiple entities.

Graphistry encodes hypergraphs as regular graphs of two forms. One is a bipartite graph between hypernodes and regular nodes connected by hyperedges. The other is regular nodes connected by hyperedges. In both cases, each hyperedge is encoded by multiple regular src/dst edges.
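As a rough illustration of the two encodings, the sketch below expands one event touching three entities into ordinary src/dst pairs. It is plain Python, not Graphistry's implementation: the function names bipartite_edges and direct_edges and the node labels are hypothetical.

```python
from itertools import combinations

def bipartite_edges(event_id, entities):
    """Encode one hyperedge as edges from a hypernode to each entity node."""
    return [(event_id, entity) for entity in entities]

def direct_edges(entities):
    """Encode one hyperedge as pairwise edges among its entity nodes (no hypernode)."""
    return list(combinations(entities, 2))

# One event involving three entities (illustrative labels only)
entities = ["user::a", "boss::x", "site::hq"]

print(bipartite_edges("EventID::0", entities))
print(direct_edges(entities))
```

In this sketch the bipartite form emits one edge per entity, while the direct form emits one edge per pair of entities.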

Hypergraph

graphistry.PlotterBase.PlotterBase.hypergraph = <function PlotterBase.hypergraph>

Transform a dataframe into a hypergraph.

Parameters:
  • raw_events (pandas.DataFrame) – Dataframe to transform (pandas or cudf).

  • entity_types (Optional[list]) – Column names (strings) to turn into nodes; None selects all columns

  • opts (dict) – See below

  • drop_edge_attrs (bool) – Whether to drop each row’s attributes from its edges; defaults to False (attributes are included)

  • verbose (bool) – Whether to print size information

  • direct (bool) – Omit hypernode and instead strongly connect nodes in an event

  • engine (str) – String (pandas, cudf, …) for engine to use

  • npartitions (Optional[int]) – For distributed engines, how many coarse-grained pieces to split events into

  • chunksize (Optional[int]) – For distributed engines, split events after chunksize rows

  • drop_na (bool) – Whether to skip creating nodes for null values

Create a graph out of the dataframe, and return the graph components as dataframes, and the renderable result Plotter. Hypergraphs reveal relationships between rows and between column values. This transform is useful for lists of events, samples, relationships, and other structured high-dimensional data.

Specify the local compute engine by passing engine='pandas', 'cudf', 'dask', or 'dask_cudf' (default: 'pandas'). If events are not in that engine’s format, they will be converted into it.

The transform creates a node for every unique value in the entity_types columns (default: all columns). If direct=False (default), every row is also turned into a node. Edges are added to connect every table cell to its originating row’s node, or if direct=True, to the other nodes from the same row. Nodes are given the attribute 'type' corresponding to the originating column name, or in the case of a row, 'EventID'. Options further control the transform, such as column category definitions for controlling whether values recurring in different columns should be treated as one node, or whether to only draw edges between certain column type pairs.

Consider a list of events. Each row represents a distinct event, and each column some metadata about an event. If multiple events have common metadata, they will be transitively connected through those metadata values. The layout algorithm will try to cluster the events together. Conversely, if an event has unique metadata, the unique metadata will turn into nodes that only have connections to the event node, and the clustering algorithm will cause them to form a ring around the event node.

Best practice is to set EVENTID to the column holding each row’s unique ID, SKIP to all non-categorical columns (or entity_types to all categorical columns), and CATEGORIES to group columns with the same kinds of values.

To prevent creating nodes for null values, set drop_na=True. Some dataframe engines handle nulls in undesirable ways, so replacing None values with np.nan is recommended.

The optional opts={...} configuration options are:

  • 'EVENTID': Column name to inspect for a row ID. By default, uses the row index.

  • 'CATEGORIES': Dictionary mapping a category name to inhabiting columns. E.g., {'IP': ['srcAddress', 'dstAddress']}. If the same IP appears in both columns, this makes the transform generate one node for it, instead of one for each column.

  • 'DELIM': When creating node IDs, defines the separator used between the column name and node value

  • 'SKIP': List of column names to not turn into nodes. For example, dates and numbers are often skipped.

  • 'EDGES': For direct=True, instead of making all edges, pick column pairs. E.g., {'a': ['b', 'd'], 'd': ['d']} creates edges between columns a->b and a->d, and self-edges d->d.
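The effect of the CATEGORIES and DELIM options on node IDs can be sketched in plain Python. This is an illustration only, not Graphistry's code: make_lookup and node_id are hypothetical helper names, and the '::' default is taken from the HyperBindings signature documented on this page.

```python
def make_lookup(categories):
    """Invert {category: [columns]} into {column: category}."""
    return {col: cat for cat, cols in categories.items() for col in cols}

def node_id(col, value, lookup, delim="::"):
    """Build a string node ID, namespaced by the column's category if it has one."""
    namespace = lookup.get(col, col)
    return f"{namespace}{delim}{value}"

lookup = make_lookup({"IP": ["srcAddress", "dstAddress"]})

# Same value in two columns of one category -> one shared node ID
print(node_id("srcAddress", "10.0.0.1", lookup))  # IP::10.0.0.1
print(node_id("dstAddress", "10.0.0.1", lookup))  # IP::10.0.0.1
# A column outside any category keeps its own name as the namespace
print(node_id("port", 443, lookup))               # port::443
```

Because both address columns resolve to the same ID string, a value shared between them collapses into a single node rather than one node per column.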

Returns:

{'entities': DF, 'events': DF, 'edges': DF, 'nodes': DF, 'graph': Plotter}

Return type:

dict

Parameters:
  • entity_types (List[str] | None)

  • opts (dict)

  • drop_na (bool)

  • drop_edge_attrs (bool)

  • verbose (bool)

  • direct (bool)

  • engine (str)

  • npartitions (int | None)

  • chunksize (int | None)

Example: Connect user<-row->boss

import graphistry
import pandas as pd
users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
h = graphistry.hypergraph(users_df)
g = h['graph'].plot()

Example: Connect user->boss

import graphistry
import pandas as pd
users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
h = graphistry.hypergraph(users_df, direct=True)
g = h['graph'].plot()

Example: Connect user<->boss

import graphistry
import pandas as pd
users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
h = graphistry.hypergraph(users_df, direct=True, opts={'EDGES': {'user': ['boss'], 'boss': ['user']}})
g = h['graph'].plot()

Example: Only consider some columns for nodes

import graphistry
import pandas as pd
users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
h = graphistry.hypergraph(users_df, entity_types=['boss'])
g = h['graph'].plot()

Example: Collapse matching user::<id> and boss::<id> nodes into one person::<id> node

import graphistry
import pandas as pd
users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
h = graphistry.hypergraph(users_df, opts={'CATEGORIES': {'person': ['user', 'boss']}})
g = h['graph'].plot()

Example: Use cudf engine instead of pandas

import cudf, graphistry
users_gdf = cudf.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
h = graphistry.hypergraph(users_gdf, engine='cudf')
g = h['graph'].plot()
hypergraph

Primary alias for function graphistry.hyper_dask.hypergraph().

class graphistry.hyper_dask.HyperBindings(TITLE='nodeTitle', DELIM='::', NODEID='nodeID', ATTRIBID='attribID', EVENTID='EventID', EVENTTYPE='event', SOURCE='src', DESTINATION='dst', CATEGORY='category', NODETYPE='type', EDGETYPE='edgeType', NULLVAL='null', SKIP=None, CATEGORIES={}, EDGES=None)

Bases: object

Parameters:
  • TITLE (str)

  • DELIM (str)

  • NODEID (str)

  • ATTRIBID (str)

  • EVENTID (str)

  • EVENTTYPE (str)

  • SOURCE (str)

  • DESTINATION (str)

  • CATEGORY (str)

  • NODETYPE (str)

  • EDGETYPE (str)

  • NULLVAL (str)

  • SKIP (List[str] | None)

  • CATEGORIES (Dict[str, List[str]])

  • EDGES (Dict[str, List[str]] | None)

class graphistry.hyper_dask.Hypergraph(g, defs, entities, event_entities, edges, source, destination, engine=Engine.PANDAS, debug=False)

Bases: object

Parameters:
  • entities (Any)

  • event_entities (Any)

  • edges (Any)

  • source (str)

  • destination (str)

  • engine (Engine)

  • debug (bool)

graphistry.hyper_dask.clean_events(events, defs, engine, npartitions=None, chunksize=None, dropna=False, debug=False)

Return a copy of the events with a reset index, in the target engine’s format.

Parameters:
  • events (Any)

  • defs (HyperBindings)

  • engine (Engine)

  • npartitions (int | None)

  • chunksize (int | None)

  • dropna (bool)

  • debug (bool)

Return type:

Any

graphistry.hyper_dask.coerce_col_safe(s, to_dtype)

graphistry.hyper_dask.col2cat(cat_lookup, col)
Parameters:
  • cat_lookup (Dict[str, str])

  • col (str)

graphistry.hyper_dask.concat(dfs, engine, debug=False)
Parameters:
  • dfs (List[Any])

  • engine (Engine)

graphistry.hyper_dask.df_coercion(df, engine, npartitions=None, chunksize=None, debug=False)

Convert df to the engine of choice.

Supported coercions (target <- accepted inputs):

  • pd <- pd

  • cudf <- pd, cudf

  • ddf <- pd, ddf

  • dgdf <- pd, cudf, dgdf

Parameters:
  • df (Any)

  • engine (Engine)

  • npartitions (int | None)

  • chunksize (int | None)

  • debug (bool)

Return type:

Any
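The supported coercions listed for df_coercion can be read as a lookup table from target engine to accepted input kinds. The sketch below is a hypothetical restatement of that list in plain Python, not the library's logic; it assumes the abbreviations stand for pandas (pd), cuDF (cudf), Dask dataframes (ddf), and dask_cudf dataframes (dgdf), and can_coerce is an invented helper name.

```python
# Target engine -> input kinds it accepts, transcribed from the list for df_coercion
SUPPORTED = {
    "pd": {"pd"},
    "cudf": {"pd", "cudf"},
    "ddf": {"pd", "ddf"},
    "dgdf": {"pd", "cudf", "dgdf"},
}

def can_coerce(source, target):
    """Return True if a `source` dataframe kind is listed as convertible to `target`."""
    return source in SUPPORTED.get(target, set())

print(can_coerce("pd", "dgdf"))   # True
print(can_coerce("cudf", "ddf"))  # False
```

Read this way, pandas input is accepted by every engine, while GPU or distributed inputs are only accepted by targets of the same or a wider kind.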

graphistry.hyper_dask.direct_edgelist_shape(entity_types, defs)

Edges take the format {src_col: [dest_col1, dest_col2], …}. If None, connect all columns to all others, leaving the direction of each edge up to the algorithm.

Return type:

Dict[str, List[str]]
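A minimal sketch of the edge-shape idea described for direct_edgelist_shape, in plain Python. The helper name edge_shape is hypothetical, and the rule used when no EDGES mapping is given (each column points to every later column) is an assumption chosen for illustration; the description only says the direction is left to the algorithm.

```python
def edge_shape(entity_types, edges=None):
    """Return {src_col: [dst_col, ...]}; hypothetical stand-in, not the library function."""
    if edges is not None:
        # An explicit mapping is used as given
        return edges
    # Assumption: connect each column to every later column, so each pair
    # appears once and the direction follows the column order.
    return {
        src: entity_types[i + 1:]
        for i, src in enumerate(entity_types)
    }

print(edge_shape(["a", "b", "c"]))
# {'a': ['b', 'c'], 'b': ['c'], 'c': []}
print(edge_shape(["a", "b", "d"], {"a": ["b", "d"], "d": ["d"]}))
# {'a': ['b', 'd'], 'd': ['d']}
```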

graphistry.hyper_dask.format_direct_edges(engine, events, entity_types, defs, edge_shape, drop_na, drop_edge_attrs, debug=False)
Parameters:
  • engine (Engine)

  • events (Any)

  • defs (HyperBindings)

  • drop_na (bool)

  • drop_edge_attrs (bool)

  • debug (bool)

Return type:

Any

graphistry.hyper_dask.format_entities(events, entity_types, defs, direct, drop_na, engine, npartitions, chunksize, debug=False)
Parameters:
  • events (Any)

  • entity_types (List[str])

  • defs (HyperBindings)

  • direct (bool)

  • drop_na (bool)

  • engine (Engine)

  • npartitions (int | None)

  • chunksize (int | None)

  • debug (bool)

Return type:

Any

graphistry.hyper_dask.format_entities_from_col(defs, cat_lookup, drop_na, engine, col_name, df_with_col, meta, debug)
For each unique value v in column col, create [{col: str(v), title: str(v), nodetype: col, nodeid: <cat><delim><v>}]
  • respect drop_na

  • respect colname overrides

  • receive+return pd.DataFrame / cudf.DataFrame depending on engine

Parameters:
  • defs (HyperBindings)

  • cat_lookup (Dict[str, str])

  • drop_na (bool)

  • engine (Engine)

  • col_name (str)

  • df_with_col (Any)

  • meta (DataFrame)

  • debug (bool)

Return type:

Any
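The record shape described for format_entities_from_col can be sketched with plain Python lists instead of dataframes. This is a hypothetical stand-in, not the library function: entities_from_col is an invented name, the keys 'nodeTitle', 'type', and 'nodeID' are taken from the HyperBindings defaults, and labeling of nulls (NULLVAL) is not modeled.

```python
def entities_from_col(col, values, category=None, delim="::", drop_na=True):
    """One record per unique value in a column (hypothetical pure-Python stand-in)."""
    namespace = category or col
    records, seen = [], set()
    for v in values:
        if v is None and drop_na:
            continue  # respect drop_na: no node for missing values
        if v in seen:
            continue  # keep one record per unique value
        seen.add(v)
        records.append({
            col: str(v),
            "nodeTitle": str(v),
            "type": col,
            "nodeID": f"{namespace}{delim}{v}",
        })
    return records

for record in entities_from_col("boss", ["x", "x", None, "y"]):
    print(record)
```

Passing category="person" would namespace the IDs as person::x and person::y while leaving the 'type' attribute as the originating column name.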

graphistry.hyper_dask.format_hyperedges(engine, events, entity_types, defs, drop_na, drop_edge_attrs, debug=False)
Parameters:
  • engine (Engine)

  • events (Any)

  • entity_types (List[str])

  • defs (HyperBindings)

  • drop_na (bool)

  • drop_edge_attrs (bool)

  • debug (bool)

Return type:

Any

graphistry.hyper_dask.format_hypernodes(events, defs, drop_na)

graphistry.hyper_dask.get_df_cons(engine)
Parameters:

engine (Engine)

graphistry.hyper_dask.get_series_cons(engine, dtype='int32')
Parameters:

engine (Engine)

graphistry.hyper_dask.hyperbinding(g, defs, entities, event_entities, edges, source, destination)

graphistry.hyper_dask.hypergraph(g, raw_events, entity_types=None, opts={}, drop_na=True, drop_edge_attrs=False, verbose=True, direct=False, engine='pandas', npartitions=None, chunksize=None, debug=False)
Internal details:
  • IDs currently strings: ${namespace(col)}${delim}${str(val)}

  • debug: sprinkle persist() to catch bugs earlier

Parameters:
  • raw_events (Any)

  • entity_types (List[str] | None)

  • opts (dict)

  • drop_na (bool)

  • drop_edge_attrs (bool)

  • verbose (bool)

  • direct (bool)

  • engine (str)

  • npartitions (int | None)

  • chunksize (int | None)

  • debug (bool)

graphistry.hyper_dask.make_reverse_lookup(categories)

graphistry.hyper_dask.mt_df(engine)
Parameters:

engine (Engine)

graphistry.hyper_dask.mt_nodes(defs, events, entity_types, direct, engine)
Parameters:
  • defs (HyperBindings)

  • events (Any)

  • entity_types (List[str])

  • direct (bool)

  • engine (Engine)

Return type:

DataFrame

graphistry.hyper_dask.mt_series(engine, dtype='int32')
Parameters:

engine (Engine)

graphistry.hyper_dask.screen_entities(events, entity_types, defs)

List entity columns: Unskipped user-specified entities when provided, else unskipped cols

Parameters:
  • events (Any)

  • entity_types (List[str] | None)

  • defs (HyperBindings)

Return type:

List[str]
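The selection rule stated for screen_entities (user-specified entities when provided, otherwise all columns, minus skipped ones in both cases) can be sketched as follows. The helper name screen and its arguments are hypothetical, and the sketch works on a plain list of column names rather than a dataframe and bindings object.

```python
def screen(columns, entity_types=None, skip=None):
    """Pick entity columns: requested ones if given, else all; then remove skipped ones."""
    skipped = set(skip or [])
    candidates = entity_types if entity_types is not None else columns
    return [c for c in candidates if c not in skipped]

columns = ["user", "boss", "timestamp"]

print(screen(columns, skip=["timestamp"]))
# ['user', 'boss']
print(screen(columns, entity_types=["boss", "timestamp"], skip=["timestamp"]))
# ['boss']
```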

graphistry.hyper_dask.series_cons(engine, arr, dtype='int32', npartitions=None, chunksize=None)
Parameters:
  • engine (Engine)

  • arr (List)

graphistry.hyper_dask.shallow_copy(df, engine, debug=False)
Parameters:
  • df (Any)

  • engine (Engine)

  • debug (bool)

Return type:

Any