Hypergraphs
Hypergraphs are graphs where edges may connect more than two nodes, such as an event involving multiple entities.
Graphistry encodes hypergraphs as regular graphs of two forms. One is a bipartite graph between hypernodes and regular nodes connected by hyperedges. The other is regular nodes connected by hyperedges. In both cases, each hyperedge is encoded by multiple regular src/dst edges.
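The two encodings can be illustrated without Graphistry at all. The following is a minimal sketch in plain Python, not the library's implementation; the helper names `encode_bipartite` and `encode_direct` and the sample member names are hypothetical. It shows how one hyperedge over three nodes becomes several ordinary src/dst edges in each form.

```python
from itertools import combinations

def encode_bipartite(event_id, members):
    """One hyperedge -> one (hypernode, member) edge per member."""
    return [(event_id, m) for m in members]

def encode_direct(members):
    """One hyperedge -> one edge per pair of members, with no hypernode."""
    return list(combinations(members, 2))

# A single event (hyperedge) that involves three entities.
members = ["alice", "bob", "carol"]

bipartite_edges = encode_bipartite("event_0", members)
direct_edges = encode_direct(members)

print(bipartite_edges)  # 3 edges, all touching the hypernode "event_0"
print(direct_edges)     # 3 edges, one per pair of members
```

The bipartite form grows linearly with the number of members in an event, while the direct form in this sketch grows quadratically, which is one reason to restrict column pairs when many columns are involved.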
Hypergraph
- graphistry.PlotterBase.PlotterBase.hypergraph = <function PlotterBase.hypergraph>
Transform a dataframe into a hypergraph.
- Parameters:
raw_events (pandas.DataFrame) – Dataframe to transform (pandas or cudf).
entity_types (Optional[list]) – Columns (strings) to turn into nodes, None signifies all
opts (dict) – See below
drop_edge_attrs (bool) – Whether to drop each row’s attributes from its edges; defaults to False (attributes are included)
verbose (bool) – Whether to print size information
direct (bool) – Omit hypernode and instead strongly connect nodes in an event
engine (str) – Name of the engine to use (pandas, cudf, …)
npartitions (Optional[int]) – For distributed engines, how many coarse-grained pieces to split events into
chunksize (Optional[int]) – For distributed engines, split events after chunksize rows
drop_na (bool) – Whether to skip creating nodes for null values
Create a graph out of the dataframe, and return the graph components as dataframes, and the renderable result Plotter. Hypergraphs reveal relationships between rows and between column values. This transform is useful for lists of events, samples, relationships, and other structured high-dimensional data.
Specify local compute engine by passing engine=’pandas’, ‘cudf’, ‘dask’, ‘dask_cudf’ (default: ‘pandas’). If events are not in that engine’s format, they will be converted into it.
The transform creates a node for every unique value in the entity_types columns (default: all columns). If direct=False (default), every row is also turned into a node. Edges are added to connect every table cell to its originating row’s node, or if direct=True, to the other nodes from the same row. Nodes are given the attribute ‘type’ corresponding to the originating column name, or in the case of a row, ‘EventID’. Options further control the transform, such as column category definitions for controlling whether values recurring in different columns should be treated as one node, or whether to only draw edges between certain column type pairs.
Consider a list of events. Each row represents a distinct event, and each column some metadata about an event. If multiple events have common metadata, they will be transitively connected through those metadata values. The layout algorithm will try to cluster the events together. Conversely, if an event has unique metadata, the unique metadata will turn into nodes that only have connections to the event node, and the clustering algorithm will cause them to form a ring around the event node.
Best practice is to set EVENTID to the column holding each row’s unique ID, SKIP to all non-categorical columns (or entity_types to all categorical columns), and CATEGORIES to group columns with the same kinds of values.
To prevent creating nodes for null values, set drop_na=True. Some dataframe engines may handle nulls in undesirable ways, so replacing None values with np.nan is recommended.
The optional opts={...} configuration options are:
‘EVENTID’: Column name to inspect for a row ID. By default, uses the row index.
‘CATEGORIES’: Dictionary mapping a category name to inhabiting columns. E.g., {‘IP’: [‘srcAddress’, ‘dstAddress’]}. If the same IP appears in both columns, this makes the transform generate one node for it, instead of one for each column.
‘DELIM’: When creating node IDs, defines the separator used between the column name and node value
‘SKIP’: List of column names to not turn into nodes. For example, dates and numbers are often skipped.
‘EDGES’: For direct=True, instead of making all edges, pick column pairs. E.g., {‘a’: [‘b’, ‘d’], ‘d’: [‘d’]} creates edges between columns a->b and a->d, and self-edges d->d.
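To make the interaction between ‘CATEGORIES’ and ‘DELIM’ concrete, here is a minimal sketch in plain Python. It is not the library's code: the helper names `make_category_lookup` and `node_id` are hypothetical, and the `<namespace><delim><value>` ID layout is taken from the `user::<id>` / `person::<id>` examples and the default `DELIM='::'` shown elsewhere on this page.

```python
def make_category_lookup(categories):
    """Invert {'category': [columns]} into {column: 'category'}."""
    return {col: cat for cat, cols in categories.items() for col in cols}

def node_id(col, value, cat_lookup, delim="::"):
    """Namespace a value by its column's category, else by the column name."""
    namespace = cat_lookup.get(col, col)
    return f"{namespace}{delim}{value}"

lookup = make_category_lookup({"IP": ["srcAddress", "dstAddress"]})

# The same IP seen in two columns of one category maps to a single node ID.
a = node_id("srcAddress", "10.0.0.1", lookup)
b = node_id("dstAddress", "10.0.0.1", lookup)

# A column outside any category keeps its own namespace.
c = node_id("port", 443, lookup)

print(a, b, c)
```

Because both address columns resolve to the same namespace, a value appearing in either column yields one shared node rather than one node per column.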
- Returns:
{‘entities’: DF, ‘events’: DF, ‘edges’: DF, ‘nodes’: DF, ‘graph’: Plotter}
- Return type:
dict
- Parameters:
entity_types (List[str] | None)
opts (dict)
drop_na (bool)
drop_edge_attrs (bool)
verbose (bool)
direct (bool)
engine (str)
npartitions (int | None)
chunksize (int | None)
Example: Connect user<-row->boss
    import graphistry
    import pandas as pd
    users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
    # Default: each row becomes a hypernode linked to its user and boss values
    h = graphistry.hypergraph(users_df)
    g = h['graph'].plot()
Example: Connect user->boss
    import graphistry
    import pandas as pd
    users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
    # direct=True: omit the row hypernode and connect values from the same row
    h = graphistry.hypergraph(users_df, direct=True)
    g = h['graph'].plot()
Example: Connect user<->boss
    import graphistry
    import pandas as pd
    users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
    # EDGES picks the column pairs to connect: user->boss and boss->user
    h = graphistry.hypergraph(users_df, direct=True, opts={'EDGES': {'user': ['boss'], 'boss': ['user']}})
    g = h['graph'].plot()
Example: Only consider some columns for nodes
    import graphistry
    import pandas as pd
    users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
    # Only the 'boss' column is turned into entity nodes
    h = graphistry.hypergraph(users_df, entity_types=['boss'])
    g = h['graph'].plot()
Example: Collapse matching user::<id> and boss::<id> nodes into one person::<id> node
    import graphistry
    import pandas as pd
    users_df = pd.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
    # Values in 'user' and 'boss' share the 'person' category, so 'x' is one node
    h = graphistry.hypergraph(users_df, opts={'CATEGORIES': {'person': ['user', 'boss']}})
    g = h['graph'].plot()
Example: Use cudf engine instead of pandas
    import cudf
    import graphistry
    users_gdf = cudf.DataFrame({'user': ['a','b','x'], 'boss': ['x', 'x', 'y']})
    h = graphistry.hypergraph(users_gdf, engine='cudf')
    g = h['graph'].plot()
- hypergraph
Primary alias for function graphistry.hyper_dask.hypergraph().
- class graphistry.hyper_dask.HyperBindings(TITLE='nodeTitle', DELIM='::', NODEID='nodeID', ATTRIBID='attribID', EVENTID='EventID', EVENTTYPE='event', SOURCE='src', DESTINATION='dst', CATEGORY='category', NODETYPE='type', EDGETYPE='edgeType', NULLVAL='null', SKIP=None, CATEGORIES={}, EDGES=None)#
Bases: object
- Parameters:
TITLE (str)
DELIM (str)
NODEID (str)
ATTRIBID (str)
EVENTID (str)
EVENTTYPE (str)
SOURCE (str)
DESTINATION (str)
CATEGORY (str)
NODETYPE (str)
EDGETYPE (str)
NULLVAL (str)
SKIP (List[str] | None)
CATEGORIES (Dict[str, List[str]])
EDGES (Dict[str, List[str]] | None)
- class graphistry.hyper_dask.Hypergraph(g, defs, entities, event_entities, edges, source, destination, engine=Engine.PANDAS, debug=False)#
Bases: object
- Parameters:
entities (Any)
event_entities (Any)
edges (Any)
source (str)
destination (str)
engine (Engine)
debug (bool)
- graphistry.hyper_dask.clean_events(events, defs, engine, npartitions=None, chunksize=None, dropna=False, debug=False)#
Copy with reset index and in the target engine format
- Parameters:
events (Any)
defs (HyperBindings)
engine (Engine)
npartitions (int | None)
chunksize (int | None)
dropna (bool)
debug (bool)
- Return type:
Any
- graphistry.hyper_dask.coerce_col_safe(s, to_dtype)#
- graphistry.hyper_dask.col2cat(cat_lookup, col)#
- Parameters:
cat_lookup (Dict[str, str])
col (str)
- graphistry.hyper_dask.concat(dfs, engine, debug=False)#
- Parameters:
dfs (List[Any])
engine (Engine)
- graphistry.hyper_dask.df_coercion(df, engine, npartitions=None, chunksize=None, debug=False)#
Go from df to engine of choice
- Supported coercions:
pd <- pd
cudf <- pd, cudf
ddf <- pd, ddf
dgdf <- pd, cudf, dgdf
- Parameters:
df (Any)
engine (Engine)
npartitions (int | None)
chunksize (int | None)
debug (bool)
- Return type:
Any
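The supported-coercion list above reads naturally as a lookup table from target engine to accepted input formats. The sketch below is illustrative only: the table name, the `can_coerce` helper, and the mapping of the abbreviations `pd`, `ddf`, and `dgdf` to the engine names `pandas`, `dask`, and `dask_cudf` are assumptions, not part of the library.

```python
# Target engine -> input dataframe kinds it accepts (assumed abbreviation mapping:
# pd = pandas, ddf = dask, dgdf = dask_cudf).
SUPPORTED_COERCIONS = {
    "pandas": {"pandas"},
    "cudf": {"pandas", "cudf"},
    "dask": {"pandas", "dask"},
    "dask_cudf": {"pandas", "cudf", "dask_cudf"},
}

def can_coerce(source, target):
    """Return True if a `source` dataframe may be converted to `target`."""
    return source in SUPPORTED_COERCIONS.get(target, set())

print(can_coerce("pandas", "dask_cudf"))
print(can_coerce("cudf", "pandas"))
```

Under this reading, pandas input is accepted by every engine, while GPU or distributed input is never converted back down to pandas.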
- graphistry.hyper_dask.direct_edgelist_shape(entity_types, defs)#
Edges take the format {src_col: [dest_col1, dest_col2], …}. If None, connect all columns to all others, leaving the direction of each edge up to the algorithm.
- Parameters:
entity_types (List[str])
defs (HyperBindings)
- Return type:
Dict[str, List[str]]
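One plausible reading of the all-to-all default is sketched below in plain Python. This is an assumption about the behavior, not a copy of the implementation: the hypothetical `default_edge_shape` helper pairs each column with every later column, so each unordered pair appears once and the direction follows column order.

```python
def default_edge_shape(entity_types, edges=None):
    """Return the user-supplied edge shape, or pair each column with later ones."""
    if edges is not None:
        return edges
    return {src: entity_types[i + 1:] for i, src in enumerate(entity_types)}

shape = default_edge_shape(["a", "b", "d"])
custom = default_edge_shape(["a", "b", "d"], {"a": ["b", "d"], "d": ["d"]})

print(shape)
print(custom)
```

Passing an explicit shape, as in the ‘EDGES’ option, bypasses the default entirely and also allows self-edges such as d->d.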
- graphistry.hyper_dask.format_direct_edges(engine, events, entity_types, defs, edge_shape, drop_na, drop_edge_attrs, debug=False)#
- Parameters:
engine (Engine)
events (Any)
defs (HyperBindings)
drop_na (bool)
drop_edge_attrs (bool)
debug (bool)
- Return type:
Any
- graphistry.hyper_dask.format_entities(events, entity_types, defs, direct, drop_na, engine, npartitions, chunksize, debug=False)#
- Parameters:
events (Any)
entity_types (List[str])
defs (HyperBindings)
direct (bool)
drop_na (bool)
engine (Engine)
npartitions (int | None)
chunksize (int | None)
debug (bool)
- Return type:
Any
- graphistry.hyper_dask.format_entities_from_col(defs, cat_lookup, drop_na, engine, col_name, df_with_col, meta, debug)#
- For unique v in column col, create [{col: str(v), title: str(v), nodetype: col, nodeid: <cat><delim><v>}]
respect drop_na
respect colname overrides
receive+return pd.DataFrame / cudf.DataFrame depending on engine
- Parameters:
defs (HyperBindings)
cat_lookup (Dict[str, str])
drop_na (bool)
engine (Engine)
col_name (str)
df_with_col (Any)
meta (DataFrame)
debug (bool)
- Return type:
Any
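The record layout described for this function can be sketched on a plain list of values. This is a simplified illustration rather than the dataframe-based implementation: the `entities_from_column` helper is hypothetical, the field names reuse the `HyperBindings` defaults (`nodeTitle`, `type`, `nodeID`, `::`), and substituting the `NULLVAL` default `'null'` for nulls kept when `drop_na=False` is an assumption.

```python
def entities_from_column(col, values, namespace=None, delim="::",
                         drop_na=True, null_val="null"):
    """Build one entity record per unique value in a column."""
    namespace = col if namespace is None else namespace
    rows, seen = [], set()
    for v in values:
        if v is None:
            if drop_na:
                continue      # skip nulls entirely
            v = null_val      # assumed placeholder for kept nulls
        if v in seen:
            continue          # keep only the first occurrence
        seen.add(v)
        rows.append({
            col: str(v),
            "nodeTitle": str(v),
            "type": col,
            "nodeID": f"{namespace}{delim}{v}",
        })
    return rows

rows = entities_from_column("boss", ["x", "x", None, "y"])
rows_with_nulls = entities_from_column("boss", ["x", None], drop_na=False)

print(rows)
print(rows_with_nulls)
```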
- graphistry.hyper_dask.format_hyperedges(engine, events, entity_types, defs, drop_na, drop_edge_attrs, debug=False)#
- Parameters:
engine (Engine)
events (Any)
entity_types (List[str])
defs (HyperBindings)
drop_na (bool)
drop_edge_attrs (bool)
debug (bool)
- Return type:
Any
- graphistry.hyper_dask.format_hypernodes(events, defs, drop_na)#
- graphistry.hyper_dask.get_df_cons(engine)#
- Parameters:
engine (Engine)
- graphistry.hyper_dask.get_series_cons(engine, dtype='int32')#
- Parameters:
engine (Engine)
- graphistry.hyper_dask.hyperbinding(g, defs, entities, event_entities, edges, source, destination)#
- graphistry.hyper_dask.hypergraph(g, raw_events, entity_types=None, opts={}, drop_na=True, drop_edge_attrs=False, verbose=True, direct=False, engine='pandas', npartitions=None, chunksize=None, debug=False)#
- Internal details:
IDs currently strings: ${namespace(col)}${delim}${str(val)}
debug: sprinkle persist() to catch bugs earlier
- Parameters:
raw_events (Any)
entity_types (List[str] | None)
opts (dict)
drop_na (bool)
drop_edge_attrs (bool)
verbose (bool)
direct (bool)
engine (str)
npartitions (int | None)
chunksize (int | None)
debug (bool)
- graphistry.hyper_dask.make_reverse_lookup(categories)#
- graphistry.hyper_dask.mt_df(engine)#
- Parameters:
engine (Engine)
- graphistry.hyper_dask.mt_nodes(defs, events, entity_types, direct, engine)#
- Parameters:
defs (HyperBindings)
events (Any)
entity_types (List[str])
direct (bool)
engine (Engine)
- Return type:
DataFrame
- graphistry.hyper_dask.mt_series(engine, dtype='int32')#
- Parameters:
engine (Engine)
- graphistry.hyper_dask.screen_entities(events, entity_types, defs)#
List entity columns: Unskipped user-specified entities when provided, else unskipped cols
- Parameters:
events (Any)
entity_types (List[str] | None)
defs (HyperBindings)
- Return type:
List[str]
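The column-screening rule stated above is small enough to sketch directly. The `screen_columns` helper below is hypothetical and works on plain lists of column names rather than on a dataframe and a `HyperBindings` object; it only illustrates the rule that skipped columns are removed from the user-specified entity list when one is given, and from all columns otherwise.

```python
def screen_columns(columns, entity_types=None, skip=None):
    """Pick entity columns: requested ones if given, else all, minus skipped."""
    skipped = set(skip or [])
    candidates = columns if entity_types is None else entity_types
    return [c for c in candidates if c not in skipped]

all_cols = ["user", "boss", "timestamp"]

default_pick = screen_columns(all_cols, skip=["timestamp"])
explicit_pick = screen_columns(all_cols, entity_types=["boss", "timestamp"], skip=["timestamp"])

print(default_pick)
print(explicit_pick)
```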
- graphistry.hyper_dask.series_cons(engine, arr, dtype='int32', npartitions=None, chunksize=None)#
- Parameters:
engine (Engine)
arr (List)
- graphistry.hyper_dask.shallow_copy(df, engine, debug=False)#
- Parameters:
df (Any)
engine (Engine)
debug (bool)
- Return type:
Any