graphistry.layout package

Subpackages

Submodules

graphistry.compute.ComputeMixin module

class graphistry.compute.ComputeMixin.ComputeMixin(*args, **kwargs)

Bases: object

chain(*args, **kwargs)

Experimental: Chain a list of operations

Return subgraph of matches according to the list of node & edge matchers

If any matchers are named, add a correspondingly named boolean-valued column to the output

Parameters

ops (List[ASTobject]) – Various node and edge matchers

Returns

Plotter

Return type

Plotter

Example: Find nodes of some type
from graphistry.ast import n

people_nodes_df = g.chain([ n({"type": "person"}) ])._nodes

Example: Find 2-hop edge sequences with some attribute

from graphistry.ast import e_forward

g_2_hops = g.chain([ e_forward({"interesting": True}, hops=2) ])
g_2_hops.plot()

Example: Find any node 1-2 hops out from another node, and label each hop

from graphistry.ast import n, e_undirected

g_2_hops = g.chain([ n({g._node: "a"}), e_undirected(name="hop1"), e_undirected(name="hop2") ])
print('# first-hop edges:', len(g_2_hops._edges[ g_2_hops._edges.hop1 == True ]))

Example: Transaction nodes between two kinds of risky nodes

from graphistry.ast import n, e_forward, e_reverse

g_risky = g.chain([
    n({"risk1": True}),
    e_forward(to_fixed=True),
    n({"type": "transaction"}, name="hit"),
    e_reverse(to_fixed=True),
    n({"risk2": True})
])
print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ]))

collapse(node, attribute, column, self_edges=False, unwrap=False, verbose=False)

Topology-aware collapse by given column attribute starting at node

Traverses directed graph from start node node and collapses clusters of nodes that share the same property so that topology is preserved.

Parameters
  • node (Union[str, int]) – start node to begin traversal

  • attribute (Union[str, int]) – the given attribute to collapse over within column

  • column (Union[str, int]) – the column of nodes DataFrame that contains attribute to collapse over

Returns

A new Graphistry instance whose nodes and edges DataFrames contain the collapsed nodes and edges given by column attribute. The nodes and edges DataFrames contain six new columns, collapse_{node | edges} and final_{node | edges}, while the original (node, src, dst) columns are left untouched.

Return type

Plottable

Parameters
  • self_edges (bool) –

  • unwrap (bool) –

  • verbose (bool) –

drop_nodes(nodes)

Return g with the nodes in the given series of node IDs removed, along with any edges that involve them
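The removal rule can be illustrated with plain pandas. This is a minimal sketch of the behaviour described above, not the library's implementation: the helper name drop_nodes_sketch and the column names id, s, and d are assumptions made for the example.

```python
import pandas as pd

def drop_nodes_sketch(nodes_df, edges_df, node_ids, node_col="id", src="s", dst="d"):
    """Hypothetical helper: drop the given node ids and every edge touching them."""
    drop = set(node_ids)
    # Keep only nodes whose id is not in the drop set
    kept_nodes = nodes_df[~nodes_df[node_col].isin(drop)]
    # Keep only edges where neither endpoint is in the drop set
    kept_edges = edges_df[~edges_df[src].isin(drop) & ~edges_df[dst].isin(drop)]
    return kept_nodes.reset_index(drop=True), kept_edges.reset_index(drop=True)

nodes = pd.DataFrame({"id": ["a", "b", "c"]})
edges = pd.DataFrame({"s": ["a", "b"], "d": ["b", "c"]})
nodes2, edges2 = drop_nodes_sketch(nodes, edges, pd.Series(["a"]))
```

Dropping node a also removes the edge a→b, because an edge is kept only when both of its endpoints survive.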

filter_edges_by_dict(*args, **kwargs)

Filter edges to those that match all values in filter_dict

filter_nodes_by_dict(*args, **kwargs)

Filter nodes to those that match all values in filter_dict
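Both dictionary filters keep a row only when every key/value pair matches. A minimal pandas sketch of that rule follows. The helper filter_by_dict_sketch is hypothetical and assumes exact equality on scalar values; it is not the library's code.

```python
import pandas as pd

def filter_by_dict_sketch(df, filter_dict):
    """Hypothetical helper: keep rows where every column in filter_dict equals its value."""
    mask = pd.Series(True, index=df.index)
    for col, val in filter_dict.items():
        # AND the conditions together: a row must match all entries
        mask &= df[col] == val
    return df[mask]

nodes = pd.DataFrame({
    "id": ["a", "b", "c"],
    "type": ["person", "person", "org"],
    "vip": [True, False, True],
})
vip_people = filter_by_dict_sketch(nodes, {"type": "person", "vip": True})
```

The same helper applies unchanged to an edges DataFrame, since only the column names in the dictionary differ.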

get_degrees(col='degree', degree_in='degree_in', degree_out='degree_out')

Decorate nodes table with degree info

Edges must be dataframe-like: pandas, cudf, …

Parameters determine generated column names

Warning: Self-cycles are currently double-counted. This may change.

Example: Generate degree columns

edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
g = graphistry.edges(edges, 's', 'd')
print(g._nodes)  # None
g2 = g.get_degrees()
print(g2._nodes)  # pd.DataFrame with 'id', 'degree', 'degree_in', 'degree_out'

Parameters
  • col (str) –

  • degree_in (str) –

  • degree_out (str) –
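To make the column semantics and the self-cycle warning concrete, here is a pandas-only sketch of how such degree columns could be derived from an edge list. The helper degrees_sketch and its defaults are assumptions for illustration and do not reproduce the library's internals.

```python
import pandas as pd

def degrees_sketch(edges_df, src="s", dst="d"):
    """Hypothetical helper: derive in/out/total degree per node from an edge list."""
    out_deg = edges_df[src].value_counts()  # times each id appears as a source
    in_deg = edges_df[dst].value_counts()   # times each id appears as a destination
    ids = pd.Index(pd.concat([edges_df[src], edges_df[dst]]).unique(), name="id")
    nodes = pd.DataFrame(index=ids)
    nodes["degree_in"] = in_deg.reindex(ids, fill_value=0)
    nodes["degree_out"] = out_deg.reindex(ids, fill_value=0)
    # Total degree is the sum, so a self-loop contributes once as in and once as out
    nodes["degree"] = nodes["degree_in"] + nodes["degree_out"]
    return nodes.reset_index()

edges = pd.DataFrame({"s": ["a", "b", "c", "d"], "d": ["c", "c", "e", "e"]})
degrees = degrees_sketch(edges)

# A single self-loop x -> x counts twice toward 'degree' under this definition
loop_degrees = degrees_sketch(pd.DataFrame({"s": ["x"], "d": ["x"]}))
```

Under this definition a self-loop adds one to both degree_in and degree_out, which is one way the double-counting described in the warning can arise.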

get_indegrees(col='degree_in')

See get_degrees

Parameters

col (str) –

get_outdegrees(col='degree_out')

See get_degrees

Parameters

col (str) –

get_topological_levels(level_col='level', allow_cycles=True, warn_cycles=True, remove_self_loops=True)

Label nodes on column level_col based on topological sort depth.

Supports pandas + cudf, using parallelism within each level computation.

Options:

  • allow_cycles: if False and a cycle is detected, raise a ValueError; otherwise break the cycle by picking a lowest-in-degree node

  • warn_cycles: if True and a cycle is detected, proceed with a warning

  • remove_self_loops: preprocess by removing self-cycles. Avoids allow_cycles=False, warn_cycles=True messages.

Example:

edges_df = pd.DataFrame({'s': ['a', 'b', 'c', 'd'], 'd': ['b', 'c', 'e', 'e']})
g = graphistry.edges(edges_df, 's', 'd')
g2 = g.get_topological_levels()
g2._nodes.info()  # pd.DataFrame with 'id', 'level'

Parameters
  • level_col (str) –

  • allow_cycles (bool) –

  • warn_cycles (bool) –

  • remove_self_loops (bool) –

Return type

Plottable
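The idea of a topological level can be sketched by repeatedly peeling off nodes that have no remaining incoming edges. The helper below is a simplified illustration under stated assumptions: it handles acyclic input only, raises on cycles instead of breaking them, and drops self-loops up front. The name topological_levels_sketch is hypothetical, and the library's actual level assignment may differ.

```python
import pandas as pd

def topological_levels_sketch(edges_df, src="s", dst="d"):
    """Hypothetical helper: assign each node the round in which it loses all incoming edges."""
    edges = edges_df[edges_df[src] != edges_df[dst]]  # drop self-loops first
    remaining = set(edges[src]) | set(edges[dst])
    levels = {}
    level = 0
    while remaining:
        has_incoming = set(edges[dst])
        roots = remaining - has_incoming  # nodes with no incoming edge left
        if not roots:
            raise ValueError("cycle detected; this sketch does not break cycles")
        for node in roots:
            levels[node] = level
        remaining -= roots
        # Remove edges leaving this level so the next layer becomes visible
        edges = edges[~edges[src].isin(roots)]
        level += 1
    return pd.DataFrame({"id": list(levels), "level": list(levels.values())})

edges_df = pd.DataFrame({"s": ["a", "b", "c", "d"], "d": ["b", "c", "e", "e"]})
levels_df = topological_levels_sketch(edges_df)
```

All nodes within one round are independent of each other, which is why a per-level computation lends itself to vectorized or parallel execution.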

hop(*args, **kwargs)

Given a graph and some source nodes, return subgraph of all paths within k-hops from the sources

Parameters
  • g – Plotter

  • nodes – dataframe with id column matching g._node. None signifies all nodes (default).

  • hops – how many hops to consider, if any bound (default 1)

  • to_fixed_point – keep hopping until no new nodes are found (ignores hops)

  • direction – 'forward', 'reverse', 'undirected'

  • edge_match – dict of kv-pairs to exact match (see also: filter_edges_by_dict)

  • source_node_match – dict of kv-pairs to match nodes before hopping

  • destination_node_match – dict of kv-pairs to match nodes after hopping (including intermediate)

  • return_as_wave_front – only return the nodes/edges reached, ignoring past ones (primarily for internal use)
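The interplay of hops, to_fixed_point, and direction can be sketched as a frontier expansion over an edge list. This is an illustrative pandas-only approximation, not the library's algorithm: hop_sketch is a hypothetical name, seeds are passed as plain ids rather than a dataframe, and the match filters and wave-front option are left out.

```python
import pandas as pd

def hop_sketch(edges_df, seeds, hops=1, to_fixed_point=False,
               direction="forward", src="s", dst="d"):
    """Hypothetical helper: return the edges traversed within k hops of the seed ids."""
    use_fwd = direction in ("forward", "undirected")
    use_rev = direction in ("reverse", "undirected")
    frontier = set(seeds)
    visited = set(seeds)
    kept = pd.Series(False, index=edges_df.index)
    step = 0
    while frontier and (to_fixed_point or step < hops):
        # Edges leaving the frontier (forward) or entering it (reverse)
        fwd = edges_df[src].isin(frontier) & use_fwd
        rev = edges_df[dst].isin(frontier) & use_rev
        kept |= fwd | rev
        reached = set(edges_df.loc[fwd, dst]) | set(edges_df.loc[rev, src])
        # Only newly discovered nodes form the next frontier
        frontier = reached - visited
        visited |= frontier
        step += 1
    return edges_df[kept]

edges = pd.DataFrame({"s": ["a", "b", "c"], "d": ["b", "c", "d"]})
two_hops = hop_sketch(edges, ["a"], hops=2)
everything = hop_sketch(edges, ["a"], to_fixed_point=True)
one_back = hop_sketch(edges, ["c"], hops=1, direction="reverse")
```

With to_fixed_point=True the loop ignores hops and stops only once a round discovers no new nodes, mirroring the parameter description above.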

keep_nodes(nodes)

Limit nodes and edges to those selected by parameter nodes.

For edges, both source and destination must be in nodes.

Nodes can be a list or series of node IDs, or a dictionary. When a dictionary, each key corresponds to a node column, and nodes will be included when all match.
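A pandas-only sketch of the selection rule may help. It assumes dictionary values are scalars compared by equality, which the text above does not specify; keep_nodes_sketch and the column names are hypothetical.

```python
import pandas as pd

def keep_nodes_sketch(nodes_df, edges_df, nodes, node_col="id", src="s", dst="d"):
    """Hypothetical helper: keep selected nodes and the edges running between them."""
    if isinstance(nodes, dict):
        # Dictionary form: a node is selected when every column matches (assumed equality)
        mask = pd.Series(True, index=nodes_df.index)
        for col, val in nodes.items():
            mask &= nodes_df[col] == val
        keep_ids = set(nodes_df.loc[mask, node_col])
    else:
        # List or series form: the values are the node ids themselves
        keep_ids = set(nodes)
    kept_nodes = nodes_df[nodes_df[node_col].isin(keep_ids)]
    # An edge survives only if both endpoints are kept
    kept_edges = edges_df[edges_df[src].isin(keep_ids) & edges_df[dst].isin(keep_ids)]
    return kept_nodes.reset_index(drop=True), kept_edges.reset_index(drop=True)

nodes_df = pd.DataFrame({"id": ["a", "b", "c"], "type": ["person", "person", "org"]})
edges_df = pd.DataFrame({"s": ["a", "b"], "d": ["b", "c"]})
by_ids = keep_nodes_sketch(nodes_df, edges_df, ["a", "b"])
by_dict = keep_nodes_sketch(nodes_df, edges_df, {"type": "person"})
```

In this toy graph both calls select nodes a and b, so only the edge a→b remains; the edge b→c is dropped because c is not selected.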

materialize_nodes(reuse=True, engine='auto')

Generate g._nodes based on g._edges

Uses g._node for node id if exists, else 'id'

Edges must be dataframe-like: cudf, pandas, …

When reuse=True and g._nodes is not None, use it

Example: Generate nodes

edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
g = graphistry.edges(edges, 's', 'd')
print(g._nodes)  # None
g2 = g.materialize_nodes()
print(g2._nodes)  # pd.DataFrame

Parameters
  • reuse (bool) –

  • engine (Union[Engine, Literal['auto']]) –

Return type

Plottable
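Conceptually, materializing nodes means collecting every distinct id that appears as an edge endpoint. The pandas sketch below illustrates that idea only: materialize_nodes_sketch is a hypothetical helper, and it ignores the reuse and engine options.

```python
import pandas as pd

def materialize_nodes_sketch(edges_df, src="s", dst="d", node_col="id"):
    """Hypothetical helper: build a nodes table from the distinct edge endpoints."""
    # Stack sources and destinations, then keep the first occurrence of each id
    ids = pd.concat([edges_df[src], edges_df[dst]], ignore_index=True).drop_duplicates()
    return pd.DataFrame({node_col: ids}).reset_index(drop=True)

edges = pd.DataFrame({"s": ["a", "b", "c", "d"], "d": ["c", "c", "e", "e"]})
nodes = materialize_nodes_sketch(edges)
```

Node e never appears as a source, yet it is still included because destinations are scanned as well.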

Module contents