graphistry.layout package
Subpackages
Submodules
graphistry.compute.ComputeMixin module
- class graphistry.compute.ComputeMixin.ComputeMixin(*args, **kwargs)
Bases: object
- chain(*args, **kwargs)
Experimental: Chain a list of operations.
Return the subgraph of matches according to the list of node and edge matchers.
If any matchers are named, add a correspondingly named boolean-valued column to the output.
- Parameters
ops (List[ASTobject]) – Various node and edge matchers
- Returns
Plotter
- Return type
- Example: Find nodes of some type
    from graphistry.ast import n

    people_nodes_df = g.chain([ n({"type": "person"}) ])._nodes
- Example: Find 2-hop edge sequences with some attribute
    from graphistry.ast import e_forward

    g_2_hops = g.chain([ e_forward({"interesting": True}, hops=2) ])
    g_2_hops.plot()
- Example: Find any node 1-2 hops out from another node, and label each hop
    from graphistry.ast import n, e_undirected

    g_2_hops = g.chain([
        n({g._node: "a"}),
        e_undirected(name="hop1"),
        e_undirected(name="hop2")
    ])
    print('# first-hop edges:', len(g_2_hops._edges[ g_2_hops._edges.hop1 == True ]))
- Example: Transaction nodes between two kinds of risky nodes
    from graphistry.ast import n, e_forward, e_reverse

    g_risky = g.chain([
        n({"risk1": True}),
        e_forward(to_fixed=True),
        n({"type": "transaction"}, name="hit"),
        e_reverse(to_fixed=True),
        n({"risk2": True})
    ])
    print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ]))
- drop_nodes(nodes)
Return g with the nodes in the given node id series removed, along with any edges that involve them.
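The removal rule described for drop_nodes can be sketched in plain Python. This is a minimal illustration of the semantics, not the library's implementation: `drop_nodes_sketch` is a hypothetical helper, and nodes and edges are modeled as plain lists rather than dataframes.

```python
def drop_nodes_sketch(node_ids, edges, drop_ids):
    """Return (nodes, edges) with drop_ids and every edge touching them removed.

    Hypothetical helper for illustration only; not part of graphistry.
    """
    drop = set(drop_ids)
    kept_nodes = [n for n in node_ids if n not in drop]
    # An edge survives only if neither endpoint was dropped
    kept_edges = [(s, d) for s, d in edges if s not in drop and d not in drop]
    return kept_nodes, kept_edges


node_ids = ["a", "b", "c", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "e")]

kept_nodes, kept_edges = drop_nodes_sketch(node_ids, edges, ["c"])
print(kept_nodes)  # ['a', 'b', 'e']
print(kept_edges)  # [('a', 'b')]
```

Dropping "c" also removes the three edges that touch it, so only the a-to-b edge remains.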
- filter_edges_by_dict(*args, **kwargs)
Filter edges to those that match all values in filter_dict.
- filter_nodes_by_dict(*args, **kwargs)
Filter nodes to those that match all values in filter_dict.
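As a rough illustration of "match all values in filter_dict", the sketch below keeps a row only when every key in the filter is present and equal. It assumes exact-equality matching and models rows as plain dicts; `filter_by_dict_sketch` is a hypothetical name, not a graphistry function.

```python
def filter_by_dict_sketch(rows, filter_dict):
    """Keep rows whose values equal every key/value pair in filter_dict.

    Hypothetical helper for illustration only; not part of graphistry.
    """
    return [
        row for row in rows
        if all(key in row and row[key] == value for key, value in filter_dict.items())
    ]


nodes = [
    {"id": "a", "type": "person", "risk": True},
    {"id": "b", "type": "person", "risk": False},
    {"id": "c", "type": "transaction", "risk": True},
]

risky_people = filter_by_dict_sketch(nodes, {"type": "person", "risk": True})
print([row["id"] for row in risky_people])  # ['a']
```

The same function applies to an edge table, since the rule is only about column values. An empty filter matches every row.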
- get_degrees(col='degree', degree_in='degree_in', degree_out='degree_out')
Decorate the nodes table with degree info.
Edges must be dataframe-like: pandas, cudf, …
Parameters determine the generated column names.
Warning: Self-cycles are currently double-counted. This may change.
Example: Generate degree columns
    import pandas as pd
    import graphistry

    edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
    g = graphistry.edges(edges, 's', 'd')
    print(g._nodes)  # None
    g2 = g.get_degrees()
    print(g2._nodes)  # pd.DataFrame with 'id', 'degree', 'degree_in', 'degree_out'
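- Parameters
col (str) – name of the generated degree column
degree_in (str) – name of the generated in-degree column
degree_out (str) – name of the generated out-degree column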
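The degree columns can be approximated with stdlib counters. The sketch assumes that the total degree is the sum of in-degree and out-degree, which is consistent with the double-counting warning above but is not confirmed against the library source. `degrees_sketch` is a hypothetical helper, and edges are plain (source, destination) tuples.

```python
from collections import Counter


def degrees_sketch(edges):
    """Map node id -> degree info, assuming degree = degree_in + degree_out.

    Hypothetical helper for illustration only; not part of graphistry.
    """
    degree_in = Counter(d for _, d in edges)
    degree_out = Counter(s for s, _ in edges)
    node_ids = sorted(set(degree_in) | set(degree_out))
    return {
        n: {
            "degree": degree_in[n] + degree_out[n],
            "degree_in": degree_in[n],
            "degree_out": degree_out[n],
        }
        for n in node_ids
    }


# Same edge list as the example above: s -> d
edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
table = degrees_sketch(edges)
print(table["c"])  # {'degree': 3, 'degree_in': 2, 'degree_out': 1}

# A self-cycle counts once as incoming and once as outgoing
print(degrees_sketch([("x", "x")])["x"]["degree"])  # 2
```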
- get_indegrees(col='degree_in')
See get_degrees
- Parameters
col (str) –
- get_outdegrees(col='degree_out')
See get_degrees
- Parameters
col (str) –
- get_topological_levels(level_col='level', allow_cycles=True, warn_cycles=True, remove_self_loops=True)
Label nodes on column level_col based on topological sort depth.
Supports pandas and cudf, using parallelism within each level computation.
Options:
* allow_cycles: if False and a cycle is detected, raise ValueError; otherwise break the cycle by picking a lowest-in-degree node
* warn_cycles: if True and a cycle is detected, proceed with a warning
* remove_self_loops: preprocess by removing self-cycles, which avoids the allow_cycles=False and warn_cycles=True messages
Example:
    import pandas as pd
    import graphistry

    edges_df = pd.DataFrame({'s': ['a', 'b', 'c', 'd'], 'd': ['b', 'c', 'e', 'e']})
    g = graphistry.edges(edges_df, 's', 'd')
    g2 = g.get_topological_levels()
    g2._nodes.info()  # pd.DataFrame with 'id', 'level'
- Parameters
level_col (str) –
allow_cycles (bool) –
warn_cycles (bool) –
remove_self_loops (bool) –
- Return type
Plottable
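The level assignment and the cycle options can be sketched as a level-by-level variant of Kahn's algorithm. This is an assumption about the general approach, not the library's implementation: `topological_levels_sketch` is a hypothetical helper, ties in cycle breaking are resolved by node id only to keep the output deterministic, and warn_cycles is omitted.

```python
def topological_levels_sketch(edges, allow_cycles=True, remove_self_loops=True):
    """Map node id -> level, peeling off zero-in-degree nodes one level at a time.

    Hypothetical helper for illustration only; not part of graphistry.
    """
    nodes = {n for edge in edges for n in edge}
    if remove_self_loops:
        edges = [(s, d) for s, d in edges if s != d]

    in_degree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for s, d in edges:
        in_degree[d] += 1
        children[s].append(d)

    levels = {}
    remaining = set(nodes)
    level = 0
    while remaining:
        roots = sorted(n for n in remaining if in_degree[n] == 0)
        if not roots:
            # Every remaining node has an incoming edge, so there is a cycle
            if not allow_cycles:
                raise ValueError("cycle detected; set allow_cycles=True to break it")
            # Break the cycle by picking one lowest-in-degree node
            roots = [min(remaining, key=lambda n: (in_degree[n], n))]
        for n in roots:
            levels[n] = level
            remaining.discard(n)
            for child in children[n]:
                in_degree[child] -= 1
        level += 1
    return levels


edges = [("a", "b"), ("b", "c"), ("c", "e"), ("d", "e")]
levels = topological_levels_sketch(edges)
print(sorted(levels.items()))
# [('a', 0), ('b', 1), ('c', 2), ('d', 0), ('e', 3)]
```

With a two-node cycle such as a to b and b to a, the sketch raises ValueError when allow_cycles=False and otherwise assigns levels 0 and 1.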
- hop(*args, **kwargs)
Given a graph and some source nodes, return the subgraph of all paths within k hops of the sources.
g: Plotter
nodes: dataframe with id column matching g._node. None signifies all nodes (default).
hops: how many hops to consider, if any bound (default 1)
to_fixed_point: keep hopping until no new nodes are found (ignores hops)
direction: 'forward', 'reverse', 'undirected'
edge_match: dict of key-value pairs to exact match (see also: filter_edges_by_dict)
source_node_match: dict of key-value pairs to match nodes before hopping
destination_node_match: dict of key-value pairs to match nodes after hopping (including intermediate)
return_as_wave_front: only return the nodes/edges reached, ignoring past ones (primarily for internal use)
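The interaction of hops, to_fixed_point, and direction can be sketched as a wave-front expansion over an edge list. This is a simplified illustration under stated assumptions: `hop_sketch` is a hypothetical helper, the match filters and return_as_wave_front are omitted, and the source nodes are assumed to be part of the returned subgraph.

```python
def hop_sketch(edges, sources, hops=1, to_fixed_point=False, direction="forward"):
    """Return (nodes, edges) reachable from sources within the hop bound.

    Hypothetical helper for illustration only; not part of graphistry.
    """
    visited = set(sources)
    wave_front = set(sources)
    matched_edges = set()
    steps = 0
    # to_fixed_point ignores the hop bound and stops when nothing new is reached
    while wave_front and (to_fixed_point or steps < hops):
        next_front = set()
        for s, d in edges:
            if direction in ("forward", "undirected") and s in wave_front:
                matched_edges.add((s, d))
                next_front.add(d)
            if direction in ("reverse", "undirected") and d in wave_front:
                matched_edges.add((s, d))
                next_front.add(s)
        wave_front = next_front - visited
        visited |= next_front
        steps += 1
    return sorted(visited), sorted(matched_edges)


edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]

print(hop_sketch(edges, ["a"], hops=1))
# (['a', 'c'], [('a', 'c')])
print(hop_sketch(edges, ["a"], to_fixed_point=True))
# (['a', 'c', 'e'], [('a', 'c'), ('c', 'e')])
print(hop_sketch(edges, ["e"], hops=1, direction="reverse"))
# (['c', 'd', 'e'], [('c', 'e'), ('d', 'e')])
```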
- materialize_nodes(reuse=True)
Generate g._nodes based on g._edges.
Uses g._node for the node id column if it exists, else 'id'.
Edges must be dataframe-like: cudf, pandas, …
When reuse=True and g._nodes is not None, use the existing g._nodes.
Example: Generate nodes
    import pandas as pd
    import graphistry

    edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
    g = graphistry.edges(edges, 's', 'd')
    print(g._nodes)  # None
    g2 = g.materialize_nodes()
    print(g2._nodes)  # pd.DataFrame
- Parameters
reuse (bool) –
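The node-generation rule and the reuse flag can be sketched without dataframes. The helper name `materialize_nodes_sketch` and the ordering (sources first, then destinations, deduplicated by first appearance) are assumptions made for illustration; the library's row order is not stated in this page.

```python
def materialize_nodes_sketch(edges, nodes=None, reuse=True, node_col="id"):
    """Build a node table from edge endpoints unless an existing one is reused.

    Hypothetical helper for illustration only; not part of graphistry.
    """
    if reuse and nodes is not None:
        return nodes
    sources = [s for s, _ in edges]
    destinations = [d for _, d in edges]
    # dict.fromkeys deduplicates while keeping first-appearance order
    unique_ids = list(dict.fromkeys(sources + destinations))
    return [{node_col: node_id} for node_id in unique_ids]


edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]

generated = materialize_nodes_sketch(edges)
print([row["id"] for row in generated])  # ['a', 'b', 'c', 'd', 'e']

existing = [{"id": "a", "label": "kept as-is"}]
print(materialize_nodes_sketch(edges, nodes=existing) is existing)  # True
```

Passing reuse=False with an existing table regenerates the nodes from the edges instead of returning the table unchanged.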
-