graphistry.compute package

gfql(*args, **kwargs)#

Execute a GFQL query - either a chain or a DAG

Unified entrypoint that automatically detects query type and dispatches to the appropriate execution engine.

Parameters:

query – GFQL query - ASTObject, List[ASTObject], Chain, ASTLet, dict, or supported query string
engine – Execution engine (auto, pandas, cudf)
output – For DAGs, name of binding to return (default: last executed)
policy – Optional policy hooks for external control (preload, postload, precall, postcall phases)
where – Optional same-path constraints for list/Chain queries
language – Optional string-query language selector. Defaults to "cypher" when query is a string.
params – Optional parameter dictionary for string-query compilation
validate – When True, run local preflight validation before execution via g.gfql_validate(...).
shortest_path_backend – Backend for shortestPath execution: "auto" (default), "igraph" (require igraph, raise if missing), "cugraph" (require cugraph, raise if missing), or "bfs" (always use DataFrame BFS). "auto" tries cugraph on CUDF engine, igraph on pandas, falls back to BFS silently.

Returns:

Resulting Plottable

Return type:

Plottable

GFQL physical indexes

param index_policy:: ’off’ (never use indexes), ‘use’ (use resident, cost-gated; default planner behavior), ‘auto’ (build on demand), or ‘force’ (always probe the index). Also accepts index DDL strings (CREATE GFQL INDEX ...) / wire ops as the query — routed to the index registry. See create_index() and Seeded Traversal Indexes (CSR Adjacency).

gfql_explain(query, *, index_policy='use', engine='auto')#

Explain how the GFQL planner would run query: per-hop index-vs-scan choice, cost-gate numbers, and resident-index validity. Read-only (no execution). Returns a report object; print it for a human-readable plan.

Parameters:

query (GFQLQuery)
index_policy (IndexPolicy)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])

Return type:

GfqlExplainReport

gfql_index_all(engine='auto')#: Convenience: build all GFQL physical indexes (both edge adjacencies + node_id). Returns a new Plottable.

gfql_index_edges(direction='both', engine='auto')#: Convenience: build the edge adjacency index(es) — ‘forward’, ‘reverse’, or ‘both’. Returns a new Plottable.

gfql_remote(chain, api_token=None, dataset_id=None, output_type='all', format=None, df_export_args=None, node_col_subset=None, edge_col_subset=None, engine='auto', validate=True, persist=False, params=None, output=None)#

Run GFQL query remotely.

This is the remote execution version of gfql(). It supports chains, Let/DAG patterns, and Cypher strings.

The query is compiled locally and sent to the server as wire-protocol JSON. A gfql_query field carries the full typed envelope (including WHERE clauses); gfql_operations carries a flat array for backward compatibility with older servers.

Parameters:

chain (Chain | List[ASTObject] | ASTLet | Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]] | str) – GFQL query — Chain, List[ASTObject], ASTLet, Dict, or Cypher string (compiled locally before sending).
params (Dict[str, Any] | None) – Optional parameter dict for Cypher string queries (e.g., params={"val": 10} for $val references).
api_token (str | None)
dataset_id (str | None)
output_type (Literal['all', 'nodes', 'edges', 'shape'])
format (Literal['json', 'csv', 'parquet'] | None)
df_export_args (Dict[str, Any] | None)
node_col_subset (List[str] | None)
edge_col_subset (List[str] | None)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])
validate (bool)
persist (bool)
output (str | None)

Return type:

Example:

# Chain (existing)
g.gfql_remote([n(), e(), n()])

# Cypher string with params
g.gfql_remote(
    "MATCH (n) WHERE n.score > $cutoff RETURN n",
    params={"cutoff": 10},
)

# GRAPH constructor
g.gfql_remote("GRAPH { MATCH (a)-[r]->(b) WHERE a.score > 5 }")

See chain_remote() for additional parameter documentation.

gfql_remote_shape(chain, api_token=None, dataset_id=None, format=None, df_export_args=None, node_col_subset=None, edge_col_subset=None, engine='auto', validate=True, persist=False)#

Get shape metadata for remote GFQL query execution.

This is the remote shape version of gfql(). Returns metadata about the resulting graph without downloading the full data.

See chain_remote_shape() for detailed documentation (chain_remote_shape is deprecated).

Parameters:

chain (Chain | List[ASTObject] | ASTLet | Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]] | str)
api_token (str | None)
dataset_id (str | None)
format (Literal['json', 'csv', 'parquet'] | None)
df_export_args (Dict[str, Any] | None)
node_col_subset (List[str] | None)
edge_col_subset (List[str] | None)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])
validate (bool)
persist (bool)

Return type:

DataFrame

gfql_validate(*args, **kwargs)#

Validate a GFQL/Cypher query without executing it.

Raises structured GFQL exceptions on validation failures and never dispatches query execution operators.

hop(*args, **kwargs)#

Given a graph and some source nodes, return subgraph of all paths within k-hops from the sources

This can be faster than the equivalent chain([…]) call that wraps it with additional steps

See chain() examples for examples of many of the parameters

g: Plotter nodes: dataframe with id column matching g._node. None signifies all nodes (default). hops: consider paths of length 1 to ‘hops’ steps, if any (default 1). Shorthand for max_hops. min_hops/max_hops: inclusive traversal bounds; defaults preserve legacy behavior (min=1 unless max=0; max defaults to hops). output_min_hops/output_max_hops: optional output slice applied after traversal; defaults keep all traversed hops up to max_hops. Useful for showing a subrange (e.g., min/max = 2..4 but display only hops 3..4). label_node_hops/label_edge_hops: optional column names for hop numbers (omit or None to skip). Nodes record the first retained hop step they are reached (1 = first expansion); when min_hops prunes shorter branches, labels reflect the shortest retained path. Edges record the hop step that traversed them. label_seeds: when True and labeling, also write hop 0 for seed nodes in the node label column. to_fixed_point: keep hopping until no new nodes are found (ignores hops) direction: ‘forward’, ‘reverse’, ‘undirected’ edge_match: dict of kv-pairs to exact match (see also: filter_edges_by_dict) source_node_match: dict of kv-pairs to match nodes before hopping (including intermediate) destination_node_match: dict of kv-pairs to match nodes after hopping (including intermediate) source_node_query: dataframe query to match nodes before hopping (including intermediate) destination_node_query: dataframe query to match nodes after hopping (including intermediate) edge_query: dataframe query to match edges before hopping (including intermediate) return_as_wave_front: Exclude starting node(s) in return, returning only encountered nodes include_zero_hop_seed: internal Cypher opt-in for exact zero-hop path semantics Note: chain() reverse passes set return_as_wave_front=True and use target_wave_front to constrain reachability. target_wave_front: Only consider these nodes + self._nodes for reachability engine: ‘auto’, ‘pandas’, ‘cudf’ (GPU)

keep_nodes(nodes)#: Limit nodes and edges to those selected by parameter nodes For edges, both source and destination must be in nodes Nodes can be a list or series of node IDs, or a dictionary When a dictionary, each key corresponds to a node column, and nodes will be included when all match

materialize_nodes(reuse=True, engine=EngineAbstract.AUTO)#

Generate g._nodes based on g._edges

Uses g._node for node id if exists, else ‘id’

Edges must be dataframe-like: cudf, pandas, …

When reuse=True and g._nodes is not None, use it

Example: Generate nodes

edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
g = graphistry.edges(edges, 's', 'd')
print(g._nodes)  # None
g2 = g.materialize_nodes()
print(g2._nodes)  # pd.DataFrame

Parameters:

reuse (bool)
engine (EngineAbstract | str)

Return type:

prune_self_edges()#

python_remote_g(*args, **kwargs)#

Remotely run Python code on a remote dataset that returns a Plottable

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
format (Optional[FormatType]) – What format to fetch results. Defaults to ‘parquet’.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to ‘all’. Options include ‘nodes’, ‘edges’, ‘all’ (both). For other variants, see python_remote_shape and python_remote_json.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.

Return type:

Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
g2 = g1.python_remote_g(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return g
    ''',
    engine='cudf')
num_edges = len(g2._edges)
print(f'num_edges: {num_edges}')

python_remote_json(*args, **kwargs)#

Remotely run Python code on a remote dataset that returns json

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.

Return type:

Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
obj = g1.python_remote_json(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return {'num_edges': len(g._edges)}
    ''',
    engine='cudf')
num_edges = obj['num_edges']
print(f'num_edges: {num_edges}')

python_remote_table(*args, **kwargs)#

Remotely run Python code on a remote dataset that returns a table

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
format (Optional[FormatType]) – What format to fetch results. Defaults to ‘parquet’.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to ‘table’. Options include ‘table’, ‘nodes’, and ‘edges’.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.

Return type:

Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
edges_df = g1.python_remote_table(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return g._edges
    ''',
    engine='cudf')
num_edges = len(edges_df)
print(f'num_edges: {num_edges}')

search_edges(term, columns=None, case_sensitive=False, regex=False)#: Keep edges where ANY column matches term — see search_nodes().

search_nodes(term, columns=None, case_sensitive=False, regex=False)#: Keep nodes where ANY column matches term (viz-filter L2 inspector semantics: OR across columns; case-insensitive substring default; regex opt-in; string columns always, integer columns iff the term is a numeric literal — floats/dates via explicit columns= on pandas ONLY: cuDF declines them, its float/temporal stringification diverges from pandas). pandas/cuDF native; polars frames raise NotImplementedError (use the cypher search_any op).

show_indexes()#: Return a pandas DataFrame describing resident GFQL indexes (name, kind, column, valid). Empty if none; valid=False marks a stale index after a frame rebind.

to_cudf()#

Convert to GPU mode by converting any defined nodes and edges to cudf dataframes

When nodes or edges are already cudf dataframes, they are left as is

Parameters:: g (Plottable) – Graphistry object
Returns:: Graphistry object
Return type:: Plottable

to_pandas()#

Convert nodes and edges to pandas DataFrames.

Supports all input types: cuDF, Arrow, Polars, Spark, dask, and pandas (identity).

Return type:: Plottable

graphistry.compute.ast module#

class graphistry.compute.ast.ASTCall(function, params=None)#

Bases: ASTObject

Call a method on the current graph with validated parameters.

Allows safe execution of Plottable methods through GFQL with parameter validation and schema checking.

Attributes:: function: Name of the method to call (must be in safelist) params: Dictionary of parameters to pass to the method

Parameters:

function (str)
params (Dict[str, Any] | None)

execute(g, prev_node_wavefront, target_wave_front, engine)#

Execute the method call on the graph.

Args:: g: Graph to operate on prev_node_wavefront: Previous node wavefront (unused) target_wave_front: Target wavefront (unused) engine: Execution engine (pandas/cudf)
Returns:: New Plottable with method results
Raises:: GFQLTypeError: If method not in safelist or parameters invalid

Parameters:

g (Plottable)
prev_node_wavefront (Any | None)
target_wave_front (Any | None)
engine (Engine)

Return type:

classmethod from_json(d, validate=True)#

Create ASTCall from JSON representation.

Parameters:

d (dict) – JSON dictionary with ‘function’ field and optional ‘params’
validate (bool) – Whether to validate after creation

Returns:

New ASTCall instance

Return type:

ASTCall

Raises:

AssertionError – If ‘function’ field is missing

Example::

call_json = {‘type’: ‘Call’, ‘function’: ‘hop’, ‘params’: {‘steps’: 2}} call = ASTCall.from_json(call_json)

reverse()#

Return type:: ASTCall

to_json(validate=True)#

Convert Call to JSON representation.

Args:: validate: If True, validate before serialization
Returns:: Dictionary with type, function, and params fields

Parameters:: validate (bool)
Return type:: Dict[str, Any]

class graphistry.compute.ast.ASTEdge(direction='forward', edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None, prune_to_endpoints=False, include_zero_hop_seed=False)#

Bases: ASTObject

Internal, not intended for use outside of this module.

Parameters:

direction (Literal['forward', 'reverse', 'undirected'])
edge_match (dict | None)
hops (int | None)
min_hops (int | None)
max_hops (int | None)
output_min_hops (int | None)
output_max_hops (int | None)
label_node_hops (str | None)
label_edge_hops (str | None)
label_seeds (bool)
to_fixed_point (bool)
source_node_match (dict | None)
destination_node_match (dict | None)
source_node_query (str | None)
destination_node_query (str | None)
edge_query (str | None)
name (str | None)
prune_to_endpoints (bool)
include_zero_hop_seed (bool)

direction: Literal['forward', 'reverse', 'undirected']#

execute(g, prev_node_wavefront, target_wave_front, engine)#

Parameters:

g (Plottable)
prev_node_wavefront (Any | None)
target_wave_front (Any | None)
engine (Engine)

Return type:

classmethod from_json(d, validate=True)#

Given c.to_json(), hydrate back c

Args:: d: Dictionary from to_json() validate: If True (default), validate after parsing
Returns:: Hydrated AST object
Raises:: GFQLValidationError: If validate=True and validation fails

Parameters:

d (Dict[str, Any])
validate (bool)

Return type:

is_simple_single_hop()#

Check if edge is single-hop without hop labels (safe to skip backward hop call).

Return type:: bool

reverse()#

Return type:: ASTEdge

to_json(validate=True)#

Returns JSON-compatible dictionry {“type”: “ClassName”, “arg1”: val1, …} Emits all non-reserved instance fields

Return type:: Dict[str, Any]

class graphistry.compute.ast.ASTEdgeForward(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, source_node_match=None, destination_node_match=None, to_fixed_point=False, name=None, source_node_query=None, destination_node_query=None, edge_query=None, prune_to_endpoints=False, include_zero_hop_seed=False)#

Bases: ASTEdge

Internal, not intended for use outside of this module.

Parameters:

edge_match (dict | None)
hops (int | None)
min_hops (int | None)
max_hops (int | None)
output_min_hops (int | None)
output_max_hops (int | None)
label_node_hops (str | None)
label_edge_hops (str | None)
label_seeds (bool)
source_node_match (dict | None)
destination_node_match (dict | None)
to_fixed_point (bool)
name (str | None)
source_node_query (str | None)
destination_node_query (str | None)
edge_query (str | None)
prune_to_endpoints (bool)
include_zero_hop_seed (bool)

direction: Literal['forward', 'reverse', 'undirected']#

classmethod from_json(d, validate=True)#

Given c.to_json(), hydrate back c

Args:: d: Dictionary from to_json() validate: If True (default), validate after parsing
Returns:: Hydrated AST object
Raises:: GFQLValidationError: If validate=True and validation fails

Parameters:

d (Dict[str, Any])
validate (bool)

Return type:

class graphistry.compute.ast.ASTEdgeReverse(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, source_node_match=None, destination_node_match=None, to_fixed_point=False, name=None, source_node_query=None, destination_node_query=None, edge_query=None, prune_to_endpoints=False, include_zero_hop_seed=False)#

Bases: ASTEdge

Internal, not intended for use outside of this module.

Parameters:

edge_match (dict | None)
hops (int | None)
min_hops (int | None)
max_hops (int | None)
output_min_hops (int | None)
output_max_hops (int | None)
label_node_hops (str | None)
label_edge_hops (str | None)
label_seeds (bool)
source_node_match (dict | None)
destination_node_match (dict | None)
to_fixed_point (bool)
name (str | None)
source_node_query (str | None)
destination_node_query (str | None)
edge_query (str | None)
prune_to_endpoints (bool)
include_zero_hop_seed (bool)

direction: Literal['forward', 'reverse', 'undirected']#

classmethod from_json(d, validate=True)#

Given c.to_json(), hydrate back c

Args:: d: Dictionary from to_json() validate: If True (default), validate after parsing
Returns:: Hydrated AST object
Raises:: GFQLValidationError: If validate=True and validation fails

Parameters:

d (Dict[str, Any])
validate (bool)

Return type:

class graphistry.compute.ast.ASTEdgeUndirected(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, source_node_match=None, destination_node_match=None, to_fixed_point=False, name=None, source_node_query=None, destination_node_query=None, edge_query=None, prune_to_endpoints=False, include_zero_hop_seed=False)#

Bases: ASTEdge

Internal, not intended for use outside of this module.

Parameters:

edge_match (dict | None)
hops (int | None)
min_hops (int | None)
max_hops (int | None)
output_min_hops (int | None)
output_max_hops (int | None)
label_node_hops (str | None)
label_edge_hops (str | None)
label_seeds (bool)
source_node_match (dict | None)
destination_node_match (dict | None)
to_fixed_point (bool)
name (str | None)
source_node_query (str | None)
destination_node_query (str | None)
edge_query (str | None)
prune_to_endpoints (bool)
include_zero_hop_seed (bool)

direction: Literal['forward', 'reverse', 'undirected']#

classmethod from_json(d, validate=True)#

Given c.to_json(), hydrate back c

Args:: d: Dictionary from to_json() validate: If True (default), validate after parsing
Returns:: Hydrated AST object
Raises:: GFQLValidationError: If validate=True and validation fails

Parameters:

d (Dict[str, Any])
validate (bool)

Return type:

class graphistry.compute.ast.ASTLet(bindings, validate=True)#

Bases: ASTObject

Let-bindings for named graph operations in a DAG.

Lets you define reusable graph operations that can reference each other, forming a directed acyclic graph (DAG) of computations.

Parameters#

bindingsDict[str, Union[ASTObject, Chain, List[ASTObject], Plottable]]: Mapping from binding names to graph operations (AST objects or Plottables).

Raises#

GFQLTypeError: If bindings is not a dict or contains invalid keys/values.

Example#

>>> dag = ASTLet({
...     "persons": n({"type": "person"}),
...     "friends": ASTRef("persons", [e_forward({"rel": "friend"})])
... })

bindings: Dict[str, ASTObject | Chain | Plottable]#

execute(g, prev_node_wavefront, target_wave_front, engine)#

Parameters:

g (Plottable)
prev_node_wavefront (Any | None)
target_wave_front (Any | None)
engine (Engine)

Return type:

classmethod from_json(d, validate=True)#

Create ASTLet from JSON representation.

Parameters:

d (dict) – JSON dictionary with ‘bindings’ field
validate (bool) – Whether to validate after creation

Returns:

New ASTLet instance

Return type:

ASTLet

Raises:

AssertionError – If ‘bindings’ field is missing

reverse()#

Return type:: ASTLet

to_json(validate=True)#

Convert Let to JSON representation.

Parameters:: validate (bool) – Whether to validate before serialization
Returns:: JSON-serializable dictionary
Return type:: dict

Parameters:

bindings (Dict[str, ASTObject | Chain | List[ASTObject] | Plottable | Dict[str, Any]])
validate (bool)

class graphistry.compute.ast.ASTNode(filter_dict=None, name=None, query=None)#

Bases: ASTObject

Internal, not intended for use outside of this module.

Parameters:

filter_dict (dict | None)
name (str | None)
query (str | None)

execute(g, prev_node_wavefront, target_wave_front, engine)#

Parameters:

g (Plottable)
prev_node_wavefront (Any | None)
target_wave_front (Any | None)
engine (Engine)

Return type:

classmethod from_json(d, validate=True)#

Given c.to_json(), hydrate back c

Args:: d: Dictionary from to_json() validate: If True (default), validate after parsing
Returns:: Hydrated AST object
Raises:: GFQLValidationError: If validate=True and validation fails

Parameters:

d (Dict[str, Any])
validate (bool)

Return type:

ASTNode

reverse()#

Return type:: ASTNode

to_json(validate=True)#

Returns JSON-compatible dictionry {“type”: “ClassName”, “arg1”: val1, …} Emits all non-reserved instance fields

Return type:: Dict[str, Any]

class graphistry.compute.ast.ASTObject(name=None)#

Bases: ASTSerializable

Internal, not intended for use outside of this module. These are operator-level expressions used as g.chain(List<ASTObject>)

Parameters:: name (str | None)

abstract execute(g, prev_node_wavefront, target_wave_front, engine)#

Parameters:

g (Plottable)
prev_node_wavefront (Any | None)
target_wave_front (Any | None)
engine (Engine)

Return type:

abstract reverse()#

Return type:: ASTObject

class graphistry.compute.ast.ASTRef(ref, chain)#

Bases: ASTObject

Execute a chain of operations starting from a DAG binding reference.

Allows building graph operations that start from a named binding defined in an ASTLet (DAG) and apply additional operations.

Parameters:

ref (str) – Name of the binding to reference from the DAG
chain (List[ASTObject]) – List of operations to apply to the referenced graph

Raises:

GFQLTypeError – If ref is not a string or chain is not a list

Example::

# Reference ‘persons’ binding and find their friends friends = ASTRef(‘persons’, [e_forward({‘rel’: ‘friend’})])

execute(g, prev_node_wavefront, target_wave_front, engine)#

Parameters:

g (Plottable)
prev_node_wavefront (Any | None)
target_wave_front (Any | None)
engine (Engine)

Return type:

classmethod from_json(d, validate=True)#

Create ASTRef from JSON representation.

Parameters:

d (dict) – JSON dictionary with ‘ref’ and ‘chain’ fields
validate (bool) – Whether to validate after creation

Returns:

New ASTRef instance

Return type:

ASTRef

Raises:

AssertionError – If ‘ref’ or ‘chain’ fields are missing

reverse()#

Return type:: ASTRef

to_json(validate=True)#

Convert Ref to JSON representation.

Parameters:: validate (bool) – Whether to validate before serialization
Returns:: JSON-serializable dictionary
Return type:: dict

class graphistry.compute.ast.ASTRemoteGraph(dataset_id, token=None)#

Bases: ASTObject

Load a graph from Graphistry server.

Allows fetching previously uploaded graphs by dataset ID, optionally with an authentication token.

Parameters:

dataset_id (str) – Unique identifier of the dataset on the server
token (Optional[str]) – Optional authentication token

Raises:

GFQLTypeError – If dataset_id is not a string or is empty

Example::

# Fetch public dataset remote = ASTRemoteGraph(‘my-dataset-id’)

# Fetch private dataset with token remote = ASTRemoteGraph(‘private-dataset’, token=’auth-token’)

execute(g, prev_node_wavefront, target_wave_front, engine)#

Parameters:

g (Plottable)
prev_node_wavefront (Any | None)
target_wave_front (Any | None)
engine (Engine)

Return type:

classmethod from_json(d, validate=True)#

Create ASTRemoteGraph from JSON representation.

Parameters:

d (dict) – JSON dictionary with ‘dataset_id’ field
validate (bool) – Whether to validate after creation

Returns:

New ASTRemoteGraph instance

Return type:

ASTRemoteGraph

Raises:

AssertionError – If ‘dataset_id’ field is missing

reverse()#

Return type:: ASTRemoteGraph

to_json(validate=True)#

Convert RemoteGraph to JSON representation.

Parameters:: validate (bool) – Whether to validate before serialization
Returns:: JSON-serializable dictionary
Return type:: dict

graphistry.compute.ast.anti_semi_apply(*, binding_ops, join_aliases, neq=None)#

Filter active rows by removing rows matching a correlated pattern.

binding_ops encodes the pattern to evaluate as bindings rows. join_aliases names shared aliases used as anti-join keys. neq optionally names two pattern aliases whose bindings must DIFFER (EXISTS { (n)--(m) WHERE m <> n } — the viz drop-self prune flavor).

Parameters:

binding_ops (List[Dict[str, Any]])
join_aliases (Sequence[str])
neq (Sequence[str] | None)

Return type:

graphistry.compute.ast.count_table(table='nodes', source=None, alias='count(*)')#

Count matched rows without materializing them (fast path for a lone count(*)).

Emitted by the Cypher lowering when a RETURN is exactly count(*) over a single node/edge pattern (no DISTINCT, GROUP BY, row-level WHERE, UNWIND, paging, or multi-relationship binding). Produces a one-row table {alias: n} where n is the height of the active table (or, when a source alias-mask column is present, the count of its truthy rows). This avoids the full-frame materialize + constant-key group_by the general aggregate path performs — the win that turns count(*) from O(N) into a single reduction. See plans/gfql-engine-followups (BEAT LADYBUG).

Parameters:

table (str)
source (str | None)
alias (str)

Return type:

graphistry.compute.ast.distinct()#

Create a DISTINCT operation for GFQL row pipelines.

Return type:: ASTCall

graphistry.compute.ast.drop_cols(cols)#

Drop named columns from the active row table (ignores missing columns at runtime).

Parameters:: cols (Iterable[str])
Return type:: ASTCall

graphistry.compute.ast.e#: alias of ASTEdgeUndirected

graphistry.compute.ast.e_forward#: alias of ASTEdgeForward

graphistry.compute.ast.e_reverse#: alias of ASTEdgeReverse

graphistry.compute.ast.e_undirected#: alias of ASTEdgeUndirected

graphistry.compute.ast.from_json(o, validate=True)#

Parameters:

o (None | bool | str | float | int | List[Any] | Dict[str, Any])
validate (bool)

Return type:

graphistry.compute.ast.group_by(keys, aggregations, key_prefixes=None)#

Create grouped aggregation operation for row pipelines.

key_prefixes: if provided, all DataFrame columns whose name starts with any of these prefixes are appended to the key list at runtime. Useful for bindings-row paths where alias property column names (e.g. "tag.name") are not known at lowering time.

Parameters:

keys (Iterable[str])
aggregations (Iterable[Sequence[Any]])
key_prefixes (Iterable[str] | None)

Return type:

graphistry.compute.ast.join_apply(*, binding_ops, join_aliases, how='inner')#

Join active rows with rows produced by a correlated binding pattern.

binding_ops encodes the right-hand pattern to evaluate as bindings rows. join_aliases names shared aliases used as join keys. how is "inner" or "left".

Parameters:

binding_ops (List[Dict[str, Any]])
join_aliases (Sequence[str])
how (str)

Return type:

graphistry.compute.ast.let#: alias of ASTLet

graphistry.compute.ast.limit(value)#

Create a LIMIT operation for GFQL row pipelines.

Parameters:: value (Any)
Return type:: ASTCall

graphistry.compute.ast.maybe_filter_dict_from_json(d, key)#

Parameters:

d (Dict)
key (str)

Return type:

Dict | None

graphistry.compute.ast.n#: alias of ASTNode

graphistry.compute.ast.normalize_gfql_to_wire(expr)#

Normalize GFQL expression to wire format (list of JSON-serializable dicts).

Accepts: - Chain object - Single ASTObject - List of ASTObjects - Dict with ‘type’: ‘Chain’ and ‘chain’ key - Dict with ‘type’: ‘gfql_chain’ and ‘gfql’ key - Dict with just ‘chain’ or ‘gfql’ key - Single dict (parsed as AST op)

Returns: - List of JSON-serializable dicts ready for wire protocol

Raises: - TypeError: if expr type is not supported - ValueError: if expr is empty - GFQLSyntaxError: if dict cannot be parsed as valid AST

Parameters:: expr (Any)
Return type:: List[Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]]]

graphistry.compute.ast.order_by(keys)#

Create an ORDER BY operation for GFQL row pipelines.

Parameters:: keys (Iterable[Tuple[Any, str]])
Return type:: ASTCall

graphistry.compute.ast.ref#: alias of ASTRef

graphistry.compute.ast.remote#: alias of ASTRemoteGraph

graphistry.compute.ast.return_(items)#

Python-safe alias for Cypher RETURN semantics.

Parameters:: items (Iterable[str | Tuple[str, Any]])
Return type:: ASTCall

graphistry.compute.ast.rows(table='nodes', source=None, alias_endpoints=None, binding_ops=None)#

Create a row-source operation for GFQL row pipelines.

This operation converts graph outputs into a row table context (nodes or edges), optionally constrained to a named alias/source tag.

When alias_endpoints is provided, builds a bindings table by joining edges with node properties for each alias. Keys are alias names, values are "src" or "dst" indicating which edge endpoint maps to that alias.

When binding_ops is provided, builds a bindings table from serialized node/edge operations. Alternating node/edge chains materialize connected same-path bindings, while node-only sequences materialize cartesian bindings rows. This is the preferred path for direct Cypher multi-alias scalar and aggregate projections.

Parameters:

table (str)
source (str | None)
alias_endpoints (Dict[str, str] | None)
binding_ops (List[Dict[str, Any]] | None)

Return type:

graphistry.compute.ast.search_any(*, alias, term, out_col, case_sensitive=False, regex=False, columns=None)#

Annotate active rows with a cross-column search marker (viz-filter L2): out_col True where ANY of alias’s columns matches term — OR across columns, case-insensitive substring default, regex opt-in, dtype-gated (string columns always; integer columns iff numeric-literal term).

Parameters:

alias (str)
term (str)
out_col (str)
case_sensitive (bool)
regex (bool)
columns (Sequence[str] | None)

Return type:

graphistry.compute.ast.select(items)#

Create a row projection operation for GFQL row pipelines.

Parameters:: items (Iterable[str | Tuple[str, Any]])
Return type:: ASTCall

graphistry.compute.ast.semi_apply_mark(*, binding_ops, join_aliases, out_col, neq=None)#

Annotate active rows with a correlated pattern-existence boolean.

binding_ops encodes the pattern to evaluate as bindings rows. join_aliases names shared aliases used as join keys. out_col receives a bool marker where True means the pattern matched.

Parameters:

binding_ops (List[Dict[str, Any]])
join_aliases (Sequence[str])
out_col (str)
neq (Sequence[str] | None)

Return type:

graphistry.compute.ast.serialize_binding_ops(ops)#

Serialize node/edge bindings for rows(binding_ops=...).

Parameters:: ops (Sequence[ASTObject])
Return type:: List[Dict[str, Any]]

graphistry.compute.ast.skip(value)#

Create a SKIP operation for GFQL row pipelines.

Parameters:: value (Any)
Return type:: ASTCall

graphistry.compute.ast.unwind(expr, as_='value')#

Create a constrained UNWIND row expansion operation.

Parameters:

expr (Any)
as_ (str)

Return type:

graphistry.compute.ast.where_rows(filter_dict=None, expr=None)#

Create a row-table WHERE operation using filter_dict and/or expression.

Parameters:

filter_dict (Dict[str, Any] | None)
expr (str | None)

Return type:

graphistry.compute.ast.with_(items, extend=False)#

Python-safe alias for Cypher WITH row projection semantics.

Parameters:

items (Iterable[str | Tuple[str, Any]])
extend (bool)

Return type:

graphistry.compute.ast_temporal module#

class graphistry.compute.ast_temporal.DateTimeValue(value, timezone='UTC')#

Bases: TemporalValue

Tagged datetime value with timezone support

Parameters:

value (str)
timezone (str)

as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: Timestamp

classmethod from_datetime(dt)#

Create from Python datetime

Parameters:: dt (datetime)
Return type:: DateTimeValue

classmethod from_pandas_timestamp(ts)#

Create from pandas Timestamp

Parameters:: ts (Timestamp)
Return type:: DateTimeValue

to_json()#

Return dict for tagged temporal value

Return type:: DateTimeWire

class graphistry.compute.ast_temporal.DateValue(value)#

Bases: _ScalarTemporalValue

Tagged date value

Parameters:: value (str)

as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: Timestamp

classmethod from_date(d)#

Create from Python date

Parameters:: d (date)
Return type:: DateValue

class graphistry.compute.ast_temporal.TemporalValue#

Bases: ABC

Base class for temporal values with tagging support

abstract as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: Any

abstract to_json()#

Serialize to JSON-compatible dictionary

Return type:: DateTimeWire | DateWire | TimeWire

class graphistry.compute.ast_temporal.TimeValue(value)#

Bases: _ScalarTemporalValue

Tagged time value

Parameters:: value (str)

as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: time

classmethod from_time(t)#

Create from Python time

Parameters:: t (time)
Return type:: TimeValue

graphistry.compute.ast_temporal.temporal_value_from_json(d)#

Factory function to create temporal value from JSON dict

Parameters:: d (Dict[str, Any])
Return type:: TemporalValue

graphistry.compute.chain module#

class graphistry.compute.chain.Chain(chain, where=None, validate=True)#

Bases: ASTSerializable

Parameters:

chain (List[ASTObject])
where (Sequence[WhereComparison] | None)
validate (bool)

classmethod from_json(d, validate=True)#

Convert a JSON AST into a list of ASTObjects

Parameters:

d (Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]])
validate (bool)

Return type:

Chain

to_json(validate=True)#

Convert a list of ASTObjects into a JSON AST

Return type:: Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]]

validate(collect_all=False)#

Validate this AST node.

Args:

collect_all: If True, collect all errors instead of raising on first.: If False (default), raise on first error.

Returns:

If collect_all=True: List of validation errors (empty if valid) If collect_all=False: None if valid

Raises:

GFQLValidationError: If collect_all=False and validation fails

Parameters:: collect_all (bool)
Return type:: List[GFQLValidationError] | None

validate_schema(g, collect_all=False)#

Validate this chain against a graph’s schema without executing.

Args:: g: Graph to validate against collect_all: If True, collect all errors. If False, raise on first.
Returns:: If collect_all=True: List of errors (empty if valid) If collect_all=False: None if valid
Raises:: GFQLSchemaError: If collect_all=False and validation fails

Parameters:

g (Plottable)
collect_all (bool)

Return type:

List[GFQLSchemaError] | None

graphistry.compute.chain.chain(self, ops, engine=EngineAbstract.AUTO, validate_schema=True, policy=None, context=None, start_nodes=None)#

Chain a list of ASTObject (node/edge) traversal operations

Return subgraph of matches according to the list of node & edge matchers If any matchers are named, add a correspondingly named boolean-valued column to the output

For direct calls, exposes convenience List[ASTObject]. Internal operational should prefer Chain.

Use engine=’cudf’ to force automatic GPU acceleration mode

Parameters:

ops (List[ASTObject] | Chain) – List[ASTObject] Various node and edge matchers
validate_schema (bool) – Whether to validate the chain against the graph schema before executing
policy – Optional policy dict for hooks
context – Optional ExecutionContext for tracking execution state
start_nodes (Any | None) – Optional node wavefront for the first traversal step
self (Plottable)
engine (EngineAbstract | str)

Returns:

Plotter

Return type:

Plotter

graphistry.compute.chain.combine_steps(g, kind, steps, engine, label_steps=None)#

Collect nodes and edges, taking care to deduplicate and tag any names

Parameters:

g (Plottable)
kind (str)
steps (List[Tuple[ASTObject, Plottable]])
engine (Engine)
label_steps (List[Tuple[ASTObject, Plottable]] | None)

Return type:

Any

graphistry.compute.chain_remote module#

class graphistry.compute.chain_remote.CompiledBindingLike(*args, **kwargs)#

Bases: Protocol

chain: Chain#

name: str#

procedure_call: CompiledProcedureCallLike | None#

use_ref: str | None#

class graphistry.compute.chain_remote.CompiledProcedureCallLike(*args, **kwargs)#

Bases: Protocol

call_params: Dict[str, Any] | None#

procedure: str#

class graphistry.compute.chain_remote.CompiledQueryLike(*args, **kwargs)#

Bases: Protocol

chain: Chain#

graph_bindings: Sequence[CompiledBindingLike]#

procedure_call: CompiledProcedureCallLike | None#

use_ref: str | None#

class graphistry.compute.chain_remote.CompiledUnionLike(*args, **kwargs)#

Bases: Protocol

branches: Sequence[Any]#

graphistry.compute.chain_remote.chain_remote(self, chain, api_token=None, dataset_id=None, output_type='all', format=None, df_export_args=None, node_col_subset=None, edge_col_subset=None, engine='auto', validate=True, persist=False, params=None, output=None)#

Remotely run GFQL chain query on a remote dataset.

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

chain (Union[Chain, List[ASTObject], Dict[str, JSONVal], ASTLet, str]) – GFQL query as a Python object, serialized GFQL JSON, or Cypher string
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not provided, will upload current data, store that dataset_id, and run GFQL against that.
output_type (OutputType) – Whether to return nodes and edges (“all”, default), Plottable with just nodes (“nodes”), or Plottable with just edges (“edges”). For just a dataframe of the resultant graph shape (output_type=”shape”), use instead chain_remote_shape().
format (Optional[FormatType]) – What format to fetch results. We recommend a columnar format such as parquet, which it defaults to when output_type is not shape.
df_export_args (Optional[Dict, str, Any]]) – When server parses data, any additional parameters to pass in.
node_col_subset (Optional[List[str]]) – When server returns nodes, what property subset to return. Defaults to all.
edge_col_subset (Optional[List[str]]) – When server returns edges, what property subset to return. Defaults to all.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.
persist (bool) – Whether to persist dataset on server and return dataset_id for immediate URL generation. Default false.
self (Plottable)
params (Dict[str, Any] | None)
output (str | None)

Return type:

Example: Explicitly upload graph and return subgraph where nodes have at least one edge

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst').upload()
assert g1._dataset_id, "Graph should have uploaded"

g2 = g1.chain_remote([n(), e(), n()])
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')

Example: Return subgraph where nodes have at least one edge, with implicit upload

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')
g2 = g1.chain_remote([n(), e(), n()])
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')

Example: Return subgraph where nodes have at least one edge, with implicit upload, and force GPU mode

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')
g2 = g1.chain_remote([n(), e(), n()], engine='cudf')
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')

graphistry.compute.chain_remote.chain_remote_generic(self, chain, api_token=None, dataset_id=None, output_type='all', format=None, df_export_args=None, node_col_subset=None, edge_col_subset=None, engine='auto', validate=True, persist=False, params=None, output=None)#

Parameters:

self (Plottable)
chain (Chain | Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]] | List[Any] | ASTLet | str)
api_token (str | None)
dataset_id (str | None)
output_type (Literal['all', 'nodes', 'edges', 'shape'])
format (Literal['json', 'csv', 'parquet'] | None)
df_export_args (Dict[str, Any] | None)
node_col_subset (List[str] | None)
edge_col_subset (List[str] | None)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])
validate (bool)
persist (bool)
params (Dict[str, Any] | None)
output (str | None)

Return type:

Plottable | DataFrame

graphistry.compute.chain_remote.chain_remote_shape(self, chain, api_token=None, dataset_id=None, format=None, df_export_args=None, node_col_subset=None, edge_col_subset=None, engine='auto', validate=True, persist=False)#

Like chain_remote(), except instead of returning a Plottable, returns a pd.DataFrame of the shape of the resulting graph.

Useful as a fast success indicator that avoids the need to return a full graph when a match finds hits, return just the metadata.

Example: Upload graph and compute number of nodes with at least one edge

import graphistry
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst').upload()
assert g1._dataset_id, "Graph should have uploaded"

shape_df = g1.chain_remote_shape([n(), e(), n()])
print(shape_df)

Example: Compute number of nodes with at least one edge, with implicit upload, and force GPU mode

import graphistry
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')

shape_df = g1.chain_remote_shape([n(), e(), n()], engine='cudf')
print(shape_df)

Parameters:

self (Plottable)
chain (Chain | List[ASTObject] | Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]] | ASTLet | str)
api_token (str | None)
dataset_id (str | None)
format (Literal['json', 'csv', 'parquet'] | None)
df_export_args (Dict[str, Any] | None)
node_col_subset (List[str] | None)
edge_col_subset (List[str] | None)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])
validate (bool)
persist (bool)

Return type:

DataFrame

graphistry.compute.cluster module#

class graphistry.compute.cluster.ClusterMixin(*a, **kw)#

Bases: object

dbscan(min_dist=0.2, min_samples=1, cols=None, kind='nodes', fit_umap_embedding=True, target=False, verbose=False, engine_dbscan='auto', *args, **kwargs)#

DBSCAN clustering on cpu or gpu infered automatically. Adds a _dbscan column to nodes or edges.

NOTE: g.transform_dbscan(..) currently unsupported on GPU.

Saves model as g._dbscan_nodes or g._dbscan_edges

Examples:

g = graphistry.edges(edf, 'src', 'dst').nodes(ndf, 'node')

# cluster by UMAP embeddings
kind = 'nodes' | 'edges'
g2 = g.umap(kind=kind).dbscan(kind=kind)
print(g2._nodes['_dbscan']) | print(g2._edges['_dbscan'])

# dbscan in umap or featurize API
g2 = g.umap(dbscan=True, min_dist=1.2, min_samples=2, **kwargs)
# or, here dbscan is infered from features, not umap embeddings
g2 = g.featurize(dbscan=True, min_dist=1.2, min_samples=2, **kwargs)

# and via chaining,
g2 = g.umap().dbscan(min_dist=1.2, min_samples=2, **kwargs)

# cluster by feature embeddings
g2 = g.featurize().dbscan(**kwargs)

# cluster by a given set of feature column attributes, or with target=True
g2 = g.featurize().dbscan(cols=['ip_172', 'location', 'alert'], target=False, **kwargs)

# equivalent to above (ie, cols != None and umap=True will still use features dataframe, rather than UMAP embeddings)
g2 = g.umap().dbscan(cols=['ip_172', 'location', 'alert'], umap=True | False, **kwargs)

g2.plot() # color by `_dbscan` column

Useful:

Enriching the graph with cluster labels from UMAP is useful for visualizing clusters in the graph by color, size, etc, as well as assessing metrics per cluster, e.g. graphistry/pygraphistry

Args:

min_dist float:: The maximum distance between two samples for them to be considered as in the same neighborhood.
kind str:: ‘nodes’ or ‘edges’
cols:: list of columns to use for clustering given g.featurize has been run, nice way to slice features or targets by fragments of interest, e.g. [‘ip_172’, ‘location’, ‘ssh’, ‘warnings’]
fit_umap_embedding bool:: whether to use UMAP embeddings or features dataframe to cluster DBSCAN
min_samples:: The number of samples in a neighborhood for a point to be considered as a core point. This includes the point itself.
target:: whether to use the target column as the clustering feature

Parameters:

min_dist (float)
min_samples (int)
cols (List | str | None)
kind (Literal['nodes', 'edges'])
fit_umap_embedding (bool)
target (bool)
verbose (bool)
engine_dbscan (Literal['cuml', 'sklearn', 'auto'])

transform_dbscan(df, y=None, min_dist='auto', infer_umap_embedding=False, sample=None, n_neighbors=None, kind='nodes', return_graph=True, verbose=False)#

Transforms a minibatch dataframe to one with a new column ‘_dbscan’ containing the DBSCAN cluster labels on the minibatch and generates a graph with the minibatch and the original graph, with edges between the minibatch and the original graph inferred from the umap embedding or features dataframe. Graph nodes | edges will be colored by ‘_dbscan’ column.

Examples:

fit:
    g = graphistry.edges(edf, 'src', 'dst').nodes(ndf, 'node')
    g2 = g.featurize().dbscan()

predict:
::

    emb, X, _, ndf = g2.transform_dbscan(ndf, return_graph=False)
    # or
    g3 = g2.transform_dbscan(ndf, return_graph=True)
    g3.plot()

likewise for umap:

fit:
    g = graphistry.edges(edf, 'src', 'dst').nodes(ndf, 'node')
    g2 = g.umap(X=.., y=..).dbscan()

predict:
::

    emb, X, y, ndf = g2.transform_dbscan(ndf, ndf, return_graph=False)
    # or
    g3 = g2.transform_dbscan(ndf, ndf, return_graph=True)
    g3.plot()

Args:

df:: dataframe to transform
y:: optional labels dataframe
min_dist:: The maximum distance between two samples for them to be considered as in the same neighborhood. smaller values will result in less edges between the minibatch and the original graph. Default ‘auto’, infers min_dist from the mean distance and std of new points to the original graph
fit_umap_embedding:: whether to use UMAP embeddings or features dataframe when inferring edges between the minibatch and the original graph. Default False, uses the features dataframe
sample:: number of samples to use when inferring edges between the minibatch and the original graph, if None, will only use closest point to the minibatch. If greater than 0, will sample the closest sample points in existing graph to pull in more edges. Default None
kind:: ‘nodes’ or ‘edges’
return_graph:: whether to return a graph or the (emb, X, y, minibatch df enriched with DBSCAN labels), default True infered graph supports kind=’nodes’ only.
verbose:: whether to print out progress, default False

Parameters:

df (DataFrame)
y (DataFrame | None)
min_dist (float | str)
infer_umap_embedding (bool)
sample (int | None)
n_neighbors (int | None)
kind (str)
return_graph (bool)
verbose (bool)

graphistry.compute.cluster.dbscan_fit_inplace(res, dbscan, kind='nodes', cols=None, use_umap_embedding=True, target=False, verbose=False)#

Fits clustering on UMAP embeddings if umap is True, otherwise on the features dataframe

or target dataframe if target is True.

Sets:

res._dbscan_edges or res._dbscan_nodes to the DBSCAN model
res._edges or res._nodes gains column _dbscan

Args:

res:: graphistry graph
kind:: ‘nodes’ or ‘edges’
cols:: list of columns to use for clustering given g.featurize has been run
use_umap_embedding:: whether to use UMAP embeddings or features dataframe for clustering (default: True)
target:: whether to use the target dataframe or features dataframe (typically False, for features)

Parameters:

res (Plottable)
dbscan (Any)
kind (Literal['nodes', 'edges'])
cols (List | str | None)
use_umap_embedding (bool)
target (bool)
verbose (bool)

Return type:

None

graphistry.compute.cluster.dbscan_predict_cuml(X, model)#

Parameters:

X (Any)
model (Any)

Return type:

Any

graphistry.compute.cluster.dbscan_predict_sklearn(X, model)#

DBSCAN has no predict per se, so we reverse engineer one here from https://stackoverflow.com/questions/27822752/scikit-learn-predicting-new-points-with-dbscan

Parameters:

X (DataFrame)
model (Any)

Return type:

ndarray

graphistry.compute.cluster.get_model_matrix(g, kind, cols, umap, target)#

Allows for a single function to get the model matrix for both nodes and edges as well as targets, embeddings, and features

Args:

g:: graphistry graph
kind:: ‘nodes’ or ‘edges’
cols:: list of columns to use for clustering given g.featurize has been run
umap:: whether to use UMAP embeddings or features dataframe
target:: whether to use the target dataframe or features dataframe

Returns:

pd.DataFrame: dataframe of model matrix given the inputs

Parameters:

g (Plottable)
kind (Literal['nodes', 'edges'])
cols (List | str | None)

Return type:

Any

graphistry.compute.cluster.make_safe_gpu_dataframes(X, y, engine)#

Coerce a dataframe to pd vs cudf based on engine

Parameters:

X (Any | None)
y (Any | None)
engine (Engine)

Return type:

Tuple[Any | None, Any | None]

graphistry.compute.cluster.resolve_dbscan_engine(engine, g_or_df=None)#

Resolves the engine to use for DBSCAN clustering

If ‘auto’, decide by checking if cuml or sklearn is installed, and if provided, natural type of the dataset. GPU is used if both a GPU dataset and GPU library is installed. Otherwise, CPU library.

Parameters:

engine (Literal['cuml', 'sklearn', 'auto'])
g_or_df (Any | None)

Return type:

Literal[‘cuml’, ‘sklearn’]

graphistry.compute.collapse module#

graphistry.compute.collapse.check_default_columns_present_and_coerce_to_string(g)#

Helper to set COLLAPSE columns to nodes and edges dataframe, while converting src, dst, node to dtype(str)

Generates unique internal column names to avoid conflicts with user data. Stores the generated names as attributes on the graph object: - g._collapse_node_col - g._collapse_src_col - g._collapse_dst_col

Parameters:: g (Plottable) – graphistry instance
Returns:: graphistry instance

graphistry.compute.collapse.check_has_set(ndf, parent, child, collapse_node_col)#

Parameters:: collapse_node_col (str)

graphistry.compute.collapse.collapse_algo(g, child, parent, attribute, column, seen)#

Basically candy crush over graph properties in a topology aware manner

Checks to see if child node has desired property from parent, we will need to check if (start_node=parent: has_attribute , children nodes: has_attribute) by case (T, T), (F, T), (T, F) and (F, F),we start recursive collapse (or not) on the children, reassigning nodes and edges.

if (T, T), append children nodes to start_node, re-assign the name of the node, and update the edge table with new name,

if (F, T) start k-(potentially new) super nodes, with k the number of children of start_node. Start node keeps k outgoing edges.

if (T, F) it is the end of the cluster, and we keep new node as is; keep going

if (F, F); keep going

Parameters:

seen (dict)
g (Plottable) – graphistry instance
child (str | int) – child node to start traversal, for first traversal, set child=parent or vice versa.
parent (str | int) – parent node to start traversal, in main call, this is set to child.
attribute (str | int) – attribute to collapse by
column (str | int) – column in nodes dataframe to collapse over.

Returns:

graphistry instance with collapsed nodes.

graphistry.compute.collapse.collapse_by(self, parent, start_node, attribute, column, seen, self_edges=False, unwrap=False, verbose=True)#

Main call in collapse.py, collapses nodes and edges by attribute, and returns normalized graphistry object.

Parameters:

self (Plottable) – graphistry instance
parent (str | int) – parent node to start traversal, in main call, this is set to child.
start_node (str | int)
attribute (str | int) – attribute to collapse by
column (str | int) – column in nodes dataframe to collapse over.
seen (dict) – dict of previously collapsed pairs – {n1, n2) is seen as different from (n2, n1)
verbose (bool) – bool, default True
self_edges (bool)
unwrap (bool)

Return type:

:returns graphistry instance with collapsed and normalized nodes.

graphistry.compute.collapse.collapse_nodes_and_edges(g, parent, child)#

Asserts that parent and child node in ndf should be collapsed into super node. Sets new ndf with COLLAPSE nodes in graphistry instance g

# this asserts that we SHOULD merge parent and child as super node # outside logic controls when that is the case # for example, it assumes parent is already in cluster keys of COLLAPSE node

Parameters:

g (Plottable) – graphistry instance
parent (str | int) – node with attribute in column
child (str | int) – node with attribute in column

Returns:

graphistry instance

graphistry.compute.collapse.get_children(g, node_id, hops=1)#

Helper that gets children at k-hops from node node_id

:returns graphistry instance of hops

Parameters:

g (Plottable)
node_id (str | int)
hops (int)

graphistry.compute.collapse.get_cluster_store_keys(ndf, node, collapse_node_col)#

Main innovation in finding and adding to super node. Checks if node is a segment in any collapse_node in COLLAPSE column of nodes DataFrame

Parameters:

ndf (DataFrame) – node DataFrame
node (str | int) – node to find
collapse_node_col (str) – the collapse node column name

Returns:

DataFrame of bools of where wrap_key(node) exists in COLLAPSE column

graphistry.compute.collapse.get_edges_in_out_cluster(g, node_id, attribute, column, directed=True)#

Traverses children of node_id and separates them into incluster and outcluster sets depending if they have attribute in node DataFrame column

Parameters:

g (Plottable) – graphistry instance
node_id (str | int) – node with attribute in column
attribute (str | int) – attribute to collapse in column over
column (str | int) – column to collapse over
directed (bool)

graphistry.compute.collapse.get_edges_of_node(g, node_id, outgoing_edges=True, hops=1)#

Gets edges of node at k-hops from node

Parameters:

g (Plottable) – graphistry instance
node_id (str | int) – node to find edges from
outgoing_edges (bool) – bool, if true, finds all outgoing edges of node, default True
hops (int) – the number of hops from node to take, default = 1

Returns:

DataFrame of edges

graphistry.compute.collapse.get_new_node_name(ndf, parent, child, collapse_node_col)#

If child in cluster group, melts name, else makes new parent_name from parent, child

Parameters:

ndf (DataFrame) – node DataFrame
parent (str | int) – node with attribute in column
child (str | int) – node with attribute in column
collapse_node_col (str) – the collapse node column name

Return type:

str

:returns new_parent_name

graphistry.compute.collapse.has_edge(g, n1, n2, directed=True)#

Checks if n1 and n2 share an (directed or not) edge

Parameters:

g (Plottable) – graphistry instance
n1 (str | int) – node to check if has edge to n2
n2 (str | int) – node to check if has edge to n1
directed (bool) – bool, if True, checks only outgoing edges from n1->`n2`, else finds undirected edges

Returns:

bool, if edge exists between n1 and n2

Return type:

bool

graphistry.compute.collapse.has_property(g, ref_node, attribute, column)#

Checks if ref_node is in node dataframe in column with attribute :param attribute: :param column: :param g: graphistry instance :param ref_node: node to check if it as attribute in column

Returns:

bool

Parameters:

g (Plottable)
ref_node (str | int)
attribute (str | int)
column (str | int)

Return type:

bool

graphistry.compute.collapse.in_cluster_store_keys(ndf, node, collapse_node_col)#

checks if node is in collapse_node in COLLAPSE column of nodes DataFrame

Parameters:

ndf (DataFrame) – nodes DataFrame
node (str | int) – node to find
collapse_node_col (str) – the collapse node column name

Returns:

bool

Return type:

bool

graphistry.compute.collapse.melt(ndf, node, collapse_node_col)#

Reduces node if in cluster store, otherwise passes it through. ex:

node = “4” will take any sequence from get_cluster_store_keys, “1 2 3”, “4 3 6” and returns “1 2 3 4 6” when they have a common entry (3).

:param ndf, node DataFrame :param node: node to melt :param collapse_node_col: the collapse node column name :returns new_parent_name of super node

Parameters:

ndf (DataFrame)
node (str | int)
collapse_node_col (str)

Return type:

str

graphistry.compute.collapse.normalize_graph(g, self_edges=False, unwrap=False)#

Final step after collapse traversals are done, removes duplicates and moves COLLAPSE columns into respective(node, src, dst) columns of node, edges dataframe from Graphistry instance g.

Parameters:

g (Plottable) – graphistry instance
self_edges (bool) – bool, whether to keep duplicates from ndf, edf, default False
unwrap (bool) – bool, whether to unwrap node text with ~, default True

Returns:

final graphistry instance

Return type:

graphistry.compute.collapse.reduce_key(key)#

Takes “1 1 2 1 2 3” -> “1 2 3

Parameters:: key (str | int) – node name
Returns:: new node name with duplicates removed
Return type:: str

graphistry.compute.collapse.unpack(g)#

Helper method that unpacks graphistry instance

ex:

ndf, edf, src, dst, node = unpack(g)

Parameters:: g (Plottable) – graphistry instance
Returns:: node DataFrame, edge DataFrame, source column, destination column, node column

graphistry.compute.collapse.unwrap_key(name)#

Unwraps node name: ~name~ -> name

Parameters:: name (str | int) – node to unwrap
Returns:: unwrapped node name
Return type:: str

graphistry.compute.collapse.wrap_key(name)#

Wraps node name -> ~name~

Parameters:: name (str | int) – node name
Returns:: wrapped node name
Return type:: str

graphistry.compute.conditional module#

class graphistry.compute.conditional.ConditionalMixin(*a, **kw)#

Bases: Plottable

DGL_graph: Any | None#

conditional_graph(x, given, kind='nodes', *args, **kwargs)#

conditional_graph – p(x|given) = p(x, given) / p(given)

Useful for finding the conditional probability of a node or edge attribute

returned dataframe sums to 1 on each column

Parameters:

x – target column
given – the dependent column
kind – ‘nodes’ or ‘edges’
args/kwargs – additional arguments for g.bind(…)

Returns:

a graphistry instance with the conditional graph edges weighted by the conditional probability. edges are between x and given, keep in mind that g._edges.columns = [given, x, _probs]

conditional_probs(x, given, kind='nodes', how='index')#

Produces a Dense Matrix of the conditional probability of x given y

Args:: x: the column variable of interest given the column y=given given : the variabe to fix constant df pd.DataFrame: dataframe how (str, optional): One of ‘column’ or ‘index’. Defaults to ‘index’. kind (str, optional): ‘nodes’ or ‘edges’. Defaults to ‘nodes’.
Returns:: pd.DataFrame: the conditional probability of x given the column y as dense array like dataframe

session: ClientSession#

graphistry.compute.conditional.conditional_probability(x, given, df)#

conditional probability function over categorical variables: p(x | given) = p(x, given)/p(given)
Args:: x: the column variable of interest given the column ‘given’ given: the variabe to fix constant df: dataframe with columns [given, x]
Returns:: pd.DataFrame: the conditional probability of x given the column ‘given’

Parameters:: df (DataFrame)

graphistry.compute.conditional.probs(x, given, df, how='index')#

Produces a Dense Matrix of the conditional probability of x given y=given

Args:: x: the column variable of interest given the column ‘y’ given : the variabe to fix constant df pd.DataFrame: dataframe how (str, optional): One of ‘column’ or ‘index’. Defaults to ‘index’.
Returns:: pd.DataFrame: the conditional probability of x given the column ‘y’ as dense array like dataframe

Parameters:: df (DataFrame)

graphistry.compute.filter_by_dict module#

graphistry.compute.filter_by_dict.filter_by_dict(df, filter_dict=None, engine=EngineAbstract.AUTO)#

return df where rows match all values in filter_dict

Parameters:

df (Any)
filter_dict (dict | None)
engine (EngineAbstract | str)

Return type:

Any

graphistry.compute.filter_by_dict.filter_edges_by_dict(self, filter_dict=None, engine=EngineAbstract.AUTO)#

filter edges to those that match all values in filter_dict

Parameters:

self (Plottable)
filter_dict (dict | None)
engine (EngineAbstract | str)

Return type:

graphistry.compute.filter_by_dict.filter_nodes_by_dict(self, filter_dict=None, engine=EngineAbstract.AUTO)#

filter nodes to those that match all values in filter_dict

Parameters:

self (Plottable)
filter_dict (dict | None)
engine (EngineAbstract | str)

Return type:

graphistry.compute.filter_by_dict.resolve_filter_column(df, col, val)#

Parameters:

df (Any)
col (str)
val (Any)

Return type:

Tuple[str, Any]

graphistry.compute.hop module#

Graph hop/traversal operations for PyGraphistry.

NOTE: Excluded from pyre (.pyre_configuration) - hop() complexity causes hang. Use mypy.

graphistry.compute.hop.hop(self, nodes=None, hops=1, *, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, direction='forward', edge_match=None, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, return_as_wave_front=False, include_zero_hop_seed=False, target_wave_front=None, engine=EngineAbstract.AUTO)#

Given a graph and some source nodes, return subgraph of all paths within k-hops from the sources

This can be faster than the equivalent chain([…]) call that wraps it with additional steps

See chain() examples for examples of many of the parameters

Parameters:

self (Plottable)
nodes (Any | None)
hops (int | None)
min_hops (int | None)
max_hops (int | None)
output_min_hops (int | None)
output_max_hops (int | None)
label_node_hops (str | None)
label_edge_hops (str | None)
label_seeds (bool)
to_fixed_point (bool)
direction (str)
edge_match (dict | None)
source_node_match (dict | None)
destination_node_match (dict | None)
source_node_query (str | None)
destination_node_query (str | None)
edge_query (str | None)
return_as_wave_front (bool)
include_zero_hop_seed (bool)
target_wave_front (Any | None)
engine (EngineAbstract | str)

Return type:

graphistry.compute.hop.query_if_not_none(query, df)#

Parameters:

query (str | None)
df (Any)

Return type:

Any

graphistry.compute.python_remote module#

graphistry.compute.python_remote.python_remote_g(self, code, api_token=None, dataset_id=None, format='parquet', output_type='all', engine='auto', run_label=None, validate=True)#

Remotely run Python code on a remote dataset that returns a Plottable

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
format (Optional[FormatType]) – What format to fetch results. Defaults to ‘parquet’.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to ‘all’. Options include ‘nodes’, ‘edges’, ‘all’ (both). For other variants, see python_remote_shape and python_remote_json.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.
self (Plottable)

Return type:

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
g2 = g1.python_remote_g(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return g
    ''',
    engine='cudf')
num_edges = len(g2._edges)
print(f'num_edges: {num_edges}')

graphistry.compute.python_remote.python_remote_generic(self, code, api_token=None, dataset_id=None, format='json', output_type='json', engine='auto', run_label=None, validate=True)#

Remotely run Python code on a remote dataset.

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
format (Optional[FormatType]) – What format to fetch results. Defaults to ‘json’. We recommend a columnar format such as parquet.
output_type (Optional[OutputTypeAll]) – What shape of output to fetch. Defaults to ‘json’. Options include ‘nodes’, ‘edges’, ‘all’ (both), ‘table’, ‘shape’, and ‘json’.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.
self (Plottable)

Return type:

Plottable | DataFrame | Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
out_json = g1.python_remote(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return {
                'num_edges': len(g._edges)
            }
    ''',
    engine='cudf')
num_edges = out_json['num_edges']
print(f'num_edges: {num_edges}')

graphistry.compute.python_remote.python_remote_json(self, code, api_token=None, dataset_id=None, engine='auto', run_label=None, validate=True)#

Remotely run Python code on a remote dataset that returns json

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.
self (Plottable)

Return type:

Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
obj = g1.python_remote_json(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return {'num_edges': len(g._edges)}
    ''',
    engine='cudf')
num_edges = obj['num_edges']
print(f'num_edges: {num_edges}')

graphistry.compute.python_remote.python_remote_table(self, code, api_token=None, dataset_id=None, format='parquet', output_type='table', engine='auto', run_label=None, validate=True)#

Remotely run Python code on a remote dataset that returns a table

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
format (Optional[FormatType]) – What format to fetch results. Defaults to ‘parquet’.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to ‘table’. Options include ‘table’, ‘nodes’, and ‘edges’.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.
self (Plottable)

Return type:

DataFrame

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
edges_df = g1.python_remote_table(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return g._edges
    ''',
    engine='cudf')
num_edges = len(edges_df)
print(f'num_edges: {num_edges}')

graphistry.compute.python_remote.validate_python_str(code)#

Validate Python code string.

Returns True if the code string is valid, otherwise return False or raise ValueError

Parameters:: code (str)
Return type:: bool

graphistry.compute.typing module#

class graphistry.compute.typing.ArrayLike(*args, **kwargs)#

Bases: Protocol

Small numpy/cupy-like 1-D array surface used by compute kernels.

astype(dtype)#

Parameters:: dtype (Any)
Return type:: ArrayLike

dtype: Any#

nbytes: int#

shape: Tuple[int, ...]#

sum()#

Return type:: Any

class graphistry.compute.typing.ArrayNamespace(*args, **kwargs)#

Bases: Protocol

Small numpy/cupy namespace surface used by compute kernels.

arange(*args, **kwargs)#

Parameters:

args (Any)
kwargs (Any)

Return type:

argsort(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

asarray(a, dtype=Ellipsis)#

Parameters:

a (Any)
dtype (Any)

Return type:

concatenate(arrays)#

Parameters:: arrays (Any)
Return type:: ArrayLike

cumsum(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

int64: Any#

nonzero(a)#

Parameters:: a (ArrayLike)
Return type:: Tuple[ArrayLike, …]

ones(shape, dtype=Ellipsis)#

Parameters:

shape (Any)
dtype (Any)

Return type:

promote_types(type1, type2)#

Parameters:

type1 (Any)
type2 (Any)

Return type:

Any

searchsorted(a, v)#

Parameters:

a (ArrayLike)
v (ArrayLike)

Return type:

sort(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

unique(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

where(condition, x, y)#

Parameters:

condition (ArrayLike)
x (Any)
y (Any)

Return type:

zeros(shape, dtype=Ellipsis)#

Parameters:

shape (Any)
dtype (Any)

Return type:

Module contents#

class graphistry.compute.ArrayLike(*args, **kwargs)#

Bases: Protocol

Small numpy/cupy-like 1-D array surface used by compute kernels.

astype(dtype)#

Parameters:: dtype (Any)
Return type:: ArrayLike

dtype: Any#

nbytes: int#

shape: Tuple[int, ...]#

sum()#

Return type:: Any

class graphistry.compute.ArrayNamespace(*args, **kwargs)#

Bases: Protocol

Small numpy/cupy namespace surface used by compute kernels.

arange(*args, **kwargs)#

Parameters:

args (Any)
kwargs (Any)

Return type:

argsort(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

asarray(a, dtype=Ellipsis)#

Parameters:

a (Any)
dtype (Any)

Return type:

concatenate(arrays)#

Parameters:: arrays (Any)
Return type:: ArrayLike

cumsum(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

int64: Any#

nonzero(a)#

Parameters:: a (ArrayLike)
Return type:: Tuple[ArrayLike, …]

ones(shape, dtype=Ellipsis)#

Parameters:

shape (Any)
dtype (Any)

Return type:

promote_types(type1, type2)#

Parameters:

type1 (Any)
type2 (Any)

Return type:

Any

searchsorted(a, v)#

Parameters:

a (ArrayLike)
v (ArrayLike)

Return type:

sort(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

unique(a)#

Parameters:: a (ArrayLike)
Return type:: ArrayLike

where(condition, x, y)#

Parameters:

condition (ArrayLike)
x (Any)
y (Any)

Return type:

zeros(shape, dtype=Ellipsis)#

Parameters:

shape (Any)
dtype (Any)

Return type:

class graphistry.compute.Between(lower, upper, inclusive=True)#

Bases: ASTPredicate

Parameters:

lower (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
upper (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
inclusive (bool)

to_json(validate=True)#

Serialize maintaining backward compatibility

Return type:: dict

class graphistry.compute.Chain(chain, where=None, validate=True)#

Bases: ASTSerializable

Parameters:

chain (List[ASTObject])
where (Sequence[WhereComparison] | None)
validate (bool)

classmethod from_json(d, validate=True)#

Convert a JSON AST into a list of ASTObjects

Parameters:

d (Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]])
validate (bool)

Return type:

Chain

to_json(validate=True)#

Convert a list of ASTObjects into a JSON AST

Return type:: Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]]

validate(collect_all=False)#

Validate this AST node.

Args:

collect_all: If True, collect all errors instead of raising on first.: If False (default), raise on first error.

Returns:

If collect_all=True: List of validation errors (empty if valid) If collect_all=False: None if valid

Raises:

GFQLValidationError: If collect_all=False and validation fails

Parameters:: collect_all (bool)
Return type:: List[GFQLValidationError] | None

validate_schema(g, collect_all=False)#

Validate this chain against a graph’s schema without executing.

Args:: g: Graph to validate against collect_all: If True, collect all errors. If False, raise on first.
Returns:: If collect_all=True: List of errors (empty if valid) If collect_all=False: None if valid
Raises:: GFQLSchemaError: If collect_all=False and validation fails

Parameters:

g (Plottable)
collect_all (bool)

Return type:

List[GFQLSchemaError] | None

class graphistry.compute.ComputeMixin(*a, **kw)#

Bases: Plottable

chain(*args, **kwargs)#

Deprecated since version 2.XX.X: Use gfql() instead for a unified API that supports both chains and DAGs.

Chain a list of ASTObject (node/edge) traversal operations

Return subgraph of matches according to the list of node & edge matchers If any matchers are named, add a correspondingly named boolean-valued column to the output

For direct calls, exposes convenience List[ASTObject]. Internal operational should prefer Chain.

Use engine=’cudf’ to force automatic GPU acceleration mode

Parameters:

ops – List[ASTObject] Various node and edge matchers
validate_schema – Whether to validate the chain against the graph schema before executing
policy – Optional policy dict for hooks
context – Optional ExecutionContext for tracking execution state
start_nodes – Optional node wavefront for the first traversal step

Returns:

Plotter

Return type:

Plotter

chain_remote(*args, **kwargs)#

Deprecated since version 2.XX.X: Use gfql_remote() instead for a unified API that supports both chains and DAGs.

Remotely run GFQL chain query on a remote dataset.

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

param chain:

GFQL query as a Python object, serialized GFQL JSON, or Cypher string

type chain:

Union[Chain, List[ASTObject], Dict[str, JSONVal], ASTLet, str]

param api_token:

Optional JWT token. If not provided, refreshes JWT and uses that.

type api_token:

Optional[str]

param dataset_id:

Optional dataset_id. If not provided, will fallback to self._dataset_id. If not provided, will upload current data, store that dataset_id, and run GFQL against that.

type dataset_id:

Optional[str]

param output_type:

Whether to return nodes and edges (“all”, default), Plottable with just nodes (“nodes”), or Plottable with just edges (“edges”). For just a dataframe of the resultant graph shape (output_type=”shape”), use instead chain_remote_shape().

type output_type:

OutputType

param format:

What format to fetch results. We recommend a columnar format such as parquet, which it defaults to when output_type is not shape.

type format:

Optional[FormatType]

param df_export_args:

When server parses data, any additional parameters to pass in.

type df_export_args:

Optional[Dict, str, Any]]

param node_col_subset:

When server returns nodes, what property subset to return. Defaults to all.

type node_col_subset:

Optional[List[str]]

param edge_col_subset:

When server returns edges, what property subset to return. Defaults to all.

type edge_col_subset:

Optional[List[str]]

param engine:

Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.

type engine:

EngineAbstractType

param validate:

Whether to locally test code, and if uploading data, the data. Default true.

type validate:

bool

param persist:

Whether to persist dataset on server and return dataset_id for immediate URL generation. Default false.

type persist:

bool
Example: Explicitly upload graph and return subgraph where nodes have at least one edge
import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst').upload()
assert g1._dataset_id, "Graph should have uploaded"

g2 = g1.chain_remote([n(), e(), n()])
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')
Example: Return subgraph where nodes have at least one edge, with implicit upload
import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')
g2 = g1.chain_remote([n(), e(), n()])
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')
Example: Return subgraph where nodes have at least one edge, with implicit upload, and force GPU mode
import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')
g2 = g1.chain_remote([n(), e(), n()], engine='cudf')
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')

Return type:: Plottable

chain_remote_shape(*args, **kwargs)#

Deprecated since version 2.XX.X: Use gfql_remote_shape() instead for a unified API that supports both chains and DAGs.

Like chain_remote(), except instead of returning a Plottable, returns a pd.DataFrame of the shape of the resulting graph.

Useful as a fast success indicator that avoids the need to return a full graph when a match finds hits, return just the metadata.

Example: Upload graph and compute number of nodes with at least one edge

import graphistry
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst').upload()
assert g1._dataset_id, "Graph should have uploaded"

shape_df = g1.chain_remote_shape([n(), e(), n()])
print(shape_df)

Example: Compute number of nodes with at least one edge, with implicit upload, and force GPU mode

import graphistry
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')

shape_df = g1.chain_remote_shape([n(), e(), n()], engine='cudf')
print(shape_df)

Return type:: DataFrame

collapse(node, attribute, column, self_edges=False, unwrap=False, verbose=False)#

Topology-aware collapse by given column attribute starting at node

Traverses directed graph from start node node and collapses clusters of nodes that share the same property so that topology is preserved.

Parameters:

node (str | int) – start node to begin traversal
attribute (str | int) – the given attribute to collapse over within column
column (str | int) – the column of nodes DataFrame that contains attribute to collapse over
self_edges (bool) – whether to include self edges in the collapsed graph
unwrap (bool) – whether to unwrap the collapsed graph into a single node
verbose (bool) – whether to print out collapse summary information

create_index(kind, *, column=None, name=None, engine='auto')#

Build a GFQL physical index for O(degree) seeded traversal.

Parameters:

kind – ‘edge_out_adj’ (forward hops), ‘edge_in_adj’ (reverse hops), or ‘node_id’ (node lookup)
column – column to index (defaults to the binding for the kind, e.g. the edge source column)
name – optional custom index name (defaults to ‘kind:column’)
engine – ‘auto’ | ‘pandas’ | ‘cudf’ | ‘polars’ — array backend for the index

Returns:

new Plottable with the index resident (original is untouched)

Example

g2 = g.create_index('edge_out_adj')
g2.gfql([n({'id': 0}), e_forward(hops=2)], index_policy='use')

See Seeded Traversal Indexes (CSR Adjacency) for policies, cost gating, and benchmarks.

drop_index(kind=None)#: Drop one resident GFQL index (by kind) or all (kind=None). Idempotent; returns a new Plottable.

drop_nodes(nodes)#: return g with any nodes/edges involving the node id series removed

filter_edges_by_dict(*args, **kwargs)#: filter edges to those that match all values in filter_dict

filter_nodes_by_dict(*args, **kwargs)#: filter nodes to those that match all values in filter_dict

get_degrees(col='degree', degree_in='degree_in', degree_out='degree_out')#

Decorate nodes table with degree info

Edges must be dataframe-like: pandas, cudf, …

Parameters determine generated column names

Warning: Self-cycles are currently double-counted. This may change.

Example: Generate degree columns

edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
g = graphistry.edges(edges, 's', 'd')
print(g._nodes)  # None
g2 = g.get_degrees()
print(g2._nodes)  # pd.DataFrame with 'id', 'degree', 'degree_in', 'degree_out'

Parameters:

col (str)
degree_in (str)
degree_out (str)

get_indegrees(col='degree_in')#

See get_degrees

Parameters:: col (str)

get_outdegrees(col='degree_out')#

See get_degrees

Parameters:: col (str)

get_topological_levels(level_col='level', allow_cycles=True, warn_cycles=True, remove_self_loops=True)#

Example:

Parameters:

level_col (str)
allow_cycles (bool)
warn_cycles (bool)
remove_self_loops (bool)

Return type:

gfql(*args, **kwargs)#

Execute a GFQL query - either a chain or a DAG

Unified entrypoint that automatically detects query type and dispatches to the appropriate execution engine.

Parameters:

query – GFQL query - ASTObject, List[ASTObject], Chain, ASTLet, dict, or supported query string
engine – Execution engine (auto, pandas, cudf)
output – For DAGs, name of binding to return (default: last executed)
policy – Optional policy hooks for external control (preload, postload, precall, postcall phases)
where – Optional same-path constraints for list/Chain queries
language – Optional string-query language selector. Defaults to "cypher" when query is a string.
params – Optional parameter dictionary for string-query compilation
validate – When True, run local preflight validation before execution via g.gfql_validate(...).
shortest_path_backend – Backend for shortestPath execution: "auto" (default), "igraph" (require igraph, raise if missing), "cugraph" (require cugraph, raise if missing), or "bfs" (always use DataFrame BFS). "auto" tries cugraph on CUDF engine, igraph on pandas, falls back to BFS silently.

Returns:

Resulting Plottable

Return type:

Plottable

GFQL physical indexes

param index_policy:: ’off’ (never use indexes), ‘use’ (use resident, cost-gated; default planner behavior), ‘auto’ (build on demand), or ‘force’ (always probe the index). Also accepts index DDL strings (CREATE GFQL INDEX ...) / wire ops as the query — routed to the index registry. See create_index() and Seeded Traversal Indexes (CSR Adjacency).

gfql_explain(query, *, index_policy='use', engine='auto')#

Parameters:

query (GFQLQuery)
index_policy (IndexPolicy)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])

Return type:

GfqlExplainReport

gfql_index_all(engine='auto')#: Convenience: build all GFQL physical indexes (both edge adjacencies + node_id). Returns a new Plottable.

gfql_index_edges(direction='both', engine='auto')#: Convenience: build the edge adjacency index(es) — ‘forward’, ‘reverse’, or ‘both’. Returns a new Plottable.

Run GFQL query remotely.

This is the remote execution version of gfql(). It supports chains, Let/DAG patterns, and Cypher strings.

Parameters:

chain (Chain | List[ASTObject] | ASTLet | Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]] | str) – GFQL query — Chain, List[ASTObject], ASTLet, Dict, or Cypher string (compiled locally before sending).
params (Dict[str, Any] | None) – Optional parameter dict for Cypher string queries (e.g., params={"val": 10} for $val references).
api_token (str | None)
dataset_id (str | None)
output_type (Literal['all', 'nodes', 'edges', 'shape'])
format (Literal['json', 'csv', 'parquet'] | None)
df_export_args (Dict[str, Any] | None)
node_col_subset (List[str] | None)
edge_col_subset (List[str] | None)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])
validate (bool)
persist (bool)
output (str | None)

Return type:

Example:

# Chain (existing)
g.gfql_remote([n(), e(), n()])

# Cypher string with params
g.gfql_remote(
    "MATCH (n) WHERE n.score > $cutoff RETURN n",
    params={"cutoff": 10},
)

# GRAPH constructor
g.gfql_remote("GRAPH { MATCH (a)-[r]->(b) WHERE a.score > 5 }")

See chain_remote() for additional parameter documentation.

Get shape metadata for remote GFQL query execution.

This is the remote shape version of gfql(). Returns metadata about the resulting graph without downloading the full data.

See chain_remote_shape() for detailed documentation (chain_remote_shape is deprecated).

Parameters:

chain (Chain | List[ASTObject] | ASTLet | Dict[str, None | bool | str | float | int | List[Any] | Dict[str, Any]] | str)
api_token (str | None)
dataset_id (str | None)
format (Literal['json', 'csv', 'parquet'] | None)
df_export_args (Dict[str, Any] | None)
node_col_subset (List[str] | None)
edge_col_subset (List[str] | None)
engine (EngineAbstract | Literal['pandas', 'cudf', 'dask', 'dask_cudf', 'polars', 'polars-gpu', 'auto'])
validate (bool)
persist (bool)

Return type:

DataFrame

gfql_validate(*args, **kwargs)#

Validate a GFQL/Cypher query without executing it.

Raises structured GFQL exceptions on validation failures and never dispatches query execution operators.

hop(*args, **kwargs)#

Given a graph and some source nodes, return subgraph of all paths within k-hops from the sources

This can be faster than the equivalent chain([…]) call that wraps it with additional steps

See chain() examples for examples of many of the parameters

keep_nodes(nodes)#: Limit nodes and edges to those selected by parameter nodes For edges, both source and destination must be in nodes Nodes can be a list or series of node IDs, or a dictionary When a dictionary, each key corresponds to a node column, and nodes will be included when all match

materialize_nodes(reuse=True, engine=EngineAbstract.AUTO)#

Generate g._nodes based on g._edges

Uses g._node for node id if exists, else ‘id’

Edges must be dataframe-like: cudf, pandas, …

When reuse=True and g._nodes is not None, use it

Example: Generate nodes

edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
g = graphistry.edges(edges, 's', 'd')
print(g._nodes)  # None
g2 = g.materialize_nodes()
print(g2._nodes)  # pd.DataFrame

Parameters:

reuse (bool)
engine (EngineAbstract | str)

Return type:

prune_self_edges()#

python_remote_g(*args, **kwargs)#

Remotely run Python code on a remote dataset that returns a Plottable

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
format (Optional[FormatType]) – What format to fetch results. Defaults to ‘parquet’.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to ‘all’. Options include ‘nodes’, ‘edges’, ‘all’ (both). For other variants, see python_remote_shape and python_remote_json.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.

Return type:

Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
g2 = g1.python_remote_g(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return g
    ''',
    engine='cudf')
num_edges = len(g2._edges)
print(f'num_edges: {num_edges}')

python_remote_json(*args, **kwargs)#

Remotely run Python code on a remote dataset that returns json

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.

Return type:

Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
obj = g1.python_remote_json(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return {'num_edges': len(g._edges)}
    ''',
    engine='cudf')
num_edges = obj['num_edges']
print(f'num_edges: {num_edges}')

python_remote_table(*args, **kwargs)#

Remotely run Python code on a remote dataset that returns a table

Uses the latest bound _dataset_id, and uploads current dataset if not already bound. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.

Parameters:

code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, will fallback to self._dataset_id. If not defined, will upload current data, store that dataset_id, and run code against that.
format (Optional[FormatType]) – What format to fetch results. Defaults to ‘parquet’.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to ‘table’. Options include ‘table’, ‘nodes’, and ‘edges’.
engine (EngineAbstractType) – Override which run mode GFQL uses. Defaults to ‘auto’ which auto-detects based on DataFrame type. Also accepts ‘pandas’ or ‘cudf’.
run_label (Optional[str]) – Optional label for the run for serverside job tracking.
validate (bool) – Whether to locally test code, and if uploading data, the data. Default true.

Return type:

Any

Example: Upload data and count the results

import graphistry
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry
    .edges(es, source='src', destination='dst')
    .upload()
assert g1._dataset_id is not None, "Successfully uploaded"
edges_df = g1.python_remote_table(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return g._edges
    ''',
    engine='cudf')
num_edges = len(edges_df)
print(f'num_edges: {num_edges}')

search_edges(term, columns=None, case_sensitive=False, regex=False)#: Keep edges where ANY column matches term — see search_nodes().

search_nodes(term, columns=None, case_sensitive=False, regex=False)#: Keep nodes where ANY column matches term (viz-filter L2 inspector semantics: OR across columns; case-insensitive substring default; regex opt-in; string columns always, integer columns iff the term is a numeric literal — floats/dates via explicit columns= on pandas ONLY: cuDF declines them, its float/temporal stringification diverges from pandas). pandas/cuDF native; polars frames raise NotImplementedError (use the cypher search_any op).

show_indexes()#: Return a pandas DataFrame describing resident GFQL indexes (name, kind, column, valid). Empty if none; valid=False marks a stale index after a frame rebind.

to_cudf()#

Convert to GPU mode by converting any defined nodes and edges to cudf dataframes

When nodes or edges are already cudf dataframes, they are left as is

Parameters:: g (Plottable) – Graphistry object
Returns:: Graphistry object
Return type:: Plottable

to_pandas()#

Convert nodes and edges to pandas DataFrames.

Supports all input types: cuDF, Arrow, Polars, Spark, dask, and pandas (identity).

Return type:: Plottable

class graphistry.compute.Contains(pat, case=True, flags=0, na=None, regex=True)#

Bases: ASTPredicate

Parameters:

pat (str)
case (bool)
flags (int)
na (bool | None)
regex (bool)

graphistry.compute.DataFrameT#: alias of Any

class graphistry.compute.DateTimeValue(value, timezone='UTC')#

Bases: TemporalValue

Tagged datetime value with timezone support

Parameters:

value (str)
timezone (str)

as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: Timestamp

classmethod from_datetime(dt)#

Create from Python datetime

Parameters:: dt (datetime)
Return type:: DateTimeValue

classmethod from_pandas_timestamp(ts)#

Create from pandas Timestamp

Parameters:: ts (Timestamp)
Return type:: DateTimeValue

to_json()#

Return dict for tagged temporal value

Return type:: DateTimeWire

class graphistry.compute.DateValue(value)#

Bases: _ScalarTemporalValue

Tagged date value

Parameters:: value (str)

as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: Timestamp

classmethod from_date(d)#

Create from Python date

Parameters:: d (date)
Return type:: DateValue

class graphistry.compute.Duplicated(keep='first')#

Bases: ASTPredicate

Parameters:: keep (Literal['first', 'last', False])

class graphistry.compute.EQ(val)#

Bases: _ScalarTemporalComparison

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)

static op(a, b, /)#: Same as a == b.

safe_scalar_compare: ClassVar[bool] = False#

class graphistry.compute.Endswith(pat, case=True, na=None)#

Bases: _BoundaryStringPredicate

Parameters:

pat (str | tuple)
case (bool)
na (bool | None)

class graphistry.compute.Fullmatch(pat, case=True, flags=0, na=None)#

Bases: _RegexStringPredicate

Parameters:

pat (str)
case (bool)
flags (int)
na (bool | None)

class graphistry.compute.GE(val)#

Bases: _ScalarTemporalComparison

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)

static op(a, b, /)#: Same as a >= b.

class graphistry.compute.GT(val)#

Bases: _ScalarTemporalComparison

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)

static op(a, b, /)#: Same as a > b.

class graphistry.compute.IsAlnum#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsAlpha#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsDecimal#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsDigit#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsIn(options)#

Bases: ASTPredicate

Parameters:: options (List[int | float | str | number | None | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue])

to_json(validate=True)#

Override to handle temporal values in options

Return type:: dict

class graphistry.compute.IsLeapYear#

Bases: _DatetimePropertyPredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsLower#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsMonthEnd#

Bases: _DatetimePropertyPredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsMonthStart#

Bases: _DatetimePropertyPredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsNA#: Bases: ASTPredicate

class graphistry.compute.IsNull#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsNumeric#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsQuarterEnd#

Bases: _DatetimePropertyPredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsQuarterStart#

Bases: _DatetimePropertyPredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsSpace#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsTitle#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsUpper#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsYearEnd#

Bases: _DatetimePropertyPredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.IsYearStart#

Bases: _DatetimePropertyPredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.LE(val)#

Bases: _ScalarTemporalComparison

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)

static op(a, b, /)#: Same as a <= b.

class graphistry.compute.LT(val)#

Bases: _ScalarTemporalComparison

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)

static op(a, b, /)#: Same as a < b.

class graphistry.compute.Match(pat, case=True, flags=0, na=None)#

Bases: _RegexStringPredicate

Parameters:

pat (str)
case (bool)
flags (int)
na (bool | None)

class graphistry.compute.NE(val)#

Bases: _ScalarTemporalComparison

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)

static op(a, b, /)#: Same as a != b.

safe_scalar_compare: ClassVar[bool] = False#

class graphistry.compute.NotNA#: Bases: ASTPredicate

class graphistry.compute.NotNull#

Bases: _CallablePredicate

static predicate(s)#

Parameters:: s (Any)
Return type:: Any

class graphistry.compute.Startswith(pat, case=True, na=None)#

Bases: _BoundaryStringPredicate

Parameters:

pat (str | tuple)
case (bool)
na (bool | None)

class graphistry.compute.TemporalValue#

Bases: ABC

Base class for temporal values with tagging support

abstract as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: Any

abstract to_json()#

Serialize to JSON-compatible dictionary

Return type:: DateTimeWire | DateWire | TimeWire

class graphistry.compute.TimeValue(value)#

Bases: _ScalarTemporalValue

Tagged time value

Parameters:: value (str)

as_pandas_value()#

Convert to pandas-compatible value for comparison

Return type:: time

classmethod from_time(t)#

Create from Python time

Parameters:: t (time)
Return type:: TimeValue

graphistry.compute.between(lower, upper, inclusive=True)#

Return whether a given value is between a lower and upper threshold

Parameters:

lower (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
upper (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
inclusive (bool)

Return type:

Between

graphistry.compute.call(function, params=None)#

Create a type-safe Call operation for GFQL.

Type-checked overloads ensure parameter correctness for each method.

Args:

function: Name of the Plottable method to call params: Dictionary of parameters matching the method signature

Returns:

ASTCall object for use in gfql() or gfql_remote()

Example:

>>> call('hop', {'hops': 2, 'direction': 'forward'})
>>> call('umap', {'n_neighbors': 15, 'engine': 'cuml'})

Parameters:

function (str)
params (Dict[str, Any] | HopParams | UmapParams | HypergraphParams | GetDegreesParams | FilterEdgesByDictParams | FilterNodesByDictParams | ComputeCugraphParams | ComputeIgraphParams | EncodePointColorParams | EncodeEdgeColorParams | EncodePointSizeParams | EncodeTextColumnParams | EncodePointIconParams | EncodeEdgeIconParams | EncodeAxisParams | LayoutIgraphParams | LayoutCugraphParams | LayoutGraphvizParams | RingContinuousLayoutParams | RingCategoricalLayoutParams | TimeRingLayoutParams | Fa2LayoutParams | GroupInABoxLayoutParams | GetIndegreesParams | GetOutdegreesParams | MaterializeNodesParams | PruneSelfEdgesParams | CollapseParams | DropNodesParams | KeepNodesParams | GetTopologicalLevelsParams | NameParams | DescriptionParams | RowsParams | SelectParams | WithParams | WhereRowsParams | OrderByParams | SkipParams | LimitParams | DistinctParams | UnwindParams | GroupByParams | None)

Return type:

ASTCall

graphistry.compute.col#: alias of StepColumnRef

graphistry.compute.compare#: alias of WhereComparison

graphistry.compute.contains(pat, case=True, flags=0, na=None, regex=True)#

Return whether a given pattern or regex is contained within a string

Parameters:

pat (str)
case (bool)
flags (int)
na (bool | None)
regex (bool)

Return type:

Contains

graphistry.compute.distinct()#

Create a DISTINCT operation for GFQL row pipelines.

Return type:: ASTCall

graphistry.compute.duplicated(keep='first')#

Return whether a given value is duplicated

Parameters:: keep (Literal['first', 'last', False])
Return type:: Duplicated

graphistry.compute.e#: alias of ASTEdgeUndirected

graphistry.compute.e_forward#: alias of ASTEdgeForward

graphistry.compute.e_reverse#: alias of ASTEdgeReverse

graphistry.compute.e_undirected#: alias of ASTEdgeUndirected

graphistry.compute.endswith(pat, case=True, na=None)#

Return whether a given pattern or tuple of patterns is at the end of a string.

Parameters:

pat (str | tuple) – Pattern (str) or tuple of patterns to match at end of string. When tuple, returns True if the string ends with ANY pattern (OR logic).
case (bool) – If True, case-sensitive matching (default: True).
na (bool | None) – Fill value for missing values (default: None).

Returns:

Endswith predicate.

Return type:

Endswith

Examples#

>>> n({"email": endswith(".com")})
>>> n({"email": endswith(".COM", case=False)})
>>> n({"filename": endswith((".txt", ".csv"))})
>>> n({"filename": endswith((".TXT", ".CSV"), case=False)})

graphistry.compute.eq(val)#

Return whether a given value is equal to a threshold.

Accepts: - Numeric values (int, float, np.number, bool) - Temporal values (datetime/date/time/pd.Timestamp, temporal wire dicts, TemporalValue) - Strings (e.g., eq(“active”))

For null checks, use isna() / notna() instead.

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
Return type:: EQ

graphistry.compute.fullmatch(pat, case=True, flags=0, na=None)#

Return whether a given pattern matches the entire string

Unlike match() which matches from the start, fullmatch() requires the pattern to match the entire string. This is useful for exact validation of formats like emails, phone numbers, or IDs.

Args:

pat: Regular expression pattern to match against entire string case: If True, case-sensitive matching (default: True) flags: Regex flags (e.g., re.IGNORECASE, re.MULTILINE) na: Fill value for missing values (default: None)

Returns:

Fullmatch predicate

Examples:

>>> # Exact digit match
>>> n({"code": fullmatch(r"\d{3}")})  # Matches "123" but not "123abc"
>>>
>>> # Case-insensitive email validation
>>> n({"email": fullmatch(r"[a-z]+@[a-z]+\.com", case=False)})
>>>
>>> # With regex flags
>>> import re
>>> n({"id": fullmatch(r"[A-Z]{3}-\d{4}", flags=re.IGNORECASE)})

Parameters:

pat (str)
case (bool)
flags (int)
na (bool | None)

Return type:

Fullmatch

graphistry.compute.ge(val)#

Return whether a given value is greater than or equal to a threshold

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
Return type:: GE

graphistry.compute.group_by(keys, aggregations, key_prefixes=None)#

Create grouped aggregation operation for row pipelines.

Parameters:

keys (Iterable[str])
aggregations (Iterable[Sequence[Any]])
key_prefixes (Iterable[str] | None)

Return type:

graphistry.compute.gt(val)#

Return whether a given value is greater than a threshold

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
Return type:: GT

graphistry.compute.is_in(options)#

Parameters:: options (List[int | float | str | number | None | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue])
Return type:: IsIn

graphistry.compute.is_leap_year()#

Return whether a given value is a leap year

Return type:: IsLeapYear

graphistry.compute.is_month_end()#

Return whether a given value is a month end

Return type:: IsMonthEnd

graphistry.compute.is_month_start()#

Return whether a given value is a month start

Return type:: IsMonthStart

graphistry.compute.is_quarter_end()#

Return whether a given value is a quarter end

Return type:: IsQuarterEnd

graphistry.compute.is_quarter_start()#

Return whether a given value is a quarter start

Return type:: IsQuarterStart

graphistry.compute.is_year_end()#

Return whether a given value is a year end

Return type:: IsYearEnd

graphistry.compute.is_year_start()#

Return whether a given value is a year start

Return type:: IsYearStart

graphistry.compute.isalnum()#

Return whether a given string is alphanumeric

Return type:: IsAlnum

graphistry.compute.isalpha()#

Return whether a given string is alphabetic

Return type:: IsAlpha

graphistry.compute.isdecimal()#

Return whether a given string is decimal

Return type:: IsDecimal

graphistry.compute.isdigit()#

Return whether a given string is numeric

Return type:: IsDigit

graphistry.compute.islower()#

Return whether a given string is lowercase

Return type:: IsLower

graphistry.compute.isna()#

Return whether a given value is NA

Return type:: IsNA

graphistry.compute.isnull()#

Return whether a given string is null

Return type:: IsNull

graphistry.compute.isnumeric()#

Return whether a given string is numeric

Return type:: IsNumeric

graphistry.compute.isspace()#

Return whether a given string is whitespace

Return type:: IsSpace

graphistry.compute.istitle()#

Return whether a given string is title case

Return type:: IsTitle

graphistry.compute.isupper()#

Return whether a given string is uppercase

Return type:: IsUpper

graphistry.compute.le(val)#

Return whether a given value is less than or equal to a threshold

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
Return type:: LE

graphistry.compute.let#: alias of ASTLet

graphistry.compute.limit(value)#

Create a LIMIT operation for GFQL row pipelines.

Parameters:: value (Any)
Return type:: ASTCall

graphistry.compute.lt(val)#

Return whether a given value is less than a threshold

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
Return type:: LT

graphistry.compute.match(pat, case=True, flags=0, na=None)#

Return whether a given pattern is at the start of a string

Parameters:

pat (str)
case (bool)
flags (int)
na (bool | None)

Return type:

Match

graphistry.compute.n#: alias of ASTNode

graphistry.compute.ne(val)#

Return whether a given value is not equal to a threshold.

Follows openCypher/SQL three-valued logic: a NULL/NA cell is NOT a match (null <> x is unknown), so null rows are excluded — the same as eq and the other comparisons.

Parameters:: val (int | float | number | Timestamp | datetime | date | time | DateTimeWire | DateWire | TimeWire | TemporalValue | str)
Return type:: NE

graphistry.compute.notna()#

Return whether a given value is not NA

Return type:: NotNA

graphistry.compute.notnull()#

Return whether a given string is not null

Return type:: NotNull

graphistry.compute.order_by(keys)#

Create an ORDER BY operation for GFQL row pipelines.

Parameters:: keys (Iterable[Tuple[Any, str]])
Return type:: ASTCall

graphistry.compute.ref#: alias of ASTRef

graphistry.compute.remote#: alias of ASTRemoteGraph

graphistry.compute.return_(items)#

Python-safe alias for Cypher RETURN semantics.

Parameters:: items (Iterable[str | Tuple[str, Any]])
Return type:: ASTCall

graphistry.compute.rows(table='nodes', source=None, alias_endpoints=None, binding_ops=None)#

Create a row-source operation for GFQL row pipelines.

This operation converts graph outputs into a row table context (nodes or edges), optionally constrained to a named alias/source tag.

Parameters:

table (str)
source (str | None)
alias_endpoints (Dict[str, str] | None)
binding_ops (List[Dict[str, Any]] | None)

Return type:

graphistry.compute.select(items)#

Create a row projection operation for GFQL row pipelines.

Parameters:: items (Iterable[str | Tuple[str, Any]])
Return type:: ASTCall

graphistry.compute.skip(value)#

Create a SKIP operation for GFQL row pipelines.

Parameters:: value (Any)
Return type:: ASTCall

graphistry.compute.startswith(pat, case=True, na=None)#

Return whether a given pattern or tuple of patterns is at the start of a string.

Parameters:

pat (str | tuple) – Pattern (str) or tuple of patterns to match at start of string. When tuple, returns True if the string starts with ANY pattern (OR logic).
case (bool) – If True, case-sensitive matching (default: True).
na (bool | None) – Fill value for missing values (default: None).

Returns:

Startswith predicate.

Return type:

Startswith

Examples#

>>> n({"name": startswith("John")})
>>> n({"name": startswith("john", case=False)})
>>> n({"filename": startswith(("test_", "demo_"))})
>>> n({"filename": startswith(("TEST", "DEMO"), case=False)})

graphistry.compute.temporal_value_from_json(d)#

Factory function to create temporal value from JSON dict

Parameters:: d (Dict[str, Any])
Return type:: TemporalValue

graphistry.compute.unwind(expr, as_='value')#

Create a constrained UNWIND row expansion operation.

Parameters:

expr (Any)
as_ (str)

Return type:

graphistry.compute.where_rows(filter_dict=None, expr=None)#

Create a row-table WHERE operation using filter_dict and/or expression.

Parameters:

filter_dict (Dict[str, Any] | None)
expr (str | None)

Return type:

graphistry.compute.with_(items, extend=False)#

Python-safe alias for Cypher WITH row projection semantics.

Parameters:

items (Iterable[str | Tuple[str, Any]])
extend (bool)

Return type: