GFQL Remote Mode#

You can run GFQL queries and GPU Python remotely, for example when the data is already remote, has grown too large to handle locally, or you want to use a remote GPU.

Basic Usage#

Run chain remotely and fetch results#

from graphistry import n, e
g2 = g1.gfql_remote([n(), e(), n()])
assert len(g2._nodes) <= len(g1._nodes)

Note

Collections are visualization URL settings; apply them after GFQL results (for example, g2.collections(...)). The GFQL remote/upload APIs do not accept collections payloads yet.

Method chain_remote runs a chain remotely and fetches the computed graph.

gfql_remote() accepts the same input types as local gfql():

Chain / List[ASTObject]: Native GFQL chain syntax (as above).
Cypher string: Compiled locally, sent as wire-protocol JSON.
ASTLet / Let dict: DAG patterns with named bindings.

# Cypher string (compiled locally, sent as Chain wire format)
g2 = g1.gfql_remote("MATCH (a)-[r]->(b) WHERE a.score > 10 RETURN a, b")

# Cypher string with params
g2 = g1.gfql_remote(
    "MATCH (n) WHERE n.score > $cutoff RETURN n",
    params={"cutoff": 10},
)

# GRAPH constructor (compiled locally, sent as Chain wire format)
g2 = g1.gfql_remote("GRAPH { MATCH (a)-[r]->(b) WHERE a.score > 10 }")

# Multi-stage pipeline (compiled locally, sent as Let wire format)
g2 = g1.gfql_remote(
    "GRAPH g1 = GRAPH { MATCH (a)-[r]->(b) WHERE a.score > 10 } "
    "GRAPH g2 = GRAPH { USE g1 CALL graphistry.degree.write() } "
    "USE g2 MATCH (n) RETURN n.id, n.degree ORDER BY n.degree DESC"
)

Method gfql_remote runs the query remotely and fetches the computed graph.

chain: GFQL query — Chain, List[ASTObject], ASTLet, Dict, or Cypher string.
output_type: Which parts of the result to return: nodes ('nodes'), edges ('edges'), or both ('all', the default). See gfql_remote_shape to return only metadata.
node_col_subset: Optionally limit which node attributes are returned to an allowlist.
edge_col_subset: Optionally limit which edge attributes are returned to an allowlist.
engine: Optional execution engine. Typically left unset, which defaults to 'auto'. Use 'cudf' for GPU acceleration and 'pandas' for CPU.
validate: Whether to validate the query and data. Defaults to True.

Note

A public GraphSchema bound with g.bind(schema=schema) is used by local validation APIs such as g.gfql_validate(...) and g.gfql(..., validate=True). That schema surface is experimental in this release, and gfql_remote(...) does not currently send the schema object to the server. If you want declared-schema checks before a remote run, call g.gfql_validate(query) locally first, then call g.gfql_remote(query). Remote schema transport is tracked as a follow-on capability.
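
The validate-locally-then-run-remotely sequence described in the note can be wrapped in a small helper. This is a minimal sketch: the helper name validate_then_run_remote is hypothetical, and it assumes g.gfql_validate(query) raises an exception on an invalid query. If it instead returns a report in your version, inspect that report before continuing.

```python
def validate_then_run_remote(g, query, **remote_kwargs):
    """Validate `query` locally, then run it remotely.

    Assumption: g.gfql_validate(query) raises on failure, so the
    remote call below is only reached for queries that pass.
    """
    # Local check against the schema bound via g.bind(schema=...)
    g.gfql_validate(query)
    # The schema object itself is not sent to the server
    return g.gfql_remote(query, **remote_kwargs)
```

Extra keyword arguments such as engine or output_type are passed through unchanged to gfql_remote.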

Manual CPU, GPU engine selection#

By default, GFQL will decide which engine to use based on workload characteristics like the dataset size. You can override this default by specifying which engine to use.

GPU#

Run on GPU remotely and fetch results

from graphistry import n, e
g2 = g1.gfql_remote([n(), e(), n()], engine='cudf')
assert len(g2._nodes) <= len(g1._nodes)

CPU#

Run on CPU remotely and fetch results

from graphistry import n, e
g2 = g1.gfql_remote([n(), e(), n()], engine='pandas')
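
Because a mistyped engine name would otherwise only surface after a round trip to the server, it can help to check it locally first. The following is a sketch under stated assumptions: the helper run_remote and the tuple KNOWN_ENGINES are hypothetical, and the allowed values are limited to the three named on this page even though the library may accept others.

```python
# Engine names mentioned on this page; the library may accept others.
KNOWN_ENGINES = ("auto", "cudf", "pandas")

def run_remote(g, query, engine="auto"):
    """Call g.gfql_remote after checking the engine name locally."""
    if engine not in KNOWN_ENGINES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {KNOWN_ENGINES}"
        )
    return g.gfql_remote(query, engine=engine)
```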

Explicit uploads#

Calling upload() explicitly binds the field Plottable::dataset_id, so subsequent remote calls know to skip re-uploading. Always uploading explicitly can make code more predictable in larger codebases.

from graphistry import n, e
g2 = g1.upload()
assert g2._dataset_id is not None, "Uploading sets `dataset_id` for subsequent calls"

g3a = g2.gfql_remote([n()])
g3b = g2.gfql_remote([n(), e(), n()])
assert len(g3a._nodes) >= len(g3b._nodes)
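
In a larger codebase, the "upload once, then reuse" pattern can be made explicit with a guard that uploads only when no dataset is bound yet. This is a minimal sketch: ensure_uploaded is a hypothetical helper name, and it relies only on the _dataset_id attribute and upload() method shown above.

```python
def ensure_uploaded(g):
    """Return a graph bound to remote data, uploading only when needed."""
    if getattr(g, "_dataset_id", None) is not None:
        return g  # already bound to a remote dataset; skip the upload
    return g.upload()  # upload() returns a graph with _dataset_id set
```

Calling ensure_uploaded(g1) at the start of a pipeline means later gfql_remote calls can all reuse the same uploaded dataset.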

Bind to existing remote data#

If data is already uploaded and your user has access to it, such as from a previous session or shared from another user, you can bind it to a local Plottable for remote access.

import graphistry
from graphistry import n, e

g1 = graphistry.bind(dataset_id='abc123')
assert g1._nodes is None, "Binding does not fetch data"

connected_graph_g = g1.gfql_remote([n(), e()])
connected_nodes_df = connected_graph_g._nodes
print(connected_nodes_df.shape)

Download less#

You may not need to download all – or any – of your results, which can significantly speed up execution.

Return only nodes#

g1.gfql_remote([n(), e(), n()], output_type="nodes")

Return only nodes and specific columns#

cols = [g1._node, 'time']
g2b = g1.gfql_remote(
  [n(), e(), n()],
  output_type="nodes",
  node_col_subset=cols)
assert len(g2b._nodes.columns) == len(cols)

Return only edges#

g2a = g1.gfql_remote([n(), e(), n()], output_type="edges")

Return only edges and specific columns#

cols = [g1._source, g1._destination, 'time']
g2b = g1.gfql_remote([n(), e(), n()],
  output_type="edges",
  edge_col_subset=cols)
assert len(g2b._edges.columns) == len(cols)
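
The four variants above differ only in output_type and in which column allowlist is passed, so they can be folded into one helper. This is a sketch, not part of the Graphistry API: fetch_table is a hypothetical name, and it uses only the gfql_remote parameters shown on this page.

```python
def fetch_table(g, query, kind, cols):
    """Run `query` remotely and return one table restricted to `cols`.

    kind: "nodes" or "edges", forwarded as output_type.
    """
    cols = list(cols)
    if kind == "nodes":
        out = g.gfql_remote(query, output_type="nodes", node_col_subset=cols)
        return out._nodes
    if kind == "edges":
        out = g.gfql_remote(query, output_type="edges", edge_col_subset=cols)
        return out._edges
    raise ValueError(f"kind must be 'nodes' or 'edges', got {kind!r}")
```

For example, fetch_table(g1, [n(), e(), n()], "edges", [g1._source, g1._destination, 'time']) would correspond to the last example above.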

Return metadata but not the actual graph#

from graphistry import n, e
shape_df = g1.gfql_remote_shape([n(), e(), n()])
assert len(shape_df) == 2
print(shape_df)

Remote Python#

You can also run full GPU Python tasks remotely, such as for more complicated code, or if you want the server itself to perform fetching such as from a database.

Run remote Python on the current graph#

import graphistry
from graphistry import n, e

# Fully self-contained so it can be transferred
def my_remote_trim_graph_task(g):

    # Trick: You can also put database fetch calls here instead of using 'g'!
    return (g
        .nodes(g._nodes[:10])
        .edges(g._edges[:10])
    )

# Upload any local graph data to the remote server
g2 = g1.upload()

g3 = g2.python_remote_g(my_remote_trim_graph_task)

assert len(g3._nodes) == 10
assert len(g3._edges) == 10
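
The comment in the example above notes that the task must be fully self-contained so it can be transferred. One rough way to check this before calling python_remote_g is to list the names a function reads from outside its own body. This sketch uses only the standard library's inspect.getclosurevars. The helper name outside_names is hypothetical, the check is a heuristic rather than part of the Graphistry API, and it says nothing about how the library actually ships the function.

```python
import inspect

def outside_names(fn):
    """Names `fn` reads from enclosing scopes or module globals."""
    cv = inspect.getclosurevars(fn)
    return sorted(set(cv.nonlocals) | set(cv.globals))

LIMIT = 10

def leaky_task(g):
    # Depends on LIMIT, which is defined outside the function body
    return g._edges[:LIMIT]

def self_contained_task(g):
    limit = 10  # everything the task needs is defined inside it
    return g._edges[:limit]

print(outside_names(leaky_task))           # ['LIMIT']
print(outside_names(self_contained_task))  # []
```

An empty list suggests the function carries everything it needs. A non-empty list points at values to move inside the function body.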

Run Python on an existing graph, return a table#

import graphistry

g = graphistry.bind(dataset_id='ds-abc-123')

def first_n_edges(g):
    return g._edges[:10]

some_edges_df = g.python_remote_table(first_n_edges)

assert len(some_edges_df) == 10

Run Python on an existing graph, return JSON#

import graphistry

g = graphistry.bind(dataset_id='ds-abc-123')

def first_n_edges_shape(g):
    return {'num_edges': len(g._edges[:10])}

obj = g.python_remote_json(first_n_edges_shape)

assert obj['num_edges'] == 10
