GFQL Remote Mode

You can run GFQL queries and GPU Python remotely, such as when data is already remote, when it grows large, or when you want to use a remote GPU.

Basic Usage

Run chain remotely and fetch results

from graphistry import n, e

# g1: an existing Plottable with nodes and edges already bound
g2 = g1.chain_remote([n(), e(), n()])
assert len(g2._nodes) <= len(g1._nodes)

Method chain_remote runs a chain remotely and fetches the computed graph. It accepts:

  • chain: Sequence of graph node and edge matchers (ASTObject instances).

  • output_type: Defaulting to “all”, whether to return the nodes (‘nodes’), edges (‘edges’), or both. See chain_remote_shape to return only metadata.

  • node_col_subset: Optionally limit which node attributes are returned to an allowlist.

  • edge_col_subset: Optionally limit which edge attributes are returned to an allowlist.

  • engine: Optional execution engine. Usually left unset, which defaults to ‘auto’. Use ‘cudf’ for GPU acceleration and ‘pandas’ for CPU.

  • validate: Defaulting to True, whether to validate the query and data.
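The parameters above can be gathered into one keyword dictionary and checked before a request is sent. This is a minimal sketch: chain_remote_options is a hypothetical helper, not part of PyGraphistry, and its allowed values come only from the list above, so the library itself may accept or validate values differently.

```python
# Hypothetical helper: collects chain_remote's optional parameters in one place
# and rejects values outside those documented above before any request is made.
OUTPUT_TYPES = {"all", "nodes", "edges"}
ENGINES = {"auto", "cudf", "pandas"}


def chain_remote_options(output_type="all", node_col_subset=None,
                         edge_col_subset=None, engine="auto", validate=True):
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"output_type must be one of {sorted(OUTPUT_TYPES)}, got {output_type!r}")
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {sorted(ENGINES)}, got {engine!r}")
    opts = {"output_type": output_type, "engine": engine, "validate": validate}
    # Only include column allowlists when given, so the library defaults apply otherwise
    if node_col_subset is not None:
        opts["node_col_subset"] = list(node_col_subset)
    if edge_col_subset is not None:
        opts["edge_col_subset"] = list(edge_col_subset)
    return opts


print(chain_remote_options(output_type="nodes", node_col_subset=["id", "time"]))
```

Usage, assuming g1 and the n, e imports from the example above: g2 = g1.chain_remote([n(), e(), n()], **chain_remote_options(output_type="nodes")).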

Manual CPU/GPU engine selection

By default, GFQL will decide which engine to use based on workload characteristics like the dataset size. You can override this default by specifying which engine to use.

GPU

Run on GPU remotely and fetch results

from graphistry import n, e
g2 = g1.chain_remote([n(), e(), n()], engine='cudf')
assert len(g2._nodes) <= len(g1._nodes)

CPU

Run on CPU remotely and fetch results

from graphistry import n, e
g2 = g1.chain_remote([n(), e(), n()], engine='pandas')
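If you prefer to make the override in code, for example keyed on a size you already know locally, a small policy function can pick the engine string. This is a sketch under stated assumptions: choose_engine and its threshold are hypothetical placeholders, not GFQL's actual 'auto' heuristic, and the threshold should be tuned for your own workload.

```python
# Hypothetical user-side policy; GFQL's own 'auto' logic is not reproduced here.
# The threshold is an arbitrary placeholder, not a recommended value.
GPU_EDGE_THRESHOLD = 1_000_000


def choose_engine(num_edges, threshold=GPU_EDGE_THRESHOLD):
    """Return 'cudf' for graphs at or above the threshold, else 'pandas'."""
    return "cudf" if num_edges >= threshold else "pandas"


print(choose_engine(5_000), choose_engine(20_000_000))
```

For example, g1.chain_remote([n(), e(), n()], engine=choose_engine(len(g1._edges))) when g1 has local edges bound. For a graph bound only by dataset_id, g1._edges is None, and leaving engine at its default is simpler.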

Explicit uploads

Explicit uploads via upload() bind the field Plottable::dataset_id, so subsequent remote calls know to skip re-uploading. Consistently using explicit uploads can make larger codebases more predictable.

from graphistry import n, e
g2 = g1.upload()
assert g2._dataset_id is not None, "Uploading sets `dataset_id` for subsequent calls"

g3a = g2.chain_remote([n()])
g3b = g2.chain_remote([n(), e(), n()])
assert len(g3a._nodes) >= len(g3b._nodes)
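The skip-reupload behavior in the example above can be made explicit with a small guard that uploads only when no dataset is bound yet. A minimal sketch: ensure_uploaded is a hypothetical helper, and it relies only on the _dataset_id field and upload() method shown in that example.

```python
def ensure_uploaded(g):
    """Upload g only if it is not already bound to a remote dataset.

    Hypothetical helper: reads the _dataset_id field that upload() sets.
    """
    if getattr(g, "_dataset_id", None) is None:
        # No remote copy yet: upload and return the newly bound Plottable
        return g.upload()
    # Already bound: reuse it as-is, avoiding a second upload
    return g
```

Usage: g2 = ensure_uploaded(g1), then g2.chain_remote([n()]) as before. Calling ensure_uploaded(g2) again returns g2 unchanged.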

Bind to existing remote data

If data is already uploaded and your user has access to it, such as from a previous session or shared by another user, you can bind it to a local Plottable for remote access.

import graphistry
from graphistry import n, e

g1 = graphistry.bind(dataset_id='abc123')
assert g1._nodes is None, "Binding does not fetch data"

connected_graph_g = g1.chain_remote([n(), e()])
connected_nodes_df = connected_graph_g._nodes
print(connected_nodes_df.shape)

Download less

You may not need to download all, or any, of your results. Downloading less can significantly speed up execution.
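To get a feel for the savings, the sketch below compares the serialized size of a full node table with a two-column subset. It is an illustration only: the table is made up, and JSON serves as a rough proxy because the actual transfer format may differ.

```python
import pandas as pd

# Made-up node table: an id, a numeric time column, and a wide text column
nodes = pd.DataFrame({
    "id": range(1000),
    "time": range(1000),
    "description": ["x" * 200] * 1000,
})

full_size = len(nodes.to_json(orient="records"))
# Keep only the needed columns, similar in spirit to node_col_subset
subset_size = len(nodes[["id", "time"]].to_json(orient="records"))

print(f"full: {full_size} chars, subset: {subset_size} chars")
```

Here the wide description column dominates, so dropping it shrinks the payload several-fold; real savings depend on which columns your data carries.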

Return only nodes

g2a = g1.chain_remote([n(), e(), n()], output_type="nodes")

Return only nodes and specific columns

cols = [g1._node, 'time']
g2b = g1.chain_remote(
  [n(), e(), n()],
  output_type="nodes",
  node_col_subset=cols)
assert len(g2b._nodes.columns) == len(cols)

Return only edges

g2a = g1.chain_remote([n(), e(), n()], output_type="edges")

Return only edges and specific columns

cols = [g1._source, g1._destination, 'time']
g2b = g1.chain_remote([n(), e(), n()],
  output_type="edges",
  edge_col_subset=cols)
assert len(g2b._edges.columns) == len(cols)
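Both column examples above list the bound key columns (g1._node, or g1._source and g1._destination) alongside the extra attribute. A small helper can build such allowlists without repeating the keys by hand. This is a sketch: edge_cols_with_keys is hypothetical, it only reads the _source and _destination fields, and a SimpleNamespace stands in for a real Plottable in the demo.

```python
import types


def edge_cols_with_keys(g, extra_cols):
    """Return g's source and destination columns followed by extra_cols, without duplicates.

    Hypothetical helper; it only reads the _source and _destination fields.
    """
    keys = [g._source, g._destination]
    # dict.fromkeys drops repeats while keeping first-seen order
    return list(dict.fromkeys(keys + list(extra_cols)))


# Stand-in for a Plottable with src/dst bindings (illustration only)
g_stub = types.SimpleNamespace(_source="src", _destination="dst")
print(edge_cols_with_keys(g_stub, ["time", "src"]))
```

With a real graph, this could be passed as edge_col_subset=edge_cols_with_keys(g1, ['time']).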

Return metadata but not the actual graph

from graphistry import n, e
shape_df = g1.chain_remote_shape([n(), e(), n()])
assert len(shape_df) == 2
print(shape_df)

Remote Python

You can also run full GPU Python tasks remotely, such as for more complicated code, or when you want the server itself to fetch data, such as from a database.

Run remote Python on the current graph

import graphistry
from graphistry import n, e

# Fully self-contained so it can be transferred to the server
def my_remote_trim_graph_task(g):

    # Trick: You can also put database fetch calls here instead of using 'g'!
    return (g
        .nodes(g._nodes[:10])
        .edges(g._edges[:10])
    )

# Upload any local graph data to the remote server
g2 = g1.upload()

g3 = g2.chain_remote_python(my_remote_trim_graph_task)

assert len(g3._nodes) == 10
assert len(g3._edges) == 10
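Because the task is transferred to the server, it should not rely on module-level names or closure variables from your local session. As a rough local pre-flight check, Python's inspect.getclosurevars can list the globals and closure variables a function reads. This is a heuristic sketch: find_external_refs is a hypothetical helper, not part of PyGraphistry; it does not inspect nested functions, and on some Python versions an attribute name that happens to match a module-level name may also be reported.

```python
import inspect


def find_external_refs(fn):
    """List names fn reads from module scope or an enclosing closure.

    Hypothetical pre-flight heuristic: builtins such as len() are allowed,
    and names used only inside nested functions are not inspected.
    """
    refs = inspect.getclosurevars(fn)
    return sorted(set(refs.globals) | set(refs.nonlocals))


# Same task as above; it only uses its argument g
def my_remote_trim_graph_task(g):
    return (g
        .nodes(g._nodes[:10])
        .edges(g._edges[:10])
    )


print(find_external_refs(my_remote_trim_graph_task))
```

A self-contained task like this one should yield an empty list. A task that reads, say, a module-level LIMIT constant would report it, signalling that the value should move inside the function.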

Run Python on an existing graph, return a table

import graphistry

g = graphistry.bind(dataset_id='ds-abc-123')

def first_n_edges(g):
    return g._edges[:10]

some_edges_df = g.remote_python_table(first_n_edges)

assert len(some_edges_df) == 10
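Before sending a table task to the server, it can help to dry-run it against a small local stand-in. A minimal sketch, assuming the task only reads g._edges: types.SimpleNamespace stands in for a real Plottable here, and the edge data is made up.

```python
import types

import pandas as pd


def first_n_edges(g):
    return g._edges[:10]


# Hypothetical local stand-in: only the _edges attribute the task reads
local_g = types.SimpleNamespace(
    _edges=pd.DataFrame({"src": range(25), "dst": range(1, 26)})
)

result = first_n_edges(local_g)
# A table task is expected to hand back a DataFrame
assert isinstance(result, pd.DataFrame)
print(len(result))
```

If the dry run returns something other than a DataFrame, remote_python_table is probably the wrong call for that task.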

Run Python on an existing graph, return JSON

import graphistry

g = graphistry.bind(dataset_id='ds-abc-123')

def first_n_edges_shape(g):
    return {'num_edges': len(g._edges[:10])}

obj = g.remote_python_json(first_n_edges_shape)

assert obj['num_edges'] == 10
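A common pitfall with JSON-returning tasks is returning values the standard json module cannot encode, such as sets, DataFrames, or NumPy integers. Assuming the result travels back as standard JSON, a local round trip can catch this early. This is a sketch: check_json_task is a hypothetical helper, and the SimpleNamespace graph with list-based edges is a made-up stand-in.

```python
import json
import types


def first_n_edges_shape(g):
    return {'num_edges': len(g._edges[:10])}


def check_json_task(fn, local_g):
    """Run fn locally and confirm its result survives a JSON round trip.

    Hypothetical helper; json.dumps raises TypeError for values such as
    sets, pandas DataFrames, or NumPy integers.
    """
    result = fn(local_g)
    return json.loads(json.dumps(result))


# Hypothetical stand-in graph: a plain list is enough for len() and slicing
local_g = types.SimpleNamespace(_edges=list(range(25)))
print(check_json_task(first_n_edges_shape, local_g))
```

Converting such values to plain Python types (for example with int() or list()) inside the task before returning usually resolves the error.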