Tutorial: GPU Python remote mode#

Running GPU Python on remote servers helps in several scenarios: large workloads that benefit from GPU acceleration despite having no local GPU, data that already lives on a remote Graphistry server, and other team and production needs.

The following examples walk through several common scenarios:

  • Uploading data and running Python remotely on it

  • Binding to existing remote data and running Python remotely on it

  • Controlling how much data is returned

  • Controlling CPU vs GPU execution

For typical scenarios, see also the sibling tutorial on running pure GFQL queries remotely. When viable, we recommend sticking to GFQL for safety, clarity, and performance reasons.

Setup#

Note: Ensure the remote Python endpoint is enabled on the server and that your user account is flagged for using it.

Imports#

[6]:
import pandas as pd
import graphistry
from graphistry import n, e_undirected, e_forward
graphistry.__version__
[6]:
'0+unknown'
[23]:
graphistry.register(api=3, username='FILL_ME_IN', password='FILL_ME_IN', protocol='https', server='hub.graphistry.com')
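To avoid hardcoding credentials in a notebook, one option is to read them from the environment. The following is a minimal sketch, not part of the Graphistry client: the helper `register_kwargs_from_env` and the environment variable names are hypothetical, while the keyword names match the `graphistry.register()` call above.

```python
import os

def register_kwargs_from_env(env=os.environ):
    """Build keyword arguments for graphistry.register() from a mapping.

    The variable names below are hypothetical; use whatever your deployment defines.
    """
    return {
        'api': 3,
        'username': env['GRAPHISTRY_USERNAME'],
        'password': env['GRAPHISTRY_PASSWORD'],
        'protocol': env.get('GRAPHISTRY_PROTOCOL', 'https'),
        'server': env.get('GRAPHISTRY_SERVER', 'hub.graphistry.com'),
    }

# Usage (requires the graphistry package and valid credentials):
#   import graphistry
#   graphistry.register(**register_kwargs_from_env())
```

A missing username or password raises `KeyError` early, before any network call is attempted.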

Data#

[8]:
e_df = pd.DataFrame({
    's': ['a', 'b', 'c'],
    'd': ['b', 'c', 'd'],
    'v': ['x', 'y', 'z'],
    'u': [2, 4, 6]
})

g = graphistry.edges(e_df, 's', 'd')

Upload data#

We will upload the graph.

See the GFQL remote mode tutorial for how to use g2 = graphistry.bind(dataset_id=my_id) for existing remote data.

[9]:
%%time
g2 = g.upload()

{
    'dataset_id': g2._dataset_id,
    'nodes_file_id': g2._nodes_file_id,
    'edges_file_id': g2._edges_file_id
}
CPU times: user 70.1 ms, sys: 1.24 ms, total: 71.3 ms
Wall time: 2.03 s
[9]:
{'dataset_id': '0a56aa27ec1e4112b1458e960dc6f674',
 'nodes_file_id': None,
 'edges_file_id': '271a00f639a748fcaaaf620437bcd0f2'}

Remotely query the data#

Define your remote function as a top-level function def task(g): ..., or pass in a named function (Callable). If the passed-in Callable is not named task, the Python client will try to rename it to task for you.
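To picture what renaming a function to task involves, here is a standard-library sketch. It is an illustration under assumptions, not the client's actual implementation: `rename_to_task` is a hypothetical helper, and a plain dict stands in for the graph so the example runs without a server.

```python
import ast

def rename_to_task(source: str) -> str:
    """Return `source` with its first top-level function renamed to `task`."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            node.name = 'task'
            break
    else:
        raise ValueError('expected a top-level function definition')
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+

local_source = '''
def count_edges(g):
    return {'num_edges': len(g['edges'])}
'''

remote_source = rename_to_task(local_source)

# Simulate the receiving side: run the source in a fresh namespace and call task()
namespace = {}
exec(remote_source, namespace)
result = namespace['task']({'edges': [('a', 'b'), ('b', 'c'), ('c', 'd')]})
print(result)
```

Because only the function's name changes, the body must not call itself by its original name.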

The remote Python endpoint can return graphs, dataframes, and JSON objects in a way that plays nicely with Python type checking. Select the return type by choosing the matching calling form:

  • python_remote_g(): For returning a Plottable (graph)

  • python_remote_json(): For returning JSON values

  • python_remote_table(): For returning a pd.DataFrame

By default, graph and dataframe return types are transported in the Parquet format for safety and efficiency, and JSON return types are transported as JSON.

Return a graph#

The example below shows two aspects:

  • Code provided as a Python source string defining a top-level function def task(g: Plottable) -> Plottable

  • Remote invocation via python_remote_g(), which implies that task() will return a Plottable (graph)

[10]:
g3 = g2.python_remote_g("""

from graphistry import Plottable

def task(g: Plottable) -> Plottable:
  '''
  Fill in the nodes table based on the edges table and return the combined
  '''

  return g.materialize_nodes()

""")

g3._edges
[10]:
s d v u
0 a b x 2
1 b c y 4
2 c d z 6
[11]:
g3._nodes
[11]:
id
0 a
1 b
2 c
3 d
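For intuition about the nodes table above, the following pandas sketch derives the same four ids from the example's edge columns. It assumes that, for this data, materialize_nodes() collects the unique source and destination values into an id column; the library's general behavior may differ in details.

```python
import pandas as pd

e_df = pd.DataFrame({
    's': ['a', 'b', 'c'],
    'd': ['b', 'c', 'd'],
})

# Collect every endpoint that appears as a source or destination, once each
node_ids = pd.concat([e_df['s'], e_df['d']]).drop_duplicates().reset_index(drop=True)
nodes_df = pd.DataFrame({'id': node_ids})
print(nodes_df)
```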

Run a local Callable remotely#

You can also pass self-contained Python functions, which keeps the code easier to read and lets it work with your developer and automation tools.

Note that only the function's source code is transferred to the server, so it must not reference local variables, imports, or helpers defined outside the function.

[12]:
def materialize_nodes(g):
    return g.materialize_nodes()

g3b = g2.python_remote_g(materialize_nodes)

g3b._nodes
[12]:
id
0 a
1 b
2 c
3 d
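The self-containment requirement can be tried locally before sending anything. This sketch mimics a remote process by executing source in a fresh namespace; `run_in_fresh_namespace` is a hypothetical helper and a plain dict stands in for the graph, so it only approximates what the server does.

```python
# MIN_WEIGHT exists only in the local session; it does not travel with the source
MIN_WEIGHT = 3

leaky_source = '''
def task(g):
    return [e for e in g['edges'] if e['u'] >= MIN_WEIGHT]
'''

self_contained_source = '''
def task(g):
    min_weight = 3  # defined inside the function, so it travels with the source
    return [e for e in g['edges'] if e['u'] >= min_weight]
'''

def run_in_fresh_namespace(source, g):
    """Mimic a remote process: no access to the caller's globals."""
    namespace = {}
    exec(source, namespace)
    return namespace['task'](g)

g = {'edges': [{'u': 2}, {'u': 4}, {'u': 6}]}

try:
    run_in_fresh_namespace(leaky_source, g)
    leaky_error = None
except NameError as exc:
    leaky_error = str(exc)

kept = run_in_fresh_namespace(self_contained_source, g)
print(leaky_error)
print(kept)
```

The first version fails with a NameError because MIN_WEIGHT is unknown in the fresh namespace; the second succeeds because everything it needs is inside the function body.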

Return a table#

For remotely calling functions that return dataframes, instead call python_remote_table():

[13]:
nodes_df = g2.python_remote_table("""

import pandas as pd
from graphistry import Plottable

def task(g: Plottable) -> pd.DataFrame:
  '''
  Fill in the nodes table based on the edges table and return it
  '''

  return g.materialize_nodes()._nodes

""")

nodes_df
[13]:
id
0 a
1 b
2 c
3 d

And as before, you can also pass in a self-contained Python function:

[14]:
def g_to_materialized_nodes(g):
    return g.materialize_nodes()._nodes

nodes_df = g2.python_remote_table(g_to_materialized_nodes)

nodes_df
[14]:
id
0 a
1 b
2 c
3 d

Return arbitrary JSON#

The remote Python endpoint also supports returning arbitrary JSON-format data via python_remote_json():

[15]:
shape = g2.python_remote_json("""

from typing import Dict
from graphistry import Plottable

def task(g: Plottable) -> Dict[str, int]:
  '''
  Count the edges and the materialized nodes and return both counts
  '''

  return {'num_edges': len(g._edges), 'num_nodes': len(g.materialize_nodes()._nodes)}
""")

shape['num_nodes'], shape['num_edges']
[15]:
(4, 3)

And by passing in a self-contained Python function:

[16]:
def g_to_shape(g):
  """
  Count the edges and the materialized nodes and return both counts
  """

  return {'num_edges': len(g._edges), 'num_nodes': len(g.materialize_nodes()._nodes)}


g2.python_remote_json(g_to_shape)
[16]:
{'num_edges': 3, 'num_nodes': 4}
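When designing a JSON return value, it can help to check locally how it behaves under a JSON round trip. The sketch below uses only Python's json module and assumes the endpoint follows standard JSON semantics; `survives_json` is a hypothetical helper.

```python
import json

def survives_json(value):
    """Return the value as it would look after a JSON encode/decode round trip."""
    return json.loads(json.dumps(value))

shape = {'num_edges': 3, 'num_nodes': 4}
assert survives_json(shape) == shape  # plain dicts of str -> int are unchanged

# Tuples come back as lists, and non-string keys come back as strings
print(survives_json({'pair': (4, 3)}))   # {'pair': [4, 3]}
print(survives_json({1: 'a'}))           # {'1': 'a'}
```

Values that json.dumps rejects, such as sets, would need converting to lists or dicts inside the task before returning.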

Enforce GPU mode#

Pass engine="cudf" to force GPU mode or engine="pandas" to force CPU mode. With no override, the server in this example ran on the GPU (cudf):

[20]:
def report_types(g):
    return {
        'edges': str(type(g._edges)),
        'nodes': str(type(g.materialize_nodes()._nodes))
    }

g2.python_remote_json(report_types)
[20]:
{'edges': "<class 'cudf.core.dataframe.DataFrame'>",
 'nodes': "<class 'cudf.core.dataframe.DataFrame'>"}
[21]:
def report_types(g):
    return {
        'edges': str(type(g._edges)),
        'nodes': str(type(g.materialize_nodes()._nodes))
    }

g2.python_remote_json(report_types, engine='pandas')
[21]:
{'edges': "<class 'pandas.core.frame.DataFrame'>",
 'nodes': "<class 'pandas.core.frame.DataFrame'>"}
[22]:
def report_types(g):
    return {
        'edges': str(type(g._edges)),
        'nodes': str(type(g.materialize_nodes()._nodes))
    }

g2.python_remote_json(report_types, engine='cudf')
[22]:
{'edges': "<class 'cudf.core.dataframe.DataFrame'>",
 'nodes': "<class 'cudf.core.dataframe.DataFrame'>"}
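A more compact variant of the type report names only the top-level package behind a dataframe. This is a sketch: `report_engine` is a hypothetical function, and the local check uses a stand-in object rather than a real graph.

```python
from types import SimpleNamespace

def report_engine(g):
    """Name the top-level package that provides the edges dataframe."""
    def package_of(df):
        return type(df).__module__.split('.')[0]
    return {'edges': package_of(g._edges)}

# Local smoke test with a stand-in object; a list's type lives in 'builtins'
fake_g = SimpleNamespace(_edges=[])
print(report_engine(fake_g))  # {'edges': 'builtins'}
```

Judging by the type strings printed above, calling g2.python_remote_json(report_engine, engine='pandas') would be expected to return {'edges': 'pandas'}, and engine='cudf' to return {'edges': 'cudf'}.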