GFQL Quick Reference#

GFQL is the first fully vectorized dataframe-native graph query language with an open-source GPU runtime. This quick reference page provides short examples of various parameters and usage patterns.

If you are new to Cypher: Cypher is a graph query language popularized by Neo4j and related tools. It uses ASCII-art graph patterns such as (n1)-[e1]->(n2) to describe node-edge-node traversals, so GFQL docs use that notation as a familiar shorthand when discussing Cypher syntax through g.gfql("MATCH ...").

Basic Usage#

Chaining Operations

g.gfql([...], engine=EngineAbstract.AUTO)

g.gfql() sequences multiple matchers to express more complex path and subgraph patterns.

query: Sequence of graph node/edge matchers and optional row-pipeline call steps (for example, rows(), where_rows(), return_(), order_by(), limit()), or an equivalent GFQL chain object.
engine: Optional execution engine. It is typically left unset, which defaults to 'auto'. Use 'polars' for a CPU columnar speedup (up to ~38x over pandas, no GPU), 'cudf' or 'polars-gpu' for NVIDIA GPU acceleration, and 'pandas' for the default CPU path. See Choosing an Engine.

Native GFQL chains are typed Python inputs. Pass the list, dict envelope, or Chain object itself; strings passed to g.gfql(...) are interpreted as query text for the selected string language.

Use this page as a quick MATCH/chain reference. For row-pipeline RETURN semantics, see GFQL RETURN (Row Pipelines).

Choose The Right Entrypoint#

Use g.gfql([...]) for native GFQL operators and g.gfql("MATCH ...") for Cypher syntax on the current graph.
If a tool receives a serialized native chain such as "[{'type': ...}]", decode it into a real Python list/dict/Chain before calling g.gfql(...). g.gfql(str) is reserved for Cypher query text by default.
Use g.gfql_remote([...]) for remote GFQL when the dataset size or hardware profile calls for remote execution, including remote GPU execution. See GFQL Remote Mode.
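
The decoding step described above can be sketched with the standard library alone. This is a minimal illustration, not part of graphistry: decode_native_chain is a hypothetical helper, and the keys inside the sample payload are placeholders rather than the real serialized chain schema.

```python
import ast
import json


def decode_native_chain(payload):
    """Turn a serialized native chain into a real Python list/dict.

    Hypothetical helper (not a graphistry API). Non-string inputs are
    returned unchanged; strings must parse as JSON or as a Python literal.
    """
    if not isinstance(payload, str):
        return payload
    try:
        decoded = json.loads(payload)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax, e.g. single-quoted keys.
        # literal_eval only accepts literals, so it never executes code.
        decoded = ast.literal_eval(payload)
    if not isinstance(decoded, (list, dict)):
        raise ValueError("expected a list or dict chain payload")
    return decoded


# Placeholder payload: the keys are illustrative, not the real schema
serialized = "[{'type': 'Node', 'filter_dict': {'kind': 'person'}}]"
chain_steps = decode_native_chain(serialized)
# chain_steps is now a list; pass that object to g.gfql(...), not the string
```

Keeping the decode step separate makes the intent explicit: a str reaching g.gfql(...) is then always meant as Cypher text.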

Warning

graphistry.cypher("…") and g.cypher("…") are a separate remote database Cypher path (for example, Neo4j/Neptune integrations), not the GFQL execution surface summarized on this page.

Graph State Vs Row State#

Graph state keeps a traversable graph in _nodes and _edges. Matchers, graph-preserving call(…) transforms, let() / ref() graph DAG stages, local Cypher CALL graphistry.*.write() queries, and local Cypher GRAPH { MATCH … } constructors stay in graph state. (GRAPH { } is a GFQL extension — see Cypher Syntax In GFQL for details.)
Row state stores tabular results in _nodes and uses an empty placeholder _edges frame. Row-pipeline steps such as rows(), with_(), select(), return_(), group_by(), and row-returning local Cypher CALL … YIELD … RETURN … queries move into row state.
A bare local Cypher procedure call without .write() also moves into row state. For example, CALL graphistry.degree() projects its default output columns into _nodes and clears _edges.
If you want to enrich a graph and keep matching locally, use a graph-preserving call() / let() pattern or a bare local Cypher CALL graphistry.*.write(). The local Cypher compiler currently supports graphistry.degree.write() plus graphistry.igraph.<alg>.write() and graphistry.cugraph.<alg>.write() for algorithms exposed through compute_igraph() / compute_cugraph(), along with a curated NetworkX subset including graphistry.nx.pagerank.write(), graphistry.nx.betweenness_centrality.write(), graphistry.nx.degree_centrality.write(), graphistry.nx.closeness_centrality.write(), graphistry.nx.eigenvector_centrality.write(), graphistry.nx.katz_centrality.write(), graphistry.nx.connected_components.write(), graphistry.nx.strongly_connected_components.write(), graphistry.nx.core_number.write(), graphistry.nx.hits.write(), graphistry.nx.edge_betweenness_centrality.write(), and graphistry.nx.k_core.write().

Cypher Strings Through `g.gfql()`#

Use g.gfql("MATCH ...") when you want Cypher syntax and declarative graph semantics on a bound graph instead of writing the equivalent GFQL chain by hand:

Do not pass stringified Python or JSON native-chain literals to g.gfql(...). Materialize those serialized values before calling GFQL, or emit Cypher text intentionally.

result = g.gfql(
    "MATCH (n1:Person)-[e1:FOLLOWS]->(n2:Person) "
    "RETURN n1.name AS source_name, n2.name AS target_name "
    "ORDER BY source_name, target_name "
    "LIMIT $top_n",
    params={"top_n": 5},
)

result._nodes

For the dedicated guide, helper APIs, and direct-vs-translation guidance, see Cypher Syntax In GFQL.

Node Matchers#

n(filter_dict=None, name=None, query=None)

n matches (filters) nodes based on their attributes.

Parameters:
- filter_dict: {attribute: value} or {attribute: condition_function}
- name: Optional label; adds a boolean column in the result.
- query: Custom query string (e.g., "age > 30 and country == 'USA'").
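
As a mental model for filter_dict, the sketch below checks each attribute against either a literal value or a condition function, and requires every entry to pass. It is a row-by-row illustration in plain Python under that assumption; GFQL itself runs vectorized over dataframes, and row_matches is a hypothetical name, not a library function.

```python
def row_matches(row, filter_dict):
    """Return True when every attribute filter accepts the row."""
    for attribute, expected in filter_dict.items():
        value = row.get(attribute)
        if callable(expected):
            # Condition function: keep the row only if it returns truthy
            if not expected(value):
                return False
        elif value != expected:
            # Literal value: require equality
            return False
    return True


nodes = [
    {"id": "a", "type": "person", "age": 42},
    {"id": "b", "type": "person", "age": 25},
    {"id": "c", "type": "company", "age": 7},
]
filter_dict = {"type": "person", "age": lambda x: x > 30}
matched_ids = [row["id"] for row in nodes if row_matches(row, filter_dict)]
```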

Examples:

Match nodes where type is ‘person’:
```
n({"type": "person"})
```
Match nodes with age greater than 30:
```
n({"age": lambda x: x > 30})
```

Use a custom query string:

n(query="age > 30 and country == 'USA'")

Edge Matchers#

e_forward(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)
e_reverse(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)
e_undirected(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)

# alias for e_undirected
e(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)

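e matches edges based on their attributes in either direction (it is an alias for e_undirected). Edge matchers can also filter on an edge's source and destination nodes.

e_forward traverses edges in the forward direction; e_reverse and e_undirected accept the same parameters.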
Parameters:
- edge_match: {attribute: value} or {attribute: condition_function}
- edge_query: Custom query string for edge attributes.
- hops: int, number of hops to traverse.
- min_hops/max_hops: Inclusive traversal bounds (min defaults to 1 unless max_hops is 0; max defaults to hops).
- output_min_hops/output_max_hops: Optional post-filter slice; defaults keep all traversed hops up to max_hops.
- label_node_hops/label_edge_hops: Optional column names for hop numbers; label_seeds=True adds hop 0 for seeds.
- to_fixed_point: bool, continue traversal until no more matches.
- source_node_match: Filter for source nodes.
- destination_node_match: Filter for destination nodes.
- source_node_query: Custom query string for source nodes.
- destination_node_query: Custom query string for destination nodes.
- name: Optional label.
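
The hop defaults listed above can be read as the following small resolver. It is a sketch of the documented defaults only, assuming hops are numbered from 1 (seeds being hop 0); resolve_hop_window is a hypothetical helper, and the library's actual traversal logic is not shown here.

```python
def resolve_hop_window(hops=1, min_hops=None, max_hops=None,
                       output_min_hops=None, output_max_hops=None):
    """Resolve inclusive traversal bounds and the hop numbers kept in output."""
    if max_hops is None:
        max_hops = hops  # max defaults to hops
    if min_hops is None:
        min_hops = 0 if max_hops == 0 else 1  # min defaults to 1 unless max_hops is 0
    if min_hops > max_hops:
        raise ValueError("min_hops must not exceed max_hops")
    # With no output slice, every traversed hop up to max_hops is kept
    kept_hops = [
        hop for hop in range(1, max_hops + 1)
        if (output_min_hops is None or hop >= output_min_hops)
        and (output_max_hops is None or hop <= output_max_hops)
    ]
    return {"min_hops": min_hops, "max_hops": max_hops, "kept_hops": kept_hops}


default_window = resolve_hop_window(hops=2)
sliced_window = resolve_hop_window(min_hops=2, max_hops=4, output_min_hops=3)
```

Here sliced_window corresponds to traversing 2..4 hops while keeping only hops 3..4 in the output.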

Examples:

Traverse up to 2 hops forward on edges where status is ‘active’:
```
e_forward({"status": "active"}, hops=2)
```

Traverse 2..4 hops but show only hops 3..4 with labels:

e_forward(
    {"status": "active"},
    min_hops=2,
    max_hops=4,
    output_min_hops=3,
    label_edge_hops="edge_hop"
)

Use custom edge query strings:

e_forward(edge_query="weight > 5 and type == 'connects'")

Filter source and destination nodes with match dictionaries:

e_forward(
    source_node_match={"status": "active"},
    destination_node_match={"age": lambda x: x < 30}
)

Filter source and destination nodes with queries:

e_forward(
    source_node_query="status == 'active'",
    destination_node_query="age < 30"
)

Label matched edges:
```
e_forward(name="active_edges")
```

e_reverse and e take the same parameters as e_forward; only the traversal direction differs.

e_reverse: Same as e_forward, but traverses in reverse.
e: Traverses edges regardless of direction.

Predicates#

graphistry.compute.predicates.ASTPredicate.ASTPredicate

Matches using a predicate on entity attributes.

See GFQL Operator Reference for more information.

Example:

Match nodes where category is ‘A’, ‘B’, or ‘C’:

from graphistry import n, is_in

n({"category": is_in(["A", "B", "C"])})

Where (Same-Path Constraints)#

Use where to relate attributes across named steps in a chain.

from graphistry import n, e_forward, col, compare

g.gfql(
    [
        n({"type": "account"}, name="a"),
        e_forward(name="e"),
        n({"type": "user"}, name="c"),
    ],
    where=[
        compare(col("a", "owner_id"), "==", col("c", "owner_id")),
        compare(col("e", "org_id"), "==", col("a", "org_id")),
    ],
)

compare() can relate node and edge columns when the column types align. Supported compare(…, op, …) operators: ==, !=, <, <=, >, >=. WHERE works with g.gfql([…], where=[…]); Chain(…, where=[…]) is the equivalent explicit form. Multiple WHERE comparisons are ANDed.
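
To make the AND semantics concrete, here is a plain-Python model of same-path constraints. It assumes each candidate path is a dict from step alias to that step's row; path_satisfies and the tuple encoding of constraints are hypothetical stand-ins for compare(col(...), op, col(...)), not library code.

```python
import operator

# The six comparison operators listed for compare()
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def path_satisfies(bindings, constraints):
    """True only if every constraint holds for this path (constraints are ANDed)."""
    return all(
        OPS[op](bindings[left_alias][left_col], bindings[right_alias][right_col])
        for (left_alias, left_col), op, (right_alias, right_col) in constraints
    )


constraints = [
    (("a", "owner_id"), "==", ("c", "owner_id")),
    (("e", "org_id"), "==", ("a", "org_id")),
]
candidate_paths = [
    {"a": {"owner_id": 1, "org_id": 10}, "e": {"org_id": 10}, "c": {"owner_id": 1}},
    {"a": {"owner_id": 1, "org_id": 10}, "e": {"org_id": 99}, "c": {"owner_id": 1}},
    {"a": {"owner_id": 2, "org_id": 10}, "e": {"org_id": 10}, "c": {"owner_id": 3}},
]
kept = [path_satisfies(path, constraints) for path in candidate_paths]
```

Only the first path passes both comparisons; the second fails the edge constraint and the third fails the owner constraint.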

Row Pipelines (MATCH … RETURN style)#

Use row-pipeline operators to move from pattern matching to tabular Cypher-like RETURN processing.

from graphistry import n, e_forward, gt
from graphistry.compute import rows, where_rows, return_, order_by, limit

top_people = g.gfql([
    n({"type": "Person"}),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person", "score": gt(10)}, name="p"),
    rows(table="nodes", source="p"),
    where_rows(expr="score >= 50 AND name STARTS WITH 'A'"),
    return_(["id", "name", ("score_bucket", "score / 10")]),
    order_by([("score_bucket", "desc"), ("name", "asc")]),
    limit(25),
])

top_people._nodes

Notes:

rows(table="nodes" or table="edges", source="alias") picks the active row table. source must be an alias introduced earlier via name="…" on a matcher. In the example above, source="p" refers to n(…, name="p"), so only rows matched by alias p are used.
Edge example: e_forward(…, name="e") followed by rows(table="edges", source="e") scopes rows to edges matched as e.
If source is omitted (for example, rows(table="nodes")), the full active nodes/edges table is used.
return_(["col"]) is shorthand for return_([("col", "col")]).

with_(…) and select(…) share projection semantics with return_(…):

from graphistry.compute import rows, with_, select, return_

# Equivalent projections
rows(table="nodes", source="p")
return_(["id", ("score2", "score * 2")])
with_(["id", ("score2", "score * 2")])
select([("id", "id"), ("score2", "score * 2")])

return_(["id"]), with_(["id"]), and select([("id", "id")]) all project the same id column.
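
The shorthand rule can be sketched as a normalization step that expands bare column names into (alias, expression) pairs. normalize_projection is a hypothetical helper written for illustration; it only mirrors the equivalence stated above and is not how the library is implemented.

```python
def normalize_projection(items):
    """Expand bare column names into (alias, expression) pairs."""
    pairs = []
    for item in items:
        if isinstance(item, str):
            # "id" is shorthand for ("id", "id")
            pairs.append((item, item))
        else:
            alias, expression = item
            pairs.append((alias, expression))
    return pairs


short_form = normalize_projection(["id", ("score2", "score * 2")])
long_form = normalize_projection([("id", "id"), ("score2", "score * 2")])
```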

where_rows() evaluates row expressions and filter dictionaries in a vectorized dataframe execution path (pandas/cuDF engines).
In where_rows(expr="…"), supported comparison operators are =, !=, <>, <, <=, >, >=.

Combined Examples#

MATCH with same-path WHERE constraints:

from graphistry import n, e_forward, col, compare

g.gfql(
    [
        n({"type": "account"}, name="a"),
        e_forward({"status": "active"}, name="e"),
        n({"type": "user"}, name="u"),
    ],
    where=[compare(col("a", "org_id"), "==", col("u", "org_id"))],
)

MATCH then RETURN (row pipeline):

from graphistry import n, e_forward, gt
from graphistry.compute import rows, where_rows, return_, order_by, limit

g.gfql([
    n({"type": "Person"}),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person", "score": gt(0)}, name="p"),
    rows(table="nodes", source="p"),
    where_rows(expr="score >= 50"),
    return_(["id", "name", "score"]),
    order_by([("score", "desc"), ("name", "asc")]),
    limit(10),
])

Find people connected to transactions via active relationships:

g.gfql([
    n({"type": "person"}),
    e_forward({"status": "active"}),
    n({"type": "transaction"})
])

Label nodes and edges during traversal:

g.gfql([
    n({"id": "start_node"}, name="start"),
    e_forward(name="edge1"),
    n({"level": 2}, name="middle"),
    e_forward(name="edge2"),
    n({"type": "end_type"}, name="end")
])

Traverse until no more matches (fixed point):

g.gfql([
    n({"status": "infected"}),
    e_forward(to_fixed_point=True),
    n(name="reachable")
])

Filter by multiple conditions:

from graphistry import n, e_undirected, is_in

g.gfql([
    n({"type": is_in(["server", "database"])}),
    e_undirected({"protocol": "TCP"}, hops=3),
    n(query="risk_level >= 8")
])

Use custom queries in matchers:

g.gfql([
    n(query="age > 30 and country == 'USA'"),
    e_forward(edge_query="weight > 5"),
    n(query="status == 'active'")
])

Engine Selection (CPU and GPU)#

The same query runs on four interchangeable engines with identical results. Unsupported engine/query combinations are rejected before execution during validation, compilation, or planning rather than silently falling back. Pick one with engine=. See Choosing an Engine for the full decision matrix.

CPU columnar speedup (no GPU): 'polars' — up to ~38x over pandas on real graphs.

g.gfql([...], engine='polars')   # keep your existing pandas frames; just the keyword changes

NVIDIA GPU: 'cudf' (eager) or 'polars-gpu' (fused plan on GPU).
```
g.gfql([...], engine='polars-gpu')
```

Example with cuDF DataFrames:

import cudf
import graphistry

# edge_df and node_df are existing pandas DataFrames
e_gdf = cudf.from_pandas(edge_df)
n_gdf = cudf.from_pandas(node_df)

g = graphistry.nodes(n_gdf, 'node_id').edges(e_gdf, 'src', 'dst')
g.gfql([...], engine='cudf')
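
If you prefer to pin an engine explicitly instead of relying on the default 'auto', one option is to probe which backends are importable first. The sketch below is an assumption-laden convenience, not graphistry's own selection logic: pick_engine is a hypothetical helper, and it only checks that a module can be found, not that a working GPU is present.

```python
import importlib.util


def pick_engine(prefer_gpu=True):
    """Return an engine name based on which backends are importable."""
    def available(module_name):
        # find_spec returns None when a top-level module is not installed
        return importlib.util.find_spec(module_name) is not None

    if prefer_gpu and available("cudf"):
        return "cudf"
    if available("polars"):
        return "polars"
    return "pandas"


engine = pick_engine()
# Then pass it along, e.g. g.gfql([...], engine=engine)
```

Because unsupported engine/query combinations are rejected rather than silently downgraded, choosing the engine up front keeps failures predictable.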

Remote Mode#

Query existing remote data

g = graphistry.bind(dataset_id='ds-abc-123')

nodes_df = g.gfql_remote([n()])._nodes

Upload graph and run GFQL

g2 = g1.upload()

g3 = g2.gfql_remote([n(), e(), n()])

Force CPU or GPU mode on remote GFQL

g3a = g2.gfql_remote([n(), e(), n()], engine='pandas')
g3b = g2.gfql_remote([n(), e(), n()], engine='cudf')

Return only nodes and certain columns

cols = ['id', 'name']
g2b = g1.gfql_remote([n(), e(), n()], output_type="nodes", node_col_subset=cols)

Return only edges and certain columns

cols = ['src', 'dst']
g2b = g1.gfql_remote([n(), e(), n()], output_type="edges", edge_col_subset=cols)

Return only shape metadata

shape_df = g1.chain_remote_shape([n(), e(), n()])

Run remote Python and get back a graph

def my_remote_trim_graph_task(g):
    return (g
        .nodes(g._nodes[:10])
        .edges(g._edges[:10])
    )

g2 = g1.upload()
g3 = g2.python_remote_g(my_remote_trim_graph_task)

Run remote Python and get back a table

def first_n_edges(g):
    return g._edges[:10]

some_edges_df = g.python_remote_table(first_n_edges)

Run remote Python and get back JSON

def first_n_edges(g):
    return g._edges[:10].to_json()

some_edges_json = g.python_remote_json(first_n_edges)

Run remote Python and force it to run on CPU or GPU

g3a = g2.python_remote_g(my_remote_trim_graph_task, engine='pandas')
g3b = g2.python_remote_g(my_remote_trim_graph_task, engine='cudf')

Run remote Python, passing as a string

g2 = g1.upload()

# ensure method is called "task" and takes a single argument "g"
g3 = g2.python_remote_g("""
    def task(g):
        return (g
            .nodes(g._nodes[:10])
            .edges(g._edges[:10])
        )
""")

Let Bindings and DAG Patterns#

Use Let bindings to create directed acyclic graph (DAG) patterns with named operations. Lists are treated as implicit Chains.

Basic Let with named bindings:

from graphistry import let, ref, n, e_forward, gt

result = g.gfql(let({
    'suspects': n({'risk_score': gt(80)}),
    'connections': [
        n({'risk_score': gt(80)}),
        e_forward({'type': 'transaction'}),
        n()
    ]
}))

# Access results by name
suspects = result._nodes[result._nodes['suspects']]
connections = result._edges[result._edges['connections']]

Complex DAG with multiple references:

from graphistry import let, ref, n, e_forward, gt

result = g.gfql(let({
    'high_value': n({'balance': gt(100000)}),
    'large_transfers': [
        n({'balance': gt(100000)}),
        e_forward({'type': 'transfer', 'amount': gt(10000)}),
        n()
    ],
    'suspicious': ref('large_transfers', [
        n({'created_recent': True, 'verified': False})
    ])
}))
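
One way to picture the DAG is as a dependency map from each binding to the bindings it references through ref(). The sketch below orders such a map with the standard library's graphlib (Python 3.9+). It is a conceptual model only: binding_order and the hand-written dependency map are assumptions for illustration, not how the library schedules bindings.

```python
import graphlib


def binding_order(dependencies):
    """Order binding names so each one comes after the bindings it references."""
    return list(graphlib.TopologicalSorter(dependencies).static_order())


# Mirrors the example above: 'suspicious' references 'large_transfers'
dependencies = {
    "high_value": set(),
    "large_transfers": set(),
    "suspicious": {"large_transfers"},
}
order = binding_order(dependencies)
```

A cyclic map raises graphlib.CycleError, which matches the requirement that bindings form an acyclic graph.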

Call Operations#

Run graph algorithms like PageRank, community detection, and layouts directly within your GFQL queries:

Compute PageRank:

from graphistry import call, let, ref, n, e

# Use let() to compose filter + enrichment
result = g.gfql(let({
    'persons': [n({'type': 'person'}), e(), n()],
    'ranked': ref('persons', [call('compute_cugraph', {'alg': 'pagerank', 'damping': 0.85})])
}))

# Results have pagerank column
top_nodes = result._nodes.sort_values('pagerank', ascending=False).head(10)

Enrich a graph, then keep matching:

g_enriched = g.gfql("CALL graphistry.degree.write()")
assert not g_enriched._edges.empty
top_degree = g_enriched.gfql(
    "MATCH (n) WHERE n.degree >= 2 RETURN n.id AS id, n.degree AS degree ORDER BY degree DESC LIMIT 10"
)

Local note: a bare g.gfql("CALL graphistry.*.write()") stays in graph state and can feed later MATCH queries. g.gfql("CALL … YIELD … RETURN …") still targets row-returning procedure flows.

Return procedure rows instead of an enriched graph:

degree_rows = g.gfql("CALL graphistry.degree()")
assert degree_rows._edges.empty

# Row state: _nodes has nodeId/degree columns and _edges is empty
degree_rows._nodes

Community detection with Louvain:

from graphistry import call, let, ref, n, e_forward

# Use let() to compose traversal + community detection
result = g.gfql(let({
    'reachable': [n({'active': True}), e_forward(to_fixed_point=True), n()],
    'communities': ref('reachable', [call('compute_cugraph', {'alg': 'louvain'})])
}))

# Results have community column
communities = result._nodes.groupby('community').size()

Filter and compute within Let:

from graphistry import call, let, ref, n, e, gt

# Split mixed chain into separate bindings
result = g.gfql(let({
    'suspects': [n({'flagged': True}), e(), n()],
    'ranked': ref('suspects', [
        call('compute_cugraph', {'alg': 'pagerank'})
    ]),
    'influencers': ref('ranked', [
        n({'pagerank': gt(0.01)})
    ])
}))

Apply layout algorithms:

from graphistry import call, let, ref, n, e_forward, is_in

# Use let() to compose traversal + layout
result = g.gfql(let({
    'entities': [n({'type': is_in(['person', 'company'])}), e_forward(), n()],
    'positioned': ref('entities', [call('fa2_layout', {'iterations': 100})])
}))

# Results have x, y coordinates for visualization
result.plot()

Tip: For subset-based coloring after GFQL, use result.collections(...) and see Layout Settings & Visualization Embedding.

Remote Graph References#

Reference graphs on remote servers for distributed computing:

Basic remote reference:

from graphistry import n, e_forward, gt
from graphistry.compute import remote

result = g.gfql([
    remote(dataset_id='fraud-network-2024'),
    n({'risk_score': gt(90)}),
    e_forward()
])

Combine remote and local data in Let:

from graphistry import let, ref, n, e_forward, gt
from graphistry.compute import remote

result = g.gfql(let({
    'remote_data': remote(dataset_id='historical-2023'),
    'high_risk': ref('remote_data', [
        n({'risk_score': gt(95)})
    ]),
    'connections': ref('remote_data', [
        n({'risk_score': gt(95)}),
        e_forward({'type': 'transaction'}),
        n()
    ])
}))

Advanced Usage#

Traversal with source and destination node filters and queries:

e_forward(
    edge_query="type == 'follows' and weight > 2",
    source_node_match={"status": "active"},
    destination_node_query="age < 30",
    hops=2,
    name="social_edges"
)

Node matcher with all parameters:

n(
    filter_dict={"department": "sales"},
    query="age > 25 and tenure > 2",
    name="experienced_sales"
)

Edge matcher with all parameters:

e_reverse(
    edge_match={"transaction_type": "refund"},
    edge_query="amount > 100",
    source_node_match={"status": "inactive"},
    destination_node_match={"region": "EMEA"},
    name="large_refunds"
)

Parameter Summary#

Common Parameters:
- filter_dict: Attribute filters (e.g., {"status": "active"})
- query: Custom query string (e.g., "age > 30")
- hops: Max hops to traverse (shorthand for max_hops, default 1)
- to_fixed_point: Continue traversal until no more matches (bool, default False)
- name: Label for matchers (str)
- source_node_match, destination_node_match: Filters for connected nodes
- source_node_query, destination_node_query: Queries for connected nodes
- edge_match: Filters for edges
- edge_query: Query for edges
- engine: Execution engine (EngineAbstract.AUTO, 'cudf', etc.)

Traversal Directions#

Forward Traversal: e_forward(…)
Reverse Traversal: e_reverse(…)
Undirected Traversal: e_undirected(…)

Tips and Best Practices#

Limit hops for performance: Specify hops to control traversal depth.
Use naming for analysis: Apply name to label and filter results.
Combine filters: Use filter_dict and query for precise matching.
Leverage GPU acceleration: Use engine='cudf' for large datasets.
Avoid infinite loops: Be cautious with to_fixed_point=True in cyclic graphs.
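
To see why fixed-point traversal deserves care on cyclic graphs, consider this plain-Python model of forward expansion. It is a conceptual sketch, not the library's algorithm: reachable_forward is a hypothetical function, and it terminates only because already-visited nodes are removed from each new frontier.

```python
def reachable_forward(edges, seeds):
    """Expand along src -> dst edges until no new node appears."""
    visited = set(seeds)
    frontier = set(seeds)
    rounds = 0
    while frontier:
        # Drop already-visited nodes so cycles cannot re-expand forever
        frontier = {dst for src, dst in edges if src in frontier} - visited
        visited |= frontier
        rounds += 1
    return visited, rounds


# a -> b -> c -> a forms a cycle; c also leads to d
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
visited, rounds = reachable_forward(edges, {"a"})
```

The result covers every node reachable from the seed, which on a well-connected graph can be most of the data, so bounding hops is often the cheaper choice.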

Examples at a Glance#

Find paths of up to 3 hops between two nodes:

g.gfql([
    n({g._node: "Alice"}),
    e_undirected(hops=3),
    n({g._node: "Bob"})
])

Match nodes with IDs in a range:
```
n(query="100 <= id <= 200")
```

Traverse edges with specific labels:

e_forward({"label": is_in(["knows", "likes"])})

Identify subgraphs based on attributes:

g.gfql([
    n({"community": "A"}),
    e_undirected(hops=2),
    n({"community": "B"}, name="bridge_nodes")
])

Custom edge and node queries:

g.gfql([
    n(query="age >= 18"),
    e_forward(edge_query="interaction == 'message'"),
    n(query="location == 'NYC'")
])
