Cypher to GFQL Python & Wire Protocol Mapping#
Translate existing Cypher workloads to GPU-accelerated GFQL with minimal code changes.
Introduction#
This specification shows how to translate Cypher queries to both GFQL Python code and Wire Protocol JSON, enabling migration from Cypher-based systems, LLM pipelines (text → Cypher → GFQL), language-agnostic API integration, and secure query generation without code execution.
What Maps 1-to-1#
When translating from Cypher, you’ll encounter three scenarios:
Direct Translation: Most pattern matching maps cleanly to pure GFQL.
Row-Pipeline Translation: RETURN/WITH/ORDER BY/SKIP/LIMIT/DISTINCT/GROUP BY map to GFQL row operators.
GFQL Advantages: Some capabilities go beyond what Cypher offers.
Direct Translations#
Graph patterns: (a)-[r]->(b) → chain operations
Property filters: WHERE clauses embed into operations
Path traversals: Variable-length paths use the hops parameter
Pattern composition: Multiple patterns become sequential operations
Same-path constraints: WHERE across steps → g.gfql([...], where=[...])
Row-Pipeline Translation (MATCH ... RETURN)#
Row source selection: rows(table=..., source=...)
Row filtering: where_rows(filter_dict=..., expr=...)
Projection: return_(...) / with_(...) / select(...)
Sorting/paging: order_by(...), skip(...), limit(...)
Deduplication: distinct()
Aggregation: group_by(keys=[...], aggregations=[...])
These row-pipeline operators are call steps inside the same chain list passed to g.gfql([...]) (or to Chain([...])), not top-level g.gfql() keyword args:
from graphistry import n, e_forward
from graphistry.compute import rows, where_rows, return_, order_by, limit

# g is an existing graph with nodes and edges already bound
g.gfql([
    n({"type": "Person"}, name="p"),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person"}, name="q"),
    rows(table="nodes", source="q"),
    where_rows(expr="score >= 50"),
    return_([("id", "id"), ("name", "name"), ("score", "score")]),
    order_by([("score", "desc")]),
    limit(25),
])

# A step list of the same kind can also be wrapped in a Chain and passed to g.gfql()
from graphistry.compute.chain import Chain

query = Chain([
    n({"type": "Person"}, name="p"),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person"}, name="q"),
    rows(table="nodes", source="q"),
    where_rows(expr="score >= 50"),
    return_(["id", "name", "score"]),
])
g.gfql(query)
Projection sequencing and placement rules:
Multiple return_(...) / with_(...) / select(...) steps are valid and execute in list order; each step projects from the current row table produced by previous steps.
Interior mixing is invalid: do not place call steps between traversal steps (n() / e_*()), e.g. [n(...), return_(...), e_forward(...)]. Keep call steps in boundary prefix/suffix segments around traversal blocks.
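The placement rule above can be checked mechanically before a query is sent. The following is a minimal sketch over wire-protocol style step dicts; `find_interior_calls` is a hypothetical helper (not part of graphistry), and it assumes a chain with a single traversal block, so it treats any Call step between the first and last Node/Edge step as interior mixing.

```python
from typing import Any, Dict, List

# Wire-protocol step types that count as traversal steps in this document.
TRAVERSAL_TYPES = {"Node", "Edge"}


def find_interior_calls(chain: List[Dict[str, Any]]) -> List[int]:
    """Return indices of Call steps that sit between traversal steps.

    Hypothetical helper for illustration: it only inspects each step's
    "type" field and assumes one traversal block per chain.
    """
    traversal_idx = [
        i for i, step in enumerate(chain) if step.get("type") in TRAVERSAL_TYPES
    ]
    if not traversal_idx:
        return []
    first, last = traversal_idx[0], traversal_idx[-1]
    return [
        i
        for i, step in enumerate(chain)
        if step.get("type") == "Call" and first < i < last
    ]


# Call steps only in the suffix: allowed.
valid_chain = [
    {"type": "Node", "name": "p"},
    {"type": "Edge", "direction": "forward"},
    {"type": "Node", "name": "q"},
    {"type": "Call", "function": "rows", "params": {"table": "nodes", "source": "q"}},
    {"type": "Call", "function": "limit", "params": {"value": 25}},
]

# A Call step between a node and an edge step: interior mixing.
invalid_chain = [
    {"type": "Node"},
    {"type": "Call", "function": "limit", "params": {"value": 25}},
    {"type": "Edge", "direction": "forward"},
]

print(find_interior_calls(valid_chain))    # []
print(find_interior_calls(invalid_chain))  # [1]
```

A translator could run a check like this on generated chains and reject any query for which the returned list is non-empty.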
When You Still Need DataFrames#
Unsupported Cypher clauses (for example OPTIONAL MATCH)
Arbitrary joins across disconnected intermediate result sets
Custom functions outside the current row-expression subset
GFQL-Only Super-Powers#
Edge properties: Query edges as first-class entities
Dataframe-native: Zero-cost transitions between graph and tabular operations
GPU acceleration: Parallel execution on NVIDIA hardware
Heterogeneous graphs: No schema constraints on types or properties
Integrated visualization: Layouts like group_in_a_box_layout for community visualization
Algorithm chaining: Combine community detection with layout algorithms
Quick Example#
Cypher:
MATCH (p:Person)-[r:FOLLOWS]->(q:Person)
WHERE p.age > 30
Python:
from graphistry import n, e_forward, gt

g.gfql([
    n({"type": "Person", "age": gt(30)}, name="p"),
    e_forward({"type": "FOLLOWS"}, name="r"),
    n({"type": "Person"}, name="q")
])
Wire Protocol:
{"type": "Chain", "chain": [
{"type": "Node", "filter_dict": {"type": "Person", "age": {"type": "GT", "val": 30}}, "name": "p"},
{"type": "Edge", "direction": "forward", "edge_match": {"type": "FOLLOWS"}, "name": "r"},
{"type": "Node", "filter_dict": {"type": "Person"}, "name": "q"}
]}
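For clients that cannot import graphistry, the same Wire Protocol JSON can be assembled from plain dictionaries. The sketch below rebuilds the Quick Example payload; `node_step`, `edge_forward_step`, `gt_pred`, and `chain` are hypothetical helper names introduced here for illustration, and the field names are only those that appear in the example above.

```python
import json
from typing import Any, Dict, Optional


def gt_pred(val: Any) -> Dict[str, Any]:
    """Wire form of a greater-than predicate, as shown in the Quick Example."""
    return {"type": "GT", "val": val}


def node_step(
    filter_dict: Optional[Dict[str, Any]] = None, name: Optional[str] = None
) -> Dict[str, Any]:
    """Build a Node step; optional fields are omitted when not given."""
    step: Dict[str, Any] = {"type": "Node"}
    if filter_dict:
        step["filter_dict"] = filter_dict
    if name:
        step["name"] = name
    return step


def edge_forward_step(
    edge_match: Optional[Dict[str, Any]] = None, name: Optional[str] = None
) -> Dict[str, Any]:
    """Build a forward Edge step; optional fields are omitted when not given."""
    step: Dict[str, Any] = {"type": "Edge", "direction": "forward"}
    if edge_match:
        step["edge_match"] = edge_match
    if name:
        step["name"] = name
    return step


def chain(*steps: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap steps in the top-level Chain envelope."""
    return {"type": "Chain", "chain": list(steps)}


# MATCH (p:Person)-[r:FOLLOWS]->(q:Person) WHERE p.age > 30
query = chain(
    node_step({"type": "Person", "age": gt_pred(30)}, name="p"),
    edge_forward_step({"type": "FOLLOWS"}, name="r"),
    node_step({"type": "Person"}, name="q"),
)
payload = json.dumps(query)
print(payload)
```

Because the payload is plain data, it can be generated, logged, and validated without executing any query code.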
Translation Tables#
Node Patterns#
| Cypher | Python | Wire Protocol |
|---|---|---|
Row-Pipeline Translation Tables#
Use these as chain steps inside g.gfql([...]) / Chain([...]).
| Cypher | Python chain step | Wire Protocol call (compact) |
|---|---|---|
| Scope rows to alias | | |
Same-Path WHERE Predicates#
Use g.gfql([...], where=[...]) when the predicate compares multiple steps.
Cypher:
MATCH (n1)-[e1]->(n2)-[e2]->(n3)
WHERE n1.a > n2.b AND e1.x = e2.y
Python:
from graphistry import n, e_forward, col, compare

g.gfql(
    [n(name="n1"), e_forward(name="e1"), n(name="n2"), e_forward(name="e2"), n(name="n3")],
    where=[
        compare(col("n1", "a"), ">", col("n2", "b")),
        compare(col("e1", "x"), "==", col("e2", "y")),
    ],
)
Wire Protocol:
{
  "type": "Chain",
  "chain": [
    {"type": "Node", "name": "n1"},
    {"type": "Edge", "direction": "forward", "name": "e1"},
    {"type": "Node", "name": "n2"},
    {"type": "Edge", "direction": "forward", "name": "e2"},
    {"type": "Node", "name": "n3"}
  ],
  "where": [
    {"gt": {"left": "n1.a", "right": "n2.b"}},
    {"eq": {"left": "e1.x", "right": "e2.y"}}
  ]
}
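The correspondence between the Python `compare(col(...), op, col(...))` form and the wire `where` entries can be sketched as a small encoder. `where_clause` and `WIRE_OP` are hypothetical names, and only the two operators shown in this section (`>` and `==`) are mapped; wire names for other operators are not confirmed here, so the sketch rejects them rather than guessing.

```python
from typing import Dict, Tuple

# Operator → wire key, limited to the pairs that appear in this document.
WIRE_OP = {">": "gt", "==": "eq"}

# A column reference is (step alias, column name), mirroring col("n1", "a").
ColumnRef = Tuple[str, str]


def where_clause(left: ColumnRef, op: str, right: ColumnRef) -> Dict[str, Dict[str, str]]:
    """Encode one same-path comparison as a wire-protocol where entry."""
    if op not in WIRE_OP:
        raise ValueError(f"operator {op!r} is not covered by this sketch")
    return {WIRE_OP[op]: {"left": ".".join(left), "right": ".".join(right)}}


# WHERE n1.a > n2.b AND e1.x = e2.y
where = [
    where_clause(("n1", "a"), ">", ("n2", "b")),
    where_clause(("e1", "x"), "==", ("e2", "y")),
]
print(where)
```

Each entry refers to aliases set with `name=` on earlier steps, so a translator should confirm that every alias used in `where` is actually named in the chain.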
MATCH ... RETURN Row Pipeline#
Use GFQL call steps after the pattern match to encode Cypher RETURN behavior.
Cypher:
MATCH (p:Person)-[:FOLLOWS]->(q:Person)
WHERE q.score >= 50
RETURN q.id AS id, q.name AS name, q.score AS score
ORDER BY score DESC, name ASC
LIMIT 25
Python:
from graphistry import n, e_forward, gt
from graphistry.compute import rows, where_rows, return_, order_by, limit

g.gfql([
    n({"type": "Person"}),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person", "score": gt(0)}, name="q"),
    rows(table="nodes", source="q"),
    where_rows(expr="score >= 50"),
    return_(["id", "name", "score"]),
    order_by([("score", "desc"), ("name", "asc")]),
    limit(25),
])
Wire Protocol:
{
  "type": "Chain",
  "chain": [
    {"type": "Node", "filter_dict": {"type": "Person"}},
    {"type": "Edge", "direction": "forward", "edge_match": {"type": "FOLLOWS"}},
    {"type": "Node", "filter_dict": {"type": "Person", "score": {"type": "GT", "val": 0}}, "name": "q"},
    {"type": "Call", "function": "rows", "params": {"table": "nodes", "source": "q"}},
    {"type": "Call", "function": "where_rows", "params": {"expr": "score >= 50"}},
    {"type": "Call", "function": "select", "params": {"items": [["id", "id"], ["name", "name"], ["score", "score"]]}},
    {"type": "Call", "function": "order_by", "params": {"keys": [["score", "desc"], ["name", "asc"]]}},
    {"type": "Call", "function": "limit", "params": {"value": 25}}
  ]
}
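To make the intended result of this row pipeline concrete, here is a plain-Python reference for the WHERE / RETURN / ORDER BY / LIMIT stages applied to the rows matched as `q`. It is a sketch for checking translated queries on small inputs, not how GFQL executes them; `run_row_pipeline` and the sample rows are made up for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def run_row_pipeline(rows: List[Row], min_score: int, limit: int) -> List[Row]:
    """Reference semantics for the row stages of the query above."""
    # WHERE q.score >= 50 (rows with a missing score are dropped)
    kept = [r for r in rows if r["score"] is not None and r["score"] >= min_score]
    # RETURN q.id AS id, q.name AS name, q.score AS score
    projected = [{"id": r["id"], "name": r["name"], "score": r["score"]} for r in kept]
    # ORDER BY score DESC, name ASC: sort on the secondary key first, then rely
    # on sort stability when sorting on the primary key.
    projected.sort(key=lambda r: r["name"])
    projected.sort(key=lambda r: r["score"], reverse=True)
    # LIMIT
    return projected[:limit]


# Hypothetical node rows for alias q
q_rows = [
    {"id": 1, "name": "Cy", "score": 70, "type": "Person"},
    {"id": 2, "name": "Al", "score": 90, "type": "Person"},
    {"id": 3, "name": "Bo", "score": 70, "type": "Person"},
    {"id": 4, "name": "Di", "score": 10, "type": "Person"},
]

result = run_row_pipeline(q_rows, min_score=50, limit=25)
print([r["id"] for r in result])  # [2, 3, 1]
```

Comparing a GFQL result against a reference like this on a small fixture is one way to confirm that the sort keys and directions survived translation.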
Edge Patterns#
| Cypher | Python | Wire Protocol (compact) |
|---|---|---|
Predicates#
| Cypher | Python | Wire Protocol |
|---|---|---|
Complete Examples#
Friend of Friend#
Cypher:
MATCH (u:User {name: 'Alice'})-[:FRIEND*2]->(fof:User)
WHERE fof.active = true
Python:
from graphistry import n, e_forward

g.gfql([
    n({"type": "User", "name": "Alice"}),
    e_forward({"type": "FRIEND"}, min_hops=2, max_hops=2),
    n({"type": "User", "active": True}, name="fof")
])
Wire Protocol:
{"type": "Chain", "chain": [
  {"type": "Node", "filter_dict": {"type": "User", "name": "Alice"}},
  {"type": "Edge", "direction": "forward", "edge_match": {"type": "FRIEND"}, "min_hops": 2, "max_hops": 2},
  {"type": "Node", "filter_dict": {"type": "User", "active": true}, "name": "fof"}
]}
Same-Path Constraint#
Cypher:
MATCH (a:Account)-[:TRANSFER]->(c:User)
WHERE a.owner_id = c.owner_id
Python:
from graphistry import n, e_forward, col, compare

g.gfql(
    [
        n({"type": "Account"}, name="a"),
        e_forward({"type": "TRANSFER"}),  # matches [:TRANSFER] in the Cypher pattern
        n({"type": "User"}, name="c"),
    ],
    where=[compare(col("a", "owner_id"), "==", col("c", "owner_id"))],
)
Wire Protocol:
{"type": "Chain", "chain": [
  {"type": "Node", "filter_dict": {"type": "Account"}, "name": "a"},
  {"type": "Edge", "direction": "forward", "edge_match": {"type": "TRANSFER"}},
  {"type": "Node", "filter_dict": {"type": "User"}, "name": "c"}
], "where": [{"eq": {"left": "a.owner_id", "right": "c.owner_id"}}]}
Fraud Detection#
Cypher:
MATCH (a:Account)-[t:TRANSFER]->(b:Account)
WHERE t.amount > 10000 AND t.date > date('2024-01-01')
Python:
from datetime import date
from graphistry import n, e_forward, gt

g.gfql([
    n({"type": "Account"}),
    e_forward({
        "type": "TRANSFER",
        "amount": gt(10000),
        "date": gt(date(2024, 1, 1))
    }, name="t"),
    n({"type": "Account"})
])
Wire Protocol:
{"type": "Chain", "chain": [
  {"type": "Node", "filter_dict": {"type": "Account"}},
  {"type": "Edge", "direction": "forward", "edge_match": {
    "type": "TRANSFER",
    "amount": {"type": "GT", "val": 10000},
    "date": {"type": "GT", "val": {"type": "date", "value": "2024-01-01"}}
  }, "name": "t"},
  {"type": "Node", "filter_dict": {"type": "Account"}}
]}
Complex Aggregation Example#
Cypher:
MATCH (u:User)-[t:TRANSACTION]->(m:Merchant)
WHERE t.date > date('2024-01-01')
RETURN t.category AS category, count(*) as cnt, sum(t.amount) as total
ORDER BY total DESC
LIMIT 10
Python:
from datetime import date
from graphistry import n, e_forward, gt
from graphistry.compute import rows, where_rows, group_by, return_, order_by, limit

analysis = g.gfql([
    n({"type": "User"}),
    e_forward({"type": "TRANSACTION", "date": gt(date(2024, 1, 1))}, name="t"),
    n({"type": "Merchant"}),  # (m:Merchant) endpoint from the Cypher pattern
    rows(table="edges", source="t"),
    # Drop null amounts so cnt and total cover the same rows
    # (Cypher count(*) would also count rows whose amount is null)
    where_rows(expr="amount IS NOT NULL"),
    group_by(
        keys=["category"],
        aggregations=[
            ("cnt", "count", "amount"),
            ("total", "sum", "amount"),
        ],
    ),
    return_(["category", "cnt", "total"]),
    order_by([("total", "desc")]),
    limit(10),
])
Note: If the aggregation/function you need is outside the supported group_by subset, fall back to dataframe post-processing.
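When checking a translated aggregation, or when prototyping a fallback, it can help to have the expected grouped result written out independently. The sketch below computes the same count, sum, ordering, and limit over plain edge rows using only the standard library; `summarize_by_category` and the sample rows are hypothetical, and in a real fallback the same logic would normally be written as a pandas or cuDF groupby.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def summarize_by_category(edge_rows: List[Row], top_n: int = 10) -> List[Row]:
    """Grouped count and sum of amount per category, largest total first."""
    totals: Dict[str, Dict[str, Any]] = defaultdict(lambda: {"cnt": 0, "total": 0})
    for row in edge_rows:
        # Mirrors where_rows(expr="amount IS NOT NULL")
        if row.get("amount") is None:
            continue
        bucket = totals[row["category"]]
        bucket["cnt"] += 1
        bucket["total"] += row["amount"]
    out = [{"category": cat, **agg} for cat, agg in totals.items()]
    # ORDER BY total DESC, then LIMIT
    out.sort(key=lambda r: r["total"], reverse=True)
    return out[:top_n]


# Hypothetical edge rows for alias t
t_rows = [
    {"category": "food", "amount": 20},
    {"category": "travel", "amount": 300},
    {"category": "food", "amount": 15},
    {"category": "travel", "amount": None},
]

summary = summarize_by_category(t_rows)
print(summary)
```

With these sample rows, the travel row whose amount is null is excluded, so travel has a count of 1 and a total of 300, and it sorts ahead of food.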
Row-Pipeline Operations Mapping#
| Cypher Feature | GFQL Python Row Operation | Notes |
|---|---|---|
| | | String item shorthand maps |
| | | Same projection semantics as |
| | | Deduplicate active row table |
| | | Multi-key sorting supported |
| | | Row offset |
| | | Row cap |
| | | Scalar expression subset |
| | | Grouped count |
| | | Grouped sum |
| | | Nulls excluded from collection |
| Named patterns | | Scope row table to a named match alias |
Key Differences#
| Feature | Python | Wire Protocol |
|---|---|---|
| Temporal values | | |
| Direct equality | | |
| Comparisons | | |
| Collections | | |
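The differences between Python values and their wire forms that appear in the examples of this document (a `date` object versus a tagged date dict, `gt(...)` versus a `GT` dict, and Python `True` versus JSON `true`) can be sketched as a small encoder. `encode_value` and `gt_wire` are hypothetical helpers; only the date and GT encodings shown in the Fraud Detection example are covered, and other temporal or collection types are deliberately left out.

```python
import json
from datetime import date, datetime
from typing import Any, Dict


def encode_value(value: Any) -> Any:
    """Encode a Python literal for the wire; plain dates become tagged dicts."""
    # datetime is a subclass of date; it is excluded because this document
    # only shows the wire form for plain dates.
    if isinstance(value, date) and not isinstance(value, datetime):
        return {"type": "date", "value": value.isoformat()}
    return value


def gt_wire(value: Any) -> Dict[str, Any]:
    """Wire form of a greater-than comparison."""
    return {"type": "GT", "val": encode_value(value)}


# Edge filter from the Fraud Detection example
edge_match = {
    "type": "TRANSFER",                   # direct equality: literal passes through
    "amount": gt_wire(10000),             # comparison on a number
    "date": gt_wire(date(2024, 1, 1)),    # comparison on a temporal value
}
print(json.dumps(edge_match))

# Direct equality on a boolean: json.dumps turns True into true
print(json.dumps({"active": True}))
```

Keeping the encoding in one place makes it easier to honor the type-consistency advice: the same Python value always produces the same wire shape.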
Not Supported#
CREATE, DELETE, SET: GFQL is read-only.
OPTIONAL MATCH: no direct equivalent yet (requires outer-join semantics).
Full Cypher expression/function surface in row expressions: current vectorized subset only.
Multiple disconnected MATCH patterns in one query: use separate GFQL chains and explicit dataframe joins.
Practical fallback: keep pattern traversal and row-pipeline stages in GFQL, then apply final custom dataframe logic in pandas/cuDF when needed.
Best Practices#
Direct Translation First: Try pure GFQL before adding DataFrame operations
Use Named Patterns: Label important results with name= for easy access
Filter Early: Apply selective node filters before traversing edges
Type Consistency: Ensure wire protocol types match expected column types
Validate JSON: Test wire protocol against schema before sending
LLM Integration Guide#
When building translators:
Given Cypher: {cypher_query}
Generate both:
1. Python: Human-readable GFQL code
2. Wire Protocol: JSON for API calls
Rules:
- (n:Label) → Python: n({"type": "Label"}) → JSON: {"type": "Node", "filter_dict": {"type": "Label"}}
- Cross-step WHERE → `g.gfql([...], where=[compare(col(...), op, col(...))])`
- RETURN/WITH/ORDER BY/SKIP/LIMIT/DISTINCT/GROUP BY → row-pipeline call steps (`rows`, `where_rows`, `return_`, `order_by`, `skip`, `limit`, `distinct`, `group_by`)
- Unsupported expressions/functions → explicitly mark as unsupported instead of silently rewriting
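One practical detail when using the template above from Python: the rules contain literal JSON braces, so `str.format` would try to interpret them as fields. A minimal sketch that substitutes only the `{cypher_query}` placeholder is shown below; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the template text is copied from the guide above.

```python
PROMPT_TEMPLATE = """Given Cypher: {cypher_query}
Generate both:
1. Python: Human-readable GFQL code
2. Wire Protocol: JSON for API calls
Rules:
- (n:Label) → Python: n({"type": "Label"}) → JSON: {"type": "Node", "filter_dict": {"type": "Label"}}
- Cross-step WHERE → `g.gfql([...], where=[compare(col(...), op, col(...))])`
- RETURN/WITH/ORDER BY/SKIP/LIMIT/DISTINCT/GROUP BY → row-pipeline call steps (`rows`, `where_rows`, `return_`, `order_by`, `skip`, `limit`, `distinct`, `group_by`)
- Unsupported expressions/functions → explicitly mark as unsupported instead of silently rewriting
"""


def build_prompt(cypher_query: str) -> str:
    """Fill in the Cypher query without touching the literal braces in the rules."""
    # str.replace is used instead of str.format because the rules contain
    # JSON-style braces that are not format fields.
    return PROMPT_TEMPLATE.replace("{cypher_query}", cypher_query)


prompt = build_prompt("MATCH (p:Person)-[:FOLLOWS]->(q:Person) RETURN q.name")
print(prompt.splitlines()[0])
```

The JSON returned by the model can then be parsed and checked against the wire protocol schema before it is sent, in line with the Validate JSON practice above.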
See Also#
GFQL Wire Protocol Specification - Full wire protocol specification
GFQL Language Specification - Language specification
GFQL Python Embedding - Python implementation details