Translate Between SQL, Pandas, Cypher, and GFQL#

This guide provides a comparison between SQL, Pandas, Cypher, and GFQL, helping you translate familiar queries into GFQL.

Introduction#

GFQL (GraphFrame Query Language) is designed to be intuitive for users familiar with SQL, Cypher, or dataframe like Pandas and Spark. By comparing equivalent queries across these languages, you can quickly grasp GFQL’s syntax, benefits, and start utilizing its powerful graph querying capabilities within your workflows.

GFQL operates on graph DataFrames - graphs represented as node and edge DataFrames. This DataFrame-native approach enables seamless integration with the PyData ecosystem and natural vectorization for both CPU and GPU processing.

GFQL accepts both native chain syntax (g.gfql([n(), e(), n()])) and Cypher strings (g.gfql("MATCH ...")). Most examples below show both forms. GFQL also extends Cypher with GRAPH { } constructors for graph-state results and composable multi-stage pipelines — see Cypher Syntax In GFQL for the full Cypher-in-GFQL guide.

Who Is This Guide For?#

Data Scientists: Familiar with Pandas or SQL, exploring graph relationships.
Engineers: Integrating graph queries into applications.
DBAs: Understanding how GFQL complements SQL for graph data.
Graph Specialists: Experienced with Cypher, integrating graph queries into Python.

Common Graph and Query Tasks#

We’ll cover a range of common graph and query tasks:

Finding Nodes with Specific Properties #

Objective: Find all nodes where the type is "person".

SQL

SELECT * FROM nodes
WHERE type = 'person';

Pandas

people_nodes_df = nodes_df[ nodes_df['type'] == 'person' ]

Cypher

MATCH (n {type: 'person'})
RETURN n;

GFQL (chain syntax)

from graphistry import n

# df[['id', 'type', ...]]
g.gfql([ n({"type": "person"}) ])._nodes

GFQL (Cypher syntax)

# Row result — same as standard Cypher MATCH/RETURN
g.gfql("MATCH (n {type: 'person'}) RETURN n")._nodes

# Graph result — GFQL extension, keeps graph state
g.gfql("GRAPH { MATCH (n {type: 'person'}) }")._nodes

Explanation:

GFQL chain: n({"type": "person"}) filters nodes where type is "person". g.gfql([...]) applies this filter to the graph g, and ._nodes retrieves the resulting nodes. The performance is similar to that of Pandas (CPU) or cuDF (GPU).
GFQL Cypher: The same query as a Cypher string. MATCH ... RETURN gives row output; GRAPH { MATCH ... } keeps graph state (both _nodes and _edges).

—

Exploring Relationships Between Nodes #

Objective: Find all edges connecting nodes of type "person" to nodes of type "company".

SQL

SELECT e.*
FROM edges e
JOIN nodes n1 ON e.src = n1.id
JOIN nodes n2 ON e.dst = n2.id
WHERE n1.type = 'person' AND n2.type = 'company';

Pandas

merged_df = edges_df.merge(
    nodes_df[['id', 'type']], left_on='src', right_on='id', suffixes=('', '_src')
).merge(
    nodes_df[['id', 'type']], left_on='dst', right_on='id', suffixes=('', '_dst')
)

result = merged_df[
    (merged_df['type_src'] == 'person') &
    (merged_df['type_dst'] == 'company')
]

Cypher

MATCH (n1 {type: 'person'})-[e]->(n2 {type: 'company'})
RETURN e;

GFQL (chain syntax)

from graphistry import n, e_forward

# df[['src', 'dst', ...]]
g.gfql([
    n({"type": "person"}), e_forward(), n({"type": "company"})
])._edges

GFQL (Cypher syntax)

# Graph result — keeps matched subgraph with edges
g.gfql(
    "GRAPH { MATCH (n1 {type: 'person'})-[e]->(n2 {type: 'company'}) }"
)._edges

# Row result — returns edge properties as rows
g.gfql(
    "MATCH (n1 {type: 'person'})-[e]->(n2 {type: 'company'}) RETURN e"
)._nodes

Explanation:

GFQL chain: Starts from nodes of type "person", traverses forward edges, and reaches nodes of type "company". This version starts to gain the legibility and maintainability benefits of graph query syntax for graph tasks, and maintains the performance benefits of automatically vectorized pandas and GPU-accelerated cuDF.
GFQL Cypher: GRAPH { MATCH ... } returns the matched subgraph (graph state with _edges); MATCH ... RETURN e returns edge properties as rows.
Same-path constraints: Use where to relate attributes across steps (same-path scope only; see GFQL WHERE (Same-Path Constraints)).

—

Performing Multi-Hop Traversals #

Objective: Find nodes that are two hops away from node "Alice".

SQL

WITH first_hop AS (
    SELECT e1.dst AS node_id
    FROM edges e1
    WHERE e1.src = 'Alice'
),
second_hop AS (
    SELECT e2.dst AS node_id
    FROM edges e2
    JOIN first_hop fh ON e2.src = fh.node_id
)
SELECT * FROM nodes
WHERE id IN (SELECT node_id FROM second_hop);

Pandas

first_hop = edges_df[ edges_df['src'] == 'Alice' ]['dst']
second_hop = edges_df[ edges_df['src'].isin(first_hop) ]['dst']
result_nodes_df = nodes_df[ nodes_df['id'].isin(second_hop) ]

Cypher

MATCH (n {id: 'Alice'})-->()-->(m)
RETURN m;

GFQL (chain syntax)

from graphistry import n, e_forward

# df[['id', ...]]
g.gfql([
    n({g._node: "Alice"}), e_forward(), e_forward(), n(name='m')
])._nodes.query('m')

GFQL (Cypher syntax)

# Row result — return 2-hop destinations
g.gfql("MATCH (n {id: 'Alice'})-->()-->(m) RETURN m")._nodes

# Graph result — return the 2-hop subgraph
g.gfql("GRAPH { MATCH (n {id: 'Alice'})-->()-->(m) }")

GFQL (bounded hop alternative)

# Same intent using hop() with explicit bounds + optional labels
g.hop(
    nodes=pd.DataFrame({g._node: ['Alice']}),
    min_hops=2,
    max_hops=2
)

Explanation:

min_hops/max_hops express the same bounded traversal intent as a Cypher pattern like [*2..2] after translation into native GFQL. Direct g.gfql(“MATCH …”) now supports the endpoint-only [*…] slice, but hop() still gives you the full native GFQL control surface for labels, output slicing, and more complex multihop rewrites. If you set label_node_hops/label_edge_hops, those column names will store the hop step (nodes = first arrival, edges = traversal step); omit or None to skip labels.
output_min_hops/output_max_hops (optional) slice the displayed hops after traversal. By default, all traversed hops up to max_hops remain visible; set output_min_hops if you want to drop early hops (e.g., traverse 2..4 but only show 3..4). Invalid slices (e.g., output_min_hops > max_hops or output_max_hops < min_hops) raise a ValueError.

Explanation:

GFQL: Starts at node "Alice", performs two forward hops, and obtains nodes two steps away. Results are in nodes_df. Building on the expressive and performance benefits of the previous 1-hop example, it begins adding the parallel path finding benefits of GFQL over Cypher, which benefits both CPU and GPU usage.

—

Filtering Edges and Nodes with Conditions #

Objective: Find all edges where the weight is greater than 0.5.

SQL

SELECT * FROM edges
WHERE weight > 0.5;

Pandas

filtered_edges_df = edges_df[ edges_df['weight'] > 0.5 ]

Cypher

MATCH ()-[e]->()
WHERE e.weight > 0.5
RETURN e;

GFQL (chain syntax)

from graphistry import e_forward

# df[['src', 'dst', 'weight', ...]]
g.gfql([ e_forward(edge_query='weight > 0.5') ])._edges

GFQL (Cypher syntax)

g.gfql(
    "MATCH ()-[e]->() WHERE e.weight > 0.5 RETURN e"
)._nodes

Explanation:

GFQL chain: Uses e_forward(edge_query='weight > 0.5') to filter edges where weight > 0.5. This version introduces the string query form that can be convenient. Underneath, it still benefits from the vectorized execution of Pandas and cuDF.
GFQL Cypher: Same filter as a Cypher WHERE clause.

—

Aggregations and Grouping #

Objective: Count the number of outgoing edges for each node.

SQL

SELECT src, COUNT(*) AS out_degree
FROM edges
GROUP BY src;

Pandas

out_degree = edges_df.groupby('src').size().reset_index(name='out_degree')

Cypher

MATCH (n)-[e]->()
RETURN n.id AS node_id, COUNT(e) AS out_degree;

GFQL (dataframe)

# df[['src', 'out_degree']]
g._edges.groupby('src').size().reset_index(name='out_degree')

GFQL (Cypher syntax)

# Enrich graph with degree columns, then query
g.gfql("CALL graphistry.degree.write()")._nodes

# Or as a single pipeline — enrich then return top nodes
g.gfql(
    "GRAPH g1 = GRAPH { CALL graphistry.degree.write() } "
    "USE g1 "
    "MATCH (n) RETURN n.id AS node_id, n.degree_out AS out_degree "
    "ORDER BY out_degree DESC LIMIT 10"
)._nodes

Explanation:

GFQL dataframe: Performs aggregation directly on g._edges using standard dataframe operations. Or even shorter, call g.get_degrees() to enrich each node with in, out, and total degrees.
GFQL Cypher: CALL graphistry.degree.write() enriches the graph with degree columns in graph state. Wrap in GRAPH { } to compose with subsequent queries in a single expression.

—

All Paths and Connectivity #

Objective: Find all paths between nodes "Alice" and "Bob" that go through friendships.

SQL

WITH RECURSIVE path AS (
    -- Base case: Start from "Alice" (no type or edge restrictions)
    SELECT e.src, e.dst, ARRAY[e.src, e.dst] AS full_path, 1 AS hop
    FROM edges e
    WHERE e.src = 'Alice'

    UNION ALL

    -- Recursive case: Expand path where intermediate src/dst are 'people' and edge is 'friend'
    SELECT e.src, e.dst, full_path || e.dst, p.hop + 1
    FROM edges e
    JOIN path p ON e.src = p.dst
    JOIN nodes n_src ON e.src = n_src.id  -- Check src type for intermediate nodes
    JOIN nodes n_dst ON e.dst = n_dst.id  -- Check dst type for intermediate nodes
    WHERE n_src.type = 'person' AND n_dst.type = 'person'  -- Intermediate nodes must be 'people'
    AND e.type = 'friend'  -- Intermediate edges must be 'friend'
    AND e.dst != ALL(full_path)  -- Avoid cycles (optional)
)
-- Final filter to ensure the path ends with "Bob"
SELECT *
FROM path
WHERE dst = 'Bob';

Pandas

def find_paths_fixed_point(edges_df, nodes_df, start_node, end_node):
    # Initialize paths with base case (start with 'Alice')
    paths = [{'path': [start_node], 'last_node': start_node}]
    all_paths = []
    expanded = True  # Continue loop as long as there are paths to expand

    while expanded:
        new_paths = []
        expanded = False

        # Expand each path
        for path in paths:
            last_node = path['last_node']

            # Find all outgoing 'friend' edges from the last node
            valid_edges = edges_df.merge(nodes_df, left_on='dst', right_on='id') \
                                .merge(nodes_df, left_on='src', right_on='id') \
                                [(edges_df['src'] == last_node) &
                                (edges_df['type'] == 'friend') &
                                (nodes_df['type_x'] == 'person') &  # src is 'person'
                                (nodes_df['type_y'] == 'person')]   # dst is 'person'

            for _, edge in valid_edges.iterrows():
                new_path = path['path'] + [edge['dst']]

                # If we reached 'Bob', add to all_paths
                if edge['dst'] == end_node:
                    all_paths.append(new_path)
                else:
                    # Otherwise, add to new paths to continue expanding
                    new_paths.append({'path': new_path, 'last_node': edge['dst']})
                    expanded = True  # Mark that we found new paths to expand

        # Stop if no new paths were found (fixed-point behavior)
        paths = new_paths

    return all_paths

# Run the pathfinding function to fixed point
paths = find_paths_fixed_point(edges_df, nodes_df, 'Alice', 'Bob')

Cypher

MATCH p = (n1 {id: 'Alice'})-[e:friend*]-(n2 {id: 'Bob'})
WHERE ALL(rel IN relationships(p) WHERE type(rel) = 'friend')
AND ALL(node IN NODES(p) WHERE node.type = 'person')
RETURN p;

GFQL

# g._edges: df[['src', 'dst', ...]]
# g._nodes: df[['id', ...]]
g.gfql([
    n({"id": "Alice"}),
    e_forward(
        source_node_query='type == "person"',
        edge_query='type == "friend"',
        destination_node_query='type == "person"',
        to_fixed_point=True),
    n({"id": "Bob"})
])

Explanation:

GFQL: Uses e(to_fixed_point=True) to find edge sequences of arbitrary length between nodes "Alice" and "Bob". The SQL and Pandas version suffer from syntactic and semantic imepedance mismatch with graph tasks on this example.

—

Community Detection and Clustering #

Objective: Identify communities within the graph using the Louvain algorithm.

SQL and Pandas

Not designed for complex graph algorithms like community detection.

Cypher

CALL algo.louvain.stream() YIELD nodeId, communityId

GFQL (Python)

# g._nodes: df[['id', 'louvain']]
g.compute_cugraph('louvain')._nodes

GFQL (Cypher syntax)

# Graph-preserving enrichment via CALL .write()
g.gfql("CALL graphistry.cugraph.louvain.write()")._nodes

# Full pipeline: filter subgraph, enrich, query results
g.gfql(
    "GRAPH g1 = GRAPH { MATCH (a)-[r]->(b) WHERE a.score > 5 } "
    "GRAPH g2 = GRAPH { USE g1 CALL graphistry.cugraph.louvain.write() } "
    "USE g2 "
    "MATCH (n) RETURN n.id AS id, n.louvain AS community "
    "ORDER BY community, id"
)._nodes

Explanation:

GFQL Python: Enriches with many algorithms such as the GPU-accelerated graphistry.plugins.cugraph.compute_cugraph() for community detection. Any CPU and GPU library can be used, with top plugins already natively supported out-of-the-box.
GFQL Cypher: CALL graphistry.cugraph.louvain.write() runs the same algorithm via GFQL’s Cypher surface. Wrapping in GRAPH { } with USE enables single-expression pipelines that filter, enrich, and query.

—

Time-Windowed Graph Analytics #

Objective: Find all edges between nodes "Alice" and "Bob" that occurred in the last 7 days.

SQL

SELECT * FROM edges
WHERE ((src = 'Alice' AND dst = 'Bob') OR (src = 'Bob' AND dst = 'Alice'))
  AND timestamp >= NOW() - INTERVAL '7 days';

Warning

This version incorrectly simplifies to a two-hop relationship. For multihop scenarios, refer to All Paths and Connectivity for more advanced techniques.

Pandas

filtered_edges_df = edges_df[
    ((edges_df['src'] == 'Alice') & (edges_df['dst'] == 'Bob')) |
    ((edges_df['src'] == 'Bob') & (edges_df['dst'] == 'Alice')) &
    (edges_df['timestamp'] >= pd.Timestamp.now() - pd.Timedelta(days=7))
]

Warning

This version incorrectly simplifies to a two-hop relationship. For multihop scenarios, refer to All Paths and Connectivity for more advanced techniques.

Cypher

MATCH path = (a {id: 'Alice'})-[e]-(b {id: 'Bob'})
WHERE e.timestamp >= datetime().subtract(duration({days: 7}))
RETURN e;

GFQL

from graphistry import n, e_forward, is_in

past_week = pd.Timestamp.now() - pd.Timedelta(7)
g.gfql([
    n({"id": is_in(["Alice", "Bob"])}),
    e_forward(edge_query=f'timestamp >= "{past_week}"'),
    n({"id": is_in(["Alice", "Bob"])})
])._edges

Explanation:

SQL and Pandas: These versions incorrectly simplify to a two-hop relationships; for multihop scenarios, refer to All Paths and Connectivity.
GFQL: Utilizes the chain method to filter edges between "Alice" and "Bob" based on a timestamp within the last 7 days. This approach allows for multihop relationships as it leverages the graph’s structure, and further using cuDF for GPU acceleration when available.

—

Parallel Pathfinding #

Objective: Find all paths from "Alice" to "Bob" and "Charlie" in parallel. Parallel pathfinding is particularly interesting because it allows for efficient querying of multiple target nodes at the same time, reducing the time and complexity required to compute multiple independent paths, especially in large graphs.

SQL

Not suitable: SQL is not designed for pathfinding on graphs.

Pandas

Not suitable: Pandas is not designed for pathfinding across graphs.

Cypher

Warning

Cypher is path-oriented and does not natively support parallel pathfinding. Each path must be processed individually, which can result in performance bottlenecks for large graphs or multiple targets. Neo4j users can utilize the APOC or GDS libraries to add parallelism, but this is a limited external workaround, rather than a native strength.

MATCH (a {id: 'Alice'}), (target)
WHERE target.id IN ['Bob', 'Charlie']
CALL algo.shortestPath.stream(a, target)
YIELD nodeId, cost
RETURN nodeId, cost;

GFQL

from graphistry import n, e_forward, is_in

# g._nodes: cudf.DataFrame[['src', 'dst', ...]]
g.gfql([
    n({"id": "Alice"}),
    e_forward(to_fixed_point=False),
    n({"id": is_in(["Bob", "Charlie"])})
], engine='cudf')

Explanation:

Cypher: Cypher processes paths individually and does not support native parallelism. Libraries like APOC or GDS offer a way to achieve parallel execution, but this adds complexity.
GFQL: GFQL natively supports parallel pathfinding using a bulk wavefront algorithm, processing all paths at once, making it highly efficient in GPU-accelerated environments.

—

GPU Execution #

Objective*: Execute pathfinding queries on the GPU, computing all paths from "Alice" to "Bob" and "Charlie" simultaneously across hardware resources.

SQL

Not suitable: SQL is not designed for parallel execution of graph queries.

Pandas

Not suitable: Pandas is not designed for parallel execution across graphs.

Cypher

Not suitable: Popular Cypher engines like Neo4j do not natively support GPU execution.

GFQL

from graphistry import n, e_forward, is_in

# Executing pathfinding queries in parallel
g.gfql([
    n({"id": "Alice"}),
    e_forward(to_fixed_point=False),
    n({"id": is_in(["Bob", "Charlie"])})
], engine='cudf')

Explanation:

This example builds on the previous one, showing how GFQL handles parallel execution natively. GFQL benefits from bulk vector processing, which boosts performance in both CPU and GPU modes:

In CPU environments, the bulk processing model accelerates query execution algorithmically and takes advantage of hardware parallelism, improving efficiency.
In GPU mode, GFQL natively parallelizes pathfinding, further leveraging hardware acceleration to process multiple paths concurrently and quickly, making it highly efficient for large-scale graph traversals.

—

GFQL Functions and Equivalents#

Node Matching

SQL: SELECT * FROM nodes WHERE ...
Pandas: nodes_df[ condition ]
Cypher: MATCH (n {property: value})
GFQL chain: n({ "property": value })
GFQL Cypher: g.gfql("MATCH (n {property: value}) RETURN n")

Edge Matching

SQL: SELECT * FROM edges WHERE ...
Pandas: edges_df[ condition ]
Cypher: MATCH ()-[e {property: value}]->()
GFQL chain: e_forward({ "property": value }) or e_reverse({ "property": value }) or e({ "property": value })
GFQL Cypher: g.gfql("MATCH ()-[e {property: value}]->() RETURN e")

Traversal

SQL: Complex joins or recursive queries
Pandas: Multiple merges; not efficient for deep traversals
Cypher: Patterns like ()-[]->() for traversal
GFQL chain: Chains of n(), e_forward(), e_reverse(), and e() functions
GFQL Cypher: g.gfql("MATCH (a)-[]->(b) RETURN ...") or g.gfql("GRAPH { MATCH ... }")

Graph-State Results (subgraph extraction)

SQL: Not applicable
Pandas: Manual node/edge DataFrame filtering
Cypher: Not supported (Cypher always returns rows)
GFQL chain: g.gfql([n(...), e_forward(), n()]) — inherently graph-returning
GFQL Cypher: g.gfql("GRAPH { MATCH (a)-[r]->(b) WHERE ... }") — GFQL extension

Graph Enrichment (algorithms)

SQL: Not applicable
Pandas: External library calls
Cypher: CALL algo.pagerank.stream()
GFQL Python: g.compute_cugraph('pagerank') or g.compute_igraph('pagerank')
GFQL Cypher: g.gfql("CALL graphistry.cugraph.pagerank.write()") or g.gfql("GRAPH { CALL graphistry.cugraph.pagerank.write() }")

Multi-Stage Pipelines

SQL: CTEs or temp tables
Pandas: Sequential variable assignment
Cypher: Not supported in standard Cypher (row-only)
GFQL chain: Sequential g.gfql([...]) calls
GFQL Cypher: GRAPH g1 = GRAPH { ... } GRAPH g2 = GRAPH { USE g1 CALL ... } USE g2 MATCH ... RETURN ...

Tips for Users#

Data Scientists and Analysts: Use your Pandas knowledge. GFQL operates on dataframes, allowing familiar operations.
Engineers and Developers: Integrate GFQL into Python applications without extra infrastructure.
Database Administrators: Complement SQL queries with GFQL for graph data without changing databases.
Graph Enthusiasts: Start with simple queries and explore complex analytics. Visualize results using PyGraphistry.

Additional Resources#

GFQL Quick Reference
GFQL Operator Reference: Use predicates for filtering on nodee and edge attributes.
10 Minutes to PyGraphistry: Visualize GFQL queries with GPU-accelerated tools.

Conclusion#

GFQL bridges the gap between traditional querying languages and graph analytics. By translating queries from SQL, Pandas, and Cypher into GFQL, you can leverage powerful graph queries within your Python workflows.

Start exploring GFQL today and unlock new insights from your graph data!

Translate Between SQL, Pandas, Cypher, and GFQL

Contents

Translate Between SQL, Pandas, Cypher, and GFQL#

Introduction#

Who Is This Guide For?#

Common Graph and Query Tasks#

GFQL Functions and Equivalents#

Tips for Users#

Additional Resources#

Conclusion#