Translating Between SQL, Pandas, Cypher, and GFQL¶
This guide provides a comparison between SQL, Pandas, Cypher, and GFQL, helping you translate familiar queries into GFQL.
Introduction¶
GFQL (GraphFrame Query Language) is designed to be intuitive for users familiar with SQL, Pandas, or Cypher. By comparing equivalent queries across these languages, you can quickly grasp GFQL’s syntax and start utilizing its powerful graph querying capabilities within your Python workflows.
Who Is This Guide For?¶
Data Scientists and Analysts: Familiar with Pandas or SQL, looking to explore graph relationships in their data.
Graph Engineers and Developers: Experienced with Cypher or other graph query languages, interested in integrating graph queries into Python applications.
Database Administrators: Looking to understand how GFQL can complement traditional SQL queries for graph data.
Anyone Exploring Graph Analytics: Interested in learning how to perform graph queries using a dataframe-native approach.
Common Graph and Query Tasks¶
We’ll cover a range of common graph and query tasks:
Finding Nodes with Specific Properties
Exploring Relationships Between Nodes
Performing Multi-Hop Traversals
Filtering Edges and Nodes with Conditions
Aggregations and Grouping
All Paths and Connectivity
Community Detection and Clustering
Translation Examples¶
Example 1: Finding Nodes with Specific Properties¶
Objective: Find all nodes where the type is “person”.
SQL
SELECT * FROM nodes
WHERE type = 'person';
Pandas
people_nodes_df = nodes_df[ nodes_df['type'] == 'person' ]
Cypher
MATCH (n {type: 'person'})
RETURN n;
GFQL
from graphistry import n
people_nodes_df = g.chain([ n({"type": "person"}) ])._nodes
Explanation:
GFQL: n({“type”: “person”}) filters nodes where type is “person”. g.chain([…]) applies this filter to the graph g, and ._nodes retrieves the resulting nodes. The performance is similar to that of Pandas (CPU) or cuDF (GPU).
—
Example 2: Exploring Relationships Between Nodes¶
Objective: Find all edges connecting nodes of type “person” to nodes of type “company”.
SQL
SELECT e.*
FROM edges e
JOIN nodes n1 ON e.src = n1.id
JOIN nodes n2 ON e.dst = n2.id
WHERE n1.type = 'person' AND n2.type = 'company';
Pandas
merged_df = edges_df.merge(
nodes_df[['id', 'type']], left_on='src', right_on='id', suffixes=('', '_src')
).merge(
nodes_df[['id', 'type']], left_on='dst', right_on='id', suffixes=('', '_dst')
)
result = merged_df[
(merged_df['type_src'] == 'person') &
(merged_df['type_dst'] == 'company')
]
Cypher
MATCH (n1 {type: 'person'})-[e]->(n2 {type: 'company'})
RETURN e;
GFQL
from graphistry import n, e_forward
g_result = g.chain([
n({"type": "person"}),
e_forward(),
n({"type": "company"})
])
edges_df = g_result._edges
Explanation:
GFQL: Starts from nodes of type “person”, traverses forward edges, and reaches nodes of type “company”. The resulting edges are stored in edges_df.
—
Example 3: Performing Multi-Hop Traversals¶
Objective: Find nodes that are two hops away from node “Alice”.
SQL
WITH first_hop AS (
SELECT e1.dst AS node_id
FROM edges e1
WHERE e1.src = 'Alice'
),
second_hop AS (
SELECT e2.dst AS node_id
FROM edges e2
JOIN first_hop fh ON e2.src = fh.node_id
)
SELECT * FROM nodes
WHERE id IN (SELECT node_id FROM second_hop);
Pandas
first_hop = edges_df[ edges_df['src'] == 'Alice' ]['dst']
second_hop = edges_df[ edges_df['src'].isin(first_hop) ]['dst']
result_nodes_df = nodes_df[ nodes_df['id'].isin(second_hop) ]
Cypher
MATCH (n {id: 'Alice'})-->()-->(m)
RETURN m;
GFQL
from graphistry import n, e_forward
g_2_hops = g.chain([
n({g._node: "Alice"}),
e_forward(),
e_forward()
])
nodes_df = g_2_hops._nodes
Explanation:
GFQL: Starts at node “Alice”, performs two forward hops, and obtains nodes two steps away. Results are in nodes_df.
—
Example 4: Filtering Edges and Nodes with Conditions¶
Objective: Find all edges where the weight is greater than 0.5.
SQL
SELECT * FROM edges
WHERE weight > 0.5;
Pandas
filtered_edges_df = edges_df[ edges_df['weight'] > 0.5 ]
Cypher
MATCH ()-[e]->()
WHERE e.weight > 0.5
RETURN e;
GFQL
from graphistry import e_forward
filtered_edges_df = g.chain([ e_forward(edge_query='weight > 0.5') ])._edges
Explanation:
GFQL: Uses e_forward({“weight”: lambda w: w > 0.5}) to filter edges where weight > 0.5.
—
Example 5: Aggregations and Grouping¶
Objective: Count the number of outgoing edges for each node.
SQL
SELECT src, COUNT(*) AS out_degree
FROM edges
GROUP BY src;
Pandas
out_degree = edges_df.groupby('src').size().reset_index(name='out_degree')
Cypher
MATCH (n)-[e]->()
RETURN n.id AS node_id, COUNT(e) AS out_degree;
GFQL
out_degree = g._edges.groupby('src').size().reset_index(name='out_degree')
Explanation:
GFQL: Performs aggregation directly on g._edges using standard dataframe operations. Or even shorter, call g.get_degrees() to enrich each node with in, out, and total degrees.
—
Example 6: All Paths and Connectivity¶
Objective: Find all paths between nodes "Alice" and "Bob".
SQL and Pandas
Not suitable for path calculations in graphs.
Cypher
MATCH p = (n {id: 'Alice'})-[*]-(m {id: 'Bob'})
RETURN p;
GFQL
g.chain([ n({g._node: "Alice"}), e(to_fixed_point=True), n({g._node: "Bob"}) ])
Explanation:
GFQL: Uses e(to_fixed_point=True) to find edge sequences of arbitrary length between nodes “Alice” and “Bob”.
—
Example 7: Community Detection and Clustering¶
Objective: Identify communities within the graph using the Louvain algorithm.
SQL and Pandas
Not designed for complex graph algorithms like community detection.
Cypher
CALL algo.louvain.stream() YIELD nodeId, communityId
GFQL
nodes_df = g.compute_cugraph('louvain')._nodes[['id', 'louvain']]
Explanation:
GFQL: Uses GPU-accelerated algorithms via compute_cugraph() for community detection.
—
GFQL Functions and Equivalents¶
Node Matching
SQL: SELECT * FROM nodes WHERE …
Pandas: nodes_df[ condition ]
Cypher: MATCH (n {property: value})
GFQL: n({ “property”: value })
Edge Matching
SQL: SELECT * FROM edges WHERE …
Pandas: edges_df[ condition ]
Cypher: MATCH ()-[e {property: value}]->()
GFQL: e_forward({ “property”: value }) or e_reverse({ “property”: value })
Traversal
SQL: Complex joins or recursive queries.
Pandas: Multiple merges; not efficient for deep traversals.
Cypher: Patterns like ()-[]->() for traversal.
GFQL: Chains of n(), e_forward(), e_reverse() functions.
Tips for Users¶
Data Scientists and Analysts: Use your Pandas knowledge. GFQL operates on dataframes, allowing familiar operations.
Engineers and Developers: Integrate GFQL into Python applications without extra infrastructure.
Database Administrators: Complement SQL queries with GFQL for graph data without changing databases.
Graph Enthusiasts: Start with simple queries and explore complex analytics. Visualize results using PyGraphistry.
Additional Resources¶
GFQL Documentation: Detailed documentation for advanced usage.
GFQL Predicates: Use predicates for complex filtering conditions.
PyGraphistry Integration: Visualize GFQL queries with GPU-accelerated tools.
Conclusion¶
GFQL bridges the gap between traditional querying languages and graph analytics. By translating queries from SQL, Pandas, and Cypher into GFQL, you can leverage powerful graph queries within your Python workflows.
Start exploring GFQL today and unlock new insights from your graph data!