GFQL Wire Protocol Specification#

Introduction#

The GFQL Wire Protocol defines the JSON serialization format for GFQL queries, enabling:

  • Client-server communication

  • Query persistence and storage

  • Cross-language interoperability between Python, JavaScript, and other clients

  • Configuration-driven query generation

Design Principles#

  • Type Safety: Tagged dictionaries preserve type information

  • Self-Describing: Each object includes type metadata

  • Extensible: Schema supports future additions

  • Round-Trip Safe: Lossless serialization/deserialization

Protocol Overview#

Message Structure#

All GFQL wire protocol messages are JSON objects with a type field:

{
  "type": "MessageType",
  "payload": {}
}

Supported Message Types#

  • Chain: Complete query chain

  • Let: DAG pattern with named bindings

  • ChainRef: Reference to Let binding with optional chain

  • RemoteGraph: Reference to remote dataset

  • Call: Algorithm/transformation invocation

  • Node: Node matcher operation

  • Edge: Edge traversal operation

  • Predicates: GT, LT, EQ, IsIn, Between, etc.

  • Temporal values: datetime, date, time

Message Structure#

All GFQL wire protocol messages are JSON objects with a type field that identifies the message type. The protocol uses discriminated unions for polymorphic types.

Type Identification#

Each object includes a type field:

  • Operations: "Node", "Edge", "Chain", "Let", "ChainRef", "RemoteGraph", "Call"

  • Predicates: "GT", "LT", "IsIn", etc.

  • Temporal values: "datetime", "date", "time"

This enables unambiguous deserialization and validation.

Operation Serialization#

Node Operation#

Python:

n({"type": "person", "age": gt(30)}, name="adults")

Wire Format:

{
  "type": "Node",
  "filter_dict": {
    "type": "person",
    "age": {
      "type": "GT",
      "val": 30
    }
  },
  "name": "adults"
}

Edge Operation#

Python:

e_forward(
    {"type": "transaction"},
    min_hops=2,
    max_hops=4,
    output_min_hops=3,
    label_edge_hops="edge_hop",
    source_node_match={"active": True},
    name="txns"
)

Wire Format:

{
  "type": "Edge",
  "direction": "forward",
  "edge_match": { "type": "transaction" },
  "min_hops": 2,
  "max_hops": 4,
  "output_min_hops": 3,
  "label_edge_hops": "edge_hop",
  "source_node_match": { "active": true },
  "name": "txns"
}

Optional fields:

  • hops (shorthand for max_hops)

  • output_min_hops

  • output_max_hops

  • label_node_hops, label_edge_hops, label_seeds

  • to_fixed_point

Chain#

Python:

from graphistry import n, e_forward

g.gfql([
    n({"id": "Alice"}),
    e_forward({"type": "friend"}),
    n({"status": "active"})
])

Wire Format:

{
  "type": "Chain",
  "chain": [
    {
      "type": "Node",
      "filter_dict": {"id": "Alice"}
    },
    {
      "type": "Edge",
      "direction": "forward",
      "edge_match": {"type": "friend"}
    },
    {
      "type": "Node",
      "filter_dict": {"status": "active"}
    }
  ]
}

Optional fields:

  • where: list of same-path comparisons using eq, neq, lt, le, gt, ge with left/right as alias.column strings. Multiple entries are ANDed. Operator mapping:

    • eq maps to ==

    • neq maps to !=

    • lt maps to <

    • le maps to <=

    • gt maps to >

    • ge maps to >=

Chain with WHERE (wire format):

{
  "type": "Chain",
  "chain": [
    {"type": "Node", "filter_dict": {"type": "account"}, "name": "a"},
    {"type": "Edge", "direction": "forward"},
    {"type": "Node", "filter_dict": {"type": "user"}, "name": "c"}
  ],
  "where": [{"eq": {"left": "a.owner_id", "right": "c.owner_id"}}]
}

WHERE Validation Errors#

The parser and same-path validator reject malformed or unresolved WHERE clauses before execution.

Unsupported operator key:

{
  "type": "Chain",
  "chain": [{"type": "Node", "name": "a"}, {"type": "Node", "name": "c"}],
  "where": [{"lte": {"left": "a.owner_id", "right": "c.owner_id"}}]
}

Expected error: Unsupported WHERE operator 'lte'.

Missing required keys:

{
  "type": "Chain",
  "chain": [{"type": "Node", "name": "a"}, {"type": "Node", "name": "c"}],
  "where": [{"eq": {"left": "a.owner_id"}}]
}

Expected error: WHERE clause must have 'left' and 'right' keys.

Alias not bound in the chain:

{
  "type": "Chain",
  "chain": [
    {"type": "Node", "name": "a"},
    {"type": "Edge", "direction": "forward", "name": "e"},
    {"type": "Node", "name": "c"}
  ],
  "where": [{"eq": {"left": "missing.owner_id", "right": "c.owner_id"}}]
}

Expected error: WHERE references aliases with no node/edge bindings: missing.

Let Operation#

Python:

let({
    'persons': n({'type': 'Person'}),
    'adults': ref('persons', [n({'age': ge(18)})])
})

Wire Format:

{
  "type": "Let",
  "bindings": {
    "persons": {
      "type": "Node",
      "filter_dict": {"type": "Person"}
    },
    "adults": {
      "type": "ChainRef",
      "ref": "persons",
      "chain": [{
        "type": "Node",
        "filter_dict": {
          "age": {"type": "GE", "val": 18}
        }
      }]
    }
  }
}

ChainRef Operation#

ChainRef executes on the referenced graph; bindings used for edge traversal should retain edges (for example, from an Edge or Chain binding).

Python:

ref('base_graph', [
    e_forward({'weight': gt(0.5)}),
    n({'status': 'active'})
])

Wire Format:

{
  "type": "ChainRef",
  "ref": "base_graph",
  "chain": [
    {
      "type": "Edge",
      "direction": "forward",
      "edge_match": {"weight": {"type": "GT", "val": 0.5}}
    },
    {
      "type": "Node",
      "filter_dict": {"status": "active"}
    }
  ]
}

RemoteGraph Operation#

Python:

remote(dataset_id='fraud-network-2024')

Wire Format:

{
  "type": "RemoteGraph",
  "dataset_id": "fraud-network-2024"
}

Call Operation#

Python:

call('compute_cugraph', {'alg': 'pagerank', 'damping': 0.85})

Wire Format:

{
  "type": "Call",
  "function": "compute_cugraph",
  "params": {
    "alg": "pagerank",
    "damping": 0.85
  }
}

Note

For the complete list of safelisted layout calls—including the radial variants—refer to GFQL Built-in Call Reference.

Row-Pipeline Call Serialization#

Row-pipeline operators use the same existing Call envelope. There is no wire-format envelope change for row pipelines; only function/params values vary by operator.

rows:

{"type": "Call", "function": "rows", "params": {"table": "nodes", "source": "q"}}

where_rows:

{"type": "Call", "function": "where_rows", "params": {"expr": "score >= 50"}}

where_rows.expr supports comparison operators: =, !=, <>, <, <=, >, >=. where_rows can also use predicate dictionaries on the active row table:

{"type": "Call", "function": "where_rows", "params": {"filter_dict": {"score": {"type": "GE", "val": 50}}}}

WHERE context summary:

  • Chain-level same-path where uses lower-case operator keys (eq, neq, lt, le, gt, ge) with left/right alias-column references.

  • Row-level where_rows(filter_dict=...) uses predicate envelopes like GT, GE, LT, LE, EQ, NE on active row-table columns.

select:

{"type": "Call", "function": "select", "params": {"items": [["id", "id"], ["score", "score"]]}}

with_:

{"type": "Call", "function": "with_", "params": {"items": [["id", "id"]]}}

order_by:

{"type": "Call", "function": "order_by", "params": {"keys": [["score", "desc"], ["name", "asc"]]}}

skip:

{"type": "Call", "function": "skip", "params": {"value": 20}}

limit:

{"type": "Call", "function": "limit", "params": {"value": 10}}

distinct:

{"type": "Call", "function": "distinct", "params": {}}

unwind:

{"type": "Call", "function": "unwind", "params": {"expr": "tags", "as_": "tag"}}

group_by:

{"type": "Call", "function": "group_by", "params": {"keys": ["category"], "aggregations": [["cnt", "count"], ["total", "sum", "amount"]]}}

return_(...) is serialized as function: "select" with equivalent items.

Row-Call Validation Errors#

Row-call payloads are validated before execution. Invalid payloads fail fast.

Invalid rows.table enum:

{"type": "Call", "function": "rows", "params": {"table": "invalid"}}

Expected error: parameter validation failure (table must be "nodes" or "edges").

Invalid where_rows.expr type:

{"type": "Call", "function": "where_rows", "params": {"expr": 123}}

Expected error: parameter validation failure (expr must be a non-empty string).

Invalid order_by direction:

{"type": "Call", "function": "order_by", "params": {"keys": [["score", "up"]]}}

Expected error: parameter validation failure (direction must be "asc" or "desc").

Invalid group_by payload shape:

{"type": "Call", "function": "group_by", "params": {"keys": [], "aggregations": []}}

Expected error: parameter validation failure (non-empty keys and valid aggregation specs required).

Predicate Serialization#

Comparison Predicates#

{"type": "GT", "val": 100}
{"type": "LT", "val": 50.5}
{"type": "GE", "val": "2024-01-01"}
{"type": "LE", "val": true}
{"type": "EQ", "val": "active"}
{"type": "NE", "val": null}

Between Predicate#

{
  "type": "Between",
  "lower": 10,
  "upper": 20,
  "inclusive": true
}

IsIn Predicate#

{
  "type": "IsIn",
  "options": ["A", "B", "C"]
}

String Predicates#

Basic forms (defaults: case=true, na=null, flags=0):

{"type": "Contains", "pat": "search", "case": true, "flags": 0, "na": null, "regex": true}
{"type": "Startswith", "pat": "prefix", "case": true, "na": null}
{"type": "Endswith", "pat": "suffix", "case": true, "na": null}
{"type": "Match", "pat": "^[A-Z]+\\d+$", "case": true, "flags": 0, "na": null}
{"type": "Fullmatch", "pat": "^[A-Z]+$", "case": true, "flags": 0, "na": null}

Case-insensitive matching (using case=false):

{"type": "Startswith", "pat": "prefix", "case": false, "na": null}
{"type": "Fullmatch", "pat": "^test$", "case": false, "flags": 0, "na": null}

Tuple patterns (OR logic - match any):

{"type": "Startswith", "pat": ["app", "ban"], "case": true, "na": null}
{"type": "Endswith", "pat": [".jpg", ".png", ".gif"], "case": true, "na": null}

NA handling (fill value for missing data):

{"type": "Startswith", "pat": "test", "case": true, "na": false}
{"type": "Endswith", "pat": "end", "case": true, "na": true}

Notes:

  • pat: Pattern string or array of strings (array uses OR logic)

  • case: Case-sensitive if true (default: true)

  • na: Fill value for null/missing values (default: null preserves NA)

  • flags: Regex flags for Match/Fullmatch (default: 0)

  • regex: Whether pattern is regex for Contains (default: true)

Null Predicates#

{"type": "IsNull"}
{"type": "NotNull"}
{"type": "IsNA"}
{"type": "NotNA"}

Temporal Check Predicates#

{"type": "IsMonthStart"}
{"type": "IsYearEnd"}
{"type": "IsLeapYear"}

Type Serialization#

Scalar Types#

"hello world"        // string
42                   // integer
3.14159             // float
true                // boolean
null                // null

Temporal Types#

DateTime#

{
  "type": "datetime",
  "value": "2024-01-15T10:30:00",
  "timezone": "America/New_York"  // Optional, defaults to "UTC"
}

Date#

{
  "type": "date",
  "value": "2024-01-15"
}

Time#

{
  "type": "time",
  "value": "14:30:00.123456"
}

Temporal comparisons use standard predicate envelopes over these typed temporal values:

  • GT, GE, LT, LE, EQ, NE

Example:

{
  "type": "GE",
  "val": {
    "type": "date",
    "value": "2024-01-01"
  }
}

Note: The timezone field is optional for DateTime values and defaults to “UTC” if omitted. This ensures consistent behavior across systems while allowing explicit timezone specification when needed.

Examples#

MATCH ... RETURN Row Pipeline#

Python:

g.gfql([
    n({"type": "Person"}),
    e_forward({"type": "FOLLOWS"}),
    n({"type": "Person"}, name="q"),
    rows(table="nodes", source="q"),
    where_rows(expr="score >= 50"),
    return_(["id", "name", "score"]),
    order_by([("score", "desc"), ("name", "asc")]),
    limit(25),
])

Wire Format:

{
  "type": "Chain",
  "chain": [
    {"type": "Node", "filter_dict": {"type": "Person"}},
    {"type": "Edge", "direction": "forward", "edge_match": {"type": "FOLLOWS"}},
    {"type": "Node", "filter_dict": {"type": "Person"}, "name": "q"},
    {"type": "Call", "function": "rows", "params": {"table": "nodes", "source": "q"}},
    {"type": "Call", "function": "where_rows", "params": {"expr": "score >= 50"}},
    {"type": "Call", "function": "select", "params": {"items": [["id", "id"], ["name", "name"], ["score", "score"]]}},
    {"type": "Call", "function": "order_by", "params": {"keys": [["score", "desc"], ["name", "asc"]]}},
    {"type": "Call", "function": "limit", "params": {"value": 25}}
  ]
}

User 360 Query#

Python:

g.gfql([
    n({"customer_id": "C123"}),
    e_forward({
        "type": "purchase",
        "timestamp": gt(pd.Timestamp("2024-01-01"))
    })
])

Wire Format:

{
  "type": "Chain",
  "chain": [
    {
      "type": "Node",
      "filter_dict": {
        "customer_id": "C123"
      }
    },
    {
      "type": "Edge",
      "direction": "forward",
      "edge_match": {
        "type": "purchase",
        "timestamp": {
          "type": "GT",
          "val": {
            "type": "datetime",
            "value": "2024-01-01T00:00:00",
            "timezone": "UTC"
          }
        }
      }
    }
  ]
}

Cyber Security Pattern#

Python:

g.gfql([
    n({"ip": is_in(["192.168.1.100", "192.168.1.101"])}),
    e_forward(
        edge_query="port IN [22, 23, 3389]",
        to_fixed_point=True
    ),
    n({"type": "server", "critical": True})
])

Wire Format:

{
  "type": "Chain",
  "chain": [
    {
      "type": "Node",
      "filter_dict": {
        "ip": {
          "type": "IsIn",
          "options": ["192.168.1.100", "192.168.1.101"]
        }
      }
    },
    {
      "type": "Edge",
      "direction": "forward",
      "edge_query": "port IN [22, 23, 3389]",
      "to_fixed_point": true
    },
    {
      "type": "Node",
      "filter_dict": {
        "type": "server",
        "critical": true
      }
    }
  ]
}

Best Practices#

  1. Always include type fields: Every object must have a type

  2. Use ISO formats: Dates and times in ISO 8601

  3. Handle timezones consistently: Include timezone for datetime values when precision matters (defaults to UTC)

  4. Validate before sending: Use JSON Schema validation

  5. Handle unknown fields: Ignore unrecognized fields for compatibility

See Also#