GFQL Quick Reference

This quick reference page provides short examples of various parameters and usage patterns.

Basic Usage

Chaining Operations

```
g.gfql(ops=[...], engine=EngineAbstract.AUTO)
```

gfql chains multiple matchers in sequence to match more complex path and subgraph patterns.

- ops: Sequence of node and edge matchers (ASTObject instances).
- engine: Optional execution engine. It is usually left unset, defaulting to 'auto'. Use 'cudf' for GPU acceleration or 'pandas' for CPU.
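To make the sequencing concrete, here is a pandas-only sketch of what a three-step pattern such as [n({"type": "person"}), e_forward(), n({"type": "transaction"})] selects for a single hop. It is an illustration under simplifying assumptions (one hop, toy data, columns named id, src, dst, and type), not how gfql is implemented.

```python
import pandas as pd

# Toy graph; the column names id, src, dst, and type are assumptions for this sketch
nodes = pd.DataFrame({
    "id": ["a", "b", "c", "t1", "t2"],
    "type": ["person", "person", "person", "transaction", "transaction"],
})
edges = pd.DataFrame({
    "src": ["a", "b", "a"],
    "dst": ["t1", "t2", "b"],
})

# Step 1: like n({"type": "person"}), select candidate start nodes
starts = nodes.loc[nodes["type"] == "person", "id"]

# Step 2: like e_forward(), follow edges whose source is a start node
hop = edges[edges["src"].isin(starts)]

# Step 3: like n({"type": "transaction"}), keep edges that land on a match
ends = nodes.loc[nodes["type"] == "transaction", "id"]
matched_edges = hop[hop["dst"].isin(ends)]

# The result is a subgraph: the matched edges plus their endpoints
matched_ids = set(matched_edges["src"]) | set(matched_edges["dst"])
matched_nodes = nodes[nodes["id"].isin(matched_ids)]
print(sorted(matched_ids))  # ['a', 'b', 't1', 't2']
```

Node c matches the first step but has no qualifying edge, and the edge a to b does not end at a transaction, so neither appears in the result.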

Node Matchers

```
n(filter_dict=None, name=None, query=None)
```

n matches nodes based on their attributes.

Parameters:
- filter_dict: {attribute: value} or {attribute: condition_function}
- name: Optional label; adds a boolean column marking matched nodes in the result.
- query: Custom query string (e.g., "age > 30 and country == 'USA'").

Examples:

Match nodes where type is ‘person’:
```
n({"type": "person"})
```
Match nodes with age greater than 30:
```
n({"age": lambda x: x > 30})
```

Use a custom query string:

```
n(query="age > 30 and country == 'USA'")
```
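The three filter styles above map naturally onto pandas operations. The sketch below uses a hypothetical helper, filter_nodes, to show that mapping on toy data; it assumes query strings follow pandas DataFrame.query syntax (as the examples suggest) and that a condition function receives the whole column. It is not the library's implementation.

```python
import pandas as pd

def filter_nodes(df, filter_dict=None, query=None):
    """Hypothetical helper mimicking n(filter_dict=..., query=...) on a DataFrame."""
    mask = pd.Series(True, index=df.index)
    for col, cond in (filter_dict or {}).items():
        if callable(cond):
            # {attribute: condition_function}: this sketch passes the whole column
            mask &= cond(df[col])
        else:
            # {attribute: value}: exact equality match
            mask &= df[col] == cond
    out = df[mask]
    if query is not None:
        # Query strings are evaluated with pandas DataFrame.query semantics
        out = out.query(query)
    return out

people = pd.DataFrame({
    "name": ["ann", "bob", "cy"],
    "type": ["person", "person", "bot"],
    "age": [25, 41, 35],
    "country": ["USA", "USA", "FR"],
})

print(filter_nodes(people, {"type": "person"})["name"].tolist())            # ['ann', 'bob']
print(filter_nodes(people, {"age": lambda x: x > 30})["name"].tolist())     # ['bob', 'cy']
print(filter_nodes(people, query="age > 30 and country == 'USA'")["name"].tolist())  # ['bob']
```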

Edge Matchers

```
e_forward(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)
e_reverse(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)
e_undirected(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)

# alias for e_undirected
e(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)
```

Edge matchers match edges based on their attributes and traverse them, optionally filtering on each edge's source and destination nodes as well. The parameters below are shared by e_forward, e_reverse, e_undirected, and its alias e.

Parameters:
- edge_match: {attribute: value} or {attribute: condition_function}
- edge_query: Custom query string for edge attributes.
- hops: int, number of hops to traverse (shorthand for max_hops).
- min_hops/max_hops: Inclusive traversal bounds (min defaults to 1 unless max_hops is 0; max defaults to hops).
- output_min_hops/output_max_hops: Optional post-filter slice; defaults keep all traversed hops up to max_hops.
- label_node_hops/label_edge_hops: Optional column names for hop numbers; label_seeds=True adds hop 0 for seeds.
- to_fixed_point: bool, continue traversal until no more matches.
- source_node_match: Filter for source nodes.
- destination_node_match: Filter for destination nodes.
- source_node_query: Custom query string for source nodes.
- destination_node_query: Custom query string for destination nodes.
- name: Optional label; adds a boolean column marking matched edges in the result.

Examples:

Traverse up to 2 hops forward on edges where status is ‘active’:
```
e_forward({"status": "active"}, hops=2)
```

Traverse 2..4 hops but show only hops 3..4 with labels:

```
e_forward(
    {"status": "active"},
    min_hops=2,
    max_hops=4,
    output_min_hops=3,
    label_edge_hops="edge_hop"
)
```
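The relationship between hop labels and the output slice can be sketched in plain Python. This simplified sketch (hypothetical helper forward_hops, toy edge list) runs a breadth-first search, tags each traversed edge with the hop at which the search first crosses it (roughly what label_edge_hops records), stops at max_hops, and then keeps only hops inside [output_min_hops, output_max_hops]. It deliberately omits min_hops pruning and the other GFQL options.

```python
from collections import defaultdict

def forward_hops(edges, seeds, max_hops, output_min_hops=1, output_max_hops=None):
    """Hypothetical sketch: label forward edges by BFS hop, then slice the output.

    edges: list of (src, dst) pairs; seeds: starting node ids.
    min_hops pruning is intentionally omitted to keep the sketch short.
    """
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    if output_max_hops is None:
        output_max_hops = max_hops

    edge_hop = {}          # (src, dst) -> hop at which the edge was first traversed
    visited = set(seeds)
    frontier = set(seeds)
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for src in frontier:
            for dst in out_edges[src]:
                edge_hop.setdefault((src, dst), hop)
                if dst not in visited:
                    visited.add(dst)
                    next_frontier.add(dst)
        if not next_frontier:
            break
        frontier = next_frontier

    # Output slice: keep only edges whose hop label falls inside the window
    return {
        edge: hop for edge, hop in edge_hop.items()
        if output_min_hops <= hop <= output_max_hops
    }

chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
print(forward_hops(chain, seeds={"a"}, max_hops=4, output_min_hops=3))
# {('c', 'd'): 3, ('d', 'e'): 4}
```

The edge e to f is never reached because traversal stops at hop 4, and hops 1 and 2 are traversed but sliced out of the output.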

Use custom edge query strings:

```
e_forward(edge_query="weight > 5 and type == 'connects'")
```

Filter source and destination nodes with match dictionaries:

```
e_forward(
    source_node_match={"status": "active"},
    destination_node_match={"age": lambda x: x < 30}
)
```

Filter source and destination nodes with queries:

```
e_forward(
    source_node_query="status == 'active'",
    destination_node_query="age < 30"
)
```
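Conceptually, source and destination filters restrict which edges may be traversed by checking each endpoint's attributes. This pandas sketch builds node-level masks from the same query strings and keeps only edges whose endpoints pass; the data and column names (id, src, dst, status, age) are assumptions, and GFQL's own evaluation may differ.

```python
import pandas as pd

nodes = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "status": ["active", "inactive", "active", "active"],
    "age": [45, 22, 28, 35],
})
edges = pd.DataFrame({"src": [1, 1, 2, 3], "dst": [3, 4, 3, 2]})

# Node ids passing each endpoint filter (same strings as the query example above)
ok_src = nodes.query("status == 'active'")["id"]
ok_dst = nodes.query("age < 30")["id"]

# An edge is traversable only if both of its endpoints pass their filters
traversable = edges[edges["src"].isin(ok_src) & edges["dst"].isin(ok_dst)]
print(traversable.to_dict("records"))
# [{'src': 1, 'dst': 3}, {'src': 3, 'dst': 2}]
```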

Label matched edges:
```
e_forward(name="active_edges")
```

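e_forward, e_reverse, and e_undirected take the same parameters and differ only in traversal direction; e is an alias for e_undirected.

e_reverse: Same as e_forward, but traverses edges in reverse.
e / e_undirected: Traverses edges regardless of direction.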
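The difference between the three directions comes down to which endpoint a hop starts from. In this pure-Python sketch (hypothetical helper one_hop, toy edge list), forward follows source to destination, reverse follows destination to source, and undirected does both.

```python
# Directed edge list for a tiny graph: a -> b -> c
edges = [("a", "b"), ("b", "c")]

def one_hop(edges, node, direction):
    """Hypothetical helper: neighbors of `node` reached by one hop in a given direction."""
    out = set()
    for src, dst in edges:
        if direction in ("forward", "undirected") and src == node:
            out.add(dst)
        if direction in ("reverse", "undirected") and dst == node:
            out.add(src)
    return out

print(sorted(one_hop(edges, "b", "forward")))     # ['c']
print(sorted(one_hop(edges, "b", "reverse")))     # ['a']
print(sorted(one_hop(edges, "b", "undirected")))  # ['a', 'c']
```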

Predicates

```
graphistry.compute.predicates.ASTPredicate.ASTPredicate
```

Predicates match entity attributes against a condition instead of an exact value. Use them as values in filter_dict or edge_match.

See GFQL Operator Reference for more information.

Example:

Match nodes where category is ‘A’, ‘B’, or ‘C’:

```
from graphistry import n, is_in

n({"category": is_in(["A", "B", "C"])})
```
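For intuition, is_in behaves like a set-membership test on a column, similar to pandas Series.isin. This sketch uses pandas directly on toy data rather than calling GFQL.

```python
import pandas as pd

nodes = pd.DataFrame({"id": [1, 2, 3, 4], "category": ["A", "D", "C", "B"]})

# Same membership test that n({"category": is_in(["A", "B", "C"])}) expresses
mask = nodes["category"].isin(["A", "B", "C"])
print(nodes.loc[mask, "id"].tolist())  # [1, 3, 4]
```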

Combined Examples

Find people connected to transactions via active relationships:

```
from graphistry import n, e_forward

g.gfql([
    n({"type": "person"}),
    e_forward({"status": "active"}),
    n({"type": "transaction"})
])
```

Label nodes and edges during traversal:

```
from graphistry import n, e_forward

g.gfql([
    n({"id": "start_node"}, name="start"),
    e_forward(name="edge1"),
    n({"level": 2}, name="middle"),
    e_forward(name="edge2"),
    n({"type": "end_type"}, name="end")
])
```
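Each name becomes a boolean column on the result's nodes or edges, so named steps can be pulled out afterwards with ordinary DataFrame indexing. The frame below is a hand-built stand-in for result._nodes (assumed values, no GFQL call) that shows the pattern.

```python
import pandas as pd

# Hand-built stand-in for result._nodes after a labeled query like the one above
result_nodes = pd.DataFrame({
    "id": ["start_node", "m1", "m2", "z9"],
    "start": [True, False, False, False],
    "middle": [False, True, True, False],
    "end": [False, False, False, True],
})

# Select the nodes matched by each named step via its boolean column
start_ids = result_nodes.loc[result_nodes["start"], "id"].tolist()
middle_ids = result_nodes.loc[result_nodes["middle"], "id"].tolist()
print(start_ids, middle_ids)  # ['start_node'] ['m1', 'm2']
```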

Traverse until no more matches (fixed point):

```
from graphistry import n, e_forward

g.gfql([
    n({"status": "infected"}),
    e_forward(to_fixed_point=True),
    n(name="reachable")
])
```
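Fixed-point traversal keeps expanding the frontier until an iteration reaches no new nodes. This pure-Python sketch (hypothetical helper, toy edge list) shows that termination condition and why visited nodes must be tracked when the graph contains a cycle; it illustrates the idea rather than GFQL's implementation.

```python
def reachable_fixed_point(edges, seeds):
    """Hypothetical sketch of forward traversal run to a fixed point."""
    visited = set(seeds)
    frontier = set(seeds)
    while frontier:  # stop once an iteration discovers nothing new
        frontier = {dst for src, dst in edges if src in frontier} - visited
        visited |= frontier
    return visited

# Without the visited set, the cycle a -> b -> c -> a would be revisited forever
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")]
print(sorted(reachable_fixed_point(edges, {"a"})))  # ['a', 'b', 'c', 'd']
```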

Filter by multiple conditions:

```
from graphistry import n, e_undirected, is_in

g.gfql([
    n({"type": is_in(["server", "database"])}),
    e_undirected({"protocol": "TCP"}, hops=3),
    n(query="risk_level >= 8")
])
```

Use custom queries in matchers:

```
from graphistry import n, e_forward

g.gfql([
    n(query="age > 30 and country == 'USA'"),
    e_forward(edge_query="weight > 5"),
    n(query="status == 'active'")
])
```

GPU Acceleration

Enable GPU mode:
```
g.gfql([...], engine='cudf')
```

Example with cuDF DataFrames:

```
import cudf
import graphistry

# edge_df and node_df are existing pandas DataFrames
e_gdf = cudf.from_pandas(edge_df)
n_gdf = cudf.from_pandas(node_df)

g = graphistry.nodes(n_gdf, 'node_id').edges(e_gdf, 'src', 'dst')
g.gfql([...], engine='cudf')
```
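If the same script must run on machines with and without RAPIDS, one option, instead of relying on 'auto', is to pick the engine string explicitly at runtime. This sketch only checks whether the cudf package is importable using the standard-library importlib.util.find_spec; it does not verify that a GPU is actually usable, and pick_engine is a hypothetical helper.

```python
import importlib.util

def pick_engine():
    """Hypothetical helper: return 'cudf' if RAPIDS cuDF is installed, else 'pandas'."""
    # find_spec returns None when the package cannot be found
    if importlib.util.find_spec("cudf") is not None:
        return "cudf"
    return "pandas"

engine = pick_engine()
print(engine)
# Then pass it through, e.g.: g.gfql([...], engine=engine)
```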

Remote Mode

Query existing remote data

```
import graphistry
from graphistry import n

g = graphistry.bind(dataset_id='ds-abc-123')

nodes_df = g.gfql_remote([n()])._nodes
```

Upload graph and run GFQL

```
g2 = g1.upload()

g3 = g2.gfql_remote([n(), e(), n()])
```

Enforce CPU and GPU mode on remote GFQL

```
g3a = g2.gfql_remote([n(), e(), n()], engine='pandas')
g3b = g2.gfql_remote([n(), e(), n()], engine='cudf')
```

Return only nodes and certain columns

```
cols = ['id', 'name']
g2a = g1.gfql_remote([n(), e(), n()], output_type="nodes", node_col_subset=cols)
```

Return only edges and certain columns

```
cols = ['src', 'dst']
g2b = g1.gfql_remote([n(), e(), n()], output_type="edges", edge_col_subset=cols)
```

Return only shape metadata

```
shape_df = g1.chain_remote_shape([n(), e(), n()])
```

Run remote Python and get back a graph

```
def my_remote_trim_graph_task(g):
    return (g
        .nodes(g._nodes[:10])
        .edges(g._edges[:10])
    )

g2 = g1.upload()
g3 = g2.python_remote_g(my_remote_trim_graph_task)
```

Run remote Python and get back a table

```
def first_n_edges(g):
    return g._edges[:10]

some_edges_df = g.python_remote_table(first_n_edges)
```

Run remote Python and get back JSON

```
def first_n_edges(g):
    return g._edges[:10].to_json()

some_edges_json = g.python_remote_json(first_n_edges)
```

Run remote Python and ensure runs on CPU or GPU

```
g3a = g2.python_remote_g(my_remote_trim_graph_task, engine='pandas')
g3b = g2.python_remote_g(my_remote_trim_graph_task, engine='cudf')
```

Run remote Python, passing as a string

```
g2 = g1.upload()

# The function must be named "task" and take a single argument "g"
g3 = g2.python_remote_g("""
def task(g):
    return (g
        .nodes(g._nodes[:10])
        .edges(g._edges[:10])
    )
""")
```
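Because a string task is only executed on the server, a malformed one fails remotely. A local pre-check can catch that earlier: this sketch (hypothetical helper check_task_source, standard library only) dedents the string, parses it with ast, and confirms it defines task with the single argument g, matching the naming convention in the comment above. It checks structure only and never runs the code.

```python
import ast
import textwrap

def check_task_source(src):
    """Hypothetical pre-check: src must define task(g) at top level."""
    tree = ast.parse(textwrap.dedent(src))  # raises SyntaxError if malformed
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "task":
            arg_names = [a.arg for a in node.args.args]
            return arg_names == ["g"]
    return False

good = """
    def task(g):
        return g.nodes(g._nodes[:10])
"""
bad = """
    def run(g):
        return g
"""
print(check_task_source(good), check_task_source(bad))  # True False
```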

Let Bindings and DAG Patterns

Use Let bindings to create directed acyclic graph (DAG) patterns with named operations. Lists are treated as implicit Chains.

Basic Let with named bindings:

```
from graphistry import let, ref, n, e_forward, gt

result = g.gfql(let({
    'suspects': n({'risk_score': gt(80)}),
    'connections': [
        n({'risk_score': gt(80)}),
        e_forward({'type': 'transaction'}),
        n()
    ]
}))

# Access results by name
suspects = result._nodes[result._nodes['suspects']]
connections = result._edges[result._edges['connections']]
```

Complex DAG with multiple references:

```
from graphistry import let, ref, n, e_forward, gt

result = g.gfql(let({
    'high_value': n({'balance': gt(100000)}),
    'large_transfers': [
        n({'balance': gt(100000)}),
        e_forward({'type': 'transfer', 'amount': gt(10000)}),
        n()
    ],
    'suspicious': ref('large_transfers', [
        n({'created_recent': True, 'verified': False})
    ])
}))
```
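The bindings in a let form a dependency graph: a binding built with ref('x', ...) depends on x, so any topological order of that graph is a valid evaluation order. The sketch below hand-writes the dependencies of the example above as a dict and orders them with the standard library's graphlib (Python 3.9+); it illustrates the DAG idea and is not necessarily how GFQL evaluates bindings.

```python
from graphlib import CycleError, TopologicalSorter

# Dependencies in the example above: binding -> bindings it references via ref()
deps = {
    "high_value": set(),
    "large_transfers": set(),
    "suspicious": {"large_transfers"},
}

# Any topological order is a valid evaluation order for the bindings
order = list(TopologicalSorter(deps).static_order())
print(order)

# Mutually referencing bindings would form a cycle, so they are not a DAG
try:
    list(TopologicalSorter({"x": {"y"}, "y": {"x"}}).static_order())
except CycleError:
    print("cycle: not a valid set of bindings")
```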

Call Operations

Run graph algorithms like PageRank, community detection, and layouts directly within your GFQL queries:

Compute PageRank:

```
from graphistry import call, let, ref, n, e

# Use let() to compose filter + enrichment
result = g.gfql(let({
    'persons': [n({'type': 'person'}), e(), n()],
    'ranked': ref('persons', [call('compute_cugraph', {'alg': 'pagerank', 'damping': 0.85})])
}))

# Results have pagerank column
top_nodes = result._nodes.sort_values('pagerank', ascending=False).head(10)
```
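For intuition about the pagerank column and the damping value, here is a textbook power-iteration PageRank in plain Python. It is not cuGraph's implementation, it uses a fixed iteration count instead of a convergence tolerance, and the toy edge list is an assumption. In this textbook formulation, damping is the probability of following an outgoing edge rather than jumping to a random node.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a directed edge list."""
    nodes = sorted({v for edge in edges for v in edge})
    num_nodes = len(nodes)
    out_deg = {v: 0 for v in nodes}
    for src, _ in edges:
        out_deg[src] += 1

    rank = {v: 1.0 / num_nodes for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-edges) is spread uniformly
        dangling = sum(rank[v] for v in nodes if out_deg[v] == 0)
        new = {v: (1 - damping) / num_nodes + damping * dangling / num_nodes for v in nodes}
        for src, dst in edges:
            new[dst] += damping * rank[src] / out_deg[src]
        rank = new
    return rank

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top[0], round(sum(ranks.values()), 6))  # c 1.0
```

Node c ranks highest because it receives links from both b and d, while d, with no incoming edges, keeps only the random-jump share.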

Community detection with Louvain:

```
from graphistry import call, let, ref, n, e_forward

# Use let() to compose traversal + community detection
result = g.gfql(let({
    'reachable': [n({'active': True}), e_forward(to_fixed_point=True), n()],
    'communities': ref('reachable', [call('compute_cugraph', {'alg': 'louvain'})])
}))

# Results have community column
communities = result._nodes.groupby('community').size()
```

Filter and compute within Let:

```
from graphistry import call, let, ref, n, e, gt

# Split mixed chain into separate bindings
result = g.gfql(let({
    'suspects': [n({'flagged': True}), e(), n()],
    'ranked': ref('suspects', [
        call('compute_cugraph', {'alg': 'pagerank'})
    ]),
    'influencers': ref('ranked', [
        n({'pagerank': gt(0.01)})
    ])
}))
```

Apply layout algorithms:

```
from graphistry import call, let, ref, n, e_forward, is_in

# Use let() to compose traversal + layout
result = g.gfql(let({
    'entities': [n({'type': is_in(['person', 'company'])}), e_forward(), n()],
    'positioned': ref('entities', [call('fa2_layout', {'iterations': 100})])
}))

# Results have x, y coordinates for visualization
result.plot()
```

Tip: For subset-based coloring after GFQL, use result.collections(...) and see Layout Settings & Visualization Embedding.

Remote Graph References

Reference graphs on remote servers for distributed computing:

Basic remote reference:

```
from graphistry import remote, n, e_forward, gt

result = g.gfql([
    remote(dataset_id='fraud-network-2024'),
    n({'risk_score': gt(90)}),
    e_forward()
])
```

Combine remote and local data in Let:

```
from graphistry import let, ref, remote, n, e_forward, gt

result = g.gfql(let({
    'remote_data': remote(dataset_id='historical-2023'),
    'high_risk': ref('remote_data', [
        n({'risk_score': gt(95)})
    ]),
    'connections': ref('remote_data', [
        n({'risk_score': gt(95)}),
        e_forward({'type': 'transaction'}),
        n()
    ])
}))
```

Advanced Usage

Traversal with source and destination node filters and queries:

```
e_forward(
    edge_query="type == 'follows' and weight > 2",
    source_node_match={"status": "active"},
    destination_node_query="age < 30",
    hops=2,
    name="social_edges"
)
```

Node matcher with all parameters:

```
n(
    filter_dict={"department": "sales"},
    query="age > 25 and tenure > 2",
    name="experienced_sales"
)
```

Edge matcher combining edge and endpoint filters:

```
e_reverse(
    edge_match={"transaction_type": "refund"},
    edge_query="amount > 100",
    source_node_match={"status": "inactive"},
    destination_node_match={"region": "EMEA"},
    name="large_refunds"
)
```

Parameter Summary

Common Parameters:
- filter_dict: Attribute filters (e.g., {"status": "active"})
- query: Custom query string (e.g., "age > 30")
- hops: Max hops to traverse (shorthand for max_hops, default 1)
- to_fixed_point: Continue traversal until no more matches (bool, default False)
- name: Label for matchers (str)
- source_node_match, destination_node_match: Filters for connected nodes
- source_node_query, destination_node_query: Queries for connected nodes
- edge_match: Filters for edges
- edge_query: Query for edges
- engine: Execution engine (EngineAbstract.AUTO, 'pandas', 'cudf', etc.)

Traversal Directions

Forward Traversal: e_forward(…)
Reverse Traversal: e_reverse(…)
Undirected Traversal: e_undirected(…)

Tips and Best Practices

Limit hops for performance: Specify hops to control traversal depth.
Use naming for analysis: Apply name to label and filter results.
Combine filters: Use filter_dict and query for precise matching.
Leverage GPU acceleration: Use engine='cudf' for large datasets.
Avoid infinite loops: Be cautious with to_fixed_point=True in cyclic graphs.

Examples at a Glance

Find all paths between two nodes:

```
g.gfql([
    n({g._node: "Alice"}),
    e_undirected(hops=3),
    n({g._node: "Bob"})
])
```
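To see which connections that pattern can pick up, this pure-Python sketch enumerates the simple paths (no repeated nodes) of 1 to 3 hops between two nodes, treating edges as undirected. GFQL returns the matching nodes and edges as a subgraph rather than a list of paths, and the toy edge list and helper name are assumptions.

```python
from collections import defaultdict

def simple_paths(edges, start, goal, max_hops):
    """Hypothetical helper: all simple undirected paths start..goal with <= max_hops edges."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)  # undirected: traverse both ways

    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            paths.append(path)
            continue
        if len(path) - 1 == max_hops:
            continue  # hop budget used up
        for nxt in adj[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])
    return paths

edges = [("Alice", "Carol"), ("Carol", "Bob"), ("Alice", "Dan"),
         ("Dan", "Eve"), ("Eve", "Bob"), ("Eve", "Zed")]
for p in sorted(simple_paths(edges, "Alice", "Bob", max_hops=3)):
    print(" - ".join(p))
# Alice - Carol - Bob
# Alice - Dan - Eve - Bob
```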

Match nodes with IDs in a range:
```
n(query="100 <= id <= 200")
```

Traverse edges with specific labels:

```
e_forward({"label": is_in(["knows", "likes"])})
```

Identify subgraphs based on attributes:

```
g.gfql([
    n({"community": "A"}),
    e_undirected(hops=2),
    n({"community": "B"}, name="bridge_nodes")
])
```

Custom edge and node queries:

```
g.gfql([
    n(query="age >= 18"),
    e_forward(edge_query="interaction == 'message'"),
    n(query="location == 'NYC'")
])
```
