GFQL Quick Reference

This quick reference page provides short examples of various parameters and usage patterns.

Basic Usage

Chaining Operations

g.gfql(ops=[...], engine=EngineAbstract.AUTO)

gfql chains multiple matchers to express more complex path and subgraph patterns.

  • ops: Sequence of graph node and edge matchers (ASTObject instances).

  • engine: Optional execution engine. Usually left unset, which defaults to 'auto'. Use 'cudf' for GPU acceleration or 'pandas' for CPU.

Node Matchers

n(filter_dict=None, name=None, query=None)

n matches nodes based on their attributes.

  • Filter nodes based on attributes.

  • Parameters:

    • filter_dict: {attribute: value} or {attribute: condition_function}

    • name: Optional label; adds a boolean column in the result.

    • query: Custom query string (e.g., "age > 30 and country == 'USA'").

Examples:

  • Match nodes where type is ‘person’:

    n({"type": "person"})
    
  • Match nodes with age greater than 30:

    n({"age": lambda x: x > 30})
    
  • Use a custom query string:

    n(query="age > 30 and country == 'USA'")
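
To make the filter_dict semantics concrete, here is a minimal plain-Python sketch of how a dictionary mixing literal values and condition functions could be checked against one node's attributes. The helper matches_filter and the sample rows are hypothetical and only model the behavior described above; GFQL itself evaluates filters over dataframes, and the sketch applies each condition one value at a time for readability.

```python
def matches_filter(attrs, filter_dict):
    """Return True when every entry in filter_dict accepts the node's attributes."""
    for key, expected in filter_dict.items():
        if key not in attrs:
            return False  # assumption: a missing attribute never matches
        value = attrs[key]
        if callable(expected):
            # {attribute: condition_function}
            if not expected(value):
                return False
        elif value != expected:
            # {attribute: value}
            return False
    return True


# Hypothetical node rows, for illustration only
nodes = [
    {"id": "a", "type": "person", "age": 42},
    {"id": "b", "type": "person", "age": 25},
    {"id": "c", "type": "company", "age": 50},
]

older_people = [
    row["id"]
    for row in nodes
    if matches_filter(row, {"type": "person", "age": lambda x: x > 30})
]
print(older_people)
```

All entries must hold at once, so the dictionary behaves like a logical AND across attributes.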
    

Edge Matchers

e_forward(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)
e_reverse(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)
e_undirected(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)

# alias for e_undirected
e(edge_match=None, hops=1, min_hops=None, max_hops=None, output_min_hops=None, output_max_hops=None, label_node_hops=None, label_edge_hops=None, label_seeds=False, to_fixed_point=False, source_node_match=None, destination_node_match=None, source_node_query=None, destination_node_query=None, edge_query=None, name=None)

e matches edges based on their attributes, regardless of direction (it is an alias for e_undirected). Edge matchers may also filter on an edge's source and destination nodes.

  • e_forward traverses edges in the forward direction, from source to destination.

  • Parameters:

    • edge_match: {attribute: value} or {attribute: condition_function}

    • edge_query: Custom query string for edge attributes.

    • hops: int, number of hops to traverse.

    • min_hops/max_hops: Inclusive traversal bounds (min defaults to 1 unless max_hops is 0; max defaults to hops).

    • output_min_hops/output_max_hops: Optional post-filter slice; defaults keep all traversed hops up to max_hops.

    • label_node_hops/label_edge_hops: Optional column names for hop numbers; label_seeds=True adds hop 0 for seeds.

    • to_fixed_point: bool, continue traversal until no more matches.

    • source_node_match: Filter for source nodes.

    • destination_node_match: Filter for destination nodes.

    • source_node_query: Custom query string for source nodes.

    • destination_node_query: Custom query string for destination nodes.

    • name: Optional label.
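
The hop-bound defaults listed above can be written out as a small plain-Python sketch. The helpers resolve_hop_bounds and output_hops are hypothetical and not part of the library. The sketch assumes the "unless max_hops is 0" rule applies to the resolved maximum, and it ignores seed labeling (hop 0).

```python
def resolve_hop_bounds(hops=1, min_hops=None, max_hops=None):
    """Return the inclusive (min, max) traversal bounds after applying defaults."""
    # max defaults to hops
    upper = hops if max_hops is None else max_hops
    if min_hops is not None:
        lower = min_hops
    else:
        # min defaults to 1 unless the maximum is 0
        lower = 0 if upper == 0 else 1
    return lower, upper


def output_hops(max_hop, output_min_hops=None, output_max_hops=None):
    """Return the hop numbers kept by the optional post-filter slice."""
    lo = 1 if output_min_hops is None else output_min_hops
    hi = max_hop if output_max_hops is None else output_max_hops
    return [h for h in range(1, max_hop + 1) if lo <= h <= hi]


print(resolve_hop_bounds(hops=2))
print(resolve_hop_bounds(min_hops=2, max_hops=4))
print(output_hops(4, output_min_hops=3))
```

With min_hops=2, max_hops=4, and output_min_hops=3, the traversal covers 2 to 4 hops while only hops 3 and 4 are kept in the output.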

Examples:

  • Traverse up to 2 hops forward on edges where status is ‘active’:

    e_forward({"status": "active"}, hops=2)
    
  • Traverse 2..4 hops but show only hops 3..4 with labels:

    e_forward(
        {"status": "active"},
        min_hops=2,
        max_hops=4,
        output_min_hops=3,
        label_edge_hops="edge_hop"
    )
    
  • Use custom edge query strings:

    e_forward(edge_query="weight > 5 and type == 'connects'")
    
  • Filter source and destination nodes with match dictionaries:

    e_forward(
        source_node_match={"status": "active"},
        destination_node_match={"age": lambda x: x < 30}
    )
    
  • Filter source and destination nodes with queries:

    e_forward(
        source_node_query="status == 'active'",
        destination_node_query="age < 30"
    )
    
  • Label matched edges:

    e_forward(name="active_edges")
    

e_reverse and e_undirected accept the same parameters as e_forward and differ only in traversal direction; e is an alias for e_undirected.

  • e_reverse: Same as e_forward, but traverses in reverse.

  • e: Traverses edges regardless of direction.
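
As a rough illustration of how the three directions differ, the following plain-Python sketch performs a single hop over a hypothetical edge list. The function one_hop and the sample edges are made up for this example and do not reflect how GFQL is implemented.

```python
# Hypothetical (source, destination) pairs
edges = [("a", "b"), ("b", "c"), ("d", "b")]


def one_hop(seeds, edges, direction):
    """Return the nodes reached from seeds in one hop for the given direction."""
    reached = set()
    for src, dst in edges:
        # forward: follow source -> destination
        if direction in ("forward", "undirected") and src in seeds:
            reached.add(dst)
        # reverse: follow destination -> source
        if direction in ("reverse", "undirected") and dst in seeds:
            reached.add(src)
    return reached


forward = one_hop({"b"}, edges, "forward")
reverse = one_hop({"b"}, edges, "reverse")
undirected = one_hop({"b"}, edges, "undirected")
print(sorted(forward), sorted(reverse), sorted(undirected))
```

Starting from b, the forward hop reaches only c, the reverse hop reaches a and d, and the undirected hop reaches all three.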

Predicates

graphistry.compute.predicates.ASTPredicate.ASTPredicate

  • Matches using a predicate on entity attributes.

See GFQL Operator Reference for more information.

Example:

  • Match nodes where category is ‘A’, ‘B’, or ‘C’:

    from graphistry import n, is_in
    
    n({"category": is_in(["A", "B", "C"])})
    

Combined Examples

  • Find people connected to transactions via active relationships:

    g.gfql([
        n({"type": "person"}),
        e_forward({"status": "active"}),
        n({"type": "transaction"})
    ])
    
  • Label nodes and edges during traversal:

    g.gfql([
        n({"id": "start_node"}, name="start"),
        e_forward(name="edge1"),
        n({"level": 2}, name="middle"),
        e_forward(name="edge2"),
        n({"type": "end_type"}, name="end")
    ])
    
  • Traverse until no more matches (fixed point):

    g.gfql([
        n({"status": "infected"}),
        e_forward(to_fixed_point=True),
        n(name="reachable")
    ])
    
  • Filter by multiple conditions:

    g.gfql([
        n({"type": is_in(["server", "database"])}),
        e_undirected({"protocol": "TCP"}, hops=3),
        n(query="risk_level >= 8")
    ])
    
  • Use custom queries in matchers:

    g.gfql([
        n(query="age > 30 and country == 'USA'"),
        e_forward(edge_query="weight > 5"),
        n(query="status == 'active'")
    ])
    

GPU Acceleration

  • Enable GPU mode:

    g.gfql([...], engine='cudf')
    
  • Example with cuDF DataFrames:

    import cudf
    import graphistry
    
    e_gdf = cudf.from_pandas(edge_df)
    n_gdf = cudf.from_pandas(node_df)
    
    g = graphistry.nodes(n_gdf, 'node_id').edges(e_gdf, 'src', 'dst')
    g.gfql([...], engine='cudf')
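
If you want to pick the engine string explicitly rather than rely on 'auto', one possible approach is to check whether cudf can be imported. The helper pick_engine is a hypothetical convenience, not a graphistry function, and the check only confirms that the cudf package is installed, not that a usable GPU is present.

```python
import importlib.util


def pick_engine(prefer_gpu=True):
    """Return 'cudf' when the cudf package is importable, else 'pandas'."""
    if prefer_gpu and importlib.util.find_spec("cudf") is not None:
        return "cudf"
    return "pandas"


engine = pick_engine()
print(engine)
# The chosen value could then be passed along, e.g.:
# g.gfql([...], engine=engine)
```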
    

Remote Mode

  • Query existing remote data

    g = graphistry.bind(dataset_id='ds-abc-123')
    
    nodes_df = g.gfql_remote([n()])._nodes
    
  • Upload graph and run GFQL

    g2 = g1.upload()
    
    g3 = g2.gfql_remote([n(), e(), n()])
    
  • Force CPU or GPU mode on remote GFQL

    g3a = g2.gfql_remote([n(), e(), n()], engine='pandas')
    g3b = g2.gfql_remote([n(), e(), n()], engine='cudf')
    
  • Return only nodes and certain columns

    cols = ['id', 'name']
    g2b = g1.gfql_remote([n(), e(), n()], output_type="nodes", node_col_subset=cols)
    
  • Return only edges and certain columns

    cols = ['src', 'dst']
    g2b = g1.gfql_remote([n(), e(), n()], output_type="edges", edge_col_subset=cols)
    
  • Return only shape metadata

    shape_df = g1.chain_remote_shape([n(), e(), n()])
    
  • Run remote Python and get back a graph

    def my_remote_trim_graph_task(g):
        return (g
            .nodes(g._nodes[:10])
            .edges(g._edges[:10])
        )
    
    g2 = g1.upload()
    g3 = g2.python_remote_g(my_remote_trim_graph_task)
    
  • Run remote Python and get back a table

    def first_n_edges(g):
        return g._edges[:10]
    
    some_edges_df = g.python_remote_table(first_n_edges)
    
  • Run remote Python and get back JSON

    def first_n_edges(g):
        return g._edges[:10].to_json()
    
    some_edges_json = g.python_remote_json(first_n_edges)
    
  • Run remote Python and force it to run on CPU or GPU

    g3a = g2.python_remote_g(my_remote_trim_graph_task, engine='pandas')
    g3b = g2.python_remote_g(my_remote_trim_graph_task, engine='cudf')
    
  • Run remote Python, passing as a string

    g2 = g1.upload()
    
    # ensure method is called "task" and takes a single argument "g"
    g3 = g2.python_remote_g("""
        def task(g):
            return (g
                .nodes(g._nodes[:10])
                .edges(g._edges[:10])
            )
    """)
    
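
Because a string-based task must define a function named task that takes a single argument g, a local pre-flight check can catch mistakes before the code is sent. The sketch below uses only the standard library; defines_task is a hypothetical helper and is not part of graphistry.

```python
import ast
import textwrap


def defines_task(source):
    """Return True if source defines a top-level function task(g)."""
    # Dedent first so indented triple-quoted strings parse cleanly
    tree = ast.parse(textwrap.dedent(source))
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "task":
            a = node.args
            names = [arg.arg for arg in a.posonlyargs + a.args]
            extras = a.vararg or a.kwarg or a.kwonlyargs
            return names == ["g"] and not extras
    return False


good = """
    def task(g):
        return g
"""
bad = """
    def run(graph):
        return graph
"""
print(defines_task(good), defines_task(bad))
```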

Let Bindings and DAG Patterns

Use Let bindings to create directed acyclic graph (DAG) patterns with named operations:

  • Basic Let with named bindings:

    from graphistry import let, ref, n, e_forward, gt
    
    result = g.gfql(let({
        'suspects': [n({'risk_score': gt(80)})],
        'connections': ref('suspects', [
            e_forward({'type': 'transaction'}),
            n()
        ])
    }))
    
    # Access results by name
    suspects = result._nodes[result._nodes['suspects']]
    connections = result._edges[result._edges['connections']]
    
  • Complex DAG with multiple references:

    from graphistry import let, ref, n, e_forward, gt
    
    result = g.gfql(let({
        'high_value': [n({'balance': gt(100000)})],
        'large_transfers': ref('high_value', [
            e_forward({'type': 'transfer', 'amount': gt(10000)}),
            n()
        ]),
        'suspicious': ref('large_transfers', [
            n({'created_recent': True, 'verified': False})
        ])
    }))
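
One way to picture the DAG in the example above is as a dependency map from each binding to the binding it references. The sketch below orders such a map with Python's standard graphlib module (Python 3.9+). The dependencies dictionary is a hand-written summary of the bindings, and the assumption that bindings are evaluated in dependency order is an illustration rather than a description of GFQL internals.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical summary: binding name -> names it references via ref(...)
dependencies = {
    "high_value": set(),
    "large_transfers": {"high_value"},
    "suspicious": {"large_transfers"},
}

# Each binding appears after everything it depends on
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# A circular reference is not a DAG, so no valid order exists
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```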
    

Call Operations

Run graph algorithms like PageRank, community detection, and layouts directly within your GFQL queries:

  • Compute PageRank:

    from graphistry import call, let, ref, n, e
    
    # Use let() to compose filter + enrichment
    result = g.gfql(let({
        'persons': [n({'type': 'person'}), e(), n()],
        'ranked': ref('persons', [call('compute_cugraph', {'alg': 'pagerank', 'damping': 0.85})])
    }))
    
    # Results have pagerank column
    top_nodes = result._nodes.sort_values('pagerank', ascending=False).head(10)
    
  • Community detection with Louvain:

    from graphistry import call, let, ref, n, e_forward
    
    # Use let() to compose traversal + community detection
    result = g.gfql(let({
        'reachable': [n({'active': True}), e_forward(to_fixed_point=True), n()],
        'communities': ref('reachable', [call('compute_cugraph', {'alg': 'louvain'})])
    }))
    
    # Results have community column
    communities = result._nodes.groupby('community').size()
    
  • Filter and compute within Let:

    from graphistry import call, let, ref, n, e, gt
    
    # Split mixed chain into separate bindings
    result = g.gfql(let({
        'suspects': [n({'flagged': True}), e(), n()],
        'ranked': ref('suspects', [
            call('compute_cugraph', {'alg': 'pagerank'})
        ]),
        'influencers': ref('ranked', [
            n({'pagerank': gt(0.01)})
        ])
    }))
    
  • Apply layout algorithms:

    from graphistry import call, let, ref, n, e_forward, is_in
    
    # Use let() to compose traversal + layout
    result = g.gfql(let({
        'entities': [n({'type': is_in(['person', 'company'])}), e_forward(), n()],
        'positioned': ref('entities', [call('fa2_layout', {'iterations': 100})])
    }))
    
    # Results have x, y coordinates for visualization
    result.plot()
    

Tip: For subset-based coloring after GFQL, use result.collections(...) and see Layout Settings & Visualization Embedding.

Remote Graph References

Reference graphs on remote servers for distributed computing:

  • Basic remote reference:

    from graphistry import remote, n, e_forward, gt
    
    result = g.gfql([
        remote(dataset_id='fraud-network-2024'),
        n({'risk_score': gt(90)}),
        e_forward()
    ])
    
  • Combine remote and local data in Let:

    from graphistry import let, ref, remote, n, e_forward, gt
    
    result = g.gfql(let({
        'remote_data': remote(dataset_id='historical-2023'),
        'high_risk': ref('remote_data', [
            n({'risk_score': gt(95)})
        ]),
        'connections': ref('high_risk', [
            e_forward({'type': 'transaction'}),
            n()
        ])
    }))
    

Advanced Usage

  • Traversal with source and destination node filters and queries:

    e_forward(
        edge_query="type == 'follows' and weight > 2",
        source_node_match={"status": "active"},
        destination_node_query="age < 30",
        hops=2,
        name="social_edges"
    )
    
  • Node matcher with all parameters:

    n(
        filter_dict={"department": "sales"},
        query="age > 25 and tenure > 2",
        name="experienced_sales"
    )
    
  • Edge matcher with all parameters:

    e_reverse(
        edge_match={"transaction_type": "refund"},
        edge_query="amount > 100",
        source_node_match={"status": "inactive"},
        destination_node_match={"region": "EMEA"},
        name="large_refunds"
    )
    

Parameter Summary

  • Common Parameters:

    • filter_dict: Attribute filters (e.g., {"status": "active"})

    • query: Custom query string (e.g., "age > 30")

    • hops: Max hops to traverse (shorthand for max_hops, default 1)

    • to_fixed_point: Continue traversal until no more matches (bool, default False)

    • name: Label for matchers (str)

    • source_node_match, destination_node_match: Filters for connected nodes

    • source_node_query, destination_node_query: Queries for connected nodes

    • edge_match: Filters for edges

    • edge_query: Query for edges

    • engine: Execution engine (EngineAbstract.AUTO, 'cudf', etc.)

Traversal Directions

  • Forward Traversal: e_forward(…)

  • Reverse Traversal: e_reverse(…)

  • Undirected Traversal: e_undirected(…)

Tips and Best Practices

  • Limit hops for performance: Specify hops to control traversal depth.

  • Use naming for analysis: Apply name to label and filter results.

  • Combine filters: Use filter_dict and query for precise matching.

  • Leverage GPU acceleration: Use engine='cudf' for large datasets.

  • Avoid infinite loops: Be cautious with to_fixed_point=True in cyclic graphs.
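
To see what "until no more matches" means on a cyclic graph, here is a plain-Python model of a forward fixed-point traversal. It is a sketch of the idea only, not GFQL's implementation; reachable_forward and the sample edges are hypothetical. Tracking visited nodes is what lets the loop stop on a cycle, and the result can still cover every node reachable from the seeds, which is why unbounded traversals can be expensive.

```python
def reachable_forward(seeds, edges):
    """Expand forward from seeds until an iteration adds no new nodes."""
    visited = set(seeds)
    frontier = set(seeds)
    while frontier:
        # Keep only nodes not seen before, so cycles do not repeat work
        next_frontier = {dst for src, dst in edges if src in frontier} - visited
        visited |= next_frontier
        frontier = next_frontier
    return visited


# a -> b -> c -> a forms a cycle; x -> y is unreachable from a
cyclic_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "y")]
reached = reachable_forward({"a"}, cyclic_edges)
print(sorted(reached))
```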

Examples at a Glance

  • Find paths of up to 3 hops between two nodes:

    g.gfql([
        n({g._node: "Alice"}),
        e_undirected(hops=3),
        n({g._node: "Bob"})
    ])
    
  • Match nodes with IDs in a range:

    n(query="100 <= id <= 200")
    
  • Traverse edges with specific labels:

    e_forward({"label": is_in(["knows", "likes"])})
    
  • Identify subgraphs based on attributes:

    g.gfql([
        n({"community": "A"}),
        e_undirected(hops=2),
        n({"community": "B"}, name="bridge_nodes")
    ])
    
  • Custom edge and node queries:

    g.gfql([
        n(query="age >= 18"),
        e_forward(edge_query="interaction == 'message'"),
        n(query="location == 'NYC'")
    ])