GFQL Validation Fundamentals#

Learn how to use GFQL’s built-in validation system to catch errors early and build robust graph applications.

What You’ll Learn#

  • How GFQL automatically validates queries

  • Understanding structured error messages with error codes

  • Schema validation against your data

  • Pre-execution validation that catches schema errors before a query runs

  • Collecting all errors vs fail-fast mode

Prerequisites#

  • Basic Python knowledge

  • PyGraphistry installed (pip install graphistry[ai])

Setup and Imports#

First, let’s import the necessary modules and create sample data.

[ ]:
# Core imports
import pandas as pd
import graphistry
from graphistry.compute.chain import Chain
from graphistry.compute.ast import n, e_forward, e_reverse

# Exception types for error handling
from graphistry.compute.exceptions import (
    GFQLValidationError,
    GFQLSyntaxError,
    GFQLTypeError,
    GFQLSchemaError,
    ErrorCode
)

# Check version
print(f"PyGraphistry version: {graphistry.__version__}")
print("\nValidation is now built-in to GFQL operations!")

Automatic Syntax Validation#

GFQL validates operations automatically when you create them. No need to call separate validation functions!

[ ]:
# Example 1: Valid chain creation
try:
    chain = Chain([
        n({'type': 'customer'}),
        e_forward(),
        n()
    ])
    print("Valid chain created successfully!")
    print(f"Chain has {len(chain.chain)} operations")
except GFQLValidationError as e:
    print(f"Validation error: {e}")
[ ]:
# Example 2: Invalid parameter - negative hops
try:
    chain = Chain([
        n(),
        e_forward(hops=-1),  # Invalid: negative hops
        n()
    ])
except GFQLTypeError as e:
    print("Caught validation error!")
    print(f"   Error code: {e.code}")
    print(f"   Message: {e.message}")
    print(f"   Field: {e.context.get('field')}")
    print(f"   Suggestion: {e.context.get('suggestion')}")

Understanding Error Codes#

GFQL uses structured error codes for programmatic handling:

[ ]:
# Display available error codes
print("Error Code Categories:")
print("\nE1xx - Syntax Errors:")
print(f"  {ErrorCode.E101}: Invalid type (e.g., chain not a list)")
print(f"  {ErrorCode.E103}: Invalid parameter value")
print(f"  {ErrorCode.E104}: Invalid direction")
print(f"  {ErrorCode.E105}: Missing required field")

print("\nE2xx - Type Errors:")
print(f"  {ErrorCode.E201}: Type mismatch")
print(f"  {ErrorCode.E204}: Invalid name type")

print("\nE3xx - Schema Errors:")
print(f"  {ErrorCode.E301}: Column not found")
print(f"  {ErrorCode.E302}: Incompatible column type")
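
When handling errors programmatically, it can be convenient to route on the category of a code rather than on each individual code. The sketch below is illustrative only: `error_category` is a hypothetical helper, not part of PyGraphistry, and it assumes you have the code available as a string such as `"E301"`, following the E1xx/E2xx/E3xx grouping printed above.

```python
# Hypothetical helper (not part of PyGraphistry): map a code string to its category.
# Assumes codes look like "E" followed by three digits, e.g. "E301".
CATEGORIES = {"1": "syntax", "2": "type", "3": "schema"}


def error_category(code: str) -> str:
    """Return 'syntax', 'type', 'schema', or 'unknown' for codes like 'E301'."""
    text = str(code)
    if len(text) == 4 and text[0] == "E" and text[1:].isdigit():
        return CATEGORIES.get(text[1], "unknown")
    return "unknown"


print(error_category("E103"))  # syntax
print(error_category("E301"))  # schema
```

Routing on the category keeps handlers short and lets unrecognized codes fall through to a single "unknown" branch instead of failing.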

Create Sample Data#

[ ]:
# Create sample data
nodes_df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'type': ['customer', 'customer', 'product', 'product', 'customer'],
    'score': [100, 85, 95, 120, 110],
    'active': [True, True, False, True, False]
})

edges_df = pd.DataFrame({
    'src': ['a', 'b', 'c', 'd', 'e'],
    'dst': ['c', 'd', 'a', 'b', 'c'],
    'weight': [1.0, 2.5, 0.8, 1.2, 3.0],
    'edge_type': ['buys', 'buys', 'recommends', 'recommends', 'buys']
})

# Create graph using canonical graphistry.edges() and graphistry.nodes()
g = graphistry.edges(edges_df, 'src', 'dst').nodes(nodes_df, 'id')

print("Graph created with:")
print(f"  Nodes: {len(g._nodes)} (columns: {list(g._nodes.columns)})")
print(f"  Edges: {len(g._edges)} (columns: {list(g._edges.columns)})")

Schema Validation (Runtime)#

When you execute a chain, GFQL automatically validates against your data schema:

[ ]:
# Valid query - columns exist
try:
    result = g.gfql([
        n({'type': 'customer'}),
        e_forward({'edge_type': 'buys'}),
        n({'type': 'product'})
    ])
    print("Query executed successfully!")
    print(f"   Found {len(result._nodes)} nodes")
    print(f"   Found {len(result._edges)} edges")
except GFQLSchemaError as e:
    print(f"Schema error: {e}")
[ ]:
# Invalid query - column doesn't exist
try:
    result = g.gfql([
        n({'category': 'VIP'})  # 'category' column doesn't exist
    ])
except GFQLSchemaError as e:
    print("Schema validation caught the error!")
    print(f"   Error code: {e.code}")
    print(f"   Message: {e.message}")
    print(f"   Suggestion: {e.context.get('suggestion')}")

Type Mismatch Detection#

GFQL detects when you use the wrong type of value or predicate for a column:

[ ]:
# Type mismatch: string value on numeric column
try:
    result = g.gfql([
        n({'score': 'high'})  # 'score' is numeric, not string
    ])
except GFQLSchemaError as e:
    print("Type mismatch detected!")
    print(f"   {e}")
    print(f"\n   Column type: {e.context.get('column_type')}")
[ ]:
# Using predicates
from graphistry.compute.predicates.numeric import gt
from graphistry.compute.predicates.str import contains

# Correct: numeric predicate on numeric column
try:
    result = g.gfql([n({'score': gt(90)})])
    print(f"Valid: Found {len(result._nodes)} high-scoring nodes")
except GFQLSchemaError as e:
    print(f"Error: {e}")

# Wrong: string predicate on numeric column
try:
    result = g.gfql([n({'score': contains('9')})])
except GFQLSchemaError as e:
    print("\nPredicate type mismatch caught!")
    print(f"   {e.message}")
    print(f"   Suggestion: {e.context.get('suggestion')}")
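
To make the idea of predicate and column compatibility concrete, here is a minimal sketch of such a check. It is not how GFQL implements it: the `COLUMN_KINDS` and `PREDICATE_KINDS` tables and the `predicate_fits` function are hypothetical, and the kinds are written out by hand rather than inferred from a DataFrame.

```python
# Assumed kinds for some columns of the sample node table (illustrative only)
COLUMN_KINDS = {"name": "string", "type": "string", "score": "numeric"}

# Assumed kind of column each predicate expects (illustrative only)
PREDICATE_KINDS = {"gt": "numeric", "contains": "string"}


def predicate_fits(column: str, predicate: str) -> bool:
    """Return True when the predicate's expected kind matches the column's kind."""
    return COLUMN_KINDS[column] == PREDICATE_KINDS[predicate]


print(predicate_fits("score", "gt"))        # True
print(predicate_fits("score", "contains"))  # False
```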

Pre-Execution Validation#

To avoid spending time on a query that is going to fail, you can validate it against the graph schema before execution:

[ ]:
# Pre-validate to catch errors early
chain_to_test = Chain([
    n({'missing_col': 'value'}),
    e_forward({'also_missing': 'value'})
])

# Method 1: Use validate_schema parameter
try:
    result = g.gfql(chain_to_test.chain, validate_schema=True)
except GFQLSchemaError as e:
    print("Pre-execution validation caught error!")
    print(f"   Error: {e}")
    print("   (No graph operations were performed)")
[ ]:
# Method 2: Validate chain object directly
from graphistry.compute.validate_schema import validate_chain_schema

# Check if chain is compatible with graph schema
try:
    validate_chain_schema(g, chain_to_test)
    print("Chain is valid for this graph schema")
except GFQLSchemaError as e:
    print(f"Schema incompatibility: {e}")
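
Conceptually, the column-existence part of a schema check compares the keys used in a filter against the columns that are available. The sketch below shows that comparison in plain Python. The `missing_columns` function is a hypothetical illustration, not the PyGraphistry implementation, and it only covers missing columns, not type compatibility.

```python
from typing import Dict, Iterable, List


def missing_columns(filters: Dict[str, object], columns: Iterable[str]) -> List[str]:
    """Return the filter keys that are not present in the available columns."""
    available = set(columns)
    return [key for key in filters if key not in available]


# Column names taken from the sample nodes table in this tutorial
node_columns = ["id", "name", "type", "score", "active"]

print(missing_columns({"missing_col": "value"}, node_columns))  # ['missing_col']
print(missing_columns({"type": "customer"}, node_columns))      # []
```

Because this check only reads column names, it can run before any traversal work starts.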

Collect All Errors vs Fail-Fast#

By default, validation fails on the first error. You can collect all errors instead:

[ ]:
# Create a chain with multiple errors
problematic_chain = Chain([
    n({'missing1': 'value', 'score': 'not-a-number'}),  # 2 errors
    e_forward({'missing2': 'value'}),  # 1 error
    n({'type': gt(5)})  # 1 error: numeric predicate on string column
])

# Fail-fast mode (default)
print("Fail-fast mode:")
try:
    problematic_chain.validate()
except GFQLValidationError as e:
    print(f"  Stopped at first error: {e}")

# Collect-all mode
print("\nCollect-all mode:")
errors = problematic_chain.validate(collect_all=True)
print(f"  Found {len(errors)} syntax/type errors")

# For schema validation
schema_errors = validate_chain_schema(g, problematic_chain, collect_all=True)
print(f"  Found {len(schema_errors)} schema errors:")
for i, error in enumerate(schema_errors):
    print(f"\n  Error {i+1}: [{error.code}] {error.message}")
    if error.context.get('suggestion'):
        print(f"    Suggestion: {error.context['suggestion']}")
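
The difference between the two modes can be sketched independently of GFQL. In the example below, `CheckError` and `run_checks` are hypothetical names used only for illustration: each check returns a problem description or `None`, and the `collect_all` flag decides whether the first problem is raised immediately or every problem is gathered into a list.

```python
from typing import Callable, List, Optional


class CheckError(Exception):
    """Hypothetical error type used only in this sketch."""


# A check returns a problem description, or None when it passes
Check = Callable[[], Optional[str]]


def run_checks(checks: List[Check], collect_all: bool = False) -> List[CheckError]:
    """Raise on the first problem (fail-fast) or return all problems (collect-all)."""
    errors: List[CheckError] = []
    for check in checks:
        problem = check()
        if problem is None:
            continue
        error = CheckError(problem)
        if not collect_all:
            raise error  # fail-fast: stop at the first problem
        errors.append(error)
    return errors


checks: List[Check] = [
    lambda: "column 'missing1' not found",
    lambda: None,  # this check passes
    lambda: "column 'missing2' not found",
]

try:
    run_checks(checks)
except CheckError as exc:
    print(f"Fail-fast: {exc}")

print(f"Collect-all: {len(run_checks(checks, collect_all=True))} errors")
```

Fail-fast suits interactive use, where you fix one problem at a time. Collect-all suits batch or CI settings, where a complete report per run is more useful.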

Error Handling Best Practices#

[ ]:
# Comprehensive error handling example
def safe_chain_execution(g, operations):
    """Execute chain with proper error handling."""
    try:
        # Constructing the Chain triggers syntax/type validation
        chain = Chain(operations)

        # Pre-validate if desired
        # errors = chain.validate_schema(g, collect_all=True)
        # if errors:
        #     print(f"Warning: {len(errors)} schema issues found")

        # Execute the validated operations
        result = g.gfql(chain.chain)
        return result

    except GFQLSyntaxError as e:
        print(f"Syntax Error [{e.code}]: {e.message}")
        if e.context.get('suggestion'):
            print(f"  Try: {e.context['suggestion']}")
        return None

    except GFQLTypeError as e:
        print(f"Type Error [{e.code}]: {e.message}")
        print(f"  Field: {e.context.get('field')}")
        print(f"  Value: {e.context.get('value')}")
        return None

    except GFQLSchemaError as e:
        print(f"Schema Error [{e.code}]: {e.message}")
        if e.code == ErrorCode.E301:
            print("  Column not found in data")
        elif e.code == ErrorCode.E302:
            print("  Type mismatch between query and data")
        return None

# Test with valid query
print("Valid query:")
result = safe_chain_execution(g, [
    n({'type': 'customer'}),
    e_forward()
])
if result is not None:
    print(f"  Success! Found {len(result._nodes)} nodes")

# Test with invalid query
print("\nInvalid query:")
result = safe_chain_execution(g, [
    n({'invalid_column': 'value'})
])
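
If you want consistent output across handlers, one option is to format errors in a single place. The helper below is a hypothetical sketch, not a PyGraphistry function. It takes plain values that mirror the `code`, `message`, and `context` attributes used above, and appends the suggestion only when one is present.

```python
from typing import Dict, Optional


def describe_error(code: str, message: str,
                   context: Optional[Dict[str, object]] = None) -> str:
    """Build a one-line description, including the suggestion when available."""
    context = context or {}
    line = f"[{code}] {message}"
    suggestion = context.get("suggestion")
    if suggestion:
        line += f" (suggestion: {suggestion})"
    return line


# Example values are made up for illustration
print(describe_error("E301", "Column not found", {"suggestion": "Check column names"}))
print(describe_error("E103", "Invalid parameter value"))
```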

Summary#

Key Takeaways#

  1. Automatic Validation: GFQL validates automatically when chains are built and executed, so separate validation calls are optional

  2. Structured Errors: Error codes (E1xx, E2xx, E3xx) help with programmatic handling

  3. Helpful Messages: Errors include suggestions for fixing issues

  4. Two Validation Stages:

    • Syntax/Type: During chain construction

    • Schema: During execution (or pre-execution)

  5. Flexible Modes: Choose between fail-fast or collect-all errors

Quick Reference#

# Automatic syntax validation
chain = Chain([...])  # Validates syntax/types

# Runtime schema validation
result = g.gfql([...])  # Validates against data

# Pre-execution schema validation
result = g.gfql([...], validate_schema=True)

# Validate chain against graph schema
from graphistry.compute.validate_schema import validate_chain_schema
validate_chain_schema(g, chain)  # Raises GFQLSchemaError if invalid

# Collect all syntax errors
errors = chain.validate(collect_all=True)

# Collect all schema errors
schema_errors = validate_chain_schema(g, chain, collect_all=True)

Next Steps#

  • Explore more complex query patterns

  • Learn about GFQL predicates for advanced filtering

  • Use validation in production applications