GFQL Validation Fundamentals
Learn how to use GFQL’s built-in validation system to catch errors early and build robust graph applications.
What You’ll Learn
How GFQL automatically validates queries
Understanding structured error messages with error codes
Schema validation against your data
Pre-execution validation for performance
Collecting all errors vs fail-fast mode
Prerequisites
Basic Python knowledge
PyGraphistry installed (pip install graphistry[ai])
Setup and Imports
First, let’s import the necessary modules and create sample data.
[ ]:
# Core imports
import pandas as pd
import graphistry
from graphistry.compute.chain import Chain
from graphistry.compute.ast import n, e_forward, e_reverse
# Exception types for error handling
from graphistry.compute.exceptions import (
    GFQLValidationError,
    GFQLSyntaxError,
    GFQLTypeError,
    GFQLSchemaError,
    ErrorCode,
)
# Check version
print(f"PyGraphistry version: {graphistry.__version__}")
print("\nValidation is now built-in to GFQL operations!")
Automatic Syntax Validation
GFQL validates operations automatically when you create them. No need to call separate validation functions!
[ ]:
# Example 1: Valid chain creation
try:
    chain = Chain([
        n({'type': 'customer'}),
        e_forward(),
        n()
    ])
    print("Valid chain created successfully!")
    print(f"Chain has {len(chain.chain)} operations")
except GFQLValidationError as e:
    print(f"Validation error: {e}")
[ ]:
# Example 2: Invalid parameter - negative hops
try:
    chain = Chain([
        n(),
        e_forward(hops=-1),  # Invalid: negative hops
        n()
    ])
except GFQLTypeError as e:
    print("Caught validation error!")
    print(f" Error code: {e.code}")
    print(f" Message: {e.message}")
    print(f" Field: {e.context.get('field')}")
    print(f" Suggestion: {e.context.get('suggestion')}")
Understanding Error Codes
GFQL uses structured error codes for programmatic handling:
[ ]:
# Display available error codes
print("Error Code Categories:")
print("\nE1xx - Syntax Errors:")
print(f" {ErrorCode.E101}: Invalid type (e.g., chain not a list)")
print(f" {ErrorCode.E103}: Invalid parameter value")
print(f" {ErrorCode.E104}: Invalid direction")
print(f" {ErrorCode.E105}: Missing required field")
print("\nE2xx - Type Errors:")
print(f" {ErrorCode.E201}: Type mismatch")
print(f" {ErrorCode.E204}: Invalid name type")
print("\nE3xx - Schema Errors:")
print(f" {ErrorCode.E301}: Column not found")
print(f" {ErrorCode.E302}: Incompatible column type")
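Because the codes are grouped by their first digit, a handler can branch on the category instead of on each individual code. The following is a minimal sketch of that idea. `error_category` and `CATEGORIES` are hypothetical names introduced here for illustration and are not part of PyGraphistry; the sketch only assumes that a code prints as something like `E301` or `ErrorCode.E301`.

```python
# Hypothetical helper (not a PyGraphistry API): map a code such as "E301"
# to the category used in the listing above.
CATEGORIES = {"E1": "syntax", "E2": "type", "E3": "schema"}


def error_category(code) -> str:
    """Return 'syntax', 'type', 'schema', or 'unknown' for a code like 'E301'."""
    text = str(code)
    # An enum-style member may print as "ErrorCode.E301"; keep the last part.
    text = text.rsplit(".", 1)[-1].strip().upper()
    # The first two characters ("E1", "E2", "E3") identify the category.
    return CATEGORIES.get(text[:2], "unknown")


print(error_category("E103"))  # syntax
print(error_category("E204"))  # type
print(error_category("E302"))  # schema
```

A handler could then log or retry differently per category while still reporting the exact code to the user.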
Create Sample Data
[ ]:
# Create sample data
nodes_df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'type': ['customer', 'customer', 'product', 'product', 'customer'],
    'score': [100, 85, 95, 120, 110],
    'active': [True, True, False, True, False]
})
edges_df = pd.DataFrame({
    'src': ['a', 'b', 'c', 'd', 'e'],
    'dst': ['c', 'd', 'a', 'b', 'c'],
    'weight': [1.0, 2.5, 0.8, 1.2, 3.0],
    'edge_type': ['buys', 'buys', 'recommends', 'recommends', 'buys']
})
# Create graph using canonical graphistry.edges() and graphistry.nodes()
g = graphistry.edges(edges_df, 'src', 'dst').nodes(nodes_df, 'id')
print("Graph created with:")
print(f" Nodes: {len(g._nodes)} (columns: {list(g._nodes.columns)})")
print(f" Edges: {len(g._edges)} (columns: {list(g._edges.columns)})")
Schema Validation (Runtime)
When you execute a chain, GFQL automatically validates it against the schema of your node and edge data:
[ ]:
# Valid query - columns exist
try:
    result = g.gfql([
        n({'type': 'customer'}),
        e_forward({'edge_type': 'buys'}),
        n({'type': 'product'})
    ])
    print("Query executed successfully!")
    print(f" Found {len(result._nodes)} nodes")
    print(f" Found {len(result._edges)} edges")
except GFQLSchemaError as e:
    print(f"Schema error: {e}")
[ ]:
# Invalid query - column doesn't exist
try:
    result = g.gfql([
        n({'category': 'VIP'})  # 'category' column doesn't exist
    ])
except GFQLSchemaError as e:
    print("Schema validation caught the error!")
    print(f" Error code: {e.code}")
    print(f" Message: {e.message}")
    print(f" Suggestion: {e.context.get('suggestion')}")
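Conceptually, a column-not-found check compares the keys of a filter dictionary with the columns available in the data. The sketch below illustrates that comparison in plain Python. It is not GFQL's implementation, and `missing_columns` is a hypothetical helper named here only for illustration.

```python
from typing import Iterable, List, Mapping


def missing_columns(filter_dict: Mapping[str, object], columns: Iterable[str]) -> List[str]:
    """Return the filter keys that are absent from `columns`, in filter order."""
    available = set(columns)
    return [key for key in filter_dict if key not in available]


# Column names copied from the sample nodes table above; with a real graph
# they could be read from list(g._nodes.columns).
node_columns = ['id', 'name', 'type', 'score', 'active']

print(missing_columns({'type': 'customer'}, node_columns))  # []
print(missing_columns({'category': 'VIP'}, node_columns))   # ['category']
```

An empty result means every referenced column exists; any returned name corresponds to the kind of problem the schema error above reports.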
Type Mismatch Detection
GFQL detects when you use the wrong type of value or predicate for a column:
[ ]:
# Type mismatch: string value on numeric column
try:
    result = g.gfql([
        n({'score': 'high'})  # 'score' is numeric, not string
    ])
except GFQLSchemaError as e:
    print("Type mismatch detected!")
    print(f" {e}")
    print(f"\n Column type: {e.context.get('column_type')}")
[ ]:
# Using predicates
from graphistry.compute.predicates.numeric import gt
from graphistry.compute.predicates.str import contains
# Correct: numeric predicate on numeric column
try:
    result = g.gfql([n({'score': gt(90)})])
    print(f"Valid: Found {len(result._nodes)} high-scoring nodes")
except GFQLSchemaError as e:
    print(f"Error: {e}")
# Wrong: string predicate on numeric column
try:
    result = g.gfql([n({'score': contains('9')})])
except GFQLSchemaError as e:
    print("\nPredicate type mismatch caught!")
    print(f" {e.message}")
    print(f" Suggestion: {e.context.get('suggestion')}")
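The idea behind a type-mismatch check can be sketched without GFQL: classify the column by the kind of values it holds, classify the filter value the same way, and compare. This is an illustrative simplification under that assumption, not how GFQL inspects column types. `column_kind` and `value_matches` are hypothetical names.

```python
from numbers import Number


def column_kind(values) -> str:
    """Classify values as 'numeric', 'string', or 'other'."""
    # bool is a Number subclass in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numeric"
    if all(isinstance(v, str) for v in values):
        return "string"
    return "other"


def value_matches(value, values) -> bool:
    """True when a literal filter value has the same kind as the column."""
    return column_kind([value]) == column_kind(values)


scores = [100, 85, 95, 120, 110]  # the 'score' column from the sample nodes

print(value_matches(90, scores))      # True
print(value_matches('high', scores))  # False
```

The second call corresponds to the `n({'score': 'high'})` case: a string literal compared against a numeric column.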
Pre-Execution Validation
To avoid running graph operations on a query that cannot succeed, you can validate it against the schema before execution:
[ ]:
# Pre-validate to catch errors early
chain_to_test = Chain([
    n({'missing_col': 'value'}),
    e_forward({'also_missing': 'value'})
])
# Method 1: Use validate_schema parameter
try:
    result = g.gfql(chain_to_test.chain, validate_schema=True)
except GFQLSchemaError as e:
    print("Pre-execution validation caught error!")
    print(f" Error: {e}")
    print(" (No graph operations were performed)")
[ ]:
# Method 2: Validate chain object directly
from graphistry.compute.validate_schema import validate_chain_schema
# Check if chain is compatible with graph schema
try:
    validate_chain_schema(g, chain_to_test)
    print("Chain is valid for this graph schema")
except GFQLSchemaError as e:
    print(f"Schema incompatibility: {e}")
Collect All Errors vs Fail-Fast
By default, validation fails on the first error. You can collect all errors instead:
[ ]:
# Create a chain with multiple errors
problematic_chain = Chain([
    n({'missing1': 'value', 'score': 'not-a-number'}),  # 2 errors
    e_forward({'missing2': 'value'}),  # 1 error
    n({'type': gt(5)})  # 1 error: numeric predicate on string column
])
# Fail-fast mode (default)
print("Fail-fast mode:")
try:
    problematic_chain.validate()
except GFQLValidationError as e:
    print(f" Stopped at first error: {e}")
# Collect-all mode
print("\nCollect-all mode:")
errors = problematic_chain.validate(collect_all=True)
print(f" Found {len(errors)} syntax/type errors")
# For schema validation
schema_errors = validate_chain_schema(g, problematic_chain, collect_all=True)
print(f" Found {len(schema_errors)} schema errors:")
for i, error in enumerate(schema_errors):
    print(f"\n Error {i+1}: [{error.code}] {error.message}")
    if error.context.get('suggestion'):
        print(f" Suggestion: {error.context['suggestion']}")
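The difference between the two modes comes down to what happens when a check fails: raise immediately, or record the failure and keep going. The sketch below shows that general pattern with stand-in names. `CheckError` and `run_checks` are hypothetical and are not GFQL classes or functions.

```python
from typing import Callable, List, Optional


class CheckError(Exception):
    """Hypothetical stand-in for a validation error; not a GFQL class."""


def run_checks(checks: List[Callable[[], Optional[str]]],
               collect_all: bool = False) -> List[CheckError]:
    """Run each check; a check returns an error message or None.

    collect_all=False: raise CheckError on the first failure (fail-fast).
    collect_all=True:  return one CheckError per failure.
    """
    errors = []
    for check in checks:
        message = check()
        if message is None:
            continue
        error = CheckError(message)
        if not collect_all:
            raise error
        errors.append(error)
    return errors


checks = [
    lambda: "column 'missing1' not found",
    lambda: None,  # this check passes
    lambda: "column 'missing2' not found",
]

try:
    run_checks(checks)
except CheckError as exc:
    print(f"fail-fast: {exc}")

print(f"collect-all: {len(run_checks(checks, collect_all=True))} errors")
```

Fail-fast suits interactive use, where the first problem is fixed before rerunning. Collect-all suits reports that should list every problem in one pass.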
Error Handling Best Practices
[ ]:
# Comprehensive error handling example
def safe_chain_execution(g, operations):
    """Execute chain with proper error handling."""
    try:
        # Create chain (constructing it runs syntax/type validation)
        chain = Chain(operations)
        # Pre-validate if desired
        # errors = chain.validate_schema(g, collect_all=True)
        # if errors:
        #     print(f"Warning: {len(errors)} schema issues found")
        # Execute
        result = g.gfql(operations)
        return result
    except GFQLSyntaxError as e:
        print(f"Syntax Error [{e.code}]: {e.message}")
        if e.context.get('suggestion'):
            print(f" Try: {e.context['suggestion']}")
        return None
    except GFQLTypeError as e:
        print(f"Type Error [{e.code}]: {e.message}")
        print(f" Field: {e.context.get('field')}")
        print(f" Value: {e.context.get('value')}")
        return None
    except GFQLSchemaError as e:
        print(f"Schema Error [{e.code}]: {e.message}")
        if e.code == ErrorCode.E301:
            print(" Column not found in data")
        elif e.code == ErrorCode.E302:
            print(" Type mismatch between query and data")
        return None

# Test with valid query
print("Valid query:")
result = safe_chain_execution(g, [
    n({'type': 'customer'}),
    e_forward()
])
# Compare against None explicitly; the function returns None on failure
if result is not None:
    print(f" Success! Found {len(result._nodes)} nodes")

# Test with invalid query
print("\nInvalid query:")
result = safe_chain_execution(g, [
    n({'invalid_column': 'value'})
])
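In a production setting it is often more useful to log errors as structured records than to print them. The following sketch flattens an exception into a plain dictionary using the attribute names seen in the examples above (`code`, `message`, `context`). `error_to_record` and `FakeSchemaError` are hypothetical names used only for this illustration. The stand-in class exists so the snippet runs without PyGraphistry installed.

```python
def error_to_record(exc: Exception) -> dict:
    """Flatten an exception into a plain dict suitable for logging."""
    # Fall back gracefully when an attribute is missing on the exception.
    context = getattr(exc, "context", None) or {}
    return {
        "kind": type(exc).__name__,
        "code": str(getattr(exc, "code", "")),
        "message": getattr(exc, "message", str(exc)),
        "suggestion": context.get("suggestion"),
    }


class FakeSchemaError(Exception):
    """Hypothetical stand-in with the same attribute names; not a GFQL class."""

    def __init__(self, code, message, **context):
        super().__init__(message)
        self.code = code
        self.message = message
        self.context = context


record = error_to_record(
    FakeSchemaError("E301", "Column not found", suggestion="Check column names")
)
print(record)
```

Because the helper reads attributes with `getattr`, it also accepts ordinary exceptions, in which case `code` is empty and `suggestion` is `None`.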
Summary
Key Takeaways
Automatic Validation: GFQL validates automatically, so no separate validation calls are needed
Structured Errors: Error codes (E1xx, E2xx, E3xx) help with programmatic handling
Helpful Messages: Errors include suggestions for fixing issues
Two Validation Stages:
  Syntax/Type: During chain construction
  Schema: During execution (or pre-execution)
Flexible Modes: Choose between fail-fast and collect-all error reporting
Quick Reference
# Automatic syntax validation
chain = Chain([...]) # Validates syntax/types
# Runtime schema validation
result = g.gfql([...]) # Validates against data
# Pre-execution schema validation
result = g.gfql([...], validate_schema=True)
# Validate chain against graph schema
from graphistry.compute.validate_schema import validate_chain_schema
validate_chain_schema(g, chain)  # Raises GFQLSchemaError if invalid
# Collect all syntax errors
errors = chain.validate(collect_all=True)
# Collect all schema errors
schema_errors = validate_chain_schema(g, chain, collect_all=True)
Next Steps
Explore more complex query patterns
Learn about GFQL predicates for advanced filtering
Use validation in production applications