Geospatial Network Visualization with Kepler.gl Integration#
Visualizing geospatial networks combines the power of graph analysis with geographic context. This tutorial demonstrates how to create interactive map-based visualizations using PyGraphistry’s Kepler.gl integration to analyze company networks with real-world geographic coordinates.
PyGraphistry’s Kepler.gl integration enables you to overlay network data on maps, making it ideal for analyzing location-based relationships like supply chains, business partnerships, and regional connections.
Key Benefits#
Geographic Context: Visualize networks with real-world coordinates, making spatial patterns and regional clusters immediately visible.
Interactive Mapping: Leverage Kepler.gl’s powerful map visualization capabilities including multiple layer types (points, arcs, hexbins) and dynamic filtering.
Flexible Data Integration: Seamlessly combine network topology with geospatial attributes from sources like Wikidata, enriching your analysis with location-based insights.
Tutorial#
Follow this tutorial to create geospatial network visualizations:
Fetch company data with geographic coordinates from Wikidata
Create a network of company relationships
Visualize the network on an interactive map using Kepler.gl layers
[ ]:
import graphistry
graphistry.__version__
[ ]:
# API key page (free GPU account): https://hub.graphistry.com/users/personal/key/
# graphistry.register(
#     api=3,
#     personal_key_id=FILL_ME_IN,
#     personal_key_secret=FILL_ME_IN
# )
Data: Fetching Company Information from Wikidata#
We’ll use a SPARQL query to fetch 5,000 companies with headquarters coordinates and related business attributes from Wikidata. The query retrieves:
Company names and headquarters locations
Geographic coordinates (latitude/longitude)
Business metrics (employees, revenue, market cap)
Additional metadata (industry, country, website, stock ticker)
[ ]:
import requests
import pandas as pd
# SPARQL query to fetch data from Wikidata
sparql_query = """
# Companies with HQ coordinates + commonly populated attributes
SELECT ?company ?companyLabel ?hq ?hqLabel
       ?countryLabel ?hqCountryLabel ?industryLabel
       ?inception ?employees ?revenue ?marketCap ?netIncome
       ?ticker ?isin ?website
       (xsd:decimal(STRBEFORE(STRAFTER(STR(?coord), "Point("), " ")) AS ?longitude)
       (xsd:decimal(STRAFTER(STRBEFORE(STR(?coord), ")"), " ")) AS ?latitude)
WHERE {
  # Company w/ headquarters
  ?company wdt:P31 wd:Q4830453 ;  # instance of: business/enterprise
           wdt:P159 ?hq .         # headquarters location
  # Require coordinates (ensures non-sparse lat/long)
  ?hq wdt:P625 ?coord .
  # Frequently populated company attributes
  OPTIONAL { ?company wdt:P17 ?country . }      # country
  OPTIONAL { ?hq wdt:P17 ?hqCountry . }         # HQ country
  OPTIONAL { ?company wdt:P452 ?industry . }    # industry
  OPTIONAL { ?company wdt:P571 ?inception . }   # inception (founded)
  OPTIONAL { ?company wdt:P1128 ?employees . }  # number of employees
  OPTIONAL { ?company wdt:P2139 ?revenue . }    # revenue
  OPTIONAL { ?company wdt:P2295 ?marketCap . }  # market cap
  OPTIONAL { ?company wdt:P2293 ?netIncome . }  # net income
  OPTIONAL { ?company wdt:P249 ?ticker . }      # stock ticker
  OPTIONAL { ?company wdt:P946 ?isin . }        # ISIN
  OPTIONAL { ?company wdt:P856 ?website . }     # website
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  }
}
LIMIT 5000
"""
# Wikidata SPARQL endpoint URL
wikidata_endpoint = "https://query.wikidata.org/sparql"
# Set up headers for the request
headers = {
    "Accept": "application/sparql-results+json"  # Request JSON format
}
# Send the GET request to the SPARQL endpoint
response = requests.get(wikidata_endpoint, headers=headers, params={"query": sparql_query})
# Check if the request was successful
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()
    # Extract the results
    results = data["results"]["bindings"]
    # Create a list of dictionaries for the DataFrame
    processed_results = []
    for item in results:
        row = {}
        for key, value in item.items():
            row[key] = value["value"]
        processed_results.append(row)
    # Create a pandas DataFrame
    companies_df = pd.DataFrame(processed_results)
    print("Data successfully fetched")
else:
    print(f"Error fetching data: {response.status_code}")
    print(response.text)
[14]:
# Convert compatible columns to float type
def convert_to_float(df):
    """Convert columns that can be interpreted as float to float type."""
    df_converted = df.copy()
    for col in df_converted.columns:
        # Skip if already float
        if df_converted[col].dtype == "float64":
            continue
        try:
            # Try to convert to float
            df_converted[col] = pd.to_numeric(df_converted[col], errors="coerce")
            # If all values became NaN, revert to original
            if df_converted[col].isna().all():
                df_converted[col] = df[col]
        except:
            # Keep original if conversion fails
            pass
    return df_converted
# Apply float conversion
companies_df = convert_to_float(companies_df)
[15]:
len(companies_df)
[15]:
5000
Data Validation#
[16]:
companies_df.dtypes
[16]:
company object
hq object
inception object
companyLabel float64
hqLabel object
countryLabel object
hqCountryLabel object
industryLabel object
longitude float64
latitude float64
employees float64
isin object
website object
revenue float64
marketCap float64
dtype: object
[17]:
invalid_lat_long_count = companies_df[(companies_df["latitude"].isna()) |
(companies_df["longitude"].isna()) |
(companies_df["latitude"] == 0) |
(companies_df["longitude"] == 0)].shape[0]
print(f"Number of records with invalid latitude or longitude: {invalid_lat_long_count}")
Number of records with invalid latitude or longitude: 0
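As an optional extra check (not part of the original notebook), you can also confirm that every coordinate lies within valid geographic bounds:
[ ]:
# Optional sanity check: latitude must lie in [-90, 90] and longitude in [-180, 180]
valid_lat = companies_df["latitude"].between(-90, 90)
valid_lng = companies_df["longitude"].between(-180, 180)
out_of_range_count = int((~(valid_lat & valid_lng)).sum())
print(f"Number of records with out-of-range coordinates: {out_of_range_count}")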
Creating Synthetic Relationships#
For demonstration purposes, we’ll create synthetic relationships between companies, each with a random timestamp and relationship type.
[18]:
import numpy as np
# Create pairs of companies. For simplicity, let's create random pairs.
# In a real scenario, you would likely have a specific way to determine relationships.
num_edges = 20000 # You can adjust the number of edges
src_companies = np.random.choice(companies_df["company"], num_edges)
dest_companies = np.random.choice(companies_df["company"], num_edges)
# Generate random timestamps between 2010 and 2025
start_date = pd.to_datetime("2010-01-01")
end_date = pd.to_datetime("2025-12-31")
time_range = end_date - start_date
random_seconds = np.random.rand(num_edges) * time_range.total_seconds()
random_timestamps = start_date + pd.to_timedelta(random_seconds, unit="s")
# Create a list of possible relationship types
relationship_types = ["acquisition", "partnership", "investment", "merger", "collaboration"] # Add more types as needed
# Assign random relationship types
random_types = np.random.choice(relationship_types, num_edges)
# Create the edges_df DataFrame
edges_df = pd.DataFrame({
    "src": src_companies,
    "dest": dest_companies,
    "timestamp": random_timestamps,
    "type": random_types
})
display(edges_df.head(3))
|   | src | dest | timestamp | type |
|---|---|---|---|---|
| 0 | http://www.wikidata.org/entity/Q391795 | http://www.wikidata.org/entity/Q429877 | 2011-03-07 05:44:15.732827432 | acquisition |
| 1 | http://www.wikidata.org/entity/Q403685 | http://www.wikidata.org/entity/Q476185 | 2011-03-04 15:24:34.197431505 | merger |
| 2 | http://www.wikidata.org/entity/Q320111 | http://www.wikidata.org/entity/Q181697 | 2012-09-19 04:47:02.804731473 | merger |
Standard Graph Visualization#
The bare minimum: When your data contains latitude and longitude columns, Graphistry automatically enables map-based visualization with a simple call to .plot().
This is the easiest way to get started with geographic visualization - no explicit Kepler configuration needed. Simply:
1. Have geographic coordinates in your data (latitude and longitude columns)
2. Set layout_settings(play=0) to disable auto-layout
3. Call .plot()
Graphistry automatically detects the geographic columns and renders your network on an interactive map with default Kepler.gl settings.
[30]:
g = graphistry.bind(source="src", destination="dest", node="company") \
    .nodes(companies_df) \
    .edges(edges_df) \
    .layout_settings(play=0)
[32]:
g.plot()
[32]:
Configure: Custom Datasets and Layers#
Taking control: While the default visualization is convenient, you often want precise control over which data is available to Kepler and how it appears on the map.
This section demonstrates how to explicitly populate Kepler datasets and layers using PyGraphistry’s encoding methods:
Datasets (.encode_kepler_dataset()): Define which data to make available to Kepler (nodes, edges, etc.)
Layers (.encode_kepler_layer()): Configure how each dataset appears on the map (point layers, line layers, colors, visibility, etc.)
By explicitly configuring datasets and layers, you gain fine-grained control over:
- Which data appears in which layer
- Visual properties (colors, opacity, thickness)
- Layer visibility (toggle layers on/off by default)
- Layer types (points, lines, arcs, hexbins, etc.)
Note how the edge coordinates (edgeSourceLatitude, edgeSourceLongitude, etc.) are automatically created when an edge type dataset is specified.
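Under the hood, these per-edge coordinate columns correspond to joining each edge endpoint against the node table’s latitude/longitude. The sketch below is only an illustration of that idea in pandas - PyGraphistry generates the columns for you, and this is not its internal implementation.
[ ]:
# Illustration only: how edge endpoint coordinates relate to node coordinates via a join
node_coords = companies_df[["company", "latitude", "longitude"]].drop_duplicates("company")
edges_geo = (
    edges_df
    .merge(node_coords.rename(columns={"company": "src",
                                       "latitude": "edgeSourceLatitude",
                                       "longitude": "edgeSourceLongitude"}), on="src")
    .merge(node_coords.rename(columns={"company": "dest",
                                       "latitude": "edgeTargetLatitude",
                                       "longitude": "edgeTargetLongitude"}), on="dest")
)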
[34]:
# Create Kepler encoding with datasets and layers
g2 = g \
    .encode_kepler_dataset(
        id="companies-dataset",
        type="nodes",
        label="Companies"
    ) \
    .encode_kepler_dataset(
        id="relationships-dataset",
        type="edges",
        label="Relationships",
    ) \
    .encode_kepler_layer({
        "id": "companies-layer",
        "type": "point",
        "config": {
            "dataId": "companies-dataset",
            "label": "Company Headquarters",
            "columns": {
                "lat": "latitude",
                "lng": "longitude"
            }
        }
    }) \
    .encode_kepler_layer({
        "id": "relationships-layer",
        "type": "line",
        "config": {
            "dataId": "relationships-dataset",
            "label": "Company Relationships",
            "columns": {
                "lat0": "edgeSourceLatitude",
                "lng0": "edgeSourceLongitude",
                "lat1": "edgeTargetLatitude",
                "lng1": "edgeTargetLongitude"
            },
            "isVisible": False,
            "color": [100, 200, 200],
            "visConfig": {
                "opacity": 0.01,
                "thickness": 1
            }
        }
    })
print("Kepler encoding applied successfully")
Kepler encoding applied successfully
[35]:
g2.plot()
[35]:
Configure: Choropleth Maps with Computed Columns#
The power of maps: This final example demonstrates the full capabilities of PyGraphistry’s Kepler integration by combining multiple advanced features.
What This Example Demonstrates#
Choropleth Maps: Geographic regions (countries) colored by aggregated metrics - a powerful way to visualize regional patterns
Computed Columns: Dynamic data transformations that aggregate company-level data (market cap, revenue) to country-level statistics without an external pipeline
Multi-layer Visualization: Combining point layers (companies), arc layers (relationships), and geojson layers (countries) in a single view
The Computed Column Magic#
The computed_columns feature calculates a Price-to-Sales (P/S) ratio for each country by:
- Aggregating market cap values by country (mean of marketCap)
- Normalizing by revenue (mean of revenue)
- Binning results into categories for color encoding
- Computing: (avg marketCap by country) / (avg revenue by country)
This creates a choropleth showing which countries have companies with higher valuation multiples relative to their revenue - a key financial metric for investors.
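To make the aggregation concrete, here is a rough pandas equivalent of what the avg_price_to_sales computed column asks the server to compute. This sketch is for intuition only - the real calculation happens inside the Kepler encoding - and it reuses the bin edges from the configuration in the cell below.
[ ]:
# Illustrative pandas equivalent of the "avg_price_to_sales" computed column (sketch only)
by_country = companies_df.groupby("hqCountryLabel").agg(
    avg_market_cap=("marketCap", "mean"),
    avg_revenue=("revenue", "mean"),
)
by_country["avg_price_to_sales"] = by_country["avg_market_cap"] / by_country["avg_revenue"]
# Bin into the same categories used for the choropleth color encoding
bins = [0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 1, 999999]
by_country["ps_bin"] = pd.cut(by_country["avg_price_to_sales"], bins=bins, right=False, include_lowest=True)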
Why This Matters#
This example shows how geographic visualization can transform complex raw financial data into intuitive visual insights:
- Spatial patterns: See which regions have overvalued vs undervalued companies
- Multi-scale analysis: View individual companies and country-level aggregates simultaneously
- Interactive exploration: Toggle layers to focus on different aspects (companies, relationships, regional metrics)
This progression from simple geographic plotting → explicit layer control → computed aggregations demonstrates how PyGraphistry’s Kepler integration enables sophisticated geospatial analytics with relatively simple Python code.
[36]:
from graphistry.kepler import KeplerDataset, KeplerLayer, KeplerEncoding
# Create visualization with countries colored by price-to-sales ratio
kepler_ps_encoding = (
KeplerEncoding()
# Nodes dataset with company data
.with_dataset(
KeplerDataset(
id="companies",
type="nodes",
label="Companies"
)
)
# Edges dataset with mapped coordinates
.with_dataset(
KeplerDataset(
id="relationships",
type="edges",
label="Relationships",
map_node_coords=True
)
)
# Countries dataset with computed price-to-sales ratio
.with_dataset(
KeplerDataset(
id="countries",
type="countries",
label="Countries by P/S Ratio",
resolution=110,
boundary_lakes=False,
computed_columns={
"avg_price_to_sales": {
"type": "aggregate",
"computeFromDataset": "companies",
"sourceKey": "hqCountryLabel",
"targetKey": "name",
"aggregate": "mean",
"aggregateCol": "marketCap",
"normalizer": "mean",
"normalizerCol": "revenue",
"bins": [0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 1, 999999],
"right": False,
"includeLowest": True
}
}
)
)
# Company point layer
.with_layer(
KeplerLayer({
"id": "nodes",
"type": "point",
"config": {
"dataId": "companies",
"label": "Company Headquarters",
"columns": {
"lat": "latitude",
"lng": "longitude"
},
"isVisible": False,
"color": [255, 140, 0],
"visConfig": {
"radius": 5,
"fixedRadius": False,
"opacity": 0.6,
"outline": True,
"thickness": 1,
"strokeColor": [255, 255, 255],
"colorRange": {
"name": "Global Warming",
"type": "sequential",
"category": "Uber",
"colors": ["#5A1846", "#900C3F", "#C70039", "#E3611C", "#F1920E", "#FFC300"]
}
}
}
})
)
# Edge arc layer (initially hidden)
.with_layer(
KeplerLayer({
"id": "relationship-arcs",
"type": "arc",
"config": {
"dataId": "relationships",
"label": "Company Relationships",
"columns": {
"lat0": "edgeSourceLatitude",
"lng0": "edgeSourceLongitude",
"lat1": "edgeTargetLatitude",
"lng1": "edgeTargetLongitude"
},
"isVisible": False,
"color": [100, 200, 200],
"visConfig": {
"opacity": 0.01,
"thickness": 1
}
}
})
)
# Countries geojson layer with color encoding by P/S ratio
.with_layer(
KeplerLayer({
"id": "countries-ps-layer",
"type": "geojson",
"config": {
"dataId": "countries",
"label": "Countries by Price-to-Sales Ratio",
"columns": {
"geojson": "_geometry"
},
"isVisible": True,
"visConfig": {
"opacity": 0.7,
"strokeOpacity": 0.8,
"thickness": 0.5,
"strokeColor": [60, 60, 60],
"colorRange": {
"name": "Custom P/S Gradient",
"type": "sequential",
"category": "Custom",
"colors": [
"#000000", # Black for lowest P/S (0-0.5)
"#1a0a00", # Very dark brown (0.5-1)
"#331400", # Dark brown (1-2)
"#4d1f00", # Brown (2-3)
"#802d00", # Dark orange-brown (3-5)
"#b34000", # Medium orange (5-7)
"#e65c00", # Bright orange (7-10)
"#ff8c1a" # Vibrant orange for highest P/S (10+)
]
},
"filled": True,
"outline": True,
"extruded": False,
"wireframe": False
}
},
"visualChannels": {
"colorField": {
"name": "avg_price_to_sales",
"type": "string"
},
"colorScale": "ordinal",
"sizeField": None,
"sizeScale": "linear"
}
})
)
# Configure options
.with_options(
center_map=True,
read_only=False
)
# Configure settings
.with_config(
cull_unused_columns=False
)
)
# Apply kepler encoding to graph
g3 = g.encode_kepler(kepler_ps_encoding)
print("Price-to-Sales Ratio Visualization Created")
print(f"Datasets: {len(kepler_ps_encoding.datasets)}")
print(f"Layers: {len(kepler_ps_encoding.layers)}")
print("\nComputed column calculates: marketCap / revenue aggregated by country")
print("P/S Ratio bins: [0-0.5, 0.5-1, 1-2, 2-3, 3-5, 5-7, 7-10, 10+]")
print("Color scale: Black (low P/S) → Orange (high P/S)")
Price-to-Sales Ratio Visualization Created
Datasets: 3
Layers: 3
Computed column calculates: marketCap / revenue aggregated by country
P/S Ratio bins: [0-0.5, 0.5-1, 1-2, 2-3, 3-5, 5-7, 7-10, 10+]
Color scale: Black (low P/S) → Orange (high P/S)
[ ]:
g3.plot()