KeplerDataset

Configuration class for Kepler.gl datasets.

class graphistry.kepler.KeplerDataset(raw_dict: Dict[str, Any])
class graphistry.kepler.KeplerDataset(raw_dict: None = None, *, id: str | None = None, type: Literal['nodes'], label: str | None = None, include: List[str] | None = None, exclude: List[str] | None = None, computed_columns: Dict[str, Any] | None = None)
class graphistry.kepler.KeplerDataset(raw_dict: None = None, *, id: str | None = None, type: Literal['edges'], label: str | None = None, include: List[str] | None = None, exclude: List[str] | None = None, computed_columns: Dict[str, Any] | None = None, map_node_coords: bool | None = None, map_node_coords_mapping: Dict[str, str] | None = None)
class graphistry.kepler.KeplerDataset(raw_dict: None = None, *, id: str | None = None, type: Literal['countries', 'zeroOrderAdminRegions'], label: str | None = None, include: List[str] | None = None, exclude: List[str] | None = None, computed_columns: Dict[str, Any] | None = None, resolution: Literal[10, 50, 110] | None = None, boundary_lakes: bool | None = None, filter_countries_by_col: str | None = None, include_countries: List[str] | None = None, exclude_countries: List[str] | None = None)
class graphistry.kepler.KeplerDataset(raw_dict: None = None, *, id: str | None = None, type: Literal['states', 'provinces', 'firstOrderAdminRegions'], label: str | None = None, include: List[str] | None = None, exclude: List[str] | None = None, computed_columns: Dict[str, Any] | None = None, boundary_lakes: bool | None = None, filter_countries_by_col: str | None = None, include_countries: List[str] | None = None, exclude_countries: List[str] | None = None, filter_1st_order_regions_by_col: str | None = None, include_1st_order_regions: List[str] | None = None, exclude_1st_order_regions: List[str] | None = None)

Bases: object

Configure a Kepler.gl dataset for visualization.

Creates a dataset configuration that makes Graphistry data (nodes/edges) or geographic data (countries/states) available to Kepler.gl for visualization.

Common parameters (all dataset types):

Parameters:
  • raw_dict (Optional[Dict[str, Any]]) – Native Kepler.gl dataset dictionary (if provided, all other parameters are ignored)

  • id (Optional[str]) – Dataset identifier (auto-generated if None)

  • type (Optional[str]) – Dataset type: ‘nodes’, ‘edges’, ‘countries’, ‘zeroOrderAdminRegions’, ‘states’, ‘provinces’, or ‘firstOrderAdminRegions’

  • label (Optional[str]) – Display label (defaults to id)

  • include (Optional[List[str]]) – Columns to include (whitelist)

  • exclude (Optional[List[str]]) – Columns to exclude (blacklist)

  • computed_columns (Optional[Dict[str, Any]]) – Computed/aggregated columns for data enrichment

  • kwargs (Any)
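The whitelist/blacklist behavior of include and exclude can be pictured with a small pandas sketch. It assumes that include is applied first, that exclude then removes columns from what remains, and that the original column order is preserved; filter_columns is a hypothetical helper, not part of graphistry.

```python
import pandas as pd
from typing import List, Optional


def filter_columns(df: pd.DataFrame,
                   include: Optional[List[str]] = None,
                   exclude: Optional[List[str]] = None) -> pd.DataFrame:
    """Hypothetical helper: whitelist with `include`, then drop `exclude`."""
    cols = list(df.columns)
    if include is not None:
        # Keep only whitelisted columns, preserving the frame's column order
        cols = [c for c in cols if c in include]
    if exclude is not None:
        # Remove blacklisted columns from whatever remains
        cols = [c for c in cols if c not in exclude]
    return df[cols]


# Hypothetical node table
nodes = pd.DataFrame({
    "name": ["Acme"], "latitude": [48.85], "longitude": [2.35],
    "revenue": [1e6], "internal_id": [42],
})

print(list(filter_columns(nodes, include=["name", "latitude", "longitude", "revenue"]).columns))
print(list(filter_columns(nodes, exclude=["internal_id"]).columns))
```

Both calls keep the same four columns here: one by naming what to keep, the other by naming what to drop.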

For nodes type:

No additional parameters beyond common ones.

For edges type:

Parameters:
  • map_node_coords (Optional[bool]) – Auto-map source/target node coordinates to edges (adds columns: edgeSourceLatitude, edgeSourceLongitude, edgeTargetLatitude, edgeTargetLongitude)

  • map_node_coords_mapping (Optional[Dict[str, str]]) – Custom column names for mapped coordinates. Dict mapping default names to custom names, e.g., {"edgeSourceLongitude": "src_lng", "edgeSourceLatitude": "src_lat", "edgeTargetLongitude": "dst_lng", "edgeTargetLatitude": "dst_lat"}

  • kwargs (Any)
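What map_node_coords adds to an edge table can be sketched with pandas: each edge looks up the coordinates of its source and target nodes and receives the four default columns, which map_node_coords_mapping can then rename. This is a minimal illustration under stated assumptions, not graphistry's implementation; the helper map_node_coords and the column names id, src, dst, latitude and longitude are hypothetical.

```python
import pandas as pd

# Hypothetical node table with an id column and coordinates
nodes = pd.DataFrame({
    "id": ["a", "b", "c"],
    "latitude": [48.85, 35.68, -12.05],
    "longitude": [2.35, 139.69, -77.04],
})
# Hypothetical edge table referencing node ids
edges = pd.DataFrame({"src": ["a", "b"], "dst": ["b", "c"]})


def map_node_coords(edges, nodes, src="src", dst="dst", node="id",
                    lat="latitude", lng="longitude", mapping=None):
    """Hypothetical helper: copy endpoint coordinates onto each edge."""
    coords = nodes.set_index(node)
    out = edges.copy()
    # Look up source and target coordinates by node id
    out["edgeSourceLatitude"] = out[src].map(coords[lat])
    out["edgeSourceLongitude"] = out[src].map(coords[lng])
    out["edgeTargetLatitude"] = out[dst].map(coords[lat])
    out["edgeTargetLongitude"] = out[dst].map(coords[lng])
    # Optionally rename the default column names, as map_node_coords_mapping describes
    if mapping:
        out = out.rename(columns=mapping)
    return out


mapped = map_node_coords(edges, nodes, mapping={
    "edgeSourceLongitude": "src_lng", "edgeSourceLatitude": "src_lat",
    "edgeTargetLongitude": "dst_lng", "edgeTargetLatitude": "dst_lat",
})
print(mapped)
```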

For countries/zeroOrderAdminRegions type:

Parameters:
  • resolution (Optional[Literal[10, 50, 110]]) – Map resolution (10=high, 50=medium, 110=low)

  • boundary_lakes (Optional[bool]) – Include lake boundaries (default: True)

  • filter_countries_by_col (Optional[str]) – Column to filter countries

  • include_countries (Optional[List[str]]) – Countries to include

  • exclude_countries (Optional[List[str]]) – Countries to exclude

  • kwargs (Any)

For states/provinces/firstOrderAdminRegions type:

Parameters:
  • boundary_lakes (Optional[bool]) – Include lake boundaries (default: True)

  • filter_countries_by_col (Optional[str]) – Column to filter countries

  • include_countries (Optional[List[str]]) – Countries to include

  • exclude_countries (Optional[List[str]]) – Countries to exclude

  • filter_1st_order_regions_by_col (Optional[str]) – Column to filter regions

  • include_1st_order_regions (Optional[List[str]]) – Regions to include

  • exclude_1st_order_regions (Optional[List[str]]) – Regions to exclude

  • kwargs (Any)

Example: Node dataset
from graphistry import KeplerDataset

# Basic node dataset
ds = KeplerDataset(id="companies", type="nodes", label="Companies")

# With column filtering
ds = KeplerDataset(
    type="nodes",
    include=["name", "latitude", "longitude", "revenue"]
)
Example: Edge dataset with coordinate mapping
# Auto-map source/target node coordinates to edges
ds = KeplerDataset(
    type="edges",
    map_node_coords=True
)
Example: Countries with computed columns
# High-resolution countries with aggregated metrics
ds = KeplerDataset(
    type="countries",
    resolution=10,
    computed_columns={
        "avg_revenue": {
            "type": "aggregate",
            "computeFromDataset": "companies",
            "sourceKey": "country",
            "targetKey": "name",
            "aggregate": "mean",
            "aggregateCol": "revenue"
        }
    }
)
Example: Using raw_dict
# Pass through native Kepler.gl dataset dict
ds = KeplerDataset({
    "info": {"id": "my-dataset", "label": "My Data"},
    "data": {...}
})
id: str | None
label: str | None
to_dict()

Serialize to dictionary format for Kepler.gl.

Return type:

Dict[str, Any]

type: str | None

Note

For the native Kepler.gl dataset format when using raw_dict, see Kepler.gl Dataset Format.

Computed Columns

computed_columns (dict, optional)

Define computed columns for data enrichment. Each computed column is added as a new column to the current dataset (the dataset where computed_columns is defined). The key in the dictionary becomes the new column name.

Structure:

{
    "new_column_name": {          # The key becomes the new column name in THIS dataset
        "type": "aggregate",              # Aggregation type
        "computeFromDataset": "source_dataset_id",
        "sourceKey": "join_column",       # Column in source dataset
        "targetKey": "join_column",       # Column in target (this) dataset
        "aggregate": "mean",              # Aggregation function: mean, sum, min, max, count
        "aggregateCol": "value_column",   # Column to aggregate
        "normalizer": "mean",             # Optional: normalize by another aggregation
        "normalizerCol": "divisor_col",   # Optional: column for normalization
        "bins": [0, 1, 2, 5, 10],         # Optional: bin continuous values
        "right": False,                   # Optional: bin right-inclusivity
        "includeLowest": True             # Optional: include lowest bin edge
    }
}

Example: A countries dataset can create avg_revenue by aggregating company revenue via country name.
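An "aggregate" computed column behaves like a group-by on the source dataset followed by a lookup into the target dataset. The sketch below uses pandas as a stand-in for cuDF and a hypothetical helper apply_aggregate; it assumes target rows with no matching source rows receive a missing value, and it illustrates the described semantics rather than graphistry's implementation.

```python
import pandas as pd

# Hypothetical source dataset ("companies") and target dataset ("countries")
companies = pd.DataFrame({
    "country": ["France", "France", "Japan"],
    "revenue": [10.0, 30.0, 5.0],
})
countries = pd.DataFrame({"name": ["France", "Japan", "Peru"]})

spec = {
    "type": "aggregate",
    "computeFromDataset": "companies",
    "sourceKey": "country",
    "targetKey": "name",
    "aggregate": "mean",
    "aggregateCol": "revenue",
}


def apply_aggregate(target, source, new_col, spec):
    """Hypothetical helper: add `new_col` to `target` from grouped `source` rows."""
    # Group source rows by the join key and aggregate the value column
    per_key = source.groupby(spec["sourceKey"])[spec["aggregateCol"]].agg(spec["aggregate"])
    # Look up each target row's key; keys without source rows become NaN
    out = target.copy()
    out[new_col] = out[spec["targetKey"]].map(per_key)
    return out


countries = apply_aggregate(countries, companies, "avg_revenue", spec)
print(countries)
```

Here France receives the mean of its two companies, Japan its single value, and Peru, which has no companies, is left empty.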

Computed Column Fields:

  • type (str): Currently supports “aggregate”

  • computeFromDataset (str): ID of the dataset to aggregate from (the source)

  • sourceKey (str): Join column in the source dataset

  • targetKey (str): Join column in the target dataset (this dataset)

  • aggregate (str): Aggregation function name as a string. Common options: “mean”, “sum”, “min”, “max”, “count”, “std”, “var”, “median”, “first”, “last”, “prod”, “nunique”. See the cuDF groupby aggregation docs for the full list.

  • aggregateCol (str): Column name to aggregate from the source dataset

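  • normalizer (str, optional): Secondary aggregation function applied to normalizerCol to produce the denominator, so the result is the aggregated value divided by the normalizer value (e.g., a mean divided by a mean). Uses the same aggregation function names as aggregate.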

  • normalizerCol (str, optional): Column for normalization denominator

  • bins (List[float], optional): Bin edges for discretizing continuous values

  • right (bool, optional): Whether bins are right-inclusive (each interval includes its right edge)

  • includeLowest (bool, optional): Whether to include the lowest bin edge
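The optional normalizer and bins fields can be sketched the same way. This assumes the normalized value is aggregate(aggregateCol) divided by normalizer(normalizerCol) for each join key, and that bins, right and includeLowest follow pandas.cut / cudf.cut semantics; the data and the revenue_per_employee column name are hypothetical.

```python
import pandas as pd

# Hypothetical source rows: revenue and headcount per company
companies = pd.DataFrame({
    "country": ["France", "France", "Japan", "Japan"],
    "revenue": [10.0, 30.0, 1.0, 1.0],
    "employees": [5.0, 15.0, 2.0, 2.0],
})
countries = pd.DataFrame({"name": ["France", "Japan"]})

spec = {
    "sourceKey": "country", "targetKey": "name",
    "aggregate": "sum", "aggregateCol": "revenue",
    "normalizer": "sum", "normalizerCol": "employees",
    "bins": [0, 1, 2, 5, 10], "right": False, "includeLowest": True,
}

grouped = companies.groupby(spec["sourceKey"])
numerator = grouped[spec["aggregateCol"]].agg(spec["aggregate"])
denominator = grouped[spec["normalizerCol"]].agg(spec["normalizer"])
# Assumption: normalized value = aggregate(aggregateCol) / normalizer(normalizerCol)
ratio = numerator / denominator

# Assumption: bins/right/includeLowest behave like pandas.cut; right=False gives [a, b) intervals
binned = pd.cut(ratio, bins=spec["bins"], right=spec["right"],
                include_lowest=spec["includeLowest"])

countries["revenue_per_employee"] = countries[spec["targetKey"]].map(ratio)
print(countries)
print(binned)
```

With right=False, France (40 / 20 = 2.0) falls in [2, 5) rather than [1, 2), and Japan (2 / 4 = 0.5) falls in [0, 1).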

Example

# Aggregate data from another dataset
countries_with_stats = KeplerDataset(
    id="countries-stats",
    type="countries",
    resolution=110,
    computed_columns={
        "avg_revenue": {
            "type": "aggregate",
            "computeFromDataset": "companies",
            "sourceKey": "country",
            "targetKey": "name",
            "aggregate": "mean",
            "aggregateCol": "revenue"
        }
    }
)

See Also