Tutorial: Using Azure Data Explorer’s Persistent Graphs with Kusto & Graphistry#
This tutorial demonstrates integrating Azure Data Explorer’s (ADX) Persistent Graphs with PyGraphistry, enabling easy GPU-accelerated graph visualization and analytics.
Why Integrate#
Microsoft’s ADX Persistent Graphs lets you define and reuse graph relationships directly within ADX. Native support brings reuse and speed.
PyGraphistry provides GPU-accelerated visual analytics pipelines that make complex graph investigations more interactive, intuitive, and insightful. Teams typically use Graphistry from existing workflows in notebooks, dashboards, and custom web apps to quickly build graph experiences.
Together, they simplify and accelerate full investigations into data already in Azure Data Explorer. Teams get to leverage their existing investments in Kusto Query Language (KQL) and gain the ability to answer relationship-centric questions in domains like security, IT operations, user behavior, and supply chains, even at large scale.
For a genAI-native approach where analysts can work in natural language to talk to Kusto and generate Graphistry visualizations, you may also be interested in Louie.ai.
Tutorial Outline#
You’ll learn to:
Query Kusto and ADX graphs with PyGraphistry
Create persistent graphs in Azure Data Explorer from a CSV
Explore and visualize results as dataframes and Graphistry GPU graph visualizations
Create graph pipelines with PyGraphistry
Let’s begin!
Setup#
Install PyGraphistry and the Kusto Python client:
# Just Graphistry; bring your own Kusto install
pip install graphistry
# Bundled Kusto install (quoted so shells such as zsh do not expand the brackets)
pip install "graphistry[kusto]"
Take it for a spin:#
Connect to Kusto and Graphistry#
Get a free Graphistry Hub GPU API key or run your own server
To learn more about authentication methods for different Graphistry configurations, check out API authentication to Graphistry servers
[1]:
import graphistry
from datetime import datetime
[ ]:
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options: https://pygraphistry.readthedocs.io/en/latest/server/register.html
[ ]:
KUSTO_CONF = {
    "cluster": "https://<clustername>.<region>.kusto.windows.net",
    "database": "<YourDatabase>"
}
graphistry.configure_kusto(**KUSTO_CONF)
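If you would rather not hard-code the cluster URL in a notebook, the same dictionary can be assembled from environment variables. The following is a minimal sketch, not part of PyGraphistry: the helper name and the variable names KUSTO_CLUSTER and KUSTO_DATABASE are assumptions made up for this example.

```python
import os

def kusto_conf_from_env(env=None):
    """Build the keyword arguments for graphistry.configure_kusto.

    KUSTO_CLUSTER and KUSTO_DATABASE are hypothetical variable names
    chosen for this sketch; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    cluster = env.get("KUSTO_CLUSTER", "").strip()
    database = env.get("KUSTO_DATABASE", "").strip()
    if not cluster.startswith("https://"):
        raise ValueError("KUSTO_CLUSTER should be a full https:// cluster URL")
    if not database:
        raise ValueError("KUSTO_DATABASE is not set")
    return {"cluster": cluster, "database": database}

# Usage (requires graphistry to be installed and registered):
# import graphistry
# graphistry.configure_kusto(**kusto_conf_from_env())
```

Failing early on a missing or malformed value tends to be easier to debug than a connection error raised later by the first query.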
Ingest data into your Azure Data Explorer cluster.#
Import the RedTeam50k dataset used in our UMAP cyber demo notebook into your Azure Data Explorer cluster.
The dataset is a massaged version of the dataset published by Alexander D. Kent.
Executing using graphistry#
With your registered and configured PyGraphistry object, it is now easy to execute Kusto queries and commands.
We load the redteam50k dataset into our cluster.
The kql function returns the query results as dataframes.
[ ]:
graphistry.kql(""".execute script <|
.create-or-alter function graphistryRedTeam50k () {
externaldata(index:long, event_time:long, src_domain:string, dst_domain:string, src_computer:string, dst_computer:string, auth_type:string, logontype:string, authentication_orientation:string, success_or_failure:string, RED:int, feats:string, feats2:string)
[
h@"https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/graphistry_redteam50k.csv"
]
with(format="csv", ignoreFirstRecord=true)
| extend event_time = datetime(2024-01-01) + event_time * 1s
}
""")
Grabbing a sample of data#
[5]:
# Grabbing the first dataframe
df = graphistry.kql("graphistryRedTeam50k | take 100")
df.head(5)
Query returned 1 results shapes: [(100, 13)] in 0.374 sec
[5]:
| | index | event_time | src_domain | dst_domain | src_computer | dst_computer | auth_type | logontype | authentication_orientation | success_or_failure | RED | feats | feats2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30526246 | 2024-01-02 19:16:45+00:00 | C7048$@DOM1 | C7048$@DOM1 | C7048 | TGT | ? | ? | TGS | Success | 0 | C7048 TGT ? ? | C7048 TGT |
| 1 | 5928201 | 2024-01-01 10:28:10+00:00 | C15034$@DOM1 | C15034$@DOM1 | C15034 | C467 | ? | ? | TGS | Success | 0 | C15034 C467 ? ? | C15034 C467 |
| 2 | 21160461 | 2024-01-02 08:29:52+00:00 | U2075@DOM1 | U2075@DOM1 | C529 | C529 | ? | Network | LogOff | Success | 0 | C529 C529 ? Network | C529 C529 |
| 3 | 2182328 | 2024-01-01 06:06:59+00:00 | C3547$@DOM1 | C3547$@DOM1 | C457 | C457 | ? | Network | LogOff | Success | 0 | C457 C457 ? Network | C457 C457 |
| 4 | 28495743 | 2024-01-02 16:26:12+00:00 | C567$@DOM1 | C567$@DOM1 | C574 | C523 | Kerberos | Network | LogOn | Success | 0 | C574 C523 Kerberos Network | C574 C523 |
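The event_time values in this sample come from the extend step in the ingestion function, which adds each row's offset in seconds to 2024-01-01. A small Python sketch of the same arithmetic follows; the offset 155805 is an illustrative value (1 day, 19 h, 16 min, 45 s) chosen to land on the timestamp shown in the first sample row, and the sketch uses naive datetimes whereas ADX reports UTC.

```python
from datetime import datetime, timedelta

BASE = datetime(2024, 1, 1)  # same anchor date as the KQL extend step

def to_event_time(offset_seconds):
    """Mirror `datetime(2024-01-01) + event_time * 1s` from the KQL function."""
    return BASE + timedelta(seconds=offset_seconds)

# Illustrative offset: 86400 + 19*3600 + 16*60 + 45 seconds
print(to_event_time(155805))  # 2024-01-02 19:16:45
```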
Building the schema and persisting the graph#
A graph model defines the specifications of a graph stored in your database metadata.
It includes:
Schema definition: Node and edge types with their properties
Data source mappings: Instructions for building the graph from tabular data
Labels: Both static (predefined) and dynamic (generated at runtime) labels for nodes and edges
Graph models contain the blueprint for creating graph snapshots, not the actual graph data.
Read more: Kusto Graph models
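To make the idea of a data source mapping concrete, here is a rough plain-Python sketch of how tabular rows turn into nodes and edges. It only illustrates the concept: the function name is invented for this example, the two rows are copied from the sample output above, and how ADX itself merges properties of repeated nodes is not modeled here.

```python
# Two rows copied from the sample output above (only the columns needed here)
rows = [
    {"src_computer": "C529", "dst_computer": "C529"},
    {"src_computer": "C574", "dst_computer": "C523"},
]

def build_graph(rows):
    """Derive node and edge lists from rows, in the spirit of a data source mapping."""
    nodes = {}
    edges = []
    for row in rows:
        for column in ("src_computer", "dst_computer"):
            # One node per distinct identifier; repeated ids collapse into one entry
            nodes.setdefault(row[column], {"id": row[column], "label": "Computer"})
        edges.append({
            "source": row["src_computer"],
            "target": row["dst_computer"],
            "label": "AUTHENTICATES",
        })
    return list(nodes.values()), edges

nodes, edges = build_graph(rows)
print(len(nodes), len(edges))  # 3 2
```

Every row yields one edge, while nodes are keyed by their identifier, so the self-authentication on C529 produces a single node with a self-loop.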
[ ]:
GRAPH_NAME = "graphistryRedTeamGraph"
graphistry.kql(f".create-or-alter graph_model {GRAPH_NAME}" + """```
{
"Schema": {
"Nodes": {
"Computer": {"computerName": "string", "RED":"int"},
"Domain": {"domainName": "string", "RED":"int"}
},
"Edges": {
"AUTHENTICATES": {
"event_time": "datetime",
"src_computer": "string",
"dst_computer": "string",
"src_domain": "string",
"dst_domain": "string",
"auth_type": "string",
"logontype": "string",
"authentication_orientation": "string",
"success_or_failure": "string",
"RED": "int"
}
}
},
"Definition": {
"Steps": [
{
"Kind": "AddNodes",
"Query": "graphistryRedTeam50k | project computerName = src_computer, RED, nodeType = 'Computer'",
"NodeIdColumn": "computerName",
"Labels": ["Computer"],
"LabelsColumn": "nodeType"
},
{
"Kind": "AddNodes",
"Query": "graphistryRedTeam50k | project computerName = dst_computer, RED, nodeType = 'Computer'",
"NodeIdColumn": "computerName",
"Labels": ["Computer"],
"LabelsColumn": "nodeType"
},
{
"Kind": "AddNodes",
"Query": "graphistryRedTeam50k | project domainName = src_domain, nodeType = 'Domain',RED",
"NodeIdColumn": "domainName",
"Labels": ["Domain"],
"LabelsColumn": "nodeType"
},
{
"Kind": "AddNodes",
"Query": "graphistryRedTeam50k | project domainName = dst_domain, nodeType = 'Domain',RED",
"NodeIdColumn": "domainName",
"Labels": ["Domain"],
"LabelsColumn": "nodeType"
},
{
"Kind": "AddEdges",
"Query": "graphistryRedTeam50k | project event_time, src_computer, dst_computer, src_domain, dst_domain, auth_type, logontype, authentication_orientation, success_or_failure, RED",
"SourceColumn": "src_computer",
"TargetColumn": "dst_computer",
"Labels": ["AUTHENTICATES"]
}
]
}
}```
""")
Making the snapshot#
A graph snapshot is the actual graph instance materialized from a graph model. It represents:
A specific point-in-time view of the data as defined by the model
The nodes, edges, and their properties in a queryable format
A self-contained entity that persists until explicitly removed
Snapshots are the entities you query when working with persistent graphs. Read more: Kusto Graph snapshot
[7]:
# create snapshot name dynamically by adding current timestamp
timestamp = datetime.now().strftime("%m_%d_%Y_%H_%M_%S")
snapshot_name = "InitialSnap_" + timestamp # append timestamp to always get a unique snapshot name for each run
snapshot_name
[7]:
'InitialSnap_07_07_2025_21_37_03'
[ ]:
graph_snapshot_query = f".make graph_snapshot {snapshot_name} from {GRAPH_NAME}"
graphistry.kql(graph_snapshot_query)
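The naming and command construction above can be wrapped in a small helper. This is a sketch under one assumption of its own: it restricts names to letters, digits, and underscores before interpolating them into the command, which is a conservative check chosen for this example rather than a documented ADX rule. Both function names are hypothetical.

```python
import re
from datetime import datetime

def make_snapshot_name(prefix, now=None):
    """Return `<prefix>_<MM_DD_YYYY_HH_MM_SS>`, the naming scheme used above."""
    now = datetime.now() if now is None else now
    return f"{prefix}_{now.strftime('%m_%d_%Y_%H_%M_%S')}"

def make_snapshot_command(snapshot_name, graph_name):
    """Build the `.make graph_snapshot` command after a conservative name check."""
    for name in (snapshot_name, graph_name):
        # Assumption for this sketch: allow only letters, digits, and underscores
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unexpected characters in name: {name!r}")
    return f".make graph_snapshot {snapshot_name} from {graph_name}"

name = make_snapshot_name("InitialSnap", datetime(2025, 7, 7, 21, 37, 3))
print(name)  # InitialSnap_07_07_2025_21_37_03
print(make_snapshot_command(name, "graphistryRedTeamGraph"))

# Usage (assumes graphistry is registered and Kusto is configured):
# graphistry.kql(make_snapshot_command(make_snapshot_name("InitialSnap"), GRAPH_NAME))
```

Passing the timestamp in as an argument keeps the helper deterministic, which makes it easy to check the format without waiting on the clock.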
Graph Visualization#
Once your data, persistent graph, and snapshot are created in your Azure Data Explorer cluster, it is time to see the power of Graphistry’s GPU-accelerated visual interface.
The kusto_graph function accepts two parameters: the name of the graph and, optionally, the name of your snapshot (snap_name="name"). If you don’t provide a snapshot, it uses the latest one.
The function returns a Graphistry plottable object.
You can inspect the nodes and edges, add customizations, or .plot() it as is.
[9]:
g = graphistry.kusto_graph(GRAPH_NAME, snap_name=snapshot_name)
Query returned 2 results shapes: [(21984, 5), (50749, 12)] in 2.153 sec
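Before plotting, it can help to confirm what came back. PyGraphistry plottables keep their bound dataframes on the _nodes and _edges attributes; the helper below is a hypothetical convenience written for this tutorial, and it only assumes those two attributes behave like dataframes with .shape and .columns.

```python
def summarize_graph(g):
    """Return the shapes and column names of a plottable's node and edge tables."""
    nodes, edges = g._nodes, g._edges  # dataframes bound to the Graphistry object
    return {
        "nodes_shape": tuple(nodes.shape),
        "edges_shape": tuple(edges.shape),
        "node_columns": list(nodes.columns),
        "edge_columns": list(edges.columns),
    }

# Usage, with `g` returned by graphistry.kusto_graph(...) above:
# summarize_graph(g)
```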
Plotting your object#
[10]:
g.plot()
Changing colors, icons and more#
Our data combines two datasets, one of which contains verified red team activity. Those records are tagged with the value 1 in the RED column.
Let’s make the red nodes pop out in our visualization. Because our data is split into two node types, “Computer” and “Domain”, we also add icons to make them easier to tell apart.
Learn more here: Graphistry Visualization
[11]:
g2 = g.encode_point_color(
    "RED",
    categorical_mapping={
        1: "red"
    },
    default_mapping='silver'
)
g3 = g2.encode_point_icon(
    'nodeType',
    shape="circle",
    categorical_mapping={
        "Computer": "laptop",
        "Domain": "server"
    },
    default_mapping="question")
g3.plot()
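If you expect to restyle several snapshots the same way, the two encodings can be bundled into one reusable step. This is a minimal sketch: style_red_team is a name invented here, and it relies only on the encode_point_color and encode_point_icon calls already used in this tutorial.

```python
def style_red_team(g):
    """Apply the RED color encoding and node-type icons to a Graphistry plottable."""
    colored = g.encode_point_color(
        "RED",
        categorical_mapping={1: "red"},  # verified red team activity
        default_mapping="silver",
    )
    return colored.encode_point_icon(
        "nodeType",
        shape="circle",
        categorical_mapping={"Computer": "laptop", "Domain": "server"},
        default_mapping="question",
    )

# Usage, with `g` returned by graphistry.kusto_graph(...):
# style_red_team(g).plot()
```

Because each encode call returns a new plottable, the function leaves the original g untouched and can be applied to any later snapshot of the same model.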
Next steps#
Data:
A. D. Kent, “Comprehensive, Multi-Source Cybersecurity Events,”
Los Alamos National Laboratory, http://dx.doi.org/10.17021/1179829, 2015.
@Misc{kent-2015-cyberdata1,
author = {Alexander D. Kent},
title = {{Comprehensive, Multi-Source Cyber-Security Events}},
year = {2015},
howpublished = {Los Alamos National Laboratory},
doi = {10.17021/1179829}
}