DGL Utils
- class graphistry.dgl_utils.DGLGraphMixin(*args, **kwargs)
Bases: FeatureMixin
Automagic DGL models from Graphistry Instances.
- build_gnn(X_nodes=None, X_edges=None, y_nodes=None, y_edges=None, weight_column=None, reuse_if_existing=True, featurize_edges=True, use_node_scaler=None, use_node_scaler_target=None, use_edge_scaler=None, use_edge_scaler_target=None, train_split=0.8, device='cpu', inplace=False, *args, **kwargs)
Builds a GNN model using DGL (https://www.dgl.ai/).
Will auto-featurize and, if no explicit edges are found, automatically run UMAP to produce implicit edges.
- param X_nodes:
Which node DataFrame columns to featurize. If None, all columns are used. If an explicit DataFrame is passed in, it is set as an attribute.
- param X_edges:
Which edge DataFrame columns to featurize. If None, all columns are used. If an explicit DataFrame is passed in, it is set as an attribute.
- param y_nodes:
Optional target column from nodes dataframe.
- param y_edges:
Optional target column from edges dataframe.
- param weight_column:
Optional weight column, used if an explicit edges table exists with said weights. Otherwise, weight_column is inherited from UMAP.
- param train_split:
Randomly assigns a train and test mask according to the split value, default 80%.
- param use_node_scaler:
Selects which scaling to use on the featurized nodes dataframe. Default None.
- param use_edge_scaler:
Selects which scaling to use on the featurized edges dataframe. Default None.
- param device:
Device to run the model on. Default 'cpu'; the other choice is GPU. Can be handled in outer scope.
- param inplace:
Whether to modify the Graphistry instance in place rather than return a new one. Default False.
- Parameters:
X_nodes (List[str] | str | DataFrame | None)
X_edges (List[str] | str | DataFrame | None)
y_nodes (List[str] | str | DataFrame | None)
y_edges (List[str] | str | DataFrame | None)
weight_column (str | None)
reuse_if_existing (bool)
featurize_edges (bool)
use_node_scaler (str | None)
use_node_scaler_target (str | None)
use_edge_scaler (str | None)
use_edge_scaler_target (str | None)
train_split (float)
device (str)
inplace (bool)
- convert_kwargs(*args, **kwargs)
- dgl_lazy_init(train_split=0.8, device='cpu')
Initialize the DGL graph lazily.
- Parameters:
train_split (float)
device (str)
- graphistry.dgl_utils.convert_to_torch(X_enc, y_enc)
Convert X and y to torch tensors compatible with DGL ndata/edata.
- Parameters:
X_enc (DataFrame) – DataFrame matrix of values for model matrix
y_enc (DataFrame | None) – DataFrame matrix of values for target
- Returns:
Dictionary of torch-encoded arrays
- graphistry.dgl_utils.get_available_devices()
Get IDs of all available GPUs.
- Returns:
device (torch.device): Main device (GPU 0 or CPU). gpu_ids (list): List of IDs of all GPUs that are available.
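The "main device plus list of GPU IDs" return shape can be sketched as follows. This is an illustrative sketch, not the library's implementation: the helper name `pick_devices` is hypothetical, and the plain-string `"cpu"` fallback when PyTorch is not installed is an assumption added only so the snippet runs anywhere.

```python
def pick_devices():
    """Return (main_device, gpu_ids); hypothetical helper for illustration."""
    try:
        import torch
    except ImportError:
        # Assumption: fall back to a plain string when PyTorch is missing.
        return "cpu", []

    if torch.cuda.is_available():
        # One integer ID per visible CUDA device.
        gpu_ids = list(range(torch.cuda.device_count()))
        return torch.device(f"cuda:{gpu_ids[0]}"), gpu_ids

    # No GPU visible: the main device is the CPU and the ID list is empty.
    return torch.device("cpu"), []


device, gpu_ids = pick_devices()
```

An empty `gpu_ids` list is a convenient signal that the `device='cpu'` default should be kept when calling functions such as `build_gnn`.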
- graphistry.dgl_utils.get_torch_train_test_mask(n, ratio=0.8)
Generate random train/test torch boolean masks.
- Parameters:
n (int) – Size of mask
ratio (float) – Train/test split ratio (fraction of True entries)
- Returns:
Tuple of (train_mask, test_mask)
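To make the idea of complementary train/test masks concrete, here is a minimal pure-Python sketch. It is not the library's code: the helper `train_test_masks` and its `seed` parameter are hypothetical, it returns lists of booleans rather than torch tensors, and it assumes an exact `ratio` fraction of True entries, whereas the library may instead sample each entry independently.

```python
import random


def train_test_masks(n, ratio=0.8, seed=None):
    """Return (train_mask, test_mask) as boolean lists of length n."""
    rng = random.Random(seed)
    n_train = int(n * ratio)
    # Pick which positions belong to the training set.
    train_idx = set(rng.sample(range(n), n_train))
    train_mask = [i in train_idx for i in range(n)]
    # The test mask is the elementwise complement of the train mask.
    test_mask = [not flag for flag in train_mask]
    return train_mask, test_mask


train_mask, test_mask = train_test_masks(10, ratio=0.8, seed=0)
```

Every position is True in exactly one of the two masks, so they can be used to split node or edge features without overlap.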
- graphistry.dgl_utils.pandas_to_dgl_graph(df, src, dst, weight_col=None, device='cpu')
Convert an edge DataFrame to a DGL graph plus adjacency matrix.
Example:
g, sp_mat, ordered_nodes_dict = pandas_to_dgl_graph(df, 'to_node', 'from_node')
- Parameters:
df (DataFrame) – DataFrame with source/destination (and optional weight) columns
src (str) – Source column name for the COO matrix
dst (str) – Destination column name for the COO matrix
weight_col (str | None) – Optional weight column when constructing the COO matrix
device (str) – Whether to put the DGL graph on CPU or GPU
- Returns:
Tuple of (DGL graph, sparse adjacency matrix, node index mapping)
- Return type:
Tuple[dgl.DGLGraph, scipy.sparse.coo_matrix, Dict]
- graphistry.dgl_utils.pandas_to_sparse_adjacency(df, src, dst, weight_col)
Build a COO sparse adjacency matrix from an edge DataFrame.
- Parameters:
df – Edge DataFrame
src – Source column
dst – Destination column
weight_col – Optional weight column
- Returns:
Tuple of (COO sparse matrix, node index mapping)
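The COO (coordinate) layout mentioned above stores one (row, column, value) triple per edge. The sketch below shows that layout with plain Python lists. It is not the library's implementation: the helper `edges_to_coo` is hypothetical, it takes a list of label pairs instead of a DataFrame, and the default weight of 1.0 for unweighted edges is an assumption.

```python
def edges_to_coo(edges, weights=None):
    """Return ((rows, cols, data), node_ids) for a list of (src, dst) pairs."""
    node_ids = {}
    rows, cols, data = [], [], []
    for i, (src, dst) in enumerate(edges):
        # Assign the next free integer to any label not seen before.
        for label in (src, dst):
            node_ids.setdefault(label, len(node_ids))
        rows.append(node_ids[src])
        cols.append(node_ids[dst])
        # Assumption: unweighted edges get a weight of 1.0.
        data.append(1.0 if weights is None else float(weights[i]))
    return (rows, cols, data), node_ids


edges = [("a", "b"), ("b", "c"), ("a", "c")]
(rows, cols, data), node_ids = edges_to_coo(edges, weights=[0.5, 2.0, 1.5])
```

The three parallel lists correspond to the arguments a sparse-matrix constructor such as `scipy.sparse.coo_matrix((data, (rows, cols)))` expects.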
- graphistry.dgl_utils.reindex_edgelist(df, src, dst)
Relabel edges so DGL gets contiguous integer node IDs.
Example:
df, ordered_nodes_dict = reindex_edgelist(df, 'to_node', 'from_node')
- Parameters:
df – Edge DataFrame
src – Source column name
dst – Destination column name
- Returns:
Tuple of (reindexed DataFrame, ordered node mapping)
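As an illustration of what relabeling to contiguous integer node IDs means, the following pure-Python sketch works on a list of label pairs rather than a DataFrame. The helper `reindex_edges` is hypothetical, and numbering nodes in order of first appearance is an assumption; the library may order nodes differently.

```python
def reindex_edges(edges):
    """Map arbitrary node labels to integers 0..k-1.

    edges: list of (src, dst) label pairs.
    Returns (reindexed edge list, label -> integer ID mapping).
    """
    node_ids = {}
    for src, dst in edges:
        for label in (src, dst):
            if label not in node_ids:
                # Assumption: IDs follow order of first appearance.
                node_ids[label] = len(node_ids)
    reindexed = [(node_ids[s], node_ids[d]) for s, d in edges]
    return reindexed, node_ids


edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
reindexed, node_ids = reindex_edges(edges)
```

Keeping the returned mapping makes it possible to translate model outputs, which are indexed by integer ID, back to the original node labels.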