DGL Utils#

class graphistry.dgl_utils.DGLGraphMixin(*args, **kwargs)#

Bases: FeatureMixin

Automagic DGL models from Graphistry Instances.

build_gnn(X_nodes=None, X_edges=None, y_nodes=None, y_edges=None, weight_column=None, reuse_if_existing=True, featurize_edges=True, use_node_scaler=None, use_node_scaler_target=None, use_edge_scaler=None, use_edge_scaler_target=None, train_split=0.8, device='cpu', inplace=False, *args, **kwargs)#

Builds a GNN model using DGL (https://www.dgl.ai/).

Auto-featurizes the data and, if no explicit edges are found, automatically runs UMAP to produce implicit edges.

X_nodes – Which node dataframe columns to featurize. If None, all columns are used. If an explicit dataframe is passed in, its columns are set as attributes.
X_edges – Which edge dataframe columns to featurize. If None, all columns are used. If an explicit dataframe is passed in, its columns are set as attributes.
y_nodes – Optional target column from the nodes dataframe.
y_edges – Optional target column from the edges dataframe.
weight_column – Optional weight column, used if an explicit edges table with said weights exists. Otherwise, weight_column is inherited from UMAP.
train_split – Split value used to randomly assign the train and test masks. Default 0.8 (80% train).
use_node_scaler – Selects which scaling to use on the featurized nodes dataframe. Default None.
use_edge_scaler – Selects which scaling to use on the featurized edges dataframe. Default None.
device – Device to run the model on. Default cpu, with gpu the other choice. Can be handled in outer scope.
inplace – Whether to return the Graphistry instance in place or not. Default False.

Parameters:

X_nodes (List[str] | str | DataFrame | None)
X_edges (List[str] | str | DataFrame | None)
y_nodes (List[str] | str | DataFrame | None)
y_edges (List[str] | str | DataFrame | None)
weight_column (str | None)
reuse_if_existing (bool)
featurize_edges (bool)
use_node_scaler (str | None)
use_node_scaler_target (str | None)
use_edge_scaler (str | None)
use_edge_scaler_target (str | None)
train_split (float)
device (str)
inplace (bool)

convert_kwargs(*args, **kwargs)#

dgl_lazy_init(train_split=0.8, device='cpu')#

Initialize DGL graph lazily.

Parameters:

train_split (float)
device (str)

graphistry.dgl_utils.convert_to_torch(X_enc, y_enc)#

Convert X and y to torch tensors compatible with DGL ndata/edata.

Parameters:

X_enc (DataFrame) – DataFrame matrix of values for model matrix
y_enc (DataFrame | None) – DataFrame matrix of values for target

Returns:

Dictionary of torch-encoded arrays

graphistry.dgl_utils.get_available_devices()#

Get IDs of all available GPUs.

Returns:

device (torch.device) – Main device (GPU 0 or CPU)
gpu_ids (list) – List of IDs of all GPUs that are available

graphistry.dgl_utils.get_torch_train_test_mask(n, ratio=0.8)#

Generate random train/test torch boolean masks.

Parameters:

n (int) – Size of mask
ratio (float) – Train/test split ratio (fraction of True entries)

Returns:

Tuple of (train_mask, test_mask)
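The masking idea can be illustrated without torch. The following is a minimal sketch, not the library's implementation: the helper name `train_test_mask` and its `seed` parameter are hypothetical, plain Python lists stand in for torch boolean tensors, and it assumes each entry is drawn independently with probability `ratio` and that the test mask is the complement of the train mask.

```python
import random
from typing import List, Tuple


def train_test_mask(
    n: int, ratio: float = 0.8, seed: int = 0
) -> Tuple[List[bool], List[bool]]:
    """Hypothetical helper: random train mask plus its complement."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    # Assumption: each position is True with probability `ratio`,
    # so the share of True entries is only approximately `ratio`.
    train = [rng.random() < ratio for _ in range(n)]
    # Assumption: every position not used for training is used for testing.
    test = [not flag for flag in train]
    return train, test


train_mask, test_mask = train_test_mask(10, ratio=0.8)
print(sum(train_mask), sum(test_mask))  # the two counts always add up to 10
```

Because the draw is per element in this sketch, small graphs can end up with a split noticeably different from the requested ratio; drawing an exact count instead would be an equally plausible design.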

graphistry.dgl_utils.pandas_to_dgl_graph(df, src, dst, weight_col=None, device='cpu')#

Convert an edge DataFrame to a DGL graph plus adjacency matrix.

Example:

g, sp_mat, ordered_nodes_dict = pandas_to_dgl_graph(df, 'to_node', 'from_node')

Parameters:

df (DataFrame) – DataFrame with source/destination (and optional weight) columns
src (str) – Source column name for the COO matrix
dst (str) – Destination column name for the COO matrix
weight_col (str | None) – Optional weight column when constructing the COO matrix
device (str) – Whether to put the DGL graph on CPU or GPU

Returns:

Tuple of (DGL graph, sparse adjacency matrix, node index mapping)

Return type:

Tuple[dgl.DGLGraph, scipy.sparse.coo_matrix, Dict]

graphistry.dgl_utils.pandas_to_sparse_adjacency(df, src, dst, weight_col)#

Build a COO sparse adjacency matrix from an edge DataFrame.

Parameters:

df – Edge DataFrame
src – Source column
dst – Destination column
weight_col – Optional weight column

Returns:

Tuple of (COO sparse matrix, node index mapping)
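For readers unfamiliar with the COO layout, it stores a matrix as three parallel arrays: row indices, column indices, and values. The following is a minimal dependency-free sketch, not the library's implementation: the helper name `edges_to_coo` is hypothetical, plain sequences stand in for DataFrame columns, and the unit weight used when no weights are given is an assumption.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def edges_to_coo(
    src: Sequence[Hashable],
    dst: Sequence[Hashable],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[int], List[int], List[float], Dict[Hashable, int]]:
    """Hypothetical helper: COO triplets (rows, cols, data) plus node mapping."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    if weights is not None and len(weights) != len(src):
        raise ValueError("weights must have one entry per edge")
    # Give every distinct node label an integer index in first-seen order.
    node_to_id: Dict[Hashable, int] = {}
    for label in list(src) + list(dst):
        node_to_id.setdefault(label, len(node_to_id))
    rows = [node_to_id[s] for s in src]
    cols = [node_to_id[d] for d in dst]
    # Assumption: unweighted edges get weight 1.0.
    data = [1.0] * len(rows) if weights is None else [float(w) for w in weights]
    return rows, cols, data, node_to_id


rows, cols, data, mapping = edges_to_coo(
    ["a", "b", "a"], ["b", "c", "c"], weights=[0.5, 2.0, 1.5]
)
print(rows, cols, data)  # [0, 1, 0] [1, 2, 2] [0.5, 2.0, 1.5]
print(mapping)           # {'a': 0, 'b': 1, 'c': 2}
```

The three lists have the same shape as the `(data, (row, col))` input that `scipy.sparse.coo_matrix` accepts, so the sketch maps directly onto the sparse matrix type named in the return value of `pandas_to_dgl_graph`.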

graphistry.dgl_utils.reindex_edgelist(df, src, dst)#

Relabel edges so DGL gets contiguous integer node IDs.

Example:

df, ordered_nodes_dict = reindex_edgelist(df, 'to_node', 'from_node')

Parameters:

df – Edge DataFrame
src – Source column name
dst – Destination column name

Returns:

Tuple of (reindexed DataFrame, ordered node mapping)
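The relabeling step can be illustrated without pandas. The following is a minimal sketch, not the library's implementation: the helper name `reindex_pairs` is hypothetical, edges are plain tuples rather than DataFrame rows, and IDs are assigned in first-seen order, which may differ from the ordering `reindex_edgelist` uses.

```python
from typing import Dict, Hashable, List, Tuple


def reindex_pairs(
    edges: List[Tuple[Hashable, Hashable]],
) -> Tuple[List[Tuple[int, int]], Dict[Hashable, int]]:
    """Hypothetical helper: map arbitrary node labels to 0..n-1."""
    node_to_id: Dict[Hashable, int] = {}
    reindexed: List[Tuple[int, int]] = []
    for src, dst in edges:
        # setdefault hands out the next free integer the first time
        # a label is seen and returns the existing one afterwards.
        s = node_to_id.setdefault(src, len(node_to_id))
        d = node_to_id.setdefault(dst, len(node_to_id))
        reindexed.append((s, d))
    return reindexed, node_to_id


edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice")]
new_edges, mapping = reindex_pairs(edges)
print(new_edges)  # [(0, 1), (1, 2), (2, 0)]
print(mapping)    # {'alice': 0, 'bob': 1, 'carol': 2}
```

Keeping the mapping alongside the relabeled edges is what lets results computed on integer node IDs be translated back to the original labels.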
