Distributed Graph Representation

SAR represents the full graph as \(N^2\) graph shards where \(N\) is the number of workers/partitions. Each graph shard represents the edges from one partition to another and the features associated with these edges. sar.core.GraphShard represents a single graph shard. Each worker stores the \(N\) graph shards containing the incoming edges for the nodes in the worker’s partition. These \(N\) graph shards are managed by the sar.core.GraphShardManager class. sar.core.GraphShardManager implements a distributed version of update_all and apply_edges which are the main methods used by GNNs to create and exchange messages in the graph. sar.core.GraphShardManager implements update_all and apply_edges in a sequential manner by iterating through the \(N\) graph shards to sequentially create and aggregate messages from each partition into the local partition.

The sar.core.GraphShardManager.get_full_partition_graph() method can be used to combine the worker’s \(N\) graph shards into one monolithic graph object that represents all the incoming edges for nodes in the local partition. It returns a sar.core.DistributedBlock object. The implementation of update_all and apply_edges in sar.core.DistributedBlock is not sequential. Instead. It fetches all remote features in one step and aggregates all incoming messages to the local partition in one step.

In the distributed implementation of the sequential backward pass in update_all and apply_edges in sar.core.GraphShardManager, it is not possible for SAR to automatically detect if the message function uses any learnable parameters, and thus SAR will not be able to backpropagate gradients to these message parameters. To tell SAR that a message function is parameterized, use the sar.core.message_has_parameters() decorator to decorate message functions that use learnable parameters.

Limitations of the distributed graph objects

Keep in mind that the distributed graph class sar.core.GraphShardManager does not implement all the functionality of DGL’s native graph class. For example, it does not impelement the successors and predecessors methods. It supports primarily the methods of DGL’s native graphs that are relevant to GNNs such as update_all, apply_edges, and local_scope. It also supports setting graph node and edge features through the dictionaries srcdata, dstdata, and edata. To remain compatible with DGLGraph sar.core.GraphShardManager provides also access to the ndata member, which works as alias to srcdata, however it is not accessible when working with MFGs.

sar.core.GraphShardManager also supports the in_degrees and out_degrees members and supports querying the number of nodes and edges in the graph.

The update_all method in sar.core.GraphShardManager only supports the 4 standard reduce functions in dgl: max, min, sum, and mean. The reason behind this is that SAR runs a sequential reduction of messages and therefore requires that \(reduce(msg_1,msg_2,msg_3) = reduce(msg_1,reduce(msg_2,msg_3))\).

Relevant classes methods

GraphShard(shard_edges_features, src_range, ...)

Encapsulates information for all edges incoming from one partition to the local partition.

GraphShardManager(graph_shards, ...)

Manages the local graph partition and exposes a subset of the interface of dgl.heterograph.DGLGraph.

DistributedBlock(block, ...)

A wrapper around a dgl.DGLBlock object.


A decorator for message functions that use learnable parameters.