Welcome to SAR’s documentation!

SAR is a pure Python library built on top of DGL to accelerate distributed training of Graph Neural Networks (GNNs) on large graphs. SAR supports both full-batch and sampling-based training. For full-batch training, SAR implements the Sequential Aggregation and Rematerialization (SAR) scheme, which reduces peak per-machine memory consumption and guarantees that per-worker memory consumption decreases linearly with the number of workers. It achieves this by eliminating most of the data redundancy (due to the halo effect) inherent in standard spatially parallel training.

SAR consumes the graph partition data generated by DGL's partitioning utilities, so it can be used as a drop-in replacement for DGL's sampling-based distributed training. SAR enables scalable distributed training on very large graphs and supports multiple training modes that trade off speed against memory efficiency. It requires minimal changes to existing single-host DGL training code. See the quick start guide to get started with SAR.
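To make the workflow concrete, here is a minimal sketch of the offline partitioning step using DGL's `dgl.distributed.partition_graph` utility, which produces the partition data SAR consumes. The commented per-worker SAR calls (`initialize_comms`, `load_dgl_partition_data`, `construct_full_graph`) are illustrative of the pattern described in the quick start guide; consult it for the authoritative API and exact argument names.

```python
import dgl

# Offline preprocessing on a single machine: partition the graph with
# DGL's built-in partitioning utility. This writes a metadata .json file
# plus one folder of node/edge data per partition, which SAR loads directly.
dataset = dgl.data.CoraGraphDataset()
graph = dataset[0]
dgl.distributed.partition_graph(
    graph,
    graph_name='cora',        # name used for the output metadata file
    num_parts=4,              # one partition per training worker
    out_path='partition_data'
)

# On each of the 4 workers (hypothetical sketch; see the quick start
# guide for the exact SAR calls and signatures):
#
# import sar
# sar.initialize_comms(rank, world_size, master_ip_address, backend_name)
# partition_data = sar.load_dgl_partition_data(
#     'partition_data/cora.json', rank, device)
# full_graph = sar.construct_full_graph(partition_data)
```

The partitioning step runs once, offline; each worker then loads only its own partition, which is what keeps per-worker memory bounded as the number of workers grows.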

Index