DataLoader

class DataLoader(full_graph_manager: GraphShardManager, seed_nodes: Tensor, graph_sampler: DistNeighborSampler, batch_size: int = 1, drop_last: bool = False, shuffle: bool = False, precompute_optimized_batches: bool = True, optimized_batches_cache: str | None = None, num_workers: int = 0)

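A dataloader for distributed node sampling. It splits seed_nodes into minibatches and uses graph_sampler to sample the distributed graph starting from each minibatch.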

Parameters:
  • full_graph_manager (GraphShardManager) – The distributed graph from which to sample

  • seed_nodes (Tensor) – The seed nodes for sampling

  • graph_sampler (DistNeighborSampler) – The distributed sampling object. It must expose a sample routine, which the dataloader uses to sample the distributed graph.

  • batch_size (int) – The number of seed nodes in each minibatch

  • drop_last (bool) – Drop the last minibatch if it contains fewer than batch_size seed nodes

  • shuffle (bool) – Shuffle the seed nodes each time the dataloader is iterated over

  • precompute_optimized_batches (bool) – Create balanced node minibatches that minimize the number of edges between nodes in different minibatches.

  • optimized_batches_cache (Optional[str]) – The file name prefix for the cache files used to store the created minibatches. If provided, the files are created when they do not exist; when they already exist, the minibatch data is loaded from them.

  • num_workers (int) – The number of worker processes spawned to perform the distributed sampling
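To make batch_size, drop_last and shuffle concrete, the sketch below splits a list of seed nodes into minibatches. It assumes the usual PyTorch-style meaning of these flags (a trailing partial minibatch is kept unless drop_last is set, and shuffling happens once per pass over the data). make_seed_batches is a hypothetical helper written for illustration, not part of SAR, and plain integer lists stand in for the seed_nodes tensor.

```python
import random
from typing import List, Optional


def make_seed_batches(
    seed_nodes: List[int],
    batch_size: int = 1,
    drop_last: bool = False,
    shuffle: bool = False,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Hypothetical helper (not SAR's API): split seed nodes into minibatches.

    Assumes PyTorch-style semantics for batch_size, drop_last and shuffle.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    order = list(seed_nodes)
    if shuffle:
        # Reshuffle on every call, i.e. once per pass over the seed nodes
        (rng or random).shuffle(order)
    # Consecutive slices of batch_size nodes; the last one may be shorter
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        # Discard the trailing minibatch that has fewer than batch_size nodes
        batches.pop()
    return batches


print(make_seed_batches(list(range(10)), batch_size=4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(make_seed_batches(list(range(10)), batch_size=4, drop_last=True))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Passing a seeded random.Random as rng makes the shuffled order reproducible, which can help when comparing runs.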
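The objective behind precompute_optimized_batches can be illustrated with a simple metric: for a given assignment of seed nodes to minibatches, count the edges whose endpoints land in different minibatches. The helper below, count_cross_batch_edges, is hypothetical and only measures that quantity; it does not reproduce the partitioning algorithm SAR uses, which this page does not describe.

```python
from typing import Dict, Iterable, Tuple


def count_cross_batch_edges(
    edges: Iterable[Tuple[int, int]],
    batch_of: Dict[int, int],
) -> int:
    """Hypothetical metric: count edges whose endpoints are in different minibatches.

    Edges touching a node that is absent from batch_of (not a seed node) are ignored.
    """
    cut = 0
    for src, dst in edges:
        if src in batch_of and dst in batch_of and batch_of[src] != batch_of[dst]:
            cut += 1
    return cut


# A 4-node path 0-1-2-3, each edge listed once.
edges = [(0, 1), (1, 2), (2, 3)]
contiguous = {0: 0, 1: 0, 2: 1, 3: 1}   # minibatches {0, 1} and {2, 3}
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}  # minibatches {0, 2} and {1, 3}
print(count_cross_batch_edges(edges, contiguous))   # 1
print(count_cross_batch_edges(edges, interleaved))  # 3
```

Both assignments are balanced (two nodes per minibatch), but the contiguous one cuts fewer edges, which is the kind of assignment the parameter description says the optimized minibatches aim for.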
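The create-or-load behavior described for optimized_batches_cache follows a common cache-or-compute pattern, sketched below. load_or_build_batches, the "_batches.json" suffix and the JSON format are all hypothetical choices for illustration; SAR's actual cache file names and format are not documented here.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional


def load_or_build_batches(
    build: Callable[[], List[List[int]]],
    cache_prefix: Optional[str] = None,
    suffix: str = "_batches.json",  # hypothetical file-name suffix
) -> List[List[int]]:
    """Hypothetical cache helper keyed on a file-name prefix (not SAR's API)."""
    if cache_prefix is None:
        return build()  # no cache requested: always compute
    path = cache_prefix + suffix
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # reuse minibatches from an earlier run
    batches = build()
    with open(path, "w") as f:
        json.dump(batches, f)  # create the cache file for later runs
    return batches


calls = []


def build() -> List[List[int]]:
    calls.append(1)  # record how often the expensive step actually runs
    return [[0, 1], [2, 3]]


with tempfile.TemporaryDirectory() as d:
    prefix = os.path.join(d, "mycache")  # arbitrary prefix for the example
    first = load_or_build_batches(build, prefix)   # builds and writes the file
    second = load_or_build_batches(build, prefix)  # loads from the file
print(first == second, len(calls))  # True 1
```

This sketch does not check whether a cached file still matches the current seed nodes or batch size, so in a setup like this one, stale cache files would need to be deleted by hand after changing those inputs.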