Network learning
DataLoader()
```python
DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None, *, prefetch_factor=2,
           persistent_workers=False)
```
dataset style
- dataset (Dataset class): if an `IterableDataset` is used, the data loading order is entirely controlled by the user-defined iterable.
- data loading order and sampler: a map-style dataset represents a map from indices/keys (possibly non-integral) to data samples, so the sequence of indices/keys must be specified. A sampler could randomly permute a list of indices and yield them one at a time, or yield a small number of them at once for mini-batch SGD. This is why `sampler` and `batch_sampler` are not compatible with iterable-style datasets: such datasets have no notion of a key or an index.
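To make the two dataset styles concrete, here is a minimal pure-Python sketch (the class and function names are made up for illustration; in PyTorch these roles are played by `Dataset`, `IterableDataset`, and `Sampler`):

```python
import random

# Map-style: a mapping from integer indices to samples.
class MapStyleDataset:
    def __init__(self, samples):
        self.samples = samples

    def __getitem__(self, index):
        return self.samples[index]

    def __len__(self):
        return len(self.samples)

# Iterable-style: the loading order is whatever the iterator yields,
# so there is no index for a sampler to work with.
class IterableStyleDataset:
    def __init__(self, samples):
        self.samples = samples

    def __iter__(self):
        return iter(self.samples)

dataset = MapStyleDataset(["a", "b", "c", "d"])

# A sampler just yields indices; here, a random permutation.
def random_sampler(ds):
    indices = list(range(len(ds)))
    random.shuffle(indices)
    yield from indices

samples = [dataset[i] for i in random_sampler(dataset)]
# samples is some permutation of ["a", "b", "c", "d"]
```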
batched and non-batched data
- batched and non-batched data: the relevant arguments are `batch_size`, `drop_last` and `batch_sampler`:
  - `batch_size` (int, optional, default=1): when not `None` (i.e. it can be `None`), the dataloader yields batched samples instead of individual samples.
  - `drop_last` (bool, optional, default=False): set `True` to drop the last incomplete batch, if the dataset size is not divisible by `batch_size`.
  - `batch_sampler` (Sampler class or iterable, optional): yields a list of indices for each batch.
- when automatic batching is enabled, loading from a map-style dataset is roughly equivalent to:
```python
for indices in batch_sampler:
    yield collate_fn([dataset[i] for i in indices])
```
and loading from an iterable-style dataset is roughly equivalent to:
```python
dataset_iter = iter(dataset)
for indices in batch_sampler:
    yield collate_fn([next(dataset_iter) for _ in indices])
```
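The batched map-style loop can be exercised end to end with toy stand-ins (the dataset, batch_sampler, and collate_fn below are made-up illustrations, not PyTorch objects):

```python
# Pure-Python demo of batched loading from a map-style dataset.
dataset = {0: "a", 1: "b", 2: "c", 3: "d", 4: "e"}
batch_sampler = [[0, 1], [2, 3], [4]]  # indices grouped into batches
collate_fn = list  # trivial collate: keep the fetched batch as a list

def loader():
    for indices in batch_sampler:
        yield collate_fn([dataset[i] for i in indices])

batches = list(loader())
# batches == [["a", "b"], ["c", "d"], ["e"]]
```

With `drop_last=True`, the final incomplete batch `["e"]` would be discarded instead of yielded.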
- disable automatic batching: when both `batch_size` and `batch_sampler` are `None`, automatic batching is disabled. Each sample obtained from the `dataset` is processed with the function passed as the `collate_fn` argument.
In this case, loading from a map-style dataset is roughly equivalent to:
```python
for index in sampler:
    yield collate_fn(dataset[index])
```
and loading from an iterable-style dataset is roughly equivalent to:
```python
for data in iter(dataset):
    yield collate_fn(data)
```
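A small runnable sketch of the disabled-batching case, where `collate_fn` receives one sample at a time (`convert` is a made-up stand-in for the default `collate_fn`):

```python
# With automatic batching disabled, each sample passes through
# collate_fn individually and is yielded as-is.
map_dataset = ["a", "b", "c"]
sampler = range(len(map_dataset))  # a trivial sequential sampler

def convert(sample):
    # Stand-in for the default collate_fn, which converts NumPy
    # arrays to tensors and passes everything else through unchanged.
    return sample

items = [convert(map_dataset[i]) for i in sampler]
# items == ["a", "b", "c"]
```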
- `collate_fn` (callable, optional):
  - when automatic batching is disabled, `collate_fn` is called with each individual data sample, and the output is yielded from the dataloader iterator. In this case, the default `collate_fn` simply converts NumPy arrays into PyTorch tensors and keeps everything else untouched.
  - when automatic batching is enabled, `collate_fn` is called with a list of data samples each time. It is expected to collate the input samples into a batch for yielding from the dataloader iterator. For instance, if the dataset returns tuples (data, index), the default `collate_fn` collates them into a list of data and a list of indices, i.e. [batch_size, data] and [batch_size, index].
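The tuple-collation behaviour described above can be sketched in plain Python (a simplified stand-in for the default `collate_fn`, which would additionally stack arrays into tensors):

```python
# A batch of (data, label) tuples is transposed into
# (batch_of_data, batch_of_labels).
def collate(batch):
    data, labels = zip(*batch)  # list of tuples -> tuple of lists
    return list(data), list(labels)

batch = [([1, 2], 0), ([3, 4], 1), ([5, 6], 0)]
data, labels = collate(batch)
# data == [[1, 2], [3, 4], [5, 6]], labels == [0, 1, 0]
```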
other arguments
- `shuffle` (bool, optional, default=False): set `True` to have the data reshuffled at every epoch.
- `num_workers` (int, optional, default=0): how many subprocesses to use for data loading. 0 means the data will be loaded in the main process.
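As a rough sketch of what `shuffle=True` means, reshuffling at every epoch amounts to drawing a fresh permutation of the indices per epoch (`epoch_order` is a hypothetical helper for illustration, not a DataLoader API):

```python
import random

data = list(range(6))

def epoch_order(seed):
    # One epoch's visiting order: a fresh permutation of all indices.
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    return indices

orders = [epoch_order(epoch) for epoch in range(2)]
# every epoch still visits each sample exactly once, just in a new order
```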
single-process or multi-process data loading
not needed for now.