neuromancer.dataset module
- class neuromancer.dataset.DictDataset(datadict, name='train')[source]
Bases:
Dataset
Basic dataset compatible with the NeuroMANCER Trainer.
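A minimal usage sketch (the dictionary keys, shapes, and batch size below are illustrative assumptions, not part of the API):

```python
import torch
from torch.utils.data import DataLoader
from neuromancer.dataset import DictDataset

# Illustrative data: 500 samples of a 3-dimensional input "x" and a scalar target "y".
data = {"x": torch.rand(500, 3), "y": torch.rand(500, 1)}
train_data = DictDataset(data, name="train")

# DictDataset provides its own collate_fn so batches stay name-tagged dictionaries.
train_loader = DataLoader(train_data, batch_size=50,
                          collate_fn=train_data.collate_fn, shuffle=True)
```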
- class neuromancer.dataset.GraphDataset(node_attr: Dict | None = {}, edge_attr: Dict | None = {}, graph_attr: Dict | None = {}, metadata: Dict | None = {}, seq_len: int = 6, seq_horizon: int = 1, seq_stride: int = 1, graphs: Dict | None = None, build_graphs: str | None = None, connectivity_radius: float = 0.015, graph_self_loops=True, name: str = 'data')[source]
Bases:
Dataset
- static collate_fn(x)[source]
Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.
- Parameters:
batch – (list of dict str: torch.Tensor) list of dataset samples; each sample must contain the key ‘edge_index’.
- class neuromancer.dataset.LitDataModule(data_setup_function, hparam_config=None, **kwargs)[source]
Bases:
LightningDataModule
A NeuroMANCER-specific class inheriting from PyTorch Lightning’s LightningDataModule. It converts a data_setup_function (which yields NeuroMANCER DictDatasets associated with a NeuroMANCER Problem) into a LightningDataModule so that it integrates with LitProblem and LitTrainer.
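A hedged sketch of the expected workflow. The four-tuple return (train, dev, test, batch_size) follows the convention used in NeuroMANCER’s Lightning examples, and extra keyword arguments are assumed to be forwarded to the setup function; the key "p", shapes, and sizes are illustrative:

```python
import torch
from neuromancer.dataset import DictDataset, LitDataModule

def data_setup_function(nsim=1000):
    # Illustrative parametric samples; key name and shapes are assumptions.
    train = DictDataset({"p": torch.rand(nsim, 2)}, name="train")
    dev = DictDataset({"p": torch.rand(nsim // 10, 2)}, name="dev")
    test = DictDataset({"p": torch.rand(nsim // 10, 2)}, name="test")
    batch_size = 100
    return train, dev, test, batch_size

# Keyword arguments are assumed to be passed through to data_setup_function.
data_module = LitDataModule(data_setup_function, nsim=2000)
```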
- setup(stage=None)[source]
Setup is a preprocessing stage required by LightningDataModules. Here we create the data splits from the data setup function and check that the user has properly named the DictDatasets.
- test_dataloader()[source]
An iterable or collection of iterables specifying test samples.
For more information about multiple dataloaders, see the PyTorch Lightning documentation.
For data processing use the following pattern: download in prepare_data(), process and split in setup(). However, the above are only necessary for distributed processing.
Warning: do not assign state in prepare_data().
Related hooks: test(), prepare_data().
Note: Lightning tries to add the correct sampler for distributed and arbitrary hardware; there is no need to set it yourself.
Note: If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
- train_dataloader()[source]
An iterable or collection of iterables specifying training samples.
For more information about multiple dataloaders, see the PyTorch Lightning documentation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
For data processing use the following pattern: download in prepare_data(), process and split in setup(). However, the above are only necessary for distributed processing.
Warning: do not assign state in prepare_data().
Related hooks: fit(), prepare_data().
Note: Lightning tries to add the correct sampler for distributed and arbitrary hardware; there is no need to set it yourself.
- val_dataloader()[source]
An iterable or collection of iterables specifying validation samples.
For more information about multiple dataloaders, see the PyTorch Lightning documentation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data().
Related hooks: fit(), validate(), prepare_data().
Note: Lightning tries to add the correct sampler for distributed and arbitrary hardware; there is no need to set it yourself.
Note: If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.
- class neuromancer.dataset.SequenceDataset(data, nsteps=1, moving_horizon=False, name='data')[source]
Bases:
Dataset
- collate_fn(batch)[source]
Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.
- Parameters:
batch – (list of dict str: torch.Tensor) list of dataset samples.
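A usage sketch; the key names "Y"/"U", array shapes, and batch size are illustrative assumptions, and NumPy arrays are assumed to be accepted as input:

```python
import numpy as np
from torch.utils.data import DataLoader
from neuromancer.dataset import SequenceDataset

# Illustrative time series: 1000 steps of 2 outputs "Y" and 1 input "U".
data = {"Y": np.random.rand(1000, 2), "U": np.random.rand(1000, 1)}
dataset = SequenceDataset(data, nsteps=8, name="train")

# Use the dataset's own collate_fn so batches are transposed for NeuroMANCER models.
loader = DataLoader(dataset, batch_size=32, collate_fn=dataset.collate_fn)
```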
- class neuromancer.dataset.StaticDataset(data, name='data')[source]
Bases:
Dataset
- neuromancer.dataset.denormalize_01(M, Mmin, Mmax)[source]
Denormalize min-max normalized data.
- Parameters:
M – (2-d np.array) Data to be denormalized
Mmin – (int) Minimum value
Mmax – (int) Maximum value
- Returns:
(2-d np.array) Denormalized data
- neuromancer.dataset.denormalize_11(M, Mmin, Mmax)[source]
Denormalize min-max normalized data.
- Parameters:
M – (2-d np.array) Data to be denormalized
Mmin – (int) Minimum value
Mmax – (int) Maximum value
- Returns:
(2-d np.array) Denormalized data
- neuromancer.dataset.get_sequence_dataloaders(data, nsteps, moving_horizon=False, norm_type=None, split_ratio=None, num_workers=0, batch_size=None)[source]
This function generates dataloaders and open-loop sequence dictionaries for a given dictionary of data. Dataloaders default to full-batch training to match NeuroMANCER’s original training setup.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.
nsteps – (int) length of windowed subsequences for N-step training.
moving_horizon – (bool) whether to use moving horizon batching.
norm_type – (str) type of normalization; see function normalize_data for more info.
split_ratio – (list float) percentage of data in train and development splits; see function split_sequence_data for more info.
num_workers – (int, optional) how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
batch_size – (int, optional) how many samples per batch to load (default: full-batch via len(data)).
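A hedged usage sketch. The description above states the function returns dataloaders and open-loop sequence dictionaries; the exact three-part unpacking (dataloaders, open-loop dictionaries, dimensions) follows the pattern of older NeuroMANCER examples and should be treated as an assumption, as should the key names and shapes:

```python
import numpy as np
from neuromancer.dataset import get_sequence_dataloaders

# Illustrative time-series dictionary.
data = {"Y": np.random.rand(3000, 2), "U": np.random.rand(3000, 1)}

# Assumed return structure: (train/dev/test loaders), (train/dev/test
# open-loop sequence dictionaries), and dataset dimensions.
(train_loader, dev_loader, test_loader), \
    (train_loop, dev_loop, test_loop), dims = get_sequence_dataloaders(
        data, nsteps=16, norm_type="zscore")
```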
- neuromancer.dataset.get_static_dataloaders(data, norm_type=None, split_ratio=None, num_workers=0, batch_size=32)[source]
This function generates dataloaders for a given dictionary of data, following NeuroMANCER’s original training setup.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.
norm_type – (str) type of normalization; see function normalize_data for more info.
split_ratio – (list float) percentage of data in train and development splits; see function split_static_data for more info.
num_workers – (int, optional) how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
batch_size – (int, optional) how many samples per batch to load (default: 32).
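A hedged usage sketch; the unpacking is assumed by analogy with get_sequence_dataloaders, minus the open-loop sequence dictionaries, and the key names and shapes are illustrative:

```python
import numpy as np
from neuromancer.dataset import get_static_dataloaders

# Illustrative static (non-sequential) data dictionary.
data = {"X": np.random.rand(1000, 4), "Y": np.random.rand(1000, 1)}

# Assumed return structure: (train/dev/test loaders) and dataset dimensions.
(train_loader, dev_loader, test_loader), dims = get_static_dataloaders(
    data, norm_type="zero-one", batch_size=64)
```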
- neuromancer.dataset.normalize_01(M, Mmin=None, Mmax=None)[source]
- Parameters:
M – (2-d np.array) Data to be normalized
Mmin – (int) Optional minimum. If not provided is inferred from data.
Mmax – (int) Optional maximum. If not provided is inferred from data.
- Returns:
(2-d np.array) Min-max normalized data
- neuromancer.dataset.normalize_11(M, Mmin=None, Mmax=None)[source]
- Parameters:
M – (2-d np.array) Data to be normalized
Mmin – (int) Optional minimum. If not provided is inferred from data.
Mmax – (int) Optional maximum. If not provided is inferred from data.
- Returns:
(2-d np.array) Min-max normalized data
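A round-trip sketch tying the min-max helpers together, assuming the standard formulas (normalize_01 maps data to [0, 1] via (M - Mmin) / (Mmax - Mmin); normalize_11 rescales to [-1, 1]) and the documented array-only return value; stats are passed explicitly so the inverse is unambiguous:

```python
import numpy as np
from neuromancer.dataset import normalize_01, denormalize_01

M = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Mmin, Mmax = M.min(), M.max()      # explicit scalar stats, per the documented types
M01 = normalize_01(M, Mmin, Mmax)  # documented to return the normalized array
M_back = denormalize_01(M01, Mmin, Mmax)
assert np.allclose(M, M_back)      # min-max normalization round-trips exactly
```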
- neuromancer.dataset.normalize_data(data, norm_type, stats=None)[source]
Normalize data, optionally using arbitrary statistics (e.g. computed from train split).
- Parameters:
data – (dict str: np.array) data dictionary.
norm_type – (str) type of normalization to use; can be “zero-one”, “one-one”, or “zscore”.
stats – (dict str: np.array) statistics to use for normalization. Default is None, in which case stats are inferred by underlying normalization function.
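A sketch of the intended pattern of reusing train-split statistics on other splits. The (normalized_data, stats) return pairing is an assumption suggested by the optional stats parameter; key names and shapes are illustrative:

```python
import numpy as np
from neuromancer.dataset import normalize_data

train = {"Y": np.random.rand(800, 2)}
dev = {"Y": np.random.rand(200, 2)}

# Normalize the train split, then reuse its statistics on the dev split
# so both splits share the same scaling.
train_norm, stats = normalize_data(train, "zscore")
dev_norm, _ = normalize_data(dev, "zscore", stats=stats)
```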
- neuromancer.dataset.split_sequence_data(data, nsteps, moving_horizon=False, split_ratio=None)[source]
Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries.
nsteps – (int) N-step prediction horizon for batching data; used here to ensure split lengths are evenly divisible by N.
moving_horizon – (bool) whether batches use a sliding window with stride 1; else stride of N is assumed.
split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.
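A usage sketch; the three-way unpacking follows the description above, while the key name, shape, and ratio values are illustrative:

```python
import numpy as np
from neuromancer.dataset import split_sequence_data

# Illustrative single-sequence data dictionary.
data = {"Y": np.random.rand(900, 2)}

# 70% train, 20% dev, implicit 10% test; splits are sized to respect nsteps.
train, dev, test = split_sequence_data(data, nsteps=8, split_ratio=[70.0, 20.0])
```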
- neuromancer.dataset.split_static_data(data, split_ratio=None)[source]
Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries.
split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.