neuromancer.dataset module

class neuromancer.dataset.DictDataset(datadict, name='train')[source]

Bases: Dataset

Basic dataset compatible with the NeuroMANCER Trainer

collate_fn(batch)[source]

Wraps the default PyTorch batch collation function and adds a name field.

Parameters:

batch – (dict str: torch.Tensor) dataset sample.
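
Example (a minimal sketch; the keys 'X' and 'Y' and the tensor shapes are illustrative assumptions, not requirements of the API):

    import torch
    from torch.utils.data import DataLoader
    from neuromancer.dataset import DictDataset

    # 100 samples; all tensors share the first (sample) dimension
    data = {'X': torch.rand(100, 3), 'Y': torch.rand(100, 1)}
    dataset = DictDataset(data, name='train')

    # Use the dataset's own collate_fn so each batch carries the 'name' field
    loader = DataLoader(dataset, batch_size=32, collate_fn=dataset.collate_fn)
    batch = next(iter(loader))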

class neuromancer.dataset.GraphDataset(node_attr: Dict | None = {}, edge_attr: Dict | None = {}, graph_attr: Dict | None = {}, metadata: Dict | None = {}, seq_len: int = 6, seq_horizon: int = 1, seq_stride: int = 1, graphs: Dict | None = None, build_graphs: str | None = None, connectivity_radius: float = 0.015, graph_self_loops=True, name: str = 'data')[source]

Bases: Dataset

build_graphs(feature, self_loops)[source]

Constructs graph connectivity from the given node feature via radius-based neighbor search (torch_geometric.nn.radius_graph). Requires the optional torch_geometric dependency.

static collate_fn(x)[source]

Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.

Parameters:

batch – (list of dict str: torch.Tensor) dataset samples. Requires key ‘edge_index’

make_map()[source]

Order the sample sequences

shuffle()[source]

Randomizes the order of sample sequences
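
Example (a hypothetical sketch; the attribute layout, tensor shapes, and feature name 'pos' are assumptions for illustration and should be checked against your data):

    import torch
    from neuromancer.dataset import GraphDataset

    # Assumed layout: a (timesteps, num_nodes, feature_dim) trajectory of positions
    node_attr = {'pos': torch.rand(100, 50, 2)}
    dataset = GraphDataset(
        node_attr=node_attr,
        seq_len=6,                  # history window length
        seq_horizon=1,              # prediction horizon
        build_graphs='pos',         # build radius graphs from the 'pos' feature
        connectivity_radius=0.015,  # neighbor search radius
        graph_self_loops=True,
    )
    dataset.shuffle()  # randomize the order of sample sequences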

class neuromancer.dataset.LitDataModule(data_setup_function, hparam_config=None, **kwargs)[source]

Bases: LightningDataModule

A NeuroMANCER-specific class inheriting from PyTorch Lightning's LightningDataModule. It converts a data_setup_function (which yields NeuroMANCER DictDatasets associated with a NeuroMANCER Problem) into a LightningDataModule so that it integrates with LitProblem and LitTrainer.

setup(stage=None)[source]

Setup is a preprocessing stage required by LightningDataModules. Here we create the data splits from the data setup function and check that the user has properly named the DictDatasets.
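
Example (a minimal sketch; the return convention of the data setup function — train/dev/test DictDatasets plus a batch size — follows the NeuroMANCER Lightning examples and should be checked against your version):

    import torch
    from neuromancer.dataset import DictDataset, LitDataModule

    def data_setup_function():
        # DictDatasets must be named 'train', 'dev', and 'test'
        train = DictDataset({'X': torch.rand(800, 2)}, name='train')
        dev = DictDataset({'X': torch.rand(100, 2)}, name='dev')
        test = DictDataset({'X': torch.rand(100, 2)}, name='test')
        return train, dev, test, 64  # assumed: datasets plus batch size

    data_module = LitDataModule(data_setup_function)
    data_module.setup()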

test_dataloader()[source]

An iterable or collection of iterables specifying test samples.

For more information about multiple dataloaders, see the PyTorch Lightning documentation.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

train_dataloader()[source]

An iterable or collection of iterables specifying training samples.

For more information about multiple dataloaders, see the PyTorch Lightning documentation.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

Do not assign state in prepare_data().

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

val_dataloader()[source]

An iterable or collection of iterables specifying validation samples.

For more information about multiple dataloaders, see the PyTorch Lightning documentation.

The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data(). This dataloader is requested during fit() and validate(), after prepare_data() and setup() have run.

Note

Lightning tries to add the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

class neuromancer.dataset.SequenceDataset(data, nsteps=1, moving_horizon=False, name='data')[source]

Bases: Dataset

collate_fn(batch)[source]

Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.

Parameters:

batch – (dict str: torch.Tensor) dataset sample.

get_full_batch()[source]

get_full_sequence()[source]
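
Example (a minimal sketch; the key 'Y' and the trajectory shape are illustrative assumptions):

    import numpy as np
    from torch.utils.data import DataLoader
    from neuromancer.dataset import SequenceDataset

    # One 500-step trajectory with 3 variables, windowed into 10-step subsequences
    data = {'Y': np.random.rand(500, 3)}
    dataset = SequenceDataset(data, nsteps=10, name='train')

    loader = DataLoader(dataset, batch_size=len(dataset), collate_fn=dataset.collate_fn)
    full_sequence = dataset.get_full_sequence()  # entire trajectory as one sample
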
class neuromancer.dataset.StaticDataset(data, name='data')[source]

Bases: Dataset

collate_fn(batch)[source]

Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and simply adds a “name” field to a batch.

Parameters:

batch – (dict str: torch.Tensor) dataset sample.

get_full_batch()[source]
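
Example (a minimal sketch with illustrative keys and shapes):

    import numpy as np
    from neuromancer.dataset import StaticDataset

    data = {'X': np.random.rand(200, 4), 'Y': np.random.rand(200, 1)}
    dataset = StaticDataset(data, name='train')
    full = dataset.get_full_batch()  # the whole dataset as a single batch
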
neuromancer.dataset.batch_tensor(x: Tensor, steps: int, mh: bool = False)[source]

neuromancer.dataset.denormalize_01(M, Mmin, Mmax)[source]

Denormalize min-max normalized data (inverse of normalize_01).

Parameters:
  • M – (2-d np.array) Data to be denormalized

  • Mmin – (int) Minimum value

  • Mmax – (int) Maximum value

Returns:

(2-d np.array) Denormalized data

neuromancer.dataset.denormalize_11(M, Mmin, Mmax)[source]

Denormalize data scaled to [-1, 1] by min-max normalization (inverse of normalize_11).

Parameters:
  • M – (2-d np.array) Data to be denormalized

  • Mmin – (int) Minimum value

  • Mmax – (int) Maximum value

Returns:

(2-d np.array) Denormalized data
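
Example (a worked sketch, assuming the standard min-max inverse M_norm * (Mmax - Mmin) + Mmin):

    import numpy as np
    from neuromancer.dataset import denormalize_01

    normalized = np.array([[0.25], [0.50]])
    restored = denormalize_01(normalized, Mmin=0.0, Mmax=100.0)
    # 0.25 * (100 - 0) + 0 = 25.0, and 0.50 -> 50.0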

neuromancer.dataset.destandardize(M, mean, std)[source]

neuromancer.dataset.get_sequence_dataloaders(data, nsteps, moving_horizon=False, norm_type=None, split_ratio=None, num_workers=0, batch_size=None)[source]

This function generates dataloaders and open-loop sequence dictionaries for a given dictionary of data. By default, dataloaders use full-batch training (batch_size equal to the dataset length) to match NeuroMANCER’s original training setup.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.

  • nsteps – (int) length of windowed subsequences for N-step training.

  • moving_horizon – (bool) whether to use moving horizon batching.

  • norm_type – (str) type of normalization; see function normalize_data for more info.

  • split_ratio – (list float) percentage of data in train and development splits; see function split_sequence_data for more info.

  • num_workers – (int, optional) how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)

  • batch_size – (int, optional) how many samples per batch to load (default: full-batch via len(data)).
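
Example (a minimal sketch; the unpacking below follows the pattern used in NeuroMANCER’s system-identification examples and may vary by version):

    import numpy as np
    from neuromancer.dataset import get_sequence_dataloaders

    # One long trajectory: 1000 steps of a 2-dimensional signal
    data = {'Y': np.random.rand(1000, 2)}
    nstep_data, loop_data, dims = get_sequence_dataloaders(
        data, nsteps=16, norm_type='zscore', split_ratio=[70, 20])
    train_loader, dev_loader, test_loader = nstep_data  # windowed n-step batches
    train_loop, dev_loop, test_loop = loop_data         # full open-loop sequences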

neuromancer.dataset.get_static_dataloaders(data, norm_type=None, split_ratio=None, num_workers=0, batch_size=32)[source]

This function generates dataloaders for a given dictionary of data; by default, each batch contains 32 samples.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.

  • norm_type – (str) type of normalization; see function normalize_data for more info.

  • split_ratio – (list float) percentage of data in train and development splits; see function split_static_data for more info.

  • num_workers – (int, optional) how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)

  • batch_size – (int, optional) how many samples per batch to load. (default: 32)
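
Example (a minimal sketch; the structure of the returned value is not documented above, so it is left unpacked here):

    import numpy as np
    from neuromancer.dataset import get_static_dataloaders

    data = {'X': np.random.rand(1000, 4), 'Y': np.random.rand(1000, 1)}
    # Normalizes, splits into train/dev/test, and wraps each split in a DataLoader
    loaders = get_static_dataloaders(data, norm_type='zero-one', split_ratio=[70, 20])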

neuromancer.dataset.normalize_01(M, Mmin=None, Mmax=None)[source]

Parameters:
  • M – (2-d np.array) Data to be normalized

  • Mmin – (int) Optional minimum. If not provided, it is inferred from the data.

  • Mmax – (int) Optional maximum. If not provided, it is inferred from the data.

Returns:

(2-d np.array) Min-max normalized data in [0, 1]
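
Example (a worked sketch; depending on the library version, the statistics used may be returned alongside the normalized array, so the result is left unpacked):

    import numpy as np
    from neuromancer.dataset import normalize_01

    M = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
    out = normalize_01(M)  # Mmin/Mmax inferred from the data
    # Standard min-max scaling maps each column onto [0, 1]:
    # e.g. the first column 1, 2, 3 -> 0.0, 0.5, 1.0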

neuromancer.dataset.normalize_11(M, Mmin=None, Mmax=None)[source]

Parameters:
  • M – (2-d np.array) Data to be normalized

  • Mmin – (int) Optional minimum. If not provided, it is inferred from the data.

  • Mmax – (int) Optional maximum. If not provided, it is inferred from the data.

Returns:

(2-d np.array) Min-max normalized data in [-1, 1]

neuromancer.dataset.normalize_data(data, norm_type, stats=None)[source]

Normalize data, optionally using arbitrary statistics (e.g. computed from train split).

Parameters:
  • data – (dict str: np.array) data dictionary.

  • norm_type – (str) type of normalization to use; can be “zero-one”, “one-one”, or “zscore”.

  • stats – (dict str: np.array) statistics to use for normalization. Default is None, in which case stats are inferred by underlying normalization function.
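
Example (a minimal sketch; it assumes the function returns the normalized dictionary together with the statistics it used, so they can be reused on other splits):

    import numpy as np
    from neuromancer.dataset import normalize_data

    train = {'X': np.random.rand(100, 3)}
    test = {'X': np.random.rand(20, 3)}

    # Normalize train, then reuse its statistics on test to avoid leakage
    train_norm, stats = normalize_data(train, 'zscore')
    test_norm, _ = normalize_data(test, 'zscore', stats=stats)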

neuromancer.dataset.read_file(file_or_dir)[source]

neuromancer.dataset.split_sequence_data(data, nsteps, moving_horizon=False, split_ratio=None)[source]

Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary.

  • nsteps – (int) N-step prediction horizon for batching data; used here to ensure split lengths are evenly divisible by N.

  • moving_horizon – (bool) whether batches use a sliding window with stride 1; else stride of N is assumed.

  • split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.
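
Example (a minimal sketch; keys and lengths are illustrative):

    import numpy as np
    from neuromancer.dataset import split_sequence_data

    data = {'Y': np.random.rand(900, 2)}
    train, dev, test = split_sequence_data(data, nsteps=16, split_ratio=[70, 20])
    # 70% train, 20% dev, and the remaining 10% test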

neuromancer.dataset.split_static_data(data, split_ratio=None)[source]

Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary.

  • split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.
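
Example (a minimal sketch; with no split_ratio the data is divided into thirds):

    import numpy as np
    from neuromancer.dataset import split_static_data

    data = {'X': np.random.rand(300, 4)}
    train, dev, test = split_static_data(data)  # default: equal thirds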

neuromancer.dataset.standardize(M, mean=None, std=None)[source]

neuromancer.dataset.unbatch_tensor(x: Tensor, mh: bool = False)[source]
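
Example (a hypothetical sketch; it assumes batch_tensor windows a sequence along its first dimension and unbatch_tensor is its inverse — neither behavior is documented above):

    import torch
    from neuromancer.dataset import batch_tensor, unbatch_tensor

    x = torch.rand(100, 3)               # a 100-step, 3-feature sequence
    batched = batch_tensor(x, steps=10)  # assumed: ten windows of length 10
    restored = unbatch_tensor(batched)   # assumed inverse of batch_tensor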