Dataset
- class neuromancer.dataset.DictDataset(datadict, name='train')[source]
Basic dataset compatible with the NeuroMANCER Trainer.
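A minimal usage sketch with illustrative random data. It assumes DictDataset exposes a collate_fn method like the other dataset classes documented on this page:

```python
import torch
from torch.utils.data import DataLoader
from neuromancer.dataset import DictDataset

# 100 samples of a 3-dimensional variable "x" and a 2-dimensional parameter "p".
data = {"x": torch.randn(100, 3), "p": torch.randn(100, 2)}
train_data = DictDataset(data, name="train")

# Batches come out as dictionaries keyed like the input.
# Assumption: DictDataset provides a collate_fn, like the classes below.
train_loader = DataLoader(train_data, batch_size=32,
                          collate_fn=train_data.collate_fn, shuffle=True)
```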
- class neuromancer.dataset.GraphDataset(node_attr: Dict | None = {}, edge_attr: Dict | None = {}, graph_attr: Dict | None = {}, metadata: Dict | None = {}, seq_len: int = 6, seq_horizon: int = 1, seq_stride: int = 1, graphs: Dict | None = None, build_graphs: str | None = None, connectivity_radius: float = 0.015, graph_self_loops=True, name: str = 'data')[source]
- static collate_fn(x)[source]
Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.
- Parameters:
batch – (list of dict str: torch.Tensor) dataset samples; each must contain the key ‘edge_index’
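Graph samples cannot be stacked naively: each sample’s edge_index refers to node positions within that sample, so collation must offset the indices as node features are concatenated. A generic sketch of this pattern (illustrative only, not the library’s implementation; the node_attr/edge_index keys follow the names used above):

```python
import torch

def collate_graphs(batch):
    """Concatenate node features and shift each sample's edge_index
    by the number of nodes appearing earlier in the batch."""
    offset = 0
    node_attrs, edge_indices = [], []
    for sample in batch:
        node_attrs.append(sample["node_attr"])               # (num_nodes, feat)
        edge_indices.append(sample["edge_index"] + offset)   # (2, num_edges)
        offset += sample["node_attr"].shape[0]
    return {"node_attr": torch.cat(node_attrs, dim=0),
            "edge_index": torch.cat(edge_indices, dim=1)}
```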
- class neuromancer.dataset.SequenceDataset(data, nsteps=1, moving_horizon=False, name='data')[source]
- collate_fn(batch)[source]
Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.
- Parameters:
batch – (list of dict str: torch.Tensor) dataset samples.
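A usage sketch, assuming data maps variable names to arrays of shape (time, features); the transpose mentioned above suggests collated batches are time-major, but exact shapes should be checked against the source:

```python
import numpy as np
from torch.utils.data import DataLoader
from neuromancer.dataset import SequenceDataset

# A single trajectory: 500 time steps of 2 outputs and 1 input.
data = {"Y": np.random.rand(500, 2), "U": np.random.rand(500, 1)}
dataset = SequenceDataset(data, nsteps=16, name="train")

# Full-batch loading, mirroring get_sequence_dataloaders below.
loader = DataLoader(dataset, batch_size=len(dataset),
                    collate_fn=dataset.collate_fn)
```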
- class neuromancer.dataset.StaticDataset(data, name='data')[source]
- neuromancer.dataset.denormalize_01(M, Mmin, Mmax)[source]
Denormalize min-max normalized data (from the interval [0, 1]).
- Parameters:
M – (2-d np.array) Data to be denormalized
Mmin – (int) Minimum value
Mmax – (int) Maximum value
- Returns:
(2-d np.array) Denormalized data
- neuromancer.dataset.denormalize_11(M, Mmin, Mmax)[source]
Denormalize min-max normalized data (from the interval [-1, 1]).
- Parameters:
M – (2-d np.array) Data to be denormalized
Mmin – (int) Minimum value
Mmax – (int) Maximum value
- Returns:
(2-d np.array) Denormalized data
- neuromancer.dataset.get_sequence_dataloaders(data, nsteps, moving_horizon=False, norm_type=None, split_ratio=None, num_workers=0, batch_size=None)[source]
This function will generate dataloaders and open-loop sequence dictionaries for a given dictionary of data. Dataloaders default to full-batch training to match NeuroMANCER’s original training setup; see the batch_size parameter.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if the latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.
nsteps – (int) length of windowed subsequences for N-step training.
moving_horizon – (bool) whether to use moving horizon batching.
norm_type – (str) type of normalization; see function normalize_data for more info.
split_ratio – (list float) percentage of data in train and development splits; see function split_sequence_data for more info.
num_workers – (int, optional) how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
batch_size – (int, optional) how many samples per batch to load (default: full-batch via len(data)).
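A usage sketch. The unpacking below assumes the function returns the n-step dataloaders and open-loop dictionaries described above, plus dimension metadata; verify the return structure against the source:

```python
import numpy as np
from neuromancer.dataset import get_sequence_dataloaders

# One 1000-step trajectory with 2 outputs and 1 input.
data = {"Y": np.random.rand(1000, 2), "U": np.random.rand(1000, 1)}

# [0, 1] normalization, 60/20/20 split, 16-step windows.
nstep_data, loop_data, dims = get_sequence_dataloaders(
    data, nsteps=16, norm_type="zero-one", split_ratio=[60, 20])
train_loader, dev_loader, test_loader = nstep_data  # assumed ordering
```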
- neuromancer.dataset.get_static_dataloaders(data, norm_type=None, split_ratio=None, num_workers=0, batch_size=32)[source]
This function will generate dataloaders for a given dictionary of static data. Unlike get_sequence_dataloaders, batching defaults to mini-batches of 32 samples rather than full batches.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if the latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.
norm_type – (str) type of normalization; see function normalize_data for more info.
split_ratio – (list float) percentage of data in train and development splits; see function split_static_data for more info.
num_workers – (int, optional) how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
batch_size – (int, optional) how many samples per batch to load (default: 32).
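An analogous sketch for static data; the result is deliberately left unpacked since this page does not specify the return structure:

```python
import numpy as np
from neuromancer.dataset import get_static_dataloaders

data = {"X": np.random.rand(2000, 4), "Y": np.random.rand(2000, 1)}

# Z-score normalization, 70/15/15 split, mini-batches of 64 samples.
splits = get_static_dataloaders(data, norm_type="zscore",
                                split_ratio=[70, 15], batch_size=64)
```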
- neuromancer.dataset.normalize_01(M, Mmin=None, Mmax=None)[source]
Min-max normalize data to the interval [0, 1].
- Parameters:
M – (2-d np.array) Data to be normalized
Mmin – (int) Optional minimum. If not provided, it is inferred from the data.
Mmax – (int) Optional maximum. If not provided, it is inferred from the data.
- Returns:
(2-d np.array) Min-max normalized data
- neuromancer.dataset.normalize_11(M, Mmin=None, Mmax=None)[source]
Min-max normalize data to the interval [-1, 1].
- Parameters:
M – (2-d np.array) Data to be normalized
Mmin – (int) Optional minimum. If not provided, it is inferred from the data.
Mmax – (int) Optional maximum. If not provided, it is inferred from the data.
- Returns:
(2-d np.array) Min-max normalized data
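A round-trip sketch pairing the normalizers with their denormalize counterparts. Statistics are passed explicitly so denormalization can reuse them; this assumes the function returns just the normalized array when Mmin and Mmax are supplied, per the Returns entries above:

```python
import numpy as np
from neuromancer.dataset import normalize_01, denormalize_01

M = 10.0 * np.random.rand(100, 3)
Mmin, Mmax = M.min(), M.max()

M01 = normalize_01(M, Mmin, Mmax)         # scaled to [0, 1]
M_back = denormalize_01(M01, Mmin, Mmax)  # recovers the original values
assert np.allclose(M, M_back)
```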
- neuromancer.dataset.normalize_data(data, norm_type, stats=None)[source]
Normalize data, optionally using arbitrary statistics (e.g. computed from train split).
- Parameters:
data – (dict str: np.array) data dictionary.
norm_type – (str) type of normalization to use; can be “zero-one”, “one-one”, or “zscore”.
stats – (dict str: np.array) statistics to use for normalization. Default is None, in which case stats are inferred by underlying normalization function.
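The stats argument enables the standard pattern of fitting normalization statistics on the training split and reusing them on held-out data. A sketch, assuming the function returns the normalized dictionary together with the statistics it used (verify the return structure against the source):

```python
import numpy as np
from neuromancer.dataset import normalize_data

train = {"X": np.random.rand(800, 4)}
test = {"X": np.random.rand(200, 4)}

# Fit z-score statistics on the training split...
train_norm, stats = normalize_data(train, "zscore")
# ...and apply the same statistics to the test split.
test_norm, _ = normalize_data(test, "zscore", stats=stats)
```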
- neuromancer.dataset.split_sequence_data(data, nsteps, moving_horizon=False, split_ratio=None)[source]
Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary.
nsteps – (int) N-step prediction horizon for batching data; used here to ensure split lengths are evenly divisible by N.
moving_horizon – (bool) whether batches use a sliding window with stride 1; else stride of N is assumed.
split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.
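A splitting sketch; the three-way return is assumed from the description above:

```python
import numpy as np
from neuromancer.dataset import split_sequence_data

data = {"Y": np.random.rand(900, 2)}

# Default thirds; nsteps=16 keeps each split's length divisible by the horizon.
train, dev, test = split_sequence_data(data, nsteps=16)
```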
- neuromancer.dataset.split_static_data(data, split_ratio=None)[source]
Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.
- Parameters:
data – (dict str: np.array or list[dict str: np.array]) data dictionary.
split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.
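The static variant works the same way without the horizon constraint; here with a custom 80/10/10 split (the remainder after train and development goes to test):

```python
import numpy as np
from neuromancer.dataset import split_static_data

data = {"X": np.random.rand(1000, 4)}
train, dev, test = split_static_data(data, split_ratio=[80, 10])
```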