Dataset

class neuromancer.dataset.DictDataset(datadict, name='train')[source]

Basic dataset compatible with neuromancer Trainer

collate_fn(batch)[source]

Wraps the default PyTorch batch collation function and adds a name field.

Parameters:

batch – (dict str: torch.Tensor) dataset sample.
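
As a rough illustration of "default collation plus a name field", the sketch below stacks a list of per-sample dictionaries key by key and then attaches a name. It uses NumPy rather than PyTorch to stay dependency-light, and `collate_with_name` is a hypothetical helper written for this example, not neuromancer's implementation.

```python
import numpy as np

def collate_with_name(batch, name="train"):
    """Stack a list of {key: array} samples key by key and attach a name.

    Hypothetical stand-in for a collate function; not neuromancer's code.
    """
    keys = batch[0].keys()
    collated = {k: np.stack([sample[k] for sample in batch]) for k in keys}
    collated["name"] = name  # extra, non-array field identifying the dataset
    return collated

samples = [
    {"x": np.array([1.0, 2.0]), "y": np.array([0.0])},
    {"x": np.array([3.0, 4.0]), "y": np.array([1.0])},
]
batch = collate_with_name(samples, name="train")
```

Here every array gains a leading batch dimension, and the "name" entry travels alongside the stacked data.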

class neuromancer.dataset.GraphDataset(node_attr: Dict | None = {}, edge_attr: Dict | None = {}, graph_attr: Dict | None = {}, metadata: Dict | None = {}, seq_len: int = 6, seq_horizon: int = 1, seq_stride: int = 1, graphs: Dict | None = None, build_graphs: str | None = None, connectivity_radius: float = 0.015, graph_self_loops=True, name: str = 'data')[source]
build_graphs(feature, self_loops)[source]

Uses radius_graph from torch_geometric.nn, which is imported inside a try/except block.

static collate_fn(x)[source]

Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.

Parameters:

x – (list of dict str: torch.Tensor) list of dataset samples. Each sample requires the key ‘edge_index’.

make_map()[source]

Order the sample sequences

shuffle()[source]

Randomizes the order of sample sequences

class neuromancer.dataset.SequenceDataset(data, nsteps=1, moving_horizon=False, name='data')[source]
collate_fn(batch)[source]

Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and does some light post-processing to transpose the data for NeuroMANCER models and add a “name” field.

Parameters:

batch – (dict str: torch.Tensor) dataset sample.

get_full_batch()[source]
get_full_sequence()[source]
class neuromancer.dataset.StaticDataset(data, name='data')[source]
collate_fn(batch)[source]

Batch collation for dictionaries of samples generated by this dataset. This wraps the default PyTorch batch collation function and simply adds a “name” field to a batch.

Parameters:

batch – (dict str: torch.Tensor) dataset sample.

get_full_batch()[source]
neuromancer.dataset.batch_tensor(x: Tensor, steps: int, mh: bool = False)[source]
neuromancer.dataset.denormalize_01(M, Mmin, Mmax)[source]

Reverses min-max normalization.

Parameters:
  • M – (2-d np.array) Data to be denormalized

  • Mmin – (int) Minimum value

  • Mmax – (int) Maximum value

Returns:

(2-d np.array) Un-normalized data

neuromancer.dataset.denormalize_11(M, Mmin, Mmax)[source]

Reverses min-max normalization.

Parameters:
  • M – (2-d np.array) Data to be denormalized

  • Mmin – (int) Minimum value

  • Mmax – (int) Maximum value

Returns:

(2-d np.array) Un-normalized data

neuromancer.dataset.destandardize(M, mean, std)[source]
neuromancer.dataset.get_sequence_dataloaders(data, nsteps, moving_horizon=False, norm_type=None, split_ratio=None, num_workers=0, batch_size=None)[source]

Generates dataloaders and open-loop sequence dictionaries for a given dictionary of data. By default (batch_size=None), dataloaders use full-batch training to match NeuroMANCER’s original training setup.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.

  • nsteps – (int) length of windowed subsequences for N-step training.

  • moving_horizon – (bool) whether to use moving horizon batching.

  • norm_type – (str) type of normalization; see function normalize_data for more info.

  • split_ratio – (list float) percentage of data in train and development splits; see function split_sequence_data for more info.

  • num_workers – (int, optional) how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)

  • batch_size – (int, optional) how many samples per batch to load (default: full-batch via len(data)).
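
The difference between the two batching modes can be sketched with NumPy. This assumes, following the description of moving_horizon under split_sequence_data, that a moving horizon means windows of length nsteps taken with stride 1, and that the stride is nsteps otherwise. `window_sequence` is a hypothetical helper, and its output layout (windows first) is an assumption rather than the layout batch_tensor actually produces.

```python
import numpy as np

def window_sequence(x, nsteps, moving_horizon=False):
    """Cut a (time, features) array into windows of length nsteps.

    Hypothetical helper: stride is 1 for a moving horizon, nsteps otherwise.
    """
    stride = 1 if moving_horizon else nsteps
    starts = range(0, len(x) - nsteps + 1, stride)
    return np.stack([x[s:s + nsteps] for s in starts])

x = np.arange(6).reshape(6, 1)  # 6 time steps, 1 feature

# Non-overlapping windows starting at t = 0, 2, 4
blocks = window_sequence(x, nsteps=2)

# Overlapping windows starting at t = 0, 1, 2, 3, 4
sliding = window_sequence(x, nsteps=2, moving_horizon=True)
```

A moving horizon yields many more (overlapping) training windows from the same sequence, at the cost of correlated samples.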

neuromancer.dataset.get_static_dataloaders(data, norm_type=None, split_ratio=None, num_workers=0, batch_size=32)[source]

This will generate dataloaders for a given dictionary of data. Dataloaders are hard-coded for full-batch training to match NeuroMANCER’s training setup.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries; if latter is provided, multi-sequence datasets are created and splits are computed over the number of sequences rather than their lengths.

  • norm_type – (str) type of normalization; see function normalize_data for more info.

  • split_ratio – (list float) percentage of data in train and development splits; see function split_sequence_data for more info.

neuromancer.dataset.normalize_01(M, Mmin=None, Mmax=None)[source]
Parameters:
  • M – (2-d np.array) Data to be normalized

  • Mmin – (int) Optional minimum. If not provided, it is inferred from the data.

  • Mmax – (int) Optional maximum. If not provided, it is inferred from the data.

Returns:

(2-d np.array) Min-max normalized data
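
A minimal sketch of min-max scaling to [0, 1] and its inverse, assuming the minimum and maximum are taken per column when they are not supplied. The names `minmax_01` and `inverse_minmax_01`, and the choice to return the statistics alongside the scaled data, are assumptions made for this example; constant columns (where max equals min) are not handled.

```python
import numpy as np

def minmax_01(M, Mmin=None, Mmax=None):
    """Scale each column of M to [0, 1]; infer min/max if not given."""
    Mmin = M.min(axis=0) if Mmin is None else Mmin
    Mmax = M.max(axis=0) if Mmax is None else Mmax
    return (M - Mmin) / (Mmax - Mmin), Mmin, Mmax

def inverse_minmax_01(M_norm, Mmin, Mmax):
    """Map data in [0, 1] back to the original range."""
    return M_norm * (Mmax - Mmin) + Mmin

M = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
M_norm, col_min, col_max = minmax_01(M)
M_back = inverse_minmax_01(M_norm, col_min, col_max)
```

Keeping the inferred minimum and maximum around is what makes the inverse mapping possible later.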

neuromancer.dataset.normalize_11(M, Mmin=None, Mmax=None)[source]
Parameters:
  • M – (2-d np.array) Data to be normalized

  • Mmin – (int) Optional minimum. If not provided, it is inferred from the data.

  • Mmax – (int) Optional maximum. If not provided, it is inferred from the data.

Returns:

(2-d np.array) Min-max normalized data
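
Assuming the "11" suffix refers to scaling into the range [-1, 1] (an interpretation of the name, not something stated above), the transformation and its inverse could be sketched as follows. `minmax_11` and `inverse_minmax_11` are hypothetical names, statistics are taken per column, and constant columns are not handled.

```python
import numpy as np

def minmax_11(M, Mmin=None, Mmax=None):
    """Scale each column of M to [-1, 1]; infer min/max if not given."""
    Mmin = M.min(axis=0) if Mmin is None else Mmin
    Mmax = M.max(axis=0) if Mmax is None else Mmax
    return 2.0 * (M - Mmin) / (Mmax - Mmin) - 1.0, Mmin, Mmax

def inverse_minmax_11(M_norm, Mmin, Mmax):
    """Map data in [-1, 1] back to the original range."""
    return (M_norm + 1.0) / 2.0 * (Mmax - Mmin) + Mmin

M = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
M_norm, col_min, col_max = minmax_11(M)
M_back = inverse_minmax_11(M_norm, col_min, col_max)
```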

neuromancer.dataset.normalize_data(data, norm_type, stats=None)[source]

Normalize data, optionally using arbitrary statistics (e.g. computed from train split).

Parameters:
  • data – (dict str: np.array) data dictionary.

  • norm_type – (str) type of normalization to use; can be “zero-one”, “one-one”, or “zscore”.

  • stats – (dict str: np.array) statistics to use for normalization. Default is None, in which case stats are inferred by underlying normalization function.
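
The idea of reusing statistics from the train split can be illustrated with a small z-score sketch. The helper `zscore_dict` and the layout of its `stats` dictionary (a (mean, std) tuple per key) are assumptions for this example and may not match the format normalize_data expects.

```python
import numpy as np

def zscore_dict(data, stats=None):
    """Z-score every array in a dict, inferring stats when none are given.

    Hypothetical helper; stats maps each key to a (mean, std) tuple.
    """
    if stats is None:
        stats = {k: (v.mean(axis=0), v.std(axis=0)) for k, v in data.items()}
    normed = {k: (v - stats[k][0]) / stats[k][1] for k, v in data.items()}
    return normed, stats

train = {"Y": np.array([[1.0], [2.0], [3.0]])}
dev = {"Y": np.array([[2.0], [4.0]])}

# Infer statistics from the train split only ...
train_norm, train_stats = zscore_dict(train)
# ... and reuse them for the development split.
dev_norm, _ = zscore_dict(dev, stats=train_stats)
```

Reusing the train statistics keeps all splits on the same scale and avoids leaking information from held-out data into the normalization.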

neuromancer.dataset.read_file(file_or_dir)[source]
neuromancer.dataset.split_sequence_data(data, nsteps, moving_horizon=False, split_ratio=None)[source]

Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries.

  • nsteps – (int) N-step prediction horizon for batching data; used here to ensure split lengths are evenly divisible by N.

  • moving_horizon – (bool) whether batches use a sliding window with stride 1; else stride of N is assumed.

  • split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.
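
One way to picture how split lengths can be kept divisible by nsteps is the sketch below. `split_lengths` is a hypothetical helper; the rounding-down rule and the handling of leftover samples are assumptions and may differ from what split_sequence_data does internally.

```python
import numpy as np

def split_lengths(nsim, nsteps, split_ratio=None):
    """Return (train, dev, test) lengths, each a multiple of nsteps.

    Hypothetical helper: split_ratio holds train and dev percentages
    (out of 100.0); None means thirds.
    """
    if split_ratio is None:
        train_frac = dev_frac = 1.0 / 3.0
    else:
        train_frac, dev_frac = (r / 100.0 for r in split_ratio)

    def round_down(n):
        return (n // nsteps) * nsteps

    n_train = round_down(int(nsim * train_frac))
    n_dev = round_down(int(nsim * dev_frac))
    n_test = round_down(nsim - n_train - n_dev)
    return n_train, n_dev, n_test

Y = np.arange(100).reshape(100, 1)  # 100 time steps, 1 feature
n_train, n_dev, n_test = split_lengths(len(Y), nsteps=8)

# Contiguous slices preserve temporal order within each split
train = Y[:n_train]
dev = Y[n_train:n_train + n_dev]
test = Y[n_train + n_dev:n_train + n_dev + n_test]
```

With 100 samples and nsteps=8, a few trailing samples are dropped so that each split can be cut into whole windows.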

neuromancer.dataset.split_static_data(data, split_ratio=None)[source]

Split a data dictionary into train, development, and test sets. Splits data into thirds by default, but arbitrary split ratios for train and development can be provided.

Parameters:
  • data – (dict str: np.array or list[dict str: np.array]) data dictionary or list of data dictionaries.

  • split_ratio – (list float) Two numbers indicating percentage of data included in train and development sets (out of 100.0). Default is None, which splits data into thirds.
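
A minimal sketch of the train/development/test split for static data, assuming samples are split in their stored order along the first axis (whether the library shuffles first is not stated here). `split_static` is a hypothetical helper written for this example.

```python
import numpy as np

def split_static(data, split_ratio=None):
    """Split every array in a dict into train, dev, and test parts.

    Hypothetical helper: split_ratio holds train and dev percentages
    (out of 100.0); None means thirds. The remainder goes to test.
    """
    n = len(next(iter(data.values())))
    if split_ratio is None:
        n_train = n_dev = n // 3
    else:
        n_train = int(n * split_ratio[0] / 100)
        n_dev = int(n * split_ratio[1] / 100)
    train = {k: v[:n_train] for k, v in data.items()}
    dev = {k: v[n_train:n_train + n_dev] for k, v in data.items()}
    test = {k: v[n_train + n_dev:] for k, v in data.items()}
    return train, dev, test

data = {"x": np.arange(10).reshape(10, 1)}
train, dev, test = split_static(data, split_ratio=[60, 20])
```

Only the train and development percentages are given; whatever remains becomes the test set.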

neuromancer.dataset.standardize(M, mean=None, std=None)[source]
neuromancer.dataset.unbatch_tensor(x: Tensor, mh: bool = False)[source]