util

Python and numpy functions.

class util.ExponentialRunningMean(alpha=1.0)

Calculates the running mean of row vectors batchwise given a sequence of matrices.

Parameters:	alpha – (float) A higher alpha discounts older observations faster; a smaller alpha gives past observations more influence on the mean.

__call__(samples)

Parameters:	samples – a matrix of samples to incorporate into the running mean
Returns:	the updated running mean of the row vectors
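
The effect of alpha can be illustrated with a minimal sketch. The update rule below (new mean = (1 - alpha) * old mean + alpha * batch mean) is an assumption chosen to match the description of alpha above; the class name ExponentialRunningMeanSketch is hypothetical, and the actual util.ExponentialRunningMean may differ.

```python
import numpy as np


class ExponentialRunningMeanSketch:
    """Illustrative stand-in, not the util.ExponentialRunningMean source."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.mean = None  # no batches seen yet

    def __call__(self, samples):
        # Mean of the row vectors in this batch.
        batch_mean = np.mean(np.asarray(samples, dtype=float), axis=0)
        if self.mean is None:
            self.mean = batch_mean
        else:
            # Assumed update rule: alpha is the weight of the newest batch.
            self.mean = (1.0 - self.alpha) * self.mean + self.alpha * batch_mean
        return self.mean


erm = ExponentialRunningMeanSketch(alpha=0.5)
first = erm(np.array([[1.0, 2.0], [3.0, 4.0]]))   # batch mean is [2, 3]
second = erm(np.array([[4.0, 5.0], [6.0, 7.0]]))  # batch mean is [5, 6]
```

Under this assumed rule, the default alpha=1.0 would make the running mean equal to the mean of the most recent batch.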

class util.Parser(prog=None, usage=None, description=None, epilog=None, version=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True)

Hack for Sphinx documentation of scripts to work correctly.

class util.RunningMean(axis=0)

Calculates the batchwise running mean from rows, columns, or values of a matrix.

Parameters:	axis – The axis to calculate the running mean over. If axis==None then the running mean for the entire array is taken.

__call__(samples)

Parameters:	samples – a matrix of samples to incorporate into running mean
Returns:	running average over axis
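
A minimal sketch of a batchwise running mean follows. It assumes every sample seen so far is weighted equally (a cumulative mean); the class name RunningMeanSketch and its attributes are hypothetical, and the actual util.RunningMean may be implemented differently.

```python
import numpy as np


class RunningMeanSketch:
    """Illustrative stand-in, not the util.RunningMean source."""

    def __init__(self, axis=0):
        self.axis = axis
        self.total = 0.0  # running sum taken along `axis`
        self.count = 0    # number of values averaged so far

    def __call__(self, samples):
        samples = np.asarray(samples, dtype=float)
        self.total = self.total + samples.sum(axis=self.axis)
        # With axis=None every element of the array counts as one value.
        if self.axis is None:
            self.count += samples.size
        else:
            self.count += samples.shape[self.axis]
        return self.total / self.count


rm = RunningMeanSketch(axis=0)
after_one = rm([[1.0, 2.0], [3.0, 4.0]])  # mean over the first two rows
after_two = rm([[5.0, 6.0]])              # mean over all three rows

overall = RunningMeanSketch(axis=None)
scalar_mean = overall([[1.0, 2.0], [3.0, 4.0]])  # mean of all four values
```

Keeping a running sum and a count means no earlier batch has to be stored.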

util.get_mask(lens, num_tokens)

Builds a mask for the output of lm_rnn on jagged (variable-length) sequences so that the gradient update is correct. A sequence length of 0 produces nan in that row of the mask, so avoid zero-length sequences.

Parameters:

lens – Numpy vector of sequence lengths
num_tokens – (int) Number of predicted tokens in sentence.

Returns:	A numpy array mask of shape MB x num_tokens. Each row i contains lens[i] values of 1/lens[i] followed by num_tokens - lens[i] zeros.
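
The mask layout described above can be reproduced with a short numpy sketch. The function name get_mask_sketch is hypothetical and this is not the util.get_mask source; it only builds an array with the documented shape and values.

```python
import numpy as np


def get_mask_sketch(lens, num_tokens):
    """Row i holds lens[i] copies of 1/lens[i], then zeros."""
    lens = np.asarray(lens, dtype=float)
    positions = np.arange(num_tokens)
    # True where the token position falls inside the sequence.
    valid = positions[None, :] < lens[:, None]
    # Dividing by the length makes each row sum to 1.
    # A length of 0 gives 0/0 = nan, as the description warns.
    return valid / lens[:, None]


mask = get_mask_sketch([1, 3], 4)
```

Because each row sums to 1, multiplying per-token losses by the mask and summing over the token axis gives the mean loss over the real tokens of each sequence.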

util.get_multivariate_loss_names(loss_spec)

For use in conjunction with tf_ops.multivariate_loss. Gives the names of all contributors (columns) of the loss matrix.

Parameters:	loss_spec – A list of 3-tuples of the form (input_name, loss_function, dimension) where input_name is the same as a target in datadict, loss_function takes two parameters, a target and prediction, and dimension is the dimension of the target.
Returns:	loss_names, a list of length concatenated_feature_size containing the names of all loss contributors.
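
As a rough illustration of how a loss_spec could map to column names, the sketch below assumes each target contributes one name per dimension, so the list length equals the sum of the dimensions. The function name loss_names_sketch and the naming scheme "<input_name>_<index>" are hypothetical; the names produced by util.get_multivariate_loss_names may look different.

```python
def loss_names_sketch(loss_spec):
    """Illustrative stand-in, not the util.get_multivariate_loss_names source."""
    names = []
    for input_name, _loss_function, dimension in loss_spec:
        # Hypothetical naming scheme: one name per column of the target.
        names.extend(f"{input_name}_{i}" for i in range(dimension))
    return names


# Loss functions are not needed to build names, so None stands in for them.
example_spec = [("user", None, 2), ("bytes", None, 1)]
names = loss_names_sketch(example_spec)
```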

util.make_feature_spec(dataspec)

Makes lists of all the continuous and categorical features to be used as input features of a neural network.

Parameters:	dataspec – (dict) Loaded from a JSON specification of the purpose of each field in the CSV input file (see docs for formatting)
Returns:	(dict) features {'categorical': [categorical_feature_1, …, categorical_feature_j], 'continuous': [continuous_feature_1, …, continuous_feature_k]}

util.make_loss_spec(dataspec, mvn)

Makes a list of tuples for each target to be used in training a multiple output neural network modeling a mixed joint distribution of discrete and continuous variables.

Parameters:

dataspec – (dict) From a json specification of the purpose of fields in the csv input file (See docs for formatting)
mvn – Tensorflow function for calculating type of multivariate loss for continuous target vectors. Can be tf_ops.diag_mvn_loss, tf_ops.full_mvn_loss, tf_ops.eyed_mvn_loss

Returns:	A list of tuples of the form: (target_name, loss_function, dimension) where dimension is the dimension of the target vector (for categorical features this is the number of classes, for continuous targets this is the size of the continuous target vector)