tf_ops

Functions for building tensorflow computational graph models.

tf_ops.batch_normalize(tensor_in, epsilon=1e-05, decay=0.999)[source]

Batch Normalization, as described in Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

An exponential moving average of means and variances is calculated to estimate the sample mean and sample variance for evaluation. For testing, pair the placeholder is_training with [0] in feed_dict. For training, pair the placeholder is_training with [1] in feed_dict. Example:

Let train = 1 for training and train = 0 for evaluation

bn_deciders = {decider:[train] for decider in tf.get_collection('bn_deciders')}
feed_dict.update(bn_deciders)

During training the running statistics are updated, and batch statistics are used for normalization. During testing the running statistics are not updated, and running statistics are used for normalization.

Parameters:
  • tensor_in – (tf.Tensor) Input Tensor.
  • epsilon – (float) A small constant to avoid division by zero.
  • decay – (float) Decay rate for the exponential moving average estimates of the running mean and variance.
Returns:

(tf.Tensor) Tensor with unit variance and zero mean according to the batch.
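
A minimal usage sketch, following the is_training / 'bn_deciders' pairing described above (the input width and toy data are illustrative, not values prescribed by tf_ops):

  import numpy as np
  import tensorflow as tf
  import tf_ops

  x = tf.placeholder(tf.float32, [None, 64])            # illustrative input width
  W = tf.Variable(tf.random_normal([64, 32]))
  h = tf_ops.batch_normalize(tf.matmul(x, W))           # adds an is_training placeholder to 'bn_deciders'

  def bn_feed(train):
      # Pair every 'bn_deciders' placeholder with [1] for training or [0] for evaluation.
      return {decider: [train] for decider in tf.get_collection('bn_deciders')}

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      batch = np.random.rand(8, 64).astype(np.float32)  # toy minibatch
      feed_dict = {x: batch}
      feed_dict.update(bn_feed(1))                      # training: batch statistics used, running stats updated
      sess.run(h, feed_dict=feed_dict)
      feed_dict.update(bn_feed(0))                      # evaluation: running statistics used
      sess.run(h, feed_dict=feed_dict)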

tf_ops.batch_softmax_dist_loss(truth, h, dimension, scale_range=1.0)[source]

This function, paired with a tensorflow optimizer, performs multinomial logistic regression. It is designed for categorical predictions.

Parameters:
  • truth – (tf.Tensor) A tensorflow vector tensor of integer class labels.
  • h – (tf.Tensor) A placeholder if doing simple multinomial logistic regression, or the output of some neural network.
  • dimension – (int) Number of classes in output distribution.
  • scale_range – (float) For scaling the weight matrices (by default weights are initialized to 1/sqrt(fan_in) for tanh activation and sqrt(2/fan_in) for relu activation).
Returns:

(tf.Tensor, shape = [MB, Sequence_length]) Cross-entropy of true distribution vs. predicted distribution.
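
A minimal sketch of how this loss might be wired up (the placeholder shapes, hidden dimension, and class count below are illustrative assumptions):

  import tensorflow as tf
  import tf_ops

  truth = tf.placeholder(tf.int32, [None, None])        # [MB, T] integer class labels (assumed shape)
  h = tf.placeholder(tf.float32, [None, None, 50])      # [MB, T, hidden_dim] network output (assumed shape)

  token_losses = tf_ops.batch_softmax_dist_loss(truth, h, dimension=10000)  # [MB, T] cross-entropy
  avg_loss = tf.reduce_mean(token_losses)               # scalar loss to hand to a tensorflow optimizer
  train_op = tf.train.GradientDescentOptimizer(0.01).minimize(avg_loss)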

tf_ops.bidir_lm_rnn(x, t, token_embed, layers, seq_len=None, context_vector=None, cell=<class 'tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell'>)[source]

Token level bidirectional LSTM language model that uses a sentence level context vector.

Parameters:
  • x – (tensor) Input to rnn
  • t – (tensor) Targets for language model predictions (typically the next token in the sequence)
  • token_embed – (tensor) MB X ALPHABET_SIZE.
  • layers – A list of hidden layer sizes for stacked lstm
  • seq_len – A 1D tensor of mini-batch size containing the lengths of variable length sequences
  • context_vector – (tensor) MB X 2*CONTEXT_LSTM_OUTPUT_DIM. Optional context to append to each token embedding
  • cell – (class) A tensorflow RNNCell sub-class
Returns:

(tuple) token_losses (tensor), hidden_states (list of tensors), final_hidden (tensor)
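
A minimal sketch with an optional sentence-level context vector (the placeholder shapes, vocabulary size, and embedding size are illustrative assumptions):

  import tensorflow as tf
  import tf_ops

  x = tf.placeholder(tf.int32, [None, None])                 # [MB, T] input token ids (assumed shape)
  t = tf.placeholder(tf.int32, [None, None])                 # [MB, T] next-token targets (assumed shape)
  seq_len = tf.placeholder(tf.int32, [None])                 # lengths of the variable length sequences
  token_embed = tf.Variable(tf.random_uniform([128, 32]))    # token embedding matrix (assumed shape)
  context = tf.placeholder(tf.float32, [None, 2 * 64])       # MB X 2*CONTEXT_LSTM_OUTPUT_DIM

  token_losses, hidden_states, final_hidden = tf_ops.bidir_lm_rnn(
      x, t, token_embed, layers=[128, 128],
      seq_len=seq_len, context_vector=context)
  loss = tf.reduce_mean(token_losses)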

tf_ops.diag_mvn_loss(truth, h, scale_range=1.0, variance_floor=0.1)[source]

Takes the output of a neural network after its last activation and performs an affine transform. It returns the Mahalanobis distances between the targets and the result of the affine transformation, according to a parametrized Normal distribution with diagonal covariance. The log of the determinant of the parametrized covariance matrix is meant to be minimized to avoid a trivial optimization.

Parameters:
  • truth – (tf.Tensor) The targets for this minibatch.
  • h – (tf.Tensor) The output of dnn. (Here the output of dnn, h, is assumed to be the same dimension as truth.)
  • scale_range – (float) For scaling the weight matrices (by default weights are initialized to 1/sqrt(fan_in) for tanh activation and sqrt(2/fan_in) for relu activation).
  • variance_floor – (float, positive) To ensure the model doesn’t find a trivial optimization.
Returns:

(tf.Tensor shape=[MB X D], tf.Tensor shape=[MB X 1]) Loss matrix, log_of_determinants of covariance matrices.
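
A minimal sketch (the target dimension is illustrative; combining the loss matrix with the log determinants as below is an assumption about how to obtain a scalar training loss):

  import tensorflow as tf
  import tf_ops

  truth = tf.placeholder(tf.float32, [None, 20])     # [MB, D] targets
  h = tf.placeholder(tf.float32, [None, 20])         # network output, same dimension as truth

  loss_matrix, log_dets = tf_ops.diag_mvn_loss(truth, h, scale_range=1.0, variance_floor=0.1)
  # Penalize the log determinants as well so the predicted variances cannot grow without bound.
  total_loss = tf.reduce_mean(loss_matrix) + tf.reduce_mean(log_dets)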

tf_ops.dnn(x, layers=[100, 408], act=<function relu>, scale_range=1.0, norm=None, keep_prob=None, name='nnet')[source]

An arbitrarily deep neural network. Output has non-linear activation.

Parameters:
  • x – (tf.Tensor) Input to the network.
  • layers – List of integer sizes of network layers.
  • act – Activation function to produce hidden layers of neural network.
  • scale_range – (float) Scaling factor for initial range of weights (set to 1/sqrt(fan_in) for tanh, sqrt(2/fan_in) for relu).
  • norm – Normalization function. Could be layer_norm or another function that retains the shape of the tensor.
  • keep_prob – (float) The percent of nodes to keep in dropout layers.
  • name – (str) For naming and variable scope.
Returns:

(tf.Tensor) Output of the neural net. This is taken just following a non-linear transform, so no further final activation has been applied.
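
A minimal sketch (the input width, layer sizes, and dropout rate are illustrative; passing layer_norm as norm follows the parameter description above):

  import tensorflow as tf
  import tf_ops

  x = tf.placeholder(tf.float32, [None, 30])          # [MB, input_dim]
  # Two hidden layers of 100 and 50 relu units with layer normalization and dropout.
  h = tf_ops.dnn(x, layers=[100, 50], act=tf.nn.relu,
                 norm=tf_ops.layer_norm, keep_prob=0.75, name='encoder')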

tf_ops.dropout(tensor_in, prob)[source]

Adds a dropout node. See Dropout: A Simple Way to Prevent Neural Networks from Overfitting.

Parameters:
  • tensor_in – Input tensor.
  • prob – The percent of units to keep.
Returns:

Tensor of the same shape as tensor_in.

tf_ops.eyed_mvn_loss(truth, h, scale_range=1.0)[source]

This function takes the output of a neural network after its last activation, performs an affine transform, and returns the squared error between this result and the target.

Parameters:
  • truth – A tensor of target vectors.
  • h – The output of a neural network post activation.
  • scale_range – For scaling the weight matrices (by default weights are initialized to 1/sqrt(fan_in) for tanh activation and sqrt(2/fan_in) for relu activation).
Returns:

(tf.Tensor[MB X D], None) squared_error, None

tf_ops.fan_scale(initrange, activation, tensor_in)[source]

Creates a scaling factor for weight initialization according to best practices.

Parameters:
  • initrange – Scaling in addition to fan_in scale.
  • activation – A tensorflow non-linear activation function
  • tensor_in – Input tensor to layer of network to scale weights for.
Returns:

(float) scaling factor for weight initialization.
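
A minimal sketch of pairing fan_scale with weights() (this pairing and the shapes are illustrative assumptions):

  import tensorflow as tf
  import tf_ops

  x = tf.placeholder(tf.float32, [None, 64])                  # fan_in = 64
  scale = tf_ops.fan_scale(1.0, tf.nn.relu, x)                # roughly 1.0 * sqrt(2 / 64) for relu
  W = tf_ops.weights('tnorm', [64, 32], initrange=scale, name='layer1_W')
  hidden = tf.nn.relu(tf.matmul(x, W))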

tf_ops.full_mvn_loss(truth, h)[source]

Takes the output of a neural network after its last activation and performs an affine transform. It returns the Mahalanobis distances between the targets and the result of the affine transformation, according to a parametrized Normal distribution. The log of the determinant of the parametrized covariance matrix is meant to be minimized to avoid a trivial optimization.

Parameters:
  • truth – Actual datapoints to compare against learned distribution
  • h – output of neural network (after last non-linear transform)
Returns:

(tf.Tensor[MB X D], tf.Tensor[MB X 1]) Loss matrix, log_of_determinants of covariance matrices.

tf_ops.ident(tensor_in)[source]

The identity function.

Parameters:tensor_in – Input to operation.
Returns:tensor_in
tf_ops.join_multivariate_inputs(feature_spec, specs, embedding_ratio, max_embedding, min_embedding)[source]

Makes placeholders for all input data, performs a lookup on an embedding matrix for each categorical feature, and concatenates the resulting real-valued vectors from individual features into a single vector for each data point in the batch.

Parameters:
  • feature_spec – A dict {categorical: [c1, c2, …, cp], continuous: [f1, f2, …, fk]} which lists which features to use as categorical and continuous inputs to the model. c1, …, cp, f1, …, fk should each match a key in specs.
  • specs – A python dict containing information about which indices in the incoming data point correspond to which features. Entries for continuous features list the indices for the feature, while entries for categorical features contain a dictionary {‘index’: i, ‘num_classes’: c}, where i and c are the index into the datapoint and the number of distinct categories for the category in question.
  • embedding_ratio – Determines size of embedding vectors for each categorical feature: num_classes*embedding_ratio (within limits below)
  • max_embedding – A limit on how large an embedding vector can be.
  • min_embedding – A limit on how small an embedding vector can be.
Returns:

A tuple (x, placeholderdict): (tensor with shape [None, Sum_of_lengths_of_all_continuous_feature_vecs_and_embedding_vecs], dict to store tf placeholders to pair with data).
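
A sketch of the two dictionaries described above (the feature names, indices, and class counts are hypothetical):

  import tf_ops

  # Hypothetical data points with two categorical and one 3-dimensional continuous feature.
  feature_spec = {'categorical': ['user', 'day_of_week'],
                  'continuous': ['byte_counts']}
  specs = {'user':        {'index': 0, 'num_classes': 12000},
           'day_of_week': {'index': 1, 'num_classes': 7},
           'byte_counts': [2, 3, 4]}        # indices of the continuous fields

  x, placeholderdict = tf_ops.join_multivariate_inputs(
      feature_spec, specs, embedding_ratio=0.5, max_embedding=100, min_embedding=2)
  # x: [None, total_width] tensor; placeholderdict maps feature names to their placeholders.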

tf_ops.layer_norm(h)[source]
Parameters:h – (tensor) Hidden layer of neural network
Returns:(tensor) Hidden layer after layer_norm transform
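
Since layer_norm preserves the shape of its input, it can also be passed as the norm argument of dnn. A minimal sketch (the shapes are illustrative):

  import tensorflow as tf
  import tf_ops

  h = tf.placeholder(tf.float32, [None, 100])      # a hidden layer (assumed shape)
  h_normed = tf_ops.layer_norm(h)                  # same shape, normalized within the layer
  out = tf_ops.dnn(h, layers=[50, 50], norm=tf_ops.layer_norm)
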
tf_ops.layer_norm_rnn(inputs, initial_state=None, layers=(10, ), sequence_lengths=None, state_index=-1)[source]
Parameters:
  • inputs – A list whose length is the number of time steps of the longest sequence in the batch. inputs contains matrices of shape=[num_sequences X feature_dimension]
  • initial_state – Initialized first hidden states. A tuple of len(layers) tuples of cell and hidden state tensors
  • layers – list of number of nodes in each of stacked lstm layers
  • sequence_lengths – A vector of sequence lengths of size batch_size
  • state_index – If -1, the last state is returned; if None, all states are returned; if 1, the second state is returned.
Returns:

hidden_states, current_state

tf_ops.lm_rnn(x, t, token_embed, layers, seq_len=None, context_vector=None, cell=<class 'tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell'>)[source]

Token level LSTM language model that uses a sentence level context vector.

Parameters:
  • x – (tensor) Input to rnn
  • t – (tensor) Targets for language model predictions (typically next token in sequence)
  • token_embed – (tensor) MB X ALPHABET_SIZE.
  • layers – A list of hidden layer sizes for stacked lstm
  • seq_len – A 1D tensor of mini-batch size containing the lengths of variable length sequences
  • context_vector – (tensor) MB X 2*CONTEXT_LSTM_OUTPUT_DIM. Optional context to append to each token embedding
  • cell – (class) A tensorflow RNNCell sub-class
Returns:

(tuple) token_losses (tensor), hidden_states (list of tensors), final_hidden (tensor)
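
A minimal sketch without a context vector (the placeholder shapes, vocabulary size, and embedding size are illustrative assumptions):

  import tensorflow as tf
  import tf_ops

  x = tf.placeholder(tf.int32, [None, None])               # [MB, T] input token ids (assumed shape)
  t = tf.placeholder(tf.int32, [None, None])               # [MB, T] next-token targets (assumed shape)
  seq_len = tf.placeholder(tf.int32, [None])               # lengths of the variable length sequences
  token_embed = tf.Variable(tf.random_uniform([96, 20]))   # token embedding matrix (assumed shape)

  token_losses, hidden_states, final_hidden = tf_ops.lm_rnn(
      x, t, token_embed, layers=[64], seq_len=seq_len)
  loss = tf.reduce_mean(token_losses)
  train_op = tf.train.AdamOptimizer(0.001).minimize(loss)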

tf_ops.multivariate_loss(h, loss_spec, placeholder_dict, variance_floor=0.01)[source]

Computes a multivariate loss according to loss_spec.

Parameters:
  • h – Final hidden layer of dnn or rnn. (Post-activation)
  • loss_spec – A tuple of 3-tuples of the form (input_name, loss_function, dimension), where input_name is the same as a target in datadict, loss_function takes two parameters (a target and a prediction), and dimension is the dimension of the target.
  • placeholder_dict – A dictionary to store placeholder tensors for target values.
  • variance_floor – (float) Parameter for diag_mvn_loss.
Return loss_matrix:

(MB X concatenated_feature_size Tensor) Contains loss for all contributors for each data point.
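
A sketch of a possible loss_spec (the target names, dimensions, and the use of this module's loss functions as the loss_function entries are illustrative assumptions):

  import tensorflow as tf
  import tf_ops

  x = tf.placeholder(tf.float32, [None, 30])
  h = tf_ops.dnn(x, layers=[100, 50])                         # final hidden layer (post-activation)

  # (input_name, loss_function, dimension); input_name matches a target key in datadict.
  loss_spec = (('day_of_week', tf_ops.softmax_dist_loss, 7),  # categorical target, 7 classes
               ('byte_counts', tf_ops.diag_mvn_loss, 3))      # 3-dimensional continuous target

  placeholder_dict = {}
  loss_matrix = tf_ops.multivariate_loss(h, loss_spec, placeholder_dict, variance_floor=0.01)
  total_loss = tf.reduce_mean(loss_matrix)
  # placeholder_dict now holds target placeholders to pair with data at feed time.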

tf_ops.softmax_dist_loss(truth, h, dimension, scale_range=1.0, U=None)[source]

This function, paired with a tensorflow optimizer, performs multinomial logistic regression. It is designed for categorical predictions.

Parameters:
  • truth – A tensorflow vector tensor of integer class labels.
  • h – A placeholder if doing simple multinomial logistic regression, or the output of some neural network.
  • dimension – Number of classes in output distribution.
  • scale_range – For scaling the weight matrices (by default weights are initialized to 1/sqrt(fan_in) for tanh activation and sqrt(2/fan_in) for relu activation).
  • U – Optional weight tensor (if U is not provided, a new weight tensor is made)
Returns:

(Tensor[MB X 1]) Cross-entropy of true distribution vs. predicted distribution.

tf_ops.swapping_rnn(inputs, initial_state=None, layers=(10, ), sequence_lengths=None, state_index=-1)[source]
Parameters:
  • inputs – A list whose length is the number of time steps of the longest sequence in the batch. inputs contains matrices of shape=[num_sequences X feature_dimension]
  • initial_state – Initialized first hidden states. A tuple of len(layers) tuples of cell and hidden state tensors
  • layers – list of number of nodes in each of stacked lstm layers
  • sequence_lengths – A vector of sequence lengths of size batch_size
  • state_index – If -1, the last state is returned; if None, all states are returned; if 1, the second state is returned.
Returns:

tf_ops.true_bptt_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None, state_index=1)[source]

Creates a recurrent neural network specified by RNNCell cell. The simplest form of RNN network generated is:

  state = cell.zero_state(...)
  outputs = []
  for input_ in inputs:
      output, state = cell(input_, state)
      outputs.append(output)
  return (outputs, state)

However, a few other options are available: An initial state can be provided. If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time), and properly propagates the state at an example’s sequence length to the final state output. The dynamic calculation performed is, at time t for batch row b,

(output, state)(b, t) = (t >= sequence_length(b)) ? (zeros(cell.output_size), states(b, sequence_length(b) - 1)) : cell(input(b, t), state(b, t - 1))
Parameters:
  • cell – An instance of RNNCell.
  • inputs – A length T list of inputs, each a tensor of shape [batch_size, input_size].
  • initial_state – (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a tensor of appropriate type and shape [batch_size x cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
  • dtype – (optional) The data type for the initial state. Required if initial_state is not provided.
  • sequence_length – Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size [batch_size], values in [0, T).
  • scope – VariableScope for the created subgraph; defaults to “RNN”.
  • state_index – (int) If -1, the final state is returned; if 1, the state after the first rnn step is returned. If anything else, all states are returned.
Returns:

A pair (outputs, state) where:

  • outputs is a length T list of outputs (one for each input)
  • state is the final state or a length T list of cell states

Raises:

  • TypeError: If cell is not an instance of RNNCell.
  • ValueError: If inputs is None or an empty list, or if the input depth (column size) cannot be inferred from inputs via shape inference.
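
A minimal calling sketch mirroring the description above (the time steps, batch size, and cell size are illustrative):

  import tensorflow as tf
  import tf_ops

  T, batch_size, input_size = 5, 8, 16
  inputs = [tf.placeholder(tf.float32, [batch_size, input_size]) for _ in range(T)]
  seq_len = tf.placeholder(tf.int32, [batch_size])

  cell = tf.nn.rnn_cell.BasicLSTMCell(32)
  outputs, state = tf_ops.true_bptt_rnn(cell, inputs, dtype=tf.float32,
                                        sequence_length=seq_len, state_index=-1)
  # outputs: length T list of [batch_size, 32] tensors; state: the final state.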

tf_ops.weights(distribution, shape, dtype=tf.float32, initrange=1e-05, seed=None, l2=0.0, name='weights')[source]

Wrapper parameterizing common constructions of tf.Variables.

Parameters:
  • distribution – A string identifying the distribution: ‘tnorm’ for truncated normal, ‘rnorm’ for random normal, ‘constant’ for constant, ‘uniform’ for uniform.
  • shape – Shape of weight tensor.
  • dtype – dtype for weights
  • initrange – Scales standard normal and truncated normal distributions, value of constant dist., and range of uniform dist. [-initrange, initrange].
  • seed – For reproducible results.
  • l2 – Floating point number determining degree of l2 regularization for these weights in gradient descent update.
  • name – For variable scope.
Returns:

A tf.Variable.
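
A few illustrative constructions (the shapes, hyperparameters, and names are arbitrary examples):

  import tensorflow as tf
  import tf_ops

  # Truncated normal weights scaled by initrange, with l2 regularization in the update.
  W = tf_ops.weights('tnorm', [64, 32], dtype=tf.float32,
                     initrange=0.01, seed=42, l2=0.0001, name='W_hidden')
  b = tf_ops.weights('constant', [32], initrange=0.0, name='b_hidden')    # constant-initialized bias
  U = tf_ops.weights('uniform', [32, 10], initrange=0.05, name='W_out')   # uniform in [-0.05, 0.05]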