Linear¶
Structured linear maps that are drop-in replacements for torch.nn.Linear.
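A minimal usage sketch, assuming the module is importable as `linear` (the prefix used in the entries below); the class, constructor arguments, and `effective_W`/`eig` methods are taken from this page:

```python
import torch
from linear import SVDLinear  # import path assumed from the class prefixes below

# Construct exactly like torch.nn.Linear, with extra constraint arguments.
layer = SVDLinear(64, 64, bias=True, sigma_min=0.5, sigma_max=1.0)

x = torch.randn(8, 64)       # [batchsize, insize]
y = layer(x)                 # [batchsize, outsize], same call convention as nn.Linear
W = layer.effective_W()      # [insize, outsize] matrix realized by the parametrization
evals = layer.eig()          # eigenvalues of the effective map
```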
Todo
Generalize to batch matrix multiplication for arbitrary N-dimensional tensors
Additional linear parametrizations:
- A strictly diagonally dominant matrix is non-singular:
- Hamiltonian matrix:
- Regular split: \(A = B - C\) is a regular splitting of \(A\) if \(B^{-1} \geq 0\) and \(C \geq 0\):
PyTorch weight initializations used in this module:
torch.nn.init.xavier_normal_(tensor, gain=1.0)
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
torch.nn.init.orthogonal_(tensor, gain=1)
torch.nn.init.sparse_(tensor, sparsity, std=0.01)
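Each of these initializers mutates its tensor argument in place; a short demonstration with the defaults listed above:

```python
import torch

w = torch.empty(5, 3)
torch.nn.init.xavier_normal_(w, gain=1.0)
torch.nn.init.kaiming_normal_(w, a=0, mode='fan_in', nonlinearity='leaky_relu')
torch.nn.init.orthogonal_(w, gain=1)
torch.nn.init.sparse_(w, sparsity=0.1, std=0.01)  # sparsity: fraction of entries zeroed per column
```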
- class linear.BoundedNormLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, p=2, **kwargs)[source]¶
\(\sigma_{min} \leq \|A\|_p \leq \sigma_{max}\), where p is the type of the matrix norm, sigma_min is the minimum allowed value of the norm, and sigma_max is the maximum allowed value of the norm.
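A sketch of one way to realize such a bound, rescaling the weight so its norm is clamped into the interval; this illustrates the constraint, not necessarily the class's internal parametrization:

```python
import torch

def bounded_norm_W(W, sigma_min=0.1, sigma_max=1.0, p=2):
    norm = torch.linalg.matrix_norm(W, ord=p)            # ||W||_p
    return W * torch.clamp(norm, sigma_min, sigma_max) / norm
```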
- class linear.ButterflyLinear(insize, outsize, bias=False, complex=False, tied_weight=True, increasing_stride=True, ortho_init=False, **kwargs)[source]¶
Sparse structured linear maps from: https://github.com/HazyResearch/learning-circuits
- class linear.DampedSkewSymmetricLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=0.5, **kwargs)[source]¶
Skew-symmetric linear map with damping.
- class linear.GershgorinLinear(insize, outsize, bias=False, sigma_min=0.0, sigma_max=1.0, real=True, **kwargs)[source]¶
Uses the Gershgorin disc parametrization to constrain the eigenvalues of the matrix. See:
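The Gershgorin circle theorem places every eigenvalue of \(A\) in a disc centered at \(A_{ii}\) with radius \(\sum_{j \neq i} |A_{ij}|\). A sketch of how that pins the eigenvalues into [sigma_min, sigma_max] (illustrative, not the class's exact construction):

```python
import torch

def gershgorin_W(M, sigma_min=0.0, sigma_max=1.0):
    n = M.shape[0]
    center = (sigma_min + sigma_max) / 2
    radius = (sigma_max - sigma_min) / 2
    eye = torch.eye(n)
    off = M * (1 - eye)                                      # off-diagonal part
    row_sums = off.abs().sum(dim=1, keepdim=True)
    off = off * radius / torch.clamp(row_sums, min=radius)   # keep disc radii <= radius
    return center * eye + off                                # center every disc at the midpoint
```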
- linear.Hprod(x, u, k)[source]¶
Helper function for computing a matrix multiply via the Householder reflection representation.
- Parameters:
x – (torch.Tensor, shape=[batchsize, dimension])
u – (torch.Tensor, shape=[dimension])
k – (int)
- Returns:
(torch.Tensor, shape=[batchsize, dimension])
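For reference, a Householder reflection is \(H = I - 2 u u^T / (u^T u)\), so \(Hx\) can be computed without forming the full matrix. A minimal batched sketch (the k argument of Hprod, which presumably restricts the reflection to a subvector, is omitted here):

```python
import torch

def householder_apply(x, u):
    """Apply H = I - 2*u*u^T/(u^T u) to each row of x.

    x: [batchsize, dimension], u: [dimension]
    """
    alpha = 2.0 * (x @ u) / (u @ u)      # [batchsize] projection coefficients
    return x - alpha.unsqueeze(1) * u    # reflect across the hyperplane orthogonal to u
```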
- class linear.IdentityGradReLU(*args, **kwargs)[source]¶
We can implement our own custom autograd Functions by subclassing torch.autograd.Function and implementing the forward and backward passes which operate on Tensors.
- static backward(ctx, grad_output)[source]¶
In the backward pass we receive a Tensor containing the gradient of the loss with respect to the output, and we need to compute the gradient of the loss with respect to the input. Here we simply pass the incoming gradient through unchanged, since we want the gradient of this max operation to be that of the identity.
- static forward(ctx, input)[source]¶
In the forward pass we receive a Tensor containing the input and return a Tensor containing the output. ctx is a context object that can be used to stash information for backward computation. You can cache arbitrary objects for use in the backward pass using the ctx.save_for_backward method.
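A minimal sketch matching this description: ReLU on the forward pass, with the incoming gradient passed through unchanged on the backward pass (a straight-through estimator; the class name here is illustrative):

```python
import torch

class StraightThroughReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return input.clamp(min=0)        # max(0, x); nothing needs to be stashed in ctx

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output               # identity gradient, ignoring the clamp

x = torch.randn(4, requires_grad=True)
StraightThroughReLU.apply(x).sum().backward()
print(x.grad)                            # all ones, even where x < 0
```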
- class linear.IdentityInitLinear(insize, outsize, bias=False, **kwargs)[source]¶
Linear map initialized to Identity matrix.
- class linear.IdentityLinear(insize, outsize, bias=False, **kwargs)[source]¶
Identity operation compatible with all LinearBase functionality.
- class linear.L0Linear(insize, outsize, bias=True, weight_decay=1.0, droprate_init=0.5, temperature=0.6666666666666666, lamda=1.0)[source]¶
Implementation of L0 regularization for the input units of a fully connected layer
Reference implementation: https://github.com/AMLab-Amsterdam/L0_regularization/blob/master/l0_layers.py
Note
This implementation may need adjustment: the same gate sample is applied to every input in the minibatch, which may inhibit convergence, and a new sample is drawn on each call during training, which may cause issues when the layer is used in a recurrent computation (e.g., in a state space model). See the gate-sampling sketch after this entry.
- effective_W()[source]¶
The matrix used in the equivalent matrix multiplication for the parametrization
- Returns:
(torch.Tensor, shape=[insize, outsize]) Matrix used in matrix multiply
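A sketch of the hard concrete gate sampling behind this technique (Louizos et al., "Learning Sparse Neural Networks through L0 Regularization"), using the stretch constants from the reference implementation; names here are illustrative:

```python
import torch

def sample_gates(log_alpha, temperature=2.0 / 3.0, gamma=-0.1, zeta=1.1):
    u = torch.rand_like(log_alpha)                    # one sample per gate, shared by the minibatch
    s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / temperature)
    s = s * (zeta - gamma) + gamma                    # stretch to (gamma, zeta)
    return s.clamp(0.0, 1.0)                          # hard-clip so exact zeros and ones occur

# effective weights: gates mask the dense weight matrix
# W_eff = sample_gates(log_alpha) * W
```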
- class linear.LassoLinear(insize, outsize, bias=False, gamma=1.0, **kwargs)[source]¶
From https://leon.bottou.org/publications/pdf/compstat-2010.pdf (see the sketch after this entry).
- effective_W()[source]¶
The matrix used in the equivalent matrix multiplication for the parametrization
- Returns:
(torch.Tensor, shape=[insize, outsize]) Matrix used in matrix multiply
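A sketch of the split-weight trick from the reference above: the weight is the difference of two nonnegative halves, and the penalty gamma * sum(W_plus + W_minus) plays the role of the L1 norm (names are illustrative):

```python
import torch

W_plus = torch.nn.Parameter(torch.rand(5, 3))
W_minus = torch.nn.Parameter(torch.rand(5, 3))

def effective_W():
    return torch.relu(W_plus) - torch.relu(W_minus)   # W = W+ - W-, both halves >= 0

def reg_error(gamma=1.0):
    return gamma * (torch.relu(W_plus) + torch.relu(W_minus)).sum()
```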
- class linear.LassoLinearRELU(insize, outsize, bias=False, gamma=1.0, **kwargs)[source]¶
From https://leon.bottou.org/publications/pdf/compstat-2010.pdf
- class linear.LeftStochasticLinear(insize, outsize, bias=False, **kwargs)[source]¶
A left stochastic matrix is a real square matrix with nonnegative entries and each column summing to 1.
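One simple way to realize this: a softmax over each column yields nonnegative entries with unit column sums (a sketch, not necessarily the class's construction):

```python
import torch

W = torch.randn(4, 4)
W_left = torch.softmax(W, dim=0)                       # each column sums to 1
assert torch.allclose(W_left.sum(dim=0), torch.ones(4))
```

A softmax over dim=1 gives the row-sum analogue used by RightStochasticLinear below.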
- class linear.Linear(insize, outsize, bias=False, **kwargs)[source]¶
Wrapper for torch.nn.Linear with additional slim methods returning the weight matrix, eigenvectors, eigenvalues, and regularization error.
- class linear.LinearBase(insize, outsize, bias=False, provide_weights=True)[source]¶
Base class defining the linear map interface (a minimal subclass sketch follows this entry).
- property device¶
- abstract effective_W()[source]¶
The matrix used in the equivalent matrix multiplication for the parametrization
- Returns:
(torch.Tensor, shape=[insize, outsize]) Matrix used in matrix multiply
- eig(eigenvectors=False)[source]¶
Returns the eigenvalues (optionally eigenvectors) of the linear map used in matrix multiplication.
- Parameters:
eigenvectors – (bool) Whether to return eigenvectors along with eigenvalues.
- Returns:
(torch.Tensor) Vector of eigenvalues, optionally a tuple including a matrix of eigenvectors.
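A minimal subclass-style sketch of this contract: a parametrization supplies effective_W(), and the forward pass and eig() are expressed in terms of it (this mimics the interface; the library's internals may differ):

```python
import torch

class SymmetricSketch(torch.nn.Module):
    def __init__(self, insize, outsize):
        super().__init__()
        assert insize == outsize, "this parametrization assumes a square map"
        self.weight = torch.nn.Parameter(torch.randn(insize, outsize))

    def effective_W(self):
        return (self.weight + self.weight.T) / 2       # A = A^T by construction

    def eig(self, eigenvectors=False):
        evals, evecs = torch.linalg.eig(self.effective_W())
        return (evals, evecs) if eigenvectors else evals

    def forward(self, x):
        return x @ self.effective_W()                  # [batchsize, insize] -> [batchsize, outsize]
```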
- class linear.NonNegativeLinear(insize, outsize, bias=False, **kwargs)[source]¶
Nonnegative parametrization of the linear map via ReLU.
- class linear.OrthogonalLinear(insize, outsize, bias=False, **kwargs)[source]¶
Orthogonal parametrization via Householder reflections.
- class linear.PSDLinear(insize, outsize, bias=False, **kwargs)[source]¶
Symmetric positive semi-definite matrix.
- class linear.PerronFrobeniusLinear(insize, outsize, bias=False, sigma_min=0.8, sigma_max=1.0, **kwargs)[source]¶
Parametrization that bounds the dominant eigenvalue of the nonnegative weight matrix between sigma_min and sigma_max via the Perron-Frobenius theorem.
- class linear.RightStochasticLinear(insize, outsize, bias=False, **kwargs)[source]¶
A right stochastic matrix is a real square matrix with nonnegative entries and each row summing to 1.
- class linear.SVDLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
Linear map with constrained singular values via approximate SVD factorization: a soft SVD-based regularization of the matrix \(A\). \(A = U \Sigma V\), where \(U, V\) are unitary matrices (orthogonal for real \(A\)) and \(\Sigma\) is a diagonal matrix of singular values (the square roots of the eigenvalues of \(A^T A\)). A sketch of this factorization follows this entry.
The paper below uses the same factorization and orthogonality constraint as implemented here, but enforces a low-rank prior on the map by introducing a sparse prior on the singular values:
The following applies a regularization on the factors similar to our implementation:
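A sketch of the soft SVD idea referenced in this entry: squash the singular values into [sigma_min, sigma_max] with a sigmoid and penalize the factors toward orthogonality (illustrative; the class's exact penalty may differ):

```python
import torch

def svd_effective_W(U, s, V, sigma_min=0.1, sigma_max=1.0):
    sigma = sigma_min + (sigma_max - sigma_min) * torch.sigmoid(s)
    return U @ torch.diag(sigma) @ V                   # A = U Sigma V

def orthogonality_penalty(M):
    eye = torch.eye(M.shape[0], device=M.device)
    return torch.linalg.matrix_norm(eye - M @ M.T)     # zero iff square M is orthogonal
```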
- class linear.SVDLinearLearnBounds(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SVDLinear variant in which the singular value bounds are learned rather than fixed.
- class linear.SchurDecompositionLinear(insize, outsize, bias=False, l2=0.01, **kwargs)[source]¶
Linear map parametrized via a Schur decomposition.
- class linear.SkewSymmetricLinear(insize, outsize, bias=False, **kwargs)[source]¶
A skew-symmetric (or antisymmetric) matrix \(A\) (effective_W) is a square matrix whose transpose equals its negative: \(A = -A^T\).
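A one-line construction satisfying this property (the eigenvalues of such a map are purely imaginary):

```python
import torch

W = torch.randn(4, 4)
A = W - W.T                                            # A = -A^T for any W
assert torch.allclose(A, -A.T)
```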
- class linear.SpectralLinear(insize, outsize, bias=False, n_U_reflectors=None, n_V_reflectors=None, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SVD-parameterized linear map of the form \(U \Sigma V\) via Householder reflections. Singular values can be constrained to a range. Translated from TensorFlow code:
- class linear.SplitLinear(insize, outsize, bias=False, **kwargs)[source]¶
\(A = B - C\), with \(B \geq 0\) and \(C \geq 0\).
- class linear.SquareLinear(insize, outsize, bias=False, provide_weights=True, **kwargs)[source]¶
Base class for linear map parametrizations that assume a square matrix.
- class linear.StableSplitLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
\(A = B - C\), with stable \(B\) and stable \(C\).
- class linear.SymmetricLinear(insize, outsize, bias=False, **kwargs)[source]¶
A symmetric matrix \(A\) (effective_W) is a square matrix that is equal to its transpose: \(A = A^T\).
- class linear.SymmetricSVDLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SVDLinear variant with tied factors: \(U = V\).
- class linear.SymmetricSpectralLinear(insize, outsize, bias=False, n_reflectors=None, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SpectralLinear variant with tied factors: \(U = V\).
- class linear.TrivialNullSpaceLinear(insize, outsize, bias=False, rank=None, epsilon=0.1, **kwargs)[source]¶
Matrix with trivial null space as defined via Eq. 2 in https://arxiv.org/abs/1808.00924.