Linear¶
Structured linear maps that are drop-in replacements for torch.nn.Linear.
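A minimal usage sketch, assuming the module is importable as `linear` (the prefix used in the entries below); the class, constructor arguments, and `effective_W`/`eig` methods are taken from this page:

```python
import torch
from linear import SVDLinear  # import path assumed from the class prefixes below

# Construct exactly like torch.nn.Linear, with extra constraint arguments.
layer = SVDLinear(64, 64, bias=True, sigma_min=0.5, sigma_max=1.0)

x = torch.randn(8, 64)       # [batchsize, insize]
y = layer(x)                 # [batchsize, outsize], same call convention as nn.Linear
W = layer.effective_W()      # [insize, outsize] matrix realized by the parametrization
evals = layer.eig()          # eigenvalues of the effective map
```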
Todo
Generalize to batch matrix multiplication for arbitrary N-dimensional tensors
Additional linear parametrizations:
- A strictly diagonally dominant matrix is non-singular:
- Hamiltonian matrix:
- Regular split: \(A = B - C\) is a regular splitting of \(A\) if \(B^{-1} \geq 0\) and \(C \geq 0\):
PyTorch weight initializations used in this module:
torch.nn.init.xavier_normal_(tensor, gain=1.0)
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
torch.nn.init.orthogonal_(tensor, gain=1)
torch.nn.init.sparse_(tensor, sparsity, std=0.01)
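Each of these initializers mutates its tensor argument in place; a short demonstration with the defaults listed above:

```python
import torch

w = torch.empty(5, 3)
torch.nn.init.xavier_normal_(w, gain=1.0)
torch.nn.init.kaiming_normal_(w, a=0, mode='fan_in', nonlinearity='leaky_relu')
torch.nn.init.orthogonal_(w, gain=1)
torch.nn.init.sparse_(w, sparsity=0.1, std=0.01)  # sparsity: fraction of entries zeroed per column
```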
- class linear.BoundedNormLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, p=2, **kwargs)[source]¶
\(\sigma_{min} \leq \|A\|_p \leq \sigma_{max}\), where p is the type of the matrix norm, sigma_min is the minimum allowed value of the norm, and sigma_max is the maximum allowed value of the norm.
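A sketch of one way to realize such a bound, rescaling the weight so its norm is clamped into the interval; this illustrates the constraint, not necessarily the class's internal parametrization:

```python
import torch

def bounded_norm_W(W, sigma_min=0.1, sigma_max=1.0, p=2):
    norm = torch.linalg.matrix_norm(W, ord=p)            # ||W||_p
    return W * torch.clamp(norm, sigma_min, sigma_max) / norm
```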
- class linear.ButterflyLinear(insize, outsize, bias=False, complex=False, tied_weight=True, increasing_stride=True, ortho_init=False, **kwargs)[source]¶
Sparse structured linear maps from: https://github.com/HazyResearch/learning-circuits
- class linear.DampedSkewSymmetricLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=0.5, **kwargs)[source]¶
Skew-symmetric linear map with damping.
- class linear.GershgorinLinear(insize, outsize, bias=False, sigma_min=0.0, sigma_max=1.0, real=True, **kwargs)[source]¶
Uses the Gershgorin disc parametrization to constrain the eigenvalues of the matrix. See:
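The Gershgorin circle theorem places every eigenvalue of \(A\) in a disc centered at \(A_{ii}\) with radius \(\sum_{j \neq i} |A_{ij}|\). A sketch of how that pins the eigenvalues into [sigma_min, sigma_max] (illustrative, not the class's exact construction):

```python
import torch

def gershgorin_W(M, sigma_min=0.0, sigma_max=1.0):
    n = M.shape[0]
    center = (sigma_min + sigma_max) / 2
    radius = (sigma_max - sigma_min) / 2
    eye = torch.eye(n)
    off = M * (1 - eye)                                      # off-diagonal part
    row_sums = off.abs().sum(dim=1, keepdim=True)
    off = off * radius / torch.clamp(row_sums, min=radius)   # keep disc radii <= radius
    return center * eye + off                                # center every disc at the midpoint
```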
- linear.Hprod(x, u, k)[source]¶
Helper function for computing a matrix multiply via the Householder reflection representation.
- Parameters:
x – (torch.Tensor, shape=[batchsize, dimension])
u – (torch.Tensor, shape=[dimension])
k – (int)
- Returns:
(torch.Tensor, shape=[batchsize, dimension])
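For reference, a Householder reflection is \(H = I - 2 u u^T / (u^T u)\), so \(Hx\) can be computed without forming the full matrix. A minimal batched sketch (the k argument of Hprod, which presumably restricts the reflection to a subvector, is omitted here):

```python
import torch

def householder_apply(x, u):
    """Apply H = I - 2*u*u^T/(u^T u) to each row of x.

    x: [batchsize, dimension], u: [dimension]
    """
    alpha = 2.0 * (x @ u) / (u @ u)      # [batchsize] projection coefficients
    return x - alpha.unsqueeze(1) * u    # reflect across the hyperplane orthogonal to u
```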
- class linear.IdentityGradReLU(*args, **kwargs)[source]¶
We can implement our own custom autograd Functions by subclassing torch.autograd.Function and implementing the forward and backward passes which operate on Tensors.
- static backward(ctx, grad_output)[source]¶
In the backward pass we receive a Tensor containing the gradient of the loss with respect to the output, and we need to compute the gradient of the loss with respect to the input. Here we simply pass the incoming gradient through unchanged, since we want the gradient of this max operation to be that of the identity.
- static forward(ctx, input)[source]¶
In the forward pass we receive a Tensor containing the input and return a Tensor containing the output. ctx is a context object that can be used to stash information for backward computation. You can cache arbitrary objects for use in the backward pass using the ctx.save_for_backward method.
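A minimal sketch matching this description: ReLU on the forward pass, with the incoming gradient passed through unchanged on the backward pass (a straight-through estimator; the class name here is illustrative):

```python
import torch

class StraightThroughReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return input.clamp(min=0)        # max(0, x); nothing needs to be stashed in ctx

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output               # identity gradient, ignoring the clamp

x = torch.randn(4, requires_grad=True)
StraightThroughReLU.apply(x).sum().backward()
print(x.grad)                            # all ones, even where x < 0
```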
- class linear.IdentityInitLinear(insize, outsize, bias=False, **kwargs)[source]¶
Linear map initialized to Identity matrix.
- class linear.IdentityLinear(insize, outsize, bias=False, **kwargs)[source]¶
Identity operation compatible with all LinearBase functionality.
- class linear.L0Linear(insize, outsize, bias=True, weight_decay=1.0, droprate_init=0.5, temperature=0.6666666666666666, lamda=1.0)[source]¶
Implementation of L0 regularization for the input units of a fully connected layer
Reference implementation: https://github.com/AMLab-Amsterdam/L0_regularization/blob/master/l0_layers.py
Note
This implementation may need adjustment: the same gate sample is applied to every input in the minibatch, which may inhibit convergence, and a new sample is drawn on each call during training, which may cause issues when the layer is used in a recurrent computation (e.g., in a state space model). See the gate-sampling sketch after this entry.
- effective_W()[source]¶
The matrix used in the equivalent matrix multiplication for the parametrization
- Returns:
(torch.Tensor, shape=[insize, outsize]) Matrix used in matrix multiply
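A sketch of the hard concrete gate sampling behind this technique (Louizos et al., "Learning Sparse Neural Networks through L0 Regularization"), using the stretch constants from the reference implementation; names here are illustrative:

```python
import torch

def sample_gates(log_alpha, temperature=2.0 / 3.0, gamma=-0.1, zeta=1.1):
    u = torch.rand_like(log_alpha)                    # one sample per gate, shared by the minibatch
    s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / temperature)
    s = s * (zeta - gamma) + gamma                    # stretch to (gamma, zeta)
    return s.clamp(0.0, 1.0)                          # hard-clip so exact zeros and ones occur

# effective weights: gates mask the dense weight matrix
# W_eff = sample_gates(log_alpha) * W
```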
- class linear.LassoLinear(insize, outsize, bias=False, gamma=1.0, **kwargs)[source]¶
From https://leon.bottou.org/publications/pdf/compstat-2010.pdf (see the sketch after this entry).
- effective_W()[source]¶
The matrix used in the equivalent matrix multiplication for the parametrization
- Returns:
(torch.Tensor, shape=[insize, outsize]) Matrix used in matrix multiply
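A sketch of the split-weight trick from the reference above: the weight is the difference of two nonnegative halves, and the penalty gamma * sum(W_plus + W_minus) plays the role of the L1 norm (names are illustrative):

```python
import torch

W_plus = torch.nn.Parameter(torch.rand(5, 3))
W_minus = torch.nn.Parameter(torch.rand(5, 3))

def effective_W():
    return torch.relu(W_plus) - torch.relu(W_minus)   # W = W+ - W-, both halves >= 0

def reg_error(gamma=1.0):
    return gamma * (torch.relu(W_plus) + torch.relu(W_minus)).sum()
```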
- class linear.LassoLinearRELU(insize, outsize, bias=False, gamma=1.0, **kwargs)[source]¶
From https://leon.bottou.org/publications/pdf/compstat-2010.pdf
- class linear.LeftStochasticLinear(insize, outsize, bias=False, **kwargs)[source]¶
A left stochastic matrix is a real square matrix with nonnegative entries and each column summing to 1.
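One simple way to realize this: a softmax over each column yields nonnegative entries with unit column sums (a sketch, not necessarily the class's construction):

```python
import torch

W = torch.randn(4, 4)
W_left = torch.softmax(W, dim=0)                       # each column sums to 1
assert torch.allclose(W_left.sum(dim=0), torch.ones(4))
```

A softmax over dim=1 gives the row-sum analogue used by RightStochasticLinear below.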
- class linear.Linear(insize, outsize, bias=False, **kwargs)[source]¶
Wrapper for torch.nn.Linear with additional slim methods returning the weight matrix, eigenvectors, eigenvalues, and regularization error.
- class linear.LinearBase(insize, outsize, bias=False, provide_weights=True)[source]¶
Base class defining the linear map interface (a minimal subclass sketch follows this entry).
- property device¶
- abstract effective_W()[source]¶
The matrix used in the equivalent matrix multiplication for the parametrization
- Returns:
(torch.Tensor, shape=[insize, outsize]) Matrix used in matrix multiply
- eig(eigenvectors=False)[source]¶
Returns the eigenvalues (optionally eigenvectors) of the linear map used in matrix multiplication.
- Parameters:
eigenvectors – (bool) Whether to return eigenvectors along with eigenvalues.
- Returns:
(torch.Tensor) Vector of eigenvalues, optionally a tuple including a matrix of eigenvectors.
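A minimal subclass-style sketch of this contract: a parametrization supplies effective_W(), and the forward pass and eig() are expressed in terms of it (this mimics the interface; the library's internals may differ):

```python
import torch

class SymmetricSketch(torch.nn.Module):
    def __init__(self, insize, outsize):
        super().__init__()
        assert insize == outsize, "this parametrization assumes a square map"
        self.weight = torch.nn.Parameter(torch.randn(insize, outsize))

    def effective_W(self):
        return (self.weight + self.weight.T) / 2       # A = A^T by construction

    def eig(self, eigenvectors=False):
        evals, evecs = torch.linalg.eig(self.effective_W())
        return (evals, evecs) if eigenvectors else evals

    def forward(self, x):
        return x @ self.effective_W()                  # [batchsize, insize] -> [batchsize, outsize]
```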
- class linear.NonNegativeLinear(insize, outsize, bias=False, **kwargs)[source]¶
Nonnegative parametrization of the linear map via ReLU.
- class linear.OrthogonalLinear(insize, outsize, bias=False, **kwargs)[source]¶
Orthogonal parametrization via Householder reflections.
- class linear.PSDLinear(insize, outsize, bias=False, **kwargs)[source]¶
Symmetric positive semi-definite matrix.
- class linear.PerronFrobeniusLinear(insize, outsize, bias=False, sigma_min=0.8, sigma_max=1.0, **kwargs)[source]¶
Parametrization that bounds the dominant eigenvalue of the nonnegative weight matrix between sigma_min and sigma_max via the Perron-Frobenius theorem.
- class linear.RightStochasticLinear(insize, outsize, bias=False, **kwargs)[source]¶
A right stochastic matrix is a real square matrix with nonnegative entries and each row summing to 1.
- class linear.SVDLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
Linear map with constrained singular values via approximate SVD factorization: a soft SVD-based regularization of the matrix \(A\). \(A = U \Sigma V\), where \(U, V\) are unitary matrices (orthogonal for real \(A\)) and \(\Sigma\) is a diagonal matrix of singular values (the square roots of the eigenvalues of \(A^T A\)). A sketch of this factorization follows this entry.
The paper below uses the same factorization and orthogonality constraint as implemented here, but enforces a low-rank prior on the map by introducing a sparse prior on the singular values:
The following applies a regularization on the factors similar to our implementation:
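A sketch of the soft SVD idea referenced in this entry: squash the singular values into [sigma_min, sigma_max] with a sigmoid and penalize the factors toward orthogonality (illustrative; the class's exact penalty may differ):

```python
import torch

def svd_effective_W(U, s, V, sigma_min=0.1, sigma_max=1.0):
    sigma = sigma_min + (sigma_max - sigma_min) * torch.sigmoid(s)
    return U @ torch.diag(sigma) @ V                   # A = U Sigma V

def orthogonality_penalty(M):
    eye = torch.eye(M.shape[0], device=M.device)
    return torch.linalg.matrix_norm(eye - M @ M.T)     # zero iff square M is orthogonal
```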
- class linear.SVDLinearLearnBounds(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SVDLinear variant in which the singular value bounds are learned rather than fixed.
- class linear.SchurDecompositionLinear(insize, outsize, bias=False, l2=0.01, **kwargs)[source]¶
Linear map parametrized via a Schur decomposition.
- class linear.SkewSymmetricLinear(insize, outsize, bias=False, **kwargs)[source]¶
A skew-symmetric (or antisymmetric) matrix \(A\) (effective_W) is a square matrix whose transpose equals its negative: \(A = -A^T\).
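A one-line construction satisfying this property (the eigenvalues of such a map are purely imaginary):

```python
import torch

W = torch.randn(4, 4)
A = W - W.T                                            # A = -A^T for any W
assert torch.allclose(A, -A.T)
```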
- class linear.SpectralLinear(insize, outsize, bias=False, n_U_reflectors=None, n_V_reflectors=None, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SVD-parameterized linear map of the form \(U \Sigma V\) via Householder reflections. Singular values can be constrained to a range. Translated from TensorFlow code:
- class linear.SplitLinear(insize, outsize, bias=False, **kwargs)[source]¶
\(A = B - C\), with \(B \geq 0\) and \(C \geq 0\).
- class linear.SquareLinear(insize, outsize, bias=False, provide_weights=True, **kwargs)[source]¶
Base class for linear map parametrizations that assume a square matrix.
- class linear.StableSplitLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
\(A = B - C\), with stable \(B\) and stable \(C\).
- class linear.SymmetricLinear(insize, outsize, bias=False, **kwargs)[source]¶
A symmetric matrix \(A\) (effective_W) is a square matrix that is equal to its transpose: \(A = A^T\).
- class linear.SymmetricSVDLinear(insize, outsize, bias=False, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SVDLinear variant with tied factors: \(U = V\).
- class linear.SymmetricSpectralLinear(insize, outsize, bias=False, n_reflectors=None, sigma_min=0.1, sigma_max=1.0, **kwargs)[source]¶
SpectralLinear variant with tied factors: \(U = V\).
- class linear.TrivialNullSpaceLinear(insize, outsize, bias=False, rank=None, epsilon=0.1, **kwargs)[source]¶
Matrix with trivial null space as defined via Eq. 2 in https://arxiv.org/abs/1808.00924.