Abstract
Several key contributions to the design and analysis of neural networks have come from viewing recurrent and residual networks as dynamical systems. Prominent examples are guarantees on the stability of deep residual and recurrent networks obtained by constraining the eigenvalues of a network's linear maps through orthogonal, spectral, or symplectic parametrizations. For specific applications, intuition or domain knowledge about the system dynamics can also inform the design of custom linear maps, such as the Perron-Frobenius parametrization presented in Constrained Neural Ordinary Differential Equations with Stability Guarantees by Tuor et al.

Our design space includes five structured linear map parametrizations that can incorporate strong inductive bias as well as guarantees suitable for a wide range of known systems models. Our intent is for the genetic algorithm to discover the linear map parametrizations most suitable for data-efficient learning of systems models that yield physically plausible predictions for the system under investigation.

As a basic sparsity-inducing prior we employ a Lasso variant implemented with gradient descent, as described in Large Scale Machine Learning with Stochastic Gradient Descent by Bottou. An extensive family of learnable structured sparse linear maps, which includes Fourier transforms and correlation operators, can be parametrized as a product of butterfly matrices, the so-called kaleidoscope (K) matrices introduced in Kaleidoscope: An Efficient, Learnable Representation for All Structured Linear Maps by Dao et al.

Our library of structured linear map priors also contains two parametrizations that constrain the matrix singular values: the spectral factorization presented in Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization by Zhang et al., and a faster approximate singular value decomposition (SVD) based parametrization that we introduce here. Both methods parametrize the matrix as a product UΣV. Zhang's method parametrizes U and V as orthogonal matrices via Householder reflections. Our method initializes U and V as random orthogonal matrices and adds a regularization term that enforces orthogonality as the parameters are updated. For both factorizations, bounds min and max are placed on the nonzero elements of the diagonal matrix Σ, where

Σ = max·I − diag((max − min)·σ(p)),

with p a randomly initialized vector and σ the elementwise logistic sigmoid, so that each diagonal entry of Σ lies between min and max.

The final structured parametrization in the linear map library, Perron-Frobenius (PF), bounds the dominant eigenvalue of the matrix, which can guarantee stability of the learned system as well as global dynamic properties such as dissipation. Figure 2 plots the eigenvalues and matrix heat maps of linear state transition maps learned using the six parametrizations in our library.
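As an illustration, the following is a minimal PyTorch sketch (with hypothetical class and parameter names) of the approximate SVD parametrization described above: U and V are initialized as random orthogonal matrices, a penalty term drives them back toward orthogonality during training, and the diagonal of Σ is confined to (min, max) through the logistic sigmoid.

import torch
import torch.nn as nn

class ApproxSVDLinear(nn.Module):
    """Sketch of the approximate SVD parametrization: weight = U @ Sigma @ V."""

    def __init__(self, n, sigma_min=0.1, sigma_max=1.0):
        super().__init__()
        self.n, self.sigma_min, self.sigma_max = n, sigma_min, sigma_max
        # U and V are initialized as random orthogonal matrices via QR.
        self.U = nn.Parameter(torch.linalg.qr(torch.randn(n, n))[0])
        self.V = nn.Parameter(torch.linalg.qr(torch.randn(n, n))[0])
        # p parametrizes the singular values through the logistic sigmoid.
        self.p = nn.Parameter(torch.randn(n))

    def Sigma(self):
        # Sigma = max*I - diag((max - min) * sigmoid(p)); entries lie in (min, max).
        s = self.sigma_max - (self.sigma_max - self.sigma_min) * torch.sigmoid(self.p)
        return torch.diag(s)

    def weight(self):
        return self.U @ self.Sigma() @ self.V

    def orthogonality_penalty(self):
        # Regularization term that keeps U and V near-orthogonal as they are updated.
        I = torch.eye(self.n)
        return ((self.U.T @ self.U - I) ** 2).sum() + ((self.V.T @ self.V - I) ** 2).sum()

    def forward(self, x):
        return x @ self.weight().T

In training, orthogonality_penalty() would be added to the task loss with a user-chosen weighting coefficient.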
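Similarly, a minimal sketch of a Perron-Frobenius style parametrization is shown below, assuming one common construction in which the weight is nonnegative and its row sums are bounded; by the Perron-Frobenius theorem, the dominant eigenvalue of a nonnegative matrix is bounded by its largest row sum, which in turn bounds the spectral radius of the learned state transition map. The published parametrization may differ in detail.

import torch
import torch.nn as nn

class PerronFrobeniusLinear(nn.Module):
    """Sketch: nonnegative weight with row sums bounded in [lam_min, lam_max]."""

    def __init__(self, n, lam_min=0.0, lam_max=1.0):
        super().__init__()
        self.lam_min, self.lam_max = lam_min, lam_max
        self.M = nn.Parameter(torch.randn(n, n))      # unconstrained logits
        self.scale = nn.Parameter(torch.randn(n, 1))  # per-row eigenvalue bound

    def weight(self):
        # Each row of softmax(M) sums to one; scaling row i by lam_i in
        # [lam_min, lam_max] places the row sums, and hence the dominant
        # eigenvalue, within that interval.
        lam = self.lam_max - (self.lam_max - self.lam_min) * torch.sigmoid(self.scale)
        return lam * torch.softmax(self.M, dim=1)

    def forward(self, x):
        return x @ self.weight().T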
Exploratory License
Not eligible for exploratory license
Market Sector
Data Sciences