Cisse et al. propose parseval networks, deep neural networks regularized to learn orthonormal weight matrices. Similar to the work by Hein et al. [1], the mean idea is to constrain the Lipschitz constant of the network – which essentially means constraining the Lipschitz constants of each layer independently. For weight matrices, this can be achieved by constraining the matrix-norm. However, this (depending on the norm used) is often intractable during gradient descent training. Therefore, Cisse et al. propose to use a per-layer regularizer of the form:
$R(W) = \|W^TW – I\|$
where $I$ is the identity matrix. During training, this regularizer is supposed to ensure that the learned weigth matrices are orthonormal – an efficient alternative to regular matrix manifold optimization techniques (see the paper).
[1] Matthias Hein, Maksym Andriushchenko: Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation. CoRR abs/1705.08475 (2017)
Also see this summary at [davidstutz.de](https://davidstutz.de/category/reading/).