On orthogonality and learning recurrent networks with long term dependencies
Eugene Vorontsov, Chiheb Trabelsi, Samuel Kadoury and Chris Pal
arXiv e-Print archive - 2017
Keywords:
cs.LG, cs.NE
First published: 2017/01/31
Abstract: It is well known that it is challenging to train deep neural networks and
recurrent neural networks for tasks that exhibit long term dependencies. The
vanishing or exploding gradient problem is a well known issue associated with
these challenges. One approach to addressing vanishing and exploding gradients
is to use either soft or hard constraints on weight matrices so as to encourage
or enforce orthogonality. Orthogonal matrices preserve gradient norm during
backpropagation, so orthogonality can be a desirable property; however, we find
that hard constraints on orthogonality can negatively affect the speed of
convergence and model performance. This paper explores the issues of
optimization convergence, speed and gradient stability using a variety of
different methods for encouraging or enforcing orthogonality. In particular we
propose a weight matrix factorization and parameterization strategy through
which we can bound matrix norms and thereby control the degree of expansivity
induced during backpropagation.
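
The abstract's central idea, bounding how much a weight matrix can stretch or shrink gradient norms, can be illustrated with a small factorization sketch. The snippet below is a minimal NumPy illustration, not the paper's implementation: the function name, the sigmoid mapping of free parameters to singular values, and the QR-based initialization of the orthogonal factors are all assumptions made for the example. It composes W = U diag(s) V^T from two orthogonal factors and singular values squashed into [1 - margin, 1 + margin], so margin = 0 recovers an exactly orthogonal matrix while a small positive margin permits limited expansion or contraction of gradient norms.

```python
import numpy as np

def make_factored_weight(n, margin=0.1, rng=None):
    """Sketch of a factorized weight parameterization W = U diag(s) V^T,
    where U and V are orthogonal and every singular value s_i is confined
    to the band [1 - margin, 1 + margin].  Illustrative only: the paper
    maintains U and V with geodesic updates during training, which this
    sketch omits."""
    rng = np.random.default_rng() if rng is None else rng
    # Orthogonal factors obtained here by QR decomposition of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Free parameters p are mapped through a sigmoid so the resulting
    # singular values cannot leave [1 - margin, 1 + margin].
    p = rng.standard_normal(n)
    s = 2.0 * margin * (1.0 / (1.0 + np.exp(-p)) - 0.5) + 1.0
    return U @ np.diag(s) @ V.T

W = make_factored_weight(4, margin=0.05)
print(np.linalg.svd(W, compute_uv=False))  # all singular values lie in [0.95, 1.05]
```

Because the singular values of W are exactly the entries of s, the margin directly bounds how far W can deviate from norm preservation, which is the sense in which the parameterization controls expansivity during backpropagation.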