First published: 2017/01/19

Abstract: We explore a recently proposed variational dropout technique that
provided an elegant Bayesian interpretation of dropout. We extend variational
dropout to the case when the dropout rate is unknown and show that it can be
found by optimizing the variational lower bound on the evidence. We show that
it is possible to assign an individual dropout rate to each connection in a
deep neural network (DNN) and to learn these rates.
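To make "one dropout rate per connection" concrete, here is a minimal NumPy sketch. It assumes the Gaussian form of variational dropout, in which each weight is w_ij = theta_ij * (1 + sqrt(alpha_ij) * eps_ij) with eps_ij ~ N(0, 1), and alpha relates to a Bernoulli dropout rate through p = alpha / (1 + alpha). The function names `dense_forward` and `alpha_to_dropout_rate`, and the use of the local reparameterization trick (sampling pre-activations instead of weights), are illustrative assumptions rather than details given in the abstract.

```python
import numpy as np

def alpha_to_dropout_rate(alpha):
    """Gaussian-dropout noise variance alpha <-> Bernoulli rate p = alpha / (1 + alpha)."""
    return alpha / (1.0 + alpha)

def dense_forward(x, theta, log_alpha, rng):
    """One stochastic forward pass of a fully-connected layer in which every
    weight w_ij = theta_ij * (1 + sqrt(alpha_ij) * eps_ij) has its own alpha_ij.

    Local reparameterization (an assumption of this sketch): because the
    weights are independent Gaussians, each pre-activation is Gaussian too,
    so we sample it directly from its mean and variance.

    x: (batch, n_in); theta, log_alpha: (n_in, n_out).
    """
    alpha = np.exp(log_alpha)
    mean = x @ theta                          # E[x @ w]
    var = (x ** 2) @ (alpha * theta ** 2)     # Var[x @ w] = sum_i x_i^2 * alpha_ij * theta_ij^2
    eps = rng.standard_normal(mean.shape)
    return mean + np.sqrt(var) * eps

rng = np.random.default_rng(0)
x = np.ones((2, 3))
theta = np.ones((3, 4))
# Very small alpha everywhere: the layer is almost deterministic.
out = dense_forward(x, theta, np.full((3, 4), -10.0), rng)
```

Raising log_alpha for a single connection makes that connection's contribution noisier, which is exactly what a learned per-connection dropout rate controls; in training, such sampled activations would feed the expected log-likelihood term of the lower bound.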
Interestingly, such an assignment leads to extremely sparse solutions in both
fully-connected and convolutional layers. This effect is similar to the
automatic relevance determination (ARD) effect in empirical Bayes but has a
number of advantages.
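One way to see why sparsity emerges is through the regularizer in the lower bound. The sketch below assumes a log-uniform prior on the weights and uses a sigmoid-based approximation of the per-weight negative KL divergence; the constants K1, K2, K3 are the values we recall from later variational-dropout work and should be treated as an assumption to verify. The approximation increases with log_alpha and tends to 0 as log_alpha grows, so the bound rewards large noise on any weight the data does not need, an ARD-like pressure. The pruning threshold of 3.0 in the hypothetical helper `keep_mask` is likewise an illustrative choice.

```python
import numpy as np

# ASSUMPTION: constants of a published sigmoid fit to -KL(q || log-uniform prior).
K1, K2, K3 = 0.63576, 1.87320, 1.48695

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_kl_approx(log_alpha):
    """Approximate -KL for each weight; increasing in log_alpha, -> 0 as log_alpha -> +inf."""
    return (K1 * sigmoid(K2 + K3 * log_alpha)
            - 0.5 * np.logaddexp(0.0, -log_alpha)   # 0.5 * log(1 + 1/alpha), numerically stable
            - K1)                                   # constant chosen so the limit is 0

def keep_mask(log_alpha, threshold=3.0):
    """Hypothetical pruning rule: drop weights whose multiplicative noise dominates."""
    return log_alpha < threshold

log_alpha = np.array([-4.0, 0.0, 4.0, 8.0])
mask = keep_mask(log_alpha)   # weights with log_alpha >= 3 are treated as zero
```

Maximizing the expected log-likelihood plus the sum of `neg_kl_approx` over all weights therefore trades data fit against driving individual alphas upward; weights whose alpha ends up large carry mostly noise and can be removed.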
We report up to 128-fold compression of popular architectures without a
significant loss of accuracy, providing further evidence that modern deep
architectures are highly redundant.
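The abstract does not define how compression is measured. A common reading, assumed here, is the total number of weights divided by the number of weights that survive pruning, so 128-fold compression means only 1/128, or about 0.78%, of the weights remain. The helper `compression_ratio` is hypothetical and ignores any storage overhead of sparse formats.

```python
import numpy as np

def compression_ratio(masks):
    """Total weights across all layers divided by the number of kept weights.

    masks: list of boolean arrays, True where a weight is kept.
    """
    total = sum(m.size for m in masks)
    kept = sum(int(m.sum()) for m in masks)
    if kept == 0:
        raise ValueError("no weights kept; ratio is undefined")
    return total / kept

# Hypothetical layer with 1024 weights, of which 8 survive pruning.
layer_mask = np.zeros(1024, dtype=bool)
layer_mask[:8] = True
ratio = compression_ratio([layer_mask])
```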