How Can We Be So Dense? The Benefits of Using Highly Sparse Representations
Ahmad, Subutai and Scheinkman, Luiz
arXiv e-Print archive - 2019 via Local Bibsonomy
Ahmad and Scheinkman propose a simple sparse layer in order to improve robustness against random noise. Specifically, considering a general linear network layer, i.e.
$\hat{y}^l = W^l y^{l-1} + b^l$ and $y^l = f(\hat{y}^l)$
where $f$ is an activation function, the weights are first initialized using a sparse distribution; then, the activation function (commonly ReLU) is replaced by a top-$k$ ReLU version where only the top-$k$ activations are propagated. In experiments, this is shown to improve robustness against random noise on MNIST.
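As a rough illustration, below is a minimal PyTorch sketch of such a layer, combining sparse weight initialization with a top-$k$ ReLU activation. The class name `SparseTopKLinear` and the default hyperparameters are made up for this summary; the paper's actual implementation (e.g., whether zeroed weights stay fixed during training) may differ.

```python
import torch
import torch.nn as nn


class SparseTopKLinear(nn.Module):
    """Sketch: linear layer with sparsely initialized weights followed by a
    top-k ReLU, i.e., only the k largest pre-activations are propagated."""

    def __init__(self, in_features, out_features, weight_density=0.5, k=50):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.k = k
        # Sparse initialization: keep roughly `weight_density` of the weights,
        # zero out the rest at random.
        with torch.no_grad():
            mask = (torch.rand_like(self.linear.weight) < weight_density).float()
            self.linear.weight *= mask

    def forward(self, x):
        y_hat = self.linear(x)                        # \hat{y}^l = W^l y^{l-1} + b^l
        topk = torch.topk(y_hat, self.k, dim=1)       # indices of the k largest units
        mask = torch.zeros_like(y_hat).scatter_(1, topk.indices, 1.0)
        return torch.relu(y_hat) * mask               # top-k ReLU: keep only the k winners
```

In a network, such a module would simply replace a standard `nn.Linear` plus ReLU pair.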
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).