Rotation equivariant vector field networks
Diego Marcos, Michele Volpi, Nikos Komodakis, Devis Tuia
arXiv e-Print archive - 2016 via Local arXiv
Keywords:
cs.CV
First published: 2016/12/29
Abstract: We propose a method to encode rotation equivariance or invariance into
convolutional neural networks (CNNs). Each convolutional filter is applied with
several orientations and returns a vector field that represents the magnitude
and angle of the highest scoring rotation at the given spatial location. To
propagate information about the main orientation of the different features to
each layer in the network, we propose an enriched orientation pooling, i.e. max
and argmax operators over the orientation space, allowing to keep the
dimensionality of the feature maps low and to propagate only useful
information. We name this approach RotEqNet. We apply RotEqNet to three
datasets: first, a rotation invariant classification problem, the MNIST-rot
benchmark, in which we improve over the state-of-the-art results. Then, a
neuron membrane segmentation benchmark, where we show that RotEqNet can be
applied successfully to obtain equivariance to rotation with a simple fully
convolutional architecture. Finally, we improve significantly the
state-of-the-art on the problem of estimating cars' absolute orientation in
aerial images, a problem where the output is required to be covariant with
respect to the object's orientation.
This work deals with rotation equivariant convolutional filters. The idea is that when you rotate an image you should not need to learn new filters to handle that rotation. First, we can compare how convolutions typically handle rotation with how we would expect a rotation invariant solution to behave:
| | |
| - | - |
| https://i.imgur.com/cirTi4S.png | https://i.imgur.com/iGpUZDC.png |
The method applies each filter at several discretized orientations, producing one activation per rotation. Taking the maximum over this list yields a two-dimensional output at every pixel: the magnitude of the strongest response and the rotation at which it occurred. Since this happens at every spatial location, the result is a vector field over the image.
https://i.imgur.com/BcnuI1d.png
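As a rough sketch of this rotating-filter convolution plus orientation max-pooling (not the authors' code; the function name, the use of scipy for rotating the kernel, and the choice of 8 orientations are my own assumptions):

```python
import numpy as np
from scipy.ndimage import convolve, rotate

def rotating_conv_vector_field(image, kernel, n_angles=8):
    """Apply one filter at several orientations and keep, at each pixel,
    the magnitude and angle of the strongest response, i.e. a vector field."""
    angles = np.linspace(0.0, 360.0, n_angles, endpoint=False)
    # One response map per filter orientation: shape (n_angles, H, W).
    responses = np.stack([
        convolve(image, rotate(kernel, a, reshape=False), mode="nearest")
        for a in angles
    ])
    magnitude = responses.max(axis=0)    # max over the orientation axis
    best = responses.argmax(axis=0)      # argmax over the orientation axis
    theta = np.deg2rad(angles[best])     # winning orientation at each pixel
    # Encode the (magnitude, angle) pair as two Cartesian components.
    return magnitude * np.cos(theta), magnitude * np.sin(theta)
```

Encoding the result as two channels (the Cartesian components of the vector field) is what lets the next convolutional layer consume both the magnitude and the orientation information with ordinary convolutions.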
We can visualize this orientation-selection step with a figure from https://arxiv.org/abs/1603.04392, which estimates the orientation of a building:
https://i.imgur.com/hPI8J6y.png
We can also think of this approach as a hard form of attention \cite{1409.0473}: the network attends over the candidate rotations, scores each one, and passes on only the winning orientation and its magnitude. In this way the network can learn to expose whatever orientation information the later layers need.
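To make the analogy concrete, here is a minimal sketch (my own, not part of RotEqNet) of what a soft-attention variant of the orientation pooling could look like; the function name and the temperature parameter are hypothetical:

```python
import numpy as np

def soft_orientation_pooling(responses, angles_deg, temperature=1.0):
    """Soft-attention counterpart of the hard max/argmax pooling above:
    every orientation contributes, weighted by a softmax over its response.
    responses: array of shape (n_angles, H, W)."""
    # Softmax over the orientation axis (shifted for numerical stability).
    shifted = responses - responses.max(axis=0, keepdims=True)
    weights = np.exp(shifted / temperature)
    weights /= weights.sum(axis=0, keepdims=True)
    theta = np.deg2rad(np.asarray(angles_deg))[:, None, None]
    # Expected orientation vector under the attention weights.
    u = (weights * responses * np.cos(theta)).sum(axis=0)
    v = (weights * responses * np.sin(theta)).sum(axis=0)
    return u, v
```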
------------------------
Results on [Rotated MNIST](http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations) show an impressive improvement in training speed and generalization error:
https://i.imgur.com/YO3poOO.png