Summary by Joseph Paul Cohen 7 years ago
"The SE module can learn some nonlinear global interactions already known to be useful, such as spatial normalization. The channel wise weights make it somewhat more powerful than divisive normalization as it can learn feature-specific inhibitions (ie: if we see a lot of flower parts, the probability of boat features should be diminished). It also has some similarity to bio inhibitory circuits." By jcannell on reddit
Slides: http://image-net.org/challenges/talks_2017/SENet.pdf
Summary by the author Jie Hu:
Our motivation is to explicitly model the interdependencies between feature channels. We do not introduce a new spatial dimension for combining feature channels; instead we propose a new "feature recalibration" strategy. Specifically, the network learns to automatically assess the importance of each feature channel, and then uses this importance to enhance useful features and suppress features that are less useful for the current task.
https://i.imgur.com/vXyBg4j.png
The figure above is a schematic diagram of our proposed SE module. Given an input $x$ with $c_1$ feature channels, a series of general transformations such as convolutions produces features with $c_2$ channels. Unlike traditional CNNs, we then recalibrate the resulting features through the following three operations.
The first is the Squeeze operation: we compress the features along the spatial dimensions, turning each two-dimensional feature map into a single real number. This real number has a global receptive field, and the output dimension matches the number of input channels. It characterizes the global distribution of responses across the feature channels, and it also lets layers close to the input obtain a global receptive field, which is very useful in many tasks.
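As a concrete illustration (not code from the paper), the Squeeze step amounts to global average pooling over the spatial dimensions; a minimal PyTorch sketch, where the input shape is an assumed example:

```python
import torch

# Assumed example input: a batch of feature maps U with shape (N, C, H, W).
u = torch.randn(8, 64, 32, 32)

# Squeeze: global average pooling over the spatial dimensions turns each
# H x W feature map into one real number per channel, so the output
# dimension (N, C) matches the number of input channels.
z = u.mean(dim=(2, 3))
print(z.shape)  # torch.Size([8, 64])
```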
Next is the Excitation operation, a mechanism similar to the gates in recurrent neural networks. A weight is generated for each feature channel through parameters $W$, where $W$ is learned to explicitly model the correlation between feature channels. Finally, a Reweight operation treats these weights as the importance of each feature channel and multiplies them channel-wise onto the original features, completing the recalibration along the channel dimension.
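Putting the three operations together, here is a minimal PyTorch sketch of an SE block. The bottleneck of two fully connected layers with a reduction ratio follows the layout described in the paper, but the names (`SEBlock`, `reduction`) and the default ratio of 16 are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: recalibrates the channels of a feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Excitation: a bottleneck of two fully connected layers acts as a
        # gate (similar to gates in recurrent networks), producing one
        # weight in (0, 1) per channel.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = u.shape
        # Squeeze: global average pooling -> one descriptor per channel.
        z = u.mean(dim=(2, 3))            # (N, C)
        # Excitation: learn channel-wise weights from the descriptors.
        s = self.fc(z)                    # (N, C)
        # Reweight: scale each channel of the original features.
        return u * s.view(n, c, 1, 1)

# Example: recalibrate a batch of 64-channel feature maps.
block = SEBlock(channels=64)
out = block(torch.randn(8, 64, 32, 32))
print(out.shape)  # torch.Size([8, 64, 32, 32])
```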
Reddit thread: https://www.reddit.com/r/MachineLearning/comments/6pt99z/r_squeezeandexcitation_networks_ilsvrc_2017/