First published: 2017/08/30
Abstract: We propose a DTCWT ScatterNet Convolutional Neural Network (DTSCNN) formed by
replacing the first few layers of a CNN network with a parametric log based
DTCWT ScatterNet. The ScatterNet extracts edge based invariant representations
that are used by the later layers of the CNN to learn high-level features. This
improves the training of the network as the later layers can learn more complex
patterns from the start of learning because the edge representations are
already present. The efficient learning of the DTSCNN network is demonstrated
on CIFAR-10 and Caltech-101 datasets. The generic nature of the ScatterNet
front-end is shown by an equivalent performance to pre-trained CNN front-ends.
A comparison with the state-of-the-art on CIFAR-10 and Caltech-101 datasets is
ScatterNets incorporate geometric knowledge of images to produce discriminative features that are invariant to translation and rotation, i.e. edge information. The first layers of a CNN end up learning much the same thing. So why not replace those first layers with an equivalent fixed structure and let the optimizer find the best weights for the rest of the CNN?
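To make the "fixed front-end" idea concrete, here is a minimal sketch in plain Python. It is not the paper's DTCWT ScatterNet: a Sobel pair, treated as the real and imaginary parts of one complex filter, stands in for the oriented complex wavelets. The log(|response| + k) form of the non-linearity is an assumption (the abstract only says "parametric log based"), and the function names are made up for illustration.

```python
import math

# Stand-in filters (assumption): a Sobel pair used as the real and imaginary
# parts of a single complex filter. A real DTCWT uses oriented complex wavelets.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate_valid(image, kernel):
    """Slide a 3x3 kernel over the image without padding ('valid' output)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


def fixed_edge_frontend(image, k=1.0):
    """Fixed (non-learned) features: log(|complex filter response| + k).

    The log(. + k) form is an assumed reading of "parametric log based".
    """
    real = correlate_valid(image, SOBEL_X)
    imag = correlate_valid(image, SOBEL_Y)
    return [
        [math.log(math.hypot(a, b) + k) for a, b in zip(row_re, row_im)]
        for row_re, row_im in zip(real, imag)
    ]


# Toy 5x6 image with a vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
features = fixed_edge_frontend(image)
```

Nothing in this front-end is trained. The features are large where the 3x3 window straddles the edge and equal to log(k) in flat regions, and a learnable CNN would take `features` as its input instead of raw pixels.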
The idea, covered by a few papers, is to replace the first convolutional, ReLU and pooling layers of the CNN with a two-layer parametric log-based Dual-Tree Complex Wavelet Transform (DTCWT) ScatterNet. The main motivations were:
Despite the success of CNNs, how to design these networks and optimize their configuration is not well understood, which makes them difficult to develop
Training improves because the later layers can learn more complex patterns from the start of learning, since the edge representations are already present
The network converges faster because it has fewer filter weights to learn
My takeaway: a slight reduction in the amount of data necessary for training!
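The "fewer filter weights to learn" point can be checked with simple arithmetic. The sketch below counts learnable weights for a hypothetical four-layer CNN (the layer sizes are invented here, not taken from the paper) and then for the same network with its first two layers swapped for a fixed front-end. It assumes the front-end outputs 64 channels, so the later layers keep their shape.

```python
def conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Learnable weights in one 2D convolution layer with square kernels."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Hypothetical toy CNN, one tuple per layer: (kernel size, in, out channels).
layers = [(3, 3, 64), (3, 64, 64), (3, 64, 128), (3, 128, 128)]

full_cnn = sum(conv_params(k, c_in, c_out) for k, c_in, c_out in layers)

# Replace the first two layers with a fixed front-end: they contribute no
# learnable weights (assuming the front-end also outputs 64 channels).
replaced_layers = 2
with_fixed_frontend = sum(
    conv_params(k, c_in, c_out) for k, c_in, c_out in layers[replaced_layers:]
)

saved = full_cnn - with_fixed_frontend
print(f"full CNN:           {full_cnn} learnable weights")
print(f"fixed front-end:    {with_fixed_frontend} learnable weights")
print(f"weights not learnt: {saved} ({100 * saved / full_cnn:.1f}%)")
```

In this toy configuration the early layers hold a small share of the total weights, so the saving is modest. A real analysis would use the actual layer shapes and would also count operations.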
Results on CIFAR-10 and Caltech-101 with 14 self-made CNNs of increasing depth, plus VGG, NIN and WideResNet:
When doing transfer learning (ImageNet): DTSCNN outperformed all of its CNN counterparts by a “useful margin” when fine-tuning with only 1000 examples (balanced over classes). On larger datasets the gap shrinks until the two end up on par. However, when the first layers of VGG and NIN are frozen, as in DTSCNN, the NIN results are on par with DTSCNN, while VGG outperforms it!
DTSCNN learns at a faster rate but reaches the same target with only a minor speedup (a few minutes)
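For anyone wanting to rerun the small-data comparison, a class-balanced subset such as the 1000-example one mentioned above could be drawn roughly as follows. This is a sketch only: the function name and the toy labels are invented here, and the paper's actual sampling procedure is not described in this summary.

```python
import random
from collections import defaultdict


def balanced_subset(labels, total, seed=0):
    """Return indices of a subset with the same number of examples per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    per_class, remainder = divmod(total, len(by_class))
    if remainder:
        raise ValueError("total must be divisible by the number of classes")

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_class):
        if len(by_class[label]) < per_class:
            raise ValueError(f"class {label!r} has too few examples")
        subset.extend(rng.sample(by_class[label], per_class))
    return subset


# Toy stand-in for a 10-class label list (not real CIFAR-10 labels).
labels = [i % 10 for i in range(5000)]
subset = balanced_subset(labels, total=1000)
```

Here `subset` holds 100 indices per class. Passing different seeds gives different draws, which is useful when results need to be averaged over several small training sets.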
A complexity analysis in terms of weights and operations is missing
Datasets: CIFAR-10 and Caltech-101 are a good starting point (a further step with a substantial dataset like COCO would be a plus). For other modalities/domains, please try it and let me know
Great work, but an ablation study is missing, such as comparing full training of WideResNet+DTCWT vs. WideResNet
14 citations so far (Cambridge): probably low value for money at the moment