daisukelab's profile - ShortScience.org

Visualizing the Loss Landscape of Neural Nets
Hao Li and Zheng Xu and Gavin Taylor and Christoph Studer and Tom Goldstein
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG, cs.CV, stat.ML

[link] Summary by daisukelab 7 years ago

- Presents a simple visualization method based on “filter normalization.”
- Observed that __as networks become deeper, neural loss landscapes become more chaotic__; this transition coincides with markedly worse generalization and, ultimately, a lack of trainability.
- Observed that __skip connections promote flat minimizers and prevent the transition to chaotic behavior__; this helps explain why skip connections are necessary for training extremely deep networks.
- Quantitatively measures non-convexity.
- Studies the visualization of SGD optimization trajectories.
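
The filter normalization step in the first bullet can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes axis 0 of a weight tensor indexes the filters, and the function name `filter_normalized_direction` and the layer shape are hypothetical. Each filter of a random Gaussian direction is rescaled to have the same norm as the corresponding filter of the trained weights, so that plots of the loss along that direction are comparable across networks.

```python
import numpy as np

def filter_normalized_direction(weights, rng):
    """Draw a Gaussian direction and rescale each filter to the norm
    of the matching filter in `weights` (axis 0 = filter index, assumed)."""
    direction = rng.standard_normal(weights.shape)
    n_filters = weights.shape[0]
    flat_d = direction.reshape(n_filters, -1)
    flat_w = weights.reshape(n_filters, -1)
    d_norm = np.linalg.norm(flat_d, axis=1, keepdims=True)
    w_norm = np.linalg.norm(flat_w, axis=1, keepdims=True)
    # Small epsilon guards against division by zero.
    flat_d = flat_d / (d_norm + 1e-10) * w_norm
    return flat_d.reshape(weights.shape)

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 3, 3, 3))  # hypothetical conv layer: 4 filters
delta = filter_normalized_direction(weights, rng)
```

A 1-D slice of the landscape would then evaluate the loss at `weights + a * delta` for a range of scalars `a`; a 2-D plot uses two such directions.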

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size
Iandola, Forrest N. and Moskewicz, Matthew W. and Ashraf, Khalid and Han, Song and Dally, William J. and Keutzer, Kurt
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by daisukelab 8 years ago

While preserving accuracy,
- Architectural improvements alone reduce the model size 50x (240MB to 4.8MB).
- Deep Compression shrinks the model a further 10x (4.8MB to 0.47MB).

Simple bypass (shortcut) connections improve accuracy further, by about 2%.

They present insightful architectural design strategies:
1. Use fewer 3x3 filters (replace them with 1x1 filters) to reduce size,
2. Decrease the number of input channels to the 3x3 filters, also to reduce size,
3. Downsample late so that layers have larger activation maps, which leads to higher accuracy.

They also give useful insights into CNN design space exploration by parametrizing the microarchitecture:
- Squeeze ratio, to find a good balance between weight size and accuracy.
- Percentage of 3x3 filters, to find how many of them are enough.
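
To make the two knobs concrete, here is a rough parameter count for one Fire module (a squeeze layer of 1x1 filters feeding an expand layer of mixed 1x1 and 3x3 filters). It is a sketch under stated assumptions: biases are ignored, the function names are hypothetical, and the example sizes are chosen for illustration only.

```python
def fire_params(c_in, expand, squeeze_ratio, pct_3x3):
    """Weight count of a Fire module, ignoring biases.

    squeeze_ratio = squeeze filters / expand filters
    pct_3x3       = share of expand filters that are 3x3
    """
    s1x1 = int(squeeze_ratio * expand)   # 1x1 squeeze filters
    e3x3 = int(pct_3x3 * expand)         # 3x3 expand filters
    e1x1 = expand - e3x3                 # 1x1 expand filters
    squeeze = c_in * s1x1
    expand_1x1 = s1x1 * e1x1
    expand_3x3 = s1x1 * e3x3 * 9         # each 3x3 filter has 9 weights per channel
    return squeeze + expand_1x1 + expand_3x3

def plain_3x3_params(c_in, c_out):
    """Weight count of an ordinary 3x3 convolution, ignoring biases."""
    return c_in * c_out * 9

fire = fire_params(c_in=96, expand=128, squeeze_ratio=0.125, pct_3x3=0.5)
plain = plain_3x3_params(c_in=96, c_out=128)
print(fire, plain)  # 11776 110592
```

With these illustrative sizes the Fire module needs roughly a tenth of the weights of a plain 3x3 layer with the same input and output widths; raising either knob trades size for capacity.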

Hello Edge: Keyword Spotting on Microcontrollers
Yundong Zhang and Naveen Suda and Liangzhen Lai and Vikas Chandra
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.SD, cs.CL, cs.LG, cs.NE, eess.AS

[link] Summary by daisukelab 8 years ago

- The result of thorough research that not only covers the major prior work but also compares it under the same criteria and dataset; it doubles as a great survey.
- Models are trained in 32-bit floating point and run in 8-bit. Converting to 8-bit requires no retraining and causes no loss in accuracy.
- Compares the models in terms of computing resources, which is useful when designing for typical (ARM) microcontroller systems.
- The MobileNet-inspired DS-CNN (depthwise separable CNN) is small and accurate, achieving the best accuracies of 94.4% to 95.4%. Possibly state of the art.
- Apache-licensed code and pretrained models are available at https://github.com/ARM-software/ML-KWS-for-MCU.

https://i.imgur.com/qahXKBn.png
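
As a rough illustration of the "train in 32-bit, run in 8-bit" point, the sketch below quantizes trained weights to 8-bit fixed point with a power-of-two scale and no retraining. It is an assumption-laden example, not the repository's code: the function names are hypothetical and the exact quantization scheme used in the paper may differ.

```python
import numpy as np

def quantize_q7(weights):
    """Quantize float weights to int8 using a power-of-two scale."""
    max_abs = float(np.max(np.abs(weights)))
    # Integer bits needed to cover the range; the remaining
    # magnitude bits (out of 7) hold the fraction.
    int_bits = int(np.ceil(np.log2(max_abs))) if max_abs > 0 else 0
    frac_bits = 7 - int_bits
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(weights * scale), -128, 127).astype(np.int8)
    return q, frac_bits

def dequantize(q, frac_bits):
    """Map int8 values back to floats for comparison."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

weights = np.array([-0.75, 0.1, 0.5, 0.3])   # hypothetical trained weights
q, frac_bits = quantize_q7(weights)
error = np.max(np.abs(dequantize(q, frac_bits) - weights))
```

For these values the rounding error stays within half a quantization step, which gives some intuition for why 8-bit inference can preserve accuracy.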

mixup: Beyond Empirical Risk Minimization
Hongyi Zhang and Moustapha Cisse and Yann N. Dauphin and David Lopez-Paz
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG, stat.ML

[link] Summary by daisukelab 8 years ago

A simple and very effective data augmentation method: at every training iteration, linearly interpolate random pairs of training inputs x and their labels y.
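```python
import numpy as np

# loader1 and loader2 are assumed to iterate over the same training set
# in different random orders; net, loss, optimizer and alpha are defined elsewhere.
for (x1, y1), (x2, y2) in zip(loader1, loader2):
    lam = np.random.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    x = lam * x1 + (1. - lam) * x2      # interpolate the inputs
    y = lam * y1 + (1. - lam) * y2      # interpolate the (one-hot) labels
    optimizer.zero_grad()
    loss(net(x), y).backward()
    optimizer.step()
```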
- ERM (Empirical Risk Minimization) is recovered in the limit $\alpha \to 0$ of mixup, i.e. not using mixup.
- Reduces the memorization of corrupt labels.
- Increases robustness to adversarial examples.
- Stabilizes the training of GANs.
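
A framework-free variant of the same idea is sketched below in NumPy. It is an assumption rather than something the summary states: instead of two loaders, it mixes each batch with a shuffled copy of itself, and the name `mixup_batch` and the toy shapes are hypothetical.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha, rng):
    """Mix a batch with a randomly permuted copy of itself."""
    lam = rng.beta(alpha, alpha)          # one coefficient per batch
    perm = rng.permutation(len(x))        # random partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))                 # 8 samples, 4 features
y = np.eye(3)[rng.integers(0, 3, size=8)]       # one-hot labels, 3 classes
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

Each mixed label row still sums to 1, so it can be used as a soft target. With small `alpha`, `lam` tends to land near 0 or 1, so most mixed samples stay close to an original one.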

PoTrojan: powerful neural-level trojan designs in deep learning models
Minhui Zou and Yang Shi and Chengliang Wang and Fangyu Li and WenZhan Song and Yu Wang
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.CR, cs.LG

[link] Summary by daisukelab 8 years ago

To keep it simple, this figure shows the basic idea.

https://i.imgur.com/a2I4EGY.png

YOLO9000: Better, Faster, Stronger
Joseph Redmon and Ali Farhadi
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CV

[link] Summary by daisukelab 9 years ago

YOLOv2 is an improved YOLO that:
 - can run at varying image sizes to trade off speed against accuracy;
 - uses anchor boxes to predict bounding boxes;
 - overcomes localization errors and low recall not with a bigger network or an ensemble, but with a variety of ideas from past work (batch normalization, multi-scale training, etc.) that keep the network simple and fast;
 - "With batch normalization we can remove dropout from the model without overfitting"
 - gets 78.6 mAP at 40 FPS on VOC 2007.
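
The anchor box prediction above can be sketched as a small decoding function. This follows the parametrization described in the paper (a sigmoid-bounded offset inside the grid cell, and an exponential scaling of the anchor's width and height); the function name, argument layout, and example numbers are illustrative assumptions.

```python
import math

def decode_box(t, cell, prior):
    """Turn raw network outputs into a box, in grid-cell units.

    t     = (tx, ty, tw, th) raw predictions
    cell  = (cx, cy) top-left corner of the grid cell
    prior = (pw, ph) anchor box width and height
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx      # center stays inside the cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # anchor size scaled by a positive factor
    bh = ph * math.exp(th)
    return bx, by, bw, bh

box = decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 5), prior=(2.0, 1.5))
print(box)  # (3.5, 5.5, 2.0, 1.5)
```

Zero raw outputs place the box at the center of its cell with exactly the anchor's size, which is why good anchors make the predictions easier to learn.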

YOLO9000:
 - uses a WordTree representation, which enables multi-label classification and also makes classification datasets usable for detection;
 - is a model trained simultaneously for detection on COCO and classification on ImageNet;
 - is validated on detecting object classes that have no labeled detection data;
 - detects more than 9000 different object classes in real time.
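
One way to picture the WordTree: the model predicts conditional probabilities of each node given its parent, and the probability of a specific class is the product of those conditionals along the path to the root. The tiny tree and the numbers below are made up purely for illustration, and the function name is hypothetical.

```python
def absolute_probability(node, parent, cond_prob):
    """Multiply conditional probabilities from a node up to the root."""
    p = 1.0
    while node is not None:
        p *= cond_prob[node]       # Pr(node | parent of node)
        node = parent.get(node)    # move one level up; root has parent None
    return p

# Hypothetical three-level tree: animal -> dog -> terrier
parent = {"terrier": "dog", "dog": "animal", "animal": None}
cond_prob = {"animal": 0.9, "dog": 0.8, "terrier": 0.5}

p_terrier = absolute_probability("terrier", parent, cond_prob)
p_dog = absolute_probability("dog", parent, cond_prob)
```

Because a parent's probability is never lower than its child's, the detector can fall back to a coarser label such as "dog" when it is unsure of the breed.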

daisukelab

sciscore: 2