[link]
_Objective:_ Design feed-forward neural networks (fully connected) that can be trained even with very deep architectures. * _Dataset:_ [MNIST](yann.lecun.com/exdb/mnist/), [CIFAR10](https://www.cs.toronto.edu/%7Ekriz/cifar.html), [Tox21](https://tripod.nih.gov/tox21/challenge/) and [UCI tasks](https://archive.ics.uci.edu/ml/datasets/optical+recognition+of+handwritten+digits). * _Code:_ [here](https://github.com/bioinf-jku/SNNs) ## Inner-workings: They introduce a new activation function, the Scaled Exponential Linear Unit (SELU), which has the nice property of making neuron activations converge to a fixed point with zero mean and unit variance. They also derive upper and lower bounds on the mean and variance under very mild conditions, which basically means that there will be no exploding or vanishing gradients. The activation function is: [![screen shot 2017-06-14 at 11 38 27 am](https://user-images.githubusercontent.com/17261080/27125901-1a4f7276-50f6-11e7-857d-ebad1ac94789.png)](https://user-images.githubusercontent.com/17261080/27125901-1a4f7276-50f6-11e7-857d-ebad1ac94789.png) With specific parameters for alpha and lambda to ensure the previous properties. The reference implementation (NumPy) is:

```python
def selu(x):
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    return scale * np.where(x >= 0.0, x, alpha * np.exp(x) - alpha)
```

They also introduce a new dropout (alpha-dropout) to compensate for the fact that standard dropout would break the self-normalizing property: [![screen shot 2017-06-14 at 11 44 42 am](https://user-images.githubusercontent.com/17261080/27126174-e67d212c-50f6-11e7-8952-acad98b850be.png)](https://user-images.githubusercontent.com/17261080/27126174-e67d212c-50f6-11e7-8952-acad98b850be.png) ## Results: Batch norm becomes obsolete and they are also able to train deeper architectures. This becomes a good choice to replace shallow architectures where random forests or SVMs used to give the best results. They outperform most other techniques on small datasets. [![screen shot 2017-06-14 at 11 36 30 am](https://user-images.githubusercontent.com/17261080/27125798-bd04c256-50f5-11e7-8a74-b3b6a3fe82ee.png)](https://user-images.githubusercontent.com/17261080/27125798-bd04c256-50f5-11e7-8a74-b3b6a3fe82ee.png) Might become a new standard for fully-connected activations in the future. |
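To illustrate the self-normalizing property, here is a minimal NumPy sketch (not from the paper's code) that pushes standardized random inputs through a stack of fully connected SELU layers with the recommended zero-mean, variance-1/n weight initialization; the activation mean and variance should stay close to 0 and 1 at any depth.

```python
import numpy as np

def selu(x):
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    return scale * np.where(x >= 0.0, x, alpha * np.exp(x) - alpha)

rng = np.random.RandomState(0)
x = rng.randn(1024, 512)                    # standardized inputs: zero mean, unit variance

for depth in range(32):
    w = rng.randn(512, 512) / np.sqrt(512)  # weights with zero mean and variance 1/n
    x = selu(x @ w)

print(x.mean(), x.var())                    # stays close to (0, 1), even after 32 layers
```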
[link]
_Objective:_ Analyze a large-scale dataset of fashion images to discover visually consistent style clusters. * _Dataset:_ StreetStyle-27K. * _Code:_ demo [here](http://streetstyle.cs.cornell.edu/) ## New dataset: StreetStyle-27K 1. **Photos (100 million)**: from Instagram using the [API](https://www.instagram.com/developer/) to retrieve images with the correct location and time. 2. **People (14.5 million)**: they run two algorithms to normalize the body position in the image: * [Face++](http://www.faceplusplus.com/) to detect and localize faces. * [Deformable Part Model](http://people.cs.uchicago.edu/%7Erbg/latent-release5/) to estimate the visibility of the rest of the body. 3. **Clothing annotations (27K)**: Amazon Mechanical Turk with quality control. $4,000 for the whole dataset. ## Architecture: Usual GoogLeNet but they use [Isotonic Regression](http://fastml.com/classifier-calibration-with-platts-scaling-and-isotonic-regression/) to correct the bias. ## Unsupervised clustering: They proceed as follows: 1. Compute the feature embeddings for a subset of the overall dataset selected to represent location and time. 2. Apply L2 normalization. 3. Use PCA to find the components representing 90% of the variance (165 here). 4. Cluster them using a [GMM](https://en.wikipedia.org/wiki/Mixture_model#Multivariate_Gaussian_mixture_model) with 400 mixtures, which represent the clusters. They compute fashion clusters per city or for larger regions: [![screen shot 2017-06-15 at 12 04 06 pm](https://user-images.githubusercontent.com/17261080/27176447-d33fc2dc-51c2-11e7-9191-dbf972ee96a1.png)](https://user-images.githubusercontent.com/17261080/27176447-d33fc2dc-51c2-11e7-9191-dbf972ee96a1.png) ## Results: Pretty standard techniques but all patched together to produce interesting visualizations. |
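A rough scikit-learn sketch of the clustering stage (steps 2-4 above), assuming `features` holds the GoogLeNet embeddings of the selected subset; the 90% variance threshold and the 400 mixtures are the values quoted above, while the diagonal covariance is an assumption.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

features = np.random.rand(10000, 1024)        # placeholder for the CNN embeddings

x = normalize(features, norm='l2')            # step 2: L2 normalization
x = PCA(n_components=0.90).fit_transform(x)   # step 3: keep 90% of the variance (~165 dims)
gmm = GaussianMixture(n_components=400, covariance_type='diag').fit(x)  # step 4
style_clusters = gmm.predict(x)               # each mixture component is a style cluster
```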
[link]
_Objective:_ Transfer visual attributes (color, tone, texture, style, etc.) between two semantically-meaningful images such as a picture and a sketch. ## Inner workings: ### Image analogy An image analogy A:A′::B:B′ is a relation where: * B′ relates to B in the same way as A′ relates to A * A and A′ are in pixel-wise correspondences * B and B′ are in pixel-wise correspondences In this paper only a source image A and an example image B′ are given, and both A′ and B represent latent images to be estimated. [![screen shot 2017-05-18 at 10 43 48 am](https://cloud.githubusercontent.com/assets/17261080/26193907/f080e212-3bb6-11e7-9441-7b255e4219f5.png)](https://cloud.githubusercontent.com/assets/17261080/26193907/f080e212-3bb6-11e7-9441-7b255e4219f5.png) ### Dense correspondence In order to find dense correspondences between two images they use features from a pre-trained CNN (VGG-19), retrieving all the ReLU layers. The mapping is divided into two sub-mappings that are easier to compute, first a visual attribute transformation and then a space transformation. [![screen shot 2017-05-18 at 11 04 58 am](https://cloud.githubusercontent.com/assets/17261080/26194835/03ccd94a-3bba-11e7-93ca-9420d4d96162.png)](https://cloud.githubusercontent.com/assets/17261080/26194835/03ccd94a-3bba-11e7-93ca-9420d4d96162.png) ## Architecture: The algorithm proceeds as follows: 1. Compute features at each layer for the input image using a pre-trained CNN and initialize feature maps of latent images with the coarsest layer. 2. For said layer compute a forward and reverse nearest-neighbor field (NNF, basically an offset field). 3. Use this NNF with the features of the current layer of the input to compute the features of the latent images. 4. Upsample the NNF and use it as the initialization for the NNF of the next layer. [![screen shot 2017-05-18 at 11 14 33 am](https://cloud.githubusercontent.com/assets/17261080/26195178/35277e0e-3bbb-11e7-82ce-037466314640.png)](https://cloud.githubusercontent.com/assets/17261080/26195178/35277e0e-3bbb-11e7-82ce-037466314640.png) ## Results: Impressive quality on all types of visual transfer but very slow (~3 min on a GPU for one image). [![screen shot 2017-05-18 at 11 36 47 am](https://cloud.githubusercontent.com/assets/17261080/26196151/54ef423c-3bbe-11e7-9433-b29be5091fae.png)](https://cloud.githubusercontent.com/assets/17261080/26196151/54ef423c-3bbe-11e7-9433-b29be5091fae.png) |
[link]
Generate code from a UI screenshot. _Code:_ [Demo](https://youtu.be/pqKeXkhFA3I) and [code](https://github.com/tonybeltramelli/pix2code) to come. ## Inner-workings: They decompose the problem into three steps: 1. a computer vision problem of understanding the given scene and inferring the objects present, their identities, positions, and poses. 2. a language modeling problem of understanding computer code and generating syntactically and semantically correct samples. 3. use the solutions to both previous sub-problems by exploiting the latent variables inferred from scene understanding to generate corresponding textual descriptions of the objects represented by these variables. They also introduce a Domain Specific Language (DSL) for modeling purposes. ## Architecture: * Vision model: usual AlexNet-like architecture * Language model: uses one-hot encoding for the words in the DSL vocabulary, which is then fed into an LSTM * Combined model: an LSTM too. [![screen shot 2017-06-16 at 11 34 28 am](https://user-images.githubusercontent.com/17261080/27221124-c9cadcc6-5287-11e7-9d38-c4234af92912.png)](https://user-images.githubusercontent.com/17261080/27221124-c9cadcc6-5287-11e7-9d38-c4234af92912.png) ## Results: Clearly not ready for any serious use but promising results! [![screen shot 2017-06-16 at 11 57 45 am](https://user-images.githubusercontent.com/17261080/27222031-0bf8e7de-528b-11e7-896f-cdb410f928c3.png)](https://user-images.githubusercontent.com/17261080/27222031-0bf8e7de-528b-11e7-896f-cdb410f928c3.png) |
[link]
_Objective:_ Develop a platform to make AI accessible. _Website:_ [here](http://pennai.org/) ## Inner-workings: A platform for AI with deep learning and genetic programming, more focused on biology. ## Architecture: [![screen shot 2017-06-26 at 11 00 07 am](https://user-images.githubusercontent.com/17261080/27690782-8b71f8c8-5ce2-11e7-9d84-77a4dd519e18.jpg)](https://user-images.githubusercontent.com/17261080/27690782-8b71f8c8-5ce2-11e7-9d84-77a4dd519e18.jpg) ## Results: Just announced, keep an eye on it. |
[link]
_Objective:_ Perform domain adaptation by adapting several layers using a randomized representation, not just the final layer, thus aligning the joint distribution and not just the marginals. _Dataset:_ [Office](https://cs.stanford.edu/%7Ejhoffman/domainadapt/) and [ImageCLEF-DA1](http://imageclef.org/2014/adaptation). ## Inner-workings: Basically an improvement on [RevGrad](https://arxiv.org/pdf/1505.07818.pdf) where instead of using only the last embedding layer for the discriminator, several of them are used. To avoid dimension explosion when using the tensor product of all layers they instead use a randomized multi-linear representation: [![screen shot 2017-06-01 at 5 35 46 pm](https://cloud.githubusercontent.com/assets/17261080/26687736/cff20446-46f0-11e7-918e-b60baa10aa67.png)](https://cloud.githubusercontent.com/assets/17261080/26687736/cff20446-46f0-11e7-918e-b60baa10aa67.png) Where: * d is the dimension of the embedding (they use 1024) * R is a random matrix whose elements have zero mean and unit variance (Bernoulli, Gaussian and Uniform are tried) * z^l is the l-th layer * ⊙ represents the Hadamard product In practice they don't use all layers but just the last 3-4 layers for ResNet and AlexNet. ## Architecture: [![screen shot 2017-06-01 at 5 34 44 pm](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d98-46f0-11e7-89d1-15452cbb527e.png)](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d98-46f0-11e7-89d1-15452cbb527e.png) They use the usual losses for domain adaptation with: - F minimizing the cross-entropy loss for classification and trying to reduce the gap between the distributions (indicated by D). - D maximizing the gap between the distributions. [![screen shot 2017-06-01 at 5 40 53 pm](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff70-46f1-11e7-917d-05129ab190b0.png)](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff70-46f1-11e7-917d-05129ab190b0.png) ## Results: Improvement on state-of-the-art results for most tasks in the dataset, very easy to implement with any pre-trained network out of the box. |
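A small NumPy sketch of the randomized multi-linear fusion described above: each chosen layer output z^l is projected by a fixed random matrix R_l and the projections are combined with a Hadamard product; the 1/sqrt(d) scaling follows the screenshot formula and should be treated as an assumption here.

```python
import numpy as np

def randomized_multilinear(layer_outputs, d=1024, seed=0):
    """Fuse several layer outputs (1-D arrays) into one d-dimensional representation.
    Each R_l has zero-mean, unit-variance entries (Gaussian here; Bernoulli and
    Uniform are the other options tried in the paper) and is fixed, not learned."""
    rng = np.random.RandomState(seed)
    fused = np.ones(d)
    for z in layer_outputs:
        R = rng.randn(d, z.shape[0])
        fused *= R @ z                 # Hadamard product of the random projections
    return fused / np.sqrt(d)          # scaling assumed from the formula above

fused = randomized_multilinear([np.random.rand(256), np.random.rand(10)])
```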
[link]
_Objective:_ Replace the usual GAN loss with a softmax cross-entropy loss to stabilize GAN training. _Dataset:_ [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) ## Inner working: Linked to recent work such as WGAN or Loss-Sensitive GAN that focus on objective functions with non-vanishing gradients to avoid the situation where the discriminator `D` becomes too good and the gradient vanishes. Thus they first introduce two targets for the discriminator `D` and the generator `G`: [![screen shot 2017-04-24 at 6 18 11 pm](https://cloud.githubusercontent.com/assets/17261080/25347232/767049bc-291a-11e7-906e-c19a92bb7431.png)](https://cloud.githubusercontent.com/assets/17261080/25347232/767049bc-291a-11e7-906e-c19a92bb7431.png) [![screen shot 2017-04-24 at 6 18 24 pm](https://cloud.githubusercontent.com/assets/17261080/25347233/7670ff60-291a-11e7-974f-83eb9269d238.png)](https://cloud.githubusercontent.com/assets/17261080/25347233/7670ff60-291a-11e7-974f-83eb9269d238.png) And then the two new losses: [![screen shot 2017-04-24 at 6 19 50 pm](https://cloud.githubusercontent.com/assets/17261080/25347275/a303aa0a-291a-11e7-86b4-abd42c83d4a8.png)](https://cloud.githubusercontent.com/assets/17261080/25347275/a303aa0a-291a-11e7-86b4-abd42c83d4a8.png) [![screen shot 2017-04-24 at 6 19 55 pm](https://cloud.githubusercontent.com/assets/17261080/25347276/a307bc6c-291a-11e7-98b3-cbd7182090cd.png)](https://cloud.githubusercontent.com/assets/17261080/25347276/a307bc6c-291a-11e7-98b3-cbd7182090cd.png) ## Architecture: They use the DCGAN architecture and simply change the loss and remove the batch normalization and other empirical techniques used to stabilize training. They show that training the softmax GAN remains robust nonetheless. |
[link]
_Objective:_ Use a GAN to learn an embedding invariant to domain shift. _Dataset:_ [MNIST](yann.lecun.com/exdb/mnist/), [SVHN](http://ufldl.stanford.edu/housenumbers/), USPS, [OFFICE](https://cs.stanford.edu/%7Ejhoffman/domainadapt/) and [CFP](http://mukh.com/). ## Architecture: The total network is composed of several sub-networks: 1. `F`, the Feature embedding network that takes as input an image from either the source or target dataset and generates a feature vector. 2. `C`, the Classifier network, used when the image comes from the source dataset. 3. `G`, the Generative network that learns to generate an image similar to the source dataset using an image embedding from `F` and a random noise vector. 4. `D`, the Discriminator network that tries to guess whether an image comes from the source dataset or the generative network. `G` and `D` play a minimax game where `D` tries to classify the generated samples as fake and `G` tries to fool `D` by producing examples that are as realistic as possible. The scheme for training the network is the following: [![screen shot 2017-04-14 at 5 50 22 pm](https://cloud.githubusercontent.com/assets/17261080/25048122/f2a648b6-213a-11e7-93bd-954981bd3838.png)](https://cloud.githubusercontent.com/assets/17261080/25048122/f2a648b6-213a-11e7-93bd-954981bd3838.png) ## Results: Very interesting: the generated image is just a side-product, but the overall approach seems to be the state of the art at the time of writing (the paper was published one week ago). |
[link]
_Objective:_ Reduce learning time for [DQN](https://deepmind.com/research/dqn/)-type architectures. They introduce a new network element, called the DND (Differentiable Neural Dictionary), which is basically a dictionary that accepts any key (especially embeddings) and computes values using a kernel between keys. Plus it's differentiable. ## Architecture: The network basically works in two steps: 1. A classical CNN that computes an embedding for every image. 2. A DND for each possible action (controller input) that stores the embedding as the key and the estimated reward as the value. They also use a buffer to store all tuples (previous image, action, reward, next image), and training uses standard techniques. [![screen shot 2017-04-12 at 11 23 32 am](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png)](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png) ## Results: Clearly improves learning speed but in the end other techniques catch up and it gets outperformed. |
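A toy NumPy sketch of a DND-style lookup, assuming an inverse-distance kernel between keys (as used in the Neural Episodic Control paper); reading returns a kernel-weighted average of the stored values, which is differentiable with respect to the query.

```python
import numpy as np

class DND:
    """Toy differentiable neural dictionary: keys are embeddings, values are scalar returns."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def read(self, query, delta=1e-3):
        keys = np.stack(self.keys)
        k = 1.0 / (np.sum((keys - query) ** 2, axis=1) + delta)  # inverse-distance kernel
        w = k / k.sum()                                          # normalized kernel weights
        return float(w @ np.array(self.values))                  # kernel-weighted value estimate

dnd = DND()
dnd.write(np.array([0.0, 1.0]), 1.0)
dnd.write(np.array([1.0, 0.0]), 0.0)
print(dnd.read(np.array([0.1, 0.9])))  # close to 1.0
```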
[link]
_Objective:_ Image segmentation and pose estimation with an extension of Faster R-CNN. _Dataset:_ [COCO](http://mscoco.org/) and [Cityscapes](https://www.cityscapes-dataset.com/). ## Inner workings: The core operator of Faster R-CNN is _RoIPool_, which performs coarse spatial quantization for feature extraction but introduces misalignments for pixel-to-pixel comparison, which is exactly what segmentation requires. The paper introduces a new layer, _RoIAlign_, that faithfully preserves exact spatial locations. One important point is that mask and class prediction are decoupled: a mask is proposed for each class without competition, and the class predictor finally elects the winner. ## Architecture: Based on Faster R-CNN but with an added mask subnetwork that computes a segmentation mask for each class. Different feature extractors and proposers are tried, see two examples below: [![screen shot 2017-05-22 at 7 25 04 pm](https://cloud.githubusercontent.com/assets/17261080/26320765/659bfd6e-3f24-11e7-9184-393e83e9108d.png)](https://cloud.githubusercontent.com/assets/17261080/26320765/659bfd6e-3f24-11e7-9184-393e83e9108d.png) ## Results: Runs at about 200ms per frame on a GPU for segmentation (2 days of training on a single 8-GPU machine) and 5 fps for pose estimation. Very impressive segmentation and pose estimation: [![screen shot 2017-05-22 at 7 26 57 pm 1](https://cloud.githubusercontent.com/assets/17261080/26320824/a9a0909c-3f24-11e7-8e06-b2f132aad2d7.png)](https://cloud.githubusercontent.com/assets/17261080/26320824/a9a0909c-3f24-11e7-8e06-b2f132aad2d7.png) [![screen shot 2017-05-22 at 7 29 26 pm](https://cloud.githubusercontent.com/assets/17261080/26320929/08b71c4a-3f25-11e7-8eb5-959ceb7b6112.png)](https://cloud.githubusercontent.com/assets/17261080/26320929/08b71c4a-3f25-11e7-8eb5-959ceb7b6112.png) |
[link]
_Objective:_ Improve GAN convergence towards more diverse and visually pleasing images at higher resolution using a novel equilibrium method between the discriminator and the generator that also simplifies training procedures. _Dataset:_ [LFW](http://vis-www.cs.umass.edu/lfw/) ## Inner workings: They try to match the distribution of the errors (assumed to be normally distributed) instead of matching the distribution of the samples directly. In order to do this they compute the Wasserstein distance between the pixel-wise autoencoder loss distributions of real and generated samples, defined as follows: 1. Autoencoder loss: [![screen shot 2017-04-24 at 3 46 32 pm](https://cloud.githubusercontent.com/assets/17261080/25340190/429f9788-2905-11e7-88dc-b44567b9cd34.png)](https://cloud.githubusercontent.com/assets/17261080/25340190/429f9788-2905-11e7-88dc-b44567b9cd34.png) 2. Wasserstein distance for two normal distributions μ1 = N(m1, C1) and μ2 = N(m2, C2): [![screen shot 2017-04-24 at 3 46 44 pm](https://cloud.githubusercontent.com/assets/17261080/25340191/42b23474-2905-11e7-9810-58d5326bf886.png)](https://cloud.githubusercontent.com/assets/17261080/25340191/42b23474-2905-11e7-9810-58d5326bf886.png) They also introduce an equilibrium concept to account for the situation when `G` and `D` are not well balanced and the discriminator `D` wins easily. This is controlled by what they call the diversity ratio, which balances between auto-encoding real images and discriminating real from generated images. It is defined as follows: [![screen shot 2017-04-24 at 3 56 29 pm](https://cloud.githubusercontent.com/assets/17261080/25340609/992c2188-2906-11e7-8c51-498bbd293119.png)](https://cloud.githubusercontent.com/assets/17261080/25340609/992c2188-2906-11e7-8c51-498bbd293119.png) To maintain this balance they use standard SGD but introduce a variable `kt`, initially 0, to control how much emphasis is put on the generator `G`. This removes the need to do `x` steps on `D` followed by `y` steps on `G`, or to pre-train one of the two. [![screen shot 2017-04-24 at 3 59 57 pm](https://cloud.githubusercontent.com/assets/17261080/25340859/4ee06476-2907-11e7-971f-90421449cb51.png)](https://cloud.githubusercontent.com/assets/17261080/25340859/4ee06476-2907-11e7-971f-90421449cb51.png) Finally they derive a global convergence measure by using the equilibrium concept that can be used to determine when the network has reached its final state or if the model has collapsed: [![screen shot 2017-04-24 at 4 04 12 pm](https://cloud.githubusercontent.com/assets/17261080/25340998/b8bf6ad6-2907-11e7-8afa-294cae32c6af.png)](https://cloud.githubusercontent.com/assets/17261080/25340998/b8bf6ad6-2907-11e7-8afa-294cae32c6af.png) ## Architecture: They tried to keep the architecture simple to really study the impact of their new equilibrium principle and loss. They don't use batch normalization, dropout, transpose convolutions or exponential growth for convolution filters. [![screen shot 2017-04-24 at 4 09 29 pm](https://cloud.githubusercontent.com/assets/17261080/25341219/6fb7be28-2908-11e7-8774-287c1b7d7684.png)](https://cloud.githubusercontent.com/assets/17261080/25341219/6fb7be28-2908-11e7-8774-287c1b7d7684.png) ## Results: They trained on images from 32x32 to 256x256, but at higher resolutions images tend to lose sharpness. Nevertheless the images are very good!
[![screen shot 2017-04-24 at 4 20 30 pm](https://cloud.githubusercontent.com/assets/17261080/25341699/f99b0770-2909-11e7-84a0-3ac0436771e5.png)](https://cloud.githubusercontent.com/assets/17261080/25341699/f99b0770-2909-11e7-84a0-3ac0436771e5.png) |
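A schematic sketch of the equilibrium mechanism described above, in plain Python: `ae_loss_real` and `ae_loss_fake` stand in for the pixel-wise autoencoder losses of real and generated batches, and the update rules transcribe the screenshots, so treat the exact constants as assumptions.

```python
def began_step(ae_loss_real, ae_loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN balancing step: returns the two losses, the updated k_t and
    the global convergence measure."""
    loss_D = ae_loss_real - k * ae_loss_fake          # discriminator objective
    loss_G = ae_loss_fake                             # generator objective
    k = k + lambda_k * (gamma * ae_loss_real - ae_loss_fake)
    k = min(max(k, 0.0), 1.0)                         # keep k_t in [0, 1]
    m_global = ae_loss_real + abs(gamma * ae_loss_real - ae_loss_fake)
    return loss_D, loss_G, k, m_global

print(began_step(ae_loss_real=0.30, ae_loss_fake=0.25, k=0.0))
```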
[link]
_Objective:_ Image-to-image translation to perform visual attribute transfer using unpaired images. _Dataset:_ [Cityscapes](https://www.cityscapes-dataset.com/), [CMP Facade](http://cmp.felk.cvut.cz/%7Etylecr1/facade/), [UT Zappos50k](http://vision.cs.utexas.edu/projects/finegrained/utzap50k/) and [ImageNet](http://www.image-net.org/). _Code:_ [CycleGAN](https://github.com/junyanz/CycleGAN) ## Inner-workings: Basically two GANs, one for each domain, with their respective generator and discriminator, plus two additional losses (called consistency losses) to make sure that translating to the other domain and then back yields an image that is still realistic. [![screen shot 2017-06-02 at 10 24 45 am](https://cloud.githubusercontent.com/assets/17261080/26717449/bcd8a9cc-477d-11e7-9137-fd277a0ec04f.png)](https://cloud.githubusercontent.com/assets/17261080/26717449/bcd8a9cc-477d-11e7-9137-fd277a0ec04f.png) For the consistency loss they use a pixel-wise L1 norm: [![screen shot 2017-06-02 at 10 31 22 am](https://cloud.githubusercontent.com/assets/17261080/26717733/bc088cdc-477e-11e7-96af-2defa06a1660.png)](https://cloud.githubusercontent.com/assets/17261080/26717733/bc088cdc-477e-11e7-96af-2defa06a1660.png) ## Architecture: Based on [Perceptual losses for real-time style transfer and super-resolution](https://arxiv.org/pdf/1603.08155.pdf), code available [here](https://github.com/jcjohnson/fast-neural-style). Training seems to employ several tricks and even uses a batch size of 1. ## Results: Very impressive, and the key point is that you don't need paired images, which makes this trainable on any pair of domains with the same underlying representation. [![screen shot 2017-06-02 at 10 26 29 am](https://cloud.githubusercontent.com/assets/17261080/26717502/f6d1fb7e-477d-11e7-8174-7bdd621cf1b6.png)](https://cloud.githubusercontent.com/assets/17261080/26717502/f6d1fb7e-477d-11e7-8174-7bdd621cf1b6.png) |
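A minimal NumPy sketch of the cycle-consistency term (pixel-wise L1): `G` maps domain X to Y and `F` maps Y back to X, both stubbed out here with identity functions; the weight `lam` is an assumed hyper-parameter.

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    # || F(G(x)) - x ||_1 + || G(F(y)) - y ||_1, averaged over pixels
    forward = np.mean(np.abs(F(G(x)) - x))
    backward = np.mean(np.abs(G(F(y)) - y))
    return lam * (forward + backward)

x = np.random.rand(3, 256, 256)
y = np.random.rand(3, 256, 256)
print(cycle_consistency_loss(lambda t: t, lambda t: t, x, y))  # identity generators -> 0.0
```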
[link]
_Objective:_ Theoretical study of Deep Neural Networks, their expressivity and regularization. ## Results: The key findings of the article are: ### A. Deep neural networks easily fit random labels. This holds when randomizing labels, when replacing images with raw noise, and in all situations in between. 1. The effective capacity of neural networks is sufficient for memorizing the entire data set. 2. Even optimization on random labels remains easy. In fact, training time increases only by a small constant factor compared with training on the true labels. 3. Randomizing labels is solely a data transformation, leaving all other properties of the learning problem unchanged. ### B. Explicit regularization may improve generalization performance, but is neither necessary nor by itself sufficient for controlling generalization error. By explicit regularization they mean batch normalization, weight decay, dropout, data augmentation, etc. ### C. Generically large neural networks can express any labeling of the training data. More formally, a very simple two-layer ReLU network with `p = 2n + d` parameters can express any labeling of any sample of size `n` in `d` dimensions. ### D. The optimization algorithm itself is implicitly regularizing the solution. SGD acts as an implicit regularizer and its properties are inherited by models that were trained using SGD. |
[link]
_Objective:_ Define a framework for Adversarial Domain Adaptation and propose a new architecture as state-of-the-art. _Dataset:_ MNIST, USPS, SVHN and NYUD. ## Inner workings: Subsumes previous work in a generalized framework where designing a new method is reduced to making three design choices: * whether to use a generative or discriminative base model. * whether to tie or untie the weights. * which adversarial learning objective to use. [![screen shot 2017-04-18 at 5 10 01 pm](https://cloud.githubusercontent.com/assets/17261080/25138167/15d5e644-245a-11e7-9fb8-636ce4111036.png)](https://cloud.githubusercontent.com/assets/17261080/25138167/15d5e644-245a-11e7-9fb8-636ce4111036.png) ## Architecture: [![screen shot 2017-04-18 at 5 14 44 pm](https://cloud.githubusercontent.com/assets/17261080/25138526/07848bd0-245b-11e7-94c9-f6ae7ccea76f.png)](https://cloud.githubusercontent.com/assets/17261080/25138526/07848bd0-245b-11e7-94c9-f6ae7ccea76f.png) ## Results: Interesting, as the theoretical framework seems to converge with other papers, and their architecture improves on previous papers' performance even if it's not a huge improvement. |
[link]
_Objective:_ Specifically adapt Active Learning to image classification with deep learning. _Dataset:_ [CARC](https://bcsiriuschen.github.io/CARC/) and [Caltech-256](http://authors.library.caltech.edu/7694/) ## Inner-workings: They obtain labels from two sources: * The most informative/uncertain samples are manually labeled using least confidence, margin sampling and entropy, see [Active Learning Literature Survey](https://github.com/Deepomatic/papers/issues/192). * The second source is samples with high prediction confidence, which are automatically labeled. They represent the majority of samples. ## Architecture: [![screen shot 2017-06-29 at 3 57 43 pm](https://user-images.githubusercontent.com/17261080/27691277-d4547196-5ce3-11e7-849c-aadd30d71d68.png)](https://user-images.githubusercontent.com/17261080/27691277-d4547196-5ce3-11e7-849c-aadd30d71d68.png) They proceed with the following steps: 1. Initialization: they manually annotate a given number of images for each class in order to pre-train the network. 2. Complementary sample selection: they fix the network, identify the most uncertain samples for manual annotation and automatically annotate the most certain ones if their entropy is lower than a given threshold. 3. CNN fine-tuning: they train the network using the whole pool of already labeled and pseudo-labeled data. Then they put all the automatically labeled images back into the unlabelled pool. 4. Threshold updating: as the network gets more and more confident, the threshold for auto-labelling is linearly decreased. The idea is that the network gets a more reliable representation and can be trusted more. ## Results: Roughly halves the number of annotations needed. ⚠️ I don't feel like this paper can be trusted 100% ⚠️ |
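A rough NumPy sketch of step 2 (complementary sample selection), assuming `probs` holds the softmax outputs over the unlabeled pool: the highest-entropy samples go to manual annotation, while samples whose entropy falls below the threshold are pseudo-labeled automatically.

```python
import numpy as np

def select_samples(probs, n_manual=100, delta=0.05):
    """probs: (n_samples, n_classes) softmax outputs on the unlabeled pool."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    manual_idx = np.argsort(entropy)[-n_manual:]   # most uncertain -> ask a human
    auto_idx = np.where(entropy < delta)[0]        # most certain -> pseudo-label
    pseudo_labels = probs[auto_idx].argmax(axis=1)
    return manual_idx, auto_idx, pseudo_labels

probs = np.random.dirichlet(np.ones(10), size=5000)   # fake predictions for the pool
manual_idx, auto_idx, pseudo_labels = select_samples(probs)
```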
[link]
_Objective:_ Robust unsupervised learning of a probability distribution using a new module called the `critic` and the `Earth-mover distance`. _Dataset:_ [LSUN-Bedrooms](http://lsun.cs.princeton.edu/2016/) ## Inner working: Basically train a `critic` until convergence to retrieve the Wasserstein-1 distance, see pseudo-algorithm below: [![screen shot 2017-05-03 at 5 05 09 pm](https://cloud.githubusercontent.com/assets/17261080/25667162/003c9330-3023-11e7-9081-c181011f4e6f.png)](https://cloud.githubusercontent.com/assets/17261080/25667162/003c9330-3023-11e7-9081-c181011f4e6f.png) ## Results: * Easier training: no need for batch normalization and no need to fine-tune generator/discriminator balance. * Less sensitivity to network architecture. * Very good proxy that correlates very well with sample quality. * Non-vanishing gradients. |
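A compact PyTorch sketch that follows the pseudo-algorithm above: the critic is trained `n_critic` times with weight clipping (to enforce the Lipschitz constraint), then the generator takes one step; the hyper-parameter values are the usual defaults and should be treated as assumptions.

```python
import torch

def wgan_step(critic, generator, real, opt_c, opt_g, z_dim=64, n_critic=5, clip=0.01):
    for _ in range(n_critic):
        z = torch.randn(real.size(0), z_dim)
        fake = generator(z).detach()
        # Critic maximizes E[D(real)] - E[D(fake)], an estimate of the Wasserstein-1 distance
        loss_c = -(critic(real).mean() - critic(fake).mean())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in critic.parameters():           # weight clipping
            p.data.clamp_(-clip, clip)
    z = torch.randn(real.size(0), z_dim)
    loss_g = -critic(generator(z)).mean()       # generator minimizes -E[D(fake)]
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return -loss_c.item()                       # proxy that correlates with sample quality
```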
[link]
_Objective:_ Predict labels using a very large dataset with noisy labels and a much smaller (3 orders of magnitude) dataset with human-verified annotations. _Dataset:_ [Open image](https://research.googleblog.com/2016/09/introducing-open-images-dataset.html) ## Architecture: Contrary to other approaches, they use the clean labels and the noisy labels but also image features. They basically train 3 networks: 1. A feature extractor for the image. 2. A label cleaning network that learns to predict verified labels from noisy labels + image features. 3. An image classifier that predicts using just the image. [![screen shot 2017-04-12 at 11 10 56 am](https://cloud.githubusercontent.com/assets/17261080/24950258/c4764106-1f70-11e7-82e4-c1111ffc089e.png)](https://cloud.githubusercontent.com/assets/17261080/24950258/c4764106-1f70-11e7-82e4-c1111ffc089e.png) ## Results: Overall better performance but not a breathtaking improvement: from `AP 83.832 / MAP 61.82` for a NN trained only on labels to `AP 87.67 / MAP 62.38` with their approach. |
[link]
_Objective:_ Find a generative model that avoids the usual shortcomings and produces: (i) high-resolution images, (ii) a variety of images and (iii) samples matching the dataset diversity. _Dataset:_ [ImageNet](https://www.image-net.org/) ## Inner-workings: The idea is to find an image that maximizes the probability for a given label by using a variant of a Markov Chain Monte Carlo (MCMC) sampler. [![screen shot 2017-06-01 at 12 31 14 pm](https://cloud.githubusercontent.com/assets/17261080/26675978/3c9e6d94-46c6-11e7-9f67-477c4036a891.png)](https://cloud.githubusercontent.com/assets/17261080/26675978/3c9e6d94-46c6-11e7-9f67-477c4036a891.png) Where the first term ensures that we stay in the image manifold that we're trying to find and don't just produce adversarial examples, and the second term makes sure that we find an image corresponding to the label we're looking for. Basically we start with a random image and iteratively find a better image to match the label we're trying to generate. ### MALA-approx: MALA-approx is the MCMC sampler based on the Metropolis-Adjusted Langevin Algorithm that they use in the paper; it is defined iteratively as follows: [![screen shot 2017-06-01 at 12 25 45 pm](https://cloud.githubusercontent.com/assets/17261080/26675866/bf15cc28-46c5-11e7-9620-659d26f84bf8.png)](https://cloud.githubusercontent.com/assets/17261080/26675866/bf15cc28-46c5-11e7-9620-659d26f84bf8.png) where: * epsilon1 makes the image more generic. * epsilon2 increases confidence in the chosen class. * epsilon3 adds noise to encourage diversity. ### Image prior: They try several priors for the images: 1. PPGN-x: p(x) is modeled with a Denoising Auto-Encoder (DAE). [![screen shot 2017-06-01 at 1 48 33 pm](https://cloud.githubusercontent.com/assets/17261080/26678501/1737c64e-46d1-11e7-82a4-7ee0aa8bfe2f.png)](https://cloud.githubusercontent.com/assets/17261080/26678501/1737c64e-46d1-11e7-82a4-7ee0aa8bfe2f.png) 2. DGN-AM: uses a latent space to model x with h using a GAN. [![screen shot 2017-06-01 at 1 49 41 pm](https://cloud.githubusercontent.com/assets/17261080/26678517/2e743194-46d1-11e7-95dc-9bb638128242.png)](https://cloud.githubusercontent.com/assets/17261080/26678517/2e743194-46d1-11e7-95dc-9bb638128242.png) 3. PPGN-h: incorporates a prior for p(h) using a DAE. [![screen shot 2017-06-01 at 1 51 14 pm](https://cloud.githubusercontent.com/assets/17261080/26678579/6bd8cb58-46d1-11e7-895d-f9432b7e5e1f.png)](https://cloud.githubusercontent.com/assets/17261080/26678579/6bd8cb58-46d1-11e7-895d-f9432b7e5e1f.png) 4. Joint PPGN-h: to increase the expressivity of G, h is modeled by first modeling x in the DAE. [![screen shot 2017-06-01 at 1 51 23 pm](https://cloud.githubusercontent.com/assets/17261080/26678622/a7bf2f68-46d1-11e7-9209-98f97e0a218d.png)](https://cloud.githubusercontent.com/assets/17261080/26678622/a7bf2f68-46d1-11e7-9209-98f97e0a218d.png) 5. Noiseless joint PPGN-h: same as the previous one but without noise. [![screen shot 2017-06-01 at 1 54 11 pm](https://cloud.githubusercontent.com/assets/17261080/26678655/d5499220-46d1-11e7-93d0-d48a6b6fa1a8.png)](https://cloud.githubusercontent.com/assets/17261080/26678655/d5499220-46d1-11e7-93d0-d48a6b6fa1a8.png) ### Conditioning: In the paper they mostly condition on labels, but captions or pretty much anything else can also be used.
[![screen shot 2017-06-01 at 2 26 53 pm](https://cloud.githubusercontent.com/assets/17261080/26679654/6297ab86-46d6-11e7-86fa-f763face01ca.png)](https://cloud.githubusercontent.com/assets/17261080/26679654/6297ab86-46d6-11e7-86fa-f763face01ca.png) ## Architecture: The final architecture using a pretrained classifier network is below. Note that only G and D are trained. [![screen shot 2017-06-01 at 2 29 49 pm](https://cloud.githubusercontent.com/assets/17261080/26679785/db143520-46d6-11e7-9668-72864f1a8eb1.png)](https://cloud.githubusercontent.com/assets/17261080/26679785/db143520-46d6-11e7-9668-72864f1a8eb1.png) ## Results: Pretty much any base network can be used with minimal training of G and D. It produces very realistic images with great diversity; see below for examples of 227x227 images with ImageNet. [![screen shot 2017-06-01 at 2 32 38 pm](https://cloud.githubusercontent.com/assets/17261080/26679884/4494002a-46d7-11e7-882e-c69aff2ddd17.png)](https://cloud.githubusercontent.com/assets/17261080/26679884/4494002a-46d7-11e7-882e-c69aff2ddd17.png) |
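A schematic NumPy sketch of the MALA-approx update described above: start from a random image and iteratively move it toward the target class; `grad_log_prior` and `grad_log_class` are placeholders for what the prior network (DAE/GAN) and the classifier would provide, and the epsilon values are illustrative only.

```python
import numpy as np

def mala_approx(grad_log_prior, grad_log_class, x0, n_steps=200,
                eps1=1e-5, eps2=1.0, eps3=1e-11):
    """eps1 keeps x on the image manifold (more generic), eps2 increases the
    confidence of the chosen class, eps3 adds noise to encourage diversity."""
    x = x0.copy()
    for _ in range(n_steps):
        x = (x
             + eps1 * grad_log_prior(x)
             + eps2 * grad_log_class(x)
             + np.random.normal(0.0, eps3, size=x.shape))
    return x

# Toy usage with dummy gradients pulling x toward zero.
x = mala_approx(lambda x: -x, lambda x: -x, np.random.rand(3, 227, 227))
```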
[link]
_Objective:_ Train on both classification and detection images to make a better, faster and stronger detector. _Dataset:_ [ImageNet](http://www.image-net.org/), [COCO](http://mscoco.org/) and [WordNet](https://wordnet.princeton.edu/). ## Architecture: Apart from improvements such as batch norm and other general tweaking, the real gains come from: 1. Using both a classification dataset and a detection dataset at the same time. 2. Replacing the usual final softmax layer (which assumes that all labels are mutually exclusive) with a WordTree label hierarchy based on WordNet, which enables the network to predict `dog` even if it doesn't know whether it's a `Fox Terrier`. [![screen shot 2017-04-12 at 7 24 28 pm](https://cloud.githubusercontent.com/assets/17261080/24970727/b7abaf02-1fb5-11e7-8b78-2a430a861cbd.png)](https://cloud.githubusercontent.com/assets/17261080/24970727/b7abaf02-1fb5-11e7-8b78-2a430a861cbd.png) ## Results: State-of-the-art results at full resolution and the possibility to trade accuracy for computation time. [![screen shot 2017-04-12 at 7 31 26 pm](https://cloud.githubusercontent.com/assets/17261080/24971010/a51556f8-1fb6-11e7-9289-fc277b182686.png)](https://cloud.githubusercontent.com/assets/17261080/24971010/a51556f8-1fb6-11e7-9289-fc277b182686.png) |
[link]
_Objective:_ Fundamental analysis of random networks using mean-field theory. Introduces two scales controlling network behavior. ## Results: A guide for choosing hyper-parameters so that random networks are nearly critical (in between order and chaos). This in turn implies that information can propagate forward and backward and thus the network is trainable (no vanishing or exploding gradients). Basically, for any given number of layers and initialization covariances for weights and biases, it tells you whether the network will be trainable or not; kind of an architecture validation tool. **To be noted:** any amount of dropout removes the critical point and therefore implies an upper bound on trainable network depth. ## Caveats: * Considers only bounded activation units: no ReLU, etc. * Applies directly only to fully connected feed-forward networks: no convnets, etc. |
[link]
_Objective:_ Compare several meta-architectures and hyper-parameters in the same framework for easy comparison. ## Architectures: Four meta-architectures: 1. R-CNN 2. Faster R-CNN 3. SSD 4. YOLO (not evaluated in the paper) [![screen shot 2017-05-05 at 3 12 57 pm](https://cloud.githubusercontent.com/assets/17261080/25746807/5a294360-31a5-11e7-808e-d48497a16cd5.png)](https://cloud.githubusercontent.com/assets/17261080/25746807/5a294360-31a5-11e7-808e-d48497a16cd5.png) ## Results: Very useful for knowing at first glance which framework to implement. |
[link]
_Objective:_ Design a network that will itself find the best architecture for a given task. _Dataset:_ [CIFAR10](https://www.cs.toronto.edu/%7Ekriz/cifar.html) and [PTB](https://catalog.ldc.upenn.edu/ldc99t42). ## Inner-workings: The meta-network (an RNN) generates a string specifying the child network parameters. Such a child network is then trained for 35-50 epochs and its accuracy is used as the reward to train the meta-network with Reinforcement Learning. The RNN first generates networks with few layers (6), then this number is increased as training progresses. ## Architecture: They develop one architecture for CNNs where they predict each layer's characteristics plus its possible skip-connections: [![screen shot 2017-05-24 at 8 13 01 am](https://cloud.githubusercontent.com/assets/17261080/26389176/d807de42-4058-11e7-942a-8a129558e126.png)](https://cloud.githubusercontent.com/assets/17261080/26389176/d807de42-4058-11e7-942a-8a129558e126.png) And one specifically for LSTM-style cells: [![screen shot 2017-05-24 at 8 13 26 am](https://cloud.githubusercontent.com/assets/17261080/26389190/e2bfd506-4058-11e7-9168-62abd040156e.png)](https://cloud.githubusercontent.com/assets/17261080/26389190/e2bfd506-4058-11e7-9168-62abd040156e.png) ## Distributed setting: Below is the distributed setting that they use, with parameter servers connected to replicas (GPUs) that train child networks. [![screen shot 2017-05-24 at 8 09 05 am](https://cloud.githubusercontent.com/assets/17261080/26389084/5e354456-4058-11e7-83a9-089cb2c115b7.png)](https://cloud.githubusercontent.com/assets/17261080/26389084/5e354456-4058-11e7-83a9-089cb2c115b7.png) ## Results: Overall they trained 12800 networks on 800 GPUs, but they achieve state-of-the-art results with no human intervention except the vocabulary selection (activation type, type of cells, etc.). Next step, transfer learning from one task to another for the meta-network? |
[link]
_Objective:_ Build a network easily trainable by back-propagation to perform unsupervised domain adaptation while at the same time learning a good embedding for both source and target domains. _Dataset:_ [SVHN](ufldl.stanford.edu/housenumbers/), [MNIST](yann.lecun.com/exdb/mnist/), [USPS](https://www.otexts.org/1577), [CIFAR](https://www.cs.toronto.edu/%7Ekriz/cifar.html) and [STL](https://cs.stanford.edu/%7Eacoates/stl10/). ## Architecture: Very similar to RevGrad but with some differences. Basically a shared encoder followed by a classifier and a reconstructor. [![screen shot 2017-05-22 at 6 11 22 pm](https://cloud.githubusercontent.com/assets/17261080/26318076/21361592-3f1a-11e7-9213-9cc07cfe2f2a.png)](https://cloud.githubusercontent.com/assets/17261080/26318076/21361592-3f1a-11e7-9213-9cc07cfe2f2a.png) The two losses are: * the usual cross-entropy with softmax for the classifier * the pixel-wise squared loss for reconstruction These are then combined using a trade-off hyper-parameter between classification and reconstruction. They also use data augmentation to generate additional training data during the supervised training using only geometrical deformations: translation, rotation, skewing, and scaling. Plus denoising to reconstruct clean inputs given their noisy counterparts (zero-masked noise and Gaussian noise). ## Results: Outperformed the state of the art on most tasks at the time; now itself outperformed by Generate To Adapt on most tasks. |
[link]
_Objective:_ Find a feature representation that does not allow discriminating between the training (source) and test (target) domains, using a discriminator trained directly on this embedding. _Dataset:_ MNIST, SYN Numbers, SVHN, SYN Signs, OFFICE, PRID, VIPeR and CUHK. ## Architecture: The basic idea behind this paper is to use a standard classifier network and choose one layer that will be the feature representation. The network before this layer is called the `Feature Extractor` and after it the `Label Predictor`. Then a new network called the `Domain Classifier` is introduced that takes as input the extracted features; its objective is to tell whether a computed feature embedding came from an image of the source or target dataset. During training the aim is to minimize the loss of the `Label Predictor` while maximizing the loss of the `Domain Classifier`. In theory we should end up with a feature embedding where the discriminator can't tell if the image came from the source or target domain, thus the domain shift should have been eliminated. To maximize the domain loss, a new layer is introduced, the `Gradient Reversal Layer`, which is the identity during the forward pass but reverses the gradient during back-propagation. This enables the network to be trained using simple gradient descent algorithms. What is interesting with this approach is that any initial network can be used by simply adding a small set of new layers for the domain classifier. Below is a generic architecture. [![screen shot 2017-04-18 at 1 59 53 pm](https://cloud.githubusercontent.com/assets/17261080/25129680/590f57ee-243f-11e7-8927-91124303b584.png)](https://cloud.githubusercontent.com/assets/17261080/25129680/590f57ee-243f-11e7-8927-91124303b584.png) ## Results: Their approach works, but for some domain adaptation tasks it completely fails and overall its performance is not great. Since then the state of the art has changed, see DANN combined with GAN or ADDA. |
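A minimal PyTorch sketch of the `Gradient Reversal Layer`: identity in the forward pass, gradient multiplied by -lambda in the backward pass, so the feature extractor is pushed to maximize the domain classifier's loss while everything trains with plain gradient descent.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity forward, -lambda * gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

features = torch.randn(8, 128, requires_grad=True)
reversed_features = GradientReversal.apply(features, 1.0)  # feed this to the domain classifier
reversed_features.sum().backward()
print(features.grad[0, 0])  # -1.0: the gradient has been reversed
```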
[link]
_Objective:_ Improve on Fast R-CNN and [SPPnet](https://arxiv.org/abs/1406.4729) by incorporating the region proposal network directly. _Dataset:_ [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [COCO](http://mscoco.org/). Both Fast R-CNN and SPPnet take as input an image and several possible objects (corresponding to regions of interest) and score each of them. There are thus two different entities: 1. A region proposal network. 2. A classification/detection network (Fast R-CNN/SPPnet). ## Architecture: First, image features are extracted using a state-of-the-art ConvNet, then they are used for both region proposal and the actual detection/classification on those regions. [![screen shot 2017-04-14 at 2 59 28 pm](https://cloud.githubusercontent.com/assets/17261080/25043807/01a287b6-2123-11e7-944c-01493371df29.png)](https://cloud.githubusercontent.com/assets/17261080/25043807/01a287b6-2123-11e7-944c-01493371df29.png) ## Results: By incorporating the region proposal network right after the feature ConvNet, its computation cost becomes basically free, which leads to an elegant solution (only one network) and, more importantly, greatly improves speed at test time. |
[link]
_Objective:_ Solve the degradation problem where adding layers induces a higher training error. _Dataset:_ [CIFAR10](https://www.cs.toronto.edu/%7Ekriz/cifar.html), [PASCAL](http://host.robots.ox.ac.uk/pascal/VOC/) and [COCO](http://mscoco.org/). ## Inner-workings: They argue that it is easier to learn the difference from the identity (the residual) than the actual mapping. Basically they start with the identity and learn the residual mapping. This allows for easier training and thus deeper networks. ## Architecture: They introduce two new building blocks for Residual Networks, depending on the input dimensionality: [![screen shot 2017-05-31 at 3 49 59 pm](https://cloud.githubusercontent.com/assets/17261080/26635061/d489dbe2-4618-11e7-911e-68772265ee9f.png)](https://cloud.githubusercontent.com/assets/17261080/26635061/d489dbe2-4618-11e7-911e-68772265ee9f.png) [![screen shot 2017-05-31 at 3 57 47 pm](https://cloud.githubusercontent.com/assets/17261080/26635420/f6f22af8-4619-11e7-9639-ed651f8b18bb.png)](https://cloud.githubusercontent.com/assets/17261080/26635420/f6f22af8-4619-11e7-9639-ed651f8b18bb.png) These can then be chained to produce networks such as: [![screen shot 2017-05-31 at 3 54 16 pm](https://cloud.githubusercontent.com/assets/17261080/26635258/7b64530c-4619-11e7-81c8-5d6be547da77.png)](https://cloud.githubusercontent.com/assets/17261080/26635258/7b64530c-4619-11e7-81c8-5d6be547da77.png) ## Results: Won most first places, very impressive, and adding layers does increase accuracy. |
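A minimal PyTorch sketch of the identity-shortcut building block (the first screenshot); the projection variant used when dimensions change is omitted.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: output = ReLU(F(x) + x), so the layers only learn the residual."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(residual + x)   # identity shortcut

y = BasicBlock(64)(torch.randn(1, 64, 32, 32))  # same shape in, same shape out
```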
[link]
_Objective:_ Propose a more stable set of architectures for training GAN and show that they learn good representations of images for supervised learning and generative modeling. _Dataset:_ [LSUN](http://www.yf.io/p/lsun) and [ImageNet 1k](www.image-net.org/). ## Architecture: Below are the guidelines for making DCGANs. [![screen shot 2017-04-24 at 10 58 17 am](https://cloud.githubusercontent.com/assets/17261080/25329644/f3885f7c-28dc-11e7-8895-051124c8ff6c.png)](https://cloud.githubusercontent.com/assets/17261080/25329644/f3885f7c-28dc-11e7-8895-051124c8ff6c.png) And here is a sample network: [![screen shot 2017-04-24 at 10 57 54 am](https://cloud.githubusercontent.com/assets/17261080/25329634/e9c14abc-28dc-11e7-8bed-068f7f7bc78d.png)](https://cloud.githubusercontent.com/assets/17261080/25329634/e9c14abc-28dc-11e7-8bed-068f7f7bc78d.png) A tensorflow implementation can be found [here](https://github.com/carpedm20/DCGAN-tensorflow) along with an [online demo](https://carpedm20.github.io/faces/). ## Results: Quite interesting especially concerning the structure learned in the Z-space and how this can be used for interpolation or object removal, see the example that is shown everywhere: [![screen shot 2017-04-24 at 11 20 03 am](https://cloud.githubusercontent.com/assets/17261080/25330458/080b6b4e-28e0-11e7-9ab6-ce58ef5b5562.png)](https://cloud.githubusercontent.com/assets/17261080/25330458/080b6b4e-28e0-11e7-9ab6-ce58ef5b5562.png) Nonetheless the network is still generating small images (32x32). |
[link]
Automatically learn which Active Learning strategy to use. _Code:_ [here](https://github.com/ntucllab/libact) ## Inner-workings: They use the multi-armed bandit framework where each arm is an Active Learning strategy. The core RL algorithm used is [EXP4.P](https://arxiv.org/abs/1002.4058), which is itself based on EXP4 (**Exp**onential weighting for **Exp**loration and **Exp**loitation with **Exp**erts). They make only slight adjustments to the reward function. ## Algorithm: [![screen shot 2017-06-14 at 7 33 46 pm](https://user-images.githubusercontent.com/17261080/27146101-6d8392b4-5138-11e7-8e12-5617b258ddfa.png)](https://user-images.githubusercontent.com/17261080/27146101-6d8392b4-5138-11e7-8e12-5617b258ddfa.png) ## Results: Beats all other techniques most of the time and makes sure that in the long run we use the best strategy. |
[link]
Improve on [R-CNN](https://arxiv.org/abs/1311.2524) and [SPPnet](https://arxiv.org/abs/1406.4729) with easier and faster training. A Region-based Convolutional Neural Network (R-CNN) basically takes as input an image and several possible objects (corresponding to Regions of Interest) and scores each of them. ## Architecture: The feature map is computed for the whole image and then for each region of interest a new fixed-length feature vector is computed using max-pooling. From it, two predictions are made: classification and bounding-box offsets. [![screen shot 2017-04-14 at 12 46 38 pm](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png)](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png) ## Results: By sharing computation for RoIs of the same image and allowing simple SGD training it really improves training performance, although at test time it's still not as fast as YOLO9000. |
[link]
Network training is very sensitive to the learning rate and initialization factors. Each layer's output distribution is different from its input distribution (called covariate shift), which implies that layers have to permanently adapt to a new input distribution. In this paper the authors introduce batch normalization, a new layer to reduce covariate shift. _Dataset:_ [MNIST](http://yann.lecun.com/exdb/mnist/), [ImageNet](www.image-net.org/). #### Inner workings: Batch normalization fixes the means and variances of layer inputs for a training batch by computing the following normalization on each batch. [![screen shot 2017-04-13 at 10 21 39 am](https://cloud.githubusercontent.com/assets/17261080/24996464/4027fbba-2033-11e7-966a-2db3c0f1389d.png)](https://cloud.githubusercontent.com/assets/17261080/24996464/4027fbba-2033-11e7-966a-2db3c0f1389d.png) The parameters Gamma and Beta are then learned with gradient descent. During inference the statistics are computed using unbiased estimators of the whole dataset (and not just the batch). #### Results: Batch normalization provides several advantages: 1. Use of a higher learning rate without risk of divergence by stabilizing the gradient scale. 2. Regularizes the model. 3. Reduces the need for dropout. 4. Avoids the network getting stuck when using saturating nonlinearities. #### What to do? 1. Add batch norm layers before activation layers. 2. Increase the learning rate. 3. Remove dropout. 4. Reduce L2 weight regularization. 5. Accelerate learning rate decay. 6. Reduce picture distortion for data augmentation. |
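A small NumPy sketch of the training-time batch normalization transform for a (batch, features) input, matching the screenshot above: normalize with the batch statistics, then scale and shift with the learned parameters gamma and beta.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize
    return gamma * x_hat + beta             # scale and shift (learned parameters)

x = np.random.randn(32, 100) * 3.0 + 5.0    # a batch with shifted, scaled features
out = batch_norm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
print(out.mean(), out.var())                # close to 0 and 1
```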
[link]
_Objective:_ Design a loss to make deep networks robust to label noise. _Dataset:_ [MNIST](yann.lecun.com/exdb/mnist/), Toronto Faces Database, [ILSVRC2014](http://www.image-net.org/challenges/LSVRC/2014/). #### Inner-workings: Three types of losses are presented: * reconstruction loss: [![screen shot 2017-06-26 at 11 00 07 am](https://user-images.githubusercontent.com/17261080/27532200-bb42b8a6-5a5f-11e7-8c14-673958216bfc.png)](https://user-images.githubusercontent.com/17261080/27532200-bb42b8a6-5a5f-11e7-8c14-673958216bfc.png) * soft bootstrapping, which uses the labels predicted by the network `qk` and the user-provided labels `tk`: [![screen shot 2017-06-26 at 11 10 43 am](https://user-images.githubusercontent.com/17261080/27532296-1e01a420-5a60-11e7-9273-d1affb0d7c2e.png)](https://user-images.githubusercontent.com/17261080/27532296-1e01a420-5a60-11e7-9273-d1affb0d7c2e.png) * hard bootstrapping, which replaces the soft predicted labels by their binary version: [![screen shot 2017-06-26 at 11 12 58 am](https://user-images.githubusercontent.com/17261080/27532439-a3f9dbd8-5a60-11e7-91a7-327efc748eae.png)](https://user-images.githubusercontent.com/17261080/27532439-a3f9dbd8-5a60-11e7-91a7-327efc748eae.png) [![screen shot 2017-06-26 at 11 13 05 am](https://user-images.githubusercontent.com/17261080/27532463-b52f4ab4-5a60-11e7-9aed-615109b61bd8.png)](https://user-images.githubusercontent.com/17261080/27532463-b52f4ab4-5a60-11e7-9aed-615109b61bd8.png) #### Architecture: They test with Feed Forward Neural Networks only. #### Results: They use only permutation noise, with a very high probability compared to what we might encounter in real life. [![screen shot 2017-06-26 at 11 29 05 am](https://user-images.githubusercontent.com/17261080/27533105-b051d366-5a62-11e7-95f3-168d0d2d7841.png)](https://user-images.githubusercontent.com/17261080/27533105-b051d366-5a62-11e7-95f3-168d0d2d7841.png) The improvement for small noise probabilities (<10%) might not be that interesting. |
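A NumPy sketch of the two bootstrapping losses described above, assuming `q` holds softmax predictions and `t` the (possibly noisy) one-hot targets; the beta values are illustrative, not taken from the paper's experiments.

```python
import numpy as np

def soft_bootstrap_loss(q, t, beta=0.95):
    """Target is a convex mix of the given labels and the model's own soft predictions."""
    return -np.sum((beta * t + (1 - beta) * q) * np.log(q + 1e-12), axis=1).mean()

def hard_bootstrap_loss(q, t, beta=0.8):
    """Same idea, but the model's predictions are first binarized (argmax)."""
    z = np.eye(q.shape[1])[q.argmax(axis=1)]
    return -np.sum((beta * t + (1 - beta) * z) * np.log(q + 1e-12), axis=1).mean()

q = np.random.dirichlet(np.ones(10), size=64)   # fake predictions
t = np.eye(10)[np.random.randint(0, 10, 64)]    # fake (noisy) labels
print(soft_bootstrap_loss(q, t), hard_bootstrap_loss(q, t))
```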
[link]
_Objective:_ Build a network easily trainable by back-propagation to perform unsupervised domain adaptation while at the same time learning a good embedding for both source and target domains. _Dataset:_ [SVHN](ufldl.stanford.edu/housenumbers/), [MNIST](yann.lecun.com/exdb/mnist/), [USPS](https://www.otexts.org/1577), [CIFAR](https://www.cs.toronto.edu/%7Ekriz/cifar.html) and [STL](https://cs.stanford.edu/%7Eacoates/stl10/). #### Architecture: Very similar to RevGrad but with some differences. Basically a shared encoder followed by a classifier and a reconstructor. [![screen shot 2017-05-22 at 6 11 22 pm](https://cloud.githubusercontent.com/assets/17261080/26318076/21361592-3f1a-11e7-9213-9cc07cfe2f2a.png)](https://cloud.githubusercontent.com/assets/17261080/26318076/21361592-3f1a-11e7-9213-9cc07cfe2f2a.png) The two losses are: * the usual cross-entropy with softmax for the classifier * the pixel-wise squared loss for reconstruction These are then combined using a trade-off hyper-parameter between classification and reconstruction. They also use data augmentation to generate additional training data during the supervised training using only geometrical deformations: translation, rotation, skewing, and scaling. Plus denoising to reconstruct clean inputs given their noisy counterparts (zero-masked noise and Gaussian noise). #### Results: Outperformed the state of the art on most tasks at the time; now itself outperformed by Generate To Adapt on most tasks. |
[link]
_Objective:_ In an unconditional GAN it's not possible to control the mode of the data being generated which is what this paper tries to accomplish using the label data (but it can be generalized to any kind of conditional data). _Dataset:_ [MNIST](yann.lecun.com/exdb/mnist/) and [MIRFLICKR](http://press.liacs.nl/mirflickr/). #### Inner workings: Changes the loss to the conditional loss: [![screen shot 2017-04-24 at 10 07 25 am](https://cloud.githubusercontent.com/assets/17261080/25327832/e86f53fe-28d5-11e7-8694-6df8f2e1ef18.png)](https://cloud.githubusercontent.com/assets/17261080/25327832/e86f53fe-28d5-11e7-8694-6df8f2e1ef18.png) For implementation the only thing needed is to feed the label data to both the discriminator and generator: [![screen shot 2017-04-24 at 10 07 18 am](https://cloud.githubusercontent.com/assets/17261080/25327826/e53ab4a8-28d5-11e7-8056-1518602d50c9.png)](https://cloud.githubusercontent.com/assets/17261080/25327826/e53ab4a8-28d5-11e7-8056-1518602d50c9.png) #### Results: Interesting at the time but not surprising now. There's not much more to the paper than what is in the summary. |
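One common way to implement the conditioning described above (a sketch, not the paper's exact code) is to concatenate a one-hot encoding of the label to the inputs of both the generator and the discriminator:

```python
import torch

def condition(inputs, labels, n_classes=10):
    """Append a one-hot label encoding to a batch of input vectors."""
    one_hot = torch.nn.functional.one_hot(labels, n_classes).float()
    return torch.cat([inputs, one_hot], dim=1)

z = torch.randn(16, 100)                 # noise vectors for the generator
y = torch.randint(0, 10, (16,))          # class labels to condition on
generator_input = condition(z, y)        # shape (16, 110), fed to G; same trick for D
```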
[link]
_Objective:_ Transfer features learned from a large-scale dataset to a small-scale dataset. _Dataset:_ [ImageNet](www.image-net.org), [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/). #### Inner-workings: Basically they train the network on the large dataset, then replace the last layers, sometimes adding a new one, and train this on the new dataset. Pretty standard transfer learning nowadays. [![screen shot 2017-06-14 at 3 06 37 pm](https://user-images.githubusercontent.com/17261080/27133634-2d4c0fde-5113-11e7-848a-719514b1a12c.png)](https://user-images.githubusercontent.com/17261080/27133634-2d4c0fde-5113-11e7-848a-719514b1a12c.png) What's a bit more interesting is how they deal with the background being overrepresented by using the bounding boxes that they have. [![screen shot 2017-06-14 at 3 06 43 pm](https://user-images.githubusercontent.com/17261080/27133641-34d4ee7e-5113-11e7-8307-f1ff708bd5c7.png)](https://user-images.githubusercontent.com/17261080/27133641-34d4ee7e-5113-11e7-8307-f1ff708bd5c7.png) #### Results: A bit dated and not really applicable, but the part on specifically tackling the domain shift (such as background) is interesting. Plus they use the bounding-box information to refine the dataset. |
[link]
_Objective:_ Introduces the Convolutional Auto-Encoder, a hierarchical unsupervised feature extractor. _Dataset:_ [MNIST](yann.lecun.com/exdb/mnist/) and [SVHN](ufldl.stanford.edu/housenumbers/). #### Architecture: Uses convolutions to generate an encoding of the image, then decodes it and does a pixel-wise comparison. Used to initialize CNNs. #### Results: Old article, not really relevant nowadays. They don't speak about the deconvolution part. |
[link]
_Objective:_ Define a new deconvolution layer. #### Results: Not really interesting except for the fact that it first introduces **deconvolution layers**, which are ill-named as they are not actual deconvolutions but rather **transposed convolutions**, also called **fractionally strided convolutions**. [![Deconvolutional layer](https://cloud.githubusercontent.com/assets/17261080/25344392/44693b48-2912-11e7-8dda-2b64d99292a9.gif)](https://cloud.githubusercontent.com/assets/17261080/25344392/44693b48-2912-11e7-8dda-2b64d99292a9.gif) Visualizations of other operations can be seen [here](https://github.com/vdumoulin/conv_arithmetic), corresponding to [A guide to convolution arithmetic for deep learning](https://arxiv.org/pdf/1603.07285.pdf). |
[link]
Very good introduction to active learning. #### Scenarios There are three main scenarios: * Pool-based: a large amount of unlabeled data is available and we need to choose which instance to annotate next. * Stream-based: same as above except examples come one after the other. * Membership query synthesis: we can generate the point to label. #### Query Strategy Frameworks 2.1. Uncertainty Sampling Basically how to evaluate the informativeness of unlabeled instances and then select the most informative. 2.1.1. Least Confident Query the instances about which the algorithm is least certain how to label. [![screen shot 2017-06-14 at 5 08 37 pm](https://user-images.githubusercontent.com/17261080/27139765-281f1374-5124-11e7-9418-fb458be0bfc3.png)](https://user-images.githubusercontent.com/17261080/27139765-281f1374-5124-11e7-9418-fb458be0bfc3.png) [![screen shot 2017-06-14 at 5 09 36 pm](https://user-images.githubusercontent.com/17261080/27139841-5636458e-5124-11e7-95c4-ea586deb853a.png)](https://user-images.githubusercontent.com/17261080/27139841-5636458e-5124-11e7-95c4-ea586deb853a.png) Most used, but discards information on all other labels. 2.1.2. Margin Sampling Use the two most likely labels and choose the instance for which the difference between the two is the smallest. [![screen shot 2017-06-14 at 5 12 29 pm](https://user-images.githubusercontent.com/17261080/27139968-aabebe6a-5124-11e7-879b-f518e2279eba.png)](https://user-images.githubusercontent.com/17261080/27139968-aabebe6a-5124-11e7-879b-f518e2279eba.png) 2.1.3. Entropy Instead of using only the first two labels, why not use all of them? [![screen shot 2017-06-14 at 5 13 44 pm](https://user-images.githubusercontent.com/17261080/27140049-e33ea25a-5124-11e7-84ea-adab87d29174.png)](https://user-images.githubusercontent.com/17261080/27140049-e33ea25a-5124-11e7-84ea-adab87d29174.png) #### Query-By-Committee A committee of different models is trained. They then vote on which instance to label and the one on which they most disagree is chosen. To measure the level of disagreement, one can either use: * Vote entropy: [![screen shot 2017-06-14 at 5 20 26 pm](https://user-images.githubusercontent.com/17261080/27140436-d12d330a-5125-11e7-8f40-7be3bbc83987.png)](https://user-images.githubusercontent.com/17261080/27140436-d12d330a-5125-11e7-8f40-7be3bbc83987.png) * Kullback-Leibler divergence: [![screen shot 2017-06-14 at 5 21 32 pm](https://user-images.githubusercontent.com/17261080/27140492-f45be722-5125-11e7-9b42-204aaf4bdd92.png)](https://user-images.githubusercontent.com/17261080/27140492-f45be722-5125-11e7-9b42-204aaf4bdd92.png) [![screen shot 2017-06-14 at 5 22 29 pm](https://user-images.githubusercontent.com/17261080/27140537-12289cd2-5126-11e7-8e1d-62158576cd95.png)](https://user-images.githubusercontent.com/17261080/27140537-12289cd2-5126-11e7-8e1d-62158576cd95.png) #### Expected Model Change Selects the instance that would impart the greatest change to the current model if we knew its label. * Expected Gradient Length: compute the gradient for all instances and find the one with the largest magnitude on average over all labels. [![screen shot 2017-06-14 at 5 25 20 pm](https://user-images.githubusercontent.com/17261080/27140694-79cc6e4a-5126-11e7-9314-e837a1e0eba2.png)](https://user-images.githubusercontent.com/17261080/27140694-79cc6e4a-5126-11e7-9314-e837a1e0eba2.png) #### Expected Error Reduction Measures not how much the model is likely to change, but how much its generalization error is likely to be reduced.
Either by measuring: * Expected 0/1 loss: to reduce the expected total number of incorrect predictions. A new model needs to be trained for every label and instance, very greedy. [![screen shot 2017-06-14 at 5 28 42 pm](https://user-images.githubusercontent.com/17261080/27140912-08d7410a-5127-11e7-9d53-33f2044692a2.png)](https://user-images.githubusercontent.com/17261080/27140912-08d7410a-5127-11e7-9d53-33f2044692a2.png) * Expected Log-Loss: maximizing the expected information gain of the query. Still very greedy in computation! Not really usable unless the model can be updated analytically instead of re-trained. [![screen shot 2017-06-14 at 5 30 42 pm](https://user-images.githubusercontent.com/17261080/27140970-3e117516-5127-11e7-9936-671fea5d94dd.png)](https://user-images.githubusercontent.com/17261080/27140970-3e117516-5127-11e7-9936-671fea5d94dd.png) #### Variance Reduction Reduce the generalization error indirectly by minimizing the output variance. [![screen shot 2017-06-14 at 5 38 17 pm](https://user-images.githubusercontent.com/17261080/27141417-6507b71a-5128-11e7-81ca-ab227836098f.png)](https://user-images.githubusercontent.com/17261080/27141417-6507b71a-5128-11e7-81ca-ab227836098f.png) #### Density-Weighted Methods [![screen shot 2017-06-14 at 5 40 53 pm](https://user-images.githubusercontent.com/17261080/27141501-a920bd34-5128-11e7-8e9d-0870da365633.png)](https://user-images.githubusercontent.com/17261080/27141501-a920bd34-5128-11e7-8e9d-0870da365633.png) Where the left term is the informativeness of x and the right term represents the average similarity to all other instances in the input distribution. |
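A small NumPy sketch of the three uncertainty-sampling criteria from section 2.1, each returning the index of the instance to query next given the model's softmax outputs over the unlabeled pool.

```python
import numpy as np

def least_confident(probs):
    return int(np.argmax(1.0 - probs.max(axis=1)))       # most uncertain top prediction

def margin_sampling(probs):
    part = np.sort(probs, axis=1)
    return int(np.argmin(part[:, -1] - part[:, -2]))     # smallest top-2 margin

def entropy_sampling(probs):
    h = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return int(np.argmax(h))                             # highest predictive entropy

probs = np.random.dirichlet(np.ones(5), size=1000)       # fake softmax outputs for a pool
print(least_confident(probs), margin_sampling(probs), entropy_sampling(probs))
```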