#### Introduction
* The paper demonstrates how simple CNNs, built on top of word embeddings, can be used for sentence classification tasks.
* [Link to the paper](https://arxiv.org/abs/1408.5882)
* [Implementation](https://github.com/shagunsodhani/CNN-Sentence-Classifier)
#### Architecture
* Pad input sentences so that they are of the same length.
* Map words in the padded sentence using word embeddings (which may be either initialized as zero vectors or initialized as word2vec embeddings) to obtain a matrix corresponding to the sentence.
* Apply convolution layer with multiple filter widths and feature maps.
* Apply max-over-time pooling operation over the feature map.
* Concatenate the pooling results from different layers and feed to a fully-connected layer with softmax activation.
* Softmax outputs probabilistic distribution over the labels.
* Use dropout for regularisation.
#### Hyperparameters
* RELU activation for convolution layers
* Filter window of 3, 4, 5 with 100 feature maps each.
* Dropout - 0.5
* Gradient clipping at 3
* Batch size - 50
* Adadelta update rule.
#### Variants
* CNN-rand
* Randomly initialized word vectors.
* CNN-static
* Uses pre-trained vectors from word2vec and does not update the word vectors.
* CNN-non-static
* Same as CNN-static but updates word vectors during training.
* CNN-multichannel
* Uses two set of word vectors (channels).
* One set is updated and other is not updated.
#### Datasets
* Sentiment analysis datasets for Movie Reviews, Customer Reviews etc.
* Classification data for questions.
* Maximum number of classes for any dataset - 6
#### Strengths
* Good results on benchmarks despite being a simple architecture.
* Word vectors obtained by non-static channel have more meaningful representation.
#### Weakness
* Small data with few labels.
* Results are not very detailed or exhaustive.