Bag of Tricks for Efficient Text Classification
Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
arXiv e-Print archive, 2016
Keywords: cs.CL
First published: 2016/07/06
Abstract: This paper explores a simple and efficient baseline for text classification.
Our experiments show that our fast text classifier fastText is often on par
with deep learning classifiers in terms of accuracy, and many orders of
magnitude faster for training and evaluation. We can train fastText on more
than one billion words in less than ten minutes using a standard multicore CPU,
and classify half a million sentences among 312K classes in less than a minute.
#### Introduction
* Introduces fastText, a simple and highly efficient approach for text classification.
* On par with deep learning classifiers in terms of accuracy, yet many orders of magnitude faster for training and evaluation.
* [Link to the paper](http://arxiv.org/abs/1607.01759v3)
* [Link to code](https://github.com/facebookresearch/fastText)
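
A minimal, hedged usage sketch with the fastText Python bindings; the file names, hyper-parameter values, and example sentence below are placeholders, not values from the paper.

```python
# Hedged usage sketch with the fastText Python bindings; file names and
# hyper-parameter values are placeholders, not taken from the paper.
import fasttext

# train.txt: one example per line, with labels prefixed by "__label__"
model = fasttext.train_supervised(input="train.txt", epoch=5, lr=0.1, wordNgrams=2)

# Predict the most probable label for a new sentence.
labels, probs = model.predict("this movie was surprisingly good")
print(labels, probs)

# Number of examples, precision@1, and recall@1 on a held-out file.
print(model.test("test.txt"))
```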
#### Architecture
* Built on top of linear models with a rank constraint and a fast loss approximation.
* Word representations are averaged into a text representation, which is then fed to a linear classifier.
* The text representation acts as a hidden state that can be shared among features and classes.
* Softmax layer to obtain a probability distribution over pre-defined classes.
* The linear classifier has computational complexity $O(kh)$, where $k$ is the number of classes and $h$ is the dimension of the text representation (see the sketch after this list).
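
An illustrative NumPy sketch of the model body (not the official implementation): word embeddings are averaged into a text representation and passed through a linear classifier with a softmax. The matrix names `A` and `B` follow the paper's notation; the dimensions are made up for the example.

```python
# Illustrative sketch: average embeddings -> linear classifier -> softmax.
# Dimensions and inputs below are invented for demonstration only.
import numpy as np

vocab_size, hidden_dim, num_classes = 10_000, 10, 5
rng = np.random.default_rng(0)

A = rng.normal(size=(vocab_size, hidden_dim))   # word/n-gram embedding look-up table
B = rng.normal(size=(hidden_dim, num_classes))  # linear classifier weights

def predict_proba(word_ids):
    hidden = A[word_ids].mean(axis=0)           # averaged text representation, shape (h,)
    logits = hidden @ B                         # full softmax costs O(k * h)
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()

print(predict_proba([4, 87, 1032]))             # probability distribution over classes
```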
##### Hierarchical Softmax
* Based on a Huffman coding tree.
* Reduces the complexity to $O(h\log_2 k)$.
* The top $T$ targets (leaves of the tree) can be retrieved efficiently in $O(\log T)$ using a binary heap (see the sketch after this list).
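
An illustrative sketch of hierarchical-softmax scoring (not fastText's actual code): each class is a leaf of a binary tree, and its probability is the product of sigmoid decisions along the root-to-leaf path, so scoring one class costs time proportional to its depth rather than to $k$. The tiny 3-class tree, paths, and node vectors below are made up for the example.

```python
# Illustrative hierarchical softmax: P(class) is a product of sigmoid decisions
# along the root-to-leaf path. The tiny 3-class tree here is invented.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden_dim = 10
rng = np.random.default_rng(0)

# For each class: the internal nodes visited and the +1/-1 (left/right)
# decision taken at each node. Class 2 has a shorter (more frequent) code.
paths = {
    0: [(0, +1), (1, +1)],
    1: [(0, +1), (1, -1)],
    2: [(0, -1)],
}
node_vectors = rng.normal(size=(2, hidden_dim))  # one vector per internal node

def class_probability(hidden, label):
    prob = 1.0
    for node, direction in paths[label]:
        prob *= sigmoid(direction * (hidden @ node_vectors[node]))
    return prob

hidden = rng.normal(size=hidden_dim)             # averaged text representation
print(sum(class_probability(hidden, c) for c in range(3)))  # sums to 1.0
```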
##### N-gram Features
* Instead of modeling word order explicitly, uses a bag of n-grams as additional features to capture partial local word order without losing efficiency or accuracy.
* Uses the [hashing trick](https://arxiv.org/pdf/0902.2206.pdf) for a fast and memory-efficient mapping of the n-grams (a sketch follows below).
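
A rough sketch of the hashing trick for bigram features; the bucket count and hash function here are illustrative, not the ones used by fastText.

```python
# Hash each adjacent word pair into a fixed number of buckets; collisions are
# tolerated in exchange for bounded memory and no explicit bigram dictionary.
import zlib

NUM_BUCKETS = 2_000_000  # illustrative bucket count

def bigram_ids(tokens):
    return [zlib.crc32(f"{a} {b}".encode()) % NUM_BUCKETS
            for a, b in zip(tokens, tokens[1:])]

print(bigram_ids("the movie was surprisingly good".split()))
```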
#### Experiments
##### Sentiment Analysis
* fastText benefits from using bigrams.
* Outperforms [char-CNN](http://arxiv.org/abs/1502.01710v5) and [char-CRNN](http://arxiv.org/abs/1602.00367v1) and performs a bit worse than [VDCNN](http://arxiv.org/abs/1606.01781v1).
* Orders of magnitude faster in terms of training time.
* Note: fastText does not use pre-trained word embeddings.
##### Tag Prediction
* fastText with bigrams outperforms [Tagspace](http://emnlp2014.org/papers/pdf/EMNLP2014194.pdf).
* fastText is up to 600 times faster at test time.