Learning Factored Representations in a Deep Mixture of Experts
David Eigen, Marc'Aurelio Ranzato, and Ilya Sutskever
arXiv e-Print archive, 2013
This paper extends the mixture-of-experts (MoE) model by stacking several MoE blocks to form a deep MoE. In this model, the mixture weights of each block are produced by that block's own gating network, so each block learns its own mixture of experts. The whole deep MoE is trained jointly with stochastic gradient descent. The motivation is to reduce inference (decoding) time by exploiting the structure imposed by the MoE: for a given input, only the experts that receive large gating weights need to be evaluated. The model was evaluated on MNIST and a speech monophone classification task.