This paper introduces two model extensions to improve character level recurrent neural network language models. The authors evaluate their approaches on a multilingual language modeling benchmark along with the standard Penn Tree Bank Corpus. Evaluation uses only entropy rather than including the language model in a downstream task but that's okay for a paper of this scope. The paper is clearly written and definitely a sufficient contribution for the workshop track it would be really nice to see how well these methods can improve and more sophisticated recurrent architecture like gru or lstm units. On the PTB Corpus it would be nice to include a state-of-the-art or standard n-gram model to use as a reference point for the reported results.
The conditioning on words model is an interesting approach. It's unfortunate that such a small word level vocabulary is used with this model. It seems like the small vocabulary restriction is due to the fact that the word level model is jointly trained along with the character models. An alternative approach might be to use as input features the hidden representations from a word level recurrent model already trained when building the Character level language model. I don't have a good sense for how much joint training of both models matters.