Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
International Conference on Computer Vision (ICCV), 2017
A modular neural architecture for visual question answering. A seq2seq component reads the textual question and predicts a sequence of neural modules (e.g. find() and compare()). The predicted modules are then dynamically assembled into a question-specific network, and the whole system is trained end-to-end. The model achieves good results on three separate benchmarks that focus on reasoning about the image.
https://i.imgur.com/iOkSh8y.png
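
To make the idea of dynamically combining modules concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the layout is written by hand rather than predicted by a seq2seq model, the modules are hard-coded rules rather than trained networks, and the postfix (stack-based) layout encoding, the `and` and `exists` modules, the `IMAGE` data, and the `assemble` helper are all assumptions introduced for illustration. Only `find` is a module name taken from the summary above.

```python
from typing import List, Optional, Set, Tuple

Attention = List[float]                 # one weight per image region
Layout = List[Tuple[str, Optional[str]]]  # (module name, optional word argument)

# Hypothetical toy "image": each region is a set of attribute words.
IMAGE: List[Set[str]] = [
    {"red", "circle"},
    {"blue", "square"},
    {"red", "square"},
]


def find(image: List[Set[str]], word: str) -> Attention:
    """Attend to every region whose attributes contain `word`."""
    return [1.0 if word in region else 0.0 for region in image]


def and_(a: Attention, b: Attention) -> Attention:
    """Intersect two attention maps region by region."""
    return [min(x, y) for x, y in zip(a, b)]


def exists(a: Attention) -> str:
    """Turn an attention map into a yes/no answer."""
    return "yes" if max(a) > 0.5 else "no"


def assemble(layout: Layout, image: List[Set[str]]) -> str:
    """Evaluate a postfix module layout with a stack (assumed encoding)."""
    stack: list = []
    for name, arg in layout:
        if name == "find":
            if arg is None:
                raise ValueError("find needs a word argument")
            stack.append(find(image, arg))
        elif name == "and":
            right = stack.pop()
            left = stack.pop()
            stack.append(and_(left, right))
        elif name == "exists":
            stack.append(exists(stack.pop()))
        else:
            raise ValueError(f"unknown module: {name}")
    if len(stack) != 1:
        raise ValueError("layout did not reduce to a single output")
    return stack[0]


# "Is there a red square?" expressed as a hand-written layout.
red_square: Layout = [
    ("find", "red"),
    ("find", "square"),
    ("and", None),
    ("exists", None),
]
answer = assemble(red_square, IMAGE)
```

In the sketch, a different question simply yields a different layout, so the same small set of modules is reused across questions while the network structure changes per input. In a learned version, each module would be a differentiable function and the layout would come from the question encoder.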