Learning to Reason: End-to-End Module Networks for Visual Question Answering on ShortScience.org

Learning to Reason: End-to-End Module Networks for Visual Question Answering
Hu, Ronghang and Andreas, Jacob and Rohrbach, Marcus and Darrell, Trevor and Saenko, Kate
International Conference on Computer Vision - 2017 via Local Bibsonomy

Summary by Marek Rei, 8 years ago

A modular neural architecture for visual question answering. A seq2seq component reads the textual question and predicts a sequence of neural modules (e.g. find() and compare()); these modules are then dynamically assembled into a question-specific network, and the whole system is trained end-to-end. The model achieves good results on three separate benchmarks that focus on reasoning about the image.

https://i.imgur.com/iOkSh8y.png
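
To make the "predict a module sequence, then assemble it" idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the layout is hard-coded rather than predicted by a seq2seq model, nothing is learned, the "image" is just a list of region labels, and the postfix (reverse Polish) ordering of the layout, the `execute_layout` helper, and the module signatures are all assumptions made for illustration. Only the module names find() and compare() come from the summary.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy stand-ins (assumptions): an "image" is a list of region labels,
# and an attention map is one weight per region.
Image = List[str]
Attention = List[float]
# One layout token: (module name, optional word argument).
Token = Tuple[str, Optional[str]]


def find(image: Image, word: str) -> Attention:
    """Toy attention: weight 1.0 on regions whose label equals `word`."""
    return [1.0 if region == word else 0.0 for region in image]


def compare(image: Image, a: Attention, b: Attention) -> str:
    """Toy answer module: is there more attended mass in `a` than in `b`?"""
    return "yes" if sum(a) > sum(b) else "no"


# Module name -> (function, number of attention inputs it consumes).
MODULES: Dict[str, Tuple[Callable, int]] = {
    "find": (find, 0),
    "compare": (compare, 2),
}


def execute_layout(layout: List[Token], image: Image):
    """Assemble and run a module sequence given in postfix order."""
    stack: list = []
    for name, word in layout:
        fn, arity = MODULES[name]
        if len(stack) < arity:
            raise ValueError(f"module {name!r} needs {arity} inputs")
        # Pop the module's inputs, restoring left-to-right order.
        args = [stack.pop() for _ in range(arity)][::-1]
        if arity == 0:
            stack.append(fn(image, word))
        else:
            stack.append(fn(image, *args))
    if len(stack) != 1:
        raise ValueError("layout did not reduce to a single output")
    return stack[0]


# Hypothetical layout for "Are there more cats than dogs?"
image = ["cat", "cat", "dog"]
layout = [("find", "cat"), ("find", "dog"), ("compare", None)]
answer = execute_layout(layout, image)
```

In a real system each module would be a small differentiable network operating on image features, and the layout would come from the question encoder, so that a different question yields a different assembled network over the same set of modules.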
