Deep Visual-Semantic Alignments for Generating Image Descriptions
Karpathy, Andrej
and
Li, Fei-Fei
arXiv e-Print archive - 2014 via Local Bibsonomy
Keywords:
dblp
https://www.youtube.com/watch?v=e-WB4lfg30M
This technique is a combination of two powerful machine learning algorithms:
- convolutional neural networks are excellent at image classification, i.e., finding out what is seen on an input image,
- recurrent neural networks that are capable of processing a sequence of inputs and outputs, therefore it can create sentences of what is seen on the image.
Combining these two techniques makes it possible for a computer to describe in a sentence what is seen on an input image.