kundan2510's profile - ShortScience.org

Fully-Convolutional Siamese Networks for Object Tracking
Luca Bertinetto and Jack Valmadre and João F. Henriques and Andrea Vedaldi and Philip H. S. Torr
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CV

Summary by kundan2510 8 years ago

Summary:
This paper proposes computing a correlation score between a query image and each sub-window of a larger search image. The fully convolutional Siamese architecture they describe produces these scores for all sub-windows of the search image in a single forward pass of the network. For every video, the features of the object being tracked are computed once and reused for the entire duration of the video when computing the correlation.
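
The sliding correlation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `score_map`, the channel-first feature layout, and the feature sizes in the usage note are assumptions, and the embedding network that would produce the features is left out entirely.

```python
import numpy as np

def score_map(query_feat, search_feat):
    """Correlate query features with every sub-window of the search features.

    query_feat:  array of shape (C, hq, wq), computed once per video.
    search_feat: array of shape (C, hs, ws), computed for each new frame.
    Returns an array of shape (hs - hq + 1, ws - wq + 1) with one
    correlation score per sub-window position.
    """
    c, hq, wq = query_feat.shape
    c_s, hs, ws = search_feat.shape
    if c != c_s:
        raise ValueError("query and search features need the same channel count")
    scores = np.empty((hs - hq + 1, ws - wq + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            # Inner product between the query and the sub-window at (i, j).
            window = search_feat[:, i:i + hq, j:j + wq]
            scores[i, j] = np.sum(window * query_feat)
    return scores
```

For example, query features of shape (8, 6, 6) and search features of shape (8, 22, 22) give a 17 x 17 score map, and the location of the maximum indicates the best-matching sub-window. The explicit loops are only for readability; in a deep learning framework the same operation would normally be expressed as a single cross-correlation (convolution) call, which is what makes the one-forward-pass evaluation possible.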

My take:
This is in the same spirit as the GOTURN tracker. Although being fully convolutional helps with translation invariance, it is not clearly an advantage over predicting bounding boxes directly, as done in the GOTURN paper. Also, the results are not directly comparable, since this model was trained on a different dataset.

Learning to Track at 100 FPS with Deep Regression Networks
David Held and Sebastian Thrun and Silvio Savarese
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CV, cs.AI, cs.LG, cs.RO

Summary by kundan2510 8 years ago

This paper introduces a bunch of tricks which make sense for visual tracking. These tricks are as follows:
1. At test time, the network is given a crop of the previous frame, centered on the previous bounding box and larger than it, along with the search area in the current frame.
2. Training is done offline on a large set of videos (where object bounding boxes are given for a subset of frames) and on still images with object bounding boxes.
3. The network takes two images: i) a crop of the image/frame around the bounding box and ii) the image centered at the center of the bounding box. Given the latter, the network regresses the bounding box in i).
4. The above crops are sampled such that the ground-truth bounding box center in i) is not very far from the center in ii), hence the network prefers smooth motion.
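
The crop sampling in points 3 and 4 can be sketched as below. This is an illustration under stated assumptions rather than the paper's code: the helper names, the padding factor `pad=2.0`, the Laplace distribution for the random shift, and its `scale=0.2` are all choices made for this example and are not taken from the summary.

```python
import numpy as np

def context_crop(box, pad=2.0):
    """Return a crop (x1, y1, x2, y2) centered on `box` and `pad` times its size."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * pad, (y2 - y1) * pad
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

def sample_shifted_box(box, rng, scale=0.2):
    """Shift `box` by a small random offset, proportional to the box size.

    A zero-centered Laplace distribution makes small shifts far more likely
    than large ones, which is one way to encode a preference for smooth motion.
    """
    x1, y1, x2, y2 = box
    dx = rng.laplace(0.0, scale) * (x2 - x1)
    dy = rng.laplace(0.0, scale) * (y2 - y1)
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

def box_in_crop(box, crop):
    """Express `box` in coordinates relative to `crop`, as a regression target."""
    cx1, cy1, cx2, cy2 = crop
    cw, ch = cx2 - cx1, cy2 - cy1
    x1, y1, x2, y2 = box
    return ((x1 - cx1) / cw, (y1 - cy1) / ch, (x2 - cx1) / cw, (y2 - cy1) / ch)

def make_training_pair(gt_box, rng):
    """Build crop regions and a target from a single annotated still image."""
    target_crop = context_crop(gt_box)                       # crop around the object
    search_crop = context_crop(sample_shifted_box(gt_box, rng))  # slightly off-center crop
    return target_crop, search_crop, box_in_crop(gt_box, search_crop)
```

Calling `make_training_pair((10, 10, 30, 50), np.random.default_rng(0))` returns two crop regions and the ground-truth box expressed relative to the shifted crop. Because the shift is usually small, the target box tends to sit near the middle of the search crop, which is how a single still image can stand in for a pair of consecutive video frames.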

My take: This is a very nice way to use still images to train an image-correlation task, which can then be used for tracking. The speed on a GPU is very impressive, but the speed on CPUs is still not comparable.
