First published: 2016/06/30
Abstract: The problem of arbitrary object tracking has traditionally been tackled by
learning a model of the object's appearance exclusively online, using as sole
training data the video itself. Despite the success of these methods, their
online-only approach inherently limits the richness of the model they can
learn. Recently, several attempts have been made to exploit the expressive
power of deep convolutional networks. However, when the object to track is not
known beforehand, it is necessary to perform Stochastic Gradient Descent online
to adapt the weights of the network, severely compromising the speed of the
system. In this paper we equip a basic tracking algorithm with a novel
fully-convolutional Siamese network trained end-to-end on the ILSVRC15 video
object detection dataset. Our tracker operates at frame-rates beyond real-time
and, despite its extreme simplicity, achieves state-of-the-art performance on
the VOT2015 benchmark.
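
The core idea named in the abstract, a shared fully-convolutional embedding applied to an exemplar crop and a larger search crop, followed by a cross-correlation that produces a dense similarity map, can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the layer widths, crop resolutions, and the omission of the learned score scaling used during training are assumptions made for brevity.

```python
# Minimal sketch of a fully-convolutional Siamese similarity network.
# A shared embedding phi is applied to the exemplar z and the search region x;
# cross-correlating phi(z) over phi(x) yields a dense map of similarity scores.
# Architecture details below are illustrative, not the paper's exact config.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingNet(nn.Module):
    """Shared fully-convolutional embedding (AlexNet-like, simplified)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 192, kernel_size=3), nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3), nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3),
        )

    def forward(self, x):
        return self.features(x)


class SiameseTracker(nn.Module):
    """Scores every translation of the exemplar inside the search region."""

    def __init__(self):
        super().__init__()
        self.embed = EmbeddingNet()

    def forward(self, exemplar, search):
        z = self.embed(exemplar)  # (B, C, Hz, Wz) exemplar embedding
        x = self.embed(search)    # (B, C, Hx, Wx) search embedding, Hx > Hz
        b, c = z.size(0), z.size(1)
        # Cross-correlation: use each exemplar embedding as a convolution
        # kernel over its own search embedding (one group per batch element).
        score = F.conv2d(
            x.reshape(1, b * c, x.size(2), x.size(3)),
            z,          # weight shape (B, C, Hz, Wz): B kernels, C channels each
            groups=b,
        )
        return score.reshape(b, 1, score.size(2), score.size(3))


if __name__ == "__main__":
    tracker = SiameseTracker()
    exemplar = torch.randn(2, 3, 127, 127)  # target crop from the first frame
    search = torch.randn(2, 3, 255, 255)    # region around the last position
    score_map = tracker(exemplar, search)
    print(score_map.shape)  # torch.Size([2, 1, 17, 17]) similarity map
```

At tracking time the new target position is read off the peak of this score map (the paper additionally searches over a few scales), which is why no online gradient updates are needed and the tracker can run beyond real-time.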