CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition
Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach
arXiv e-Print archive - 2021 via Local arXiv
Keywords: cs.CV
First published: 2024/12/12
Abstract: Zero-shot action recognition is the task of recognizing action classes
without visual examples, only with a semantic embedding which relates unseen to
seen classes. The problem can be seen as learning a function which generalizes
well to instances of unseen classes without losing discrimination between
classes. Neural networks can model the complex boundaries between visual
classes, which explains their success as supervised models. However, in
zero-shot learning, these highly specialized class boundaries may not transfer
well from seen to unseen classes. In this paper we propose a centroid-based
representation, which clusters visual and semantic representations, considers
all training samples at once, and in this way generalizes well to instances
from unseen classes. We optimize the clustering using Reinforcement Learning,
which we show is critical for our approach to work. We call the proposed method
CLASTER and observe that it consistently outperforms the state-of-the-art on
all standard datasets, including UCF101, HMDB51 and Olympic Sports, both in
standard zero-shot evaluation and in generalized zero-shot learning. Further,
we show that our model performs competitively in the image domain as well,
outperforming the state-of-the-art in many settings.
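
As a rough illustration of what a centroid-based representation could look like, the sketch below clusters a handful of toy feature vectors with plain k-means and then appends to each feature its nearest centroid. This is not the CLASTER method itself: the abstract does not specify the clustering procedure, the way centroids enter the representation, or the Reinforcement Learning update, so the function names (`kmeans`, `augment`), the concatenation step, and the toy values are all assumptions made for illustration. The Reinforcement Learning optimization of the clustering is deliberately left out.

```python
import math

def nearest_centroid(x, centroids):
    """Return the index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))

def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means, seeded with the first k points (illustrative only)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_centroid(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids

def augment(x, centroids):
    """Assumed representation: the feature concatenated with its nearest centroid."""
    return list(x) + centroids[nearest_centroid(x, centroids)]

# Hypothetical joint visual-semantic features; real ones would come from a
# video backbone and a word-embedding model.
features = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = kmeans(features, k=2)
augmented = augment([0.1, 0.1], centroids)
print(augmented)
```

Because every centroid is computed from all training samples in its cluster, the augmented vector carries information beyond the single instance, which is the intuition the abstract gives for better transfer to unseen classes.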