TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen and Ram Nevatia
arXiv e-Print archive, 2017
Keywords: cs.CV
First published: 2017/03/17
Abstract: Temporal Action Proposal (TAP) generation is an important problem, as fast
and accurate extraction of semantically important segments (e.g. human actions)
from untrimmed videos is a key step for large-scale video analysis. We
propose a novel Temporal Unit Regression Network (TURN) model. There are two
salient aspects of TURN: (1) TURN jointly predicts action proposals and refines
their temporal boundaries by temporal coordinate regression; (2) fast computation
is enabled by unit feature reuse: a long untrimmed video is decomposed into
video units, which are reused as basic building blocks of temporal proposals.
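The two mechanisms just described, unit feature reuse and temporal coordinate regression, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which visual encoder produces unit features, so they are placeholder vectors here. Clip features are assumed to be mean-pooled over the units inside the clip plus a fixed context window on each side. Predicted offsets, which would come from a learned regressor not modeled here, are assumed to be added to the clip's start and end unit indices. All function names and parameters (`unit_size`, `ctx`, `stride`, `lengths`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_into_units(num_frames: int, unit_size: int) -> List[Tuple[int, int]]:
    """Frame ranges [start, end) of non-overlapping units; a tail shorter than a unit is dropped."""
    return [(s, s + unit_size) for s in range(0, num_frames - unit_size + 1, unit_size)]


def enumerate_clips(num_units: int, lengths: Sequence[int], stride: int = 1) -> List[Tuple[int, int]]:
    """Candidate clips as unit-index ranges [start, end) for several clip lengths (hypothetical scheme)."""
    return [(s, s + n) for n in lengths for s in range(0, num_units - n + 1, stride)]


def mean_pool(features: List[Vector]) -> Vector:
    """Element-wise average of equal-length feature vectors."""
    return [sum(col) / len(features) for col in zip(*features)]


def clip_feature(unit_feats: List[Vector], start: int, end: int, ctx: int) -> Vector:
    """Concatenate pooled [before-context | inside | after-context] unit features.

    Only slices of the precomputed unit_feats list are read, so every clip reuses
    the same unit features. Context windows are clipped to the video; an empty
    window contributes a zero vector.
    """
    dim = len(unit_feats[0])

    def pool(lo: int, hi: int) -> Vector:
        window = unit_feats[max(lo, 0):min(hi, len(unit_feats))]
        return mean_pool(window) if window else [0.0] * dim

    return pool(start - ctx, start) + pool(start, end) + pool(end, end + ctx)


def refine(start: int, end: int, offsets: Tuple[float, float]) -> Tuple[float, float]:
    """Shift a clip's boundaries by predicted (start, end) offsets, in units (assumed convention)."""
    return start + offsets[0], end + offsets[1]


if __name__ == "__main__":
    units = split_into_units(num_frames=100, unit_size=16)  # 6 units; frames 96..99 are dropped
    unit_feats = [[float(i), 1.0] for i in range(len(units))]  # placeholder per-unit features
    clips = enumerate_clips(len(units), lengths=[2, 4])
    print(len(clips))                             # 8
    print(clip_feature(unit_feats, 2, 4, ctx=1))  # [1.0, 1.0, 2.5, 1.0, 4.0, 1.0]
    print(refine(2, 4, (-0.5, 0.25)))             # (1.5, 4.25)
```

The design point the sketch tries to capture is that unit features are computed once per video, and every candidate clip, whatever its length, is assembled by pooling slices of that shared list. This is what makes feature extraction cost independent of the number of proposals.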
TURN outperforms state-of-the-art methods under average recall (AR) by a
large margin on the THUMOS-14 and ActivityNet datasets, and runs at over 880 frames
per second (FPS) on a TITAN X GPU. When TURN is further applied as the proposal
generation stage of existing temporal action localization pipelines, it
outperforms the state of the art on THUMOS-14 and ActivityNet.
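Average recall for temporal proposals is the recall of ground-truth segments, averaged over several temporal IoU (tIoU) thresholds. A minimal sketch for a single video follows. It assumes a ground-truth segment counts as recalled when any proposal reaches the threshold. It omits the per-video proposal budget (keeping only the top-N proposals by score) against which AR is usually reported. The threshold list is illustrative only, since the abstract does not state the ranges used for THUMOS-14 or ActivityNet. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(proposals: List[Segment], ground_truth: List[Segment], threshold: float) -> float:
    """Fraction of ground-truth segments matched by at least one proposal with tIoU >= threshold."""
    hits = sum(1 for gt in ground_truth if any(tiou(p, gt) >= threshold for p in proposals))
    return hits / len(ground_truth)


def average_recall(proposals: List[Segment], ground_truth: List[Segment],
                   thresholds: Sequence[float]) -> float:
    """Recall averaged over a set of tIoU thresholds."""
    return sum(recall_at(proposals, ground_truth, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gt = [(0.0, 10.0), (20.0, 30.0)]
    props = [(0.0, 10.0), (22.0, 30.0)]  # tIoU with the matching ground truth: 1.0 and 0.8
    # recalls at 0.5, 0.75, 0.9 are 1.0, 1.0, 0.5
    print(average_recall(props, gt, thresholds=[0.5, 0.75, 0.9]))  # 0.8333333333333334
```

Because AR averages over strict as well as loose thresholds, it rewards proposals with precise boundaries. This is where the boundary refinement by coordinate regression described in the abstract is expected to help.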