Temporal Action Detection with Structured Segment Networks
Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CV
First published: 2017/04/20
Abstract: Detecting actions in untrimmed videos is an important yet challenging task.
In this paper, we present the structured segment network (SSN), a novel
framework which models the temporal structure of each action instance via a
structured temporal pyramid. On top of the pyramid, we further introduce a
decomposed discriminative model comprising two classifiers, respectively for
classifying actions and determining completeness. This allows the framework to
effectively distinguish positive proposals from background or incomplete ones,
thus leading to both accurate recognition and localization. These components
are integrated into a unified network that can be efficiently trained in an
end-to-end fashion. Additionally, a simple yet effective temporal action
proposal scheme, dubbed temporal actionness grouping (TAG), is devised to
generate high-quality action proposals. On two challenging benchmarks, THUMOS14
and ActivityNet, our method remarkably outperforms previous state-of-the-art
methods, demonstrating superior accuracy and strong adaptivity in handling
actions with various temporal structures.
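
The abstract names two ideas that a small example may make more concrete: grouping per-snippet actionness scores into temporal proposals, and scoring a proposal with two separate classifiers (action class and completeness). The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. It uses plain thresholding of contiguous snippets as a simplified stand-in for TAG, and it assumes the two classifier outputs are combined by multiplication. The function names `group_actionness` and `proposal_score`, the threshold value, and the sample scores are all hypothetical.

```python
from typing import List, Tuple


def group_actionness(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group contiguous snippets with score >= threshold into [start, end) ranges.

    Simplified stand-in for an actionness-grouping step (an assumption,
    not the exact TAG procedure described in the paper).
    """
    proposals = []
    start = None  # index where the current above-threshold run began
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new run begins
        elif score < threshold and start is not None:
            proposals.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        proposals.append((start, len(scores)))  # run extends to the end
    return proposals


def proposal_score(action_prob: float, completeness_prob: float) -> float:
    """Combine the two classifier outputs (assumed here to be a product)."""
    return action_prob * completeness_prob


# Hypothetical per-snippet actionness scores for one untrimmed video.
actionness = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.9, 0.3]
proposals = group_actionness(actionness, threshold=0.5)
print(proposals)
```

Under the product assumption, a proposal that covers only part of an action can receive a high action probability and still end up with a low final score, because its completeness probability is low. This is the behavior the abstract attributes to the decomposed model when it separates positive proposals from incomplete ones.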