Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs on ShortScience.org


Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
Zheng Shou and Dongang Wang and Shih-Fu Chang
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CV


Summary by shiyu 6 years ago

## Segment-CNN (S-CNN)

**Summary**: This paper uses a three-stage 3D CNN pipeline to identify candidate proposals, recognize actions, and localize temporal boundaries.

**Models**: 
The pipeline can be divided into three parts: generating proposals, selecting proposals and refining their temporal boundaries, and applying non-maximum suppression (NMS) to remove redundant proposals.
1. Segment generation: multi-scale segments (16, 32, 64, 128, 256, 512 frames) are generated with a sliding window at 75% overlap. This has high computational complexity!
2. Networks: each stage of the three-stage network is a 3D ConvNet followed by 3 FC layers.
   * The proposal network is essentially a binary classifier that judges whether each segment contains an action.
   * The classification network classifies each segment that the proposal network considers valid into background or one of K action categories.
   * The localization network works as a scoring system: it raises the scores of proposals that have high overlap with the corresponding ground truth and lowers the scores of the others.
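
To make the cost of the segment-generation step concrete, here is a minimal sketch of multi-scale sliding-window generation. Only the scales and the 75% overlap come from the summary. The function name `generate_segments`, the half-open frame ranges, and the choice to skip scales longer than the video are assumptions of this sketch, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

SCALES = (16, 32, 64, 128, 256, 512)  # segment lengths in frames (from the summary)
OVERLAP = 0.75                        # fraction shared by consecutive windows


def generate_segments(num_frames: int,
                      scales: Sequence[int] = SCALES,
                      overlap: float = OVERLAP) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for every window."""
    segments = []
    for length in scales:
        if length > num_frames:
            continue  # assumption: drop scales that do not fit in the video
        # 75% overlap means each window advances by a quarter of its length.
        stride = max(1, int(length * (1 - overlap)))
        for start in range(0, num_frames - length + 1, stride):
            segments.append((start, start + length))
    return segments


if __name__ == "__main__":
    print(len(generate_segments(64)))
```

With these scales the number of windows grows linearly with video length, approaching roughly one segment for every two frames in a long video, and every one of them has to be scored by the proposal network. That is the source of the high cost noted in step 1.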
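
The NMS post-processing mentioned above can be sketched in a few lines. This is a generic greedy temporal NMS, not the authors' implementation: the names `temporal_iou` and `temporal_nms`, the `(start, end, score)` tuple layout, and the threshold of 0.5 are all illustrative assumptions.

```python
from typing import List, Tuple

Detection = Tuple[float, float, float]  # (start, end, score); layout is an assumption


def temporal_iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two 1D intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(detections: List[Detection],
                 iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep the best-scoring detection, drop overlapping ones."""
    kept: List[Detection] = []
    # Visit detections from highest to lowest score.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        # Keep it only if it does not overlap too much with anything kept so far.
        if all(temporal_iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    dets = [(0.0, 16.0, 0.9), (4.0, 20.0, 0.8), (32.0, 48.0, 0.7)]
    print(temporal_nms(dets))
```

In this sketch the score used for ranking would be the output of the localization network, since that is the stage described as rewarding high overlap with the ground truth.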
