Summary by zhewei 6 years ago
From one video frame to the next, the pixels may change a lot, but the semantic content changes slowly. This is reflected in the neural network: the outputs of shallow layers change more than those of deeper layers.
So they use a clock to decide whether the deeper layers need to be updated or whether their previous output can simply be reused.
The clock is triggered by the difference in the output of some layer between consecutive frames. The condition under which the clock fires can be fixed or learned.
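The idea can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `ClockworkPipeline`, the helper `mean_abs_diff`, the hand-set threshold, and the toy `shallow` and `deep` stages are all assumptions made for illustration. The shallow stage runs on every frame, and the deep stage is re-run only when the shallow features differ enough from those of the previous frame; otherwise the cached deep output is reused.

```python
from typing import Callable, List, Optional

Vector = List[float]


def mean_abs_diff(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class ClockworkPipeline:
    """Hypothetical two-stage network with a thresholded clock on the deep stage."""

    def __init__(
        self,
        shallow: Callable[[Vector], Vector],
        deep: Callable[[Vector], float],
        threshold: float,
    ) -> None:
        self.shallow = shallow
        self.deep = deep
        self.threshold = threshold  # assumed fixed here; it could also be learned
        self._prev_features: Optional[Vector] = None
        self._cached_output: Optional[float] = None
        self.deep_calls = 0  # counts how often the deep stage actually ran

    def step(self, frame: Vector) -> float:
        # The shallow stage is cheap and runs on every frame.
        features = self.shallow(frame)
        # The clock fires on the first frame, or when the shallow features
        # changed by more than the threshold since the previous frame.
        fire = (
            self._prev_features is None
            or mean_abs_diff(features, self._prev_features) > self.threshold
        )
        if fire:
            self._cached_output = self.deep(features)
            self.deep_calls += 1
        self._prev_features = features
        return self._cached_output


# Toy stand-ins for the real layers (assumptions, not the paper's network).
pipeline = ClockworkPipeline(
    shallow=lambda frame: [2.0 * x for x in frame],
    deep=lambda feats: sum(feats),
    threshold=0.1,
)

# Frames 1-2 and 3-4 are nearly identical; the scene changes between 2 and 3.
frames = [[0.0, 0.0], [0.01, 0.0], [1.0, 1.0], [1.0, 1.01]]
outputs = [pipeline.step(f) for f in frames]
print(outputs, pipeline.deep_calls)
```

In this toy run the deep stage executes only on the first and third frames, so the second and fourth frames reuse a cached result. The trade-off is that small changes below the threshold are ignored until the clock fires again.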