Summary by Oleksandr Bailo 4 years ago
This paper proposes an approach for measuring motion similarity in human-human and human-object interactions. The authors argue that human activities are usually defined by the interaction between individual characters, as in a high-five.
As no suitable interaction datasets were available, the authors provide multiple small-scale interaction datasets:
https://i.imgur.com/P815TYu.png
where:
- 2C = a Character-Character (2C) database using kick-boxing motions
- CRC = Character-Retargeted Character where the size of characters is adjusted while maintaining the nature of the interaction
https://i.imgur.com/XX1WNpO.png
- HOI = Human-Object Interaction where a Chair is used as an object
https://i.imgur.com/Z6cxd7R.png
- 2PB = 2 People Boxing
https://i.imgur.com/yUxmpY5.png
- 2PD = 2 People Daily Interaction where people are represented as a surface point cloud
https://i.imgur.com/EzyELg3.png
**Methodology**
- *Customized Interaction Mesh Structure*. An interaction mesh is created by generating a volumetric mesh using Delaunay Tetrahedralization. Interaction is therefore represented by a series of interaction meshes. To reduce the bias caused by an unequal number of points per body part, synthetic points (shown in blue) are derived from the available skeleton structure (shown in red):
https://i.imgur.com/jtlrH49.png
After Delaunay Tetrahedralization, the edges are filtered: every edge whose two endpoints belong to the same character is removed, as such edges do not contribute to the interaction.
https://i.imgur.com/nWeUNl1.png
The temporal sequence of interaction is a series of interaction meshes.
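
The mesh construction and edge filtering for a single frame can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each character is given as an array of 3D points (skeleton joints plus any synthetic points), uses `scipy.spatial.Delaunay` for the tetrahedralization, and the function name `interaction_edges` is hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def interaction_edges(points_a, points_b):
    """Return sorted index pairs (i, j) of Delaunay edges linking character A to B.

    Indices refer to the stacked array [points_a; points_b], so i < len(points_a) <= j.
    """
    points = np.vstack([points_a, points_b])
    n_a = len(points_a)
    # Each simplex of a 3D Delaunay triangulation is a tetrahedron (4 vertex indices).
    tetrahedra = Delaunay(points).simplices
    edges = set()
    for tet in tetrahedra:
        for u in range(4):
            for v in range(u + 1, 4):
                i, j = sorted((int(tet[u]), int(tet[v])))
                # Keep only edges whose endpoints lie on different characters.
                if i < n_a <= j:
                    edges.add((i, j))
    return sorted(edges)
```

Calling this once per frame yields the series of interaction meshes that represents the whole interaction.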
- *Distance between interaction meshes*.
- Distance between two interaction meshes of two-character interactions:
$d(e_i, e_j) = (|e_{i1} - e_{j1}| + |e_{i2} - e_{j2}|) \times \frac{1}{2} (1 - \cos\theta),$
https://i.imgur.com/jMHGx3o.png
where $e_{i1}$ and $e_{i2}$ are the two endpoints of edge $e_i$ (likewise $e_{j1}$ and $e_{j2}$ for $e_j$) and $\theta$ is the angle between the two edges.
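
As a rough sketch, the edge distance can be computed as below. The assumptions here are that $|\cdot|$ is the Euclidean norm, that $\theta$ is the angle between the edge direction vectors, and that endpoints are ordered consistently (for example, endpoint 1 on the first character and endpoint 2 on the second). The name `edge_distance` is hypothetical.

```python
import numpy as np


def edge_distance(e_i, e_j):
    """d(e_i, e_j) for two edges, each given as a (2, 3) array of endpoints."""
    e_i = np.asarray(e_i, dtype=float)
    e_j = np.asarray(e_j, dtype=float)
    # Sum of Euclidean distances between corresponding endpoints.
    endpoint_term = np.linalg.norm(e_i[0] - e_j[0]) + np.linalg.norm(e_i[1] - e_j[1])
    # Direction vectors of the two edges (assumed definition of theta).
    v_i = e_i[1] - e_i[0]
    v_j = e_j[1] - e_j[0]
    denom = np.linalg.norm(v_i) * np.linalg.norm(v_j)
    if denom == 0.0:
        raise ValueError("edges must have non-zero length")
    cos_theta = np.clip(np.dot(v_i, v_j) / denom, -1.0, 1.0)
    return float(endpoint_term * 0.5 * (1.0 - cos_theta))
```

With this definition, identical edges have distance 0, and the angular factor $\frac{1}{2}(1 - \cos\theta)$ ranges from 0 (same direction) to 1 (opposite direction).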
- *Earth Mover's Distance*. Earth Mover’s Distance (EMD) is used to find the best correspondence between the edges of the input interaction meshes, which makes it possible to compare two interactions with different semantic meanings:
$EMD(E_I^{t_I}, E_J^{t_J}) = \frac{D(E_I^{t_I}, E_J^{t_J})}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j}^*},$
where $D(E_I^{t_I}, E_J^{t_J}) = \sum_{i=1}^{m} \sum_{j=1}^{n} d(e_i, e_j) f_{i,j}^*$ represents the minimum distance between two interaction meshes and $f_{i,j}^*$ is the optimal set of flow values returned by the mass transport solver that finds the optimal edge-level correspondence between two interaction meshes. The concept of the mass transport solver is visualized below:
https://i.imgur.com/wmRhcxP.png
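
One way to sketch the mass transport step is as a small linear program. The example below is an illustration under stated assumptions, not the solver used in the paper: it takes a precomputed $m \times n$ matrix of edge distances $d(e_i, e_j)$, assigns uniform weights to the edges unless weights are passed in (the summary does not say how edges are weighted), and solves for the flow with `scipy.optimize.linprog`. The name `emd` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def emd(cost, w=None, v=None):
    """Earth Mover's Distance for an (m, n) cost matrix; returns (distance, flow)."""
    cost = np.asarray(cost, dtype=float)
    m, n = cost.shape
    # Assumption: uniform edge weights when none are given.
    w = np.full(m, 1.0 / m) if w is None else np.asarray(w, dtype=float)
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    # Flow variables f_ij are flattened row-major: index = i * n + j.
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0  # sum_j f_ij <= w_i
    for j in range(n):
        A_ub[m + j, j::n] = 1.0           # sum_i f_ij <= v_j
    b_ub = np.concatenate([w, v])
    # Total flow equals the smaller of the two total masses.
    total = min(w.sum(), v.sum())
    res = linprog(cost.ravel(), A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, m * n)), b_eq=[total],
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    flow = res.x.reshape(m, n)
    # EMD = D / sum of optimal flows, matching the formula above.
    return float((cost * flow).sum() / flow.sum()), flow
```

The returned `flow` matrix plays the role of $f_{i,j}^*$: a large entry means edge $e_i$ of one mesh is matched mostly to edge $e_j$ of the other.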
- *Distance between interaction sequences*.
- spatial normalization - removing the pelvis translation and the horizontal facing angle in each frame
- temporal sampling - non-linear sampling strategy based on the frame distance measured by EMD. The sampling algorithm takes fewer samples in temporal regions with high similarity, which contribute less to the context of the interaction.
- temporal alignment - keyframes are aligned using Dynamic Time Warping (DTW)
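
The sampling and alignment steps above can be sketched as follows. Both functions are illustrative assumptions rather than the paper's exact procedure: `sample_keyframes` uses a simple cumulative-distance threshold as a stand-in for the non-linear sampling (the summary does not give the actual rule), and `dtw_distance` is the textbook Dynamic Time Warping recurrence applied to a matrix of keyframe-to-keyframe distances (for example, EMD values).

```python
import numpy as np


def sample_keyframes(consecutive_dists, threshold):
    """Pick keyframe indices so that similar stretches yield fewer samples.

    consecutive_dists[k] is the distance between frame k and frame k + 1.
    Assumed rule: emit a keyframe whenever the accumulated distance since
    the last keyframe reaches `threshold`.
    """
    keyframes = [0]
    acc = 0.0
    for k, d in enumerate(consecutive_dists):
        acc += d
        if acc >= threshold:
            keyframes.append(k + 1)
            acc = 0.0
    return keyframes


def dtw_distance(frame_cost):
    """Classic DTW over an (n, m) matrix of pairwise keyframe distances."""
    frame_cost = np.asarray(frame_cost, dtype=float)
    n, m = frame_cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three admissible warping steps.
            acc[i, j] = frame_cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])
```

In this sketch the accumulated DTW cost is used directly as the distance between two interaction sequences; any normalization by path length would be a further design choice.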
Possible functionality with the proposed method:
- Interaction motion similarity analysis
- Interaction motion retrieval
Notice:
>The pre-process took 1.5 hours, 0.5 hour and 4 hours for the 2C, CRC and HOI databases respectively. Given the meshes, computing the distance between two interactions took 0.2 seconds on average.
Discussion:
- The algorithm focuses on boxing/kickboxing because these activities have clear rules. Extending it to measure motion similarity for daily activities would require careful annotation.
- While the method can, in theory, be applied to single-person activities, it is quite clear that it would perform worse than other baselines on that task. It is therefore better not to use it where two independent movements with no interaction between them (dancing, yoga) need to be compared.
- The method works best for close-interaction activities: when the interacting objects are far apart, the edges in the interaction mesh tend to have a similar structure (e.g. edge length), which makes different interactions harder to tell apart.
- Application-wise, the algorithm is not suitable for online real-time use.