This paper proposes an approach to measure motion similarity for human-human and human-object interactions. The authors claim that human activities are usually defined by the interaction between individual characters, such as a high-five. Because interaction datasets are not available, the authors provide multiple small-scale interaction datasets:

https://i.imgur.com/P815TYu.png

where:

- 2C = a Character-Character (2C) database using kickboxing motions
- CRC = Character-Retargeted Character, where the size of the characters is adjusted while maintaining the nature of the interaction
  https://i.imgur.com/XX1WNpO.png
- HOI = Human-Object Interaction, where a chair is used as the object
  https://i.imgur.com/Z6cxd7R.png
- 2PB = 2 People Boxing
  https://i.imgur.com/yUxmpY5.png
- 2PD = 2 People Daily Interaction, where people are represented as a surface point cloud
  https://i.imgur.com/EzyELg3.png

**Methodology**

*Customized Interaction Mesh Structure*. An interaction mesh is created by generating a volumetric mesh using Delaunay tetrahedralization. An interaction is therefore represented by a series of interaction meshes. To reduce the bias caused by an unequal number of points per human body part, synthetic points (shown in blue) are derived from the available skeleton structure (shown in red):

https://i.imgur.com/jtlrH49.png

The edges produced by the Delaunay tetrahedralization are filtered so that all edges connecting points on the same character are removed, as they do not contribute to the interaction.

https://i.imgur.com/nWeUNl1.png

The temporal sequence of an interaction is thus a series of interaction meshes.

*Distance between interaction meshes*. The distance between two interaction meshes of two-character interactions is built from the distance between a pair of edges $e_i$ and $e_j$:

$d(e_i, e_j) = (\lVert e_{i1} - e_{j1} \rVert + \lVert e_{i2} - e_{j2} \rVert) \times \frac{1}{2} (1 - \cos\theta),$

https://i.imgur.com/jMHGx3o.png

where $e_{i1}$ and $e_{i2}$ are the two endpoints of edge $e_i$ (and likewise for $e_j$).

*Earth Mover's Distance*. Earth Mover's Distance (EMD) is used to find the best correspondence between the input interaction meshes, which makes it possible to compare two interactions with different semantic meanings:

$EMD(E_I^{t_I}, E_J^{t_J}) = \frac{D(E_I^{t_I}, E_J^{t_J})}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j}^*},$

where $D(E_I^{t_I}, E_J^{t_J}) = \sum_{i=1}^{m} \sum_{j=1}^{n} d(e_i, e_j) f_{i,j}^*$ represents the minimum distance between two interaction meshes and $f_{i,j}^*$ is the optimal set of flow values returned by the mass transport solver, which finds the optimal edge-level correspondence between two interaction meshes. The concept of the mass transport solver is visualized below:

https://i.imgur.com/wmRhcxP.png

*Distance between interaction sequences*.

- spatial normalization: removing the pelvis translation and the horizontal facing angle in each frame
- temporal sampling: a nonlinear sampling strategy based on the frame distance measured by EMD. The sampling algorithm samples fewer frames in temporal regions with high similarity, which contribute less to the context of the interaction.
- temporal alignment: keyframes are aligned using Dynamic Time Warping (DTW)

Possible functionality with the proposed method:

- Interaction motion similarity analysis
- Interaction motion retrieval

Notice:

> The preprocess took 1.5 hours, 0.5 hour and 4 hours for the 2C, CRC and HOI databases respectively. Given the meshes, computing the distance between two interactions took 0.2 seconds on average.

Discussion:

- The algorithm focuses on boxing/kickboxing, as they have clear rules. Extending the proposed algorithm to measure motion similarity for daily activities would require careful annotation.
- While the method can, in theory, be applied to single-human activities, it is quite clear that it would perform worse than other baselines for that task. This implies that it is better not applied to use cases that require comparing two independent movements with no interaction between them (dancing, yoga).
- The method works best for close-interaction activities, because for distant interacting objects the edges in the interaction mesh tend to have a similar structure (e.g. edge length).
- Application-wise, the algorithm is not suitable for online real-time use.
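
To make the edge-level distance and the temporal alignment step from the methodology more concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names `edge_distance` and `dtw_distance` are hypothetical, the endpoint differences are read as Euclidean distances, and $\theta$ is assumed to be the angle between the direction vectors of the two edges. The DTW routine is the textbook dynamic program and takes an arbitrary per-frame distance function; in the summarized method that role would be played by the EMD between interaction meshes, which is not implemented here.

```python
import math


def edge_distance(edge_i, edge_j):
    """Edge-level distance: (|e_i1 - e_j1| + |e_i2 - e_j2|) * 0.5 * (1 - cos(theta)).

    Each edge is a pair of 3D endpoints, e.g. ((x1, y1, z1), (x2, y2, z2)).
    Assumption: theta is the angle between the two edge direction vectors.
    """
    (a1, a2), (b1, b2) = edge_i, edge_j
    endpoint_term = math.dist(a1, b1) + math.dist(a2, b2)

    # Direction vectors of both edges (second endpoint minus first).
    u = [q - p for p, q in zip(a1, a2)]
    v = [q - p for p, q in zip(b1, b2)]
    norm_u = math.hypot(*u)
    norm_v = math.hypot(*v)

    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero-length edge has no direction, so treat it as parallel.
        cos_theta = 1.0
    else:
        dot = sum(x * y for x, y in zip(u, v))
        # Clamp to guard against floating-point drift outside [-1, 1].
        cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))

    return endpoint_term * 0.5 * (1.0 - cos_theta)


def dtw_distance(seq_a, seq_b, frame_distance):
    """Classic Dynamic Time Warping cost between two sequences of frames.

    frame_distance(a, b) is any non-negative per-frame distance; it is a
    placeholder for the EMD between two interaction meshes.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Note that, under this reading of the formula, the angular factor vanishes for parallel edges, so the edge distance is zero whenever two edges point in the same direction, regardless of where they are located. Passing the frame distance as a callable keeps the alignment step independent of how individual meshes are compared.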
