Learning Human-centric Motion Representation for Action Analysis

Michigan State University
CVPR 2025 Submission

Abstract

We introduce H-MoRe, a pipeline for learning precise, human-centric motion representations.

Our approach dynamically retains essential human motion features while filtering out background noise. Unlike traditional methods, which rely on fully supervised training with synthetic data, H-MoRe is trained with self-supervision directly on real-world scenes, incorporating both human pose and body shape information.
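To make "filtering out background noise" concrete, the sketch below shows the simplest static version of the idea: a dense flow field is multiplied by a binary human mask so that only motion on the person survives. This is a hypothetical baseline, not H-MoRe's method. H-MoRe learns which features to retain, whereas here the mask is assumed to be given (for example, from an off-the-shelf segmenter), and the names `mask_background_flow`, `Flow`, and `Mask` are illustrative only.

```python
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # H x W grid of (dx, dy) flow vectors
Mask = List[List[bool]]                 # H x W grid, True = human pixel


def mask_background_flow(flow: Flow, mask: Mask) -> Flow:
    """Zero out flow vectors that fall outside the human mask.

    Hypothetical fixed-mask baseline: H-MoRe instead learns what to keep.
    """
    # Both grids must have identical H x W shape.
    if len(flow) != len(mask) or any(len(fr) != len(mr) for fr, mr in zip(flow, mask)):
        raise ValueError("flow and mask must have the same H x W shape")
    # Keep the vector on human pixels; replace background vectors with zero motion.
    return [
        [vec if keep else (0.0, 0.0) for vec, keep in zip(frow, mrow)]
        for frow, mrow in zip(flow, mask)
    ]


# Usage: a 1 x 2 field where only the left pixel belongs to the person.
print(mask_background_flow([[(1.0, 0.5), (0.2, 0.2)]], [[True, False]]))
```

A hard mask like this discards all context outside the silhouette; a learned, human-centric representation can instead decide how much surrounding motion is relevant.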

Drawing inspiration from kinematics, H-MoRe encodes the absolute and relative movements of body points into a matrix representation, termed world-local flows, to capture subtle motion details. The resulting representation describes human motion in fine detail, making it readily adaptable to a range of action-based applications.
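As a rough illustration of pairing absolute and relative motion in one matrix, the sketch below assumes one plausible reading: the "world" flow of a body point is its absolute displacement between two frames, and its "local" flow is that displacement minus the mean displacement of all body points (the body's overall translation). This definition, the function name `world_local_flows`, and the N x 4 column layout are hypothetical; the paper's actual formulation may differ, for example by measuring relative motion with respect to kinematic parent joints.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def world_local_flows(pts_t: List[Point], pts_t1: List[Point]) -> List[List[float]]:
    """Return an N x 4 matrix with rows [world_dx, world_dy, local_dx, local_dy].

    Hypothetical definition: world flow = absolute displacement of each point;
    local flow = world flow minus the mean world flow of all points.
    """
    if not pts_t or len(pts_t) != len(pts_t1):
        raise ValueError("need two non-empty point lists of equal length")
    # Absolute (world) displacement of every body point from frame t to t+1.
    world = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(pts_t, pts_t1)]
    # Global body translation, approximated here by the mean displacement.
    n = len(world)
    mean_dx = sum(dx for dx, _ in world) / n
    mean_dy = sum(dy for _, dy in world) / n
    # Relative (local) motion: what remains after removing the global translation.
    return [[dx, dy, dx - mean_dx, dy - mean_dy] for dx, dy in world]


# Usage: a "torso" point and a "hand" point; both move right, the hand also rises.
print(world_local_flows([(0, 0), (1, 0)], [(2, 0), (3, 2)]))
```

In this toy case both points share the same rightward world motion, while the local columns isolate the hand's extra vertical movement relative to the body, which is the kind of subtle detail that absolute flow alone tends to blur.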

Models and code will be made publicly available upon publication.

Experimental Results

Qualitative Comparison

Flow visualizations produced by H-MoRe and seven state-of-the-art (SoTA) optical flow estimation methods.
The main differences are highlighted and magnified with red boxes and arrows.

Frame-by-Frame

H-MoRe inference results, shown frame by frame.


Quantitative Comparison

Using H-MoRe as the motion representation improves gait recognition performance.
Higher values indicate better performance across all metrics.