PointHPS: Cascaded 3D Human Pose and
Shape Estimation from Point Clouds

Zhongang Cai*,¹,², Liang Pan*,¹, Chen Wei², Wanqi Yin², Fangzhou Hong¹,
Mingyuan Zhang¹, Chen Change Loy¹, Lei Yang², Ziwei Liu¹,✉

¹S-Lab, Nanyang Technological University, ²SenseTime Research

*equal contributions, ✉corresponding author

arXiv Video Code Data(Coming Soon)

Abstract

Human pose and shape estimation (HPS) has attracted increasing attention in recent years. While most existing studies focus on HPS from 2D images or videos, which suffer from inherent depth ambiguity, there is a surging need to investigate HPS from 3D point clouds, as depth sensors are now frequently employed in commercial devices. However, real-world 3D points from sensors are usually noisy and incomplete, and human bodies can take highly diverse poses. To tackle these challenges, we propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings, which iteratively refines point features through a cascaded architecture. Specifically, each stage of PointHPS performs a series of downsampling and upsampling operations to extract and collate both local and global cues, which are further enhanced by two novel modules: 1) Cross-stage Feature Fusion (CFF) for multi-scale feature propagation that allows information to flow effectively through the stages, and 2) Intermediate Feature Enhancement (IFE) for body-aware feature aggregation that improves feature quality after each stage. Notably, previous benchmarks for HPS from point clouds consist of synthetic data with over-simplified settings (e.g., SURREAL) or real data with limited diversity (e.g., MHAD). To facilitate a comprehensive study under various scenarios, we conduct our experiments on two large-scale benchmarks, comprising i) a dataset that features diverse subjects and actions captured by real commercial sensors in a laboratory environment, and ii) controlled synthetic data generated with realistic considerations such as clothed humans in crowded outdoor scenes. Extensive experiments demonstrate that PointHPS, with its powerful point feature extraction and processing scheme, outperforms state-of-the-art methods by significant margins across the board. Ablation studies validate the effectiveness of the cascaded architecture, powered by CFF and IFE.
The pretrained models, code, and data will be publicly available to facilitate future investigation in HPS from point clouds.
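
To make the "downsample, upsample, repeat over several stages" idea in the abstract concrete, here is a minimal toy sketch in pure Python. It is not the PointHPS implementation: the abstract does not say which sampling or interpolation scheme is used, so farthest point sampling and nearest-neighbor feature copying are assumptions borrowed from common point-cloud practice. The function names (`farthest_point_sampling`, `upsample_nearest`, `stage`, `cascade`) and the fixed 0.5/0.5 mixing are hypothetical, there are no learned weights, and CFF and IFE are not modeled.

```python
import math


def farthest_point_sampling(points, k):
    """Return k indices spread out over the cloud (assumed downsampling scheme)."""
    chosen = [0]  # arbitrary seed: start from the first point
    # Distance from every point to its nearest already-chosen point.
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def upsample_nearest(coarse_pts, coarse_feats, dense_pts):
    """Give each dense point the feature of its nearest coarse point."""
    out = []
    for p in dense_pts:
        j = min(range(len(coarse_pts)), key=lambda i: math.dist(p, coarse_pts[i]))
        out.append(coarse_feats[j])
    return out


def stage(points, feats, k):
    """One toy stage: downsample to k points, then upsample back to full size."""
    idx = farthest_point_sampling(points, k)
    coarse_pts = [points[i] for i in idx]
    coarse_feats = [feats[i] for i in idx]
    up = upsample_nearest(coarse_pts, coarse_feats, points)
    # Hypothetical fixed mix of per-point features and propagated coarse features.
    return [[0.5 * a + 0.5 * b for a, b in zip(f, u)] for f, u in zip(feats, up)]


def cascade(points, feats, num_stages=3, k=4):
    """Refine per-point features by running the toy stage several times."""
    for _ in range(num_stages):
        feats = stage(points, feats, k)
    return feats


# Tiny example cloud; the initial features are just the coordinates.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (5, 5, 5), (1, 1, 1)]
refined = cascade(pts, [list(map(float, p)) for p in pts], num_stages=2, k=3)
```

In a real network each stage would apply learned layers at every resolution. The sketch only shows how features move between a dense and a coarse point set, and how the output of one stage becomes the input of the next.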

Qualitative Results

BibTeX


      @article{cai2023pointhps,
        title   =   {PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds},
        author  =   {Cai, Zhongang and Pan, Liang and Wei, Chen and Yin, Wanqi and Hong, Fangzhou and Zhang, Mingyuan and Loy, Chen Change and Yang, Lei and Liu, Ziwei},
        year    =   {2023},
        journal =   {arXiv preprint arXiv:2308.14492}
      }

Acknowledgement

This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). The project is also supported by NTU NAP and Singapore MOE AcRF Tier 2 (MOET2EP20221-0012).

We referred to the project page of ProPainter when creating this project page.

More Fantastic Works on 3D Virtual Humans 🔥

Motion Generation

⇨ (Coming Soon) FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

⇨ ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

⇨ MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

⇨ Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory

3D Human Generation

⇨ EVA3D: Compositional 3D Human Generation from 2D Image Collections

⇨ AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

Datasets

⇨ SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling

⇨ HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

⇨ GTA-Human: Playing for 3D Human Recovery

Human Segmentation

⇨ Human3D: 3D Segmentation of Humans in Point Clouds with Synthetic Data