Playing for 3D Human Recovery
S-Lab, Nanyang Technological University
Shanghai Artificial Intelligence Laboratory
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Datasets
Updates
[2024-10-02] GTA-Human datasets are now available on Hugging Face!
[2024-09-19] Release of GTA-Human II Dataset.
[2022-07-08] Release of GTA-Human Dataset on MMHuman3D.
Paper
Image- and video-based 3D human recovery (i.e., pose and shape estimation) has achieved substantial progress.
However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity.
In this work, we obtain massive human sequences with automatically annotated 3D ground truths by playing a video game.
Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine,
featuring a highly diverse set of subjects, actions, and scenarios. More importantly, we study the use of game-playing
data and obtain five major insights. First, game-playing data is surprisingly effective. A simple frame-based
baseline trained on GTA-Human outperforms more sophisticated methods by a large margin. For video-based methods,
GTA-Human is even on par with the in-domain training set. Second, we discover that synthetic data provides
critical complements to real data, which is typically collected indoors. Our investigation
into the domain gap provides explanations for our simple yet useful data mixture strategies, offering
new insights to the research community. Third, the scale of the dataset matters. The performance boost is closely
related to the additional data available. A systematic study on multiple key factors (such as camera angle and
body pose) reveals that the model performance is sensitive to data density. Fourth, the effectiveness of GTA-Human
is also attributed to the rich collection of strong supervision labels (SMPL parameters), which are otherwise
expensive to acquire in real datasets. Fifth, the benefits of synthetic data extend to larger models such as
deeper convolutional neural networks (CNNs) and Transformers, for which a significant impact is also observed.
We hope our work could pave the way for scaling up 3D human recovery to the real world.

[PDF]
(Last update: 8 Sep 2024)
Citation
@ARTICLE{10652891,
author={Cai, Zhongang and Zhang, Mingyuan and Ren, Jiawei and Wei, Chen and Ren, Daxuan and
Lin, Zhengyu and Zhao, Haiyu and Yang, Lei and Loy, Chen Change and Liu, Ziwei},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Playing for 3D Human Recovery},
year={2024},
volume={},
number={},
pages={1-12},
keywords={Three-dimensional displays;Annotations;Synthetic data;Shape;Training;
Parametric statistics;Solid modeling;Human Pose and Shape Estimation;
3D Human Recovery;Parametric Humans;Synthetic Data;Dataset},
doi={10.1109/TPAMI.2024.3450537}
}