Hi there! I’m a Staff Research Scientist at SenseTime Research, working with Dr. Lei Yang. My research focuses on spatial intelligence, for which I lead the development of SenseNova-SI and EASI, advancing both scalable training and holistic evaluation for spatially capable models. As a side project, I also lead DLP3D, an open-source framework for real-time autonomous 3D characters powered by large language models. I received my Ph.D. from MMLab@NTU, advised by Prof. Ziwei Liu and Prof. Chen Change Loy, where I spent wonderful years exploring virtual humans.
[2026-02] Release of A Very Big Video Reasoning Suite (VBVR).
[2026-02] SenseNova-SI, ConsistCompose, and VLM-Guided HMR have been accepted to CVPR 2026.
[2026-01] ViMoGen has been accepted to ICLR 2026.
[2025-12] Invited talk on SenseNova-SI (slides and recording) at Plutons.
[2025-12] Invited talk on Embodied Intelligence (slides) at TriFusion Workshop.
[2025-11] Release of SenseNova-SI: Scaling Spatial Intelligence with Multimodal Foundation Models.
[2025-10] Release of the source code of DLP3D. Try it now at dlp3d.ai!
[2025-10] Digital Life Project 2 (DLP3D) has been accepted to SIGGRAPH Asia 2025 (Real-Time Live!).
[2025-10] SMPLest-X has been accepted to TPAMI 2025.
[2025-09] PoseFuse3D-KI has been accepted to NeurIPS 2025.
[2025-08] Release of EASI: Holistic Evaluation of Multimodal LLMs on Spatial Intelligence.
[2025-06] DPoser-X (Oral) has been accepted to ICCV 2025.
[2025-05] ADHMR has been accepted to ICML 2025.
[2025-02] SOLAMI, Disco4D, and EgoLife have been accepted to CVPR 2025.
[2025-01] MeshAnything has been accepted to ICLR 2025.
[2024-12] I have started a new role as a Staff Research Scientist at SenseTime Research.
[2024-09] Release of HuMMan v1.0: Motion Generation Subset and GTA-Human II Dataset.
[2024-08] Release of HuMMan v1.0: 3D Vision Subset.
[2024-08] GTA-Human has been accepted to TPAMI 2024 after two years of review!
[2024-07] WHAC and Large Motion Model have been accepted to ECCV 2024.
[2024-06] I have defended my Ph.D. thesis Scaling Up Parametric Human Recovery! 🎓
[2024-04] Invited talk (slides) at China3DV 2024.
[2024-02] Digital Life Project, AiOS, and GaussianEditor have been accepted to CVPR 2024.
[2024-02] Invited talk on SMPLer-X (recording) at International Digital Economy Academy (IDEA).
A Very Big Video Reasoning Suite
Maijunxian Wang*, Ruisi Wang*, Juyi Lin*, Ran Ji*, Thaddäus Wiedemer, Qingying Gao,
Dezhi Luo, Yaoyao Qian, Lianyu Huang, Zelong Hong, Jiahui Ge, Qianli Ma, Hang He,
Yifan Zhou, Lingzi Guo, Lantao Mei, Jiachen Li, Hanwen Xing, Tianqi Zhao, Fengyuan Yu,
Weihang Xiao, Yizheng Jiao, Jianheng Hou, Danyang Zhang, Pengcheng Xu, Boyang Zhong,
Zehong Zhao, Gaoyun Fang, John Kitaoka, Yile Xu, Hua Xu, Kenton Blacutt, Tin Nguyen,
Siyuan Song, Haoran Sun, Shaoyue Wen, Linyang He, Runming Wang, Yanzhi Wang,
Mengyue Yang, Ziqiao Ma, Raphaël Millière, Freda Shi, Nuno Vasconcelos,
Daniel Khashabi, Alan Yuille, Yilun Du, Ziming Liu, Bo Li, Dahua Lin, Ziwei Liu,
Vikash Kumar, Yijiang Li, Lei Yang, Zhongang Cai✉, Hokin Deng✉.
Technical Report, 2026
Homepage | PDF | Data | Model | EvalKit | Leaderboard
Scaling Spatial Intelligence with Multimodal Foundation Models
Zhongang Cai*, Ruisi Wang*, Chenyang Gu*, Fanyi Pu*, Junxiang Xu*, Yubo Wang*, Wanqi Yin*, Zhitao Yang*, Chen Wei*, Qingping Sun*,
Tongxi Zhou*, Jiaqi Li*, Hui En Pang*, Oscar Qian*, Yukun Wei, Zhiqian Lin, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen,
Xiangyu Fan, Hanming Deng, Lewei Lu, Liang Pan, Bo Li, Ziwei Liu✉, Quan Wang✉, Dahua Lin✉, Lei Yang*✉.
Computer Vision and Pattern Recognition (CVPR), 2026
PDF | Code | HuggingFace | ModelScope
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
Previously: Has GPT-5 Achieved Spatial Intelligence? An Empirical Study
Zhongang Cai*, Yubo Wang*, Qingping Sun*, Ruisi Wang*, Chenyang Gu*, Wanqi Yin*, Zhiqian Lin*, Zhitao Yang*, Chen Wei*, Oscar Qian*, Hui En Pang*,
Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen, Jiaqi Li, Xiangyu Fan, Hanming Deng, Lewei Lu, Bo Li, Ziwei Liu, Quan Wang✉, Dahua Lin✉, Lei Yang*✉.
Technical Report, 2025
PDF | Code | Leaderboard