Hi there! I’m a Staff Research Scientist at SenseTime Research and a core contributor to SenseNova-U1, working with Dr. Lei Yang. I believe visual modalities are the next intelligence substrate, and my research currently focuses on visual reasoning, for which we discovered Chain-of-Steps and built VBVR. I also lead the development of SenseNova-SI and EASI, advancing both scalable training and holistic evaluation for spatial intelligence. I received my Ph.D. from MMLab@NTU, advised by Prof. Ziwei Liu and Prof. Chen Change Loy, where I spent wonderful years exploring virtual humans.
[2026-06] Demystifying Video Reasoning has been accepted to ECCV 2026.
[2026-06] EgoLife won the Distinguished Paper Award at the EgoVis Workshop, CVPR 2026.
[2026-05] Release of SenseNova-U1, a production-level unified multimodal model built upon NEO-Unify.
[2026-05] Release of NEO-Unify, an encoder-free unified multimodal model.
[2026-05] A Very Big Video Reasoning Suite (VBVR) has been accepted to ICML 2026.
[2026-04] PointHPS has been accepted to IJCV 2026.
[2026-03] Release of Demystifying Video Reasoning, where we discover Chain-of-Steps!
[2026-02] Release of A Very Big Video Reasoning Suite (VBVR).
[2026-02] SenseNova-SI, ConsistCompose, and VLM-Guided HMR have been accepted to CVPR 2026.
[2026-01] ViMoGen has been accepted to ICLR 2026.
[2025-12] Invited talk on SenseNova-SI (slides and recording) at Plutons.
[2025-12] Invited talk on Embodied Intelligence (slides) at TriFusion Workshop.
[2025-11] Release of SenseNova-SI: Scaling Spatial Intelligence with Multimodal Foundation Models.
[2025-10] Release of the source code of DLP3D. Try it now at dlp3d.ai!
[2025-10] Digital Life Project 2 (DLP3D) has been accepted to SIGGRAPH Asia 2025 (Real-Time Live!).
[2025-10] SMPLest-X has been accepted to TPAMI 2025.
[2025-09] PoseFuse3D-KI has been accepted to NeurIPS 2025.
[2025-08] Release of EASI: Holistic Evaluation of Multimodal LLMs on Spatial Intelligence.
A Very Big Video Reasoning Suite
Maijunxian Wang*, Ruisi Wang*, Juyi Lin*, Ran Ji*, Thaddäus Wiedemer, Qingying Gao,
Dezhi Luo, Yaoyao Qian, Lianyu Huang, Zelong Hong, Jiahui Ge, Qianli Ma, Hang He,
Yifan Zhou, Lingzi Guo, Lantao Mei, Jiachen Li, Hanwen Xing, Tianqi Zhao, Fengyuan Yu,
Weihang Xiao, Yizheng Jiao, Jianheng Hou, Danyang Zhang, Pengcheng Xu, Boyang Zhong,
Zehong Zhao, Gaoyun Fang, John Kitaoka, Yile Xu, Hua Xu, Kenton Blacutt, Tin Nguyen,
Siyuan Song, Haoran Sun, Shaoyue Wen, Linyang He, Runming Wang, Yanzhi Wang,
Mengyue Yang, Ziqiao Ma, Raphaël Millière, Freda Shi, Nuno Vasconcelos,
Daniel Khashabi, Alan Yuille, Yilun Du, Ziming Liu, Bo Li, Dahua Lin, Ziwei Liu,
Vikash Kumar, Yijiang Li, Lei Yang, Zhongang Cai✉, Hokin Deng✉.
International Conference on Machine Learning (ICML), 2026
(Hugging Face #1 Paper of the Month, February 2026)
Homepage | PDF | Data | Model | EvalKit | Leaderboard
Demystifying Video Reasoning
Ruisi Wang, Zhongang Cai✉, Fanyi Pu, Junxiang Xu, Wanqi Yin, Maijunxian Wang, Ran Ji, Chenyang Gu, Bo Li, Ziqi Huang, Hokin Deng, Dahua Lin, Ziwei Liu, Lei Yang.
European Conference on Computer Vision (ECCV), 2026
(Hugging Face #1 Paper of the Day, 18 March 2026)
Homepage | PDF | Video | Code
Scaling Spatial Intelligence with Multimodal Foundation Models
Zhongang Cai*, Ruisi Wang*, Chenyang Gu*, Fanyi Pu*, Junxiang Xu*, Yubo Wang*, Wanqi Yin*, Zhitao Yang*, Chen Wei*, Qingping Sun*,
Tongxi Zhou*, Jiaqi Li*, Hui En Pang*, Oscar Qian*, Yukun Wei, Zhiqian Lin, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen,
Xiangyu Fan, Hanming Deng, Lewei Lu, Liang Pan, Bo Li, Ziwei Liu✉, Quan Wang✉, Dahua Lin✉, Lei Yang*✉.
Computer Vision and Pattern Recognition (CVPR), 2026
PDF | Code | HuggingFace | ModelScope