Review of 3D Human Reconstruction Methods Empowering VR/AR

doi:10.16182/j.issn1004731x.joss.25-1056

Abstract

3D human reconstruction is a core enabling technology for deploying VR/AR. Early research relied on multi-view cameras and depth sensors, which delivered high accuracy but were costly and adapted poorly to dynamic scenes. The middle stage was represented by parametric human models, which recast reconstruction as the estimation of low-dimensional pose and shape parameters and made efficient single-image reconstruction possible. Implicit neural representations improved detail fidelity and adaptability to varied environments, but these methods render slowly. Currently, 3D Gaussian splatting (3DGS) optimizes the parameters of discrete Gaussian primitives, balancing modeling accuracy with real-time rendering efficiency and offering a new paradigm for dynamic human reconstruction. The technique still faces challenges such as detail distortion, limited generalization, and a mismatch between its computational cost and the compute available on end devices. Future work is expected to adapt it further to VR/AR scenarios, raise its practical value, and promote deeper integration of the two.

Keywords: 3D human reconstruction, VR/AR, parametric human model, implicit neural representation, 3D Gaussian splatting (3DGS)

CLC number:

TP319.9

Zhang Lisha, Huo Yuchi, Ye Qi, et al. Review of 3D Human Reconstruction Methods Empowering VR/AR[J]. Journal of System Simulation (系统仿真学报), 2026, 38(3): 545-562.

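Figures and tables (6)

Figure 1

Figure 2

Table 1

Classification of the core techniques for 3D human reconstruction and comparison of typical methods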

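Category	Typical method	Representative example	Core technical characteristics	PA-MPJPE in mm (H36M/3DPW)	MPJPE in mm (H36M/3DPW)
Optimization-based modeling	Silhouette alignment	SCAPE-based silhouette alignment	Optimizes geometric consistency between the projected 3D model and the image silhouette	-	-
Optimization-based modeling	Body parameter optimization	SMPLify^[25]	Iteratively optimizes the pose and shape parameters of the SMPL model	82.3 (H36M)	-
Optimization-based modeling	Fusion of multiple visual cues	HoloPose^[27]	Fuses DensePose dense pose estimates, keypoints, and segmentation masks	50.56 (H36M)	64.28 (H36M)
Optimization-based modeling	Deep-learning-assisted optimization	Exemplar Fine-Tuning (EFT)^[29]	Uses a pretrained regressor as an implicit prior and fine-tunes it on a few samples within a neighborhood of the parameters, without regularization terms	44.0 (H36M); 51.6 (3DPW)	-
Optimization-based modeling	Inverse kinematics	HybrIK	Decomposes joint rotations into swing and twist components	33.6 (H36M); 45.0 (3DPW)	55.4 (H36M); 74.1 (3DPW)
Learning-based modeling	Parametric learning	Neural Body Fitting	End-to-end regression of parametric model parameters (e.g., the SMPL pose θ and shape β)	59.9 (H36M)	-
Learning-based modeling	Non-parametric learning	I2L-MeshNet	Efficient vertex localization based on lixel (line-pixel) 1D heatmaps	41.7 (H36M); 58.6 (3DPW)	55.7 (H36M); 93.2 (3DPW)
Learning-based modeling	Regression of probability distributions	ProHMR	Models the conditional probability distribution of pose parameters with conditional normalizing flows	41.2 (H36M); 59.8 (3DPW)	-
Learning-based modeling	Regression of intermediate representations	DecoMR^[20]	Establishes dense correspondence between the image and the 3D mesh via IUV images, transfers local features, and regresses a location map	39.3 (H36M); 68.5 (3DPW)	60.6 (H36M)
Hybrid and emerging methods	Implicit representation	PIFu^[7]	Pixel-aligned implicit function that infers the 3D surface and texture from a single image	-	-
Hybrid and emerging methods	Hybrid explicit-implicit representation	D³-Human	Explicit SMPL template plus an implicit hmSDF that separates clothing from the body	-	-
Hybrid and emerging methods	3DGS	SplattingAvatar	Optimizes the geometry and appearance parameters of discrete Gaussian primitives; real-time rendering	-	-
Hybrid and emerging methods	Whole-body reconstruction	ExPose	Predicts SMPL-X parameters, refines them on high-resolution region crops, and integrates knowledge from part-specific datasets	60.7 (3DPW)	93.4 (3DPW)
Network architecture design	Single-stage framework	HMR	ResNet-based global feature extraction plus iterative error feedback (IEF) optimization	56.8 (H36M); 81.3 (3DPW)	88.0 (H36M); 130.0 (3DPW)
Network architecture design	Multi-stage framework	Zanfir et al.	Multi-stage framework that combines semantic intermediate representations and uses normalizing flows for weakly supervised 3D human pose and shape reconstruction	57.1 (3DPW)	90.0 (3DPW)
Network architecture design	Multi-branch framework	DaNet	Splits the task into global and local streams: global parameter prediction from IUV maps plus local refinement with keypoint RoI pooling	42.9 (H36M); 54.8 (3DPW)	54.6 (H36M); 85.5 (3DPW)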
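
The table above reports PA-MPJPE and MPJPE without defining them. As a minimal sketch, assuming joints are stored as (J, 3) NumPy arrays in millimetres (the function names and the toy data are hypothetical, not taken from the surveyed papers): MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE is the same error after a Procrustes (similarity) alignment that removes global scale, rotation, and translation. Benchmark protocols often root-align the skeletons before computing MPJPE; that step is omitted here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3) with scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance matrix (Kabsch/Umeyama solution).
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Guard against reflections so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: MPJPE after similarity alignment to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)


# Toy check (assumed data): a scaled, rotated, shifted copy of the ground truth.
rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
pred = 1.5 * gt @ Rz.T + np.array([0.1, -0.2, 0.3])
print("MPJPE:", mpjpe(pred, gt))
print("PA-MPJPE:", pa_mpjpe(pred, gt))
```

Because the toy prediction differs from the ground truth only by a similarity transform, its PA-MPJPE is numerically zero while its MPJPE is not. This is why PA-MPJPE is read as a measure of articulated pose quality independent of global placement.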

Figure 3

Table 2

Table 3

References (129)

[1]	Yang Shuo, Gu Xiaoling, Kuang Zhenzhong, et al. Innovative AI Techniques for Photorealistic 3D Clothed Human Reconstruction from Monocular Images or Videos: A Survey[J]. The Visual Computer, 2025, 41(6): 3973-4000.
[2]	Sun Mingyang, Yang Dingkang, Kou Dongliang, et al. Human 3D Avatar Modeling with Implicit Neural Representation: A Brief Survey[EB/OL]. (2023-06-06) [2025-08-12].
[3]	Correia Helena A, José Henrique Brito. 3D Reconstruction of Human Bodies from Single-view and Multi-view Images: A Systematic Review[J]. Computer Methods and Programs in Biomedicine, 2023, 239: 107620.
[4]	Kolotouros N, Alldieck T, Corona E, et al. Instant 3D Human Avatar Generation Using Image Diffusion Models[EB/OL]. (2024-07-12) [2025-08-14].
[5]	Loper Matthew, Mahmood Naureen, Romero Javier, et al. SMPL: A Skinned Multi-person Linear Model[J]. ACM Transactions on Graphics, 2015, 34(6): 248.
[6]	Pavlakos Georgios, Choutas Vasileios, Ghorbani Nima, et al. Expressive Body Capture: 3D Hands, Face, and Body from a Single Image[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 10967-10977.
[7]	Saito S, Huang Zeng, Natsume Ryota, et al. PIFu: Pixel-aligned Implicit Function for High-resolution Clothed Human Digitization[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 2304-2314.
[8]	Mildenhall B, Srinivasan P P, Tancik M, et al. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis[J]. Communications of the ACM, 2022, 65(1): 99-106.
[9]	Newcombe R A, Izadi S, Hilliges O, et al. KinectFusion: Real-time Dense Surface Mapping and Tracking[C]//2011 10th IEEE International Symposium on Mixed and Augmented Reality. Piscataway: IEEE, 2011: 127-136.
[10]	Yu Tao, Zhao Jianhui, Zheng Zerong, et al. DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2523-2539.
[11]	Kerbl Bernhard, Kopanas Georgios, Leimkuehler Thomas, et al. 3D Gaussian Splatting for Real-time Radiance Field Rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139.
[12]	Pan Panwang, Su Zhuo, Lin Chenguo, et al. HumanSplat: Generalizable Single-image Human Gaussian Splatting with Structure Priors[C]//Proceedings of the 38th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2024: 74383-74410.
[13]	Hu Shoukang, Liu Ziwei. GauHuman: Articulated Gaussian Splatting from Monocular Human Videos[EB/OL]. (2023-12-05) [2025-08-06].
[14]	Romero J, Tzionas D, Black M J. Embodied Hands: Modeling and Capturing Hands and Bodies Together[EB/OL]. (2022-01-07) [2025-08-12].
[15]	Li Tianye, Bolkart T, Black M J, et al. Learning a Model of Facial Shape and Expression from 4D Scans[J]. ACM Transactions on Graphics, 2017, 36(6): 194.
[16]	Jiang Boyan, Zhang Yinda, Wei Xingkui, et al. H4D: Human 4D Modeling by Learning Neural Compositional Representation[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 19333-19343.
[17]	Patel Chaitanya, Liao Zhouyingcheng, Pons-Moll Gerard. TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 7363-7373.
[18]	Xiu Yuliang, Yang Jinlong, Cao Xu, et al. ECON: Explicit Clothed Humans Optimized via Normal Integration[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 512-523.
[19]	Pons-Moll Gerard, Pujades Sergi, Hu Sonny, et al. ClothCap: Seamless 4D Clothing Capture and Retargeting[J]. ACM Transactions on Graphics, 2017, 36(4): 73.
[20]	Zeng Wang, Ouyang Wanli, Luo Ping, et al. 3D Human Mesh Regression with Dense Correspondence[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 7052-7061.
[21]	Zhang Tianshu, Huang Buzhen, Wang Yangang. Object-occluded Human Shape and Pose Estimation from a Single Color Image[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 7374-7383.
[22]	Mescheder Lars, Oechsle Michael, Niemeyer Michael, et al. Occupancy Networks: Learning 3D Reconstruction in Function Space[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 4455-4465.
[23]	Park J J, Florence P, Straub J, et al. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 165-174.
[24]	Anguelov D, Srinivasan P, Koller D, et al. SCAPE: Shape Completion and Animation of People[J]. ACM Transactions on Graphics, 2005, 24(3): 408-416.
[25]	Bogo F, Kanazawa A, Lassner Christoph, et al. Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image[C]//Computer Vision – ECCV 2016. Cham: Springer International Publishing, 2016: 561-578.
[26]	Zanfir Andrei, Marinoiu Elisabeta, Sminchisescu Cristian. Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 2148-2157.
[27]	Güler Riza Alp, Kokkinos I. HoloPose: Holistic 3D Human Reconstruction In-the-wild[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 10876-10886.
[28]	Güler Riza Alp, Neverova N, Kokkinos I. DensePose: Dense Human Pose Estimation in the Wild[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7297-7306.
[29]	Joo H, Neverova N, Vedaldi A. Exemplar Fine-tuning for 3D Human Model Fitting Towards In-the-wild 3D Human Pose Estimation[C]//2021 International Conference on 3D Vision (3DV). Piscataway: IEEE, 2021: 42-52.
[30]	Song Jie, Chen Xu, Hilliges Otmar. Human Body Model Fitting by Learned Gradient Descent[C]//Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 744-760.
[31]	Iqbal U, Xie K, Guo Yunrong, et al. KAMA: 3D Keypoint Aware Body Mesh Articulation[C]//2021 International Conference on 3D Vision (3DV). Piscataway: IEEE, 2021: 689-699.
[32]	Yu Zhenbo, Wang Junjie, Xu Jingwei, et al. Skeleton2Mesh: Kinematics Prior Injected Unsupervised Human Mesh Recovery[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 8599-8609.
[33]	Li Jiefeng, Xu Chao, Chen Zhicun, et al. HybrIK: A Hybrid Analytical-neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 3382-3392.
[34]	Li Jiefeng, Bian Siyuan, Liu Qi, et al. NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 12933-12942.
[35]	Shetty Karthik, Birkhold Annette, Jaganathan Srikrishna, et al. PLIKS: A Pseudo-linear Inverse Kinematic Solver for 3D Human Body Estimation[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 574-584.
[36]	Zhou Yi, Barnes C, Lu Jingwan, et al. On the Continuity of Rotation Representations in Neural Networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 5738-5746.
[37]	Varol Gül, Ceylan D, Russell B, et al. BodyNet: Volumetric Inference of 3D Human Body Shapes[C]//Computer Vision – ECCV 2018. Cham: Springer International Publishing, 2018: 20-38.
[38]	Kolotouros N, Pavlakos G, Daniilidis K. Convolutional Mesh Regression for Single-image Human Shape Reconstruction[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 4496-4505.
[39]	Moon Gyeongsik, Lee Kyoung Mu. I2L-MeshNet: Image-to-lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image[C]//Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 752-768.
[40]	Biggs B, Ehrhardt Sébastien, Joo H, et al. 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 20496-20507.
[41]	Kolotouros N, Pavlakos G, Jayaraman D, et al. Probabilistic Modeling for Human Mesh Recovery[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 11585-11594.
[42]	Fang Qi, Chen Kang, Fan Yinghui, et al. Learning Analytical Posterior Probability for Human Mesh Recovery[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 8781-8791.
[43]	Sengupta A, Budvytis I, Cipolla R. HuManiFlow: Ancestor-conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 4779-4789.
[44]	Gabeur Valentin, Franco Jean-Sebastien, Martin Xavier, et al. Moulding Humans: Non-parametric 3D Human Shape Estimation from Single Images[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 2232-2241.
[45]	Zhang Siwei, Zhang Yan, Bogo F, et al. Learning Motion Priors for 4D Human Body Capture in 3D Scenes[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 11323-11333.
[46]	Sun Yu, Ye Yun, Liu Wu, et al. Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 5348-5357.
[47]	Kocabas Muhammed, Huang Chunhao, Hilliges Otmar, et al. PARE: Part Attention Regressor for 3D Human Body Estimation[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 11107-11117.
[48]	Choutas Vasileios, Pavlakos G, Bolkart Timo, et al. Monocular Expressive Body Regression Through Body-driven Attention[C]//Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 20-40.
[49]	Sun Yu, Huang Tianyu, Bao Qian, et al. Learning Monocular Mesh Recovery of Multiple Body Parts Via Synthesis[C]//ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE, 2022: 2669-2673.
[50]	Lin Jing, Zeng Ailing, Wang Haoqian, et al. One-stage 3D Whole-body Mesh Recovery with Component Aware Transformer[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 21159-21168.
[51]	Forte Maria-Paola, Kulits Peter, Huang Chunhao, et al. Reconstructing Signing Avatars from Video Using Linguistic Priors[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 12791-12801.
[52]	Alldieck Thiemo, Magnor Marcus, Bhatnagar Bharat Lal, et al. Learning to Reconstruct People in Clothing from a Single RGB Camera[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 1175-1186.
[53]	Bhatnagar Bharat, Tiwari Garvita, Theobalt Christian, et al. Multi-garment Net: Learning to Dress 3D People from Images[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 5419-5429.
[54]	Alldieck Thiemo, Pons-Moll Gerard, Theobalt Christian, et al. Tex2Shape: Detailed Full Human Body Geometry from a Single Image[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 2293-2303.
[55]	Saito S, Simon T, Saragih J, et al. PIFuHD: Multi-level Pixel-aligned Implicit Function for High-resolution 3D Human Digitization[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 81-90.
[56]	Huang Zeng, Xu Yuanlu, Lassner C, et al. ARCH: Animatable Reconstruction of Clothed Humans[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 3090-3099.
[57]	He Tong, Xu Yuanlu, Saito S, et al. ARCH++: Animation-ready Clothed Human Reconstruction Revisited[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 11026-11036.
[58]	He Tong, Collomosse J, Jin Hailin, et al. Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 9276-9287.
[59]	Peng Sida, Zhang Yuanqing, Xu Yinghao, et al. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 9050-9059.
[60]	Liao Tingting, Zhang Xiaomei, Xiu Yuliang, et al. High-fidelity Clothed Avatar Reconstruction from a Single Image[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 8662-8672.
[61]	Zhang Yi, Ji Pengliang, Wang Angtian, et al. 3D-aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 9365-9376.
[62]	Gao Xiangjun, Yang Jiaolong, Kim Jongyoo, et al. MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(8): 6110-6121.
[63]	Mu Jiteng, Sang Shen, Vasconcelos N, et al. ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 18345-18355.
[64]	Zhu Hao, Zuo Xinxin, Wang Sen, et al. Detailed Human Shape Estimation from a Single Image by Hierarchical Mesh Deformation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 4486-4495.
[65]	Bhatnagar Bharat Lal, Sminchisescu C, Theobalt Christian, et al. Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction[C]//Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 311-329.
[66]	Zhu Hao, Zuo Xinxin, Yang Haotian, et al. Detailed Avatar Recovery from Single Image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(11): 7363-7379.
[67]	Xiu Yuliang, Yang Jinlong, Tzionas Dimitrios, et al. ICON: Implicit Clothed Humans Obtained from Normals[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 13286-13296.
[68]	Feng Yao, Liu Weiyang, Bolkart T, et al. Learning Disentangled Avatars with Hybrid 3D Representations[EB/OL]. (2023-09-12) [2025-08-12].
[69]	Zhang Xuanmeng, Zhang Jianfeng, Chacko R, et al. GETAvatar: Generative Textured Meshes for Animatable Human Avatars[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 2273-2282.
[70]	Wang A, Xu Yuanlu, Sarafianos N, et al. HISR: Hybrid Implicit Surface Representation for Photorealistic 3D Human Reconstruction[C]//Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2024: 5298-5308.
[71]	Chen Honghu, Peng Bo, Tao Yunfan, et al. D³-Human: Dynamic Disentangled Digital Human from Monocular Video[C]//2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2025: 10836-10846.
[72]	Qiu Lingteng, Gu Xiaodong, Li Peihao, et al. LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2025: 14184-14194.
[73]	Yan Chi, Qu Delin, Xu Dan, et al. GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 19595-19604.
[74]	Liu Yang, Huang Xiang, Qin Minghan, et al. Animatable 3D Gaussian: Fast and High-quality Reconstruction of Multiple Human Avatars[C]//Proceedings of the 32nd ACM International Conference on Multimedia. New York: ACM, 2024: 1120-1129.
[75]	Moon Gyeongsik, Shiratori T, Saito S. Expressive Whole-body 3D Gaussian Avatar[C]//Computer Vision – ECCV 2024. Cham: Springer Nature Switzerland, 2025: 19-35.
[76]	Jiang Yujiao, Liao Qingmin, Li Xiaoyu, et al. UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling[J]. Knowledge-based Systems, 2025, 320: 113470.
[77]	Hu Liangxiao, Zhang Hongwen, Zhang Yuxiang, et al. GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 634-644.
[78]	Shao Zhijing, Wang Zhaolong, Li Zhuang, et al. SplattingAvatar: Realistic Real-time Human Avatars with Mesh-embedded Gaussian Splatting[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 1606-1616.
[79]	Lei Jiahui, Wang Yufu, Pavlakos G, et al. GART: Gaussian Articulated Template Models[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 19876-19887.
[80]	Li Zhe, Sun Yipengjing, Zheng Zerong, et al. Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling[EB/OL]. (2024-05-25) [2025-08-13].
[81]	Li Zhe, Zheng Zerong, Wang Lizhen, et al. Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 19711-19722.
[82]	He Yisheng, Gu Xiaodong, Ye Xiaodan, et al. LAM: Large Avatar Model for One-shot Animatable Gaussian Head[EB/OL]. (2025-04-04) [2025-09-24].
[83]	Zhang D, Liu Y, Lin L, et al. GUAVA: Generalizable Upper Body 3D Gaussian Avatar[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2025: 14205-14217.
[84]	Liao Kaimin, Wang Hua, Chen Zhi, et al. LiteGS: A High-performance Framework to Train 3DGS in Subminutes via System and Algorithm Codesign[EB/OL]. (2025-09-26) [2025-09-29].
[85]	Kanazawa A, Black Michael J, Jacobs D W, et al. End-to-End Recovery of Human Shape and Pose[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7122-7131.
[86]	Zanfir Andrei, Bazavan Eduard Gabriel, Xu Hongyi, et al. Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows[C]//Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 465-481.
[87]	Pavlakos G, Zhu Luyang, Zhou Xiaowei, et al. Learning to Estimate 3D Human Pose and Shape from a Single Color Image[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 459-468.
[88]	Zhang Hongwen, Cao Jie, Lu Guo, et al. Learning 3D Human Shape and Pose from Dense Body Parts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(5): 2610-2627.
[89]	Omran Mohamed, Lassner Christoph, Pons-Moll Gerard, et al. Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation[C]//2018 International Conference on 3D Vision (3DV). Piscataway: IEEE, 2018: 484-494.
[90]	Loper Matthew, Mahmood Naureen, Black Michael J. MoSh: Motion and Shape Capture from Sparse Markers[J]. ACM Transactions on Graphics, 2014, 33(6): 220.
[91]	Varol Gül, Romero J, Martin Xavier, et al. Learning from Synthetic Humans[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2017: 4627-4635.
[92]	Cai Zhongang, Zhang Mingyuan, Ren Jiawei, et al. Playing for 3D Human Recovery[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 10533-10545.
[93]	Patel Priyanka, Huang Chunhao, Tesch Joachim, et al. AGORA: Avatars in Geography Optimized for Regression Analysis[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 13463-13473.
[94]	Yu Tao, Zheng Zerong, Guo Kaiwen, et al. Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 5742-5752.
[95]	Zheng Yang, Shao Ruizhi, Zhang Yuxiang, et al. DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 6219-6229.
[96]	Ionescu Catalin, Papava Dragos, Olaru Vlad, et al. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325-1339.
[97]	von Marcard Timo, Henschel Roberto, Black Michael J, et al. Recovering Accurate 3D Human Pose in the Wild Using IMUs and a Moving Camera[C]//Computer Vision – ECCV 2018. Cham: Springer International Publishing, 2018: 614-631.
[98]	Mahmood Naureen, Ghorbani Nima, Troje Nikolaus F, et al. AMASS: Archive of Motion Capture as Surface Shapes[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 5441-5450.
[99]	Joo H, Liu Hao, Tan Lei, et al. Panoptic Studio: A Massively Multiview System for Social Motion Capture[C]//2015 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2015: 3334-3342.
[100]	Yu Zhixuan, Shin Yoon J, Lee I K, et al. HUMBI: A Large Multiview Dataset of Human Body Expressions[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 2987-2997.
[101]	Kolotouros N, Pavlakos G, Black M, et al. Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 2252-2261.
[102]	Sigal Leonid, Balan A O, Black M J. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion[J]. International Journal of Computer Vision, 2010, 87(1): 4-27.
[103]	Trumble M, Gilbert A, Malleson C, et al. Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors[C]//2017 British Machine Vision Conference (BMVC). Durham: BMVA Press, 2017: 1-13.
[104]	Mehta Dushyant, Rhodin Helge, Dan Casas, et al. Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision[C]//2017 International Conference on 3D Vision (3DV). Piscataway: IEEE, 2017: 506-516.
[105]	Fang Qi, Qing Shuai, Dong Junting, et al. Reconstructing 3D Human Pose by Watching Humans in the Mirror[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 12809-12818.
[106]	Xiang Donglai, Joo H, Sheikh Y. Monocular Total Capture: Posing Face, Body, and Hands in the Wild[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 10957-10966.
[107]	Johnson S, Everingham M. Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation[C]//Proceedings of the British Machine Vision Conference. Durham: BMVA Press, 2010: 1-11.
[108]	Johnson S, Everingham M. Learning Effective Human Pose Estimation from Inaccurate Annotation[C]//CVPR 2011. Piscataway: IEEE, 2011: 1465-1472.
[109]	Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common Objects in Context[C]//Computer Vision – ECCV 2014. Cham: Springer International Publishing, 2014: 740-755.
[110]	Andriluka Mykhaylo, Pishchulin Leonid, Gehler Peter, et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 3686-3693.
[111]	Andriluka M, Iqbal Umar, Insafutdinov Eldar, et al. PoseTrack: A Benchmark for Human Pose Estimation and Tracking[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5167-5176.
[112]	Zhang Songhai, Li Ruilong, Dong Xin, et al. Pose2Seg: Detection Free Human Instance Segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 889-898.
[113]	Hu Pengpeng, Ho E S L, Munteanu Adrian. 3DBodyNet: Fast Reconstruction of 3D Animatable Human Body Shape from a Single Commodity Depth Camera[J]. IEEE Transactions on Multimedia, 2022, 24: 2139-2149.
[114]	Yan Ming, Zhang Yan, Cai Shuqiang, et al. RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 2250-2262.
[115]	Zhao Mingmin, Li Tianhong, Alsheikh M A, et al. Through-wall Human Pose Estimation Using Radio Signals[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7356-7365.
[116]	Wang Fei, Zhou Sanping, Panev S, et al. Person-in-WiFi: Fine-grained Person Perception Using WiFi[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 5451-5460.
[117]	Luo Yiyue, Li Yunzhu, Foshey M, et al. Intelligent Carpet: Inferring 3D Human Pose from Tactile Signals[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 11250-11260.
[118]	Zuo Chengxu, Wang Yiming, Zhan Lishuang, et al. Loose Inertial Poser: Motion Capture with IMU-attached Loose-wear Jacket[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 2209-2219.
[119]	Bujić Mila, Macey Anna-Leena, Järvelä Simo, et al. Playing with Embodied Social Interaction: A Thematic Review of Experiments on Social Aspects in Gameful Virtual Reality[J]. Interacting With Computers, 2022, 33(6): 583-595.
[120]	Pan Ye, Zhang Ruisi, Wang Jingying, et al. Real-time Facial Animation for 3D Stylized Character with Emotion Dynamics[C]//Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 6851-6859.
[121]	Raj A, Tanke Julian, Hays J, et al. ANR: Articulated Neural Rendering for Virtual Avatars[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 3721-3730.
[122]	Wu Jiajie, Li F W B, Tam G K L, et al. Talking Face Generation with Lip and Identity Priors[J]. Computer Animation & Virtual Worlds, 2025, 36(3): e70026.
[123]	Wang Xinyi, Liu Shiguang, Yang Xu. SSGesture: Multimodal Gesture Generation Framework for Human Animation Synthesis[J/OL]. IEEE Computer Graphics and Applications. (2025-06-06) [2025-08-14].
[124]	Zhang Zechuan, Sun Li, Yang Zongxin, et al. Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2023: 7818-7830.
[125]	Martini Miriana, Valentini Valeria, Ciprian Alberto, et al. Semi-automated Digital Human Production for Enhanced Media Broadcasting[C]//2024 IEEE Gaming, Entertainment, and Media Conference (GEM). Piscataway: IEEE, 2024: 1-6.
[126]	Zheng Zerong, Yu Tao, Wei Yixuan, et al. DeepHuman: 3D Human Reconstruction from a Single Image[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 7738-7748.
[127]	Peng Sida, Dong Junting, Wang Qianqian, et al. Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 14294-14303.
[128]	Akbari H, Yuan Liangzhe, Qian Rui, et al. VATT: Transformers for Multimodal Self-supervised Learning from Raw Video, Audio and Text[C]//Advances in Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 24206-24221.
[129]	Mohr A, Gleicher M. Building Efficient, Accurate Character Skins from Examples[J]. ACM Transactions on Graphics, 2003, 22(3): 562-568.

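Type	Dataset	Frames	Scenes	Subjects	Subjects per frame	Real-world scenes	Annotation type
Rendered datasets	SURREAL^[91]	6.5×10⁶	2,607	145	1	-	SMPL
Rendered datasets	GTA-Human^[92]	1.4×10⁶	-	>600	1	-	SMPL
Rendered datasets	AGORA^[93]	1.7×10⁴	>350	4,240	5-15	-	SMPL-X
Rendered datasets	THUman2.0^[94]	-	-	200	1	-	SMPL-X
Rendered datasets	MultiHuman^[95]	-	-	50	1-3	-	SMPL-X
Marker/sensor-based	HumanEva^[102]	8.0×10⁴	1	4	1	-	-
Marker/sensor-based	Human3.6M^[96]	3.6×10⁶	1	11	1	-	SMPL
Marker/sensor-based	Total Capture^[103]	1.9×10⁶	1	5	1	-	-
Marker/sensor-based	3DPW^[97]	>5.1×10⁴	60	7	1-2	✓	SMPL
Markerless multi-view	CMU Panoptic^[99]	1.5×10⁶	1	40	3-8	-	-
Markerless multi-view	MPI-INF-3DHP^[104]	>1.3×10⁶	1	8	1	-	SMPL
Markerless multi-view	3DOH50K^[21]	5.16×10⁴	1	-	1	-	SMPL
Markerless multi-view	Mirrored-Human^[105]	1.8×10⁶	>200	>200	≥1	-	SMPL
Markerless multi-view	MTC^[106]	8.34×10⁶	1	40	1	-	-
Markerless multi-view	EHF^[6]	1.0×10²	1	1	1	-	SMPL-X
Markerless multi-view	HUMBI^[100]	1.73×10⁷	1	772	1	-	SMPL
Markerless multi-view	ZJU-MoCap^[59]	-	1	9	1	-	SMPL-X
Pseudo-3D labels	LSP^[107]	2.0×10³	-	-	1	✓	SMPL
Pseudo-3D labels	LSP-Extended^[108]	1.0×10⁴	-	-	1	✓	SMPL
Pseudo-3D labels	MSCOCO^[109]	3.8×10⁴	-	-	≥1	✓	SMPL
Pseudo-3D labels	MPII^[110]	2.492×10⁴	3,913	>40,000	≥1	✓	SMPL
Pseudo-3D labels	PoseTrack^[111]	6.6374×10⁴	550	550	>1	✓	SMPL
Pseudo-3D labels	OCHuman^[112]	4.731×10³	-	8,110	>1	✓	SMPL
Pseudo-3D labels	Ubody^[50]	>1.050×10⁶	-	-	≥1	✓	SMPL-X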

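Property	Motion capture system	Depth camera	LiDAR	Millimeter-wave radar	Tactile sensor	IMU	RGB camera
Sensor type	Active	Active	Active	Active	Passive	Passive	Passive
Detection range	Short	Fairly short	Fairly long	Long	Very short	Very short	Moderate
Outdoor operation	Weak	Weak	Strong	Strong	Strong	Strong	Strong
Night-time operation	Strong	Strong	Strong	Strong	Strong	Strong	Weak
Operation in smoke	Weak	Weak	Weak	Strong	Strong	Strong	Moderate
Operation in rain and snow	Weak	Weak	Moderate	Strong	Strong	Strong	Moderate
Temperature stability	Moderate	Strong	Strong	Strong	Weak	Weak	Strong
Velocity measurement	Weak	Weak	Weak	Strong	Weak	Strong	Weak
Measurement accuracy	Very high	High	High	Low	Low	Low	High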