Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (1): 99-111. doi: 10.16182/j.issn1004731x.joss.25-0742


多目视觉下的逆运动学三维人体建模仿真

方国宇1, 李琰泽2, 陈凯1, 赵晓冬1, 胡子卓1, 杨明实1, 武婉晴1, 王子晨1, 郭文凯1   

  1. 南京航空航天大学 机电学院,江苏 南京 210016
  2. 北京理工大学,北京 100081
  • 收稿日期:2025-08-03 修回日期:2025-10-03 出版日期:2026-01-18 发布日期:2026-01-28
  • 通讯作者: 陈凯
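  • First author: Fang Guoyu (2000-), male, master's student; research interests: computer vision and traffic simulation.
  • Funding:
    Fundamental Research Funds for the Central Universities (NS2024030); National Natural Science Foundation of China (52202417); Basic Research Program of Jiangsu Province (BK20252021); China Postdoctoral Science Foundation (2022TQ0155); China Postdoctoral Science Foundation (2022M721605); Young Elite Scientists Sponsorship Program by CAST (2023QNRC001)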

Inverse Kinematics 3D Human Modeling Simulation based on Multi-view Vision

Fang Guoyu1, Li Yanze2, Chen Kai1, Zhao Xiaodong1, Hu Zizhuo1, Yang Mingshi1, Wu Wanqing1, Wang Zichen1, Guo Wenkai1   

  1. College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-08-03 Revised: 2025-10-03 Online: 2026-01-18 Published: 2026-01-28
  • Contact: Chen Kai

摘要:

自动驾驶仿真和工业虚拟现实仿真技术中对三维人体建模的准确性和鲁棒性具有较高的需求,现阶段基于关节点进行人体建模存在连续建模抖动、局部扭曲、遮挡适应性差等影响人体模型质量的问题,制约了智能驾驶和数字工厂等实际应用的发展。针对上述问题,提出一种多目视觉下基于向量量化变分自编码器的逆运动学三维人体建模方法,通过梯度下降自动变分方法的联合训练与IK-VQ-VAE(inverse kinematics vector quantised-variational auto encoder)方法相结合,得到了多视角时序融合、遮挡适应且更具鲁棒性的方法,满足更加符合真实人体姿态的需求。在公开数据集Shelf上进行实验,结果显示所提方法的正确部件百分比(PCP)相比近年的优化工作最高提升23.7%,平均提升了8.7%,同时,定性实验分析结果也表明了所提方法对人体3D建模效果优于其他方法。

关键词: 多目视觉, 人体网格恢复, 向量量化变分自编码器, 三维人体建模, 人体姿态

Abstract:

Autonomous driving simulation and industrial virtual reality simulation place high demands on the accuracy and robustness of 3D human body modeling. Current joint-based human modeling approaches, however, suffer from jitter across consecutive frames, local distortion, and poor adaptability to occlusion. These problems degrade model quality and limit practical applications such as intelligent driving and digital factories. To address them, this paper proposes a multi-view inverse kinematics 3D human modeling method based on a vector quantized variational autoencoder (IK-VQ-VAE). By combining joint training through an automatic variational gradient descent approach with IK-VQ-VAE, the proposed method achieves multi-view temporal fusion and better occlusion adaptability, yielding more robust reconstructions that conform more closely to real human poses. Experiments on the public Shelf dataset show that the proposed method improves the percentage of correct parts (PCP) by up to 23.7%, and by 8.7% on average, compared with recent optimization work. Qualitative results further indicate that the proposed method produces better 3D human models than the other methods.
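The PCP metric reported above can be illustrated with a minimal sketch. It assumes the criterion commonly used on the Shelf dataset: a limb counts as correct when the mean error of its two endpoints is at most a fraction `alpha` (typically 0.5) of the ground-truth limb length. The function name `pcp`, the limb representation, and the toy coordinates are illustrative assumptions, not the authors' evaluation code.

```python
import math

def pcp(pred_limbs, gt_limbs, alpha=0.5):
    """Fraction of limbs counted as correct under the assumed PCP criterion.

    Each limb is a pair (start, end) of 3D points. A limb is correct when the
    mean of its two endpoint errors is <= alpha * ground-truth limb length.
    """
    if len(pred_limbs) != len(gt_limbs):
        raise ValueError("prediction and ground truth must have the same number of limbs")
    if not gt_limbs:
        return 0.0
    correct = 0
    for (p_start, p_end), (g_start, g_end) in zip(pred_limbs, gt_limbs):
        limb_length = math.dist(g_start, g_end)
        mean_error = (math.dist(p_start, g_start) + math.dist(p_end, g_end)) / 2
        if mean_error <= alpha * limb_length:
            correct += 1
    return correct / len(gt_limbs)

# Toy data (hypothetical): two limbs of length 1.
gt = [((0, 0, 0), (0, 0, 1)), ((0, 0, 1), (0, 1, 1))]
pred = [((0, 0, 0.1), (0, 0, 1.1)),  # endpoints off by about 0.1: correct
        ((0, 0, 2), (0, 1, 2))]      # endpoints off by 1.0: incorrect
score = pcp(pred, gt)
print(score)
```

In this toy case one of the two limbs passes the threshold, so the score is 0.5. Published PCP figures are usually reported per body part and averaged over actors, which this sketch does not attempt.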

Key words: multi-view vision, human mesh recovery, vector quantized variational autoencoder (VQ-VAE), 3D human modeling, human pose

CLC Number: