Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (1): 174-188. doi: 10.16182/j.issn1004731x.joss.25-0863

• Papers •

DEHPR:基于扩散模型的端到端手部姿态重建网络

廖国琼1,2, 黄龙杰3, 李清新3, 张家俊1, 陈柯帆1   

  1. 江西财经大学,虚拟现实(VR)现代产业学院,江西 南昌 330032
    2.江西旅游商贸职业学院,江西 南昌 330100
    3.江西财经大学 计算机与人工智能学院,江西 南昌 330013
  • 收稿日期:2025-09-07 修回日期:2025-11-19 出版日期:2026-01-18 发布日期:2026-01-28
  • 通讯作者: 黄龙杰
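  • First author: Liao Guoqiong (born 1969), male, professor, Ph.D.; research interest: human-computer interaction.
  • Funding:
    National Natural Science Foundation of China (62272207); Natural Science Foundation of Jiangxi Province (20224ACB202009)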

DEHPR: A Diffusion-based End-to-end Hand Pose Reconstruction Network

Liao Guoqiong1,2, Huang Longjie3, Li Qingxin3, Zhang Jiajun1, Chen Kefan1   

  1. Modern Industry School of Virtual Reality (VR), Jiangxi University of Finance and Economics, Nanchang 330032, China
    2.Jiangxi Tourism and Commerce Vocational College, Nanchang 330100, China
    3.School of Computing and Artificial Intelligence, Jiangxi University of Finance and Economics, Nanchang 330013, China
  • Received:2025-09-07 Revised:2025-11-19 Online:2026-01-18 Published:2026-01-28
  • Contact: Huang Longjie

摘要:

针对传统方法如卷积神经网络(CNN)和Transformer在处理复杂场景的手部姿态重建任务时存在对大规模标注数据依赖性强、泛化能力不足等问题,提出了基于扩散模型的端到端手部姿态重建网络(diffusion-based end-to-end hand pose reconstruction network,DEHPR)。DEHPR通过引入扩散模型直接生成3D姿态假设并进行细化的策略,降低2D-to-3D建模方式导致的空间不确定性,引入端到端模型对多个3D姿态假设进行重投影选取最优关节点,最终生成预测的手部姿态。分别在HO3D V2数据集、DexYCB数据集以及FreiHand数据集上对所提出网络进行性能评估实验,结果表明,DEHPR性能效果优于现有方法,有效降低了对大规模标注数据的依赖性和单RGB图像2D-to-3D间接模型的不确定性,提升了手部姿态重建的准确性和鲁棒性。

关键词: 扩散模型, 端到端, 手部姿态, 手部遮挡, 姿态重建

Abstract:

Conventional methods such as convolutional neural networks (CNNs) and Transformers depend heavily on large-scale annotated data and generalize poorly when reconstructing hand poses in complex scenes. To address these problems, a diffusion-based end-to-end hand pose reconstruction network (DEHPR) is proposed. DEHPR uses a diffusion model to directly generate 3D pose hypotheses and refine them, which reduces the spatial uncertainty introduced by 2D-to-3D modeling. An end-to-end model then reprojects the multiple 3D pose hypotheses, selects the optimal joints, and generates the final predicted hand pose. Performance evaluation experiments on the HO3D V2, DexYCB, and FreiHand datasets show that DEHPR outperforms existing methods. It reduces both the dependence on large-scale annotated data and the uncertainty of indirect 2D-to-3D modeling from a single RGB image, improving the accuracy and robustness of hand pose reconstruction.

Key words: diffusion model, end-to-end, hand posture, hand occlusion, pose reconstruction
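
The abstract mentions reprojecting multiple 3D pose hypotheses and selecting the optimal joints. The following is a minimal sketch of what such a joint-level selection could look like, not the authors' implementation. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), camera-space 3D joints with positive depth, and a set of reference 2D keypoints; the function names `project` and `select_joints` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(point: Point3D, fx: float, fy: float, cx: float, cy: float) -> Point2D:
    """Pinhole projection of a camera-space 3D point (z > 0) to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def select_joints(
    hypotheses: Sequence[Sequence[Point3D]],  # H hypotheses, each with J joints
    keypoints_2d: Sequence[Point2D],          # J reference 2D keypoints (assumed given)
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """For each joint, keep the hypothesis whose reprojection is closest to the 2D reference."""
    selected: List[Point3D] = []
    for j, (u_ref, v_ref) in enumerate(keypoints_2d):

        def reprojection_error(h: int) -> float:
            # Squared pixel distance between the reprojected joint and its reference.
            u, v = project(hypotheses[h][j], fx, fy, cx, cy)
            return (u - u_ref) ** 2 + (v - v_ref) ** 2

        best = min(range(len(hypotheses)), key=reprojection_error)
        selected.append(hypotheses[best][j])
    return selected


# Toy example (made-up numbers): two hypotheses, two joints.
hyp_a = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
hyp_b = [(0.02, 0.0, 1.0), (0.08, 0.0, 1.0)]
keypoints = [(100.0, 100.0), (140.0, 100.0)]

print(select_joints([hyp_a, hyp_b], keypoints, fx=500.0, fy=500.0, cx=100.0, cy=100.0))
# → [(0.0, 0.0, 1.0), (0.08, 0.0, 1.0)]
```

In this toy case the first joint comes from `hyp_a` and the second from `hyp_b`, which illustrates why selecting per joint can beat picking a single whole-hand hypothesis. How DEHPR obtains its 2D reference and scores the hypotheses is not stated in the abstract.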

CLC number: