Journal of System Simulation, 2024, Vol. 36, Issue (3): 649-658. doi: 10.16182/j.issn1004731x.joss.23-0223


Gesture Recognition for Dynamic Vision Sensor Based on Multi-dimensional Projection Spatiotemporal Event Frame

Kang Lai (1, 2), Zhang Yakun (3)

  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  2. Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
  3. PLA 61081 Troops, Beijing 100089, China
  • Received: 2023-02-28  Revised: 2023-04-23  Publication date: 2024-03-15  Release date: 2024-03-14
  • First author: Kang Lai (born 1983), male, associate professor, Ph.D. Research interests: computer vision and pattern recognition, virtual reality. E-mail: kanglai@nudt.edu.cn
  • Funding: National Natural Science Foundation of China (61873274)

Abstract:

Vision-based gesture recognition is a common means of human-computer interaction in fields such as virtual reality and game simulation. In practice, fast hand movements blur the images captured by conventional RGB or depth cameras, which makes gesture recognition considerably harder. To address this problem, a dynamic vision sensor is used to capture high-speed gesture motion, and a gesture recognition method for dynamic vision data based on a multi-dimensional projection spatiotemporal event frame (STEF) is proposed. Spatiotemporal information is embedded into the data projection planes, which are then fused to form the multi-dimensional projection STEF. This overcomes the loss of temporal information in existing event-frame representations of dynamic vision data and improves the feature representation of dynamic vision sensor data. On this basis, an advanced spiking neural network classifies the STEFs to recognize gestures. The method reaches a recognition accuracy of 96.67% on a public dataset, outperforming comparable methods, which indicates that it can significantly improve the accuracy of gesture recognition on dynamic vision sensor data.

Key words: dynamic vision sensor, gesture recognition, multi-dimensional projection, spatiotemporal event frame, spiking neural network
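The abstract does not specify how the projection planes are built or fused. As a purely illustrative sketch, the Python below projects a dynamic vision sensor event stream onto three planes (x-y, x-t and y-t) and embeds coarse timing in the x-y plane by storing, for each pixel, the latest time bin in which it fired. The `Event` record, the `project_events` function, the bin-based time encoding and the choice of these three planes are all hypothetical assumptions for illustration, not the authors' STEF construction; plane fusion and the spiking-network classifier are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Grid = List[List[float]]


@dataclass(frozen=True)
class Event:
    """Hypothetical DVS event record: pixel (x, y), timestamp t, polarity p (+1/-1)."""
    x: int
    y: int
    t: float
    p: int


def _zeros(rows: int, cols: int) -> Grid:
    return [[0.0] * cols for _ in range(rows)]


def project_events(events: Sequence[Event], width: int, height: int,
                   time_bins: int) -> Dict[str, Grid]:
    """Illustrative multi-plane projection (an assumption, not the paper's STEF).

    Returns three 2-D planes:
      "xy": height x width; each pixel stores (b + 1) / time_bins for the latest
            time bin b in which it fired (0.0 = no event), so the spatial plane
            keeps coarse temporal ordering.
      "xt": time_bins x width; event counts per (time bin, column).
      "yt": time_bins x height; event counts per (time bin, row).
    Polarity is ignored in this sketch.
    """
    if time_bins < 1:
        raise ValueError("time_bins must be >= 1")
    planes = {
        "xy": _zeros(height, width),
        "xt": _zeros(time_bins, width),
        "yt": _zeros(time_bins, height),
    }
    if not events:
        return planes

    t_min = min(e.t for e in events)
    # Guard against a zero time span (e.g. a single event).
    span = (max(e.t for e in events) - t_min) or 1.0

    for e in events:
        if not (0 <= e.x < width and 0 <= e.y < height):
            raise ValueError(f"event outside sensor: {e}")
        # Map the timestamp to a bin index in [0, time_bins - 1].
        b = min(int((e.t - t_min) / span * time_bins), time_bins - 1)
        xy = planes["xy"]
        xy[e.y][e.x] = max(xy[e.y][e.x], (b + 1) / time_bins)
        planes["xt"][b][e.x] += 1
        planes["yt"][b][e.y] += 1
    return planes


if __name__ == "__main__":
    events = [Event(0, 0, 0.0, 1), Event(1, 0, 5.0, -1), Event(1, 1, 10.0, 1)]
    planes = project_events(events, width=2, height=2, time_bins=2)
    print(planes["xy"])  # [[0.5, 1.0], [0.0, 1.0]]
    print(planes["xt"])  # [[1.0, 0.0], [0.0, 2.0]]
    print(planes["yt"])  # [[1.0, 0.0], [1.0, 1.0]]
```

In this sketch, storing the latest time-bin index in the x-y plane, rather than a plain event count, is what preserves some temporal ordering there; a count-only frame would discard it, which is the kind of temporal information loss the abstract attributes to existing event-frame representations.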
