Gesture Recognition for Dynamic Vision Sensor Based on Multi-dimensional Projection Spatiotemporal Event Frame

Journal of System Simulation, 2024, Vol. 36, Issue (3): 649-658. doi: 10.16182/j.issn1004731x.joss.23-0223

Kang Lai¹,², Zhang Yakun³

1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
2. Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
3. PLA 61081 Troops, Beijing 100089, China

Received: 2023-02-28 Revised: 2023-04-23 Online: 2024-03-15 Published: 2024-03-14

First author: Kang Lai (born 1983), male, associate professor, Ph.D.; research interests: computer vision and pattern recognition, virtual reality technology. E-mail: kanglai@nudt.edu.cn

Funding: National Natural Science Foundation of China (61873274)

Abstract:

Vision-based gesture recognition is a common means of human-computer interaction in fields such as virtual reality and game simulation. In practical applications, fast-changing gestures blur the images captured by conventional RGB or depth cameras, which makes gesture recognition considerably harder. To address this problem, a dynamic vision sensor is used to capture high-speed gesture motion, and a gesture recognition method for dynamic vision data based on a multi-dimensional projection spatiotemporal event frame (STEF) is proposed. Spatiotemporal information is embedded into the data projection planes and fused to form the multi-dimensional projection STEF, which overcomes the loss of time-domain information in existing event-frame representations and improves the feature representation of dynamic vision sensor data. On this basis, an advanced spiking neural network classifies the STEFs to recognize gestures. The method reaches a recognition accuracy of 96.67% on a public dataset, outperforming comparable methods, which indicates that it can significantly improve gesture recognition accuracy on dynamic vision sensor data.

Key words: dynamic vision sensor, gesture recognition, multi-dimensional projection, spatiotemporal event frame, spiking neural network
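
The abstract does not spell out how the projections are built, so the following is only a minimal sketch of the general idea of projecting an event stream onto several planes so that timing survives in the frames. It assumes events arrive as arrays of pixel coordinates `x`, `y` and timestamps `t`, ignores polarity, and simply counts events on the x-y, x-t, and y-t planes. The function name `project_events`, the 128 x 128 resolution, and the number of time bins are illustrative assumptions, not the authors' STEF construction.

```python
import numpy as np


def project_events(x, y, t, width=128, height=128, time_bins=128):
    """Count events on the x-y, x-t and y-t planes (illustrative sketch only)."""
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)

    xy = np.zeros((height, width), dtype=np.int64)      # classic event frame
    xt = np.zeros((time_bins, width), dtype=np.int64)   # time along rows
    yt = np.zeros((height, time_bins), dtype=np.int64)  # time along columns
    if t.size == 0:
        return xy, xt, yt

    # Map timestamps to integer bin indices in [0, time_bins - 1].
    span = t.max() - t.min()
    if span > 0:
        tb = ((t - t.min()) / span * (time_bins - 1)).astype(np.int64)
    else:
        tb = np.zeros(t.shape, dtype=np.int64)

    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(xy, (y, x), 1)
    np.add.at(xt, (tb, x), 1)
    np.add.at(yt, (y, tb), 1)
    return xy, xt, yt
```

With `width`, `height`, and `time_bins` all equal, the three planes have the same shape and could be stacked with `np.stack` into a three-channel input for a classifier. The x-y plane alone corresponds to a plain event frame, which discards event timing; the x-t and y-t planes are one simple way to retain it.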

CLC number: TP391

Kang Lai, Zhang Yakun. Gesture Recognition for Dynamic Vision Sensor Based on Multi-dimensional Projection Spatiotemporal Event Frame[J]. Journal of System Simulation, 2024, 36(3): 649-658.

Figures and tables (9): Figures 1-4, Table 1, Figures 5-8


Table 1
Label	Action type
1	hand clap
2	right hand wave
3	left hand wave
4	right arm clockwise
5	right arm counter clockwise
6	left arm clockwise
7	left arm counter clockwise
8	arm roll
9	air drums
10	air guitar
11	other gestures

