Journal of System Simulation, 2021, Vol. 33, Issue (5): 1019-1030. doi: 10.16182/j.issn1004731x.joss.19-0448

• Simulation Modeling Theory and Methods •

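Multi-view Human Action Recognition Based on Deep Neural Network

Zhao Ying1,2, Lu Yao1, Zhang Jian3, Liang Qidi3, Long Wei1

  1. Beijing Laboratory of Intelligent Information Technology, Beijing Institute of Technology, Beijing 100081, China;
    2. Teachers College, Beijing Union University, Beijing 100011, China;
    3. School of Computer Science and Engineering, Central South University, Changsha 410083, China
  • Received: 2019-08-26 Revised: 2019-10-08 Online: 2021-05-18 Published: 2021-06-09
  • Corresponding author: Lu Yao (1958-), male, Ph.D., professor; research interests: neural networks, image and signal processing, and pattern recognition. E-mail: vis_yl@bit.edu.cn
  • About the author: Zhao Ying (1977-), female, Ph.D., associate professor; research interests: human behavior analysis, machine learning, and smart education. E-mail: sftzhaoying@buu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61273273); National Key Research and Development Program of China (2017YFC0112001)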


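Abstract: To improve the accuracy of multi-view human action recognition (MVHAR), this paper proposes a new deep neural network model, CNN+CA (Convolutional Neural Network plus Context Attention), together with a recognition method based on sequence matching. A convolutional neural network automatically learns multi-view fusion features; a context attention module is introduced to automatically highlight the regions of those features that are useful for recognition, further improving their discriminative power; human actions are then recognized with the sequence-matching method. Experimental results on the IXMAS and i3DPost datasets show that the proposed method achieves higher recognition accuracy than comparable state-of-the-art MVHAR methods.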

Key words: multi-view, human action recognition, convolutional neural network, context attention, sequence matching
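The abstract names the pipeline stages (CNN features, a context attention module, sequence-matching recognition) but not their internals. The toy sketch below illustrates the general idea under explicit assumptions: context attention is modeled as dot-product scoring of per-frame region features against a context vector followed by softmax-weighted pooling, and sequence matching is modeled as dynamic time warping (DTW) with nearest-template classification. Neither is confirmed to be the paper's CNN+CA design or its matching algorithm; the function names (`context_attention`, `dtw_distance`, `classify`), the context vector, and the 2-D toy features (standing in for CNN outputs) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_attention(regions: Sequence[Vector], context: Vector) -> Vector:
    """Hypothetical attention pooling for one frame.

    Each region feature is scored by its dot product with a context vector;
    the softmax of those scores weights the regions, so regions that agree
    with the context dominate the pooled frame descriptor.
    """
    scores = [sum(r * c for r, c in zip(region, context)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    return [sum(w * region[d] for w, region in zip(weights, regions)) for d in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two frame descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
    """Classic dynamic time warping cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: step in seq_a, step in seq_b, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query: Sequence[Vector], templates: Dict[str, List[Vector]]) -> str:
    """Return the label of the template sequence with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


if __name__ == "__main__":
    context = [1.0, 0.0]  # hypothetical context vector
    # Each frame holds region features; toy 2-D values stand in for CNN output.
    frames = [
        [[0.0, 1.0], [2.0, 0.0]],
        [[0.5, 1.0], [3.0, 0.0]],
    ]
    query = [context_attention(frame, context) for frame in frames]
    templates = {
        "wave": [[2.0, 0.0], [2.5, 0.0], [3.0, 0.0]],
        "sit": [[0.0, 1.0], [0.0, 1.0]],
    }
    print(classify(query, templates))  # prints "wave"
```

DTW is used here only because it tolerates sequences of different lengths and speeds, which is the usual motivation for matching action sequences rather than single frames; the paper's own matching scheme may differ.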

