Journal of System Simulation (系统仿真学报) ›› 2024, Vol. 36 ›› Issue (1): 67-82. doi: 10.16182/j.issn1004731x.joss.22-0937

• Papers

基于余弦相似性的定向注意力行为识别模型

李晨1, 何明1, 董晨2, 李伟1

  1. 陆军工程大学 指挥控制工程学院,江苏 南京 210007
  2. 陆军政治工作部 军事人力资源保障中心,北京 100072
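  • Received: 2022-08-09 Revised: 2022-10-18 Online: 2024-01-20 Published: 2024-01-19
  • Corresponding author: He Ming (何明). E-mail: 651220007@qq.com; paper_review@126.com
  • First author: Li Chen (李晨), born 1994, male, master's student; research interest: computer vision. E-mail: 651220007@qq.com
  • Funding:
    Key Research and Development Program of Jiangsu Province (BE20200729); Military Internal Research Project (LJ20212Z010032); Key Military Project (JYKYA2021029)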

Action Recognition Model of Directed Attention Based on Cosine Similarity

Li Chen1, He Ming1, Dong Chen2, Li Wei1

  1. Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
  2. Military Human Resource Support Center, Political Work Department of the Army, Beijing 100072, China
  • Received: 2022-08-09 Revised: 2022-10-18 Online: 2024-01-20 Published: 2024-01-19
  • Contact: He Ming E-mail: 651220007@qq.com; paper_review@126.com

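Abstract (translated from the Chinese):

To address the lack of directionality in conventional dot-product attention, a directed attention model (DAM) based on cosine similarity is established. To represent the directional relationships among the spatiotemporal features of video frames effectively, the relation function of the attention mechanism is defined using cosine similarity, which removes the magnitude (absolute value) of the relationship between features. To reduce the computational cost of attention, the operation is decomposed along the temporal and spatial dimensions, and linear attention is incorporated to further reduce computational complexity. The experiments comprise two stages. First, four ablation experiments on the individual modules of directed attention show where DAM performs best in accuracy and efficiency. Second, on the Sth-Sth V1 (Something-Something V1) dataset the model's accuracy is 7.3% higher than that of I3D-NL (inflated 3D ConvNet non-local), and on the UCF101 (101 human action classes from videos in the wild) dataset its recognition accuracy is 95.7%. The results have broad application prospects in security surveillance, autonomous driving, and related areas.

Keywords: action recognition, deep learning, attention mechanism, cosine similarity, spatiotemporal decomposition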

Abstract:

To address the lack of directionality in traditional dot-product attention, this paper proposes a directed attention model (DAM) based on cosine similarity. To represent the directional relationship between the spatial and temporal features of video frames effectively, the relation function of the attention mechanism is defined using cosine similarity, which removes the magnitude (absolute value) of the relationship between features. To reduce the computational burden of the attention mechanism, the operation is decomposed along the temporal and spatial dimensions, and the computational complexity is further reduced by combining it with linear attention. The experiments are divided into two stages. First, four ablation experiments are carried out on the modules of directed attention to show where DAM performs best in accuracy and efficiency. Second, the accuracy of the model is 7.3% higher than that of I3D-NL (inflated 3D ConvNet non-local) on the Sth-Sth V1 (Something-Something V1) dataset, and its recognition accuracy is 95.7% on the UCF101 (101 human action classes from videos in the wild) dataset. The results have broad application prospects in security surveillance, autonomous driving, and related areas.

Key words: action recognition, deep learning, attention mechanism, cosine similarity, spatiotemporal decomposition
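The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes, not the authors' DAM implementation. It assumes the relation function is the plain cosine similarity between query and key vectors, that no softmax is applied, and that outputs are averaged over the N keys; the function names and the 1/N scaling are hypothetical. Because the cosine similarity is a dot product of unit-length vectors, the product (Q̂K̂ᵀ)V can be regrouped as Q̂(K̂ᵀV), which is the associativity trick that linear attention relies on to avoid forming the N × N similarity matrix.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit length so that only its direction remains."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def transpose(mat):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*mat)]


def matmul(a, b):
    """Plain-Python product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cosine_attention_quadratic(q, k, v):
    """(Q_hat K_hat^T) V / N: builds the full N x N cosine matrix."""
    qn = [l2_normalize(row) for row in q]
    kn = [l2_normalize(row) for row in k]
    sim = matmul(qn, transpose(kn))  # entries are cosines in [-1, 1]
    n = len(k)  # assumed normalisation: average over the keys
    return [[x / n for x in row] for row in matmul(sim, v)]


def cosine_attention_linear(q, k, v):
    """Q_hat (K_hat^T V) / N: same result, no N x N matrix is formed."""
    qn = [l2_normalize(row) for row in q]
    kn = [l2_normalize(row) for row in k]
    kv = matmul(transpose(kn), v)  # d x d_v summary of keys and values
    n = len(k)
    return [[x / n for x in row] for row in matmul(qn, kv)]


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 2.0]]
    k = [[3.0, 0.0], [0.0, 5.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(cosine_attention_quadratic(q, k, v))
    print(cosine_attention_linear(q, k, v))
```

In this sketch, rescaling any query or key leaves the output unchanged up to floating-point error, which illustrates what removing the magnitude of the feature relationship means. The temporal and spatial decomposition mentioned in the abstract is not shown here.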

CLC Number: