Journal of System Simulation, 2019, Vol. 31, Issue 11: 2382-2387. doi: 10.16182/j.issn1004731x.joss.19-FZ0369

• Simulation Systems and Technology •

A Temporal Action Detection Algorithm Based on Spatio-Temporal Feature Pyramid Network

Liu Wang, Sun Jinyu, Ma Shiwei*

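  1. School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
  • Received: 2019-05-21 Revised: 2019-07-23 Online: 2019-11-10 Published: 2019-12-13
  • About the authors: Liu Wang (born 1995), male, from Fuzhou, Fujian, master's student; research interest: video retrieval. Ma Shiwei (corresponding author, born 1965), male, from Jiayuguan, Gansu, Ph.D., professor; research interests include signal processing, image processing, and pattern recognition.
  • Funding:
    Sub-project of a Major Project of the Xinjiang Production and Construction Corps (2018AA008-04)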

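Abstract: To address the discontinuity of temporal action proposals produced by frame-level prediction networks, a temporal action detection algorithm based on a spatio-temporal feature pyramid network (ST-FPN) is proposed. For frame-level action prediction, several 3D convolution-de-convolution (CDC) networks reduce the spatial feature dimension to 1 and restore the temporal feature dimension to the corresponding proposal length, yielding multiple predictions at different temporal scales. The predictions of these sub-networks are fused by non-maximum suppression (NMS), a softmax classifier then performs frame-level action classification, and temporal proposals are derived from the frame-level labels. Experimental results on the public THUMOS14 dataset show that the algorithm effectively improves the temporal localization accuracy of actions.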
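The abstract does not list layer settings, so the following is only a hedged shape-arithmetic sketch of how a CDC-style sub-network can shrink space while growing time. It assumes, for illustration, that a backbone reduces a clip of L frames to an (L/8, 4, 4) time-height-width feature map, and that each CDC layer pairs a temporal transposed convolution (kernel 4, stride 2, padding 1), which doubles the temporal length, with a spatial convolution whose kernel covers the full height and width, which collapses them to 1. Stacking 1, 2, or 3 such layers then gives temporal lengths L/4, L/2, and L, which is one possible reading of "predictions at different temporal scales". All helper names are hypothetical.

```python
# Shape arithmetic for a hypothetical CDC-style sub-network.
# All kernel sizes, strides, and the backbone output shape are assumptions.


def conv_out(n, kernel, stride=1, pad=0):
    """Output length of an ordinary convolution along one axis."""
    return (n + 2 * pad - kernel) // stride + 1


def deconv_out(n, kernel, stride=1, pad=0):
    """Output length of a transposed convolution (no output padding, dilation 1)."""
    return (n - 1) * stride - 2 * pad + kernel


def cdc_layer(shape):
    """One assumed CDC layer on a (time, height, width) shape:
    temporal up-sampling by 2 and spatial down-sampling to 1x1."""
    t, h, w = shape
    t_out = deconv_out(t, kernel=4, stride=2, pad=1)  # (t - 1) * 2 - 2 + 4 = 2t
    h_out = conv_out(h, kernel=h)  # kernel spans the full height -> 1
    w_out = conv_out(w, kernel=w)  # kernel spans the full width -> 1
    return (t_out, h_out, w_out)


def sub_network_shapes(num_frames, num_cdc_layers):
    """Shapes after the assumed backbone and after each CDC layer."""
    shape = (num_frames // 8, 4, 4)  # assumed backbone output (L/8, 4, 4)
    shapes = [shape]
    for _ in range(num_cdc_layers):
        shape = cdc_layer(shape)
        shapes.append(shape)
    return shapes


if __name__ == "__main__":
    # Three sub-networks with 1, 2, and 3 CDC layers form a temporal pyramid.
    for k in (1, 2, 3):
        print(k, sub_network_shapes(32, k)[-1])
```

With L = 32, the three assumed sub-networks end at shapes (8, 1, 1), (16, 1, 1), and (32, 1, 1): the spatial grid is gone after the first layer, and only the temporal resolution differs between pyramid levels.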

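Key words: temporal action detection, feature fusion, spatio-temporal feature pyramid network, 3D convolution-de-convolution, non-maximum suppression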

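The abstract does not spell out how frame-level labels become proposals or how the sub-networks' outputs are fused, so the sketch below is one plausible reading under stated assumptions, not the authors' code. It assumes each sub-network's scores have already been brought to one score vector per frame, that class 0 is background, that a proposal is a maximal run of frames sharing the same softmax argmax label (scored by its mean probability), and that proposals from all sub-networks are fused by greedy, class-aware temporal NMS with a hypothetical IoU threshold of 0.5. Function names and the example scores are illustrative only.

```python
import math

# Post-processing sketch: frame-level labels -> proposals -> temporal NMS.
# Class 0 as background and the IoU threshold are assumptions.


def softmax(scores):
    """Numerically stable softmax over one frame's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frames_to_segments(frame_scores, background=0):
    """Turn maximal runs of frames with the same non-background argmax label
    into (start, end, label, score) tuples; end is exclusive and score is the
    mean softmax probability of the label over the run."""
    segments = []
    start, label, probs = 0, None, []
    for t, raw in enumerate(list(frame_scores) + [None]):  # None closes the last run
        cur, p = None, 0.0
        if raw is not None:
            prob = softmax(raw)
            best = max(range(len(prob)), key=prob.__getitem__)
            if best != background:
                cur, p = best, prob[best]
        if cur != label:
            if label is not None:
                segments.append((start, t, label, sum(probs) / len(probs)))
            start, label, probs = t, cur, []
        if cur is not None:
            probs.append(p)
    return segments


def temporal_iou(a, b):
    """1D IoU of two [start, end) intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments, iou_threshold=0.5):
    """Greedy class-aware NMS: keep the highest-scoring segment and drop
    later segments of the same label that overlap it too much."""
    kept = []
    for seg in sorted(segments, key=lambda s: s[3], reverse=True):
        if all(k[2] != seg[2] or temporal_iou(k, seg) < iou_threshold for k in kept):
            kept.append(seg)
    return kept


if __name__ == "__main__":
    # Hypothetical frame-level scores from two sub-networks (3 classes, 0 = background).
    net_a = [[2, 0, 0], [0, 3, 0], [0, 3, 0], [0, 3, 0], [2, 0, 0], [2, 0, 0]]
    net_b = [[2, 0, 0], [0, 2, 0], [0, 2, 0], [0, 2, 0], [0, 2, 0], [2, 0, 0]]
    candidates = frames_to_segments(net_a) + frames_to_segments(net_b)
    print(temporal_nms(candidates, iou_threshold=0.5))
```

Here the first sub-network proposes frames [1, 4) and the second frames [1, 5), both as class 1. Their temporal IoU is 3/4 = 0.75, so NMS keeps only the first, higher-scoring segment. Making the suppression class-aware means overlapping segments with different labels both survive.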
