Video Action Recognition Based on Key-frame

Journal of System Simulation, 2018, Vol. 30, Issue (7): 2787-2793. doi: 10.16182/j.issn1004731x.joss.201807044

Li Mingxiao, Geng Qichuan, Mo Hong, Wu Wei, Zhou Zhong

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China

Received: 2017-07-30 Online: 2018-07-10 Published: 2019-01-08
About the authors: Li Mingxiao (b. 1993), male, from Shandong, master's degree; research interests: deep learning and action recognition. Geng Qichuan (b. 1989), male, from Heilongjiang, Ph.D.; research interest: image semantic understanding.
Funding: National Natural Science Foundation of China (61572061, 61472020); National High Technology Research and Development Program of China ("863" Program) (2015AA016403)

Abstract

Video action recognition is an important part of intelligent video analysis. Deep learning methods have made significant progress in this field, and the methods with the best reported results all use two-stream convolutional neural networks. For long videos, however, most existing methods take as input frames obtained by uniform segmentation with fixed sampling, which may lose important information that falls within the sampling interval. By defining the information content of a video, we propose a segment partitioning method and a key-frame extraction method for video action recognition, use a multi-temporal-scale two-stream network to extract video features, and design a video action recognition system. The framework achieves 94.2% accuracy on UCF101 split1, matching the best result reported at the time.

Keywords: deep learning, action recognition, video segment partitioning, key-frame extraction
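
The contrast drawn in the abstract, between uniform fixed sampling and information-driven segment partitioning with key-frame extraction, can be illustrated with a minimal sketch. The abstract does not give the paper's definition of video information content, so the code substitutes a hypothetical proxy (mean absolute change from the previous frame). The function names `uniform_sample`, `frame_information`, `information_segments` and `key_frames` are likewise illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def uniform_sample(num_frames: int, num_segments: int) -> list[int]:
    """Baseline: equal-length segments, middle frame of each segment."""
    bounds = np.linspace(0, num_frames, num_segments + 1).astype(int)
    return [int((lo + hi - 1) // 2) for lo, hi in zip(bounds[:-1], bounds[1:])]


def frame_information(frames: np.ndarray) -> np.ndarray:
    """Hypothetical information proxy (NOT the paper's definition):
    mean absolute pixel change from the previous frame."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    per_frame = diffs.reshape(diffs.shape[0], -1).mean(axis=1)
    return np.concatenate([[0.0], per_frame])  # frame 0 has no predecessor


def information_segments(info: np.ndarray, num_segments: int) -> list[tuple[int, int]]:
    """Contiguous [start, end) segments holding roughly equal total information.

    Assumes len(info) >= num_segments so every segment gets at least one frame.
    """
    cumulative = np.cumsum(info)
    targets = cumulative[-1] * np.arange(1, num_segments) / num_segments
    cuts = np.searchsorted(cumulative, targets, side="left") + 1
    bounds = [0]
    for k, cut in enumerate(cuts.tolist(), start=1):
        lowest = bounds[-1] + 1                    # keep this segment non-empty
        highest = len(info) - (num_segments - k)   # leave room for later segments
        bounds.append(min(max(cut, lowest), highest))
    bounds.append(len(info))
    return list(zip(bounds[:-1], bounds[1:]))


def key_frames(info: np.ndarray, segments: list[tuple[int, int]]) -> list[int]:
    """Pick the highest-information frame inside each segment."""
    return [lo + int(np.argmax(info[lo:hi])) for lo, hi in segments]


# Synthetic 40-frame clip: static for 30 frames, then brightness changes.
values = np.concatenate([np.zeros(30), 10.0 * np.arange(1, 11)])
video = np.zeros((40, 4, 4)) + values[:, None, None]

info = frame_information(video)
segments = information_segments(info, num_segments=4)
print("uniform sampling:", uniform_sample(len(video), 4))
print("key frames:      ", key_frames(info, segments))
```

On this synthetic clip the uniform baseline spends most of its samples on the static part, while the information-driven segments concentrate on the frames where the content changes. A real pipeline would replace the proxy with the paper's information measure and feed the selected frames to the two-stream network.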

CLC number: TP391.4

Li Mingxiao, Geng Qichuan, Mo Hong, Wu Wei, Zhou Zhong. Video Action Recognition Based on Key-frame[J]. Journal of System Simulation, 2018, 30(7): 2787-2793.

