[1] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]// Computer Vision and Pattern Recognition (S1077-2626), CVPR 2005, IEEE Computer Society Conference on. USA: IEEE, 2005: 886-893.
[2] Simonyan K, Zisserman A. Two-Stream Convolutional Networks for Action Recognition in Videos[J]. Advances in Neural Information Processing Systems (S1049-5258), 2014, 1(4): 568-576.
[3] Donahue J, Hendricks L A, Rohrbach M, et al. Long-term Recurrent Convolutional Networks for Visual Recognition and Description[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence (S0162-8828), 2017, 39(4): 677-691.
[4] Zhuang Y, Rui Y, Huang T S, et al. Adaptive key frame extraction using unsupervised clustering[C]// International Conference on Image Processing, ICIP 98. Proceedings. IEEE, 2002, 1: 866-870.
[5] Ma C Y, Chen M H, Kira Z, et al. TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition[J]. arXiv preprint arXiv:1703.10667.
[6] Wang L, Xiong Y, Wang Z, et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition[J]. ACM Transactions on Information Systems (S1046-8188), 2016, 22(1): 20-36.
[7] Karpathy A, Toderici G, Shetty S, et al. Large-Scale Video Classification with Convolutional Neural Networks[C]// IEEE Conference on Computer Vision and Pattern Recognition (S1063-6919), IEEE Computer Society, 2014: 1725-1732.
[8] Tran D, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3d convolutional networks[C]// Computer Vision (ICCV), 2015 IEEE International Conference on (S1550-5499). IEEE, 2015: 4489-4497.
[9] Farnebäck G. Two-frame motion estimation based on polynomial expansion[C]// Scandinavian conference on Image analysis (S0302-9743). Springer, Berlin, Heidelberg, 2003: 363-370.
[10] Hu Y, Zheng W. Human Action Recognition Based on Key Frames[M]// Advances in Computer Science and Education Applications (S1865-0929). Germany: Springer Berlin Heidelberg, 2011: 535-542.
[11] Zhu W, Hu J, Sun G, et al. A Key Volume Mining Deep Framework for Action Recognition[C]// Computer Vision and Pattern Recognition. USA: IEEE, 2016: 1991-1999.
[12] Poppe R. A survey on vision-based human action recognition[J]. Image & Vision Computing (S0262-8856), 2010, 28(6): 976-990.
[13] Zitnick C L, Dollár P. Edge boxes: Locating object proposals from edges[C]// European Conference on Computer Vision. Springer, Cham, 2014: 391-405.
[14] Dollár P, Zitnick C L. Structured Forests for Fast Edge Detection[C]// IEEE International Conference on Computer Vision (ICCV) (S1550-5499), 2013: 1841-1848.
[15] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]// International conference on machine learning (S1938-7228). 2015: 448-456.
[16] Everingham M, Eslami S M A, Gool L V, et al. The Pascal Visual Object Classes Challenge: A Retrospective[J]. International Journal of Computer Vision (S0920-5691), 2015, 111(1): 98-136.
[17] Soomro K, Zamir A R, Shah M. UCF101: A dataset of 101 human actions classes from videos in the wild[J]. arXiv preprint arXiv:1212.0402, 2012.
[18] Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding[C]// Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014: 675-678.
[19] Abadi M, Agarwal A, Barham P, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems[J]. arXiv preprint arXiv:1603.04467, 2016.
[20] Wang L, Xiong Y, Wang Z, et al. Towards good practices for very deep two-stream convnets[J]. arXiv preprint arXiv:1507.02159.