[1] Herath S, Harandi M, Porikli F. Going Deeper into Action Recognition: A Survey[J]. Image and Vision Computing (S0262-8856), 2017, 60: 4-21.
[2] Wang H, Schmid C. Action Recognition with Improved Trajectories[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2013: 3551-3558.
[3] Karpathy A, Toderici G, Shetty S, et al. Large-scale Video Classification with Convolutional Neural Networks[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
[4] Simonyan K, Zisserman A. Two-stream Convolutional Networks for Action Recognition in Videos[C]. Advances in Neural Information Processing Systems. 2014: 568-576.
[5] Ji S, Xu W, Yang M, et al. 3D Convolutional Neural Networks for Human Action Recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (S0162-8828), 2012, 35(1): 221-231.
[6] Tran D, Bourdev L, Fergus R, et al. Learning Spatiotemporal Features with 3D Convolutional Networks[C]. Proceedings of the IEEE International Conference on Computer Vision. 2015: 4489-4497.
[7] Srivastava N, Mansimov E, Salakhudinov R. Unsupervised Learning of Video Representations using LSTMs[C]. International Conference on Machine Learning. 2015: 843-852.
[8] Carreira J, Zisserman A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6299-6308.
[9] Shou Z, Wang D, Chang S F. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1049-1058.
[10] Shou Z, Chan J, Zareian A, et al. CDC: Convolutional-de-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2017: 5734-5743.
[11] Wang L, Qiao Y, Tang X. Action Recognition and Detection by Combining Motion and Appearance Features[J]. THUMOS14 Action Recognition Challenge, 2014, 1(2): 2.
[12] Oneata D, Verbeek J, Schmid C. The Lear Submission at Thumos 2014[J]. 2013.
[13] Richard A, Gall J. Temporal Action Detection Using a Statistical Language Model[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3131-3140.
[14] Yeung S, Russakovsky O, Mori G, et al. End-to-end Learning of Action Detection from Frame Glimpses in Videos[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2678-2687.
[15] Zeiler M D, Fergus R. Visualizing and Understanding Convolutional Networks[C]. European Conference on Computer Vision. Springer, Cham. 2014: 818-833.
[16] Zeiler M D, Krishnan D, Taylor G W, et al. Deconvolutional Networks[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2010, 10: 7.
[17] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[C]. IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2117-2125.