Person Retrieval Method Based on Image Caption

doi:10.16182/j.issn1004731x.joss.201807045

Abstract

Abstract: Retrieving a specific person in surveillance scenes is an important and urgent need in the security field. In recent years, image retrieval methods have been mainly content-based: they require a query image as input and therefore cannot meet the practical needs of surveillance and security. We propose a person retrieval method based on image captioning and present a new surveillance dataset, SPCD, annotated with person captions. We verify the method on the new dataset, where the accuracy for gender, dress color, and action reaches 86.5%, 93.5%, and 65.5%, respectively. This paper provides an effective approach to person retrieval in surveillance scenes.

Key words: surveillance scene, image retrieval, image caption, multi-attribute labels

CLC Number: TP391.4

Li Yadong, Mo Hong, Wang Shihao, Zhou Zhong, Wu Wei. Person Retrieval Method Based on Image Caption[J]. Journal of System Simulation, 2018, 30(7): 2794-2800.
