Person Retrieval Method Based on Image Caption

Journal of System Simulation, 2018, Vol. 30, Issue 7: 2794-2800. doi: 10.16182/j.issn1004731x.joss.201807045

Li Yadong, Mo Hong, Wang Shihao, Zhou Zhong, Wu Wei

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China

Received: 2017-08-01 Online: 2018-07-10 Published: 2019-01-08
About the first author: Li Yadong (1992-), male, from Lüliang, Shanxi, master's degree; research interests: image captioning and computer vision. Mo Hong (1988-), female, from Xiangyang, Hubei, Ph.D.; research interest: machine vision.
Funding:
National Natural Science Foundation of China (61572061, 61472020); National High Technology Research and Development Program of China ("863" Program) (2015AA016403)

Abstract

Abstract: Retrieving a specific person in surveillance scenes is an important and urgent need in the security field. In recent years, image retrieval methods have been mainly content-based, but because such methods require a query image as input, they cannot meet the practical needs of surveillance and security. We propose a person retrieval method based on image captioning and present SPCD, a surveillance dataset annotated with person captions. Evaluated on this dataset, the method reaches an accuracy of 86.5% for gender prediction, 93.5% for clothing color matching, and 65.5% for action classification, offering a new and effective approach to person retrieval in surveillance scenes.

Key words: surveillance scene, image retrieval, image caption, multi-attribute labels
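To make the idea of retrieving by description rather than by query image concrete, the following is a minimal sketch and not the authors' implementation. It assumes each detected person has already been tagged with the three attributes the abstract reports accuracy for (gender, clothing color, action). The `PersonLabels` record, the `retrieve` function, the gallery entries, and the exact-match rule are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


# Hypothetical attribute record for one detected person. The three fields
# mirror the attributes named in the abstract; the field names and values
# are illustrative assumptions, not taken from the paper.
@dataclass(frozen=True)
class PersonLabels:
    image_id: str
    gender: str
    clothing_color: str
    action: str


def retrieve(gallery, **query):
    """Return image ids whose labels match every attribute given in the query.

    Attributes omitted from the query act as wildcards.
    """
    allowed = {"gender", "clothing_color", "action"}
    unknown = set(query) - allowed
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [
        p.image_id
        for p in gallery
        if all(getattr(p, name) == value for name, value in query.items())
    ]


# Made-up gallery standing in for labels predicted from surveillance frames.
gallery = [
    PersonLabels("frame_001", "male", "red", "walking"),
    PersonLabels("frame_002", "female", "red", "standing"),
    PersonLabels("frame_003", "male", "black", "walking"),
]

matches = retrieve(gallery, gender="male", action="walking")
print(matches)  # ['frame_001', 'frame_003']
```

Exact matching on discrete labels is the simplest possible choice here; a real system would likely need to tolerate prediction errors, for example by ranking candidates by the number of matching attributes.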

CLC number: TP391.4

Li Yadong, Mo Hong, Wang Shihao, et al. Person Retrieval Method Based on Image Caption[J]. Journal of System Simulation, 2018, 30(7): 2794-2800.

