[1] Shi Xiangbin, Fang Xuejian, Zhang Deyuan, et al. Image classification based on mixed deep learning model transfer learning[J]. Journal of System Simulation, 2016, 28(1): 167-173, 182. (in Chinese)

[2] Xu Feng, Lu Jiangang, Sun Youxian. Application of neural network in image processing[J]. Information and Control, 2003, 32(4): 344-351. (in Chinese)

[3] Farhadi A, Hejrati M, Sadeghi M A, et al. Every picture tells a story: Generating sentences from images[C]// European Conference on Computer Vision. Berlin, Heidelberg: Springer, 2010: 15-29.

[4] Li S, Kulkarni G, Berg T L, et al. Composing simple image descriptions using web-scale n-grams[C]// Proceedings of the Fifteenth Conference on Computational Natural Language Learning. Stroudsburg: Association for Computational Linguistics, 2011: 220-228.

[5] Kulkarni G, Premraj V, Ordonez V, et al. BabyTalk: Understanding and generating simple image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (S0162-8828), 2013, 35(12): 2891-2903.

[6] Zhang Hongbin, Ji Donghong, Yin Lan, et al. Product image sentence annotation based on gradient kernel feature and N-gram model[J]. Computer Science, 2016, 43(5): 269-273, 287. (in Chinese)

[7] Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// International Conference on Machine Learning. Lille Grand Palais: International Machine Learning Society, 2015: 2048-2057.

[8] Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2015: 3128-3137.

[9] Jia X, Gavves E, Fernando B, et al. Guiding the long-short term memory model for image caption generation[C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 2407-2415.

[10] Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2015: 3156-3164.

[11] Vinyals O, Toshev A, Bengio S, et al. Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (S0162-8828), 2017, 39(4): 652-663.

[12] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]// Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2017: 4278-4284.

[13] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv (S2331-8422), 2013: 1301.3781.

[14] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation (S0899-7667), 1997, 9(8): 1735-1780.

[15] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]// European Conference on Computer Vision. Cham: Springer, 2014: 740-755.

[16] Zaremba W, Sutskever I, Vinyals O. Recurrent neural network regularization[J]. arXiv preprint arXiv (S2331-8422), 2014: 1409.2329.

[17] Chen X L, Fang H, Lin T Y, et al. Microsoft COCO caption evaluation[EB/OL]. 2015. https://github.com/tylin/coco-caption.

[18] Mao J, Xu W, Yang Y, et al. Deep captioning with multimodal recurrent neural networks (m-RNN)[J]. arXiv preprint arXiv (S2331-8422), 2014: 1412.6632.

[19] Fang H, Gupta S, Iandola F, et al. From captions to visual concepts and back[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2015: 1473-1482.

[20] Devlin J, Cheng H, Fang H, et al. Language models for image captioning: The quirks and what works[J]. arXiv preprint arXiv (S2331-8422), 2015: 1505.01809.