Journal of System Simulation, 2020, Vol. 32, Issue 4: 601-611. doi: 10.16182/j.issn1004731x.joss.18-0310

• Simulation Modeling Theory and Methods •

Research on Image Description Method Based on Neural Network

Kong Rui¹, Xie Wei¹, Lei Tai²

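  1. School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China;
  2. College of Information Science and Technology, Jinan University, Guangzhou 510632, China
  • Received: 2018-05-24 Revised: 2018-09-26 Online: 2020-04-18 Published: 2020-04-16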
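  • About the authors: Kong Rui (1964-), male, from Hefei, Anhui; Ph.D., professor; research interests: face recognition and machine learning. Xie Wei (1994-), male, from Loudi, Hunan; master's student; research interests: computer vision and deep learning.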
  • Funding:
    Guangdong Province Science and Technology Plan (Industry-University-Research Cooperation) (2016B090918098)

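Abstract: Automatically recognizing and describing the content of an image is an important research direction in artificial intelligence, drawing on both computer vision and natural language processing. To address this problem, this paper proposes a method in which a deep neural network model generates natural-language sentences describing the image content. The proposed model consists of a convolutional neural network (CNN) and a recurrent neural network (RNN): the CNN extracts features from the input image and produces a fixed-length feature vector, which is used to initialize the RNN that generates the sentence. Experimental results on the MSCOCO image captioning dataset demonstrate the syntactic and semantic accuracy of the generated sentences and show that the model outperforms previous baseline models.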

Key words: image description, neural networks, language model, deep learning
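The encoder-decoder pipeline summarized in the abstract (a CNN maps the image to a fixed-length vector, and that vector initializes an RNN that emits the sentence) can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the authors' model: the single convolution layer with global average pooling, the Elman RNN cell, using the feature vector as the initial hidden state h_0, greedy decoding, the toy vocabulary, and all function names (`cnn_encoder`, `rnn_generate`, `build_model`) are hypothetical choices made here. The weights are random, so the emitted words are meaningless; only the data flow is shown.

```python
import math
import random

# Hypothetical toy vocabulary; a real captioning model uses thousands of words.
VOCAB = ["<start>", "<end>", "a", "dog", "cat", "on", "the", "grass", "sofa"]


def random_matrix(rows, cols, rng, scale=0.5):
    """rows x cols matrix of uniform random weights (untrained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def conv2d_single(image, kernel):
    """'Valid' 2-D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def cnn_encoder(image, kernels, proj):
    """Toy CNN: conv -> ReLU -> global average pooling per kernel, then a
    linear projection to a fixed-length feature vector of length len(proj)."""
    pooled = []
    for kernel in kernels:
        activations = [max(0.0, x) for row in conv2d_single(image, kernel) for x in row]
        pooled.append(sum(activations) / len(activations))
    return matvec(proj, pooled)


def rnn_generate(feature, params, max_len=10):
    """Greedy decoding with an Elman RNN. Assumption: the image feature
    initializes the hidden state (h_0 = tanh(feature))."""
    h = [math.tanh(x) for x in feature]
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        x = params["embed"][token]  # embedding of the previous word
        h = [math.tanh(a + b)
             for a, b in zip(matvec(params["W_xh"], x), matvec(params["W_hh"], h))]
        logits = matvec(params["W_hy"], h)
        logits[VOCAB.index("<start>")] = float("-inf")  # never emit <start>
        token = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


def build_model(hidden=8, embed=6, n_kernels=4, seed=0):
    """Randomly initialized (untrained) CNN and RNN parameters."""
    rng = random.Random(seed)
    kernels = [random_matrix(3, 3, rng) for _ in range(n_kernels)]
    proj = random_matrix(hidden, n_kernels, rng)  # feature length == RNN hidden size
    params = {
        "embed": random_matrix(len(VOCAB), embed, rng),
        "W_xh": random_matrix(hidden, embed, rng),
        "W_hh": random_matrix(hidden, hidden, rng),
        "W_hy": random_matrix(len(VOCAB), hidden, rng),
    }
    return kernels, proj, params


if __name__ == "__main__":
    rng = random.Random(42)
    image = [[rng.random() for _ in range(8)] for _ in range(8)]  # stand-in 8x8 image
    kernels, proj, params = build_model()
    feature = cnn_encoder(image, kernels, proj)
    print(len(feature), rnn_generate(feature, params))
```

Swapping the Elman cell for an LSTM, or greedy decoding for beam search, would only change `rnn_generate`; the abstract does not state which recurrent cell or decoding strategy the authors use, nor whether the image vector becomes h_0 or is fed as the first input.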
