Object Detection in RGB-D Image Based on ANNet

Journal of System Simulation, 2016, Vol. 28, Issue 9: 2260-2266.

Cai Qiang, Wei Liwei, Li Haisheng, Cao Jian

Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China

Received: 2016-05-10 Revised: 2016-07-11 Online: 2016-09-08 Published: 2020-08-14
About the authors: Cai Qiang (b. 1969), male, from Chongqing, Ph.D., professor and master's supervisor, whose research interests include computer graphics, computational geometry, scientific visualization, and intelligent information processing; Wei Liwei (b. 1987), female, from Hebei, master's student, whose research interests include image recognition and machine learning.
Funding:
Beijing Natural Science Foundation (4162019)

Abstract

Abstract: The widespread use of depth image acquisition devices has made object detection in RGB-D images a research hotspot in computer vision. To make the features extracted by a convolutional neural network (CNN) more robust and thereby improve detection accuracy, an improved CNN, referred to in this paper as ANNet, was designed. In the convolutional layers of AlexNet, the relationship between a convolution filter and the underlying data patch is linear. To enhance the discriminability of the model for local patches within the receptive field, some of these convolutional layers were replaced with nonlinear convolutional layers containing a multilayer perceptron. Experiments on the NYUD2 dataset show that the improved network structure raises detection results by 3% on color (RGB) images and by 4% on RGB-D images.

Key words: object detection, convolutional neural network, AlexNet, RGB-D images
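
The nonlinear convolutional layer mentioned in the abstract can be illustrated with a small sketch. It assumes the layer follows the common "Network in Network" construction, in which an ordinary k x k convolution is followed by 1x1 convolutions that act as a multilayer perceptron across channels at every pixel. The abstract does not state which AlexNet layers were replaced or what sizes were used, so the function names (`conv2d`, `mlpconv`), the tensor layout, and the toy weights below are hypothetical and are not the authors' implementation.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # layout: [channel][row][col]


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def conv2d(x: Tensor3, weights, bias: List[float]) -> Tensor3:
    """Valid, stride-1 convolution followed by ReLU.

    x: [C_in][H][W]; weights: [C_out][C_in][k][k]; bias: [C_out].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(weights[0][0])
    out = []
    for o, filt in enumerate(weights):
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                s = bias[o]
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            s += filt[c][di][dj] * x[c][i + di][j + dj]
                row.append(relu(s))
            plane.append(row)
        out.append(plane)
    return out


def mlpconv(x: Tensor3, spatial_w, spatial_b: List[float],
            mlp_layers: List[Tuple[List[List[float]], List[float]]]) -> Tensor3:
    """k x k convolution, then 1x1 convolutions (a per-pixel MLP over channels).

    Each entry of mlp_layers is (matrix [C_out][C_in], bias [C_out]).
    """
    y = conv2d(x, spatial_w, spatial_b)
    for matrix, b in mlp_layers:
        # A 1x1 convolution is the matrix reshaped to [C_out][C_in][1][1].
        w1x1 = [[[[wc]] for wc in row] for row in matrix]
        y = conv2d(y, w1x1, b)
    return y


# Toy input: one 3x3 channel (hypothetical values).
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
# Two 2x2 filters: one picks the top-left pixel, one the bottom-right pixel.
spatial_w = [[[[1.0, 0.0], [0.0, 0.0]]], [[[0.0, 0.0], [0.0, 1.0]]]]
spatial_b = [0.0, 0.0]
# One 1x1 layer that combines the two channels: channel 1 minus channel 0.
mlp_layers = [([[-1.0, 1.0]], [0.0])]

out = mlpconv(x, spatial_w, spatial_b, mlp_layers)
print(out)  # [[[4.0, 4.0], [4.0, 4.0]]]
```

In this sketch the 1x1 layers mix the channels produced by the spatial filters through additional nonlinearities, which is one way to make the response to a local patch a nonlinear function of that patch rather than a single linear filter response.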

CLC number:

TP391

Cai Qiang, Wei Liwei, Li Haisheng, Cao Jian. Object Detection in RGB-D Image Based on ANNet[J]. Journal of System Simulation, 2016, 28(9): 2260-2266.
