Journal of System Simulation ›› 2022, Vol. 34 ›› Issue (6): 1219-1229. DOI: 10.16182/j.issn1004731x.joss.21-0907

• Simulation Modeling Theory and Method •

Vehicle Detection Method Based on Multi-Scale Feature Fusion

Wang Yin, Wang Feixiang, Sun Qianlai

  1. College of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
  • Received: 2021-09-03  Revised: 2021-10-16  Online: 2022-06-30  Published: 2022-06-16
  • Corresponding author: Wang Feixiang  E-mail: xpw417@163.com; 2601741160@qq.com
  • About the author: Wang Yin (1982-), male, Ph.D., associate professor; research interests: computer vision and intelligent control. E-mail: xpw417@163.com
  • Funding:
    National Natural Science Foundation of China (61905172); Shanxi Provincial Guidance Project for the Transformation of Scientific and Technological Achievements (201904D131023); Key Research and Development Program of Shanxi Province (201903D121130); Shanxi Provincial Youth Science Foundation (201901D211304); Shanxi Graduate Education Innovation Project (2020SY422)

Vehicle Detection Method Based on Multi-Scale Feature Fusion

Yin Wang, Feixiang Wang, Qianlai Sun

  1. College of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Received: 2021-09-03  Revised: 2021-10-16  Online: 2022-06-30  Published: 2022-06-16
  • Contact: Feixiang Wang  E-mail: xpw417@163.com; 2601741160@qq.com

Abstract:

To address the low detection accuracy of traditional vehicle detection algorithms and their poor recognition of small-scale targets, a detection method based on YOLOv4 (you only look once v4) is proposed to improve the detection of small vehicle targets in traffic scenes. The YOLOv4 network is redesigned: MobileNetv2 depthwise separable convolution modules replace the standard convolutions, and the CBAM (convolutional block attention module) is integrated into the feature extraction network, reducing the model parameters while maintaining detection accuracy. A PANet-D feature fusion network fuses the deep and shallow semantic information of feature maps at four scales, strengthening the detection of small objects. Focal loss is used to optimize the classification loss function and accelerate the convergence of the network. Experimental results show that the improved network reaches a recognition accuracy of 96.55%, the model size is 92.49 MB smaller than that of the original YOLOv4 network, and the detection speed is 17% higher, which demonstrates the feasibility of the proposed algorithm.
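As a rough illustration of the parameter saving mentioned above, the sketch below shows a MobileNetv2-style depthwise separable convolution block in PyTorch. It is a minimal example with channel sizes chosen only for the comparison, not the authors' exact backbone.

import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise convolution followed by a 1x1 pointwise convolution,
    the basic building block MobileNet-style backbones use in place of a
    standard 3x3 convolution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# Weight count versus a standard 3x3 convolution (256 -> 256 channels):
std = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)
sep = DepthwiseSeparableConv(256, 256)
print(sum(p.numel() for p in std.parameters()))  # 589824
print(sum(p.numel() for p in sep.parameters()))  # 68864 (incl. BatchNorm), roughly 8.6x fewer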

Keywords: vehicle detection, YOLOv4, multi-scale fusion, depthwise separable convolution, attention mechanism

Abstract:

Vehicle detection is an important research topic and hotspot in intelligent transportation. To address the low detection accuracy and poor small-scale recognition of traditional vehicle detection algorithms, an improved detection method based on YOLOv4 (you only look once v4) is proposed to improve the detection of small target vehicles in traffic scenes. The YOLOv4 network is redesigned: MobileNetv2 depthwise separable convolution modules replace the standard convolutions, and the convolutional block attention module (CBAM) is integrated into the feature extraction network, which maintains detection accuracy while reducing the model parameters. The deep and shallow semantic information of feature maps at four scales is fused by the PANet-D feature fusion network to enhance the detection of small objects. Focal loss is used to optimize the classification loss function and accelerate the convergence of the network. Experimental results show that the recognition accuracy of the improved network reaches 96.55%, the model size is 92.49 MB smaller than that of the original YOLOv4 network, and the detection speed is 17% higher, which demonstrates the feasibility of the algorithm.
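For reference, the focal loss applied to the classification branch generally takes the standard form of Lin et al.; the hyperparameter values used in this work are not given in the abstract, so the expression below is only the generic definition:

\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)

where p_t is the predicted probability of the ground-truth class, \gamma \ge 0 down-weights well-classified (easy) examples so that training focuses on hard samples, and \alpha_t balances the positive and negative classes.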

Key words: vehicle detection, YOLOv4, multi-scale fusion, depthwise separable convolution, attention mechanism

CLC Number: