Vehicle Detection Method Based on Multi Scale Feature Fusion

doi:10.16182/j.issn1004731x.joss.21-0907

Abstract

Vehicle detection is an important and active research topic in intelligent transportation. To address the low detection accuracy and poor small-object recognition of traditional vehicle detection algorithms, an improved method based on YOLOv4 (You Only Look Once v4) is proposed to improve the detection of small target vehicles in traffic scenes. The YOLOv4 network is redesigned: the depthwise separable convolution module of MobileNetv2 replaces standard convolution, and the convolutional block attention module (CBAM) is integrated into the feature extraction network, which maintains detection accuracy while reducing the number of model parameters. PANet-D feature fusion combines the deep and shallow semantic information of feature maps at four scales to enhance the detection of small objects. Focal loss is used to optimize the classification loss function, which accelerates the convergence of the network model. Experimental results show that the improved network reaches a recognition accuracy of 96.55%; compared with the original YOLOv4 network, the model is 92.49 M smaller and detection is 17% faster, which supports the feasibility of the algorithm.
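Two of the building blocks named in the abstract can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the layer sizes (3×3 kernel, 32 input channels, 64 output channels) are chosen only for illustration, and the focal loss defaults `alpha=0.25` and `gamma=2.0` come from the original Focal loss paper rather than from this article.

```python
import math

def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over all predictions.

    probs:  predicted probabilities of the positive class
    labels: ground-truth labels, each 0 or 1
    alpha, gamma: assumed defaults, not values reported in this article
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

# Illustrative layer: 3x3 kernel, 32 -> 64 channels (hypothetical sizes).
standard = conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)

# A confident correct prediction contributes far less than a wrong one.
easy = focal_loss([0.95], [1])
hard = focal_loss([0.10], [1])
```

For this layer the separable form needs 2336 weights instead of 18432, which is the kind of saving that motivates replacing standard convolution. The factor `(1 - p_t) ** gamma` shrinks the loss of well-classified examples, so training concentrates on hard ones.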

Keywords: vehicle detection, YOLOv4, multi-scale fusion, depthwise separable convolution, attention mechanism

CLC Number: TP311

Yin Wang, Feixiang Wang, Qianlai Sun. Vehicle Detection Method Based on Multi Scale Feature Fusion[J]. Journal of System Simulation, 2022, 34(6): 1219-1229.

Figures/Tables: 12 (Fig. 1 to Fig. 8, Table 1 to Table 4)

input	operator	t	c	n	s	K	output	feature output
416×416×3	conv2d	-	32	1	2	3	208×208×32
208×208×32	bottleneck	1	16	1	1	3	208×208×16
208×208×16	bottleneck	6	24	2	2	3	104×104×24	out 1
104×104×24	bottleneck	6	32	3	2	3	52×52×32	out 2
52×52×32	bottleneck	6	64	4	2	3	26×26×64
26×26×64	bottleneck	6	96	3	1	3	26×26×96	out 3
26×26×96	bottleneck	6	160	3	2	3	13×13×160
13×13×160	bottleneck	6	320	1	1	3	13×13×320	out 4
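The shapes in the table above can be reproduced with simple stride arithmetic. The sketch below assumes the usual MobileNetv2 reading of the columns (t: expansion factor, c: output channels, n: number of repeats, s: stride of the first repeat) and 'same'-style padding; the names `LAYERS` and `walk_shapes` are hypothetical.

```python
# Rows of the backbone table as (operator, t, c, n, s).
LAYERS = [
    ("conv2d", None, 32, 1, 2),
    ("bottleneck", 1, 16, 1, 1),
    ("bottleneck", 6, 24, 2, 2),
    ("bottleneck", 6, 32, 3, 2),
    ("bottleneck", 6, 64, 4, 2),
    ("bottleneck", 6, 96, 3, 1),
    ("bottleneck", 6, 160, 3, 2),
    ("bottleneck", 6, 320, 1, 1),
]

def walk_shapes(size=416, layers=LAYERS):
    """Return (height, width, channels) after each table row.

    Assumption: padding keeps the spatial size divisible, so a row with
    stride s divides the size by s once, regardless of the repeat count n.
    """
    shapes = []
    for _op, _t, c, _n, s in layers:
        size = size // s
        shapes.append((size, size, c))
    return shapes

shapes = walk_shapes()
# Rows marked "out 1" to "out 4" are the 3rd, 4th, 6th and 8th rows.
outputs = [shapes[i] for i in (2, 3, 5, 7)]
print(outputs)
```

Under these assumptions the four feature outputs have spatial sizes 104, 52, 26 and 13, matching the four scales that the abstract says are fused.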

Set	number	Car	Truck	Van	Tram
Total	6 820	28 742	1 094	2 914	511
Training set	5 456	23 041	902	2 344	419
Test set	1 364	5 701	192	570	92

CBAM	Focal loss	4 scales	mAP/%
			92.20
	√	√	95.30
√	√		94.78
√		√	96.31
√	√	√	96.55
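To read the ablation rows more easily, the gain of each configuration over the row with no component enabled can be computed directly. This is only arithmetic on the values listed above; the dictionary layout and variable names are assumptions made for illustration.

```python
# (CBAM, Focal loss, 4 scales) -> mAP/% as listed in the ablation table.
ABLATION = {
    (False, False, False): 92.20,
    (False, True, True): 95.30,
    (True, True, False): 94.78,
    (True, False, True): 96.31,
    (True, True, True): 96.55,
}

baseline = ABLATION[(False, False, False)]

# Gain in mAP percentage points relative to the baseline row.
gains = {cfg: round(m_ap - baseline, 2) for cfg, m_ap in ABLATION.items()}

for (cbam, focal, four_scale), gain in gains.items():
    print(f"CBAM={cbam!s:5} Focal={focal!s:5} 4-scale={four_scale!s:5} gain={gain:+.2f}")
```

With all three components enabled the gain is 4.35 points. Comparing the full configuration against each row that omits one component gives a rough sense of that component's individual contribution.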

Algorithm	mAP/%	AP/% (car)	AP/% (truck)	AP/% (tram)	AP/% (van)	FPS	Params	Model size/M
SSD300	86.14	87.01	90.24	84.66	82.68	49	2.40×10⁷	92.00
YOLOv3	92.05	94.30	92.22	91.38	90.32	30	6.15×10⁷	234.76
YOLOv3-Efficient	89.87	92.52	93.75	85.14	88.07	24	1.06×10⁷	40.28
YOLOv4	96.73	95.12	98.09	97.03	96.67	23	6.40×10⁷	243.96
YOLOv4-Tiny	85.66	85.85	87.60	89.78	79.39	83	5.88×10⁶	22.43
MobileNetv1-YOLOv4	91.32	93.18	94.08	86.92	91.09	40	1.23×10⁷	46.86
MobileNetv2-YOLOv4	92.20	92.96	93.86	91.25	90.72	32	1.03×10⁷	39.64
MobileNetv3-YOLOv4	91.07	92.90	92.65	87.66	91.08	28	1.13×10⁷	43.18
Proposed method	96.55	96.79	97.30	96.70	95.42	27	4.00×10⁷	151.47
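The headline figures in the abstract can be checked against the YOLOv4 row and the proposed-method row of the table above. The sketch below is plain arithmetic on those two rows; the dictionary keys are hypothetical names.

```python
# Values copied from the comparison table (model size in M as listed).
yolov4 = {"mAP": 96.73, "fps": 23, "params": 6.40e7, "size_m": 243.96}
proposed = {"mAP": 96.55, "fps": 27, "params": 4.00e7, "size_m": 151.47}

# Absolute reduction in model size.
size_saving = round(yolov4["size_m"] - proposed["size_m"], 2)

# Relative increase in frames per second, in percent.
speedup_pct = round(100 * (proposed["fps"] / yolov4["fps"] - 1), 1)

# Relative reduction in parameter count, in percent.
param_saving_pct = round(100 * (1 - proposed["params"] / yolov4["params"]), 1)

# Difference in mAP, in percentage points.
map_drop = round(yolov4["mAP"] - proposed["mAP"], 2)

print(size_saving, speedup_pct, param_saving_pct, map_drop)
```

The size reduction of 92.49 M and the roughly 17% speed-up agree with the abstract. The same rows also show a parameter reduction of 37.5% at a cost of 0.18 mAP points relative to the original YOLOv4.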
