Simulation of Occluded Pedestrian Detection Based on Improved YOLO

doi:10.16182/j.issn1004731x.joss.21-0915

Abstract

To address the high missed detection rate and low accuracy of existing YOLO models on occluded and multi-scale pedestrian targets, an improved pedestrian detection algorithm is proposed. The YOLO backbone is modified to strengthen cross-scale feature extraction. To improve the fusion of pedestrian features at different scales, a spatial pyramid pooling module and two attention mechanisms are introduced at different positions in front of the YOLO layers. To counter the detection performance degradation caused by an overly complex network and to improve training efficiency, the network structure is pruned according to the actual situation. Experimental results show that, compared with YOLOv3 and related models, the YOLO-SSC-s model improves detection accuracy and speed for medium and small pedestrian targets and reduces the missed detection rate under occlusion.
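
As a rough illustration of the spatial pyramid pooling idea mentioned in the abstract, the sketch below applies several stride-1 max-pooling windows to one feature map and stacks the results with the input. It is only a minimal pure-Python sketch: the kernel sizes (5, 9, 13) are an assumption borrowed from the common YOLOv3-SPP configuration, and the function names `max_pool_same` and `spp_block` are hypothetical. The page does not give the paper's actual module layout.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling over a k x k window; borders are clipped,
    so the output has the same height and width as the input."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2  # window radius
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_block(fmap, kernels=(5, 9, 13)):
    """Return the input map followed by one pooled map per kernel size.
    The list plays the role of a channel-wise concatenation.
    Kernel sizes are an assumption, not taken from the paper."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernels]


# A 4x4 map with a single activation in the top-left corner.
feature_map = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
stacked = spp_block(feature_map)
```

Larger windows spread the single activation over a wider area, which is how the pooled branches expose context at several scales while keeping the spatial size unchanged.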

Key words: pedestrian detection, you only look once (YOLO), occlusion, attention mechanisms

CLC Number: TP391

Nan Xiang, Lu Wang, Chongliu Jia, Yuemou Jian, Xiaoxia Ma. Simulation of Occluded Pedestrian Detection Based on Improved YOLO[J]. Journal of System Simulation, 2023, 35(2): 286-299.

Figures/Tables: 16 (Fig. 1 to Fig. 11, Table 1 to Table 5)

Image_ID	b_w (predicted box width)	b_h (predicted box height)	Pedestrian score (≥conf_thres)
crop001616	102.263	341.010	0.480
	119.577	379.282	0.676
	130.176	424.379	0.602
	107.976	330.635	0.502
	96.253	367.631	0.476
	67.975	212.238	0.245
	80.363	230.423	0.235
	98.929	383.241	0.160
crop001607	113.105	345.795	0.666
	101.171	357.207	0.515
	83.350	267.993	0.376
	89.907	267.978	0.427
	72.405	241.371	0.409
	79.012	246.653	0.387
	84.017	235.129	0.295
	99.572	274.423	0.243
	75.037	247.766	0.116
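
The score column above lists only boxes that already passed the confidence threshold. The sketch below shows that kind of filtering on the crop001607 rows. The value of `conf_thres` is not given on this page, so the 0.3 used here is a hypothetical, stricter threshold chosen only for illustration, and `filter_by_confidence` is a hypothetical helper name.

```python
# Rows for crop001607 copied from the table: (b_w, b_h, pedestrian score).
DETECTIONS = [
    (113.105, 345.795, 0.666),
    (101.171, 357.207, 0.515),
    (83.350, 267.993, 0.376),
    (89.907, 267.978, 0.427),
    (72.405, 241.371, 0.409),
    (79.012, 246.653, 0.387),
    (84.017, 235.129, 0.295),
    (99.572, 274.423, 0.243),
    (75.037, 247.766, 0.116),
]


def filter_by_confidence(detections, conf_thres):
    """Keep only the boxes whose pedestrian score is at least conf_thres."""
    return [d for d in detections if d[2] >= conf_thres]


# 0.3 is an assumed threshold, not the value used in the paper.
kept = filter_by_confidence(DETECTIONS, conf_thres=0.3)
```

Raising the threshold removes low-score boxes at the cost of possibly discarding true pedestrians, which is the trade-off behind the accuracy and missed detection columns reported in the later tables.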

Model	mAP/% (IoU=0.6)	mAP/% (IoU=0.7)	mAP/% (IoU=0.8)	Accuracy/% (IoU=0.6)	Accuracy/% (IoU=0.7)	Accuracy/% (IoU=0.8)	Missed detection rate/%	Average detection time/ms	Model size/MB
YOLOv3-tiny	57.9	56.6	54.9	66.5	64.1	61.6	45.6	3.2	69.5
YOLOv3	81.2	80.7	79.4	70.6	70.0	68.0	18.0	16.8	246.6
YOLOv3-s	78.7	78.3	77.0	73.3	73.1	71.8	20.0	15.5	196.7
YOLOv3-SPP	82.9	82.2	80.8	75.7	74.6	70.8	17.8	17.7	250.8
Our Method	84.5	83.8	82.1	76.4	76.2	71.5	12.9	15.9	244.9
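
The metrics above are reported at IoU thresholds of 0.6, 0.7, and 0.8. The sketch below computes the intersection over union of two axis-aligned boxes and checks it against those thresholds. It assumes the common convention that a prediction counts as a match when its IoU with a ground-truth box reaches the threshold; the page does not state the paper's exact matching rule, and the `(x1, y1, x2, y2)` box format and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def matches_at(pred, truth, thresholds=(0.6, 0.7, 0.8)):
    """Map each IoU threshold to whether the prediction would count as a match."""
    value = iou(pred, truth)
    return {t: value >= t for t in thresholds}


ground_truth = (0, 0, 10, 10)
close_pred = (1, 0, 11, 10)   # shifted by 1 unit: IoU = 90 / 110
loose_pred = (2, 0, 12, 10)   # shifted by 2 units: IoU = 80 / 120
```

Under this convention the same prediction can count as correct at IoU = 0.6 and as a miss at IoU = 0.8, which is consistent with the scores in each row falling as the threshold rises.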

Model	Backbone	Simplified	mAP/% (IoU=0.6)	mAP/% (IoU=0.7)	mAP/% (IoU=0.8)	Accuracy/% (IoU=0.6)	Accuracy/% (IoU=0.7)	Accuracy/% (IoU=0.8)	Missed detection rate/%	Average detection time/ms
YOLOv3	Darknet-53	No	81.2	80.7	79.4	70.6	70.0	68.0	18.0	16.8
YOLOv3-s	Darknet-53	Yes	78.7	78.3	77.0	73.3	73.1	71.8	20.0	15.5
YOLOv3-Darknet-light	Darknet-light	No	81.9	81.1	80.2	71.5	70.0	67.0	15.5	15.9
YOLOv3-SPP	Darknet-53	No	82.9	82.2	80.8	75.7	74.6	70.8	17.8	17.7
YOLOv3-SPP-s	Darknet-53	Yes	82.7	82.3	81.0	71.5	71.1	67.3	13.3	16.3
YOLOv3-SPP-Darknet-light	Darknet-light	No	83.5	82.9	81.5	72.9	72.1	69.1	16.2	16.7

Attention combination	mAP/% (IoU=0.6)	mAP/% (IoU=0.7)	mAP/% (IoU=0.8)	Accuracy/% (IoU=0.6)	Accuracy/% (IoU=0.7)	Accuracy/% (IoU=0.8)	Missed detection rate/%	Average detection time/ms
YOLO-S-s(baseline)	80.4	79.8	77.9	73.6	73.4	69.8	23.3	17.0
+SE	79.0	78.4	76.5	71.1	70.1	65.6	17.5	15.8
+CBAM	80.9	80.6	78.9	73.8	73.7	70.6	18.8	15.1
+SHA	81.6	81.1	78.8	74.0	72.8	66.6	17.8	17.3
+SE+CBAM	81.2	80.7	79.6	74.2	73.4	70.2	18.4	14.9
+SE+SHA	83.5	82.8	81.3	70.9	69.2	64.9	15.9	15.2
+CBAM+SHA	82.1	81.6	79.9	76.5	76.2	72.9	18.7	15.0
+CBAM(2)+SHA	82.3	81.7	80.0	73.1	72.2	68.4	17.2	16.0
+CBAM(3)	81.0	80.5	79.2	72.4	71.0	68.1	18.4	18.2
+CBAM(3)+SE	80.8	80.4	79.1	71.6	71.1	69.4	19.1	16.6
+CBAM(3)+SHA	82.4	81.9	80.6	73.5	73.0	70.8	16.7	18.4
+CBAM(3)+SHA+SE	81.0	80.6	79.3	73.6	72.9	71.2	18.6	16.4
Our Combination	84.5	83.8	82.1	76.4	76.2	71.5	12.9	15.9

Model	mAP/% (IoU=0.6)	mAP/% (IoU=0.7)	mAP/% (IoU=0.8)	Accuracy/% (IoU=0.6)	Accuracy/% (IoU=0.7)	Accuracy/% (IoU=0.8)	Missed detection rate/%	Average detection time/ms
YOLOv3-tiny	30.8	29.1	27.3	50.0	46.7	41.8	73.6	1.9
YOLOv3	62.6	60.5	56.4	70.8	65.1	56.0	45.2	16.1
YOLOv3-s	62.1	59.5	53.6	74.0	69.2	56.8	49.6	13.7
YOLOv3-SPP	65.4	63.0	58.1	73.9	68.3	58.1	46.2	15.7
Our Method	69.2	66.4	59.6	76.1	71.0	56.9	43.5	14.7