Journal of System Simulation, 2023, Vol. 35, Issue (2): 286-299. doi: 10.16182/j.issn1004731x.joss.21-0915
Nan Xiang, Lu Wang, Chongliu Jia, Yuemou Jian, Xiaoxia Ma
Received: 2021-09-07
Revised: 2021-11-12
Online: 2023-02-28
Published: 2023-02-16
Nan Xiang, Lu Wang, Chongliu Jia, Yuemou Jian, Xiaoxia Ma. Simulation of Occluded Pedestrian Detection Based on Improved YOLO[J]. Journal of System Simulation, 2023, 35(2): 286-299.
Table 1
Results of location prediction
| Image_ID | Predicted box bw | Predicted box bh | Pedestrian score (≥ conf_thres) |
|---|---|---|---|
| crop001616 | 102.263 | 341.010 | 0.480 |
| | 119.577 | 379.282 | 0.676 |
| | 130.176 | 424.379 | 0.602 |
| | 107.976 | 330.635 | 0.502 |
| | 96.253 | 367.631 | 0.476 |
| | 67.975 | 212.238 | 0.245 |
| | 80.363 | 230.423 | 0.235 |
| | 98.929 | 383.241 | 0.160 |
| crop001607 | 113.105 | 345.795 | 0.666 |
| | 101.171 | 357.207 | 0.515 |
| | 83.350 | 267.993 | 0.376 |
| | 89.907 | 267.978 | 0.427 |
| | 72.405 | 241.371 | 0.409 |
| | 79.012 | 246.653 | 0.387 |
| | 84.017 | 235.129 | 0.295 |
| | 99.572 | 274.423 | 0.243 |
| | 75.037 | 247.766 | 0.116 |
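Table 1 lists, for two test images, the width bw and height bh of each predicted box with its pedestrian score; only boxes whose score reaches conf_thres are reported. The sketch below shows that filtering step, plus the YOLOv3-style size decoding bw = pw·e^tw, bh = ph·e^th (pw, ph being the anchor prior), assuming bw and bh follow the usual YOLOv3 notation. The helper names, the row layout and the 0.5 threshold in the usage are hypothetical; the article's actual conf_thres is not stated in the table.

```python
import math
from typing import List, Tuple

Row = Tuple[float, float, float]  # (bw, bh, score), the column layout of Table 1

# Rows for image crop001616, copied from Table 1
CROP001616: List[Row] = [
    (102.263, 341.010, 0.480),
    (119.577, 379.282, 0.676),
    (130.176, 424.379, 0.602),
    (107.976, 330.635, 0.502),
    (96.253, 367.631, 0.476),
    (67.975, 212.238, 0.245),
    (80.363, 230.423, 0.235),
    (98.929, 383.241, 0.160),
]


def decode_wh(t_w: float, t_h: float, p_w: float, p_h: float) -> Tuple[float, float]:
    """YOLOv3-style size decoding: scale the anchor prior by exp of the raw offsets."""
    return p_w * math.exp(t_w), p_h * math.exp(t_h)


def filter_by_conf(rows: List[Row], conf_thres: float) -> List[Row]:
    """Keep only boxes whose score is at least conf_thres (hypothetical helper)."""
    return [r for r in rows if r[2] >= conf_thres]


if __name__ == "__main__":
    # 0.5 is an illustrative threshold, not the article's setting
    for bw, bh, score in filter_by_conf(CROP001616, conf_thres=0.5):
        print(f"bw={bw:.3f}  bh={bh:.3f}  score={score:.3f}")
    # A hypothetical 100 x 300 anchor stretched horizontally by a factor of 1.5
    print(decode_wh(math.log(1.5), 0.0, 100.0, 300.0))
```

Raising conf_thres trades recall for fewer false positives; at 0.5, three of the eight crop001616 boxes survive.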
Table 2
Test performance comparison of different models
| Network model | mAP/% (IoU=0.6) | mAP/% (IoU=0.7) | mAP/% (IoU=0.8) | Accuracy/% (IoU=0.6) | Accuracy/% (IoU=0.7) | Accuracy/% (IoU=0.8) | Miss rate/% | Avg. detection time/ms | Model size/MB |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv3-tiny | 57.9 | 56.6 | 54.9 | 66.5 | 64.1 | 61.6 | 45.6 | 3.2 | 69.5 |
| YOLOv3 | 81.2 | 80.7 | 79.4 | 70.6 | 70.0 | 68.0 | 18.0 | 16.8 | 246.6 |
| YOLOv3-s | 78.7 | 78.3 | 77.0 | 73.3 | 73.1 | 71.8 | 20.0 | 15.5 | 196.7 |
| YOLOv3-SPP | 82.9 | 82.2 | 80.8 | 75.7 | 74.6 | 70.8 | 17.8 | 17.7 | 250.8 |
| Our Method | 84.5 | 83.8 | 82.1 | 76.4 | 76.2 | 71.5 | 12.9 | 15.9 | 244.9 |
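mAP and accuracy are reported at three IoU thresholds: a detection only counts as a hit if its intersection over union (IoU) with a ground-truth box reaches the threshold, which is consistent with every mAP and accuracy value in the tables decreasing from IoU=0.6 to IoU=0.8. The sketch below computes IoU for corner-format boxes and greedily matches detections (highest score first) to ground truth at a given threshold. It covers only the matching step underlying mAP, not the full average-precision integration. The definitions precision = TP/(TP+FP) and miss rate = FN/(TP+FN) are common conventions and an assumption here, since the tables do not define the article's accuracy and miss-rate metrics; all names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(dets: List[Tuple[Box, float]], gts: List[Box], iou_thres: float) -> Tuple[int, int, int]:
    """Greedy matching: each detection, in descending score order, claims the
    unmatched ground-truth box with the highest IoU >= iou_thres.
    Returns (tp, fp, fn)."""
    matched = set()
    tp = fp = 0
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_thres
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched)
    return tp, fp, fn


def precision_and_miss_rate(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Assumed definitions: precision = TP/(TP+FP), miss rate = FN/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    miss_rate = fn / (tp + fn) if tp + fn else 0.0
    return precision, miss_rate


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0)]
    dets = [((2.0, 0.0, 12.0, 10.0), 0.9)]  # IoU with the ground truth = 80/120
    for thr in (0.6, 0.7, 0.8):
        print(thr, precision_and_miss_rate(*match(dets, gts, thr)))
```

The same toy detection is a hit at IoU=0.6 but a false positive (and a miss) at 0.7 and 0.8.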
Table 3
Test performance comparison of different backbone networks
| Network model | Backbone | Simplified | mAP/% (IoU=0.6) | mAP/% (IoU=0.7) | mAP/% (IoU=0.8) | Accuracy/% (IoU=0.6) | Accuracy/% (IoU=0.7) | Accuracy/% (IoU=0.8) | Miss rate/% | Avg. detection time/ms |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv3 | Darknet-53 | No | 81.2 | 80.7 | 79.4 | 70.6 | 70.0 | 68.0 | 18.0 | 16.8 |
| YOLOv3-s | Darknet-53 | Yes | 78.7 | 78.3 | 77.0 | 73.3 | 73.1 | 71.8 | 20.0 | 15.5 |
| YOLOv3-Darknet-light | Darknet-light | No | 81.9 | 81.1 | 80.2 | 71.5 | 70.0 | 67.0 | 15.5 | 15.9 |
| YOLOv3-SPP | Darknet-53 | No | 82.9 | 82.2 | 80.8 | 75.7 | 74.6 | 70.8 | 17.8 | 17.7 |
| YOLOv3-SPP-s | Darknet-53 | Yes | 82.7 | 82.3 | 81.0 | 71.5 | 71.1 | 67.3 | 13.3 | 16.3 |
| YOLOv3-SPP-Darknet-light | Darknet-light | No | 83.5 | 82.9 | 81.5 | 72.9 | 72.1 | 69.1 | 16.2 | 16.7 |
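Several of the compared models are YOLOv3-SPP variants. In the widely used YOLOv3-SPP configuration, the SPP block max-pools the same feature map with stride 1 and kernel sizes 5, 9 and 13 (padding k//2, so height and width are preserved) and concatenates the pooled maps with the input along the channel axis, quadrupling the channel count. The pure-Python sketch below assumes those common default kernel sizes, which this table does not confirm for the article's models; the function names are hypothetical.

```python
from typing import List, Sequence

Map = List[List[float]]  # one H x W channel


def max_pool_same(x: Map, k: int) -> Map:
    """Stride-1 max pooling with k//2 padding, so the output keeps the input's H x W.
    Out-of-bounds positions are skipped, which equals padding with -inf."""
    h, w = len(x), len(x[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [x[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(channels: List[Map], kernels: Sequence[int] = (5, 9, 13)) -> List[Map]:
    """Concatenate the input channels with their max-pooled versions for each kernel."""
    out = list(channels)
    for k in kernels:
        out.extend(max_pool_same(c, k) for c in channels)
    return out


if __name__ == "__main__":
    x = [[[float(6 * i + j) for j in range(6)] for i in range(6)]]  # 1 x 6 x 6 toy map
    y = spp(x)
    print(len(y), "channels")          # 4: the input plus three pooled maps
    print([ch[0][0] for ch in y])      # [0.0, 14.0, 28.0, 35.0]
```

Larger kernels summarize larger neighbourhoods at every position, which is how SPP mixes receptive-field sizes without changing the spatial resolution.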
Table 4
Combination test comparison of different attention mechanisms
| Attention combination | mAP/% (IoU=0.6) | mAP/% (IoU=0.7) | mAP/% (IoU=0.8) | Accuracy/% (IoU=0.6) | Accuracy/% (IoU=0.7) | Accuracy/% (IoU=0.8) | Miss rate/% | Avg. detection time/ms |
|---|---|---|---|---|---|---|---|---|
| YOLO-S-s (baseline) | 80.4 | 79.8 | 77.9 | 73.6 | 73.4 | 69.8 | 23.3 | 17.0 |
| +SE | 79.0 | 78.4 | 76.5 | 71.1 | 70.1 | 65.6 | 17.5 | 15.8 |
| +CBAM | 80.9 | 80.6 | 78.9 | 73.8 | 73.7 | 70.6 | 18.8 | 15.1 |
| +SHA | 81.6 | 81.1 | 78.8 | 74.0 | 72.8 | 66.6 | 17.8 | 17.3 |
| +SE+CBAM | 81.2 | 80.7 | 79.6 | 74.2 | 73.4 | 70.2 | 18.4 | 14.9 |
| +SE+SHA | 83.5 | 82.8 | 81.3 | 70.9 | 69.2 | 64.9 | 15.9 | 15.2 |
| +CBAM+SHA | 82.1 | 81.6 | 79.9 | 76.5 | 76.2 | 72.9 | 18.7 | 15.0 |
| +CBAM(2)+SHA | 82.3 | 81.7 | 80.0 | 73.1 | 72.2 | 68.4 | 17.2 | 16.0 |
| +CBAM(3) | 81.0 | 80.5 | 79.2 | 72.4 | 71.0 | 68.1 | 18.4 | 18.2 |
| +CBAM(3)+SE | 80.8 | 80.4 | 79.1 | 71.6 | 71.1 | 69.4 | 19.1 | 16.6 |
| +CBAM(3)+SHA | 82.4 | 81.9 | 80.6 | 73.5 | 73.0 | 70.8 | 16.7 | 18.4 |
| +CBAM(3)+SHA+SE | 81.0 | 80.6 | 79.3 | 73.6 | 72.9 | 71.2 | 18.6 | 16.4 |
| Our Combination | 84.5 | 83.8 | 82.1 | 76.4 | 76.2 | 71.5 | 12.9 | 15.9 |
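Table 4 compares combinations of the SE, CBAM and SHA attention modules on the YOLO-S-s baseline. As a reference point, a squeeze-and-excitation (SE) block squeezes each channel to one number by global average pooling, passes that vector through a two-layer bottleneck (ReLU, then sigmoid) and rescales each channel by the resulting weight; CBAM extends the channel branch with a max-pooled input and adds a spatial attention map. The sketch below is a generic SE block in pure Python with hand-set weights, not the article's implementation; the weight shapes (C to C/r and back, biases omitted) and all names are assumptions.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # C x H x W
Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product (a fully connected layer without bias)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def se_block(x: Tensor3, w1: Matrix, w2: Matrix) -> Tensor3:
    """Squeeze-and-excitation with w1 of shape (C/r) x C and w2 of shape C x (C/r)."""
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    s = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]
    # Scale: multiply every element of channel c by s[c]
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, x)]


if __name__ == "__main__":
    x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]  # C=2, H=W=2
    w1 = [[1.0, 0.0]]        # one hidden unit (r = 2), illustrative weights
    w2 = [[1.0], [-1.0]]
    y = se_block(x, w1, w2)
    print([ch[0][0] for ch in y])  # channel 0 is boosted relative to channel 1
```

Because the weights depend only on pooled channel statistics, SE costs little extra computation, which fits the small changes in detection time across the rows of Table 4.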
Table 5
Comparison of model generalization ability
| Network model | mAP/% (IoU=0.6) | mAP/% (IoU=0.7) | mAP/% (IoU=0.8) | Accuracy/% (IoU=0.6) | Accuracy/% (IoU=0.7) | Accuracy/% (IoU=0.8) | Miss rate/% | Avg. detection time/ms |
|---|---|---|---|---|---|---|---|---|
| YOLOv3-tiny | 30.8 | 29.1 | 27.3 | 50.0 | 46.7 | 41.8 | 73.6 | 1.9 |
| YOLOv3 | 62.6 | 60.5 | 56.4 | 70.8 | 65.1 | 56.0 | 45.2 | 16.1 |
| YOLOv3-s | 62.1 | 59.5 | 53.6 | 74.0 | 69.2 | 56.8 | 49.6 | 13.7 |
| YOLOv3-SPP | 65.4 | 63.0 | 58.1 | 73.9 | 68.3 | 58.1 | 46.2 | 15.7 |
| Our Method | 69.2 | 66.4 | 59.6 | 76.1 | 71.0 | 56.9 | 43.5 | 14.7 |
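Reading Table 5 against Table 2 gives a rough measure of how much each model loses in the generalization test. Assuming both tables report the same five models and that Table 5 uses a different test set from Table 2 (the tables themselves do not name the data), the drop in mAP at IoU=0.6 follows directly from the reported numbers; the helper name is hypothetical.

```python
# mAP/% at IoU=0.6, copied from Table 2 and Table 5
MAP_T2 = {"YOLOv3-tiny": 57.9, "YOLOv3": 81.2, "YOLOv3-s": 78.7,
          "YOLOv3-SPP": 82.9, "Our Method": 84.5}
MAP_T5 = {"YOLOv3-tiny": 30.8, "YOLOv3": 62.6, "YOLOv3-s": 62.1,
          "YOLOv3-SPP": 65.4, "Our Method": 69.2}


def map_drop(before: dict, after: dict) -> dict:
    """Absolute drop in percentage points, rounded to one decimal like the tables."""
    return {name: round(before[name] - after[name], 1) for name in before}


if __name__ == "__main__":
    drops = map_drop(MAP_T2, MAP_T5)
    for name, d in sorted(drops.items(), key=lambda kv: kv[1]):
        print(f"{name}: -{d} points")
```

With these numbers the proposed model loses the least (15.3 points) and YOLOv3-tiny the most (27.1 points).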