Improved Target Detection Algorithm for Aerial Images Based on YOLOv5

doi:10.16182/j.issn1004731x.joss.23-1564

Abstract

Existing small target detection methods suffer from low detection accuracy, a high false detection rate and a high missed detection rate. To address these problems, the FSD-YOLOv5 algorithm is proposed, which makes three improvements to the YOLOv5 algorithm. First, Focal EIoU replaces the original CIoU loss to improve the convergence speed and regression accuracy of the model. Second, to cope with deficiencies in the CNN architecture, a new CNN building block called SPD-Conv is adopted. Third, to address the reduction or loss of small-object information in feature maps caused by downsampling in convolutional neural networks, feature reuse is introduced to increase the small-object information in the feature maps. Experimental results show that FSD-YOLOv5 achieves a detection accuracy of 36.3%, 2.4 percentage points higher than the original algorithm.
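For reference, the EIoU loss of Zhang et al. adds three penalty terms to 1 - IoU: the squared distance between the box centers normalized by the squared diagonal of the smallest enclosing box, and the squared width and height differences normalized by the squared width and height of that enclosing box. Unlike CIoU, which penalizes aspect-ratio inconsistency, EIoU penalizes width and height errors directly; Focal EIoU multiplies the EIoU loss by IoU^γ so that high-overlap boxes contribute more. The pure-Python sketch below evaluates the loss for a single box pair. It is a minimal illustration under stated assumptions, not the authors' training code: the `Box` class and function names are hypothetical, boxes are assumed to be in corner format, and γ = 0.5 is an assumed default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in corner format (x1, y1, x2, y2), with x2 > x1 and y2 > y1."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def center(self) -> tuple:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """EIoU = 1 - IoU + center-distance term + width term + height term."""
    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred.x2, gt.x2) - min(pred.x1, gt.x1)
    ch = max(pred.y2, gt.y2) - min(pred.y1, gt.y1)
    (px, py), (gx, gy) = pred.center, gt.center
    # Center distance normalized by the squared enclosing-box diagonal.
    center_term = ((px - gx) ** 2 + (py - gy) ** 2) / (cw ** 2 + ch ** 2 + eps)
    # Width and height errors normalized by the enclosing box's width and height.
    width_term = (pred.width - gt.width) ** 2 / (cw ** 2 + eps)
    height_term = (pred.height - gt.height) ** 2 / (ch ** 2 + eps)
    return 1.0 - iou(pred, gt) + center_term + width_term + height_term


def focal_eiou_loss(pred: Box, gt: Box, gamma: float = 0.5) -> float:
    """Focal EIoU: EIoU re-weighted by IoU**gamma (gamma=0.5 is an assumed default)."""
    return iou(pred, gt) ** gamma * eiou_loss(pred, gt)


if __name__ == "__main__":
    gt = Box(0, 0, 2, 2)
    wide = Box(0, 0, 4, 2)  # correct height, twice the ground-truth width
    print(round(eiou_loss(wide, gt), 4))        # 0.8
    print(round(focal_eiou_loss(wide, gt), 4))  # 0.5657
```

In the usage example the predicted box has IoU 0.5 with the ground truth, so 1 - IoU = 0.5; the width term contributes 0.25 and the center term 0.05, which shows how EIoU penalizes a width error directly.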

Key words: YOLOv5, Focal EIoU, SPD-Conv, DenseNet, aerial image detection

CLC Number: TP393

Guo Yecai, Sun Jingdong, Saha Amitave. Improved Target Detection Algorithm for Aerial Images Based on YOLOv5[J]. Journal of System Simulation, 2025, 37(2): 551-562.

Figures/Tables: 15 (Figs. 1-9; Tables 1-6)

Loss function	Average precision/%	Frame rate/(frames/s)
SIoU	34.4	51
Wise IoU	33.6	56
XIoU	34.3	57
EIoU	34.9	57
Focal EIoU	35.1	58

Metric	YOLOv5s	YOLOv5s (Focal EIoU)
Parameters	7.04×10⁶	7.04×10⁶
Precision/%	46.1	48.2
Recall/%	34.4	34.4
Average precision/%	33.9	35.1
Frame rate/(frames/s)	57	58

Metric	YOLOv5s	YOLOv5s (SPD-Conv)
Parameters	7.04×10⁶	8.58×10⁶
Precision/%	46.1	47.9
Recall/%	34.4	34.6
Average precision/%	33.9	34.8
Frame rate/(frames/s)	57	52
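SPD-Conv (Sunkara and Luo) replaces a strided convolution or pooling layer with a space-to-depth (SPD) layer followed by a non-strided convolution. For scale 2, SPD splits a C×H×W feature map into four sub-maps taken at alternating rows and columns and stacks them along the channel axis, giving a 4C×(H/2)×(W/2) map in which no pixel is discarded. The pure-Python sketch below illustrates this with nested lists. The function names are hypothetical, the order in which sub-maps are stacked is a convention chosen here, and the 1×1 kernel of the following convolution is an assumption made to keep the example short. Because the channel count grows by a factor of scale², the following convolution has more input channels, which is one plausible contributor to the parameter increase (7.04×10⁶ to 8.58×10⁶) in the table above.

```python
from typing import List

# Feature map indexed as fmap[channel][row][col]
FeatureMap = List[List[List[float]]]


def space_to_depth(x: FeatureMap, scale: int = 2) -> FeatureMap:
    """Move every scale x scale spatial block into channels.

    A (C, H, W) map becomes (C * scale**2, H // scale, W // scale);
    every input value is kept, unlike a strided convolution or pooling.
    """
    height, width = len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out: FeatureMap = []
    for i in range(scale):          # row offset of the sub-map
        for j in range(scale):      # column offset of the sub-map
            for channel in x:
                # Take rows i, i+scale, ... and columns j, j+scale, ...
                out.append([row[j::scale] for row in channel[i::scale]])
    return out


def conv1x1(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Non-strided 1x1 convolution without bias; weights[out_channel][in_channel]."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[k][r][c] for k, w in enumerate(w_out)) for c in range(width)]
            for r in range(height)
        ]
        for w_out in weights
    ]


def spd_conv(x: FeatureMap, weights: List[List[float]], scale: int = 2) -> FeatureMap:
    """SPD-Conv sketch: space-to-depth, then a stride-1 convolution."""
    return conv1x1(space_to_depth(x, scale), weights)


if __name__ == "__main__":
    fmap = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]  # 1 x 4 x 4
    spd = space_to_depth(fmap)
    print(len(spd), len(spd[0]), len(spd[0][0]))        # 4 2 2
    print(spd_conv(fmap, [[0.25, 0.25, 0.25, 0.25]]))   # [[[2.5, 4.5], [10.5, 12.5]]]
```

With uniform weights of 0.25, each output pixel is the mean of the 2×2 block it came from, so the result matches 2×2 average pooling; a learned convolution can instead weight the four positions differently, which is the information that a stride-2 layer would partly discard.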

No. of convolutions	Position	Precision/%	Recall/%	Average precision/%	Frame rate/(frames/s)
0	-	46.1	34.4	33.9	57
2	Backbone	48.4	34.8	35.2	49
3	Backbone	47.9	34.7	34.5	48
4	Backbone	47.6	34.9	34.4	48
2	Neck	45.4	33.3	33.1	61
2	Head	45.4	32.3	31.4	50

Metric	YOLOv5s	YOLOv5s (DenseNet C3)
Parameters	7.04×10⁶	9.75×10⁶
Precision/%	46.1	48.4
Recall/%	34.4	34.8
Average precision/%	33.9	35.2
Frame rate/(frames/s)	57	49
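The feature reuse named in the abstract follows the DenseNet pattern (Huang et al.): inside a dense block, each layer receives the channel-wise concatenation of the block input and the outputs of all earlier layers, so a block with input width c₀, L layers and growth rate k outputs c₀ + L·k channels, and features computed by early layers remain directly available to later ones. How FSD-YOLOv5 places dense connections inside the YOLOv5 C3 module is not specified here, so the pure-Python sketch below shows only the generic connectivity pattern; `make_mean_layer` is a hypothetical toy layer standing in for a real BN-ReLU-Conv layer. The steadily widening layer inputs help explain why dense connections add cost, consistent with the parameter count rising from 7.04×10⁶ to 9.75×10⁶ and the frame rate falling from 57 to 49 frames/s in the table above.

```python
from typing import Callable, List

Channel = List[List[float]]          # one H x W plane
FeatureMap = List[Channel]           # list of channels
Layer = Callable[[FeatureMap], FeatureMap]


def dense_block(x: FeatureMap, layers: List[Layer]) -> FeatureMap:
    """DenseNet-style feature reuse: each layer receives the channel-wise
    concatenation of the block input and every earlier layer's output."""
    features = list(x)
    for layer in layers:
        new_channels = layer(features)
        features = features + new_channels   # concatenate; nothing is discarded
    return features


def make_mean_layer(growth_rate: int) -> Layer:
    """Hypothetical toy layer standing in for BN-ReLU-Conv: it emits
    `growth_rate` channels, each the per-pixel mean of all input channels."""
    def layer(inp: FeatureMap) -> FeatureMap:
        height, width = len(inp[0]), len(inp[0][0])
        mean = [[sum(ch[r][c] for ch in inp) / len(inp) for c in range(width)]
                for r in range(height)]
        return [[row[:] for row in mean] for _ in range(growth_rate)]
    return layer


if __name__ == "__main__":
    x = [[[float(c)] * 2 for _ in range(2)] for c in range(3)]   # 3 channels, 2 x 2
    out = dense_block(x, [make_mean_layer(4), make_mean_layer(4)])
    print(len(out))        # 3 + 2 * 4 = 11 channels
    print(out[:3] == x)    # True: the block input is passed through unchanged
```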