Remote Sensing Small Object Detection Based on Cross-stage Two-branch Feature Aggregation

doi:10.16182/j.issn1004731x.joss.23-1526

Abstract

To address the missed and false detections that YOLOv8 produces in remote sensing small object detection because of large differences in target scale and complex backgrounds, this paper proposes a small object detection method for remote sensing images based on cross-stage two-branch feature aggregation. First, the globally shared weights of the convolution operator are fused with the token-specific, context-aware weights of attention to obtain both high-frequency local information and low-frequency global information. Second, a lightweight MLP captures global long-range dependencies, and a parallel cross-stage learnable visual center mechanism captures information from the local corner regions of the input image. Third, a multidimensional residual attention mechanism aggregates the output features of the two parallel branches to capture pixel-level pairwise relationships as well as cross-channel and cross-spatial information. Experimental results show that the proposed model achieves 73.8% and 98.1% mAP on the DIOR and RSOD datasets, respectively, which is 1.3 and 2.1 percentage points higher than current state-of-the-art methods.

Key words: YOLOv8, remote sensing image, small object detection, feature fusion, attention mechanism

CLC Number: TP391

Li Jie, Liu Yang, Li Liang, Su Bengan, Wei Jialong, Zhou Guangda, Shi Yanmin, Zhao Zhen. Remote Sensing Small Object Detection Based on Cross-stage Two-branch Feature Aggregation[J]. Journal of System Simulation, 2025, 37(4): 1025-1040.

References

1	Dutta Suparna, Das Monidipa. Remote Sensing Scene Classification Under Scarcity of Labelled Samples—A Survey of the State-of-the-arts[J]. Computers & Geosciences, 2023, 171: 105295.
2	Girshick R, Donahue J, Darrell T, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587.
3	Ren Shaoqing, He Kaiming, Girshick R, et al. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
4	Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, Real-time Object Detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2016: 779-788.
5	Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2017: 6517-6525.
6	Redmon J, Farhadi A. YOLOv3: An Incremental Improvement[EB/OL]. (2018-04-08) [2023-12-06].
7	Bochkovskiy A, Wang C Y, Liao Hongyuan. YOLOv4: Optimal Speed and Accuracy of Object Detection[EB/OL]. (2020-04-23) [2023-12-06].
8	Ge Zheng, Liu Songtao, Wang Feng, et al. YOLOX: Exceeding YOLO Series in 2021[EB/OL]. (2021-08-06) [2023-12-06].
9	Li Chuyi, Li Lulu, Jiang Hongliang, et al. YOLOv6: A Single-stage Object Detection Framework for Industrial Applications[EB/OL]. (2022-09-07) [2023-12-06].
10	Wang C Y, Bochkovskiy A, Liao Hongyuan. YOLOv7: Trainable Bag-of-freebies Sets New State-of-the-art for Real-time Object Detectors[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 7464-7475.
11	Mei Yuan, Wu Kaijun, Xu Zehao, et al. SNG-YOLOX: Non-obvious Remote Sensing Target Detection Based on Enhanced YOLOX[EB/OL]. (2022-04-22) [2023-12-07].
12	Li Ronghao, Shen Ying. YOLOSR-IST: A Deep Learning Method for Small Target Detection in Infrared Remote Sensing Images Based on Super-resolution and YOLO[J]. Signal Processing, 2023, 208: 108962.
13	Zhao Wenqing, Kang Yijin, Zhao Zhenbing, et al. A Remote Sensing Image Object Detection Algorithm with Improved YOLOv5s[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 86-95. (in Chinese)
14	Fan Qihang, Huang Huaibo, Guan Jiyang, et al. Rethinking Local Perception in Lightweight Vision Transformer[EB/OL]. (2023-06-01) [2023-12-09].
15	Quan Yu, Zhang Dong, Zhang Liyan, et al. Centralized Feature Pyramid for Object Detection[J]. IEEE Transactions on Image Processing, 2023, 32: 4341-4354.
16	Ouyang Daliang, He Su, Zhang Guozhong, et al. Efficient Multi-scale Attention Module with Cross-spatial Learning[C]//ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE, 2023: 1-5.
17	Qu Junsuo, Tang Zongbing, Zhang Le, et al. Remote Sensing Small Object Detection Network Based on Attention Mechanism and Multi-scale Feature Fusion[J]. Remote Sensing, 2023, 15(11): 2728.
18	Zhou Liming, Zheng Chang, Yan Haoxin, et al. RepDarkNet: A Multi-branched Detector for Small-target Detection in Remote Sensing Images[J]. ISPRS International Journal of Geo-Information, 2022, 11(3): 158.
19	Pei Wenjing, Shi Zhanhao, Gong Kai. Small Target Detection with Remote Sensing Images Based on an Improved YOLOv5 Algorithm[J]. Frontiers in Neurorobotics, 2022, 16: 1074862.
20	Tan Mingxing, Le Q V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks[EB/OL]. (2020-09-11) [2023-12-11].
21	Qiu Tianheng, Wang Ling, Wang Peng, et al. Research on Object Detection Algorithm Based on Improved YOLOv5[J]. Computer Engineering and Applications, 2022, 58(13): 63-73. (in Chinese)
22	Qiao Siyuan, Chen L C, Yuille A. DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 10208-10219.
23	Zhao Qijie, Sheng Tao, Wang Yongtao, et al. M2Det: A Single-shot Object Detector Based on Multi-level Feature Pyramid Network[C]//Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Conference on Innovative Applications of Artificial Intelligence and Ninth Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2019: 9259-9266.
24	Li Chao, Wang Kai, Ding Caichang, et al. Improved Feature Fusion Network for Small Object Detection in Remote Sensing Images[J]. Computer Engineering and Applications, 2023, 59(17): 232-241. (in Chinese)
25	Hu Jie, Shen Li, Sun Gang. Squeeze-and-excitation Networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
26	Woo Sanghyun, Park Jongchan, Lee J Y, et al. CBAM: Convolutional Block Attention Module[C]//Computer Vision—ECCV 2018. Cham: Springer International Publishing, 2018: 3-19.
27	Si Chenyang, Yu Weihao, Zhou Pan, et al. Inception Transformer[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2024: 23495-23509.
28	Howard A G, Zhu Menglong, Chen Bo, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications[EB/OL]. (2017-04-17) [2023-12-12].
29	Tolstikhin I, Houlsby N, Kolesnikov A, et al. MLP-Mixer: An All-MLP Architecture for Vision[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2024: 24261-24272.
30	Yu Weihao, Luo Mi, Zhou Pan, et al. MetaFormer is Actually What You Need for Vision[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 10809-10819.
31	Larsson G, Maire M, Shakhnarovich G. FractalNet: Ultra-deep Neural Networks Without Residuals[EB/OL]. (2017-05-26) [2023-12-13].
32	Duan Kaiwen, Bai Song, Xie Lingxi, et al. CenterNet: Keypoint Triplets for Object Detection[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 6568-6577.
33	Huang Wei, Li Guanyi, Chen Qiqiang, et al. CF2PN: A Cross-scale Feature Fusion Pyramid Network Based Remote Sensing Target Detection[J]. Remote Sensing, 2021, 13(5): 847.
34	Liang Dong, Geng Qixiang, Wei Zongqi, et al. Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-13.
35	Sharifuzzaman Sagar A S M, Chen Yu, Xie Yakun, et al. MSA R-CNN: A Comprehensive Approach to Remote Sensing Object Detection and Scene Understanding[J]. Expert Systems with Applications, 2024, 241: 122788.

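Object category (DIOR)	YOLOv8	Proposed model
Mean	74.44	75.76
Expressway service area	64.62	64.13
Expressway toll station	59.92	62.03
Airplane	90.02	92.77
Airport	82.91	85.49
Baseball field	78.58	78.47
Basketball court	90.80	91.95
Bridge	50.30	51.98
Chimney	80.69	83.16
Dam	63.82	65.90
Golf course	81.03	80.58
Ground track field	80.43	79.91
Harbor	65.77	67.30
Overpass	61.81	63.88
Ship	89.94	92.32
Stadium	73.27	73.23
Storage tank	79.11	81.72
Tennis court	91.23	90.83
Train station	65.85	65.54
Vehicle	53.92	56.81
Windmill	84.70	87.18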
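
The "Mean" row of the table above can be cross-checked against the per-class rows. The following is a minimal Python sketch that recomputes the mean AP for both models and lists the classes where the proposed model scores lower than YOLOv8. The class names are English renderings of the table rows, and the variable names (`dior_ap`, `regressions`, `largest_gain`) are illustrative assumptions, not names from the paper.

```python
# Per-class AP (%) copied from the table above: (YOLOv8, proposed model).
dior_ap = {
    "Expressway service area": (64.62, 64.13),
    "Expressway toll station": (59.92, 62.03),
    "Airplane": (90.02, 92.77),
    "Airport": (82.91, 85.49),
    "Baseball field": (78.58, 78.47),
    "Basketball court": (90.80, 91.95),
    "Bridge": (50.30, 51.98),
    "Chimney": (80.69, 83.16),
    "Dam": (63.82, 65.90),
    "Golf course": (81.03, 80.58),
    "Ground track field": (80.43, 79.91),
    "Harbor": (65.77, 67.30),
    "Overpass": (61.81, 63.88),
    "Ship": (89.94, 92.32),
    "Stadium": (73.27, 73.23),
    "Storage tank": (79.11, 81.72),
    "Tennis court": (91.23, 90.83),
    "Train station": (65.85, 65.54),
    "Vehicle": (53.92, 56.81),
    "Windmill": (84.70, 87.18),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Mean AP over all classes for each model.
baseline_map = mean(b for b, _ in dior_ap.values())
proposed_map = mean(p for _, p in dior_ap.values())

# Per-class change in percentage points (proposed minus YOLOv8).
deltas = {name: p - b for name, (b, p) in dior_ap.items()}
regressions = sorted(name for name, d in deltas.items() if d < 0)
largest_gain = max(deltas, key=deltas.get)

print(f"YOLOv8 mean AP:   {baseline_map:.2f}")
print(f"Proposed mean AP: {proposed_map:.2f}")
print(f"Classes with lower AP than YOLOv8: {regressions}")
print(f"Largest gain: {largest_gain} ({deltas[largest_gain]:+.2f} points)")
```

The recomputed means agree with the table's "Mean" row to two decimals, and the per-class view shows that the overall gain is not uniform: several large-object categories score slightly lower, while small-object categories such as vehicles gain the most.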

Object category (RSOD)	YOLOv8	Proposed model
Mean	96.10	98.11
Oil tank	98.25	99.23
Aircraft	91.51	94.96
Overpass	94.83	98.31
Playground	99.80	99.93

Model (DIOR)	Params(M)	Frame rate/(frames/s)	mAP₅₀/%	AP_S/%	AP_M/%	AP_L/%
Faster R-CNN^[3]	28.50	6.1	63.10	6.5	32.3	57.6
CenterNet^[32]	32.70	19.3	56.05	5.4	25.2	51.4
YOLOv3^[6]	5.50	69.8	57.10	6.8	25.5	48.1
YOLOv4^[7]	5.90	66.9	61.01	6.7	31.3	50.5
YOLOv5	7.10	50.2	66.97	11.1	37.4	62.0
YOLOX^[8]	5.04	56.1	69.79	11.3	35.3	62.7
YOLOv7^[10]	6.10	66.3	72.83	12.3	38.9	69.1
CF2PN^[33]	91.60	19.7	67.25	11.3	36.0	61.4
DEA-Net^[34]	59.90	12.5	69.64	11.9	35.5	61.7
MSA R-CNN^[35]	—	—	74.37	12.8	40.6	72.4
YOLOv8	11.10	80.7	74.44	12.7	40.8	72.6
Proposed model	29.80	74.2	75.76	13.9	41.6	72.1
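
The comparison above mixes three quantities that trade off against each other: parameter count, frame rate, and mAP₅₀. One way to read it is to ask which models are not beaten on both speed and accuracy by any other model. The sketch below is a minimal Python illustration of that reading using the values from the table; the `Result` type and the `dominates` helper are hypothetical names introduced here, and MSA R-CNN is left out because the table gives no parameter count or frame rate for it.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    params_m: float  # parameters, in millions
    fps: float       # frames per second
    map50: float     # mAP50, in percent


# Rows copied from the comparison table (MSA R-CNN omitted: no Params/FPS given).
results = [
    Result("Faster R-CNN", 28.50, 6.1, 63.10),
    Result("CenterNet", 32.70, 19.3, 56.05),
    Result("YOLOv3", 5.50, 69.8, 57.10),
    Result("YOLOv4", 5.90, 66.9, 61.01),
    Result("YOLOv5", 7.10, 50.2, 66.97),
    Result("YOLOX", 5.04, 56.1, 69.79),
    Result("YOLOv7", 6.10, 66.3, 72.83),
    Result("CF2PN", 91.60, 19.7, 67.25),
    Result("DEA-Net", 59.90, 12.5, 69.64),
    Result("YOLOv8", 11.10, 80.7, 74.44),
    Result("Proposed model", 29.80, 74.2, 75.76),
]


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as fast and as accurate as b, and strictly better on one."""
    no_worse = a.fps >= b.fps and a.map50 >= b.map50
    strictly_better = a.fps > b.fps or a.map50 > b.map50
    return no_worse and strictly_better


# Models that no other model beats on both frame rate and mAP50.
pareto_front = [r.name for r in results if not any(dominates(o, r) for o in results)]

by_name = {r.name: r for r in results}
base, ours = by_name["YOLOv8"], by_name["Proposed model"]
params_ratio = ours.params_m / base.params_m
fps_drop_pct = 100 * (base.fps - ours.fps) / base.fps
map_gain = ours.map50 - base.map50

print("Speed/accuracy Pareto front:", pareto_front)
print(f"Params ratio vs YOLOv8: {params_ratio:.2f}x")
print(f"Frame-rate drop vs YOLOv8: {fps_drop_pct:.1f}%")
print(f"mAP50 gain vs YOLOv8: {map_gain:+.2f} points")
```

On these numbers only YOLOv8 and the proposed model remain on the speed/accuracy front, so the practical choice between them comes down to whether the accuracy gain justifies the larger parameter count and the lower frame rate.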

Model (RSOD)	Params(M)	Frame rate/(frames/s)	mAP₅₀/%	AP_S/%	AP_M/%	AP_L/%
Faster R-CNN	28.50	6.1	90.7	39.7	65.1	74.6
YOLOv4	5.90	66.9	86.7	38.9	63.3	73.5
CenterNet	32.70	19.3	85.6	37.7	62.6	72.4
YOLOv5	7.10	50.2	92.2	40.3	66.4	75.0
YOLOX	5.04	56.1	94.7	40.7	68.6	77.1
DEA-Net	59.90	12.5	93.1	40.5	67.9	76.7
YOLOv8	11.10	80.7	96.0	41.8	69.8	78.4
Proposed model	29.80	74.2	98.1	45.1	72.7	76.9

No.	CSCAP	CSEVC	EMCBAM	Params(M)	Frame rate/(frames/s)	mAP₅₀/%	AP_S/%	AP_M/%	AP_L/%
Ⅰ				11.1	80.7	96.0	41.8	69.8	78.4
Ⅱ	√			11.2	77.3	96.2	44.5	70.4	75.7
Ⅲ		√		29.2	75.2	97.1	44.1	72.2	78.1
Ⅳ			√	11.6	78.5	97.7	42.5	70.0	79.5
Ⅴ	√	√		29.3	74.7	97.2	44.5	72.4	76.6
Ⅵ	√		√	11.7	77.1	97.7	44.9	70.5	77.1
Ⅶ		√	√	29.7	74.9	97.9	44.6	72.3	78.0
Ⅷ	√	√	√	29.8	74.2	98.1	45.1	72.7	76.9
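
The ablation rows can be read as gains over the baseline (row Ⅰ, no module enabled). The following minimal Python sketch computes the mAP₅₀ gain of each configuration and compares the sum of the three single-module gains with the gain of the full model. The dictionary layout and variable names are illustrative assumptions; only the numbers come from the table.

```python
# mAP50 (%) from the ablation table, keyed by the set of enabled modules.
ablation_map50 = {
    frozenset(): 96.0,                               # row I (baseline)
    frozenset({"CSCAP"}): 96.2,                      # row II
    frozenset({"CSEVC"}): 97.1,                      # row III
    frozenset({"EMCBAM"}): 97.7,                     # row IV
    frozenset({"CSCAP", "CSEVC"}): 97.2,             # row V
    frozenset({"CSCAP", "EMCBAM"}): 97.7,            # row VI
    frozenset({"CSEVC", "EMCBAM"}): 97.9,            # row VII
    frozenset({"CSCAP", "CSEVC", "EMCBAM"}): 98.1,   # row VIII (full model)
}

baseline = ablation_map50[frozenset()]

# Gain over the baseline in percentage points, rounded to the table's precision.
gains = {
    modules: round(score - baseline, 1)
    for modules, score in ablation_map50.items()
    if modules
}

# Sum of the gains obtained by enabling each module on its own.
single_sum = sum(g for modules, g in gains.items() if len(modules) == 1)
full_gain = gains[frozenset({"CSCAP", "CSEVC", "EMCBAM"})]

for modules, gain in sorted(gains.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{'+'.join(sorted(modules)):<20} {gain:+.1f}")
print(f"Sum of single-module gains: {single_sum:.1f}")
print(f"Gain of the full model:     {full_gain:.1f}")
```

The full model's gain is smaller than the sum of the single-module gains, which suggests the three modules partly address the same errors rather than contributing independently.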