Object Detection of Lightweight Transformer Based on Knowledge Distillation

doi:10.16182/j.issn1004731x.joss.24-0754

Abstract

In autonomous driving, both the efficiency and the accuracy of object detection are critical. Transformer-based object detection has gradually become a mainstream approach because it eliminates complex anchor generation and non-maximum suppression (NMS), but it suffers from high computational cost and slow convergence. To address these problems, an object detection model based on a lightweight pooling Transformer (LPT) is designed, which comprises a pooling backbone network and a dual pooling attention mechanism. A general knowledge distillation method is also proposed for DETR (detection Transformer) models; it transfers the teacher's prediction results, query vectors, and extracted features as knowledge to the LPT model to improve its accuracy. To assess the application potential of the distilled LPT model in autonomous driving, extensive experiments are conducted on the MS COCO 2017 dataset. The results show that the method is both efficient and accurate, and that it is competitive with some advanced detectors.
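
The three kinds of knowledge named in the abstract (prediction results, query vectors, and extracted features) can be combined into a single training loss. The following is a minimal sketch under stated assumptions, not the authors' formulation: it assumes student and teacher queries are already aligned by index and have equal dimensions, uses temperature-softened KL divergence for the class predictions and mean squared error for queries and features, and omits box regression entirely. The function names, the dictionary keys, and the weights `w_pred`, `w_query`, and `w_feat` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student, teacher, temperature=2.0,
                      w_pred=1.0, w_query=1.0, w_feat=1.0):
    """Weighted sum of prediction, query, and feature distillation terms.

    Assumption: queries are matched by index, which a real DETR-style
    pipeline would have to establish before calling this function.
    """
    num_queries = len(teacher["logits"])
    # Prediction term: teacher soft labels guide the student, per query.
    pred = sum(
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for s, t in zip(student["logits"], teacher["logits"])
    ) * temperature ** 2 / num_queries
    # Query term: pull each student query vector toward the teacher's.
    query = sum(
        mse(s, t) for s, t in zip(student["queries"], teacher["queries"])
    ) / num_queries
    # Feature term: match the (flattened) backbone or encoder features.
    feat = mse(student["features"], teacher["features"])
    return w_pred * pred + w_query * query + w_feat * feat


# Toy data: 2 queries, 3 classes, 4-dimensional queries, 6 feature values.
teacher_out = {
    "logits": [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]],
    "queries": [[0.2, -0.1, 0.4, 0.0], [1.0, 0.3, -0.5, 0.7]],
    "features": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
}
student_out = {
    "logits": [[1.0, 0.8, -0.2], [0.4, 0.9, 0.6]],
    "queries": [[0.1, 0.0, 0.3, 0.1], [0.8, 0.2, -0.3, 0.5]],
    "features": [0.0, 0.3, 0.2, 0.5, 0.4, 0.7],
}

loss_same = distillation_loss(teacher_out, teacher_out)
loss_diff = distillation_loss(student_out, teacher_out)
print(f"identical outputs: {loss_same:.4f}, differing outputs: {loss_diff:.4f}")
```

In practice such a term would be added to the ordinary detection loss, and the three weights would be tuned; the equal weights here are placeholders.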

Keywords: object detection, knowledge distillation, lightweight, DETR (detection Transformer), Transformer, autonomous driving

CLC Number: TP391.9

Wang Gaihua, Li Kehong, Long Qian, Yao Jingxuan, Zhu Bolun, Zhou Zhengshu, Pan Xuran. Object Detection of Lightweight Transformer Based on Knowledge Distillation[J]. Journal of System Simulation, 2024, 36(11): 2517-2527.

Model	Pooling Backbone	HPT	Params/M	Weights/MB	Frame rate/(frames/s)	AP
RT-DETR	-	-	42.944	164	19.99	47.0
LPTv1	√	-	40.136	153	21.13	45.8
LPTv2	-	√	44.622	170	20.92	46.2
LPTv3	√	√	41.814	159	22.05	45.7

Model	Role	Backbone	AP	AP_S	AP_M	AP_L
Deformable DETR	Teacher	ResNet-101	45.5	27.5	48.7	60.3
	Student (undistilled)	ResNet-50	44.1	27.0	47.4	58.3
	Student (distilled)	ResNet-50	46.6	28.5	48.6	61.0
Conditional DETR	Teacher	ResNet-101	42.4	22.6	46.0	61.2
	Student (undistilled)	ResNet-50	40.7	20.3	43.8	60.0
	Student (distilled)	ResNet-50	42.9	21.6	46.5	62.2
LPT	Teacher	HGNetV2	48.1	29.3	51.9	66.4
	Student (undistilled)	Pooling backbone	45.7	27.8	49.0	63.8
	Student (distilled)	Pooling backbone	48.3	28.9	49.7	65.5
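
The effect of distillation in the table above is easiest to read as differences. The short sketch below copies the overall AP values from the table and computes, for each detector, the gain of the distilled student over the undistilled one and its margin over the teacher. The dictionary layout and the helper name `distillation_gains` are illustrative choices, not part of the paper.

```python
# (teacher AP, undistilled student AP, distilled student AP), from the table above.
results = {
    "Deformable DETR": (45.5, 44.1, 46.6),
    "Conditional DETR": (42.4, 40.7, 42.9),
    "LPT": (48.1, 45.7, 48.3),
}


def distillation_gains(teacher_ap, student_ap, distilled_ap):
    """Return AP differences rounded to the table's one-decimal precision."""
    return {
        "gain_over_undistilled": round(distilled_ap - student_ap, 1),
        "margin_over_teacher": round(distilled_ap - teacher_ap, 1),
    }


summary = {name: distillation_gains(*aps) for name, aps in results.items()}

for name, row in summary.items():
    print(f"{name}: +{row['gain_over_undistilled']} AP vs. undistilled, "
          f"{row['margin_over_teacher']:+} AP vs. teacher")
```

On these numbers, distillation adds 2.5, 2.2, and 2.6 AP respectively, and in all three cases the distilled student's overall AP slightly exceeds the teacher's.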

Model	Params/M	Computational complexity/G	Frame rate/(frames/s)	AP	AP₅₀	AP₇₅	AP_S	AP_M	AP_L
DETR	41.580	86.556	14.80	15.5	29.4	14.5	4.3	15.1	26.7
DAB-DETR	43.722	90.740	10.80	38.0	60.3	39.8	19.2	40.9	55.4
RT-DETR	42.940	69.157	19.99	47.0	64.6	50.8	28.5	51.1	65.2
Ours (undistilled)	41.814	60.949	22.02	45.7	63.4	48.9	27.8	49.0	63.8
Ours (distilled)	41.814	60.949	22.02	48.3	64.4	51.2	28.9	49.7	65.5
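
To make the efficiency claim concrete, the sketch below compares the distilled model with RT-DETR using only the values in the table above. The key names are illustrative, and reading the complexity unit "G" as GFLOPs is an assumption, since the table does not spell it out.

```python
# Rows copied from the comparison table; "complexity_g" is assumed to be GFLOPs.
rt_detr = {"params_m": 42.940, "complexity_g": 69.157, "fps": 19.99, "ap": 47.0}
ours = {"params_m": 41.814, "complexity_g": 60.949, "fps": 22.02, "ap": 48.3}


def relative_change(new, old):
    """Percentage change of `new` relative to `old` (negative means smaller)."""
    return (new - old) / old * 100.0


comparison = {
    "params_pct": relative_change(ours["params_m"], rt_detr["params_m"]),
    "complexity_pct": relative_change(ours["complexity_g"], rt_detr["complexity_g"]),
    "fps_pct": relative_change(ours["fps"], rt_detr["fps"]),
    "ap_delta": ours["ap"] - rt_detr["ap"],  # absolute AP points
}

for key, value in comparison.items():
    print(f"{key}: {value:+.2f}")
```

Relative to RT-DETR, the distilled model has about 2.6% fewer parameters, about 11.9% lower computational complexity, and about 10.2% higher frame rate, while gaining 1.3 AP.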
