Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (11): 2616-2630. doi: 10.16182/j.issn1004731x.joss.23-0926

Global-local Fusion for Efficient 3D Object Detection

Lu Bin1,2, Wang Minghan1,2, Sun Yang1,2, Yang Zhenyu1,2   

  1. Department of Computer, North China Electric Power University, Baoding 071003, China
  2. Key Laboratory of Energy and Electric Power Knowledge Calculation in Hebei Province, Baoding 071003, China
  • Received: 2023-07-21 Revised: 2023-09-12 Online: 2024-11-13 Published: 2024-11-19
  • Contact: Wang Minghan

Abstract:

Point-cloud-based 3D object detection suffers from insufficient feature extraction and from inconsistency between classification and regression. To address both problems, this research proposes ResCST, a novel architecture built on the SECOND network. Residual connections are incorporated into the 3D sparse convolutional layer, and a CNN-SwinTransformer hybrid model is proposed for enhanced feature extraction, combining the ability of SwinTransformer to capture long-range dependencies with the ability of convolutional neural networks to extract local features. The RCIoU method is introduced to jointly optimize the classification and regression tasks. Experimental results on the KITTI dataset show that the model achieves 3D detection accuracies of 91.21%, 82.97%, and 80.28% for cars at the easy, moderate, and hard levels, respectively. The proposed method significantly improves the detection of hard-level targets while running at an inference speed of 25 frames per second, so the ResCST architecture achieves a good balance between accuracy and efficiency.
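
The abstract does not define RCIoU, so the following is only a minimal sketch of the plain 3D intersection-over-union quantity that IoU-based detection metrics and losses typically build on. It assumes axis-aligned boxes and ignores the yaw angle used by KITTI boxes; the function name `iou_3d_axis_aligned` and the box layout are hypothetical and are not taken from the paper.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """Plain IoU of two axis-aligned 3D boxes (illustrative, not RCIoU).

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max).
    Assumption: boxes are axis-aligned; rotation (yaw) is ignored.
    """
    # Intersection volume: product of the overlap length along each axis.
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis, so no intersection
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union


if __name__ == "__main__":
    predicted = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 1.0, 3.0, 3.0, 3.0)
    print(iou_3d_axis_aligned(predicted, ground_truth))
```

A regression branch that predicts a box and a classification branch that scores it can disagree when a high-scoring box overlaps the ground truth poorly; an overlap measure of this kind is one common way to relate the two, though how RCIoU does so is not stated here.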

Key words: 3D object detection, point cloud, feature fusion, attention mechanism, vehicle detection, voxelization, autonomous driving

CLC Number: