Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (1): 73-83. doi: 10.16182/j.issn1004731x.joss.25-0858

• Papers

PL-Mamba: A 3D Point Cloud Semantic Segmentation Network Based on Bimodal Fusion

Zhu He1, Zhou Feng1, Zhang Qi1, Zhu Mengxiao1, Dai Ju2   

  1. North China University of Technology, Beijing 100044, China
  2. Peng Cheng Laboratory, Shenzhen 518055, China
  • Received: 2025-09-06 Revised: 2025-10-22 Online: 2026-01-18 Published: 2026-01-28
  • Contact: Zhou Feng

Abstract:

To enhance semantic discrimination in point cloud semantic segmentation, this paper proposes PL-Mamba, a 3D point cloud semantic segmentation network built around the fusion of two modalities: point cloud (P) and language (L). The method uses PointMamba as its backbone network to exploit its long-sequence modeling and global perception capabilities. It introduces a language prompt mechanism in which the pretrained language model BERT encodes the context of the category labels to produce semantically rich text features. These text features serve as language-guided tokens and are fused with the point cloud features through a cross-modal attention mechanism, achieving semantic alignment and region enhancement and alleviating the weak semantic expressiveness and severe category confusion of point cloud features alone. Experiments on ScanNet, a large-scale indoor point cloud segmentation dataset, show that PL-Mamba achieves 78.21% mIoU, which is 0.21 percentage points higher than the baseline BFANet (78.00%) and also better than Mamba-based models such as FEAST-Mamba (77.80%).
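
The cross-modal attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, uses point features as queries and label text features as keys and values, and fuses by a residual add. The function name `cross_modal_attention` and the toy vectors are hypothetical; in practice the text features would come from BERT and the point features from the PointMamba backbone.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(point_feats, text_feats):
    """Fuse each point feature with an attention-weighted mix of text features.

    point_feats: N vectors used as queries (one per point token).
    text_feats:  C vectors used as keys and values (one per category label).
    Returns (fused, weights): N fused vectors and an N x C weight matrix.
    Assumption: no learned projections; fusion is a residual add.
    """
    dim = len(point_feats[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for q in point_feats:
        # Scaled dot-product similarity between this point and every label.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in text_feats]
        w = softmax(scores)
        # Attention-weighted combination of the label features.
        context = [sum(w[c] * text_feats[c][d] for c in range(len(text_feats)))
                   for d in range(dim)]
        # Residual fusion: add language context to the point feature.
        fused.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(w)
    return fused, weights

# Toy inputs (hypothetical): 2 point tokens, 3 label embeddings, dimension 4.
points = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
labels = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]]
fused, weights = cross_modal_attention(points, labels)
```

In this toy setup each point attends most strongly to the label whose embedding it is aligned with, which is the intuition behind using label text to reduce category confusion. A trained model would add learned query, key, and value projections and multiple heads.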

Key words: computer vision, semantic segmentation, point cloud, dual modalities, object detection

CLC Number: