Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (1): 73-83. doi: 10.16182/j.issn1004731x.joss.25-0858

• Papers •

PL-Mamba: A 3D Point Cloud Semantic Segmentation Network Based on Bimodal Fusion

Zhu He1, Zhou Feng1, Zhang Qi1, Zhu Mengxiao1, Dai Ju2

  1. North China University of Technology, Beijing 100044, China
  2. Peng Cheng Laboratory, Shenzhen 518055, China
  • Received: 2025-09-06 Revised: 2025-10-22 Online: 2026-01-18 Published: 2026-01-28
  • Contact: Zhou Feng
  • About the first author: Zhu He (b. 2001), male, master's student; research interest: computer vision.
  • Supported by:
    Beijing Natural Science Foundation (4232023); General Project of the Science and Technology Program of the Beijing Municipal Education Commission (KM202310009002); North China University of Technology Yuxiu Innovation Project (2024NCUTYXCX202); North China University of Technology 2025 Youth Research Special Project (2025NCUTYRSP019); Ministry of Education Humanities and Social Sciences Research Project "Intelligent Display and Protection of Intangible Cultural Heritage Dance Based on XR Technology" (24YJCZH458)

Abstract:

To improve semantic discrimination in point cloud semantic segmentation, this paper proposes PL-Mamba, a 3D point cloud semantic segmentation network built around the fusion of two modalities: point cloud (P) and language (L). The method uses PointMamba as its backbone, leveraging its long-sequence modeling and global perception capabilities, and introduces a language prompt mechanism in which the pretrained language model BERT contextually encodes the category labels to obtain semantically rich text features. These text features serve as language-guided tokens and are deeply fused with the point cloud features through a cross-modal attention mechanism. The fusion achieves semantic alignment and region enhancement, alleviating the weak semantic expressiveness and severe category confusion of point clouds alone. Experiments on ScanNet, a large-scale indoor point cloud segmentation dataset, show that PL-Mamba achieves 78.21% mIoU, which is 0.21 percentage points higher than the strong baseline BFANet (78.00%) and also better than Mamba-based models such as FEAST-Mamba (77.80%).

Key words: computer vision, semantic segmentation, point cloud, dual modalities, object detection
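
The cross-modal fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single attention head with no learned projections, uses point features as queries and class-label text tokens as keys and values, and substitutes hand-written toy vectors for BERT embeddings and PointMamba features. The function name `cross_modal_attention` and all variable names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(point_feats, text_tokens):
    """Single-head cross-attention from points (queries) to text (keys/values).

    point_feats: N point feature vectors, each of length d (assumed inputs).
    text_tokens: C class-label embeddings, each of length d (assumed inputs).
    Returns (fused, weights): fused is N x d, weights is N x C.
    """
    d = len(point_feats[0])
    scale = 1.0 / math.sqrt(d)
    fused, weights = [], []
    for q in point_feats:
        # Scaled dot-product score between this point and every text token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in text_tokens]
        attn = softmax(scores)
        # Attention-weighted sum of the text tokens.
        context = [sum(a * v[j] for a, v in zip(attn, text_tokens)) for j in range(d)]
        # Residual connection: the point feature is enriched, not replaced.
        fused.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(attn)
    return fused, weights


# Toy stand-ins: three "class" tokens and two points in a 4-dimensional space.
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
point_feats = [[4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]]
fused, weights = cross_modal_attention(point_feats, text_tokens)
```

In this toy setup each point attends most strongly to the text token it is aligned with, and the residual sum pulls its feature further toward that class direction. A real model would add learned query, key, and value projections, multiple heads, and normalization.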
