Journal of System Simulation, 2024, Vol. 36, Issue (4): 1028-1042. doi: 10.16182/j.issn1004731x.joss.22-1332


Research on Dynamic Scene SLAM Based on Improved Object Detection

Shi Lanxi¹, Yan Wenxu¹, Ni Hongyu², Zhao Feng²

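  1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214100, Jiangsu, China
  2. State Grid Shaoxing Power Supply Company, Shaoxing 312000, Zhejiang, China
  • Received: 2022-11-09  Revised: 2023-01-06  Online: 2024-04-15  Published: 2024-04-18
  • Corresponding author: Yan Wenxu  E-mail: 2066760176@qq.com; ywx01@jiangnan.edu.cn
  • About the first author: Shi Lanxi (1997-), female, master's student; research interest: visual SLAM. E-mail: 2066760176@qq.com
  • Funding: Science and Technology Project of State Grid Zhejiang Electric Power Co., Ltd. (5211SX220003)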


Abstract:

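To address the epipolar-constraint mismatches that monocular SLAM suffers in dynamic scenes, a dynamic feature point selection method based on object detection is proposed. It removes dynamic feature points from the image frames of the SLAM front end during feature extraction, thereby improving SLAM localization accuracy. An improved object detection network is proposed whose bounding-box regression loss is built from overlap area, distance similarity and cosine similarity, so that objects are localized accurately and the region occupied by each object's feature points in the current image frame is obtained. The category of each object is then determined, and for objects labeled as dynamic, the dynamic feature points in the front-end image frame are removed according to the detection results. Using the remaining static feature points, features are matched between two frames under the epipolar constraint to estimate pose, and tracking, mapping and loop closure detection are performed for the moving monocular camera. The backbone of the detection network is improved by structural reparameterization, which speeds up inference and keeps the overall system real-time. Experiments on 11 sequences of the public KITTI dataset show that the improved system is 23.4% more accurate in localization than ORB-SLAM3 and runs at more than 30 frames/s, effectively improving the localization accuracy of monocular SLAM in dynamic scenes while maintaining real-time operation.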
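As a concrete illustration of the filtering step, here is a minimal sketch (not the authors' code) that keeps only keypoints lying outside the bounding boxes of dynamic-class detections. It makes several assumptions. Detections are assumed to arrive as axis-aligned boxes `(x1, y1, x2, y2)` in pixel coordinates, each with a class label. The names `Detection`, `filter_dynamic_keypoints` and `DYNAMIC_CLASSES` are hypothetical, and the abstract does not state which classes are marked dynamic.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Sequence, Tuple

# Hypothetical set of class labels treated as dynamic; the abstract does not list them.
DYNAMIC_CLASSES: AbstractSet[str] = frozenset({"person", "car", "cyclist"})


@dataclass
class Detection:
    """One detector output: class label and axis-aligned box in pixel coordinates."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, u: float, v: float) -> bool:
        """True if pixel (u, v) lies inside the box, borders included."""
        return self.x1 <= u <= self.x2 and self.y1 <= v <= self.y2


def filter_dynamic_keypoints(
    keypoints: Sequence[Tuple[float, float]],
    detections: Iterable[Detection],
    dynamic_classes: AbstractSet[str] = DYNAMIC_CLASSES,
) -> List[int]:
    """Return the indices of keypoints that fall outside every dynamic-class box."""
    dynamic_boxes = [d for d in detections if d.label in dynamic_classes]
    return [
        i
        for i, (u, v) in enumerate(keypoints)
        if not any(box.contains(u, v) for box in dynamic_boxes)
    ]


if __name__ == "__main__":
    kps = [(10, 10), (50, 60), (200, 120)]
    dets = [Detection("car", 40, 40, 80, 80), Detection("traffic light", 190, 100, 210, 140)]
    print(filter_dynamic_keypoints(kps, dets))  # [0, 2]: the point inside the car box is dropped
```

The function returns indices rather than coordinates, so the caller can drop the matching descriptors with the same index list before matching. A box-level mask like this also discards static background points that happen to fall inside a dynamic object's box, which is a trade-off of this simple rule.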

Keywords: visual SLAM, epipolar constraint, feature matching, object detection, IoU loss function, structural reparameterization
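The abstract names three ingredients of the bounding-box regression loss (overlap area, distance similarity and cosine similarity) but does not give the exact formula. The sketch below is one possible reading. It assumes the distance term is the DIoU-style squared centre distance normalized by the enclosing box's diagonal. It assumes the cosine term is the cosine similarity between the predicted and ground-truth (width, height) vectors. It also assumes the three terms are simply summed. All of these choices are assumptions for illustration. A training implementation would use a differentiable tensor library rather than Python floats.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Overlap-area term: intersection over union."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def center_distance(a: Box, b: Box) -> float:
    """DIoU-style distance term: squared centre distance / squared enclosing-box diagonal."""
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return (dx * dx + dy * dy) / (cw * cw + ch * ch)


def shape_cosine(a: Box, b: Box) -> float:
    """ASSUMED cosine term: cosine similarity of the (width, height) vectors."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    return (wa * wb + ha * hb) / (math.hypot(wa, ha) * math.hypot(wb, hb))


def box_regression_loss(pred: Box, gt: Box) -> float:
    """Hypothetical combination: (1 - IoU) + distance penalty + (1 - cosine similarity)."""
    return (1.0 - iou(pred, gt)) + center_distance(pred, gt) + (1.0 - shape_cosine(pred, gt))


if __name__ == "__main__":
    print(box_regression_loss((0, 0, 2, 2), (4, 0, 6, 2)))  # about 1.4: no overlap, distance term 0.4
    print(box_regression_loss((0, 0, 4, 2), (1, -1, 3, 3)))  # about 0.867: same centre, swapped aspect ratio
```

In the second example both boxes share a centre, so the distance term is zero. Swapping width and height still lowers both the IoU and the cosine similarity. The cosine term therefore penalizes aspect-ratio mismatch independently of box scale.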

Abstract:

Aiming at the epipolar-constraint mismatching problem of monocular SLAM in dynamic scenes, a dynamic feature point selection method based on object detection is proposed, in which the dynamic feature points in the front-end image frames of the SLAM system are eliminated during feature extraction to improve SLAM localization accuracy. An improved object detection network is proposed whose bounding-box regression loss is constructed from the overlap area, distance similarity and cosine similarity; this achieves accurate localization of target objects and yields the range of object feature points in the current image frame. The object category is then judged, and for objects marked as dynamic, the dynamic feature points in the front-end image frame are rejected according to the detection result. Based on the remaining static feature points, epipolar geometry is used for feature matching between two frames to estimate the pose and to carry out tracking, mapping and loop closure detection of the monocular camera motion. The inference speed is improved by structural reparameterization of the detection network's backbone, ensuring real-time operation of the overall system. Experimental results on 11 sequences of the KITTI dataset show that the improved system improves localization accuracy by 23.4% over ORB-SLAM3, with a frame rate above 30 fps. The algorithm thus effectively improves the localization accuracy of monocular SLAM in dynamic scenes while ensuring real-time operation.
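The epipolar constraint that the front end relies on can be checked numerically. In normalized image coordinates, a static correspondence satisfies `x2^T E x1 = 0` with `E = [t]_x R`. A point on a moving object generally violates this constraint, which is why unfiltered dynamic points corrupt matching and pose estimation. The sketch below uses pure-Python 3x3 algebra and a synthetic rotation, translation and 3D point (all made-up values). It computes the residual for a static point and for the same point after it moves. It illustrates the constraint only and is not the paper's pose estimator.

```python
import math
from typing import List, Sequence, Tuple

Mat = List[List[float]]


def skew(t: Sequence[float]) -> Mat:
    """Cross-product matrix [t]_x such that [t]_x v == t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat, v: Sequence[float]) -> List[float]:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(R: Mat, t: Sequence[float]) -> Mat:
    """E = [t]_x R for the convention X2 = R X1 + t (camera-1 frame to camera-2 frame)."""
    return matmul(skew(t), R)


def normalized(X: Sequence[float]) -> List[float]:
    """Normalized image coordinates (X/Z, Y/Z, 1) of a camera-frame 3D point."""
    return [X[0] / X[2], X[1] / X[2], 1.0]


def epipolar_residual(E: Mat, x1: Sequence[float], x2: Sequence[float]) -> float:
    """Algebraic epipolar error x2^T E x1; zero for an exact static correspondence."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))


def demo() -> Tuple[float, float]:
    """Residuals for a static point and for the same point after it moves between frames."""
    a = 0.1  # small rotation about the camera y axis (radians), made-up value
    R = [[math.cos(a), 0.0, math.sin(a)], [0.0, 1.0, 0.0], [-math.sin(a), 0.0, math.cos(a)]]
    t = [0.5, 0.0, 1.0]
    E = essential_matrix(R, t)
    X1 = [1.0, 2.0, 10.0]
    X2 = [p + q for p, q in zip(matvec(R, X1), t)]  # where a static point reappears
    X2_moved = [X2[0], X2[1] + 0.5, X2[2]]  # the same point after moving 0.5 units along y
    return (epipolar_residual(E, normalized(X1), normalized(X2)),
            epipolar_residual(E, normalized(X1), normalized(X2_moved)))


if __name__ == "__main__":
    print(demo())  # first value is floating-point noise, second is clearly non-zero
```

In practice, E or F is estimated robustly from many matches (for example with RANSAC), and matches with large residuals are treated as outliers. Removing detected dynamic points beforehand reduces the number of such outliers the estimator has to reject.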

Key words: visual SLAM, epipolar geometry, feature matching, object detection, IoU loss function, structural reparameterization
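Structural reparameterization can be illustrated with a RepVGG-style fusion. This is an assumption: the abstract does not say which reparameterization scheme the improved backbone uses. In this scheme, a training-time block has parallel branches of 3x3 conv + BN, 1x1 conv + BN and identity + BN. At inference, each BN is folded into its convolution, and the three branches are merged into one 3x3 convolution with bias. The pure-Python sketch below (hypothetical helper names, no framework) builds both forms and checks that they agree on random data.

```python
import math
import random
from typing import List, Tuple

Tensor = List[List[List[float]]]        # feature map [C][H][W]
Kernel = List[List[List[List[float]]]]  # conv weight [C_out][C_in][k][k]
BN = Tuple[List[float], List[float], List[float], List[float], float]  # gamma, beta, mean, var, eps


def conv2d(x: Tensor, w: Kernel, b: List[float]) -> Tensor:
    """Stride-1 cross-correlation with zero 'same' padding (odd kernel size k)."""
    h, wd, k = len(x[0]), len(x[0][0]), len(w[0][0])
    p = k // 2
    out = []
    for o in range(len(w)):
        plane = [[b[o]] * wd for _ in range(h)]
        for i in range(len(x)):
            for r in range(h):
                for c in range(wd):
                    for dr in range(k):
                        for dc in range(k):
                            rr, cc = r + dr - p, c + dc - p
                            if 0 <= rr < h and 0 <= cc < wd:
                                plane[r][c] += w[o][i][dr][dc] * x[i][rr][cc]
        out.append(plane)
    return out


def batch_norm(x: Tensor, bn: BN) -> Tensor:
    """Inference-mode BN per channel: gamma * (x - mean) / sqrt(var + eps) + beta."""
    g, beta, mean, var, eps = bn
    return [[[g[o] * (v - mean[o]) / math.sqrt(var[o] + eps) + beta[o] for v in row]
             for row in x[o]] for o in range(len(x))]


def fuse_conv_bn(w: Kernel, bn: BN) -> Tuple[Kernel, List[float]]:
    """Fold BN into a bias-free conv: w' = s * w, b' = beta - s * mean, s = gamma / std."""
    g, beta, mean, var, eps = bn
    s = [g[o] / math.sqrt(var[o] + eps) for o in range(len(w))]
    w_f = [[[[v * s[o] for v in row] for row in ch] for ch in w[o]] for o in range(len(w))]
    return w_f, [beta[o] - mean[o] * s[o] for o in range(len(w))]


def pad_1x1_to_3x3(w: Kernel) -> Kernel:
    """Place each 1x1 weight at the centre of an otherwise-zero 3x3 kernel."""
    return [[[[ch[0][0] if (r, c) == (1, 1) else 0.0 for c in range(3)] for r in range(3)]
             for ch in w_o] for w_o in w]


def identity_1x1(channels: int) -> Kernel:
    """1x1 kernel that reproduces the identity branch (requires C_in == C_out)."""
    return [[[[1.0 if i == o else 0.0]] for i in range(channels)] for o in range(channels)]


def reparameterize(w3: Kernel, bn3: BN, w1: Kernel, bn1: BN, bn_id: BN) -> Tuple[Kernel, List[float]]:
    """Merge the 3x3+BN, 1x1+BN and identity+BN branches into a single 3x3 conv with bias."""
    branches = [fuse_conv_bn(w3, bn3), fuse_conv_bn(pad_1x1_to_3x3(w1), bn1),
                fuse_conv_bn(pad_1x1_to_3x3(identity_1x1(len(w3))), bn_id)]
    c_out, c_in = len(w3), len(w3[0])
    kernel = [[[[sum(k[o][i][r][c] for k, _ in branches) for c in range(3)] for r in range(3)]
               for i in range(c_in)] for o in range(c_out)]
    return kernel, [sum(b[o] for _, b in branches) for o in range(c_out)]


def multi_branch(x: Tensor, w3: Kernel, bn3: BN, w1: Kernel, bn1: BN, bn_id: BN) -> Tensor:
    """Training-time block output: BN(conv3x3(x)) + BN(conv1x1(x)) + BN(x)."""
    zeros = [0.0] * len(w3)
    ys = [batch_norm(conv2d(x, w3, zeros), bn3), batch_norm(conv2d(x, w1, zeros), bn1),
          batch_norm(x, bn_id)]
    return [[[sum(y[o][r][c] for y in ys) for c in range(len(x[0][0]))] for r in range(len(x[0]))]
            for o in range(len(w3))]


def max_abs_diff(seed: int = 0, c: int = 2, h: int = 4, width: int = 5) -> float:
    """Largest output difference between the multi-branch and fused blocks on random data."""
    rng = random.Random(seed)

    def u() -> float:
        return rng.uniform(-1.0, 1.0)

    def bn() -> BN:
        return ([rng.uniform(0.5, 1.5) for _ in range(c)], [u() for _ in range(c)],
                [u() for _ in range(c)], [rng.uniform(0.5, 2.0) for _ in range(c)], 1e-5)

    x = [[[u() for _ in range(width)] for _ in range(h)] for _ in range(c)]
    w3 = [[[[u() for _ in range(3)] for _ in range(3)] for _ in range(c)] for _ in range(c)]
    w1 = [[[[u()]] for _ in range(c)] for _ in range(c)]
    bn3, bn1, bn_id = bn(), bn(), bn()
    ref = multi_branch(x, w3, bn3, w1, bn1, bn_id)
    fused = conv2d(x, *reparameterize(w3, bn3, w1, bn1, bn_id))
    return max(abs(p - q) for po, qo in zip(ref, fused) for pr, qr in zip(po, qo) for p, q in zip(pr, qr))
```

The fused block reproduces the three-branch output up to floating-point rounding but needs only a single convolution at inference. Removing the multi-branch overhead in this way is the kind of speed-up that structural reparameterization targets.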
