A Survey on Visual SLAM based on Deep Learning

doi:10.16182/j.issn1004731x.joss.19-VR0466

Abstract: With the development of computer vision and robotics, visual Simultaneous Localization and Mapping (SLAM) has become a research focus in the field of unmanned systems. The strengths of deep learning in image processing create a significant opportunity to combine the two fields. This survey summarizes notable research results that combine deep learning with visual odometry, loop closure detection, and semantic SLAM. It compares traditional algorithms with deep-learning-based methods and outlines the likely development directions of deep-learning-based visual SLAM.

Key words: deep learning, visual Simultaneous Localization and Mapping, visual odometry, loop closure detection, semantic Simultaneous Localization and Mapping

CLC Number: TP391.9

Liu Ruijun, Wang Xiangshang, Zhang Chen, Zhang Bohua. A Survey on Visual SLAM based on Deep Learning[J]. Journal of System Simulation, 2020, 32(7): 1244-1256.
