Abstract: Traditional visual SLAM systems have achieved remarkable success in robot localization and mapping, but pressing problems remain: maps lack scene information and are too sparse, and monocular cameras are difficult to initialize. In this paper, we propose MNS-SLAM (Monocular-Semantic SLAM), which combines an object detection algorithm with monocular visual SLAM (simultaneous localization and mapping) to construct semi-dense semantic maps that aid environmental understanding. First, bounding boxes and category labels are obtained from the YOLOv4 object detection network, and 3D cuboids and quadrics are recovered from the 2D detections via a vanishing-point algorithm and a quadric recovery algorithm, initializing the poses of 3D objects. Meanwhile, a semantic constraint of relative-pose invariance among objects is introduced: a semantic loss function is constructed and added to bundle adjustment (BA) optimization. Finally, a semi-dense map carrying object-level semantic information is built through incremental 3D line segment extraction. The proposed method is evaluated on the public TUM dataset and in real scenes; it not only constructs semi-dense maps but also adds semantic information that provides new constraints for back-end optimization. The camera's absolute and relative pose errors outperform those of monocular ORB-SLAM2, helping mobile robots equipped with monocular cameras perceive and understand the environment and perform more complex tasks.
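To make the relative-pose-invariance constraint concrete, the following is a minimal sketch of how such a semantic loss term could be formed before being added to BA. All names (`se3`, `relative_pose`, `semantic_loss`) are illustrative, not the paper's implementation, and a simple Frobenius-norm residual stands in for a proper SE(3) log-map error:

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_pose(T_a, T_b):
    """Relative transform from object a to object b: T_a^{-1} @ T_b."""
    return np.linalg.inv(T_a) @ T_b

def semantic_loss(poses, ref_relatives):
    """Penalize deviation of current pairwise relative object poses from
    their values at initialization (relative-pose invariance for static
    objects).  `poses` is a list of 4x4 object poses; `ref_relatives`
    maps an index pair (i, j) to the reference relative transform.
    NOTE: a squared Frobenius norm is used here as a simplified stand-in
    for an SE(3) geodesic residual."""
    loss = 0.0
    for (i, j), T_ref in ref_relatives.items():
        T_cur = relative_pose(poses[i], poses[j])
        loss += np.linalg.norm(T_cur - T_ref, ord="fro") ** 2
    return loss

# Two static objects: the reference relative pose is fixed at initialization.
T1 = se3(np.eye(3), np.array([0.0, 0.0, 0.0]))
T2 = se3(np.eye(3), np.array([1.0, 0.0, 0.0]))
refs = {(0, 1): relative_pose(T1, T2)}

# Unchanged estimates incur (near-)zero loss; a drifted estimate is penalized.
T2_drift = se3(np.eye(3), np.array([1.1, 0.0, 0.0]))
loss_ok = semantic_loss([T1, T2], refs)
loss_drift = semantic_loss([T1, T2_drift], refs)
```

In a full system this residual would be one term among the reprojection errors in the BA cost, so the optimizer trades off photometric/geometric consistency against keeping static objects rigidly arranged.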