
Continuous Frame Depth Estimation Based on Multi-scale Feature Mixed Attention Mechanism
ZHENG Yuhang, CAO Chuqing. Continuous Frame Depth Estimation Based on Multi-scale Feature Mixed Attention Mechanism[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2024, 0(4): 104-111
Authors: ZHENG Yuhang; CAO Chuqing
Affiliation: 1. School of Computer and Information, Anhui University of Engineering, Wuhu 241000, Anhui, China; 2. Yangtze River Delta HIT Robot Technology Research Institute, Wuhu 241000, Anhui, China
Abstract: Objective Estimating depth, i.e., the distance between the photographed object and the camera, is how depth information is obtained in monocular visual SLAM. To address the insufficient accuracy and large errors of unsupervised monocular depth estimation algorithms, a continuous-frame depth estimation network based on a hybrid attention mechanism with multi-scale feature fusion was proposed. Methods Depth information and 6-degree-of-freedom pose information were obtained by two encoder-decoder structures, for depth estimation and pose estimation respectively. The depth and pose information were used to reconstruct the image and compute a loss against the original image, from which the depth was output. The depth-estimation encoder-decoder formed a U-shaped network; the pose-estimation network shared the same encoder as the depth-estimation network and output pose information through its own decoder. In the encoder, a hybrid attention (CBAM) module combined with a ResNet backbone extracted feature maps at four different scales. To sharpen the contour details of the estimated depth, the features at each scale were assigned learnable weight coefficients to extract local and global features, which were then fused with the original features. Results The network was trained and evaluated for error and accuracy on the KITTI dataset, followed by testing. Compared with the classical monodepth2 monocular method, the absolute relative error, root-mean-square error, and log root-mean-square error were reduced by 0.034, 0.129, and 0.002, respectively, and self-made test images demonstrated the generalizability of the network.
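As a rough illustration (not the paper's code), the three error metrics reported above are the standard KITTI depth-evaluation metrics. A minimal NumPy sketch, with the hypothetical helper `depth_errors`:

```python
import numpy as np

def depth_errors(gt, pred):
    """Standard monocular-depth error metrics over valid pixels:
    absolute relative error, RMSE, and log RMSE."""
    gt, pred = np.asarray(gt, float), np.asarray(pred, float)
    mask = gt > 0                      # KITTI ground truth is sparse; 0 marks no LiDAR return
    gt, pred = gt[mask], pred[mask]
    abs_rel  = np.mean(np.abs(gt - pred) / gt)
    rmse     = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, rmse, rmse_log

gt   = np.array([10.0, 20.0, 0.0, 5.0])   # metres; the 0 is an invalid pixel
pred = np.array([11.0, 18.0, 7.0, 5.0])
print(depth_errors(gt, pred))
```

The invalid-pixel masking matters in practice: KITTI LiDAR depth covers only a fraction of each image, so metrics are averaged over valid pixels only.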
Conclusion Extracting multi-scale features with a ResNet network combined with a hybrid attention mechanism, and performing multi-scale feature fusion on the extracted features, improves depth estimation and sharpens contour details.
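A minimal NumPy sketch of the channel-plus-spatial (CBAM-style) hybrid attention applied in the encoder. This is an assumption-laden illustration, not the paper's implementation: the spatial branch's 7x7 convolution is simplified to a learned per-pixel mix of the mean and max maps, and all weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(x, w1, w2):
    """Channel attention: global avg- and max-pooled descriptors pass through
    a shared two-layer MLP; their sum is squashed into per-channel weights."""
    avg = x.mean(axis=(1, 2))                         # (C,)
    mx  = x.max(axis=(1, 2))                          # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)      # ReLU hidden layer
    s = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))   # sigmoid -> (C,)
    return x * s[:, None, None]

def spatial_attention(x, w):
    """Spatial attention: per-pixel mean and max over channels form a 2-channel
    map; a learned mix (standing in for CBAM's 7x7 conv) gives pixel weights."""
    avg = x.mean(axis=0)                              # (H, W)
    mx  = x.max(axis=0)                               # (H, W)
    s = 1.0 / (1.0 + np.exp(-(w[0] * avg + w[1] * mx)))
    return x * s[None, :, :]

# Toy feature map: C=8 channels on a 4x4 spatial grid
x  = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1                # MLP: C -> C//4
w2 = rng.standard_normal((8, 2)) * 0.1                # MLP: C//4 -> C
y  = spatial_attention(channel_attention(x, w1, w2), np.array([1.0, 1.0]))
print(y.shape)   # reweighted features, same shape as the input
```

Applying channel attention before spatial attention, as here, follows the CBAM ordering; both stages only rescale features (weights lie in (0, 1)), so the output shape matches the input.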
Keywords: monocular vision; continuous frame depth estimation; hybrid attention mechanism; multi-scale feature fusion
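The learnable-weight multi-scale fusion described in the abstract can be sketched as below. This is a simplified illustration under stated assumptions: channel counts are kept equal across the four scales (a real ResNet pyramid would need 1x1 convolutions to align them), nearest-neighbour resizing stands in for upsampling, and softmax-normalized coefficients `alpha` play the role of the learnable per-scale weights.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def nearest_resize(f, H, W):
    """Nearest-neighbour resize of a (C, h, w) feature map to (C, H, W)."""
    h, w = f.shape[1:]
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return f[:, rows][:, :, cols]

def fuse_scales(feats, alpha):
    """Weight each scale's features with learnable softmax coefficients,
    then fuse the result with the original finest-scale features."""
    H, W = feats[0].shape[1:]
    wts = softmax(alpha)                              # one learnable weight per scale
    fused = sum(w * nearest_resize(f, H, W) for w, f in zip(wts, feats))
    return feats[0] + fused                           # residual: keep original features

rng = np.random.default_rng(1)
feats = [rng.standard_normal((16, s, s)) for s in (32, 16, 8, 4)]  # four scales
alpha = np.zeros(4)                                   # zeros -> uniform average at init
out = fuse_scales(feats, alpha)
print(out.shape)
```

Initializing `alpha` to zeros makes the fusion start as a plain average of the scales, letting training shift weight toward whichever scale best recovers contour detail.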