Journal of Computer Applications (《计算机应用》), sole official website


FCMdepth: monocular depth estimation framework with multi-scale feature optimization

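LIU Fengchun (刘凤春)1, SHAO Xinying (邵馨莹)1, ZHANG Chunying (张春英)2, WANG Liya (王立亚)1, REN Jing (任静)1

  1. North China University of Science and Technology
  2. North China University of Science and Technology, Main Campus
  • Received: 2025-08-05  Revised: 2025-10-14  Online: 2025-11-05  Published: 2025-11-05
  • Corresponding author: ZHANG Chunying (张春英)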

Abstract: To address insufficient feature extraction and inadequate context modeling in monocular depth estimation, FCMdepth, an optimization framework that fuses multi-scale features, is proposed to improve prediction performance. FCMdepth adopts an encoder-decoder structure. The encoder, FC-Net, consists of MobileNetV3-F and CDBlock and refines features through multi-scale feature extraction and dilated convolutions. The decoder, LapMA-Net, combines a Laplacian pyramid with an Efficient Multi-scale Attention (EMA) module to strengthen cross-scale feature fusion and output accurate depth maps. Experiments on the KITTI and NYU-Depth V2 datasets show that, compared with models such as Lite-mono, Hr-depth, and Lapdepth, FCMdepth lowers four error metrics, Root Mean Square Error (RMSE), Root Mean Square Logarithmic Error (RMSE_Log), Absolute Relative error (Abs_Rel), and Square Relative error (Sq_Rel), by 0.605, 0.117, 0.183, and 0.279 on average, respectively, and raises three accuracy metrics by 1.5, 1.4, and 0.8 percentage points on average. FCMdepth outperforms the compared models on most metrics and provides an effective reference for monocular depth estimation and 3D reconstruction in complex scenes.
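
The abstract names its evaluation metrics without giving formulas. As a minimal sketch, the function below computes them using the definitions conventionally adopted in the monocular depth estimation literature; this is an assumption, since the abstract does not confirm that the paper uses exactly these forms. In particular, the three accuracy metrics are assumed here to be the usual threshold accuracies (the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25, 1.25², and 1.25³). The function name `depth_metrics` and the returned key names are illustrative only.

```python
import math


def depth_metrics(pred, gt):
    """Compute conventional depth-estimation metrics (assumed definitions).

    pred, gt: equal-length sequences of positive depth values, already
    restricted to pixels that have valid ground truth.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    n = len(pred)
    pairs = list(zip(pred, gt))

    # Error metrics: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Threshold accuracies: higher is better (assumed to be the
    # "three accuracy metrics" mentioned in the abstract).
    ratios = [max(p / g, g / p) for p, g in pairs]
    acc = {
        f"delta_{k}": sum(r < 1.25 ** k for r in ratios) / n
        for k in (1, 2, 3)
    }

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        **acc,
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(depth_metrics([2.0, 4.0], [2.0, 2.0]))
```

Under these definitions the accuracies are fractions in [0, 1], so a gain reported in percentage points corresponds to a difference between two such fractions multiplied by 100.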
