Monocular Depth Estimation Based on Scene Flow Compensation in Dynamic Scenes

doi:10.11772/j.issn.1001-9081.2025111399

Abstract

In self-supervised monocular depth estimation, independently moving objects in dynamic scenes are easily confused with camera motion, which degrades depth accuracy. To address this problem, this paper proposes a depth estimation method based on scene flow compensation. The method extracts 3D scene flow and moving-object masks to decouple the independent motion of objects, then introduces that motion as a prior into the construction of a Compensated Cost Volume, which dynamically compensates pixel matching and suppresses interference from moving objects. In terms of architecture, the model adopts a high-resolution encoder to preserve detail and a decoder augmented with channel attention. Experimental results show that the model achieves an absolute relative error (AbsRel) of 0.098 and a threshold accuracy (δ1) of 0.889 on the KITTI dataset. On the NuScenes dataset, which contains complex dynamic objects, it achieves an AbsRel of 0.149 and a δ1 of 0.806. Visualization results show correct depth estimates for dynamic objects.
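The two reported metrics can be illustrated with a minimal sketch. It assumes the definitions commonly used in depth estimation benchmarks: AbsRel is the mean of |d − d*| / d* over valid pixels, and the threshold accuracy is the fraction of pixels where max(d / d*, d* / d) < 1.25. The abstract does not spell these definitions out. The function names, the validity rule (positive ground truth and prediction), and the sample values below are hypothetical and are not taken from the paper.

```python
def _valid_pairs(pred, gt):
    """Keep only pixels with positive ground truth and prediction (assumed validity rule)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid depth pairs to evaluate")
    return pairs


def abs_rel(pred, gt):
    """Absolute relative error: mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose ratio max(pred/gt, gt/pred) is below the threshold."""
    pairs = _valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Hypothetical depths in metres; the last pixel has no ground truth and is skipped.
predicted = [10.0, 22.0, 5.0, 40.0]
ground_truth = [10.0, 20.0, 10.0, 0.0]

print("AbsRel:", abs_rel(predicted, ground_truth))
print("delta1:", threshold_accuracy(predicted, ground_truth))
```

Lower AbsRel is better and higher δ1 is better, which is how the KITTI and NuScenes figures above should be read.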

Key words: Self-supervised, Depth estimation, Scene flow, Moving objects, Cost volume


CLC Number: TP391

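Ruixin ZHANG (张瑞欣), Hongfei YU (于红绯). Monocular depth estimation based on scene flow compensation in dynamic scenes [J]. Journal of Computer Applications (official website), DOI: 10.11772/j.issn.1001-9081.2025111399.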
