Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (4): 1253-1263. DOI: 10.11772/j.issn.1001-9081.2025040488

• Multimedia Computing and Computer Simulation •


RGB-D dual-stream mirror network for camouflaged object detection

Peng CHEN1,2, Xu LI1,2, Xiaosheng YU1,2()   

  1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (China Three Gorges University), Yichang, Hubei 443002, China
    2. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China
  • Received:2025-05-02 Revised:2025-07-25 Accepted:2025-07-28 Online:2025-07-30 Published:2026-04-10
  • Contact: Xiaosheng YU
  • About author: CHEN Peng, born in 1973, Ph. D., professor, CCF member. His research interests include computer vision, healthcare big data analysis.
    LI Xu, born in 2000, M. S., CCF member. His research interests include camouflaged object detection, deep learning.
  • Supported by:
    National Key Research and Development Program of China(2016YFC0802500)


Abstract:

Camouflaged objects exhibit high visual similarity to the surrounding background in texture, color, and other attributes, so RGB-based representations are particularly vulnerable to such interference: object locations are difficult to distinguish accurately, often resulting in incomplete segmentation structures or even missing objects, thereby degrading detection performance. To address this issue, an RGB-D Dual-stream Mirror Network (RDMNet) for Camouflaged Object Detection (COD) was proposed. Firstly, a hybrid backbone composed of TransNeXt and Vision Mamba was adopted to extract features while reducing model parameters, and a Multi-modal Feature Fusion (MFF) module was designed to enhance depth features by fusing RGB and depth information. Secondly, a Depth Positioning Module (DPM) and a Positioning-Guided feature integrity Aggregation (PGA) module were designed: the former was used to generate complete contour localization features, while the latter was employed to locate camouflaged objects rapidly and predict complete segmentation features efficiently. After cross-refinement fusion of the two, the global structure of camouflaged objects was attended to, and the segmentation features as well as the contour localization features were refined continuously. Finally, a Convolutional gated Channel Attention (CCA) module was designed to extract structural details from low-level features. Experimental results on COD and RGB-D Salient Object Detection (SOD) datasets show that RDMNet outperforms 15 representative methods; on the CAMO, COD10K, and NC4K datasets, compared with MVGNet (Multi-View Guided Network), RDMNet achieves average improvements of 2.0% in structure measure (S-measure), 1.5% in mean enhanced alignment measure (mean E-measure), and 3.2% in weighted F-measure, with an average reduction of 17.2% in Mean Absolute Error (MAE). These results confirm that RDMNet effectively improves segmentation completeness and accuracy in COD.
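To illustrate the general idea of sigmoid-gated channel attention referenced by the CCA module above, the following is a minimal, hypothetical sketch in the spirit of squeeze-and-excitation, not the paper's actual CCA design; the per-channel gate parameters `gate_w` and `gate_b` and the pure-Python tensor layout are assumptions for illustration only.

```python
import math

def gated_channel_attention(feat, gate_w, gate_b):
    """Rescale each channel by a learned sigmoid gate.

    feat:   list of C channels, each an H x W list of lists of floats.
    gate_w: assumed per-channel gate weights (length C).
    gate_b: assumed per-channel gate biases (length C).
    """
    out = []
    for c, ch in enumerate(feat):
        # Squeeze: global average pooling of the channel to one scalar.
        s = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        # Gate: sigmoid of an affine function of the pooled descriptor.
        g = 1.0 / (1.0 + math.exp(-(gate_w[c] * s + gate_b[c])))
        # Excite: rescale every value in the channel by the gate.
        out.append([[v * g for v in row] for row in ch])
    return out
```

A real implementation would operate on batched framework tensors and learn the gate parameters jointly with the convolutions; this sketch only shows the squeeze-gate-rescale pattern.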

Key words: Camouflaged Object Detection (COD), depth awareness, RGB-D fusion, cross-refinement, cross-modal learning
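For reference, the mean absolute error reported in the abstract can be sketched in plain Python, assuming the predicted camouflage map and the ground-truth mask are flattened to equal-length lists of values in [0, 1]:

```python
def mean_absolute_error(pred, gt):
    """Average absolute pixel-wise difference between a predicted
    map and its ground truth, both flattened to lists in [0, 1]."""
    assert len(pred) == len(gt) and pred, "maps must be non-empty and aligned"
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
```

In practice the metric is averaged over all images of a dataset; lower values indicate better agreement with the ground truth.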
