Official website of Journal of Computer Applications (《计算机应用》)


RGB-D dual-stream mirror network for camouflaged object detection

CHEN Peng1,2, LI Xu1,2, YU Xiaosheng1,2*

  1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (China Three Gorges University), Yichang, Hubei 443002, China;
    2. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China


  • Received: 2025-05-02  Revised: 2025-07-25  Accepted: 2025-07-28  Online: 2025-07-30  Published: 2025-07-30
  • Corresponding author: YU Xiaosheng
  • Supported by:
    National Key Research and Development Program of China




Abstract: Camouflaged objects are highly similar to their surrounding backgrounds in texture, color, and structural patterns, which makes accurate localization and segmentation difficult. RGB-only representations are particularly vulnerable to such interference, often yielding incomplete structures or even missed targets, thereby degrading detection reliability. To address these limitations, an RGB-D Dual-stream Mirror Network (RDMNet) was proposed. First, a hybrid backbone composed of TransNeXt and Vision Mamba was adopted to extract informative features with reduced parameter overhead, and a Multi-modal Feature Fusion (MFF) module was designed to integrate complementary RGB and depth signals through residual integration, enhancing depth cues for improved boundary inference. Second, a Depth Positioning Module (DPM) and a Positioning-Guided integrity Aggregation (PGA) module were designed: contour-aware localization features were generated by the former, and these were used by the latter to locate camouflaged objects quickly and predict complete segmentation features. The two modules were fused through iterative cross-refinement, jointly attending to the global structure of the camouflaged object while progressively refining both segmentation and contour-localization features. Finally, a Convolutional gated Channel Attention (CCA) module was introduced to extract structural details from shallow feature layers. Empirical evaluations on COD and RGB-D SOD benchmarks show that the proposed framework outperforms 15 representative state-of-the-art methods. On the CAMO, COD10K, and NC4K datasets, compared with the Multi-View Guided Network (MVGNet), RDMNet achieved average improvements of 1.9% in structure measure (S-measure), 4.7% in mean enhanced-alignment measure (mean E-measure), and 3.1% in weighted F-measure, together with a 17.5% reduction in Mean Absolute Error (MAE). These findings demonstrate the model's effectiveness in improving both segmentation completeness and accuracy in camouflaged object detection.
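As an illustration of the dual-stream fusion idea described in the abstract, the sketch below combines RGB and depth feature maps through a residual, channel-gated fusion step. The function names and the parameter-free gating design are assumptions for illustration only; the paper's actual MFF and CCA modules are built from learned convolutional layers, not this NumPy approximation.

```python
import numpy as np

def channel_gate(feat):
    """Gated channel attention sketch: global-average-pool each channel,
    squash through a sigmoid, and reweight the channels.
    Hypothetical stand-in for a learned CCA-style module."""
    pooled = feat.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1) channel statistics
    gate = 1.0 / (1.0 + np.exp(-pooled))             # sigmoid gate per channel
    return feat * gate                               # broadcast reweighting

def fuse_rgb_depth(f_rgb, f_depth):
    """Residual multi-modal fusion sketch: depth features enhanced by
    RGB cues, in the spirit of the MFF module's residual integration."""
    return f_depth + channel_gate(f_rgb * f_depth)   # residual connection

# Toy features: 4 channels over an 8x8 spatial grid.
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((4, 8, 8))
f_depth = rng.standard_normal((4, 8, 8))
out = fuse_rgb_depth(f_rgb, f_depth)
print(out.shape)  # (4, 8, 8)
```

The residual form keeps the original depth stream intact while letting the gated cross-modal product contribute only where RGB and depth features agree.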

Key words: Camouflaged Object Detection (COD), depth awareness, RGB-D fusion, cross-refinement, cross-modal learning
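Of the four reported metrics, mean absolute error is the simplest to state precisely: the per-pixel absolute difference between the predicted map and the binary ground truth, averaged over the image. The sketch below computes it and shows how a 17.5% relative reduction translates into absolute terms; the baseline value 0.040 is a hypothetical number for illustration, not a result from the paper.

```python
import numpy as np

def mae(pred, gt):
    """Mean Absolute Error between a predicted camouflage map and a
    binary ground-truth mask, both valued in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))

# A fully wrong prediction gives the maximum MAE of 1.0.
worst = mae(np.zeros((2, 2)), np.ones((2, 2)))

# Hypothetical baseline: a 17.5% relative reduction in absolute terms.
baseline_mae = 0.040
improved_mae = baseline_mae * (1 - 0.175)
print(round(improved_mae, 4))  # 0.033
```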
