Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (7): 2053-2058. DOI: 10.11772/j.issn.1001-9081.2019112057

• Virtual Reality and Multimedia Computing •

Cross-layer fusion feature based on richer convolutional features for edge detection

SONG Jie, YU Yu, LUO Qifeng   

  1. School of Computer Science and Technology, Anhui University, Hefei Anhui 230039, China
  • Received: 2019-12-04  Revised: 2020-01-14  Online: 2020-07-10  Published: 2020-06-29
  • Corresponding author: YU Yu
  • About the authors: SONG Jie, born in 1966 in Hefei, Anhui, Ph. D., associate professor, CCF member; his main research interests include intelligent computing and computer architecture. YU Yu, born in 1995 in Fuyang, Anhui, M. S. candidate; his main research interests include computer vision and data mining. LUO Qifeng, born in 1995 in Wuhu, Anhui, M. S. candidate; his main research interests include computer vision and intelligent computing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61974001).

Abstract: To address the chaotic and blurred edge lines produced by current deep learning based edge detection methods, an end-to-end Cross-layer Fusion Feature (CFF) edge detection model based on RCF (Richer Convolutional Features) was proposed. In this model, RCF was used as the baseline, the Convolutional Block Attention Module (CBAM) was added to the backbone network, translation-invariant downsampling was adopted, and some downsampling operations in the backbone were removed to preserve image details, while dilated convolutions were used to enlarge the receptive field of the model. In addition, feature maps were fused across layers so that high-level and low-level features were fully combined. To balance the loss of each stage against the fusion loss, and to avoid excessive loss of low-level details after multi-scale feature fusion, a weight was assigned to each loss term. The model was trained on the Berkeley Segmentation Data Set (BSDS500) and the PASCAL VOC Context dataset, and image pyramid technology was used at test time to improve the quality of the edge maps. Experimental results show that the contours extracted by the CFF model are clearer than those of the baseline network and that the model can solve the edge blurring problem. Evaluation on the BSDS500 benchmark shows that the model improves the Optimal Dataset Scale (ODS) and Optimal Image Scale (OIS) to 0.818 and 0.839 respectively.
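The abstract describes the method only at a high level. As an illustration of the multi-stage, weighted-loss design it mentions, the following PyTorch sketch shows how side edge maps from several backbone stages can be upsampled, fused by a 1×1 convolution, and supervised with individually weighted losses. The toy three-stage backbone, the channel sizes and the weight values are assumptions for demonstration only and are not the authors' CFF implementation.

```python
# Illustrative sketch only -- not the authors' code. It mimics the RCF-style
# objective described in the abstract: each stage yields a side edge map, the
# side maps are fused, and every loss term (per stage and fused) gets its own
# weight. The toy backbone, channel sizes and weights below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEdgeNet(nn.Module):
    """Tiny stand-in for an RCF-like backbone with three stages."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.MaxPool2d(2),
                                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        # Dilated convolution keeps resolution while enlarging the receptive field.
        self.stage3 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=2, dilation=2), nn.ReLU())
        # One 1x1 conv per stage turns stage features into a single-channel edge map.
        self.side = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in (16, 32, 32)])
        # Fusion layer: concatenate all side maps and mix them with a 1x1 conv.
        self.fuse = nn.Conv2d(3, 1, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = []
        f = self.stage1(x); feats.append(f)
        f = self.stage2(f); feats.append(f)
        f = self.stage3(f); feats.append(f)
        # Upsample every side output back to input resolution before fusing.
        sides = [F.interpolate(conv(f), size=(h, w), mode="bilinear", align_corners=False)
                 for conv, f in zip(self.side, feats)]
        fused = self.fuse(torch.cat(sides, dim=1))
        return sides, fused

def weighted_edge_loss(sides, fused, target, side_w=(1.0, 1.0, 1.0), fuse_w=1.1):
    """Per-stage BCE losses plus the fusion loss, each scaled by its own weight."""
    loss = sum(w * F.binary_cross_entropy_with_logits(s, target)
               for w, s in zip(side_w, sides))
    return loss + fuse_w * F.binary_cross_entropy_with_logits(fused, target)

if __name__ == "__main__":
    net = ToyEdgeNet()
    img = torch.rand(1, 3, 64, 64)                  # dummy image
    gt = (torch.rand(1, 1, 64, 64) > 0.9).float()   # dummy edge ground truth
    sides, fused = net(img)
    print(weighted_edge_loss(sides, fused, gt))
```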

Key words: deep learning, edge detection, attention mechanism, translation invariance, cross-layer fusion

CLC Number: