Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (9): 2712-2719. DOI: 10.11772/j.issn.1001-9081.2020111852

Special topic: Multimedia Computing and Computer Simulation

A general object detection framework based on an improved Faster R-CNN

MA Jialiang1,2, CHEN Bin2,3, SUN Xiaofei1,2

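  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China;
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Institute for Artificial Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China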
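  • Received: 2020-11-25 Revised: 2021-01-13 Published: 2021-09-10 Online: 2021-05-12
  • Corresponding author: CHEN Bin
  • About the authors: MA Jialiang (born 1996), male, from Shijiazhuang, Hebei, is a master's student whose research interests include object detection and semantic segmentation; CHEN Bin (born 1970), male, from Guanghan, Sichuan, is a research fellow with a Ph.D. whose research interests include machine vision and deep learning; SUN Xiaofei (born 1981), male, from Qixia, Shandong, is a Ph.D. candidate whose research interests include machine vision and pattern recognition.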

General object detection framework based on improved Faster R-CNN

MA Jialiang1,2, CHEN Bin2,3, SUN Xiaofei1,2   

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China;
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Institute for Artificial Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
  • Received: 2020-11-25 Revised: 2021-01-13 Published: 2021-09-10 Online: 2021-05-12

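Abstract: Current deep learning-based detectors cannot effectively detect objects with irregular shapes or with a large disparity between length and width. To address this problem, an improved two-stage object detection framework, Accurate R-CNN, is proposed on the basis of the traditional Faster R-CNN algorithm. First, a new Intersection over Union (IoU) metric, Effective Intersection over Union (EIoU), is proposed, which introduces a centrality weight to reduce the proportion of redundant bounding boxes in the training data. Then, a context-related Feature Reassignment Module (FRM) is proposed, which re-encodes features by modeling the long-range dependencies and local context of objects, so as to compensate for the loss of shape information during pooling. Experimental results show that on the Microsoft Common Objects in COntext (MS COCO) dataset, for the bounding box detection task, with Residual Networks (ResNets) of depth 50 and 101 as backbone networks, Accurate R-CNN improves Average Precision (AP) over the baseline Faster R-CNN by 1.7 and 1.1 percentage points respectively, surpassing mask-based detectors that use the same backbone networks. After a mask branch is added, for the instance segmentation task, with ResNets of the two depths as backbone networks, Accurate R-CNN improves mask AP over Mask R-CNN by 1.2 and 1.1 percentage points respectively. These results show that, compared with the baseline models, Accurate R-CNN achieves better detection performance across different datasets and tasks.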

Keywords: computer vision, object detection, instance segmentation, Intersection over Union (IoU), Region of Interest (RoI) pooling

Abstract: To address the problem that current deep learning-based detectors cannot effectively detect objects with irregular shapes or large differences between length and width, an improved two-stage object detection framework named Accurate R-CNN was proposed based on the traditional Faster Region-based Convolutional Neural Network (Faster R-CNN) algorithm. First, a novel Intersection over Union (IoU) metric, Effective Intersection over Union (EIoU), was proposed to reduce the proportion of redundant bounding boxes in the training data by using a centrality weight. Then, a context-related Feature Reassignment Module (FRM) was proposed to re-encode the features by modeling the long-range dependencies and local context information of objects, so as to compensate for the loss of shape information in the pooling process. Experimental results show that on the Microsoft Common Objects in COntext (MS COCO) dataset, for the bounding box detection task, when Residual Networks (ResNets) with depths of 50 and 101 are used as the backbone networks, Accurate R-CNN improves the Average Precision (AP) by 1.7 percentage points and 1.1 percentage points respectively compared to the baseline model Faster R-CNN, surpassing the mask-based detectors that use the same backbone networks. After a mask branch is added, for the instance segmentation task, when ResNets with the two depths are used as the backbone networks, the mask AP of Accurate R-CNN is 1.2 percentage points and 1.1 percentage points higher respectively than that of Mask Region-based Convolutional Neural Network (Mask R-CNN). The results indicate that, compared to the baseline model, Accurate R-CNN achieves better performance on different datasets and different tasks.
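
For reference, the standard IoU that EIoU modifies can be sketched as follows. This is a minimal illustration only: the function name `box_iou` and the `(x1, y1, x2, y2)` corner format are assumptions made here, and the centrality weight that defines EIoU is not specified in this abstract, so it is not reproduced.

```python
from typing import Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Standard IoU of two axis-aligned boxes (not the paper's EIoU)."""
    # Width and height of the overlap rectangle, clamped to zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The same 5-pixel vertical shift applied to a square box and to an
# elongated box of equal area (hypothetical coordinates).
square = box_iou((0, 0, 40, 40), (0, 5, 40, 45))
elongated = box_iou((0, 0, 160, 10), (0, 5, 160, 15))
print(f"square: {square:.3f}, elongated: {elongated:.3f}")
# prints "square: 0.778, elongated: 0.333"
```

With these hypothetical boxes, the same small shift costs the elongated box far more overlap than the square one, which is one way to see why objects with extreme aspect ratios are sensitive to IoU-based criteria.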

Key words: computer vision, object detection, instance segmentation, Intersection over Union (IoU), Region of Interest Pooling (RoI Pooling)

CLC number: