Journal of Computer Applications, 2023, Vol. 43, Issue (7): 2147-2154. DOI: 10.11772/j.issn.1001-9081.2022060823

• Artificial intelligence

High-precision object detection algorithm based on improved VarifocalNet

Zhangjian JI1,2, Ming ZHANG1,2, Zilong WANG1,2

  1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
  2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006, China
  • Received: 2022-06-08  Revised: 2022-08-30  Accepted: 2022-09-02  Online: 2022-09-23  Published: 2023-07-10
  • Contact: Zhangjian JI
  • About authors: JI Zhangjian, born in 1983, Ph.D., associate professor. His research interests include computer vision and machine learning.
    ZHANG Ming, born in 1997, M.S. candidate. His research interests include computer vision and object detection.
    WANG Zilong, born in 1997, M.S. candidate. His research interests include computer vision and human pose estimation.
  • Supported by:
    Fundamental Research Program of Shanxi Province (20210302123443)

High-precision object detection algorithm based on improved VarifocalNet (Chinese-language version)

JI Zhangjian1,2, ZHANG Ming1,2, WANG Zilong1,2

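  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006
  • Corresponding author: JI Zhangjian
  • About authors: JI Zhangjian (born 1983), male, from Chengcheng, Shaanxi, associate professor, Ph.D., CCF member. His main research interests include computer vision and machine learning.
    ZHANG Ming (born 1997), male, from Baode, Shanxi, M.S. candidate. His main research interests include computer vision and object detection.
    WANG Zilong (born 1997), male, from Pingding, Shanxi, M.S. candidate. His main research interests include computer vision and human pose estimation.
  • Supported by:
    Fundamental Research Program of Shanxi Province (20210302123443)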

Abstract:

To address the low recognition precision and recognition difficulty of existing one-stage anchor-free detectors in generic object detection scenarios, a high-precision object detection algorithm based on an improved VarifocalNet (VFNet) was proposed. Firstly, the ResNet backbone used for feature extraction in VFNet was replaced by the Recurrent Layer Aggregation Network (RLANet), whose recurrent residual connections feed the features of preceding layers into subsequent layers to improve feature representation ability. Next, the original feature fusion network was replaced by a Feature Pyramid Network (FPN) with a feature alignment convolution operation, which uses deformable convolution to align features during the fusion of the upper and lower FPN layers and thereby improves feature quality. Finally, the Focal-Global Distillation (FGD) algorithm was used to further improve the detection performance of the small-scale algorithm. Experimental results on the COCO (Common Objects in Context) 2017 dataset show that, under the same training conditions, the improved algorithm with RLANet-50 as the backbone achieves a mean Average Precision (mAP) of 45.9%, which is 4.3 percentage points higher than that of VFNet, while its number of parameters is 36.67×10^6, only 4×10^6 more than that of VFNet. The improved VFNet algorithm thus improves detection accuracy with only a slight increase in the number of parameters, indicating that it can meet the lightweight and high-precision requirements of object detection.
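
The abstract reports only the improved algorithm's figures and its margins over VFNet. The short Python sketch below derives the baseline values those numbers imply. It is an illustrative calculation, not part of the paper: the constant and function names are hypothetical, and the parameter increase is stated in the abstract only as a rounded 4×10^6, so the derived baseline size and relative growth are approximate.

```python
# Figures reported in the abstract (COCO 2017, RLANet-50 backbone).
IMPROVED_MAP = 45.9        # mAP of the improved algorithm, in percent
MAP_GAIN = 4.3             # gain over VFNet, in percentage points
IMPROVED_PARAMS_M = 36.67  # parameters of the improved algorithm, in units of 10^6
PARAM_INCREASE_M = 4.0     # reported as "4x10^6"; treated as exactly 4.0 (assumption)


def implied_baseline(improved: float, delta: float) -> float:
    """Return the baseline value implied by an improved value and its absolute gain."""
    return improved - delta


def relative_increase(improved: float, delta: float) -> float:
    """Return the gain as a fraction of the implied baseline."""
    return delta / implied_baseline(improved, delta)


baseline_map = implied_baseline(IMPROVED_MAP, MAP_GAIN)
baseline_params = implied_baseline(IMPROVED_PARAMS_M, PARAM_INCREASE_M)
param_growth = relative_increase(IMPROVED_PARAMS_M, PARAM_INCREASE_M)

print(f"implied VFNet mAP: {baseline_map:.1f}%")                         # 41.6%
print(f"implied VFNet parameters: about {baseline_params:.2f} x 10^6")   # 32.67
print(f"relative parameter increase: about {param_growth:.1%}")          # 12.2%
```

Under these assumptions, the 4.3-point mAP gain comes with roughly a 12% increase in parameter count. This is the trade-off behind the abstract's claim of a slight increase in parameters.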

Key words: Recurrent Layer Aggregation Network (RLANet), object detection, deformable convolution, feature alignment, Feature Pyramid Network (FPN), knowledge distillation

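Abstract (Chinese-language version):

To address the low recognition precision and recognition difficulty of existing one-stage anchor-free detectors in generic object detection scenarios, a high-precision object detection algorithm based on an improved varifocal network, VFNet (VarifocalNet), is proposed. First, the Recurrent Layer Aggregation Network (RLANet) replaces ResNet, the backbone that VFNet uses for feature extraction; its recurrent residual connections merge features from earlier layers into subsequent layers to strengthen feature representation. Second, a Feature Pyramid Network (FPN) with a feature alignment convolution operation replaces the original feature fusion network, using deformable convolution to align features during the fusion of the upper and lower FPN layers and to optimize feature representation. Finally, the Focal-Global Distillation (FGD) algorithm is used to further improve the detection performance of the small-scale algorithm. Evaluation experiments on the COCO (Common Objects in Context) 2017 dataset show that, under the same training conditions, the improved algorithm with RLANet-50 as the backbone reaches a mean Average Precision (mAP) of 45.9%, 4.3 percentage points higher than VFNet, while its parameter count is 36.67×10^6, only 4×10^6 more than VFNet. The improved VFNet algorithm therefore raises detection accuracy at the cost of a slight increase in parameters, indicating that it can meet the lightweight and high-precision requirements of object detection.

Key words (Chinese-language version): recurrent layer aggregation network, object detection, deformable convolution, feature alignment, feature pyramid network, knowledge distillation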

CLC Number: