Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (3): 938-944. DOI: 10.11772/j.issn.1001-9081.2023030368

• Multimedia computing and computer simulation •

Faster-RCNN water-floating garbage recognition based on multi-scale feature and polarized self-attention

Zhanjun JIANG, Baijing WU, Long MA, Jing LIAN

  1. School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou, Gansu 730070, China
  • Received: 2023-04-07 Revised: 2023-05-22 Accepted: 2023-05-24 Online: 2023-06-15 Published: 2024-03-10
  • Contact: Baijing WU
  • About author: JIANG Zhanjun, born in 1975, Ph. D., professor. His research interests include digital image processing, key technologies for future mobile communication, wireless network planning and optimization.
    MA Long, born in 1983, Ph. D., senior engineer. His research interests include big data, artificial intelligence, water information.
    LIAN Jing, born in 1983, Ph. D., professor. His research interests include artificial intelligence, pattern recognition.
  • Supported by:
    National Natural Science Foundation of China (62061023); Gansu Academy of Water Resources Funded Project (LZJT523029)


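蒋占军, 吴佰靖, 马龙, 廉敬

  1. School of Electronics and Information Engineering, Lanzhou Jiaotong University (兰州交通大学 电子与信息工程学院), Lanzhou 730070
  • Corresponding author: 吴佰靖
  • About author: 蒋占军 (born in 1975), male, from Zhongwei, Ningxia; professor, Ph. D. His research interests include digital image processing, key technologies for future mobile communication, wireless network planning and optimization.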


To address the variable morphology, low resolution and limited information of small-target water-floating garbage, which lead to unsatisfactory detection results, an improved Faster-RCNN (Faster Regions with Convolutional Neural Network) water-floating garbage detection algorithm was proposed, namely MP-Faster-RCNN (Faster-RCNN with Multi-scale feature and Polarized self-attention). Firstly, a small-target water-floating garbage dataset was established for the Lanzhou section of the Yellow River, and ResNet-50 combined with atrous convolution was used as the backbone feature extraction network in place of the original VGG-16 (Visual Geometry Group 16), enlarging the receptive field so that more small-target features could be extracted. Secondly, multi-scale features were used in the Region Proposal Network (RPN), where two convolution layers of 3×3 and 1×1 were set to compensate for the feature loss caused by a single sliding window. Finally, polarized self-attention was added before the RPN to further exploit multi-scale and channel features, extracting finer-grained multi-scale spatial information and inter-channel dependencies and generating a feature map with global features for more accurate target box localization. Experimental results show that, compared with the original Faster-RCNN, MP-Faster-RCNN effectively improves the detection accuracy of water-floating garbage: the mean Average Precision (mAP) increases by 6.37 percentage points, the model size is reduced from 521 MB to 108 MB, and convergence is faster for the same number of training epochs.
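
As an illustration of why atrous (dilated) convolution enlarges the receptive field of a backbone without adding parameters, the following minimal sketch computes the receptive field of a stack of convolution layers. It is not taken from the paper: the function names and the two three-layer stacks are hypothetical, chosen only to show the arithmetic, and the actual dilation rates used in MP-Faster-RCNN are not stated in the abstract.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span in input pixels covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked convolution layers.

    layers: iterable of (kernel_size, stride, dilation) tuples, ordered
    from the input towards the output.
    """
    rf = 1    # a single output pixel initially sees one input pixel
    jump = 1  # distance in input pixels between adjacent outputs of a layer
    for kernel_size, stride, dilation in layers:
        k_eff = effective_kernel_size(kernel_size, dilation)
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


# Hypothetical three-layer stacks (assumption, not the paper's configuration).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]   # ordinary 3x3 convolutions
atrous = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # same kernels, dilation 1, 2, 4

print(receptive_field(plain))   # 7
print(receptive_field(atrous))  # 15
```

Both stacks have the same number of weights, since dilation only spreads the kernel taps apart. In this example the dilated stack covers a 15-pixel window instead of 7, and because the stride stays at 1 the feature map resolution is unchanged, which matters when targets occupy only a few pixels.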

Key words: target detection, water-floating garbage, Faster Regions with Convolutional Neural Network (Faster-RCNN), atrous convolution, multi-scale feature fusion, Polarized Self-Attention (PSA)


