Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (10): 2833-2838. DOI: 10.11772/j.issn.1001-9081.2018030720

• Artificial Intelligence •

Improved single shot multibox detector based on transposed convolution

GUO Chuanlei, HE Jia

  1. School of Computer Science, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China
  • Received: 2018-04-10 Revised: 2018-06-04 Online: 2018-10-10 Published: 2018-10-13
  • Corresponding author: HE Jia
  • About the authors: GUO Chuanlei (born 1994), male, from Baode, Shanxi, is a master's student and CCF member; his research interests include object detection and deep learning. HE Jia (born 1968), female, from Chengdu, Sichuan, is a professor with a Ph.D. and a CCF member; her research interests include intelligent computing and artificial intelligence.
  • Supported by:
    This work is partially supported by the Applied Foundation Key Project of Sichuan Provincial Science and Technology Department (2017JY0011).

Abstract: To address the significant drop in mean Average Precision (mAP) of the Single Shot multibox Detector (SSD) when evaluated at a high Intersection over Union (IoU) threshold, a cyclic feature aggregation model built on transposed convolution was proposed. The model was based on SSD and used a 101-layer deep Residual Network (ResNet 101) as the feature extraction network. Firstly, transposed convolutional layers doubled the size of the deeper feature maps, bringing high-level abstraction of objects and context information to the shallow feature maps. Secondly, fully connected convolutional layers were applied to the shallow feature maps to reduce the chance of bias during feature aggregation. Finally, each shallow feature map was concatenated with the enlarged deep feature map carrying the context information, and a 1×1 convolution was used to restore the number of channels. The feature aggregation can be repeated multiple times. Experiments were conducted on the KITTI dataset with an IoU threshold of 0.7 for evaluating mAP. The results show that the proposed model improved mAP by about 5.1 percentage points over the original SSD model and by about 2 percentage points over the most accurate existing Faster R-CNN model. The cyclic feature aggregation model can effectively improve mAP and generate high-quality bounding boxes in object detection tasks.

Key words: object detection, transposed convolution, feature aggregation, Single Shot multibox Detector (SSD) model
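
The aggregation step summarized in the abstract (enlarge a deep feature map with a transposed convolution, concatenate it with a shallow feature map, then restore the channel count with a 1×1 convolution) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. All function names, the 2×2 kernel with stride 2, and the toy weights are assumptions chosen for illustration. The transposed convolution here acts on each channel separately, whereas a real layer would also mix channels, and the fully connected convolutional layer applied to the shallow feature map, biases, and activations are omitted.

```python
from typing import List

Channel = List[List[float]]   # one H x W feature map
FeatureMap = List[Channel]    # C channels, each H x W


def transposed_conv2x(channel: Channel, kernel: Channel) -> Channel:
    """Stride-2 transposed convolution of one channel with a 2x2 kernel.

    Each input pixel scatters a scaled copy of the kernel into its own
    non-overlapping 2x2 output patch, so H x W becomes 2H x 2W.
    """
    h, w = len(channel), len(channel[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += channel[i][j] * kernel[di][dj]
    return out


def conv1x1(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: at every pixel, mix channels with weights[out][in]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wt * fmap[c][y][x] for c, wt in enumerate(row)) for x in range(w)]
         for y in range(h)]
        for row in weights
    ]


def aggregate(shallow: FeatureMap, deep: FeatureMap,
              up_kernel: Channel, mix: List[List[float]]) -> FeatureMap:
    """One aggregation step: enlarge deep, concatenate, reduce channels."""
    upsampled = [transposed_conv2x(ch, up_kernel) for ch in deep]
    # The enlarged deep map must match the shallow map's spatial size.
    assert len(upsampled[0]) == len(shallow[0])
    assert len(upsampled[0][0]) == len(shallow[0][0])
    concatenated = shallow + upsampled   # channel-wise concatenation
    return conv1x1(concatenated, mix)    # back to len(mix) channels


# Toy example (hypothetical values): one 4x4 shallow channel, one 2x2 deep channel.
shallow = [[[1.0] * 4 for _ in range(4)]]
deep = [[[1.0, 2.0], [3.0, 4.0]]]
fused = aggregate(shallow, deep,
                  up_kernel=[[1.0, 1.0], [1.0, 1.0]],
                  mix=[[0.5, 0.5]])
```

In this toy case the concatenated map has two channels and the 1×1 convolution returns it to one channel at the shallow map's 4×4 resolution. Because the output has the same shape as the shallow input, the step could in principle be applied again, which matches the abstract's remark that aggregation can be repeated.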

CLC number: