Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 713-722. DOI: 10.11772/j.issn.1001-9081.2022020245

Topic: Artificial Intelligence


Local and global context attentive fusion network for traffic scene parsing

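Zeyu WANG1, Shuhui BU2, Wei HUANG1, Yuanpan ZHENG1, Qinggang WU1, Xu ZHANG1

  1. College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002
  2. School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072
  • Received: 2022-03-02 Revised: 2022-06-09 Accepted: 2022-06-14 Online: 2022-08-16 Published: 2023-03-10
  • Corresponding author: Zeyu WANG
  • About the authors: WANG Zeyu (born 1989), male, from Zhengzhou, Henan, lecturer, Ph. D. Research interests: deep learning, computer vision.
    BU Shuhui (born 1978), male, from Luoyang, Henan, professor, Ph. D. Research interests: deep learning, computer vision.
    HUANG Wei (born 1982), male, from Zhengzhou, Henan, associate professor, Ph. D. Research interests: deep learning, computer vision.
    ZHENG Yuanpan (born 1983), male, from Zhengzhou, Henan, associate professor, Ph. D. Research interests: deep learning, computer vision.
    WU Qinggang (born 1984), male, from Puyang, Henan, associate professor, Ph. D. Research interests: deep learning, computer vision.
    ZHANG Xu (born 1979), female, from Nanyang, Henan, lecturer, M. S. Research interests: deep learning, computer vision.
  • Supported by:
    Science and Technology Project of Henan Province (222102210021); Plan Support for Key Scientific Research Project of Higher Education in Henan Province (21A520049)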

Local and global context attentive fusion network for traffic scene parsing

Zeyu WANG1, Shuhui BU2, Wei HUANG1, Yuanpan ZHENG1, Qinggang WU1, Xu ZHANG1

  1. College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou Henan 450002, China
  2. School of Aeronautics, Northwestern Polytechnical University, Xi’an Shaanxi 710072, China
  • Received: 2022-03-02 Revised: 2022-06-09 Accepted: 2022-06-14 Online: 2022-08-16 Published: 2023-03-10
  • Contact: Zeyu WANG
  • About the authors: WANG Zeyu, born in 1989, Ph. D., lecturer. His research interests include deep learning and computer vision.
    BU Shuhui, born in 1978, Ph. D., professor. His research interests include deep learning and computer vision.
    HUANG Wei, born in 1982, Ph. D., associate professor. His research interests include deep learning and computer vision.
    ZHENG Yuanpan, born in 1983, Ph. D., associate professor. His research interests include deep learning and computer vision.
    WU Qinggang, born in 1984, Ph. D., associate professor. His research interests include deep learning and computer vision.
    ZHANG Xu, born in 1979, M. S., lecturer. Her research interests include deep learning and computer vision.
  • Supported by:
    Science and Technology Project of Henan Province (222102210021); Plan Support for Key Scientific Research Project of Higher Education in Henan Province (21A520049)

Abstract:

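To address the adaptive aggregation of local and global contextual information in traffic scene parsing, a Local and Global Context Attentive Fusion Network (LGCAFN) with a three-module architecture is proposed. The front-end feature extraction module is a ResNet-101 improved with Cascaded Atrous Spatial Pyramid Pooling (CASPP) units, which extracts multi-scale local features of objects more effectively. The mid-end structured learning module consists of eight Long Short-Term Memory (LSTM) network branches, which infer more accurately the spatial structural features of the scene regions adjacent to an object in eight different directions. The back-end feature fusion module adopts an attention-based three-stage fusion scheme that adaptively aggregates useful contextual information and suppresses noisy contextual information, and the resulting multi-modal fused features represent the semantic information of objects more comprehensively and accurately. Experimental results on the Cityscapes standard and extended datasets show that, compared with methods such as the Inverse Transformation Network (ITN) and the Object Contextual Representation Network (OCRN), LGCAFN achieves the best mean Intersection over Union (mIoU), reaching 84.0% and 86.3% respectively, which indicates that LGCAFN can parse traffic scenes accurately and can help realize autonomous driving.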

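Key words: traffic scene parsing, adaptive aggregation, Cascaded Atrous Spatial Pyramid Pooling (CASPP), Long Short-Term Memory (LSTM), attentive fusion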
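
The abstract does not spell out how the attention-based fusion weighs contextual features. As a rough illustration only, the sketch below fuses one local feature vector with several directional context vectors using scaled dot-product attention. The scoring rule, the additive fusion, and the names `softmax` and `attentive_fuse` are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fuse(local_feat, context_feats):
    """Fuse a local feature with context features (hypothetical scheme).

    Each context vector is scored by its scaled dot product with the
    local feature; softmax turns the scores into weights, so context
    that agrees with the local feature dominates and the rest is
    down-weighted. The weighted context is added to the local feature.
    """
    scale = math.sqrt(len(local_feat))
    scores = [
        sum(l * c for l, c in zip(local_feat, ctx)) / scale
        for ctx in context_feats
    ]
    weights = softmax(scores)
    context = [
        sum(w * ctx[i] for w, ctx in zip(weights, context_feats))
        for i in range(len(local_feat))
    ]
    fused = [l + c for l, c in zip(local_feat, context)]
    return fused, weights


if __name__ == "__main__":
    local = [1.0, 0.0]
    # Two toy "directional" context vectors: one aligned, one opposed.
    contexts = [[4.0, 0.0], [-4.0, 0.0]]
    fused, weights = attentive_fuse(local, contexts)
    print(weights, fused)
```

In this toy run the aligned context vector receives almost all of the attention weight, which is the kind of behaviour the abstract describes as aggregating useful context while suppressing noisy context.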

Abstract:

To solve the problem of adaptively aggregating local and global contextual information in traffic scene parsing, a Local and Global Context Attentive Fusion Network (LGCAFN) with a three-module architecture was proposed. The front-end feature extraction module consisted of a 101-layer Residual Network (ResNet-101) improved with Cascaded Atrous Spatial Pyramid Pooling (CASPP) units, and was able to extract objects' multi-scale local features more effectively. The mid-end structural learning module was composed of eight Long Short-Term Memory (LSTM) branches, and was able to infer more accurately the spatial structural features of the scene regions adjacent to an object in eight different directions. In the back-end feature fusion module, a three-stage fusion method based on the attention mechanism was adopted to adaptively aggregate useful contextual information and suppress noisy contextual information, and the generated multi-modal fusion features were able to represent objects' semantic information more comprehensively and accurately. Experimental results on the Cityscapes standard and extended datasets demonstrate that, compared with existing state-of-the-art methods such as Inverse Transformation Network (ITN) and Object Contextual Representation Network (OCRN), LGCAFN achieves the best mean Intersection over Union (mIoU), reaching 84.0% and 86.3% respectively, showing that LGCAFN can parse traffic scenes accurately and can help realize autonomous driving of vehicles.

Key words: traffic scene parsing, adaptive aggregation, Cascaded Atrous Spatial Pyramid Pooling (CASPP), Long Short-Term Memory (LSTM), attentive fusion
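
For readers unfamiliar with the reported metric, the following is a minimal sketch of mean Intersection over Union for flat sequences of class labels. The function name `mean_iou` and its arguments are illustrative. The sketch averages over classes that occur in the prediction or the ground truth and has no ignore label, so it should not be read as the Cityscapes evaluation protocol used for the 84.0% and 86.3% figures.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in pred or target (illustrative)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both sequences
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: 1/3, 2/3, 1/2, so the mean is 0.5.
    print(mean_iou(pred, target, num_classes=3))
```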

CLC Number: