Journal of Computer Applications, 2024, Vol. 44, Issue (2): 424-431. DOI: 10.11772/j.issn.1001-9081.2023020155

• Artificial intelligence •

Decoupling-fusing algorithm for multiple tasks with autonomous driving environment perception

Cunyi LIAO1, Yi ZHENG1, Weijin LIU1, Huan YU2, Shouyin LIU1

  1. College of Physical Science and Technology, Central China Normal University, Wuhan, Hubei 430079, China
  2. School of Geodesy and Geomatics, Wuhan University, Wuhan, Hubei 430079, China
  • Received:2023-02-21 Revised:2023-04-22 Accepted:2023-05-06 Online:2023-08-14 Published:2024-02-10
  • Contact: Shouyin LIU
  • About author: LIAO Cunyi, born in 1998, native of Chengdu, Sichuan, M. S. candidate. His research interests include autonomous driving.
    ZHENG Yi, born in 1993, Ph. D. candidate. His research interests include deep learning.
    LIU Weijin, born in 1998, M. S. candidate. Her research interests include deep learning.
    YU Huan, born in 1991, Ph. D. candidate. His research interests include multi-source sensing and localization in autonomous driving.
  • Supported by:
    National Natural Science Foundation of China(62277027)



While driving, an autonomous vehicle must simultaneously perform target detection, instance segmentation and target tracking for pedestrians and vehicles. A deep-learning-based environment perception model was proposed that learns these three tasks jointly through multi-task learning. Firstly, spatio-temporal features were extracted from consecutive image frames by a Convolutional Neural Network (CNN). Then, the spatio-temporal features were decoupled and re-fused by an attention mechanism, so that the correlation between tasks was fully exploited and each task could select the spatio-temporal features it needs. Finally, to balance the learning speeds of the different tasks, the model was trained with a dynamic weighted average method. The proposed model was validated on the KITTI dataset. The experimental results show that, in target detection, the F1 score is 0.6 percentage points higher than that of the CenterTrack model; in target tracking, the Multiple Object Tracking Accuracy (MOTA) is 0.7 percentage points higher than that of the TraDeS (Track to Detect and Segment) model; and in instance segmentation, AP50 and AP75 are 7.4 and 3.9 percentage points higher, respectively, than those of the SOLOv2 (Segmenting Objects by LOcations version 2) model.
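
The abstract names a dynamic weighted average for balancing the three tasks during training but does not give the formula. The sketch below assumes the widely used Dynamic Weight Average scheme, in which each task's weight is a softmax over the ratio of its two most recent losses, so a task whose loss is falling slowly receives a larger weight. The function names, the task names and the temperature value of 2.0 are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import Dict, List


def dynamic_weight_average(
    loss_history: Dict[str, List[float]],
    temperature: float = 2.0,  # assumed value; a larger value gives more even weights
) -> Dict[str, float]:
    """Return one loss weight per task from each task's recent loss trend."""
    tasks = list(loss_history)
    num_tasks = len(tasks)

    # With fewer than two recorded epochs there is no trend yet,
    # so every task gets the same weight.
    if any(len(loss_history[t]) < 2 for t in tasks):
        return {t: 1.0 for t in tasks}

    # Ratio of the latest loss to the one before it. A ratio near 1 means
    # the loss has barely dropped (slow learner); a small ratio means it is
    # dropping quickly (fast learner). The floor avoids division by zero.
    ratios = {
        t: loss_history[t][-1] / max(loss_history[t][-2], 1e-12) for t in tasks
    }

    # Softmax over the ratios, scaled so the weights sum to the task count.
    exps = {t: math.exp(ratios[t] / temperature) for t in tasks}
    total = sum(exps.values())
    return {t: num_tasks * exps[t] / total for t in tasks}


def weighted_total_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine the per-task losses into the single scalar that is optimized."""
    return sum(weights[t] * losses[t] for t in losses)


# Hypothetical per-epoch average losses for the three perception tasks.
history = {
    "detection": [1.0, 0.9],      # slow progress
    "segmentation": [1.0, 0.5],   # fast progress
    "tracking": [1.0, 0.7],
}
weights = dynamic_weight_average(history)
current = {"detection": 0.9, "segmentation": 0.5, "tracking": 0.7}
total = weighted_total_loss(current, weights)
```

In this toy history, detection improves the least and therefore gets the largest weight, while segmentation improves the most and gets the smallest. The weights would normally be recomputed once per epoch from averaged losses rather than per batch, which keeps them from reacting to batch-level noise.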

Key words: autonomous driving, environment perception, target detection, instance segmentation, target tracking, multi-task learning


