Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1407-1416. DOI: 10.11772/j.issn.1001-9081.2021030533

• Artificial Intelligence •

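Proposal-based aggregation network for single object tracking in 3D point cloud

Yi ZHUANG, Haitao ZHAO

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2021-04-08 Revised: 2021-06-17 Accepted: 2021-06-17 Online: 2022-06-11 Published: 2022-05-10
  • Corresponding author: Haitao ZHAO
  • About the authors: ZHUANG Yi, born in 1996 in Shanghai, male, M.S. candidate. His research interests include object detection and object tracking.
    ZHAO Haitao, born in 1974 in Qingdao, Shandong, male, Ph.D., professor. His research interests include pattern recognition and machine learning. haitaozhao@ecust.edu.cn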



Abstract:

Compared with 2D RGB images, 3D point clouds retain the real and rich geometric information of objects in space, which helps address the visual challenge of scale variation in single object tracking. However, the precision of 3D object tracking is degraded by the information loss caused by the sparsity of point cloud data and by the deformation caused by changes in object position. To solve these two problems, a proposal-based aggregation network composed of three modules was proposed under an end-to-end learning scheme. The network determines the 3D bounding box by locating the object center within the best proposal, thereby realizing single object tracking in 3D point clouds. Firstly, the point cloud data of both the template and the search area were converted into bird's-eye view pseudo images, and in the first module the feature information was enriched through spatial and cross-channel attention mechanisms. Then, in the second module, the best proposal was given by an anchor-based deep cross-correlation Siamese region proposal subnetwork. Finally, in the third module, the object features were first extracted by applying region of interest pooling to the search area with the best proposal; the object and template features were then aggregated, and a sparse modulated deformable convolution layer was used to handle point cloud sparsity and deformation and to determine the final 3D bounding box.
Experimental results comparing the proposed method with state-of-the-art 3D point cloud single object tracking methods on the KITTI tracking dataset show that: in the comprehensive experiment on the car category, the proposed method improved the success rate by 1.7 percentage points and the precision by 0.2 percentage points in real scenes; in the multi-category extension experiment on car, van, cyclist and pedestrian, the proposed method improved the average success rate by 0.8 percentage points and the average precision by 2.8 percentage points. These results indicate that the proposed method can solve the single object tracking problem in 3D point clouds and make 3D object tracking results more accurate.

Key words: point cloud, object tracking, Siamese network, attention mechanism, deformable convolution
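
The first two stages named in the abstract (conversion to a bird's-eye view pseudo image, then correlation of template against search area) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-channel pseudo image (point count and maximum height per cell), which the abstract does not specify, and it assumes that "deep cross-correlation" refers to the channel-by-channel (depth-wise) correlation common in Siamese trackers. It also correlates raw rasterised channels where the network would use learned, attention-enriched features. The names `points_to_bev`, `depthwise_xcorr` and `argmax_2d` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]   # (x, y, z), assumed to be in metres
Grid = List[List[float]]             # one BEV channel, indexed [row][col]


def points_to_bev(points: Sequence[Point],
                  x_range: Tuple[float, float],
                  y_range: Tuple[float, float],
                  cell: float) -> Tuple[Grid, Grid]:
    """Rasterise points into two BEV channels: point count and max height.

    Rows index x and columns index y; points outside the ranges are dropped.
    The channel choice is an assumption made for illustration only.
    """
    n_rows = int(round((x_range[1] - x_range[0]) / cell))
    n_cols = int(round((y_range[1] - y_range[0]) / cell))
    count = [[0.0] * n_cols for _ in range(n_rows)]
    height = [[0.0] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        r = int((x - x_range[0]) // cell)
        c = int((y - y_range[0]) // cell)
        if 0 <= r < n_rows and 0 <= c < n_cols:
            # First point in a cell sets the height; later ones keep the max.
            if count[r][c] == 0 or z > height[r][c]:
                height[r][c] = z
            count[r][c] += 1.0
    return count, height


def depthwise_xcorr(search: List[Grid], template: List[Grid]) -> List[Grid]:
    """Correlate each template channel with the matching search channel.

    Uses 'valid' sliding windows, so each response map has size
    (search_h - template_h + 1) x (search_w - template_w + 1).
    """
    if len(search) != len(template):
        raise ValueError("search and template need the same channel count")
    responses = []
    for s, t in zip(search, template):
        th, tw = len(t), len(t[0])
        oh, ow = len(s) - th + 1, len(s[0]) - tw + 1
        responses.append([
            [sum(s[i + di][j + dj] * t[di][dj]
                 for di in range(th) for dj in range(tw))
             for j in range(ow)]
            for i in range(oh)
        ])
    return responses


def argmax_2d(grid: Grid) -> Tuple[int, int]:
    """Return (row, col) of the largest value; the first one wins on ties."""
    best_pos, best_val = (0, 0), grid[0][0]
    for i, row in enumerate(grid):
        for j, v in enumerate(row):
            if v > best_val:
                best_pos, best_val = (i, j), v
    return best_pos
```

In this toy setting the peak of a response map gives the cell offset at which the template best matches the search area. The network described in the abstract instead feeds such correlation features to an anchor-based region proposal subnetwork and refines the best proposal in its third module.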

CLC number: