Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (5): 1407-1416. DOI: 10.11772/j.issn.1001-9081.2021030533

Special Issue: Artificial intelligence


Proposal-based aggregation network for single object tracking in 3D point cloud

Yi ZHUANG, Haitao ZHAO

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2021-04-08 Revised: 2021-06-17 Accepted: 2021-06-17 Online: 2022-06-11 Published: 2022-05-10
  • Contact: Haitao ZHAO
  • About author: ZHUANG Yi, born in 1996, M. S. candidate. His research interests include object detection and object tracking.
    ZHAO Haitao, born in 1974, Ph. D., professor. His research interests include pattern recognition and machine learning.




Compared with 2D RGB images, 3D point clouds preserve the true, rich geometric information of objects in space, which helps single object tracking cope with scale variation. However, the precision of 3D object tracking suffers from the information loss caused by point cloud sparsity and from the deformation caused by changes in object position. To address these two problems, a proposal-based aggregation network consisting of three modules and trained end-to-end was proposed. The network determines the 3D bounding box by locating the object center within the best proposal, thereby realizing single object tracking in 3D point clouds. Firstly, the point cloud data of both the template and the search area were converted into bird’s-eye view pseudo images. In the first module, the feature information was enriched through spatial and cross-channel attention mechanisms. Then, in the second module, the best proposal was generated by an anchor-based deep cross-correlation Siamese region proposal subnetwork. Finally, in the third module, the object features were first extracted by region of interest pooling based on the best proposal; the object and template features were then aggregated, a sparse modulated deformable convolution layer was used to handle point cloud sparsity and deformation, and the final 3D bounding box was determined.
Experimental results comparing the proposed method with state-of-the-art 3D point cloud single object tracking methods on the KITTI dataset show that: in the comprehensive experiment on the car category, the proposed method improves the success rate by 1.7 percentage points and the precision by 0.2 percentage points in real scenes; in the extended multi-category experiment on car, van, cyclist and pedestrian, it improves the average success rate by 0.8 percentage points and the average precision by 2.8 percentage points. These results indicate that the proposed method can solve the single object tracking problem in 3D point clouds and make 3D object tracking results more accurate.
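
The abstract's first step, converting a point cloud into a bird’s-eye view pseudo image, can be illustrated with a minimal sketch. The abstract does not give the grid extent, the cell size, or the channel encoding, so the function name `points_to_bev`, the ranges, the 1 m cell, and the two channels used here (point count and maximum height per cell) are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def points_to_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 4.0),   # assumed extent, not from the paper
    y_range: Tuple[float, float] = (-2.0, 2.0),  # assumed extent, not from the paper
    cell: float = 1.0,                           # assumed cell size, not from the paper
) -> Tuple[List[List[int]], List[List[float]]]:
    """Rasterize points onto the ground plane.

    Returns two grids indexed as [row][col], where rows follow y and
    columns follow x: the number of points per cell and the maximum
    height per cell. Empty cells keep a height of 0.0.
    """
    cols = int(round((x_range[1] - x_range[0]) / cell))
    rows = int(round((y_range[1] - y_range[0]) / cell))
    count = [[0] * cols for _ in range(rows)]
    max_z = [[0.0] * cols for _ in range(rows)]

    for x, y, z in points:
        # Discard points that fall outside the chosen ground-plane window.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp guards against floating-point rounding at the upper edge.
        c = min(int((x - x_range[0]) / cell), cols - 1)
        r = min(int((y - y_range[0]) / cell), rows - 1)
        if count[r][c] == 0 or z > max_z[r][c]:
            max_z[r][c] = z
        count[r][c] += 1

    return count, max_z


sample = [(0.5, -1.5, 0.2), (0.7, -1.2, 0.9), (3.5, 1.5, -0.4), (10.0, 0.0, 1.0)]
count, max_z = points_to_bev(sample)
```

In this sketch most cells stay empty, which is the sparsity the abstract refers to; the template and the search area would each be rasterized this way before being passed to a 2D feature extractor.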

Key words: point cloud, object tracking, Siamese network, attention mechanism, deformable convolution



