Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1407-1416. DOI: 10.11772/j.issn.1001-9081.2021030533
Special Issue: Artificial intelligence
• Artificial intelligence •
Received: 2021-04-08
Revised: 2021-06-17
Accepted: 2021-06-17
Online: 2022-06-11
Published: 2022-05-10
Contact: Haitao ZHAO
About author: ZHUANG Yi, born in 1996, from Shanghai, M. S. candidate. His research interests include object detection and object tracking.
Yi ZHUANG, Haitao ZHAO. Proposal-based aggregation network for single object tracking in 3D point cloud[J]. Journal of Computer Applications, 2022, 42(5): 1407-1416.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021030533
| Module | Parameters |
|---|---|
| Convolution block 1 | Conv2D(64,128,3,2,1) |
| | Conv2D(128,128,3,1,1)×3 |
| | BatchNorm2D(128,128) |
| | ReLU |
| Convolution block 2 | Conv2D(128,128,3,2,1) |
| | Conv2D(128,128,3,1,1)×5 |
| | BatchNorm2D(128,128) |
| | ReLU |
| Convolution block 3 | Conv2D(128,256,3,2,1) |
| | Conv2D(256,256,3,1,1)×5 |
| | BatchNorm2D(256,256) |
| | ReLU |
| Upsampling block 1 | Deconv2D(128,256,1,1,0) |
| Upsampling block 2 | Deconv2D(128,256,2,2,0) |
| Upsampling block 3 | Deconv2D(256,256,4,4,0) |
| Classification fusion convolution block | Conv2D(256,2,1,1,0) (applied to each of the 3 resolutions) |
| | Concatenate (joins the results of the 3 resolutions) |
| | Conv2D(6,2,1,1,0) |
| | Sigmoid |
| Anchor box offset fusion convolution block | Conv2D(256,4,1,1,0) (applied to each of the 3 resolutions) |
| | Concatenate (joins the results of the 3 resolutions) |
| | Conv2D(12,4,1,1,0) |
Tab. 1 Parameter setting of convolution modules
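
As a rough check on why the three upsampling blocks in Tab. 1 can be concatenated, the sketch below works through the spatial-size arithmetic. It assumes that each entry reads as (in_channels, out_channels, kernel_size, stride, padding), which the table does not state explicitly, and it uses a hypothetical input side length of 512. The helper names are illustrative only and are not taken from the paper.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a standard convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel


def upsampled_sizes(size):
    """Side length produced by each upsampling block for a given input size."""
    # (kernel, stride, padding) of upsampling blocks 1-3, read from Tab. 1
    # under the assumed argument order.
    deconvs = [(1, 1, 0), (2, 2, 0), (4, 4, 0)]
    sizes = []
    for kernel, stride, padding in deconvs:
        # Each convolution block opens with a 3x3, stride-2, padding-1 layer.
        size = conv_out(size, 3, 2, 1)
        # The remaining 3x3, stride-1, padding-1 layers keep the size unchanged.
        assert conv_out(size, 3, 1, 1) == size
        sizes.append(deconv_out(size, kernel, stride, padding))
    return sizes


# Channel counts after concatenating the per-resolution head outputs.
NUM_RESOLUTIONS = 3
cls_channels = 2 * NUM_RESOLUTIONS   # input of Conv2D(6,2,1,1,0)
reg_channels = 4 * NUM_RESOLUTIONS   # input of Conv2D(12,4,1,1,0)

print(upsampled_sizes(512))  # hypothetical 512-wide input
```

Under these assumptions all three branches end at the same resolution (half the input side), and the concatenated channel counts match the 6 and 12 input channels listed for the final fusion convolutions.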
| Method | Success rate/% | | | | | Precision/% | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | SC3D | 2D-SC3D | P2B | 3D-SiamRPN | PA-Net | SC3D | 2D-SC3D | P2B | 3D-SiamRPN | PA-Net |
| Previous-frame prediction | 41.3 | 36.2 | 56.2 | 57.3 | 59.0 | 57.9 | 51.0 | 72.8 | 75.0 | 75.2 |
| Previous-frame GT | 64.6 | — | 82.4 | — | 85.5 | 74.5 | — | 90.1 | — | 91.8 |
| Current-frame GT | 76.9 | — | 84.0 | — | 89.4 | 81.3 | — | 90.3 | — | 93.2 |
Tab. 2 Comprehensive experimental results of different methods on Car
| Category | Frames | Success rate/% | | | | | Precision/% | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | SC3D | 2D-SC3D | P2B | 3D-SiamRPN | PA-Net | SC3D | 2D-SC3D | P2B | 3D-SiamRPN | PA-Net |
| Mean | | 30.0 | 26.6 | 42.4 | 46.7 | 47.5 | 46.7 | 43.6 | 60.0 | 64.9 | 67.7 |
| Car | 6424 | 41.3 | 36.2 | 56.2 | 57.3 | 59.0 | 57.9 | 51.0 | 72.8 | 75.0 | 75.2 |
| Van | 1248 | 40.4 | — | 40.8 | 45.7 | 51.2 | 47.0 | — | 48.4 | 52.8 | 62.8 |
| Cyclist | 308 | 41.5 | 43.2 | 32.1 | 36.1 | 55.8 | 70.4 | 81.2 | 44.7 | 49.0 | 78.4 |
| Pedestrian | 6088 | 18.2 | 17.9 | 28.7 | 35.2 | 38.4 | 37.8 | 47.8 | 49.6 | 56.2 | 66.2 |
Tab. 3 Extensive experimental results on different categories of different methods
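
One plausible reading of the "Mean" row in Tab. 3 is a mean weighted by the number of frames per category. The table does not say so, so this is an assumption. The sketch below applies that weighting to the P2B columns only, where it reproduces the listed 42.4 and 60.0; whether the other columns were averaged the same way is not checked here. The function name is illustrative.

```python
# Frame counts per category, copied from Tab. 3.
FRAMES = {"Car": 6424, "Van": 1248, "Cyclist": 308, "Pedestrian": 6088}

# P2B per-category results, copied from Tab. 3.
P2B_SUCCESS = {"Car": 56.2, "Van": 40.8, "Cyclist": 32.1, "Pedestrian": 28.7}
P2B_PRECISION = {"Car": 72.8, "Van": 48.4, "Cyclist": 44.7, "Pedestrian": 49.6}


def frame_weighted_mean(scores, frames):
    """Average per-category scores, weighting each category by its frame count."""
    total_frames = sum(frames.values())
    weighted_sum = sum(scores[name] * frames[name] for name in frames)
    return weighted_sum / total_frames


p2b_mean_success = round(frame_weighted_mean(P2B_SUCCESS, FRAMES), 1)
p2b_mean_precision = round(frame_weighted_mean(P2B_PRECISION, FRAMES), 1)
print(p2b_mean_success, p2b_mean_precision)
```

Weighting by frames means the Car and Pedestrian categories dominate the mean, while the 308 Cyclist frames contribute very little.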
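| Feature enriching layer | Aggregated regression layer | Success rate/% | Precision/% |
|---|---|---|---|
| No attention mechanism | Conventional convolution | 52.2 | 62.3 |
| Parallel attention mechanism | Conventional convolution | 54.3 | 64.0 |
| Separated attention mechanism | Conventional convolution | 54.4 | 68.8 |
| Separated attention mechanism | Modulated deformable convolution | 58.8 | 74.9 |
| Separated attention mechanism | Sparse modulated deformable convolution | 59.0 | 75.2 |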
Tab. 4 Ablation experimental results of PA-Net in feature enriching layer and aggregated regression layer on Car
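| Regression target | | Predicted value |
|---|---|---|
| Center position | Best proposal foreground confidence | 0.987 |
| | Regression value/m | [2.5005, 11.9010, 0.7521] |
| Center compensation | Best center foreground confidence | 0.945 |
| | Compensation value/m | [0.0348, 0.1954, 0.0153] |
| Predicted center/m | | [2.5353, 12.0964, 0.7674] |
| True center/m | | [2.2890, 12.0726, 0.7647] |
| Center deviation/m | | [0.2463, 0.0238, 0.0027] |
| Predicted deflection angle/rad | | 0.0487 |
| True deflection angle/rad | | 0.0586 |
| Angle deviation/rad | | |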
Tab. 5 Results of predicted center position and deflection angle
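
The rows of Tab. 5 appear to be related by simple arithmetic: the predicted center is the regression value plus the compensation value, and the center deviation is the per-axis absolute difference from the true center. The sketch below checks this with the numbers from the table. The variable and function names are illustrative, and the final line merely applies the same subtraction to the two listed angles; it is not a value reported in the table.

```python
# Values copied from Tab. 5 (meters for centers, radians for angles).
regression = [2.5005, 11.9010, 0.7521]
compensation = [0.0348, 0.1954, 0.0153]
true_center = [2.2890, 12.0726, 0.7647]
predicted_angle = 0.0487
true_angle = 0.0586


def add_vectors(a, b):
    """Element-wise sum, rounded to the 4 decimals used in the table."""
    return [round(x + y, 4) for x, y in zip(a, b)]


def abs_difference(a, b):
    """Element-wise absolute difference, rounded to 4 decimals."""
    return [round(abs(x - y), 4) for x, y in zip(a, b)]


predicted_center = add_vectors(regression, compensation)
center_deviation = abs_difference(predicted_center, true_center)
angle_difference = round(abs(predicted_angle - true_angle), 4)

print(predicted_center)   # compare with the "Predicted center/m" row
print(center_deviation)   # compare with the "Center deviation/m" row
print(angle_difference)   # derived here, not listed in the table
```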
| Method | Preprocessing/ms | Model inference/ms | Post-processing/ms | Total time/ms | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|---|
| P2B | 7.0 | 14.3 | 0.9 | 22.2 | 45.0 |
| 3D-SiamRPN | 0.5 | 40.7 | 7.2 | 48.0 | 20.8 |
| PA-Net | 35.0 | 5.6 | 0.3 | 40.9 | 24.4 |
Tab. 6 Running speeds of different methods on Car
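
The frame-rate column of Tab. 6 is consistent with dividing 1000 ms by the listed total time per frame. A minimal sketch of that conversion follows, using the totals from the table; the function name is illustrative.

```python
# Total per-frame time in milliseconds, copied from Tab. 6.
TOTAL_MS = {"P2B": 22.2, "3D-SiamRPN": 48.0, "PA-Net": 40.9}


def frames_per_second(total_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / total_ms


frame_rates = {name: round(frames_per_second(ms), 1) for name, ms in TOTAL_MS.items()}
for name, fps in frame_rates.items():
    print(f"{name}: {fps} frame/s")
```

Read this way, PA-Net's total time is dominated by the 35.0 ms preprocessing stage rather than by model inference (5.6 ms).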
