Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1644-1654. DOI: 10.11772/j.issn.1001-9081.2023060796
Special Issue: Multimedia computing and computer simulation
• Multimedia computing and computer simulation •
Survey of visual object tracking methods based on Transformer
Ziwen SUN1, Lizhi QIAN1, Chuandong YANG1, Yibo GAO1, Qingyang LU2, Guanglin YUAN2()
Received: 2023-06-21
Revised: 2023-09-04
Accepted: 2023-09-11
Online: 2023-10-27
Published: 2024-05-10
Contact: Guanglin YUAN
About author: SUN Ziwen, born in 1996, Ph. D. candidate. His research interests include computer vision.
Supported by:
CLC Number:
Ziwen SUN, Lizhi QIAN, Chuandong YANG, Yibo GAO, Qingyang LU, Guanglin YUAN. Survey of visual object tracking methods based on Transformer[J]. Journal of Computer Applications, 2024, 44(5): 1644-1654.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023060796
Method | Venue | LaSOT AUC/% | LaSOT P/% | TrackingNet AUC/% | TrackingNet P/% | GOT-10k AO | GOT-10k SR0.5 | GOT-10k SR0.75 | OTB2015 AUC/% | NFS AUC/% | UAV123 AUC/% | VOT2020 EAO | VOT2020 Acc | VOT2020 Rob | FPS | Params/MB
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
DTT[ | ICCV21 | 53.8 | 74 | 68.8 | 63.4 | 74.9 | 51.4 | 60.8 | 54.5 | |||||||
TrTr-offline[ | CVPR21 | 46.3 | 69.3 | 69.1 | 55.2 | 59.4 | 50 | 33.9 | ||||||||
TrTr-online[ | 55.1 | 71 | 71.5 | 63.1 | 65.2 | 35.3 | 21 | |||||||||
TransT[ | ICCV21 | 64.9 | 69 | 81.4 | 80.3 | 72.3 | 82.4 | 68.2 | 69.4 | 65.7 | 69.1 | 50 | 35.5 | |||
TrSiam[ | ICCV21 | 62.4 | 60 | 78.1 | 72.7 | 67.3 | 78.7 | 58.6 | 70.8 | 65.8 | 67.4 | 35.6 | 52 | |||
TrDiMP[ | 63.9 | 61.4 | 78.4 | 73.1 | 68.8 | 80.5 | 59.7 | 71.1 | 66.5 | 67.5 | 26.3 | 55.2 | ||||
STARK-ST50[ | ICCV21 | 66.6 | 70.8 | 81.3 | 68.0 | 77.7 | 62.3 | 66.2 | 68.2 | 0.308 | 0.478 | 0.799 | 41.8 | 23.5 | ||
STARK-ST101[ | 67.1 | 77.0 | 82.0 | 68.8 | 78.1 | 64.1 | 0.303 | 0.481 | 0.775 | 31.7 | 42.4 | |||||
E.T.Track[ | CVPR22 | 59.1 | 75.0 | 70.6 | 67.8 | 59.0 | 62.3 | 0.267 | 0.432 | 0.741 | 47.2 | |||||
HCAT[ | CVPR22 | 59.1 | 60.7 | 76.6 | 72.9 | 65.3 | 76.8 | 57 | 68.1 | 63.6 | 63.6 | 0.276 | 0.455 | 0.747 | 195 | |
TransT_H[ | 66.2 | 70.7 | 82.2 | 80.4 | 72.4 | 82.0 | 68.5 | |||||||||
TransT-M[ | CVPR22 | 65.4 | 69.6 | 82.5 | 80.0 | 74.7 | 85.5 | 71.3 | 68.9 | 66.2 | 70.9 | 0.550 | 0.742 | 0.869 | 42.7 | 23.1 |
SparseTT[ | IJCAI22 | 66.0 | 70.1 | 81.7 | 79.5 | 69.3 | 79.1 | 63.8 | 70.4 | 70.4 | 40.1 | 58.3 | ||||
CSWinTT[ | ICCV22 | 66.2 | 70.9 | 81.9 | 79.5 | 69.4 | 78.9 | 65.4 | 70.5 | 0.304 | 0.480 | 0.787 | 12 | |||
ToMP-50[ | CVPR22 | 67.6 | 72.2 | 81.2 | 78.6 | 70.1 | 66.9 | 69 | 0.297 | 0.453 | 0.789 | |||||
ToMP-101[ | 68.5 | 73.5 | 81.5 | 78.9 | 70.1 | 66.7 | 66.9 | 0.309 | 0.453 | 0.814 | ||||||
AiATrack[ | ECCV22 | 69.0 | 73.8 | 82.7 | 80.4 | 69.6 | 80.0 | 63.2 | 69.6 | 67.9 | 70.6 | 0.530 | 0.764 | 0.827 | 38 | 23.6 |
DualTFR[ | ICCV21 | 63.5 | 66.5 | 80.1 | 73.5 | 84.8 | 69.9 | 68.2 | 0.528 | 0.755 | 0.836 | 44.1 | ||||
SwinTrack-B[ | CVPR21 | 69.6 | 74.1 | 82.5 | 80.4 | 69.4 | 78 | 64.3 | 52.2 | 91.4 | ||||||
SwinTrack-B-384[ | 70.2 | 75.3 | 84.0 | 83.2 | 45 | 101.3 | ||||||||||
SFTransT[ | CoRR22 | 69.0 | 73.9 | 82.9 | 81.3 | 72.7 | 84.3 | 66.9 | 70.3 | 66.0 | 71.3 | 27.3 | 29.61 | |||
Sim-L/14[ | ECCV21 | 70.5 | 76.2 | 83.4 | 87.4 | 69.8 | 78.8 | 66.0 | 71.2 | 103.1 | ||||||
OSTrack-256[ | ECCV22 | 69.1 | 75.2 | 83.1 | 82.0 | 71.0 | 80.4 | 68.2 | 64.7 | 66.5 | 105.4 | |||||
OSTrack-384[ | 71.1 | 77.6 | 83.9 | 83.2 | 73.7 | 83.2 | 70.8 | 68.3 | 70.7 | 58.1 | ||||||
ProContEXT[ | CoRR22 | 84.6 | 83.8 | 74.6 | 84.7 | 72.9 | 26.4 | 118.5 | ||||||||
VideoTrack[ | CVPR23 | 70.2 | 76.4 | 83.8 | 83.1 | 72.9 | 81.9 | 69.8 | 69.7 | 81.4 | ||||||
TATrack-L[ | IAAI23 | 71.1 | 76.1 | 85.0 | 84.5 | 79.2 | 88.6 | 78.3 | 6.6 | 112.8 | ||||||
GRM[ | CVPR23 | 69.9 | 75.8 | 84.0 | 83.3 | 73.4 | 82.9 | 70.4 | 65.6 | 70.2 | 45 | |||||
GRM-L320[ | 71.4 | 77.9 | 84.4 | 84.0 | ||||||||||||
DropTrack[ | CVPR23 | 71.8 | 78.1 | 84.1 | 83.0 | 75.9 | 86.8 | 72.0 | 69.6 | |||||||
MixFormer[ | CVPR22 | 67.9 | 73.9 | 82.6 | 81.2 | 73.2 | 83.2 | 70.2 | 68.7 | 0.527 | 0.746 | 0.833 | 29 | 53.3 | ||
MixFormer-L[ | 70.1 | 76.3 | 83.9 | 83.1 | 75.6 | 85.7 | 72.8 | 69.5 | 0.555 | 0.762 | 0.855 | 25 | 215.7 | |||
MixViT-L (ConvMAE)[ | CVPR23 | 73.3 | 80.3 | 86.1 | 86.0 | 75.4 | 84.0 | 75.4 | 70.7 | 70.0 | 0.567 | 0.747 | 0.870 | 10 | 286.9 | |
SeqTrack-B256[ | CVPR23 | 69.9 | 76.3 | 83.3 | 82.2 | 74.7 | 84.7 | 71.8 | 67.6 | 69.2 | 0.520 | 40 | 89 | |||
SeqTrack-B384[ | 71.5 | 77.8 | 83.9 | 83.6 | 74.5 | 84.3 | 71.4 | 66.7 | 68.6 | 0.522 | 15 | 89 | ||||
SeqTrack-L256[ | 72.1 | 79.0 | 85.0 | 84.9 | 74.5 | 83.2 | 72.0 | 66.9 | 69.7 | 0.555 | 15 | 309 | ||||
SeqTrack-L384[ | 72.5 | 79.3 | 85.5 | 85.8 | 74.8 | 81.9 | 72.2 | 66.2 | 68.5 | 0.561 | 5 | 309 | ||||
ARTrack256[ | CVPR23 | 70.4 | 76.6 | 84.2 | 83.5 | 73.5 | 82.2 | 70.9 | 64.3 | 67.7 | ||||||
ARTrack384[ | 72.6 | 79.1 | 85.1 | 84.8 | 75.5 | 84.3 | 74.3 | 66.8 | 70.5 | |||||||
ARTrack-L384[ | 73.1 | 80.3 | 85.6 | 86.0 | 78.5 | 87.4 | 77.8 | 67.9 | 71.2 | 26 |
Tab. 1 Comparison of results of Transformer-based object tracking methods on common datasets
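To make the table's column headings concrete, the sketch below shows how the overlap-based quantities (AO, SR0.5, SR0.75 and the success-plot AUC) and the 20-pixel precision P are typically computed per sequence from per-frame tracker output; benchmark scores are then averaged over sequences. This is a minimal illustration rather than any tracker's official evaluation code: the function names are hypothetical, boxes are assumed to be axis-aligned [x, y, w, h], and the VOT2020 EAO/Accuracy/Robustness measures follow a separate anchor-based re-initialization protocol that is not reproduced here.

```python
# Illustrative sketch only (not from any surveyed tracker's official toolkit):
# per-sequence tracking metrics computed from per-frame predictions.
import numpy as np


def iou(box_a, box_b):
    """Intersection-over-Union of two [x, y, w, h] boxes."""
    xa2, ya2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    xb2, yb2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0.0, min(xa2, xb2) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(ya2, yb2) - max(box_a[1], box_b[1]))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def sequence_metrics(pred_boxes, gt_boxes):
    """AO, SR0.5, SR0.75, success-plot AUC and 20-px precision for one sequence."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    # Center-location error in pixels, used by the precision (P) columns.
    pred_c = np.array([[b[0] + b[2] / 2.0, b[1] + b[3] / 2.0] for b in pred_boxes])
    gt_c = np.array([[b[0] + b[2] / 2.0, b[1] + b[3] / 2.0] for b in gt_boxes])
    center_err = np.linalg.norm(pred_c - gt_c, axis=1)

    thresholds = np.linspace(0.0, 1.0, 21)       # success-plot overlap thresholds
    success_rates = [(overlaps > t).mean() for t in thresholds]
    return {
        "AO": overlaps.mean(),                   # average overlap (GOT-10k)
        "SR0.5": (overlaps > 0.5).mean(),        # success rate at IoU > 0.5
        "SR0.75": (overlaps > 0.75).mean(),      # success rate at IoU > 0.75
        "AUC": float(np.mean(success_rates)),    # area under the success plot
        "P@20px": (center_err <= 20.0).mean(),   # precision at 20-pixel error
    }


if __name__ == "__main__":
    # Toy example with two frames of synthetic predictions and ground truth.
    preds = [[10, 10, 50, 50], [12, 14, 48, 52]]
    gts = [[12, 12, 50, 50], [30, 30, 48, 52]]
    print(sequence_metrics(preds, gts))
```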
1 | HUANG K Q, CHEN X T, KANG Y F, et al. Intelligent visual surveillance: a review[J]. Chinese Journal of Computers, 2015, 38(6): 1093-1118. 10.11897/SP.J.1016.2015.01093
2 | QIAN K, SONG A G. An improved bionic cognitive neural network for robot[J]. Acta Electronica Sinica, 2015, 43(6): 1084-1089. 10.3969/j.issn.0372-2112.2015.06.007
3 | LIU C H, ZHANG L, HUANG H. Visualization of cross-view multi-object tracking for surveillance videos in crossroad[J]. Chinese Journal of Computers, 2018, 41(1): 221-235. 10.11897/SP.J.1016.2018.00221
4 | LIANG Y Q, WANG W, QU Y, et al. Human-computer interaction behavior and intention prediction model based on eye movement characteristics[J]. Acta Electronica Sinica, 2018, 46(12): 2993-3001. 10.3969/j.issn.0372-2112.2018.12.024
5 | GE B Y, ZUO X Z, HU Y J. Review of visual object tracking technology[J]. Journal of Image and Graphics, 2018, 23(8): 1091-1107. 10.11834/jig.170604
6 | MENG L, YANG X. A survey of object tracking algorithms[J]. Acta Automatica Sinica, 2019, 45(7): 1244-1260. 10.16383/j.aas.c180277
7 | AVIDAN S. Support vector tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(8): 1064-1072. 10.1109/tpami.2004.53 |
8 | NING J, YANG J, JIANG S, et al. Object tracking via dual linear structured SVM and explicit feature map[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4266-4274. 10.1109/cvpr.2016.462
9 | ZHANG T, GHANEM B, LIU S, et al. Robust visual tracking via multi-task sparse learning[C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 2042-2049. 10.1109/cvpr.2012.6247908
10 | BABENKO B, YANG M-H, BELONGIE S. Visual tracking with online multiple instance learning[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 983-990. 10.1109/cvpr.2009.5206737 |
11 | BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual object tracking using adaptive correlation filters[C]// Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2010: 2544-2550. 10.1109/cvpr.2010.5539960
12 | HENRIQUES J F, CASEIRO R, MARTINS P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]// Proceedings of the 12th European Conference on Computer Vision. Berlin: Springer, 2012: 702-715. 10.1007/978-3-642-33765-9_50 |
13 | HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. 10.1109/tpami.2014.2345390 |
14 | DANELLJAN M, HÄGER G, KHAN F S, et al. Accurate scale estimation for robust visual tracking[C]// Proceedings of the British Machine Vision Conference 2014. Durham: British Machine Vision Association, 2014: 1-11. 10.5244/c.28.65
15 | LI Y, ZHU J. A scale adaptive kernel correlation filter tracker with feature integration[C]// Proceedings of the 2014 European Conference on Computer Vision Workshops. Cham: Springer, 2015: 254-265. 10.1007/978-3-319-16181-5_18 |
16 | LUKEŽIČ A, VOJÍŘ T, ČEHOVIN ZAJC L, et al. Discriminative correlation filter tracker with channel and spatial reliability[J]. International Journal of Computer Vision, 2018, 126: 671-688. 10.1007/s11263-017-1061-3 |
17 | DANELLJAN M, HÄGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop. Piscataway: IEEE, 2015: 621-629. 10.1109/iccvw.2015.84
18 | MA C, HUANG J-B, YANG X, et al. Hierarchical convolutional features for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 3074-3082. 10.1109/iccv.2015.352 |
19 | GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1144-1152. 10.1109/iccv.2017.129
20 | DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: learning continuous convolution operators for visual tracking[C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 472-488. 10.1007/978-3-319-46454-1_29 |
21 | DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6931-6939. 10.1109/cvpr.2017.733
22 | BHAT G, JOHNANDER J, DANELLJAN M, et al. Unveiling the power of deep tracking[C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 493-509. 10.1007/978-3-030-01216-8_30
23 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
24 | DEVLIN J, CHANG M-W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186. 10.18653/v1/n18-2
25 | TONG C, PENG H, DAI Q, et al. Improving natural language understanding by reverse mapping Bytepair encoding[C]// Proceedings of the 23rd Conference on Computational Natural Language Learning. Stroudsburg: ACL, 2019: 163-173. 10.18653/v1/k19-1016
26 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C/OL]// Proceedings of the 9th International Conference on Learning Representations. [S.l.]: ICLR, 2021 [2023-05-30]. .
27 | CHEN C-F R, FAN Q, PANDA R. CrossViT: cross-attention multi-scale vision Transformer for image classification[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 347-356. 10.1109/iccv48922.2021.00041
28 | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers[C]// Proceedings of the 16th European Conference on Computer Vision. Cham: Springer, 2020: 213-229. 10.1007/978-3-030-58452-8_13
29 | STRUDEL R, GARCIA R, LAPTEV I, et al. Segmenter: Transformer for semantic segmentation[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021:7242-7252. 10.1109/iccv48922.2021.00717 |
30 | LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021:9992-10002. 10.1109/iccv48922.2021.00986 |
31 | TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image Transformers & distillation through attention[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021:10347-10357. 10.1109/iccv48922.2021.00091 |
32 | ZHOU D, KANG B, JIN X, et al. DeepViT: towards deeper vision Transformer[EB/OL]. [2023-05-30]. . |
33 | LI Y, ZHANG K, CAO J, et al. LocalViT: bringing locality to vision Transformers[EB/OL]. [2023-05-30]. . 10.1109/iros55552.2023.10342025 |
34 | CHOPRA S, HADSELL R, LeCUN Y. Learning a similarity metric discriminatively, with application to face verification[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2005:539-546. |
35 | TAO R, GAVVES E, SMEULDERS A W M. Siamese instance search for tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1420-1429. 10.1109/cvpr.2016.158 |
36 | BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 850-865. 10.1007/978-3-319-48881-3_56
37 | YU B, TANG M, ZHENG L, et al. High-performance discriminative tracking with Transformers[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9836-9845. 10.1109/iccv48922.2021.00971 |
38 | ZHAO M, OKADA K, INABA M. TrTr: visual tracking with Transformer[EB/OL]. [2023-05-30]. . |
39 | CHEN X, YAN B, ZHU J, et al. Transformer tracking[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021:8122-8131. 10.1109/cvpr46437.2021.00803 |
40 | CHEN X, KANG B, WANG D, et al. Efficient visual tracking via hierarchical cross-attention Transformer[C]// Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 461-477. 10.1007/978-3-031-25085-9_26
41 | CHEN X, YAN B, ZHU J, et al. High-performance Transformer tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(7): 8507-8523. |
42 | WANG N, ZHOU W, WANG J, et al. Transformer meets tracker: exploiting temporal context for robust visual tracking[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021:1571-1580. 10.1109/cvpr46437.2021.00162 |
43 | YAN B, PENG H, FU J, et al. Learning spatio-temporal Transformer for visual tracking[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10428-10437. 10.1109/iccv48922.2021.01028 |
44 | BLATTER P, KANAKIS M, DANELLJAN M, et al. Efficient visual tracking with exemplar Transformers[C]// Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2023: 1571-1581. 10.1109/wacv56688.2023.00162 |
45 | FU Z, FU Z, LIU Q, et al. SparseTT: visual tracking with sparse Transformers[C]// Proceedings of the 31st International Joint Conference on Artificial Intelligence. California: IJCAI, 2022: 905-912. 10.24963/ijcai.2022/127
46 | SONG Z, YU J, CHEN Y-P, et al. Transformer tracking with cyclic shifting window attention[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022:8781-8790. 10.1109/cvpr52688.2022.00859 |
47 | MAYER C, DANELLJAN M, BHAT G, et al. Transforming model prediction for tracking[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 8721-8730. 10.1109/cvpr52688.2022.00853 |
48 | GAO S, ZHOU C, MA C, et al. AiATrack: attention in attention for Transformer visual tracking[C]// Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022:146-164. 10.1007/978-3-031-20047-2_9 |
49 | XIE F, WANG C, WANG G, et al. Learning tracking representations via dual-branch fully Transformer networks[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE,2021:2688-2697. 10.1109/iccvw54120.2021.00303 |
50 | LIN L, FAN H, XU Y, et al. SwinTrack: a simple and strong baseline for Transformer tracking[EB/OL]. [2023-05-30]. . 10.48550/arXiv.2112.00995 |
51 | TANG C, WANG X, BAI Y, et al. Learning spatial-frequency Transformer for visual object tracking[EB/OL]. [2023-05-30]. . 10.1109/tcsvt.2023.3249468 |
52 | HE K, ZHANG C, XIE S, et al. Target-aware tracking with long-term context attention[C]// Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2023: 773-780. 10.1609/aaai.v37i1.25155 |
53 | CHEN Q, WU Q, WANG J, et al. MixFormer: mixing features across windows and dimensions[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 5239-5249. 10.1109/cvpr52688.2022.00518 |
54 | CUI Y, JIANG C, WU G, et al. MixFormer: end-to-end tracking with iterative mixed attention[EB/OL]. [2023-05-30]. . 10.1109/cvpr52688.2022.01324
55 | CHEN B, LI P, BAI L, et al. Backbone is all you need: a simplified architecture for visual object tracking[C]// Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 375-392. 10.1007/978-3-031-20047-2_22 |
56 | YE B, CHANG H, MA B, et al. Joint feature learning and relation modeling for tracking: a one-stream framework[C]// Proceedings of the 17th European Conference on Computer Vision. Cham: Springer, 2022: 341-357. 10.1007/978-3-031-20047-2_20 |
57 | LAN J-P, CHENG Z-Q, HE J-Y, et al. ProContEXT: exploring progressive context Transformer for tracking[EB/OL]. [2023-05-30]. . 10.1109/icassp49357.2023.10094971 |
58 | XIE F, CHU L, LI J, et al. VideoTrack: learning to track objects via video Transformer[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 22826-22835. 10.1109/cvpr52729.2023.02186 |
59 | GAO S, ZHOU C, ZHANG J. Generalized relation modeling for Transformer tracking [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 18686-18695. 10.1109/cvpr52729.2023.01792 |
60 | WU Q, YANG T, LIU Z, et al. DropMAE: masked autoencoders with spatial-attention dropout for tracking tasks[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 14561-14571. 10.1109/cvpr52729.2023.01399 |
61 | CHEN X, PENG H, WANG D, et al. SeqTrack: sequence to sequence learning for visual object tracking [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023:14572-14581. 10.1109/cvpr52729.2023.01400 |
62 | WEI X, BAI Y, ZHENG Y, et al. Autoregressive visual tracking [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 9697-9706. 10.1109/cvpr52729.2023.00935 |
63 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
64 | FAN H, LIN L, YANG F, et al. LaSOT: a high quality benchmark for large scale single object tracking[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5369-5378. 10.1109/cvpr.2019.00552
65 | MÜLLER M, BIBI A, GIANCOLA S, et al. TrackingNet: a large scale dataset and benchmark for object tracking in the wild[C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018:310-327. 10.1007/978-3-030-01246-5_19 |
66 | HUANG L, ZHAO X, HUANG K. GOT-10k: a large high diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5):1562-1577. 10.1109/tpami.2019.2957464 |
67 | WU Y, LIM J, YANG M-H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1834-1848. 10.1109/tpami.2014.2388226 |
68 | GALOOGAHI H K, FAGG A, HUANG C, et al. Need for speed: a benchmark for higher frame rate object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1134-1143. 10.1109/iccv.2017.128
69 | MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 445-461. 10.1007/978-3-319-46448-0_27 |
70 | KRISTAN M, LEONARDIS A, MATAS J, et al. The eighth visual object tracking VOT2020 Challenge results[C]// Proceedings of the 16th European Conference on Computer Vision. Cham: Springer, 2020: 547-601. |
71 | KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2012: 1097-1105. |
72 | XIA M, ZHONG Z, CHEN D. Structured pruning learns compact and accurate models[C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2022: 1513-1528. 10.18653/v1/2022.acl-long.107
73 | WANG W, WEI F, LI D, et al. MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained Transformers[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 5776-5788. |
74 | SO D R, LE Q V, LIANG C. The evolved Transformer[C]// Proceedings of the 36th International Conference on Machine Learning. New York: PMLR, 2019: 5877-5886.
75 | WU Z, LIU Z, LIN J, et al. Lite Transformer with long-short range attention[C/OL]// Proceedings of the 8th International Conference on Learning Representations. [S.l.]: ICLR, 2020 [2023-05-30]. . |
76 | CARON M, TOUVRON H, MISRA I, et al. Emerging properties in self-supervised vision Transformers[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9630-9640. 10.1109/iccv48922.2021.00951 |
77 | XIE Z, LIN Y, YAO Z, et al. Self-supervised learning with Swin Transformers[EB/OL]. [2023-05-30]. . |
78 | ATITO S, AWAIS M, KITTLER J. SiT: self-supervised vision Transformer[EB/OL]. [2023-05-30]. . 10.1109/icip49359.2023.10222150 |
79 | CHEFER H, GUR S, WOLF L. Transformer interpretability beyond attention visualization[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 782-791. 10.1109/cvpr46437.2021.00084
80 | SU H, YE Y, CHEN Z, et al. Re-attention Transformer for weakly supervised object localization[C]// Proceedings of the 33rd British Machine Vision Conference. Durham: BMVA Press, 2022:70. |
81 | XIE W, LI X-H, CAO C C, et al. ViT-CX: causal explanation of vision Transformers[C]// Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. California: IJCAI, 2023: 1569-1577. 10.24963/ijcai.2023/174 |
82 | MOHEBBI H, ZUIDEMA W, CHRUPAŁA G, et al. Quantifying context mixing in Transformers[C]// Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg: ACL, 2023: 3378-3400. 10.18653/v1/2023.eacl-main.245