Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3884-3890. DOI: 10.11772/j.issn.1001-9081.2021091636
• Multimedia computing and computer simulation •
Xiao LYU1, Huihui SONG2, Jiaqing FAN1
Received: 2021-09-17
Revised: 2022-01-11
Accepted: 2022-01-19
Online: 2022-12-21
Published: 2022-12-10
Contact: Huihui SONG
About author: LYU Xiao, born in 1996 in Taizhou, Jiangsu, M. S. candidate. His research interests include video object segmentation and video object tracking.
Xiao LYU, Huihui SONG, Jiaqing FAN. Semi-supervised video object segmentation via deep and shallow representations fusion[J]. Journal of Computer Applications, 2022, 42(12): 3884-3890.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021091636
Algorithm | J/% | F/% | J&F/% | Frame rate/(frame·s⁻¹) | Frame rate (2080Ti)/(frame·s⁻¹) |
---|---|---|---|---|---|
Ref. [ | 81.4 | 82.1 | 81.8 | 14.30 | ― |
Ref. [ | 88.7 | 89.9 | 89.3 | 6.25 | ― |
Ref. [ | 81.1 | 82.2 | 81.7 | 2.20 | ― |
Ref. [ | 74.0 | 72.9 | 73.5 | 7.14 | ― |
Ref. [ | 84.9 | 88.6 | 86.8 | 0.01 | ― |
Ref. [ | 85.6 | 87.5 | 86.6 | 0.22 | ― |
Ref. [ | 86.1 | 84.9 | 85.5 | 0.08 | ― |
Ref. [ | 82.6 | 83.6 | 83.1 | 39.00 | ― |
FRTM | 83.7 | 83.4 | 83.6 | 21.90 | 18.17 |
Proposed algorithm | 85.5 | 86.3 | 85.9 | ― | 17.76 |
Tab. 1 Evaluation results of different algorithms on DAVIS 2016 validation set
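The J and F columns are the standard DAVIS measures: region similarity J is the Jaccard index (mask IoU), F is the contour accuracy, and the reported J&F is their arithmetic mean. A minimal pure-Python sketch of J and the J&F average (representing masks as sets of pixel coordinates is our simplification, not the benchmark's implementation):

```python
def region_similarity(pred, gt):
    """J metric: Jaccard index (intersection over union) of two binary
    masks, here represented as sets of (row, col) pixel coordinates."""
    union = pred | gt
    if not union:               # both masks empty: treat as a perfect match
        return 1.0
    return len(pred & gt) / len(union)

def j_and_f(j, f):
    """The reported J&F score is the arithmetic mean of J and F."""
    return (j + f) / 2

# Two masks sharing one of three distinct foreground pixels -> J = 1/3
pred = {(0, 0), (0, 1)}
gt = {(0, 0), (1, 1)}
print(region_similarity(pred, gt))          # -> 0.3333...

# Proposed algorithm in Table 1: J = 85.5, F = 86.3
print(round(j_and_f(85.5, 86.3), 1))        # -> 85.9
```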
Algorithm | J/% | F/% | J&F/% | Frame rate/(frame·s⁻¹) | Frame rate (2080Ti)/(frame·s⁻¹) |
---|---|---|---|---|---|
Ref. [ | 67.2 | 72.7 | 70.0 | 14.30 | ― |
Ref. [ | 79.2 | 84.3 | 81.8 | 6.25 | ― |
Ref. [ | 69.1 | 74.0 | 71.5 | 2.20 | ― |
Ref. [ | 52.5 | 57.1 | 54.8 | 7.14 | ― |
Ref. [ | 73.9 | 81.7 | 77.8 | 0.01 | ― |
Ref. [ | 64.7 | 71.3 | 68.0 | 0.22 | ― |
Ref. [ | 64.5 | 71.2 | 67.9 | 0.08 | ― |
Ref. [ | 68.6 | 76.0 | 72.3 | 39.00 | ― |
FRTM | 73.8 | 79.6 | 76.7 | 21.90 | 18.17 |
Proposed algorithm | 75.0 | 80.5 | 77.8 | ― | 17.76 |
Tab. 2 Evaluation results of different algorithms on DAVIS 2017 validation set
Algorithm | J (seen) | J (unseen) | F (seen) | F (unseen) | Overall metric G |
---|---|---|---|---|---|
Ref. [ | 67.8 | 60.8 | 69.5 | 66.2 | 66.1 |
Ref. [ | 60.1 | 46.1 | 62.7 | 51.4 | 55.2 |
Ref. [ | 71.4 | 56.5 | ― | ― | 66.9 |
Ref. [ | ― | ― | ― | ― | 68.2 |
Proposed algorithm | 68.0 | 60.7 | 71.3 | 68.4 | 67.1 |
Tab. 3 Evaluation results of different algorithms on YouTube-VOS validation set
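On YouTube-VOS, the overall score G in the last column is conventionally the average of the four per-split scores, i.e. J and F over both seen and unseen object categories; the values above are consistent with that convention. A quick check, assuming this definition of G:

```python
def overall_g(j_seen, j_unseen, f_seen, f_unseen):
    """YouTube-VOS overall score: mean of J and F across seen and
    unseen object categories (assumed convention for the G column)."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4

# Proposed algorithm in Table 3: (68.0 + 60.7 + 71.3 + 68.4) / 4
print(round(overall_g(68.0, 60.7, 71.3, 68.4), 1))  # -> 67.1
```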
Model | J&F | Model | J&F |
---|---|---|---|
Base | 81.4 | Base+Fuse | 85.2 |
Base+EHOA | 84.6 | Base+EHOA+Fuse | 85.9 |
Tab. 4 Ablation experimental results
Model | λ1 | λ2 | λ3 | J&F/% |
---|---|---|---|---|
HOA | ― | ― | ― | 85.0 |
EHOA | 1.0 | 0.0 | 0.0 | 84.1 |
 | 0.0 | 1.0 | 0.0 | 84.6 |
 | 0.0 | 0.0 | 1.0 | 84.8 |
 | 0.1 | 0.2 | 0.7 | 85.1 |
 | 0.2 | 0.3 | 0.5 | 85.9 |
 | 0.3 | 0.3 | 0.4 | 85.3 |
 | 0.6 | 0.3 | 0.1 | 85.0 |
Tab. 5 Comparison of experimental results of EHOA and HOA models
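Table 5 sweeps the weights λ1, λ2, λ3 that EHOA uses to mix its three attention-order terms, with the best J&F of 85.9% at (0.2, 0.3, 0.5). As an illustration only (the function and feature shapes below are hypothetical sketches, not the paper's implementation), such a weighted fusion of per-order outputs amounts to a convex combination:

```python
def fuse_orders(features, weights):
    """Weighted sum of per-order attention outputs.
    features: list of equal-length vectors, one per attention order;
    weights:  the lambda coefficients, expected to sum to 1."""
    assert len(features) == len(weights) and len(features) > 0
    fused = [0.0] * len(features[0])
    for w, feat in zip(weights, features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused

# Best setting from Table 5: (lambda1, lambda2, lambda3) = (0.2, 0.3, 0.5)
print(fuse_orders([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.2, 0.3, 0.5]))
# -> [0.7, 0.8]
```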
Layer features | J&F/% | Layer features | J&F/% |
---|---|---|---|
Layer2 | 85.9 | Layer4 | 85.0 |
Layer3 | 85.4 | Layer5 | 84.8 |
Tab. 6 Comparison of experimental results with features of different layers
[1] GUO J M, LI Z W, CHEONG L F, et al. Video co-segmentation for meaningful action extraction[C]// Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2013: 2232-2239. DOI: 10.1109/iccv.2013.278.
[2] YANG T M, CHEN Z, YUE W J. Spatio-temporal two-stream human action recognition model based on video deep learning[J]. Journal of Computer Applications, 2018, 38(3): 895-899, 915. DOI: 10.11772/j.issn.1001-9081.2017071740.
[3] HU X M, TONG X C, GUO L, et al. End-to-end autonomous driving model based on deep visual attention neural network[J]. Journal of Computer Applications, 2020, 40(7): 1926-1931. DOI: 10.11772/j.issn.1001-9081.2019112054.
[4] SALEH K, HOSSNY M, NAHAVANDI S. Kangaroo vehicle collision detection using deep semantic segmentation convolutional neural network[C]// Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications. Piscataway: IEEE, 2016: 1-7. DOI: 10.1109/dicta.2016.7797057.
[5] OH S W, LEE J Y, SUNKAVALLI K, et al. Fast video object segmentation by reference-guided mask propagation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7376-7385. DOI: 10.1109/cvpr.2018.00770.
[6] CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5320-5329. DOI: 10.1109/cvpr.2017.565.
[7] MANINIS K K, CAELLES S, CHEN Y H, et al. Video object segmentation without temporal information[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(6): 1515-1530. DOI: 10.1109/tpami.2018.2838670.
[8] PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3491-3500. DOI: 10.1109/cvpr.2017.372.
[9] XU N, YANG L J, FAN Y C, et al. YouTube-VOS: sequence-to-sequence video object segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11209. Cham: Springer, 2018: 603-619.
[10] VOIGTLAENDER P, LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[C]// Proceedings of the 2017 British Machine Vision Conference. Durham: BMVA Press, 2017: No.116. DOI: 10.5244/c.31.116.
[11] OH S W, LEE J Y, XU N, et al. Video object segmentation using space-time memory networks[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9225-9234. DOI: 10.1109/iccv.2019.00932.
[12] VOIGTLAENDER P, CHAI Y N, SCHROFF F, et al. FEELVOS: fast end-to-end embedding learning for video object segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9473-9482. DOI: 10.1109/cvpr.2019.00971.
[13] HU Y T, HUANG J B, SCHWING A G. VideoMatch: matching based video object segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11212. Cham: Springer, 2018: 56-73.
[14] WANG N, SONG H H, ZHANG K H. Accurate object tracking algorithm based on distance weighting overlap prediction and ellipse fitting optimization[J]. Journal of Computer Applications, 2021, 41(4): 1100-1105. DOI: 10.11772/j.issn.1001-9081.2020060869.
[15] WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: a unifying approach[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1328-1338. DOI: 10.1109/cvpr.2019.00142.
[16] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935.
[17] PERAZZI F, PONT-TUSET J, McWILLIAMS B, et al. A benchmark dataset and evaluation methodology for video object segmentation[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 724-732. DOI: 10.1109/cvpr.2016.85.
[18] PONT-TUSET J, PERAZZI F, CAELLES S, et al. The 2017 DAVIS challenge on video object segmentation[EB/OL]. (2018-03-01) [2021-04-03].
[19] CHEN X, LI Z X, YUAN Y, et al. State-aware tracker for real-time video object segmentation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9381-9390. DOI: 10.1109/cvpr42600.2020.00940.
[20] ROBINSON A, JÄREMO LAWIN F, DANELLJAN M, et al. Learning fast and robust target models for video object segmentation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 7404-7413. DOI: 10.1109/cvpr42600.2020.00743.
[21] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90.
[22] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: accurate tracking by overlap maximization[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4655-4664. DOI: 10.1109/cvpr.2019.00479.
[23] CHEN B H, DENG W H, HU J N. Mixed high-order attention network for person re-identification[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 371-381. DOI: 10.1109/iccv.2019.00046.
[24] LI W, ZHU X T, GONG S G. Person re-identification by deep joint learning of multi-loss classification[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2017: 2194-2200. DOI: 10.24963/ijcai.2017/305.
[25] LIU J X, NI B B, YAN Y C, et al. Pose transferrable person re-identification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4099-4108. DOI: 10.1109/cvpr.2018.00431.
[26] ZHONG Z, ZHENG L, CAO D L, et al. Re-ranking person re-identification with k-reciprocal encoding[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3652-3661. DOI: 10.1109/cvpr.2017.389.
[27] XIANG X Y, TIAN Y P, ZHANG Y L, et al. Zooming Slow-Mo: fast and accurate one-stage space-time video super-resolution[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3367-3376. DOI: 10.1109/cvpr42600.2020.00343.
[28] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[29] XU N, YANG L J, FAN Y C, et al. YouTube-VOS: a large-scale video object segmentation benchmark[EB/OL]. (2018-09-06) [2021-08-22]. DOI: 10.1007/978-3-030-01228-1_36.
[30] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2021-08-22].
[31] JOHNANDER J, DANELLJAN M, BRISSMAN E, et al. A generative appearance model for end-to-end video object segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8945-8954. DOI: 10.1109/cvpr.2019.00916.
[32] YANG L J, WANG Y R, XIONG X H, et al. Efficient video object segmentation via network modulation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6499-6507. DOI: 10.1109/cvpr.2018.00680.
[33] LUITEN J, VOIGTLAENDER P, LEIBE B. PReMVOS: proposal-generation, refinement and merging for video object segmentation[C]// Proceedings of the 2018 Asian Conference on Computer Vision, LNCS 11364. Cham: Springer, 2019: 565-580.