Journal of Computer Applications, 2024, Vol. 44, Issue (2): 548-555. DOI: 10.11772/j.issn.1001-9081.2023020246
Special Issue: Multimedia computing and computer simulation
Weichao DANG, Lei ZHANG, Gaimei GAO, Chunxia LIU
Received: 2023-03-09
Revised: 2023-06-11
Accepted: 2023-06-15
Online: 2023-08-14
Published: 2024-02-10
Contact: Lei ZHANG
About author: DANG Weichao, born in 1974 in Yuncheng, Shanxi, Ph. D., associate professor, CCF member. His research interests include intelligent computing and software reliability.
Weichao DANG, Lei ZHANG, Gaimei GAO, Chunxia LIU. Weakly supervised action localization method with snippet contrastive learning[J]. Journal of Computer Applications, 2024, 44(2): 548-555.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023020246
Method | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 |
---|---|---|---|---|---|---|---|
STPN | 52.0 | 44.7 | 35.5 | 25.8 | 16.9 | 9.9 | 4.3 |
W-TALC | 55.2 | 49.6 | 40.1 | 31.1 | 22.8 | - | 7.6 |
MAAN | 59.8 | 50.8 | 41.1 | 30.6 | 20.3 | 12.0 | 6.9 |
BasNet | 58.2 | 52.3 | 44.6 | 36.0 | 27.0 | 18.6 | 10.4 |
DGAM | 60.0 | 54.2 | 46.8 | 38.2 | 28.8 | 19.8 | 11.4 |
A2CL-PT | 61.2 | 56.1 | 48.1 | 39.0 | 30.1 | 19.2 | 10.6 |
TSCN | 63.4 | 57.6 | 47.8 | 37.7 | 28.7 | 19.4 | 10.2 |
MSA-Net | 65.6 | 60.7 | 52.3 | 41.6 | 29.7 | 20.6 | 10.1 |
HAM-Net | 65.4 | 59.0 | 50.3 | 41.1 | 31.0 | 20.7 | 11.1 |
EGA-Net | 64.5 | 58.4 | 50.0 | 41.4 | 31.5 | 21.0 | 10.7 |
ACS-Net | - | - | 51.4 | 42.7 | 32.4 | 22.0 | 11.7 |
DGCNN | 66.3 | 59.9 | 52.3 | 43.2 | 32.8 | 22.1 | 13.1 |
Proposed model | 67.7 | 62.3 | 53.5 | 43.3 | 33.9 | 22.1 | 11.1 |
Tab. 1 Detection results of different weakly-supervised action localization methods on THUMOS14 dataset
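The column headers in Tab. 1 are temporal IoU thresholds: a predicted action segment counts as correct only if its overlap with a ground-truth segment reaches the threshold. This page does not spell out the computation, so the following is a minimal sketch of the standard temporal IoU definition. The `(start, end)` segment representation and both function names are assumptions made for illustration, not the authors' evaluation code.

```python
from typing import Tuple

# Hypothetical representation: a segment is (start, end) in seconds.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    # Overlap length is zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Segment, gt: Segment, threshold: float) -> bool:
    """A prediction matches a ground-truth instance when IoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold
```

For instance, a prediction spanning 0 to 10 s against a ground truth spanning 5 to 15 s overlaps for 5 s out of a 15 s union, so it would count at the 0.3 threshold but not at 0.5. This is one reason scores fall as the threshold rises across the columns.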
Model | mAP@0.5 | mAP@0.75 | mAP@0.95 |
---|---|---|---|
STPN | 29.3 | 16.9 | 2.6 |
MAAN | 33.7 | 21.9 | 5.5 |
TSM | 30.3 | 19.0 | 4.5 |
BasNet | 34.5 | 22.5 | 4.9 |
EGA-Net | 35.4 | 22.5 | 4.5 |
A2CL-PT | 36.8 | 22.5 | 5.2 |
BMUE | 37.0 | 23.9 | 5.7 |
DGCNN | 37.2 | 23.8 | 5.8 |
Proposed model | 40.1 | 24.0 | 6.0 |
Tab. 2 Detection results of different models on ActivityNet1.3 dataset
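To read Tab. 2 at a glance, it can help to compute how far the proposed model sits above the strongest prior result at each IoU threshold. The sketch below does only that arithmetic on the values listed in the table; the dictionary layout and the function name `margin_over_best` are illustrative assumptions.

```python
# mAP values at IoU 0.5, 0.75 and 0.95, copied from Tab. 2.
THRESHOLDS = (0.5, 0.75, 0.95)
BASELINES = {
    "STPN": (29.3, 16.9, 2.6),
    "MAAN": (33.7, 21.9, 5.5),
    "TSM": (30.3, 19.0, 4.5),
    "BasNet": (34.5, 22.5, 4.9),
    "EGA-Net": (35.4, 22.5, 4.5),
    "A2CL-PT": (36.8, 22.5, 5.2),
    "BMUE": (37.0, 23.9, 5.7),
    "DGCNN": (37.2, 23.8, 5.8),
}
PROPOSED = (40.1, 24.0, 6.0)


def margin_over_best(proposed, baselines):
    """Difference between the proposed score and the best baseline, per threshold."""
    margins = []
    for i, score in enumerate(proposed):
        best_prior = max(row[i] for row in baselines.values())
        # Round to one decimal to match the table's precision.
        margins.append(round(score - best_prior, 1))
    return margins


if __name__ == "__main__":
    for iou, gain in zip(THRESHOLDS, margin_over_best(PROPOSED, BASELINES)):
        print(f"IoU {iou}: +{gain} mAP over the best listed baseline")
```

On these numbers the margin is largest at IoU 0.5 and small at the two stricter thresholds.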
Balance factor | mAP@0.5 | Balance factor | mAP@0.5 |
---|---|---|---|
0.005 | 32.5 | 0.050 | 33.8 |
0.007 | 33.1 | 0.070 | 33.8 |
0.010 | 33.9 | 0.100 | 33.6 |
0.030 | 33.6 | | |
Tab. 3 Performance comparison of different balance factors on THUMOS14 dataset
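Tab. 3 amounts to a one-dimensional sweep over the balance factor. As a minimal sketch, the best setting can be read off by taking the factor with the highest mAP@0.5. The values are copied from the table; the variable names are illustrative assumptions.

```python
# mAP@0.5 on THUMOS14 for each balance factor, copied from Tab. 3.
results = {
    0.005: 32.5,
    0.007: 33.1,
    0.010: 33.9,
    0.030: 33.6,
    0.050: 33.8,
    0.070: 33.8,
    0.100: 33.6,
}

# Pick the factor whose mAP@0.5 is highest.
best_factor = max(results, key=results.get)
best_map = results[best_factor]

# Spread between the best and worst settings in the sweep.
spread = round(best_map - min(results.values()), 1)

if __name__ == "__main__":
    print(f"best balance factor: {best_factor}, mAP@0.5 = {best_map}")
    print(f"spread across the sweep: {spread}")
```

The sweep peaks at a factor of 0.010, and every setting from 0.010 upward stays within 0.3 mAP of that peak.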
Experiment | | | | | | mAP@0.1/% | mAP@0.3/% | mAP@0.5/% | mAP@0.7/% |
---|---|---|---|---|---|---|---|---|---|
1 | √ | × | × | × | × | 49.9 | 32.9 | 16.6 | 5.3 |
2 | √ | × | √ | × | × | 55.9 | 41.9 | 23.0 | 7.1 |
3 | √ | √ | × | × | × | 67.4 | 50.8 | 31.5 | 10.8 |
4 | √ | √ | √ | × | × | 65.6 | 49.4 | 29.6 | 10.0 |
5 | √ | √ | √ | √ | √ | 67.7 | 53.5 | 33.9 | 11.1 |
Tab. 4 Ablation experiment results of action context branch
Experiment | | | | mAP@0.1/% | mAP@0.3/% | mAP@0.5/% | mAP@0.7/% |
---|---|---|---|---|---|---|---|
1 | √ | × | × | 65.6 | 49.4 | 29.6 | 10.0 |
2 | √ | × | √ | 65.5 | 49.7 | 29.8 | 10.1 |
3 | √ | √ | × | 66.4 | 51.5 | 32.2 | 11.0 |
4 | √ | √ | √ | 67.7 | 53.5 | 33.9 | 11.1 |
Tab. 5 Ablation experiment results of attention guided loss and snippet contrast loss
[1] SUN C, SHETTY S, SUKTHANKAR R, et al. Temporal localization of fine-grained actions in videos by domain transfer from web images[C]// Proceedings of the 23rd ACM International Conference on Multimedia. New York: ACM, 2015: 371-380. DOI: 10.1145/2733373.2806226
[2] HU C, HUA G. Weakly supervised action localization method based on attention mechanism[J]. Journal of Computer Applications, 2022, 42(3): 960-967.
[3] GUO W B, YANG X M, JIANG Z Y, et al. Multi-temporal scales consensus for weakly supervised temporal action localization[J]. Computer Engineering and Applications, 2023, 59(10): 151-161. DOI: 10.3778/j.issn.1002-8331.2201-0233
[4] NGUYEN P, HAN B, LIU T, et al. Weakly supervised action localization by sparse temporal pooling network[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6752-6761. DOI: 10.1109/cvpr.2018.00706
[5] ZENG R, GAN C, CHEN P, et al. Breaking Winner-Takes-All: Iterative-Winners-Out networks for weakly supervised temporal action localization[J]. IEEE Transactions on Image Processing, 2019, 28(12): 5797-5808. DOI: 10.1109/tip.2019.2922108
[6] SHOU Z, GAO H, ZHANG L, et al. AutoLoc: weakly-supervised temporal action localization in untrimmed videos[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 162-179. DOI: 10.1007/978-3-030-01270-0_10
[7] CHEN M, FANG Y, WANG X, et al. Diversity transfer network for Few-Shot learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34: 10559-10566. DOI: 10.1609/aaai.v34i07.6628
[8] ZHUANG C, ZHAI A, YAMINS D. Local aggregation for unsupervised learning of visual embeddings[C]// Proceedings of the 2019 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2019: 6001-6011. DOI: 10.1109/iccv.2019.00610
[9] SHI B, DAI Q, MU Y, et al. Weakly-supervised action localization by generative attention modeling[C]// Proceedings of the 2020 International Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1006-1016. DOI: 10.1109/cvpr42600.2020.00109
[10] ZHANG C, CAO M, YANG D, et al. CoLA: weakly-supervised temporal action localization with snippet contrastive learning[C]// Proceedings of the 2021 International Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 16010-16019. DOI: 10.1109/cvpr46437.2021.01575
[11] SHOU Z, WANG D, CHANG S-F. Temporal action localization in untrimmed videos via multi-stage CNNs[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1049-1058. DOI: 10.1109/cvpr.2016.119
[12] ZHAO Y, XIONG Y, WANG L, et al. Temporal action detection with structured segment networks[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2933-2942. DOI: 10.1109/iccv.2017.317
[13] XU H, DAS A, SAENKO K. R-C3D: region convolutional 3D network for temporal activity detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5794-5803. DOI: 10.1109/iccv.2017.617
[14] LIN T, ZHAO X, SU H, et al. BSN: boundary sensitive network for temporal action proposal generation[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-21. DOI: 10.1007/978-3-030-01225-0_1
[15] LIN T, ZHAO X, SHOU Z. Single shot temporal action detection[C]// Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 988-996. DOI: 10.1145/3123266.3123343
[16] WANG L, XIONG Y, LIN D, et al. UntrimmedNets for weakly supervised action recognition and detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6402-6411. DOI: 10.1109/cvpr.2017.678
[17] NARAYAN S, CHOLAKKAL H, KHAN F S, et al. 3C-Net: category count and center loss for weakly-supervised action localization[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8679-8687. DOI: 10.1109/iccv.2019.00877
[18] MIN K, CORSO J J. Adversarial background-aware loss for weakly-supervised temporal activity localization[C]// Proceedings of the 2020 European Conference on Computer Vision. Cham: Springer, 2020: 283-299. DOI: 10.1007/978-3-030-58568-6_17
[19] YUAN Y, LYU Y, SHEN X, et al. Marginalized average attentional network for weakly-supervised learning[EB/OL]. [2023-03-09].
[20] LI X, LIU X P, LI W C, et al. Survey on contrastive learning research[J]. Journal of Chinese Computer Systems, 2023, 44(4): 787-797.
[21] HE K, FAN H, WU Y, et al. Momentum contrast for unsupervised visual representation learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9726-9735. DOI: 10.1109/cvpr42600.2020.00975
[22] CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[EB/OL]. [2023-03-09].
[23] GUTMANN M, HYVÄRINEN A. Noise-contrastive estimation: a new estimation principle for unnormalized statistical models[C]// Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. New York: JMLR.org, 2010: 297-304.
[24] ZACH C, POCK T, BISCHOF H, et al. A duality based approach for realtime TV-L1 optical flow[C]// Proceedings of the 29th DAGM Conference on Pattern Recognition. Berlin: Springer, 2007: 214-223. DOI: 10.1007/978-3-540-74936-3_22
[25] KAY W, CARREIRA J, SIMONYAN K, et al. The Kinetics human action video dataset[EB/OL]. [2023-03-09].
[26] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the kinetics dataset[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4724-4733. DOI: 10.1109/cvpr.2017.502
[27] IDREES H, ZAMIR A R, JIANG Y-G, et al. The THUMOS challenge on action recognition for videos "in the wild"[J]. Computer Vision and Image Understanding, 2017, 155: 1-23. DOI: 10.1016/j.cviu.2016.10.018
[28] HEILBRON F C, ESCORCIA V, GHANEM B, et al. ActivityNet: a large-scale video benchmark for human activity understanding[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 961-970. DOI: 10.1109/cvpr.2015.7298698
[29] PAUL S, ROY S, ROY-CHOWDHURY A K. W-TALC: weakly-supervised temporal activity localization and classification[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 588-607. DOI: 10.1007/978-3-030-01225-0_35
[30] LEE P, UH Y, BYUN H. Background suppression network for weakly-supervised temporal action localization[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11320-11327. DOI: 10.1609/aaai.v34i07.6793
[31] ZHAI Y, WANG L, TANG W, et al. Two-stream consensus network for weakly-supervised temporal action localization[C]// Proceedings of the 2020 European Conference on Computer Vision. Cham: Springer, 2020: 37-54. DOI: 10.1007/978-3-030-58539-6_3
[32] YANG W, ZHANG T, MAO Z, et al. Multi-scale structure-aware network for weakly supervised temporal action detection[J]. IEEE Transactions on Image Processing, 2021, 30: 5848-5861. DOI: 10.1109/tip.2021.3089361
[33] ISLAM A, LONG C, RADKE R. A hybrid attention mechanism for weakly-supervised temporal action localization[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(2): 1637-1645. DOI: 10.1609/aaai.v35i2.16256
[34] CHENG Y, SUN Y, FAN H, et al. Entropy guided attention network for weakly-supervised action localization[J]. Pattern Recognition, 2022, 129: 108718. DOI: 10.1016/j.patcog.2022.108718
[35] LIU Z, WANG L, ZHANG Q, et al. ACSNet: action-context separation network for weakly supervised temporal action localization[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(3): 2233-2241. DOI: 10.1609/aaai.v35i3.16322
[36] SHI H, ZHANG X-Y, LI C, et al. Dynamic graph modeling for weakly-supervised temporal action localization[C]// Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 3820-3828. DOI: 10.1145/3503161.3548077
[37] KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. [2023-03-09].
[38] YU T, REN Z, LI Y, et al. Temporal structure mining for weakly supervised action detection[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 5521-5530. DOI: 10.1109/iccv.2019.00562
[39] LEE P, WANG J, LU Y, et al. Background modeling via uncertainty estimation for weakly-supervised action localization[EB/OL]. [2023-03-09].