Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (3): 960-967.DOI: 10.11772/j.issn.1001-9081.2021030372
• Multimedia computing and computer simulation •
Received: 2021-03-12
Revised: 2021-06-22
Accepted: 2021-06-28
Online: 2022-04-09
Published: 2022-03-10
Contact: Gang HUA
About author: HU Cong, born in 1995 in Xuzhou, Jiangsu (male, Hui ethnicity), M. S. candidate. His research interests include computer vision understanding and deep learning.
CLC Number:
Cong HU, Gang HUA. Weakly supervised action localization method based on attention mechanism[J]. Journal of Computer Applications, 2022, 42(3): 960-967.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021030372
β | mAP/% | α | mAP/% | Latent space size | mAP/% |
---|---|---|---|---|---|
0.05 | 28.6 | 5 | 28.8 | 8×8 | 26.1 |
0.10 | 29.2 | 6 | 29.5 | 64×64 | 28.9 |
0.20 | 29.9 | 7 | 29.9 | 128×128 | 29.9 |
0.30 | 27.8 | 8 | 29.6 | 256×256 | 29.0 |
— | — | 9 | 29.4 | 512×512 | 28.6 |
Tab. 1 Comparison of mAP values based on IoU=0.5 using different β, α and latent space size on THUMOS14 dataset
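The tables on this page report mAP at fixed IoU thresholds. For temporal localization, IoU is conventionally measured between a predicted segment and a ground-truth segment on the time axis, and a prediction counts as correct when that overlap reaches the threshold. The sketch below shows this standard definition only. The function names and the example segments are hypothetical, and it is not the authors' evaluation code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    # Overlap length on the time axis; zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union length = sum of both lengths minus the shared part.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth segment when IoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    pred, gt = (2.0, 6.0), (3.0, 7.0)  # hypothetical segments
    print(temporal_iou(pred, gt))
    print(is_true_positive(pred, gt, threshold=0.5))
    print(is_true_positive(pred, gt, threshold=0.7))
```

Raising the threshold makes the same prediction harder to count as correct, which is why mAP falls as the IoU threshold rises in the tables on this page.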
 | mAP/% |  | mAP/% |  | mAP/% |
---|---|---|---|---|---|
0.1 | 28.4 | 0.3 | 28.3 | 0.1 | 28.7 |
0.2 | 29.1 | 0.4 | 29.2 | 0.2 | 29.1 |
0.3 | 29.9 | 0.5 | 29.9 | 0.3 | 29.9 |
0.4 | 28.4 | 0.6 | 28.2 | 0.4 | 28.5 |
Tab. 2 Comparison of mAP values based on IoU=0.5 using different γ1 and γ2 on THUMOS14 dataset
Pre- and post-action frame information | Missed detection rate/% |
---|---|
Added | 14.3 |
Not added | 16.2 |
Tab. 3 Improvement of mAP value of adding pre- and post-information of action frame on THUMOS14 dataset
IoU | mAP/% (with distinguishing function) | mAP/% (without distinguishing function) |
---|---|---|
0.1 | 59.8 | 56.3 |
0.2 | 54.5 | 51.2 |
0.3 | 47.7 | 44.6 |
0.4 | 39.3 | 36.3 |
0.5 | 29.9 | 26.9 |
0.6 | 20.9 | 18.5 |
0.7 | 12.0 | 10.6 |
0.8 | 3.7 | 2.8 |
0.9 | 0.4 | 0.3 |
Tab. 4 mAP improvement from adding the distinguishing function on THUMOS14 dataset
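The improvement summarized by Tab. 4 can be read off by subtracting the two mAP columns at each IoU threshold. The following is a minimal sketch of that arithmetic. The values are copied from the table, while the names `TAB4`, `gains` and `mean_gain` are hypothetical and introduced only for illustration.

```python
# mAP/% on THUMOS14 copied from Tab. 4:
# IoU threshold -> (with distinguishing function, without distinguishing function)
TAB4 = {
    0.1: (59.8, 56.3),
    0.2: (54.5, 51.2),
    0.3: (47.7, 44.6),
    0.4: (39.3, 36.3),
    0.5: (29.9, 26.9),
    0.6: (20.9, 18.5),
    0.7: (12.0, 10.6),
    0.8: (3.7, 2.8),
    0.9: (0.4, 0.3),
}


def gains(table):
    """Absolute mAP gain in percentage points at each IoU threshold."""
    # Round to one decimal to match the precision reported in the table.
    return {iou: round(w - wo, 1) for iou, (w, wo) in table.items()}


def mean_gain(table):
    """Average of the per-threshold gains."""
    g = gains(table)
    return sum(g.values()) / len(g)


if __name__ == "__main__":
    for iou, g in gains(TAB4).items():
        print(f"IoU={iou:.1f}: +{g:.1f}")
    print(f"mean gain: {mean_gain(TAB4):.2f}")
```

In these numbers the gain is largest at loose thresholds and shrinks as the IoU threshold tightens.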
Model | Feature extraction | IoU 0.1 | IoU 0.2 | IoU 0.3 | IoU 0.4 | IoU 0.5 | IoU 0.6 | IoU 0.7 | IoU 0.8 | IoU 0.9 |
---|---|---|---|---|---|---|---|---|---|---|
AutoLoc | UNT | — | — | 35.8 | 29.0 | 21.2 | 13.4 | 5.8 | — | — |
STPN | I3D | 52.0 | 44.7 | 35.5 | 25.8 | 16.9 | 9.9 | 4.3 | 1.2 | 0.1 |
W-TALC | I3D | 55.2 | 49.6 | 40.1 | 31.1 | 22.8 | — | 7.6 | — | — |
3C-Net | I3D | 56.8 | 49.8 | 40.9 | 32.3 | 24.6 | — | 7.7 | — | — |
BaS-Net | I3D | 58.2 | 52.3 | 44.6 | 36.0 | 27.0 | 18.6 | 10.4 | 3.9 | 0.5 |
Proposed model | I3D | 59.8 | 54.5 | 47.7 | 39.3 | 29.9 | 20.9 | 12.0 | 3.7 | 0.4 |
Tab. 5 Comparison of mAP values of different models based on different IoU on THUMOS14 dataset
Model | Feature extraction | IoU 0.50 | IoU 0.55 | IoU 0.60 | IoU 0.65 | IoU 0.70 | IoU 0.75 | IoU 0.80 | IoU 0.85 | IoU 0.90 | IoU 0.95 |
---|---|---|---|---|---|---|---|---|---|---|---|
AutoLoc | UNT | 27.3 | 24.9 | 22.5 | 19.9 | 17.5 | 15.1 | 13.0 | 10.0 | 6.8 | 3.3 |
TSM | I3D | 28.3 | 26.0 | 23.6 | 21.2 | 18.9 | 17.0 | 14.0 | 11.1 | 7.5 | 3.5 |
BaS-Net | I3D | 38.5 | — | — | — | — | 24.2 | — | — | — | 5.6 |
3C-Net | I3D | 35.4 | — | — | — | 22.9 | — | — | — | — | — |
W-TALC | I3D | 37.0 | 33.5 | 30.4 | 25.7 | 14.6 | 12.7 | 10.0 | 7.0 | 4.2 | 1.5 |
Proposed model | I3D | 41.9 | 38.4 | 34.3 | 30.8 | 27.3 | 23.8 | 19.7 | 15.6 | 10.4 | 4.7 |
Tab. 6 Comparison of mAP values of different models based on different IoU on ActivityNet1.2 dataset