Journal of Computer Applications, 2024, Vol. 44, Issue (2): 548-555. DOI: 10.11772/j.issn.1001-9081.2023020246

• Multimedia computing and computer simulation

Weakly supervised action localization method with snippet contrastive learning

Weichao DANG, Lei ZHANG, Gaimei GAO, Chunxia LIU

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan Shanxi 030024, China
  • Received: 2023-03-09; Revised: 2023-06-11; Accepted: 2023-06-15; Online: 2023-08-14; Published: 2024-02-10
  • Contact: Lei ZHANG
  • About author: DANG Weichao, born in 1974, a native of Yuncheng, Shanxi, Ph.D., associate professor, CCF member. His research interests include intelligent computing and software reliability.
    GAO Gaimei, born in 1978, Ph.D., associate professor. Her research interests include network security and cryptography.
    LIU Chunxia, born in 1977, M.S., associate professor. Her research interests include software engineering and databases.
  • Supported by:
    Doctoral Research Start-up Fund of Taiyuan University of Science and Technology (20202063); Graduate Education Innovation Project of Taiyuan University of Science and Technology (SY2022063)


A weakly supervised action localization method integrating snippet contrastive learning was proposed to address the misclassification of snippets at action boundaries by existing attention-based methods. First, a three-branch attention mechanism was introduced to measure the likelihood that each video frame belongs to an action instance, its context, or the background. Second, a Class Activation Sequence (CAS) was constructed for each branch from the corresponding attention values. Then, positive and negative sample pairs were generated by a snippet mining algorithm. Finally, snippet contrastive learning was used to guide the network to classify hard (ambiguous) snippets correctly. Experimental results show that, at an Intersection over Union (IoU) threshold of 0.5, the proposed method achieves a mean Average Precision (mAP) of 33.9% on THUMOS14 and 40.1% on ActivityNet1.3, which is 1.1 and 2.9 percentage points higher, respectively, than the DGCNN (Dynamic Graph modeling for weakly-supervised temporal action localization Convolutional Neural Network) weakly supervised action localization model, validating the effectiveness of the proposed method.
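The abstract names the pipeline steps without giving formulas, so the following is only a minimal, framework-free Python sketch of how such steps could fit together; it is not the authors' implementation. Every concrete choice here is a hypothetical assumption: the per-snippet attention logits are given as input rather than produced by a network; each branch CAS is the base CAS scaled by that branch's attention; "easy" action/background snippets are the top-k by action/background attention and "hard" snippets are the top-k by context attention; a hard snippet is paired with the action or background prototype depending on which attention is larger; and the contrastive objective is a standard InfoNCE loss with cosine similarity and a temperature `tau`. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over one row."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def three_branch_attention(att_logits: Sequence[Sequence[float]]) -> List[Vector]:
    """Turn per-snippet [action, context, background] logits into attention values.
    Hypothetical: the logits would normally come from a learned attention head."""
    return [softmax(row) for row in att_logits]


def branch_cas(cas: Sequence[Sequence[float]], att: Sequence[Sequence[float]], branch: int) -> List[Vector]:
    """Hypothetical branch CAS: scale each snippet's class scores by one branch's attention."""
    return [[att[t][branch] * score for score in cas[t]] for t in range(len(cas))]


def top_k(values: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest values, in descending order of value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def mine_snippets(att: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical mining rule: easy action, easy background, and hard (context-like) snippets."""
    easy_act = top_k([a[0] for a in att], k)
    easy_bkg = top_k([a[2] for a in att], k)
    taken = set(easy_act) | set(easy_bkg)
    hard = [i for i in top_k([a[1] for a in att], len(att)) if i not in taken][:k]
    return easy_act, easy_bkg, hard


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def mean_vec(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def info_nce(anchor, positive, negatives, tau: float = 0.1) -> float:
    """InfoNCE: -log(exp(s+/tau) / (exp(s+/tau) + sum exp(s-/tau))), computed via log-sum-exp."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def snippet_contrastive_loss(features, att, k: int = 2, tau: float = 0.1) -> float:
    """Average InfoNCE over hard snippets, pulling each toward the prototype it leans to."""
    easy_act, easy_bkg, hard = mine_snippets(att, k)
    act_proto = mean_vec([features[i] for i in easy_act])
    bkg_proto = mean_vec([features[i] for i in easy_bkg])
    losses = []
    for i in hard:
        if att[i][0] >= att[i][2]:  # hypothetical split: treat as a hard action snippet
            pos, negs = act_proto, [features[j] for j in easy_bkg]
        else:  # otherwise treat as a hard background snippet
            pos, negs = bkg_proto, [features[j] for j in easy_act]
        losses.append(info_nce(features[i], pos, negs, tau))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy, made-up input: 6 snippets with 2-D features and 3 action classes.
    features = [[1.0, 0.1], [0.9, 0.2], [0.6, 0.5], [0.4, 0.6], [0.1, 1.0], [0.2, 0.9]]
    att_logits = [[3, 0, -1], [2.5, 0.5, -1], [0.5, 1.5, 0.2],
                  [0.1, 1.2, 0.6], [-1, 0, 3], [-1, 0.3, 2.5]]
    cas = [[2.0, 0.1, 0.0] for _ in range(6)]
    att = three_branch_attention(att_logits)
    print(branch_cas(cas, att, branch=0)[0])
    print(mine_snippets(att, k=2))
    print(snippet_contrastive_loss(features, att, k=2))
```

In this toy input the snippets in the middle of the sequence receive the highest context attention, mimicking boundary snippets, so they are the ones the contrastive term acts on. Using a mean prototype as the single positive keeps the sketch small; the actual method may pair samples differently.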

Key words: weakly-supervised, contrastive learning, temporal action localization, attention mechanism, class activation sequence

