Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (2): 548-555. DOI: 10.11772/j.issn.1001-9081.2023020246

• Multimedia computing and computer simulation •

Weakly supervised action localization method integrating snippet contrastive learning

DANG Weichao, ZHANG Lei, GAO Gaimei, LIU Chunxia

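  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
  • Received: 2023-03-09 Revised: 2023-06-11 Accepted: 2023-06-15 Online: 2023-08-14 Published: 2024-02-10
  • Corresponding author: ZHANG Lei
  • About the authors: DANG Weichao (born 1974), male, from Yuncheng, Shanxi, associate professor, Ph.D., CCF member. His main research interests include intelligent computing and software reliability.
    GAO Gaimei (born 1978), female, from Lüliang, Shanxi, associate professor, Ph.D., CCF member. Her main research interests include network security and cryptography.
    LIU Chunxia (born 1977), female, from Datong, Shanxi, associate professor, M.S., CCF member. Her main research interests include software engineering and databases.
  • Funding:
    Doctoral Research Start-up Fund of Taiyuan University of Science and Technology (20202063); Graduate Education Innovation Project of Taiyuan University of Science and Technology (SY2022063)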

Weakly supervised action localization method with snippet contrastive learning

Weichao DANG, Lei ZHANG, Gaimei GAO, Chunxia LIU

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024, China
  • Received: 2023-03-09 Revised: 2023-06-11 Accepted: 2023-06-15 Online: 2023-08-14 Published: 2024-02-10
  • Contact: Lei ZHANG
  • About the authors: DANG Weichao, born in 1974, Ph.D., associate professor. His research interests include intelligent computing and software reliability.
    GAO Gaimei, born in 1978, Ph.D., associate professor. Her research interests include network security and cryptography.
    LIU Chunxia, born in 1977, M.S., associate professor. Her research interests include software engineering and databases.
  • Supported by:
    Doctoral Research Start-up Fund of Taiyuan University of Science and Technology (20202063); Graduate Education Innovation Project of Taiyuan University of Science and Technology (SY2022063)

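Abstract:

To address the problem that existing attention-based weakly supervised action localization methods tend to misclassify snippets at action boundaries, a weakly supervised action localization method integrating snippet contrastive learning is proposed. First, a three-branch attention mechanism is introduced to measure the likelihood that each video frame belongs to an action instance, the context, or the background. Second, the class activation sequence of each branch is constructed from the resulting attention values. Then, positive and negative sample pairs are constructed by a snippet mining algorithm. Finally, snippet contrastive learning guides the network to classify ambiguous snippets correctly. Experimental results show that, at an Intersection over Union (IoU) threshold of 0.5, the proposed method reaches a mean Average Precision (mAP) of 33.9% on THUMOS14 and 40.1% on ActivityNet1.3, which is 1.1 and 2.9 percentage points higher, respectively, than the weakly supervised action localization model DGCNN (Dynamic Graph modeling for weakly-supervised temporal action localization Convolutional Neural Network) on the two datasets, verifying the effectiveness of the proposed method.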
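The abstract names the pipeline stages but not their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the three attention branches are normalized with a softmax per snippet, that each branch's class activation sequence is the attention-weighted base sequence, and that the contrastive objective is an InfoNCE-style loss. The names `branch_cas` and `info_nce`, the input shapes, and the temperature value are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def branch_cas(cas, attention_logits):
    """Weight a class activation sequence by three-branch attention.

    cas:              (T, C) per-snippet class scores (assumed input).
    attention_logits: (T, 3) raw scores for action / context / background.
    Returns the (T, 3) attention and a (3, T, C) stack of branch sequences.
    """
    attention = softmax(attention_logits, axis=1)  # each row sums to 1
    weighted = attention.T[:, :, None] * cas[None, :, :]
    return attention, weighted

def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one anchor (an assumed choice of contrastive loss).

    anchor, positive: (D,) feature vectors; negatives: (N, D).
    """
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    pos = a @ p / temperature   # similarity to the positive snippet
    neg = n @ a / temperature   # similarities to the negative snippets
    logits = np.concatenate(([pos], neg))
    # -log(exp(pos) / sum(exp(logits))), computed via a stable log-sum-exp.
    m = logits.max()
    return float(m + np.log(np.exp(logits - m).sum()) - pos)
```

In this sketch the loss is small when the anchor snippet lies closer to its positive than to any negative, which is the behaviour the abstract relies on to pull boundary snippets toward the correct class.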

Keywords: weakly supervised, contrastive learning, temporal action localization, attention mechanism, class activation sequence

Abstract:

A weakly supervised action localization method that integrates snippet contrastive learning was proposed to address the misclassification of snippets at action boundaries by existing attention-based methods. First, an attention mechanism with three branches was introduced to measure the likelihood of each video frame being an action instance, context, or background. Second, the Class Activation Sequence (CAS) corresponding to each branch was constructed from the obtained attention values. Then, positive and negative sample pairs were generated using a snippet mining algorithm. Finally, the network was guided through snippet contrastive learning to correctly classify hard snippets. Experimental results indicated that at an Intersection over Union (IoU) threshold of 0.5, the mean Average Precision (mAP) of the proposed method on the THUMOS14 and ActivityNet1.3 datasets was 33.9% and 40.1% respectively, improvements of 1.1 and 2.9 percentage points over the DGCNN (Dynamic Graph modeling for weakly-supervised temporal action localization Convolutional Neural Network) weakly supervised action localization model, validating the effectiveness of the proposed method.
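The reported mAP depends on the IoU threshold, so a short illustration of temporal IoU may help. The sketch below assumes segments are given as `(start, end)` pairs in seconds; the function names are hypothetical and the abstract does not describe the evaluation code itself.

```python
def temporal_iou(pred, gt):
    """IoU of two time intervals, each given as (start, end)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, gt, threshold=0.5):
    # A predicted segment matches a ground-truth segment of the same
    # class when their temporal IoU reaches the threshold.
    return temporal_iou(pred, gt) >= threshold
```

For example, a prediction of (2, 6) against a ground truth of (4, 8) overlaps for 2 s out of a 6 s union, an IoU of 1/3, so it would not count at the 0.5 threshold, whereas (3, 8) against (4, 8) gives 4/5 and would.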

Key words: weakly-supervised, contrastive learning, temporal action localization, attention mechanism, class activation sequence

CLC Number: