Journal of Computer Applications, 2026, Vol. 46, Issue (5): 1551-1559. DOI: 10.11772/j.issn.1001-9081.2025050571

• Multimedia computing and computer simulation •

Appearance-motion collaborative modeling for video anomaly detection

Binhong XIE, Erdan ZHU, Rui ZHANG

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan Shanxi 030024, China
  • Received: 2025-05-26 Revised: 2025-08-14 Accepted: 2025-08-26 Online: 2025-08-28 Published: 2026-05-10
  • Contact: Erdan ZHU
  • About author: XIE Binhong, born in 1971, a native of Wanrong, Shanxi, M. S., professor, CCF member. His research interests include intelligent software engineering and machine learning.
    ZHANG Rui, born in 1987, a native of Taiyuan, Shanxi, Ph. D., professor. His research interests include intelligent information processing.
  • Supported by:
    Fundamental Research Program of Shanxi Province (General Program) (20210302123216); Key Research and Development Program for the Introduction of High-Level Scientific and Technological Talents in Lvliang City (2022RC08); Shanxi Province Industry-Education Integration Postgraduate Joint Training Demonstration Base Project (2022JD11)

Abstract:

Video anomaly detection currently faces two main challenges. First, appearance and motion information are insufficiently integrated in complex environments, so semantic associations between the two modalities are lacking. Second, excessive reliance on prior information weakens the model's capacity for effective feature representation. To address these challenges, a method named Appearance-Motion Collaborative modeling for Video Anomaly Detection (AMC-VAD) was proposed. In this method, a Pixel-level Dynamic Adaptation (PDA) module was used to adjust the weights of appearance and motion features pixel by pixel, a dual-branch DepthWise Separable Convolution (DWSConv) was used to extract multi-scale semantic information, and dynamic activation with residual connection was used to enhance the semantic relevance of feature fusion. In addition, an Auxiliary Memory Module (AMM) was designed to extract prototype features from a memory pool via a query-driven semantic alignment strategy, and a Dynamic Aggregation Mechanism (DAM) was incorporated to strengthen the saliency of query features, alleviating the feature weakening caused by prior information coverage. A diversity loss was introduced to reduce redundancy in the distribution of memory items, thereby enhancing the model's ability to discriminate abnormal patterns. Experimental results showed that the proposed method achieved an Area Under the receiver operating characteristic Curve (AUC) of 98.5% and 88.5% on the UCSD Ped2 and CUHK Avenue datasets, respectively, outperforming AMMC-Net (Appearance-Motion Memory Consistency Network) by 1.9 percentage points on each dataset. These results validate the effectiveness of the proposed method in complex dynamic scenarios.

Key words: video anomaly detection, appearance-motion coordination, semantic association, auxiliary memory, diversity loss
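
The memory read and the diversity loss described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and not the authors' implementation: the query-driven read is modeled as softmax attention over query-item dot products, the diversity loss is modeled as the mean squared cosine similarity between distinct memory items, and the names `read_memory` and `diversity_loss` are hypothetical. Pure Python is used only to keep the example self-contained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def read_memory(query, memory):
    """Return (attention weights, prototype) for one query vector.

    Assumed form: the prototype is the attention-weighted sum of memory
    items, with weights given by a softmax over query-item dot products.
    """
    weights = softmax([dot(query, item) for item in memory])
    prototype = [
        sum(w * item[d] for w, item in zip(weights, memory))
        for d in range(len(query))
    ]
    return weights, prototype


def diversity_loss(memory):
    """Mean squared cosine similarity over distinct pairs of memory items.

    Assumed form: 0 when all items are mutually orthogonal, 1 when all
    items are collinear, so minimizing it discourages redundant items.
    """
    n = len(memory)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(memory[i], memory[j]) ** 2 for i, j in pairs) / len(pairs)


# Toy memory pools (illustrative values only, not from the paper).
orthogonal_memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant_memory = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# A query aligned with the first item draws most of its prototype from it.
weights, prototype = read_memory([4.0, 0.0, 0.0], orthogonal_memory)
```

In this toy setting, `diversity_loss(orthogonal_memory)` is 0 and `diversity_loss(redundant_memory)` is 1/3, because one of the three item pairs is collinear. A penalty of this kind pushes memory items apart, which is one plausible reading of "reducing redundancy in the distribution of memory items".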


CLC Number: