Journal of Computer Applications
谢斌红,朱二丹,张睿
Abstract: Video anomaly detection currently faces several challenges. First, inadequate integration of appearance and motion features in complex environments leads to missing semantic associations between the two modalities. Second, excessive reliance on prior information weakens the model's capacity for effective feature representation. To address these issues, an approach termed Appearance-Motion Collaborative Modeling for Video Anomaly Detection (AMC-VAD) was proposed. Pixel-level feature coordination was achieved through a Pixel-level Dynamic Adaptation (PDA) module, which adaptively adjusted the importance of appearance and motion features via per-pixel weighting. Multi-scale semantic information was extracted using a dual-branch depthwise separable convolution structure, while dynamic activation and residual connections were employed to enhance the semantic consistency of feature fusion. In addition, an Auxiliary Memory Module (AMM) was designed to extract prototype features from a memory pool via a query-driven semantic alignment mechanism. A dynamic aggregation strategy was incorporated to enhance the saliency of query representations, alleviating the feature degradation caused by over-reliance on prior knowledge. To further optimize the memory structure, a diversity loss was introduced to reduce redundancy among memory items and improve the model's capacity to discriminate anomalous patterns. Experimental results demonstrated that the proposed method achieved AUC scores of 98.5% on the UCSD Ped2 dataset and 88.5% on the CUHK Avenue dataset, outperforming AMMC-Net (Appearance-Motion Memory Consistency Network) by 1.9 percentage points on both benchmarks. These results validate the effectiveness of the method in complex dynamic scenarios.
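The three mechanisms summarized above (per-pixel gating of appearance and motion features, query-driven reading from a memory pool, and a diversity penalty on memory items) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; all function names, shapes, and the 1x1-conv-style gate weights are hypothetical, and the diversity loss is written here as mean pairwise cosine similarity, one common choice for reducing redundancy among memory items:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pixel_dynamic_fusion(app, mot, w_gate):
    """Per-pixel adaptive blend of appearance and motion features.

    app, mot : (C, H, W) feature maps; w_gate : (2C,) hypothetical
    1x1-conv weights that produce one gate value per pixel.
    """
    stacked = np.concatenate([app, mot], axis=0)             # (2C, H, W)
    gate = sigmoid(np.einsum('c,chw->hw', w_gate, stacked))  # (H, W) per-pixel weight
    return gate * app + (1.0 - gate) * mot                   # weighted fusion

def memory_read(query, memory):
    """Query-driven read: softmax-align the query against memory items.

    query : (C,), memory : (N, C) pool of prototype features.
    """
    logits = memory @ query
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                                       # alignment weights
    return attn @ memory                                     # aggregated prototype

def diversity_loss(memory):
    """Mean off-diagonal cosine similarity among memory items.

    Minimizing this pushes memory items apart, reducing redundancy.
    """
    normed = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    sim = normed @ normed.T
    n = sim.shape[0]
    return sim[~np.eye(n, dtype=bool)].mean()
```

For perfectly decorrelated (orthogonal) memory items the diversity loss is zero, and the memory read returns a convex combination of prototypes, so a query close to one item recovers roughly that item.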
Key words: Video anomaly detection, appearance-motion coordination, pixel-level dynamic adaptation, auxiliary memory, diversity loss
CLC Number: TP391.4
谢斌红, 朱二丹, 张睿. Video anomaly detection based on appearance-motion collaborative modeling [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025050571.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025050571