Journal of Computer Applications
谢斌红,朱二丹,张睿
Abstract: Video anomaly detection currently faces several challenges. First, inadequate integration of appearance and motion features in complex environments leads to missing semantic associations between the two modalities. Second, excessive reliance on prior information weakens the model's capacity for effective feature representation. To address these issues, an approach termed Appearance-Motion Collaborative Modeling for Video Anomaly Detection (AMC-VAD) was proposed. Pixel-level feature coordination was achieved through a Pixel-level Dynamic Adaptation (PDA) module, which adaptively adjusted the importance of appearance and motion features via per-pixel weighting. Multi-scale semantic information was extracted using a dual-branch depthwise separable convolution structure, while dynamic activation and residual connections were employed to enhance the semantic consistency of feature fusion. In addition, an Auxiliary Memory Module (AMM) was designed to extract prototype features from a memory pool via a query-driven semantic alignment mechanism. A dynamic aggregation strategy was incorporated to enhance the saliency of query representations, alleviating the feature degradation caused by over-reliance on prior knowledge. To further optimize the memory structure, a diversity loss was introduced to reduce redundancy among memory items and improve the model's capacity to discriminate anomalous patterns. Experimental results demonstrated that the proposed method achieved AUC scores of 98.5% on the UCSD Ped2 dataset and 88.5% on the CUHK Avenue dataset, outperforming AMMC-Net (Appearance-Motion Memory Consistency Network) by 1.9 percentage points on both benchmarks. These results validate the effectiveness of the method in complex dynamic scenarios.
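The three mechanisms summarized above (per-pixel gating of appearance and motion features, query-driven reading from a memory pool, and a diversity penalty on memory items) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; all function names, shapes, and the 1x1-conv-style gate weights are hypothetical, and the diversity loss is written here as mean pairwise cosine similarity, one common choice for reducing redundancy among memory items:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pixel_dynamic_fusion(app, mot, w_gate):
    """Per-pixel adaptive blend of appearance and motion features.

    app, mot : (C, H, W) feature maps; w_gate : (2C,) hypothetical
    1x1-conv weights that produce one gate value per pixel.
    """
    stacked = np.concatenate([app, mot], axis=0)             # (2C, H, W)
    gate = sigmoid(np.einsum('c,chw->hw', w_gate, stacked))  # (H, W) per-pixel weight
    return gate * app + (1.0 - gate) * mot                   # weighted fusion

def memory_read(query, memory):
    """Query-driven read: softmax-align the query against memory items.

    query : (C,), memory : (N, C) pool of prototype features.
    """
    logits = memory @ query
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                                       # alignment weights
    return attn @ memory                                     # aggregated prototype

def diversity_loss(memory):
    """Mean off-diagonal cosine similarity among memory items.

    Minimizing this pushes memory items apart, reducing redundancy.
    """
    normed = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    sim = normed @ normed.T
    n = sim.shape[0]
    return sim[~np.eye(n, dtype=bool)].mean()
```

For perfectly decorrelated (orthogonal) memory items the diversity loss is zero, and the memory read returns a convex combination of prototypes, so a query close to one item recovers roughly that item.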
Key words: Video anomaly detection, appearance-motion coordination, pixel-level dynamic adaptation, auxiliary memory, diversity loss
CLC Number: TP391.4
谢斌红, 朱二丹, 张睿. Video anomaly detection based on appearance-motion collaborative modeling [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025050571.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025050571