Appearance-motion collaborative modeling for video anomaly detection

doi:10.11772/j.issn.1001-9081.2025050571

Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1551-1559.DOI: 10.11772/j.issn.1001-9081.2025050571

• Multimedia computing and computer simulation •


Binhong XIE, Erdan ZHU, Rui ZHANG

School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024, China

Received: 2025-05-26 Revised: 2025-08-14 Accepted: 2025-08-26 Online: 2025-08-28 Published: 2026-05-10
Contact: Erdan ZHU
About the authors: XIE Binhong, born in 1971 in Wanrong, Shanxi, M.S., professor, CCF member. His research interests include intelligent software engineering and machine learning.
ZHANG Rui, born in 1987 in Taiyuan, Shanxi, Ph.D., professor. His research interests include intelligent information processing.
Supported by:
Fundamental Research Program of Shanxi Province (General Program) (20210302123216); Key Research and Development Program for the Introduction of High-Level Scientific and Technological Talents in Lvliang City (2022RC08); Shanxi Province Industry-Education Integration Postgraduate Joint Training Demonstration Base Project (2022JD11)



Abstract:

Video anomaly detection currently faces two main challenges. First, appearance and motion information are insufficiently integrated in complex environments, leaving the two modalities without semantic associations. Second, excessive reliance on prior information weakens the model's capacity for effective feature representation. To address these problems, an Appearance-Motion Collaborative modeling for Video Anomaly Detection (AMC-VAD) method is proposed. A Pixel-level Dynamic Adaptation (PDA) module adjusts the weights of appearance and motion features pixel by pixel, uses a dual-branch DepthWise Separable Convolution (DWSConv) to extract multi-scale semantic information, and strengthens the semantic relevance of the fused features through dynamic activation and residual connections. In addition, an Auxiliary Memory Module (AMM) extracts prototype features from a memory pool via a query-driven semantic alignment strategy, and a Dynamic Aggregation Mechanism (DAM) enhances the saliency of the query features, alleviating the weakening of features that occurs when prior information covers them. A diversity loss reduces redundancy in the distribution of memory items, thereby improving the model's ability to discriminate abnormal patterns. Experimental results show that the proposed method achieves an Area Under the receiver operating characteristic Curve (AUC) of 98.5% and 88.5% on the UCSD Ped2 and CUHK Avenue datasets, respectively, outperforming AMMC-Net (Appearance-Motion Memory Consistency Network) by 1.9 percentage points on each dataset. These results validate the effectiveness of the proposed method in complex dynamic scenarios.
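
The query-driven memory read and the diversity loss mentioned in the abstract can be illustrated with a small sketch. This page does not give the paper's formulas, so everything below is an assumption: the function names are hypothetical, the read is a softmax over cosine similarities (a common choice in memory-augmented models), and the diversity loss is taken to be the mean squared cosine similarity between distinct memory items.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, memory):
    """Assumed read: weight each memory item by its similarity to the query."""
    weights = softmax([cosine(query, item) for item in memory])
    proto = [sum(w * item[d] for w, item in zip(weights, memory))
             for d in range(len(query))]
    return proto, weights

def diversity_loss(memory):
    """Assumed loss: mean squared cosine similarity over distinct item pairs."""
    n = len(memory)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(memory[i], memory[j]) ** 2 for i, j in pairs) / len(pairs)

# Toy memory pools (illustrative values only).
memory_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
memory_redundant = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

proto, weights = read_memory([1.0, 0.0, 0.0], memory_diverse)
loss_diverse = diversity_loss(memory_diverse)
loss_redundant = diversity_loss(memory_redundant)
```

Under these assumptions, a pool with duplicated items receives a higher loss than a pool of mutually orthogonal items, which is the redundancy-reducing behavior the abstract attributes to the diversity loss.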

Key words: video anomaly detection, appearance-motion coordination, semantic association, auxiliary memory, diversity loss


CLC Number: TP391.4

Binhong XIE, Erdan ZHU, Rui ZHANG. Appearance-motion collaborative modeling for video anomaly detection[J]. Journal of Computer Applications, 2026, 46(5): 1551-1559.


Figures/Tables 11

Fig. 1 Network architecture of AMC-VAD

Tab. 1 Comparison of anomaly detection results of AMC-VAD and mainstream methods on different datasets

Type	Method	AUC/% (UCSD Ped2)	AUC/% (CUHK Avenue)	AUC/% (Shanghai Tech)
Traditional	MPPCA [17]	69.3	-	-
Traditional	MDT [14]	82.9	-	-
Reconstruction-based	ConvAE [3]	90.0	70.2	-
	ConvLSTM-AE [18]	88.1	77.0	-
	MNAD-R [12]	90.2	82.8	69.8
	FE-MemAE [19]	97.2	88.5	74.3
Prediction-based	FFP [4]	95.4	85.1	72.8
	AMMC-Net [2]	96.6	86.6	73.7
	STCEN [20]	96.9	86.6	73.8
	STR-VAD [13]	98.4	86.1	73.2
	CA-VAD [21]	97.9	88.5	74.1
	MAMC [10]	96.7	87.6	71.5
	HSTforU [22]	97.3	87.5	75.3
	ST_MemAE [23]	97.0	87.7	76.1
Hybrid	AMC [24]	96.2	86.9	-
	IPR [25]	96.3	85.1	73.0
	DGG [26]	96.6	85.7	73.1
	AMC-VAD	98.5	88.5	73.8
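
For readers unfamiliar with the metric in Tab. 1, the sketch below computes a ROC AUC from per-frame labels and anomaly scores using the pairwise-ranking definition, then checks the 1.9-point margins over AMMC-Net quoted in the abstract. The toy labels and scores are invented for illustration, and the assumption that 1 marks an anomalous frame and higher scores mean "more anomalous" is ours, not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy frame-level data (hypothetical): 1 = anomalous frame.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

# Margins over AMMC-Net, using the AUC values listed in Tab. 1.
margin_ped2 = round(98.5 - 96.6, 1)
margin_avenue = round(88.5 - 86.6, 1)
```

The pairwise form is quadratic in the number of frames, so it suits small checks like this one; a sort-based implementation would be preferable for full test sets.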

Fig. 2 Comparison of running speed and AUC of different methods on Shanghai Tech dataset

Tab. 2 Comparison of different models' complexity and inference efficiency on UCSD Ped2 dataset

Method	Parameters/10⁶	FPS	GFLOPs	Training time/h	GPU memory usage/GB
MPN [27]	12.71	47.0	165.1	22.7	23.2
AMMC-Net [2]	25.04	48.6	93.7	19.5	23.6
P3DE [28]	12.60	52.0	55.2	15.2	18.4
FDC-Net [29]	218.34	22.2	-	-	-
AMC-VAD	21.20	51.2	88.2	17.1	22.8
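
The figures in Tab. 2 are easier to compare after converting FPS to per-frame latency and expressing the differences from AMMC-Net as relative reductions. The sketch below does only that arithmetic on values copied from the table; the helper names are hypothetical.

```python
# Values copied from Tab. 2 (parameters in millions, GFLOPs, frames per second).
TAB2 = {
    "AMMC-Net": {"params_m": 25.04, "fps": 48.6, "gflops": 93.7},
    "AMC-VAD": {"params_m": 21.20, "fps": 51.2, "gflops": 88.2},
}

def ms_per_frame(fps):
    """Average per-frame latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps

def relative_reduction(base, new):
    """Percentage by which `new` is smaller than `base`."""
    return (base - new) / base * 100.0

base, ours = TAB2["AMMC-Net"], TAB2["AMC-VAD"]
latency_ours = ms_per_frame(ours["fps"])
param_cut = relative_reduction(base["params_m"], ours["params_m"])
flops_cut = relative_reduction(base["gflops"], ours["gflops"])

print(f"AMC-VAD latency: {latency_ours:.2f} ms/frame")
print(f"Parameter reduction vs AMMC-Net: {param_cut:.1f}%")
print(f"GFLOPs reduction vs AMMC-Net: {flops_cut:.1f}%")
```

Latency derived from FPS is an average over the run and says nothing about per-frame variance.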

Fig. 3 Comparison of future frame prediction error heatmaps on experimental datasets

Fig. 4 Normal score curves on experimental datasets

Tab. 3 Ablation study results on UCSD Ped2 dataset

Setting	PDA	AMM	Skip-connection	$L_{div}$	AUC/%
1					96.6
2	✓				97.1
3		✓			97.2
4	✓	✓			97.3
5	✓		✓		97.5
6		✓		✓	97.7
7	✓	✓	✓		97.6
8		✓	✓	✓	98.1
9	✓	✓		✓	98.3
10	✓	✓	✓	✓	98.5


Fig. 5 Comparison of salient regions between appearance features and fused features on experimental datasets

Tab. 4 Performance comparison of different loss function weight combinations on UCSD Ped2 dataset

Combination	$\lambda_{pre}$	$\lambda_{div}$	$\lambda_{gd}$	AUC/%
1	0.5	1.5	0.5	94.1
2	0.5	1	0.5	94.2
3	0.5	0.1	0.5	95.5
4	0.5	0.01	0.5	93.1
5	0.8	0.1	0.5	94.7
6	1	0.1	0.5	95.6
7	2	0.1	0.5	94.6
8	100	0.1	0.5	93.9
9	1	0.1	0.7	96.1
10	1	0.1	0.8	96.2
11	1	0.1	1	98.5
12	1	0.1	50	95.9
13	1	0.1	100	94.2
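
As a reading aid for Tab. 4, the sketch below treats the three weights as coefficients of a weighted sum of loss terms and picks the best-scoring combination from the table. The additive form and the reading of the subscripts (pre as a prediction term, div as the diversity term, gd as a gradient term) are assumptions, since this page lists only the weights; `total_loss` is a hypothetical helper.

```python
# (lambda_pre, lambda_div, lambda_gd, AUC/%) rows copied from Tab. 4.
TAB4 = [
    (0.5, 1.5, 0.5, 94.1),
    (0.5, 1.0, 0.5, 94.2),
    (0.5, 0.1, 0.5, 95.5),
    (0.5, 0.01, 0.5, 93.1),
    (0.8, 0.1, 0.5, 94.7),
    (1.0, 0.1, 0.5, 95.6),
    (2.0, 0.1, 0.5, 94.6),
    (100.0, 0.1, 0.5, 93.9),
    (1.0, 0.1, 0.7, 96.1),
    (1.0, 0.1, 0.8, 96.2),
    (1.0, 0.1, 1.0, 98.5),
    (1.0, 0.1, 50.0, 95.9),
    (1.0, 0.1, 100.0, 94.2),
]

def total_loss(l_pre, l_div, l_gd, lam_pre=1.0, lam_div=0.1, lam_gd=1.0):
    """Assumed objective: weighted sum of the three loss terms."""
    return lam_pre * l_pre + lam_div * l_div + lam_gd * l_gd

# Pick the row with the highest AUC; the defaults above mirror that row.
best = max(TAB4, key=lambda row: row[3])
example = total_loss(0.2, 0.5, 0.1)  # illustrative loss values only
print(best, example)
```

The table shows AUC dropping on either side of the selected weights, for example when $\lambda_{div}$ moves from 0.1 to 0.01 or to 1, so the selected values sit at an interior optimum of the tested grid rather than at its edge.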


Fig. 6 AUC performance for different values of hyperparameter α on UCSD Ped2 and CUHK Avenue datasets

Tab. 5 Comparison of PDA's time performance at different resolutions on UCSD Ped2 dataset

Resolution	Per-frame inference time/ms	FPS

References 29

[1]	LIU Y, YANG D, WANG Y, et al. Generalized video anomaly event detection: systematic taxonomy and comparison of deep models [J]. ACM Computing Surveys, 2024, 56(7): No.189.
[2]	CAI R, ZHANG H, LIU W, et al. Appearance-motion memory consistency network for video anomaly detection [C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 938-946.
[3]	HASAN M, CHOI J, NEUMANN J, et al. Learning temporal regularity in video sequences [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 733-742.
[4]	LIU W, LUO W, LIAN D, et al. Future frame prediction for anomaly detection - a new baseline [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6536-6545.
[5]	VU H, NGUYEN T D, LE T, et al. Robust anomaly detection in videos using multilevel representations [C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 5216-5223.
[6]	BAO Q, LIU F, LIU Y, et al. Hierarchical scene normality-binding modeling for anomaly detection in surveillance videos [C]// Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 6103-6112.
[7]	SINGH R, SETHI A, SAINI K, et al. Attention-guided generator with dual discriminator GAN for real-time video anomaly detection [J]. Engineering Applications of Artificial Intelligence, 2024, 131: No.107830.
[8]	YANG D, LIU Y, HUANG C, et al. Target and source modality co-reinforcement for emotion understanding from asynchronous multimodal sequences [J]. Knowledge-Based Systems, 2023, 265: No.110370.
[9]	LIU Y, LIU J, LIN J, et al. Appearance-motion united auto-encoder framework for video anomaly detection [J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 69(5): 2498-2502.
[10]	NING Z, WANG Z, LIU Y, et al. Memory-enhanced appearance-motion consistency framework for video anomaly detection [J]. Computer Communications, 2024, 216: 159-167.
[11]	WESTON J, CHOPRA S, BORDES A. Memory networks [EB/OL]. [2024-11-29].
[12]	PARK H, NOH J, HAM B. Learning memory-guided normality for anomaly detection [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 14360-14369.
[13]	WANG Y, LIU T, ZHOU J, et al. Video anomaly detection based on spatio-temporal relationships among objects [J]. Neurocomputing, 2023, 532: 141-151.
[14]	LI W, MAHADEVAN V, VASCONCELOS N. Anomaly detection and localization in crowded scenes [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 18-32.
[15]	LU C, SHI J, JIA J. Abnormal event detection at 150 FPS in MATLAB [C]// Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2013: 2720-2727.
[16]	LUO W, LIU W, GAO S. A revisit of sparse coding based anomaly detection in stacked RNN framework [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 341-349.
[17]	KIM J, GRAUMAN K. Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates [C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 2921-2928.
[18]	LUO W, LIU W, GAO S. Remembering history with convolutional LSTM for anomaly detection [C]// Proceedings of the 2017 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2017: 439-444.
[19]	LIU H, HE N, HUANG X, et al. A video anomaly detection framework based on hybrid feature-enhanced memory reconstruction and jigsaw puzzle [J]. Signal, Image and Video Processing, 2025, 19: No.12.
[20]	HAO Y, LI J, WANG N, et al. Spatiotemporal consistency-enhanced network for video anomaly detection [J]. Pattern Recognition, 2022, 121: No.108232.
[21]	YANG Z, RADKE R J. Context-aware video anomaly detection in long-term datasets [C]// Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2024: 4002-4011.
[22]	LE V T, JIN H, KIM Y G. HSTforU: anomaly detection in aerial and ground-based videos with hierarchical spatio-temporal transformer for U-net [J]. Applied Intelligence, 2025, 55(4): No.261.
[23]	LI H, CHEN M. A novel spatio-temporal memory network for video anomaly detection [J]. Multimedia Tools and Applications, 2025, 84(8): 4603-4624.
[24]	NGUYEN T N, MEUNIER J. Anomaly detection in video sequence with appearance-motion correspondence [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1273-1283.
[25]	TANG Y, ZHAO L, ZHANG S, et al. Integrating prediction and reconstruction for anomaly detection [J]. Pattern Recognition Letters, 2020, 129: 123-130.
[26]	SUN Z, WANG P, ZHENG W, et al. Dual GroupGAN: an unsupervised four-competitor (2V2) approach for video anomaly detection [J]. Pattern Recognition, 2024, 153: No.110500.
[27]	LV H, CHEN C, CUI Z, et al. Learning normal dynamics in videos with meta prototype network [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 15420-15429.
[28]	WEN X, LAI H, GAO G, et al. Video anomaly detection based on cross-frame prediction mechanism and spatio-temporal memory-enhanced pseudo-3D encoder [J]. Engineering Applications of Artificial Intelligence, 2023, 126(Pt C): No.107057.
[29]	SHI R, HE Q, WANG H, et al. FDC-Net: foreground dynamic capture with deep feature enhancement for video anomaly detection [J]. Multimedia Systems, 2025, 31(2): No.102.
