Video anomaly detection currently faces two main challenges. First, appearance and motion information are insufficiently integrated in complex environments, so the two modalities lack semantic association. Second, excessive reliance on prior information weakens the model's capacity for effective feature representation. To address these problems, an Appearance-Motion Collaborative modeling method for Video Anomaly Detection (AMC-VAD) is proposed. The method adjusts appearance and motion feature weights at the pixel level through a Pixel-level Dynamic Adaptation (PDA) module, extracts multi-scale semantic information with a dual-branch DepthWise Separable Convolution (DWSConv), and strengthens the semantic relevance of the fused features through dynamic activation and residual connections. In addition, an Auxiliary Memory Module (AMM) extracts prototype features from a memory pool via a query-driven semantic alignment strategy, and a Dynamic Aggregation Mechanism (DAM) enhances the saliency of the query feature representations, alleviating the feature weakening caused by prior information coverage. A diversity loss reduces redundancy in the distribution of memory items, which improves the model's ability to discriminate abnormal patterns. Experimental results show that the proposed method achieves an Area Under the receiver operating Characteristic curve (AUC) of 98.5% on the UCSD Ped2 dataset and 88.5% on the CUHK Avenue dataset, outperforming the Appearance-Motion Memory Consistency Network (AMMC-Net) by 1.9 percentage points on each. These results validate the effectiveness of the proposed method in complex dynamic scenarios.
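As a rough illustration of the memory-related ideas summarized above, the following sketch shows one common way a query can read a prototype from a memory pool and how a diversity penalty on memory items can be computed. It is not the AMC-VAD implementation: the cosine-similarity addressing, the softmax temperature, the squared off-diagonal similarity penalty, and the names `read_memory` and `diversity_loss` are all assumptions made for illustration, and the DAM step is omitted.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v, eps=1e-12):
    # Scale a vector to (approximately) unit length.
    norm = math.sqrt(_dot(v, v))
    return [x / (norm + eps) for x in v]


def _softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory, temperature=0.1):
    """Return (prototype, weights) for one query vector.

    Assumption: addressing uses cosine similarity followed by a softmax;
    the prototype is the weighted sum of the memory items.
    """
    q = _normalize(query)
    sims = [_dot(q, _normalize(item)) for item in memory]
    weights = _softmax([s / temperature for s in sims])
    dim = len(memory[0])
    prototype = [
        sum(w * item[d] for w, item in zip(weights, memory))
        for d in range(dim)
    ]
    return prototype, weights


def diversity_loss(memory):
    """Mean squared cosine similarity between distinct memory items.

    Assumption: one possible redundancy penalty; it is 0 when all items
    are mutually orthogonal and 1 when all items are parallel.
    Requires at least two memory items.
    """
    unit = [_normalize(item) for item in memory]
    n = len(unit)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += _dot(unit[i], unit[j]) ** 2
    return total / (n * (n - 1))
```

Under these assumptions, a query close to one memory item puts most of its attention weight on that item, and the penalty falls as the memory items spread apart in direction, which is the intuition behind reducing redundancy among memory items.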