Journal of Computer Applications


Deepfake detection method based on fusion of multimodal physical prior features

  

  • Received:2025-07-23 Revised:2025-09-26 Online:2025-11-05 Published:2025-11-05


LYU Renkun1, SUN Peng1, LANG Yubo1, GUO Hong2, SHEN Zhe3, TIAN Di1

  1. Criminal Investigation Police University of China
    2. Key Laboratory of Information Network Security, Ministry of Public Security
    3. Shenyang Aerospace University
  • Corresponding author: LYU Renkun

Abstract: Existing detection methods mostly model pixel-level clues in images and seldom consider the impact of the synthesis process on forged images; even when they achieve good detection results, the detection process is difficult to explain. Therefore, an explainable deepfake detection method based on the fusion of multi-modal physical prior features was proposed. Firstly, optical flow, illumination, edge, and DCT features were used to describe inter-frame motion differences in video sequences, illumination inconsistency within single frames, and edge artifact information, yielding explainable multi-modal physical prior features. Secondly, a multi-modal Mixture-of-Experts network was proposed: expert sub-networks were constructed for the different modalities, their outputs were weighted by cross-modal attention, fused through a gating unit, and fed into a discriminative network for classification. The SIAM attention mechanism was introduced into the discriminative network, and the fully connected structure was replaced with the KAN (Kolmogorov-Arnold Networks) structure. The multi-modal physical prior features were used to train the different expert sub-networks separately, and a Shapley value analysis of the different input features was given, constructing an explainable analysis framework that combines ante-hoc features with post-hoc explanations to provide pixel-level explanations for model inference and prediction. Experimental results show that, compared with algorithms such as CORE, SRM, and UCF, the proposed algorithm achieves accuracies ranging from 97.35% to 98.75% on the FaceForensics++ dataset, with an average accuracy of 98.22%, and the model's interpretability is also significantly improved.
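The gating unit that fuses the per-modality expert outputs can be sketched as a softmax-weighted sum. This is a minimal illustration, not the paper's implementation: the function name `gated_fusion`, the toy expert vectors, and the use of equal gate logits are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of gate logits."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_fusion(expert_outputs, gate_logits):
    """Fuse per-modality expert feature vectors with a softmax gating unit.

    expert_outputs: list of (feat_dim,) arrays, one per modality expert
    gate_logits:    (num_experts,) array of gate scores
    returns:        (feat_dim,) fused feature vector
    """
    weights = softmax(gate_logits)       # one weight per modality expert
    stacked = np.stack(expert_outputs)   # (num_experts, feat_dim)
    return weights @ stacked             # weighted sum over experts

# Toy stand-ins for the optical-flow, illumination, edge, and DCT experts.
experts = [np.full(4, float(i)) for i in range(1, 5)]
# Equal logits give each of the four experts weight 0.25.
fused = gated_fusion(experts, np.zeros(4))
```

With equal gate logits the fusion reduces to a plain average of the four expert vectors; in the actual network the logits would be produced by a learned gating layer conditioned on the input.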

