Journal of Computer Applications, 2024, Vol. 44, Issue (10): 3134-3140. DOI: 10.11772/j.issn.1001-9081.2023101506
Shengyou ZHENG, Yanxiang CHEN, Zuxing ZHAO, Haiyang LIU
Received: 2023-11-06
Revised: 2023-12-30
Accepted: 2024-01-03
Online: 2024-10-15
Published: 2024-10-10
Contact: Yanxiang CHEN
About author: ZHENG Shengyou, born in 1998 in Suzhou, Anhui, M. S. candidate. His research interests include multimedia content safety.
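Abstract:
To address the lack of multimodal forgery scenarios and partial forgery scenarios in existing video forgery datasets, a multimodal partial forgery dataset with an adjustable forgery ratio, PartialFAVCeleb, was constructed using a combination of audio and video forgery methods. The proposed dataset is based on the FakeAVCeleb multimodal forgery dataset and is built by splicing real and forged data, where the forged data are generated by four methods: FaceSwap, FSGAN (Face Swapping Generative Adversarial Network), Wav2Lip (Wave to Lip), and SV2TTS (Speaker Verification to Text-To-Speech). During splicing, a probabilistic method generates the locations of forged segments in the time domain and across modalities, the segment boundaries are randomized to match realistic forgery scenarios, and the source material is screened to avoid abrupt background changes. The resulting dataset yields 3 970 videos for each forgery ratio. In the benchmark detection, several audio and visual feature extractors were used, and tests were run under both strong and weak supervision, with the weakly supervised tests implemented with the Hierarchical Multi-Instance Learning (HMIL) method. The results show that every tested model performs significantly worse on data with a low forgery ratio than on data with a high forgery ratio, and that every model performs significantly worse under weak supervision than under strong supervision, which confirms the difficulty of weakly supervised detection on this partial forgery dataset. These results indicate that the multimodal partial forgery scenario represented by the proposed dataset has substantial research value.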
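The splicing step described in the abstract can be illustrated with a short sketch. The abstract does not spell out the probability model, so everything below is an assumption made for illustration only: the function name `sample_forged_segments`, the time unit (frames or fixed-length windows), the cap `max_segments`, and the uniform choice of modality are hypothetical and are not taken from the paper.

```python
import random


def sample_forged_segments(n_units, forgery_ratio, max_segments=3, seed=None):
    """Place non-overlapping forged spans covering forgery_ratio of a clip.

    n_units is the clip length in abstract time units (an assumption).
    Returns a list of (start, end, modality) tuples with end exclusive,
    where modality is 'visual', 'audio', or 'both'.
    """
    rng = random.Random(seed)
    target = round(n_units * forgery_ratio)  # total forged units
    if target == 0:
        return []

    # Random number of forged segments, each at least one unit long.
    k = rng.randint(1, min(max_segments, target))

    # Randomized boundaries: split the forged budget into k positive lengths.
    cuts = sorted(rng.sample(range(1, target), k - 1))
    lengths = [b - a for a, b in zip([0] + cuts, cuts + [target])]

    # Spread the remaining real units over the gaps before each segment.
    real = n_units - target
    gap_cuts = sorted(rng.randint(0, real) for _ in range(k))
    gaps = [b - a for a, b in zip([0] + gap_cuts, gap_cuts + [real])]

    segments, pos = [], 0
    for gap, length in zip(gaps, lengths):
        pos += gap
        modality = rng.choice(["visual", "audio", "both"])
        segments.append((pos, pos + length, modality))
        pos += length
    return segments
```

Under these assumptions, calling `sample_forged_segments(100, 0.3)` returns up to three spans whose lengths sum to 30 units, which mirrors the idea of a dataset whose forgery ratio is a tunable parameter.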
Shengyou ZHENG, Yanxiang CHEN, Zuxing ZHAO, Haiyang LIU. Construction and benchmark detection of multimodal partial forgery dataset[J]. Journal of Computer Applications, 2024, 44(10): 3134-3140.
Forgery method | Proportion in original dataset (%) | Proportion in synthetic materials (%) |
---|---|---|
FaceSwap | 11.07 | 6.09 |
FSGAN | 38.84 | 50.78 |
Wav2Lip | 50.09 | 43.13 |

Tab. 1 Proportion of visual forgery methods in original dataset and synthetic materials (%)
Dataset | Real videos | Fake videos | Forgery methods | Multimodal forgery | Partial forgery | Adjustable forgery ratio |
---|---|---|---|---|---|---|
DF-TIMIT | 320 | 640 | 1 | × | × | × |
Deep Fake Detection | 363 | 3 068 | 5 | × | × | × |
FaceForensics++ | 1 000 | 4 000 | 4 | × | × | × |
Celeb-DF | 590 | 5 639 | 1 | × | × | × |
DFDC | 23 564 | 104 500 | 8 | × | × | × |
DFFD | 1 000 | 3 000 | 7 | × | × | × |
ForgeryNet | 99 630 | 121 617 | 15 | × | × | × |
FakeAVCeleb | 570 | 25 000+ | 4 | √ | × | × |
LAV-DF | 36 431 | 99 873 | 2 | √ | √ | × |
PartialFAVCeleb | 570 | 3 400∗ | 4 | √ | √ | √ |

Tab. 2 Comparison of proposed dataset and existing video deepfake datasets
Visual feature extractor | Audio feature extractor | Weak, visual, segment | Weak, visual, utterance | Weak, audio, segment | Weak, audio, utterance | Weak, video | Strong, visual, segment | Strong, visual, utterance | Strong, audio, segment | Strong, audio, utterance | Strong, video |
---|---|---|---|---|---|---|---|---|---|---|---|
LCNN | LCNN | 48.83 | 46.13 | 42.28 | 41.16 | 49.20 | 29.34 | 24.03 | 15.48 | 13.80 | 21.41 |
2DResNet | LCNN | 48.65 | 45.90 | 42.73 | 40.87 | 48.71 | 29.13 | 23.71 | 15.63 | 13.96 | 21.30 |
3DResNet | LCNN | 48.24 | 45.57 | 42.57 | 40.72 | 49.25 | 28.21 | 22.55 | 15.80 | 14.03 | 20.91 |
2DResNet+3DResNet | LCNN | 47.96 | 45.81 | 42.70 | 40.93 | 47.92 | 27.01 | 22.63 | 15.48 | 13.72 | 20.72 |
LCNN | 2DResNet | 49.10 | 46.36 | 43.03 | 40.83 | 48.35 | 30.15 | 26.32 | 14.92 | 14.07 | 21.35 |
2DResNet | 2DResNet | 49.08 | 45.92 | 43.16 | 41.20 | 49.22 | 29.87 | 26.01 | 15.35 | 14.29 | 21.47 |
3DResNet | 2DResNet | 48.42 | 45.92 | 42.85 | 40.90 | 47.63 | 28.94 | 24.52 | 15.33 | 14.33 | 22.73 |
2DResNet+3DResNet | 2DResNet | 49.30 | 46.25 | 43.22 | 41.04 | 47.39 | 28.40 | 24.53 | 16.02 | 14.52 | 21.82 |

Tab. 3 Results of benchmark detection with forgery ratio of 30% (EER, %; "Weak" and "Strong" denote weakly and strongly supervised conditions)
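The benchmark tables report equal error rate (EER), the operating point where the false acceptance rate and the false rejection rate coincide. As a rough illustration of how such a number can be obtained from detector scores, the sketch below scans candidate thresholds and returns the error rate where the two rates are closest. The function name `equal_error_rate`, the convention that a higher score means "more likely forged", and the label coding (1 = forged, 0 = real) are assumptions for this example, not details given in the source.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by scanning every observed score as a threshold.

    scores: detector outputs, higher means more likely forged (assumption).
    labels: 1 for forged samples, 0 for real samples (assumption).
    Returns the EER as a fraction in [0, 1].
    """
    forged = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not forged or not real:
        raise ValueError("both classes are required to compute an EER")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # Real samples wrongly flagged as forged.
        far = sum(s >= threshold for s in real) / len(real)
        # Forged samples that slip under the threshold.
        frr = sum(s < threshold for s in forged) / len(forged)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

An EER close to 50% means the detector is barely better than guessing, while values near 0% indicate a clean separation between real and forged samples.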
Visual feature extractor | Audio feature extractor | Weak, visual, segment | Weak, visual, utterance | Weak, audio, segment | Weak, audio, utterance | Weak, video | Strong, visual, segment | Strong, visual, utterance | Strong, audio, segment | Strong, audio, utterance | Strong, video |
---|---|---|---|---|---|---|---|---|---|---|---|
LCNN | LCNN | 48.92 | 45.85 | 39.32 | 26.97 | 37.20 | 27.12 | 22.36 | 14.92 | 10.36 | 16.74 |
2DResNet | LCNN | 48.20 | 44.70 | 37.43 | 26.31 | 34.52 | 24.37 | 19.71 | 15.01 | 11.28 | 16.33 |
3DResNet | LCNN | 47.93 | 45.76 | 36.16 | 24.35 | 33.25 | 24.65 | 18.85 | 14.79 | 10.64 | 16.15 |
2DResNet+3DResNet | LCNN | 49.06 | 45.51 | 36.52 | 24.55 | 33.83 | 23.36 | 18.25 | 14.68 | 10.55 | 15.89 |
LCNN | 2DResNet | 48.60 | 45.63 | 36.90 | 25.19 | 34.01 | 28.25 | 20.94 | 13.73 | 10.13 | 15.97 |
2DResNet | 2DResNet | 48.63 | 45.37 | 37.24 | 25.60 | 34.17 | 25.17 | 19.31 | 14.06 | 9.90 | 16.04 |
3DResNet | 2DResNet | 48.01 | 47.22 | 37.39 | 23.83 | 33.86 | 23.74 | 19.05 | 14.72 | 10.39 | 16.37 |
2DResNet+3DResNet | 2DResNet | 49.27 | 46.05 | 36.58 | 24.16 | 33.42 | 24.10 | 18.77 | 15.10 | 10.87 | 16.78 |

Tab. 4 Results of benchmark detection with forgery ratio of 60% (EER, %; "Weak" and "Strong" denote weakly and strongly supervised conditions)
Visual feature extractor | Audio feature extractor | Weak, visual, segment | Weak, visual, utterance | Weak, audio, segment | Weak, audio, utterance | Weak, video | Strong, visual, segment | Strong, visual, utterance | Strong, audio, segment | Strong, audio, utterance | Strong, video |
---|---|---|---|---|---|---|---|---|---|---|---|
LCNN | LCNN | 47.89 | 45.05 | 35.62 | 16.55 | 15.75 | 24.13 | 16.32 | 13.60 | 7.56 | 7.11 |
2DResNet | LCNN | 48.75 | 44.70 | 35.31 | 14.67 | 15.92 | 21.34 | 15.33 | 13.89 | 7.68 | 7.30 |
3DResNet | LCNN | 48.18 | 44.97 | 34.28 | 14.01 | 15.03 | 22.01 | 15.94 | 14.32 | 8.01 | 8.23 |
2DResNet+3DResNet | LCNN | 48.06 | 45.11 | 34.52 | 14.26 | 15.20 | 20.05 | 14.84 | 14.05 | 7.45 | 7.06 |
LCNN | 2DResNet | 49.36 | 45.61 | 35.29 | 16.06 | 16.39 | 23.30 | 17.01 | 13.50 | 7.50 | 7.93 |
2DResNet | 2DResNet | 48.44 | 44.57 | 35.61 | 14.81 | 15.82 | 21.71 | 15.02 | 13.79 | 7.73 | 7.69 |
3DResNet | 2DResNet | 49.20 | 45.32 | 35.03 | 15.12 | 17.03 | 21.65 | 14.96 | 14.13 | 8.36 | 8.28 |
2DResNet+3DResNet | 2DResNet | 48.72 | 43.65 | 34.77 | 14.90 | 16.61 | 20.48 | 15.10 | 13.92 | 8.20 | 7.94 |

Tab. 5 Results of benchmark detection with forgery ratio of 90% (EER, %; "Weak" and "Strong" denote weakly and strongly supervised conditions)
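The weakly supervised columns in the benchmark tables rely on multiple-instance learning, where only a coarse label is available and finer-grained scores must be aggregated upward. The source names HMIL but does not describe its internals here, so the sketch below is not the paper's method. It only shows generic max-pooling aggregation, one common multiple-instance pooling choice, applied first within each modality and then across modalities. The function name `aggregate_video_scores` and the dictionary keys are hypothetical.

```python
def aggregate_video_scores(visual_segment_scores, audio_segment_scores):
    """Max-pool segment scores into modality-level and video-level scores.

    Each input is a non-empty list of per-segment forgery scores, higher
    meaning more likely forged (assumption). A modality is scored by its
    most suspicious segment and the video by its most suspicious modality.
    """
    visual = max(visual_segment_scores)
    audio = max(audio_segment_scores)
    return {"visual": visual, "audio": audio, "video": max(visual, audio)}


# Example: one suspicious visual segment drives the video-level score.
example = aggregate_video_scores([0.1, 0.8, 0.2], [0.05, 0.1])
```

With max pooling, a single forged segment is enough to mark the whole clip, which is one intuition for why clips with a small forged portion give a weakly supervised model less training signal.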
1 | CHESNEY B, CITRON D. Deep fakes: a looming challenge for privacy, democracy, and national security[J]. California Law Review, 2019, 107: 1753-1819. |
2 | ZHANG L, WANG X, COOPER E, et al. An initial investigation for detecting partially spoofed audio[C]// Proceedings of the INTERSPEECH 2021. [S.l.]: International Speech Communication Association, 2021: 4264-4268. |
3 | CAI Z, STEFANOV K, DHALL A, et al. Do you really mean that? Content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization[C]// Proceedings of the 2022 International Conference on Digital Image Computing: Techniques and Applications. Piscataway: IEEE, 2022: 1-10. |
4 | KHALID H, TARIQ S, KIM M, et al. FakeAVCeleb: a novel audio-video multimodal deepfake dataset[EB/OL]. [2023-10-12]. |
5 | KORSHUNOV P, MARCEL S. DeepFakes: a new threat to face recognition? Assessment and detection[EB/OL]. (2018-12-20) [2023-10-12]. |
6 | YANG X, LI Y, LYU S. Exposing deep fakes using inconsistent head poses[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2019: 8261-8265. |
7 | ZHOU P, HAN X, MORARIU V I, et al. Two-stream neural networks for tampered face detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2017: 1831-1839. |
8 | DUFOUR N, GULLY A. Contributing data to deepfake detection research[EB/OL]. (2019-09-24) [2023-11-02]. |
9 | LI Y, YANG X, SUN P, et al. Celeb-DF: a large-scale challenging dataset for DeepFake forensics[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3204-3213. |
10 | RÖSSLER A, COZZOLINO D, VERDOLIVA L, et al. FaceForensics++: learning to detect manipulated facial images[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1-11. |
11 | DOLHANSKY B, BITTON J, PFLAUM B, et al. The DeepFake Detection Challenge (DFDC) dataset[EB/OL]. (2020-10-28) [2023-10-12]. |
12 | DANG H, LIU F, STEHOUWER J, et al. On the detection of digital face manipulation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5780-5789. |
13 | HE Y, GAN B, CHEN S, et al. ForgeryNet: a versatile benchmark for comprehensive forgery analysis[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 4358-4367. |
14 | LE T N, NGUYEN H H, YAMAGISHI J, et al. OpenForensics: large-scale challenging dataset for multi-face forgery detection and segmentation in-the-wild[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10097-10107. |
15 | PRAJWAL K R, MUKHOPADHYAY R, NAMBOODIRI V P, et al. A lip sync expert is all you need for speech to lip generation in the wild[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 484-492. |
16 | LI Y, CHANG M C, LYU S. In Ictu Oculi: exposing AI created fake videos by detecting eye blinking[C]// Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security. Piscataway: IEEE, 2018: 1-7. |
17 | YANG X, LI Y, QI H, et al. Exposing GAN-synthesized faces using landmark locations[C]// Proceedings of the 2019 ACM Workshop on Information Hiding and Multimedia Security. New York: ACM, 2019: 113-118. |
18 | MATERN F, RIESS C, STAMMINGER M. Exploiting visual artifacts to expose deepfakes and face manipulations[C]// Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops. Piscataway: IEEE, 2019: 83-92. |
19 | CIFTCI U A, DEMIR I, YIN L. FakeCatcher: detection of synthetic portrait videos using biological signals[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020(Early Access): 1-1. |
20 | GUARNERA L, GIUDICE O, BATTIATO S. DeepFake detection by analyzing convolutional traces[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2020: 2841-2850. |
21 | NATARAJ L, MOHAMMED T M, CHANDRASEKARAN S, et al. Detecting GAN generated fake images using co-occurrence matrices[EB/OL]. (2019-10-03) [2023-12-15]. |
22 | LI L, BAO J, ZHANG T, et al. Face X-ray for more general face forgery detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5000-5009. |
23 | DAI Y S, FEI J W, XIA Z H, et al. Local similarity anomaly for general face forgery detection[J]. Journal of Image and Graphics, 2023, 28(11): 3453-3470. |
24 | ZHANG B L, ZHU C T, YIN Q L, et al. Noise-attention-based forgery face detection method[J]. Chinese Journal of Network and Information Security, 2023, 9(4): 155-165. |
25 | DZANIC T, SHAH K, WITHERDEN F D. Fourier spectrum discrepancies in deep network generated images[C]// Proceedings of the 34th Conference on Neural Information Processing Systems. New York: ACM, 2020: 3022-3032. |
26 | FRANK J, EISENHOFER T, SCHÖNHERR L, et al. Leveraging frequency analysis for deep fake image recognition[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 3247-3258. |
27 | WU W X, ZHOU W B, ZHANG W M, et al. Deepfake detection method based on patch-wise lighting inconsistency[J]. Chinese Journal of Network and Information Security, 2023, 9(1): 167-177. |
28 | SHAHZAD S A, HASHMI A, KHAN S, et al. Lip sync matters: a novel multimodal forgery detector[C]// Proceedings of the 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Piscataway: IEEE, 2022: 1885-1892. |
29 | YANG W, ZHOU X, CHEN Z, et al. AVoiD-DF: audio-visual joint learning for detecting deepfake[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 2015-2029. |
30 | ZHENG Y, BAO J, CHEN D, et al. Exploring temporal coherence for more general video face forgery detection[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 15024-15034. |
31 | SHIOHARA K, YAMASAKI T. Detecting deepfakes with self-blended images[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 18699-18708. |
32 | KORSHUNOVA I, SHI W, DAMBRE J, et al. Fast face-swap using convolutional neural networks[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 3697-3705. |
33 | NIRKIN Y, KELLER Y, HASSNER T. FSGAN: subject agnostic face swapping and reenactment[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7183-7192. |
34 | JIA Y, ZHANG Y, WEISS R J, et al. Transfer learning from speaker verification to multispeaker text-to-speech synthesis[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018:4485-4495. |
35 | ZHANG L, WANG X, COOPER E, et al. The PartialSpoof database and countermeasures for the detection of short fake speech segments embedded in an utterance[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 813-825. |
36 | ZI B, CHANG M, CHEN J, et al. WildDeepfake: a challenging real-world dataset for deepfake detection[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 2382-2390. |
37 | PU J, MANGAOKAR N, KELLY L, et al. Deepfake videos in the wild: analysis and detection[C]// Proceedings of the Web Conference 2021. New York: ACM, 2021: 981-992. |
38 | AMORES J. Multiple instance classification: review, taxonomy and comparative study[J]. Artificial Intelligence, 2013, 201: 81-105. |
39 | WANG Y, LI J, METZE F. A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2019: 31-35. |
40 | TIAN Y, LI D, XU C. Unified multisensory perception: weakly-supervised audio-visual video parsing[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 436-454. |
41 | KATAOKA H, WAKAMIYA T, HARA K, et al. Would mega-scale datasets further enhance spatiotemporal 3D CNNs?[EB/OL]. (2020-04-10) [2023-10-15]. |
42 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |