1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
2. Southeast Digital Economic Development Institute, Quzhou, Zhejiang 324000, China
About author: DONG Mingyu, born in 1997, M.S. candidate. His research interests include machine learning, multimedia forensics, and adversarial examples.
Supported by: National Natural Science Foundation of China (U1736215); Zhejiang Provincial Natural Science Foundation (LY20F020010); Ningbo Natural Science Foundation (202003N4089)
Mingyu DONG, Diqun YAN. Detection algorithm of audio scene sound replacement falsification based on ResNet[J]. Journal of Computer Applications, 2022, 42(6): 1724-1728.