[1] PERROT P, AVERSANO G, CHOLLET G. Voice disguise and automatic detection: review and perspectives[M]//STYLIANOU Y, FAUNDEZ-ZANUY M, ESPOSITO A. Progress in Nonlinear Speech Processing, LNCS 4391. Berlin:Springer, 2007:101-117.
[2] ZHANG C, TAN T. Voice disguise and automatic speaker recognition[J]. Forensic Science International, 2008, 175(2/3):118-122.
[3] MUCKENHIRN H, KORSHUNOV P, MAGIMAI-DOSS M, et al. Long-term spectral statistics for voice presentation attack detection[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(11):2098-2111.
[4] WANG L, NAKAGAWA S, ZHANG Z, et al. Spoofing speech detection using modified relative phase information[J]. IEEE Journal of Selected Topics in Signal Processing, 2017, 11(4):660-670.
[5] WU H, WANG Y, HUANG J. Identification of electronic disguised voices[J]. IEEE Transactions on Information Forensics and Security, 2014, 9(3):489-500.
[6] LI Y P, LIN L, TAO D Y. Research on identification of electronic disguised voice based on GMM statistical parameters[J]. Computer Technology and Development, 2017, 27(1):103-106. (in Chinese)
[7] LIANG H, LIN X, ZHANG Q, et al. Recognition of spoofed voice using convolutional neural networks[C]//Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing. Piscataway:IEEE, 2017:293-297.
[8] WANG L, LIANG H, LIN X, et al. Revealing the processing history of pitch-shifted voice using CNNs[C]//Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security. Piscataway:IEEE, 2018:1-7.
[9] WONG P H W, AU O C. Fast SOLA-based time scale modification using envelope matching[C]//Proceedings of the 2002 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway:IEEE, 2002:III-3188-III-3191.
[10] DU S F, MAO Q R, ZHAN Y Z. Adaptive synchronous overlap and add algorithm for time scale modification of speech[J]. Journal on Communications, 2005, 26(2):136-140. (in Chinese)
[11] VALBRET H, MOULINES E, TUBACH J. Voice transformation using PSOLA technique[C]//Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway:IEEE, 1992:145-148.
[12] VERHELST W, ROELANDS M. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech[C]//Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway:IEEE, 1993:554-557.
[13] MOULINES E, CHARPENTIER F. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones[J]. Speech Communication, 1990, 9(5/6):453-467.
[14] LAROCHE J, DOLSON M. Improved phase vocoder time-scale modification of audio[J]. IEEE Transactions on Speech and Audio Processing, 1999, 7(3):323-332.
[15] Sourceforge. Audacity: a free multi-track audio editor and recorder[EB/OL]. [2019-02-20]. http://audacity.sourceforge.net.
[16] Adobe. Adobe audition[EB/OL]. [2019-02-20]. http://www.adobe.com/products/audition.html.
[17] BOERSMA P, WEENINK D. Praat: doing phonetics by computer[EB/OL]. [2019-02-20]. http://www.fon.hum.uva.nl/praat.
[18] ZHU X, BEAUREGARD G T, WYSE L L. Real-time signal estimation from modified short-time Fourier transform magnitude spectra[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(5):1645-1653.
[19] CHAKROBORTY S, ROY A, MAJUMDAR S, et al. Capturing complementary information via reversed filter bank and parallel implementation with MFCC for improved text-independent speaker identification[C]//Proceedings of the 2007 International Conference on Computing: Theory and Applications. Piscataway:IEEE, 2007:463-467.
[20] SOHN J, KIM N S, SUNG W. A statistical model-based voice activity detection[J]. IEEE Signal Processing Letters, 1999, 6(1):1-3.
[21] GAROFOLO J S, LAMEL L F, FISHER W M. TIMIT acoustic-phonetic continuous speech corpus[EB/OL]. [2019-02-20]. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1.
[22] NIST Multimodal Information Group. NIST speaker recognition evaluation database[EB/OL]. [2019-02-20]. http://catalog.ldc.upenn.edu/LDC2010S03.
[23] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9:2579-2605.