[1] CHERRY E C. Some experiments on the recognition of speech, with one and with two ears[J]. The Journal of the Acoustical Society of America,1953,25(5):975-979. [2] CHEERY E C. On Human Communication[M]. Cambridge:MIT Press,1957:15-18. [3] HUANG P S,KIM M,HASEGAWA-JOHNSON M,et al. Joint op-timization of masks and deep recurrent neural networks for monaural source separation[J]. IEEE/ACM Transactions on Audio,Speech, and Language Processing,2015,23(12):2136-2147. [4] ZHANG X,WANG D. A deep ensemble learning method for mon-aural speech separation[J]. IEEE/ACM Transactions on Audio, Speech,and Language Processing,2016,24(5):967-977. [5] LUO Y,CHEN Z,MESGARANI N. Speaker-independent speech separation with deep attractor network[J]. IEEE/ACM Transactions on Audio,Speech,and Language Processing,2018,26(4):787-796. [6] ZHAN G,HUANG Z,YING D,et al. Improvement of mask-based speech source separation using DNN[C]//Proceedings of the 10th International Symposium on Chinese Spoken Language Processing. Piscataway:IEEE,2016:1-5. [7] LI X,WU X,CHEN J. A spectral-change-aware loss function for DNN-based speech separation[C]//Proceedings of the 2019 IEEE International Conference on Acoustics,Speech and Signal Process-ing. Piscataway:IEEE,2019:6870-6874. [8] SUN Y,XIAN Y,WANG W,et al. Monaural source separation in complex domain with long short-term memory neural network[J]. IEEE Journal of Selected Topics in Signal Processing,2019,13(2):359-369. [9] PALIWAL K,WÓJCICKI K,SHANNON B. The importance of phase in speech enhancement[J]. Speech Communication,2011, 53(4):465-494. [10] PASCUAL S,BONAFONTE A,SERRÀ J. SEGAN:speech en-hancement generative adversarial network[C]//Proceedings of the 2017 IEEE International Conference on Acoustics,Speech and Sig-nal Processing. Piscataway:IEEE,2017:3642-3646. [11] TAN K,WANG D. A convolutional recurrent neural network for real-time speech enhancement[C]//Proceedings of the 2018 IEEE International Conference on Acoustics,Speech and Signal Process-ing. Piscataway:IEEE,2018:3229-3233. [12] CHO K,MERRIËNBOER B V,GULCEHRE C,et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA:Association for Computational Linguistics,2014:1724-1734. [13] 范存航, 刘斌, 陶建华, 等. 一种基于卷积神经网络的端到端语音分离方法[J]. 信号处理,2019,35(4):542-548.(FAN C H, LIU B,TAO J H,et al. An end-to-end speech separation method based on convolutional neural network[J]. Journal of Signal Pro-cessing,2019,35(4):542-548.) [14] LUO Y,MESGARANI N. TasNet:time-domain audio separation network for real-time single-channel speech separation[C]//Pro-ceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway:IEEE,2018:696-700. [15] 李娟娟, 王丹, 李子晋. 基于深层声学特征的端到端语音分离[J]. 计算机系统应用,2019,28(10):1-7. (LI J J,WANG D, LI Z J. End-to-end speech separation based on deep acoustic fea-ture[J]. Computer Systems and Applications,2019,28(10):1-7. [16] GAROFOLO J S,LAMEL L F,FISHER W M,et al. DARPA TI-MIT acoustic-phonetic continuous speech corpus CD-ROM:NIST speech disc 1-1.1[R]. Gaithersburg,MD:National Institute of Standards and Technology,1993. [17] RIX A W,BEERENDS J G,HOLLIER M P,et al. Perceptual Evaluation of Speech Quality(PESQ)-a new method for speech quality assessment of telephone networks and codecs[C]//Proceed-ings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway:IEEE, 2001:749-752. [18] TAAL C H,HENDRIKS R C,HEUSDENS R,et al. A short-time objective intelligibility measure for time-frequency weighted noisy speech[C]//Proceedings of the 2010 IEEE International Confer-ence on Acoustics,Speech,and Signal Processing. Piscataway:IEEE,2010:4214-4217. [19] VINCENT E,GRIBONVAL R,FEVOTTE C. Performance mea-surement in blind audio source separation[J]. IEEE Transactions on Audio,Speech and Language Processing,2006,14(4):1462-1469. [20] KOLBÆK M,YU D,TAN Z,et al. Multitalker speech separation with utterance-level permutation invariant training of deep recur-rent neural networks[J]. IEEE/ACM Transactions on Audio, Speech,and Language Processing,2017,25(10):1901-1913. [21] HERSHEY J R,CHEN Z,LE ROUX J,et al. Deep clustering:discriminative embeddings for segmentation and separation[C]//Proceedings of the 2016 IEEE International Conference on Acous-tics,Speech,and Signal Processing. Piscataway:IEEE,2016:31-35. [22] CHEN Z,LUO Y,MESGARANI N. Deep attractor network for single-microphone speaker separation[C]//Proceedings of the 2017 IEEE International Conference on Acoustics,Speech,and Signal Processing. Piscataway:IEEE,2017:246-250. |