Indoor speech separation and sound source localization system based on dual-microphone

doi:10.11772/j.issn.1001-9081.2018040874

Journal of Computer Applications, 2018, Vol. 38, Issue (12): 3643-3648. DOI: 10.11772/j.issn.1001-9081.2018040874

• Frontier, interdisciplinary and comprehensive applications •

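About the authors: CHEN Binjie, born in 1993 in Jingzhou, Hubei, M.S. candidate. His research interests include speech signal processing and indoor sound source localization. LU Zhihua, born in 1983 in Jinhua, Zhejiang, Ph.D., lecturer. His research interests include speech signal processing and real-time tracking of multiple moving targets. ZHOU Yu, born in 1960 in Weihai, Shandong, M.S., professor. His research interests include signal processing, and network and information security. YE Qingwei, born in 1970 in Quzhou, Zhejiang, Ph.D., professor. His research interests include signal processing and optimization search.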

Indoor speech separation and sound source localization system based on dual-microphone

CHEN Binjie, LU Zhihua, ZHOU Yu, YE Qingwei

Faculty of Information Science and Engineering, Ningbo University, Ningbo Zhejiang 315211, China

Received:2018-04-27 Revised:2018-07-03 Online:2018-12-10 Published:2018-12-15
Contact: ZHOU Yu
Supported by:
This work is partially supported by the National Natural Science Foundation of China (51675286, 61071198) and the Science and Technology Innovation Team of Zhejiang Province (2013TD21).

Abstract

Abstract: To explore whether two microphones suffice to separate multiple sound sources and locate them in a two-dimensional plane, an indoor speech separation and sound source localization system based on a dual-microphone setup was proposed. From the signals collected by the microphones, a dual-microphone delay-attenuation model was established. The Degenerate Unmixing Estimation Technique (DUET) algorithm was then used to estimate the delay-attenuation parameters of the model, and a histogram of the parameters was drawn. In the speech separation stage, a Binary Time-Frequency Mask (BTFM) was constructed from the parameter histogram and applied to the mixed speech by binary masking. In the sound source localization stage, the relationship between the model attenuation parameters and the signal energy ratio was derived, yielding a system of equations that determines the source positions. The Roomsimove toolbox was used to simulate the indoor acoustic environment. Through Matlab simulation and geometric coordinate calculation, multiple sound sources were separated and simultaneously located in the two-dimensional plane. The experimental results show that the localization errors of the proposed system for multiple sound sources are all below 2%, which supports the research and development of small-scale systems.
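To make the DUET stage summarized in the abstract more concrete, here is a minimal Python sketch using NumPy only. It is not the authors' Matlab implementation. It assumes an anechoic delay-attenuation model (each source reaches microphone 2 scaled by a and delayed by δ relative to microphone 1) and sources that do not overlap in the time-frequency plane. The function names (`stft`, `duet_features`, `histogram_peaks`, `binary_masks`), the STFT and histogram settings, the Euclidean nearest-peak rule, and the synthetic two-burst test signal are all illustrative assumptions; the paper itself works with speech in rooms simulated by Roomsimove.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    return np.fft.rfft(frames, axis=1)

def duet_features(x1, x2, n_fft=256, hop=128):
    """Per time-frequency point: symmetric attenuation, delay (samples), weight."""
    # Drop the DC bin (delay undefined) and the Nyquist bin (phase ambiguous).
    X1 = stft(x1, n_fft, hop)[:, 1:-1]
    X2 = stft(x2, n_fft, hop)[:, 1:-1]
    omega = 2 * np.pi * np.arange(1, n_fft // 2) / n_fft  # rad/sample
    ratio = X2 / (X1 + 1e-12)
    a = np.abs(ratio) + 1e-12
    alpha = a - 1.0 / a                # symmetric attenuation
    # The delay is unambiguous only while |omega * delay| stays below pi.
    delta = -np.angle(ratio) / omega   # positive when microphone 2 lags
    weight = np.abs(X1 * X2)           # emphasise high-energy points
    return alpha, delta, weight

def histogram_peaks(hist, a_edges, d_edges, n_sources, exclude=3):
    """Pick the n_sources largest cells, blanking a neighbourhood after each pick."""
    hist = hist.copy()
    a_mid = 0.5 * (a_edges[:-1] + a_edges[1:])
    d_mid = 0.5 * (d_edges[:-1] + d_edges[1:])
    peaks = []
    for _ in range(n_sources):
        i, j = np.unravel_index(np.argmax(hist), hist.shape)
        peaks.append((a_mid[i], d_mid[j]))
        hist[max(i - exclude, 0):i + exclude + 1,
             max(j - exclude, 0):j + exclude + 1] = 0.0
    return peaks

def binary_masks(alpha, delta, peaks):
    """Assign each time-frequency point to its nearest peak (simplified rule)."""
    dist = np.stack([(alpha - ak) ** 2 + (delta - dk) ** 2 for ak, dk in peaks])
    labels = np.argmin(dist, axis=0)
    return [labels == k for k in range(len(peaks))]

# Synthetic check: two noise bursts that never overlap in time.
rng = np.random.default_rng(0)
n = 8192
s1 = np.concatenate([rng.standard_normal(n), np.zeros(n)])
s2 = np.concatenate([np.zeros(n), rng.standard_normal(n)])
x1 = s1 + s2
# Source 1: a = 0.7, delay +1 sample; source 2: a = 1.3, delay -1 sample.
x2 = 0.7 * np.roll(s1, 1) + 1.3 * np.roll(s2, -1)

alpha, delta, weight = duet_features(x1, x2)
hist, a_edges, d_edges = np.histogram2d(
    alpha.ravel(), delta.ravel(), bins=[31, 41],
    range=[[-1.55, 1.55], [-2.05, 2.05]], weights=weight.ravel())
peaks = histogram_peaks(hist, a_edges, d_edges, n_sources=2)
masks = binary_masks(alpha, delta, peaks)
print(peaks)
```

Multiplying the STFT of either mixture by one of the masks and inverting the transform would give the corresponding separated signal. That resynthesis step is omitted here to keep the sketch short.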

Key words: dual-microphone, speech separation, sound source localization, Degenerate Unmixing Estimation Technique (DUET) algorithm, two-dimensional plane
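The localization stage described in the abstract can be illustrated in a similarly hedged way. The abstract does not state the paper's equations, so the sketch below substitutes a simple free-field assumption: amplitude decays as 1/r, so the attenuation parameter a equals the distance ratio r1/r2 (and the energy ratio equals its square), while the delay equals (r2 - r1)/c. The microphone placement at (-d/2, 0) and (d/2, 0), the restriction to the half-plane y >= 0, the speed of sound of 343 m/s, and the function names are all assumptions made for illustration.

```python
import math

def attenuation_from_symmetric(alpha):
    """Invert alpha = a - 1/a for the positive root a."""
    return 0.5 * (alpha + math.sqrt(alpha * alpha + 4.0))

def locate_source(a, delay_s, mic_distance, c=343.0):
    """Return (x, y) of a source from attenuation a and delay (seconds).

    Assumed model: a = r1 / r2 and delay_s = (r2 - r1) / c, where r1 and r2
    are the distances from the source to microphones 1 and 2.
    """
    if math.isclose(a, 1.0):
        raise ValueError("a = 1 leaves the range undetermined")
    r2 = c * delay_s / (1.0 - a)
    r1 = a * r2
    if r1 <= 0.0 or r2 <= 0.0:
        raise ValueError("attenuation and delay are inconsistent")
    # r1^2 - r2^2 = (x + d/2)^2 - (x - d/2)^2 = 2 * x * d
    x = (r1 * r1 - r2 * r2) / (2.0 * mic_distance)
    y_sq = r1 * r1 - (x + mic_distance / 2.0) ** 2
    if y_sq < 0.0:
        raise ValueError("no real solution for these parameters")
    return x, math.sqrt(y_sq)

# Round-trip check with a hypothetical source at (1.0, 2.0) m, d = 0.2 m.
d = 0.2
r1_true = math.hypot(1.0 + d / 2.0, 2.0)
r2_true = math.hypot(1.0 - d / 2.0, 2.0)
a_true = r1_true / r2_true
delay_true = (r2_true - r1_true) / 343.0
x_est, y_est = locate_source(a_true, delay_true, d)
print(x_est, y_est)
```

Under this assumed model the range is obtained by dividing by (1 - a), so estimates become sensitive to small errors in a when the source is nearly equidistant from the two microphones.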

CHEN Binjie, LU Zhihua, ZHOU Yu, YE Qingwei. Indoor speech separation and sound source localization system based on dual-microphone[J]. Journal of Computer Applications, 2018, 38(12): 3643-3648.
