Musical instrument identification based on multiscale time-frequency modulation and multilinear principal component analysis

doi:10.11772/j.issn.1001-9081.2017092175

Journal of Computer Applications, 2018, Vol. 38, Issue (3): 891-894.

• Virtual Reality and Multimedia Computing •

WANG Fei, YU Fengqing

School of Internet of Things Engineering, Jiangnan University, Wuxi Jiangsu 214100, China

Received: 2017-09-11 Revised: 2017-10-20 Published: 2018-03-10 Online: 2018-03-07
Corresponding author: WANG Fei
About the authors: WANG Fei (born 1991), male, from Wuxi, Jiangsu, is a master's student whose research interests include speech signal processing and deep learning. YU Fengqing (born 1962), female, from Beizhen, Liaoning, is a professor with a Ph.D. whose research interests include speech signal processing and time-frequency analysis of non-stationary signals.

Abstract

Time-domain and frequency-domain features, cepstral features, sparse features and probabilistic features currently yield high misclassification rates among instruments of the same family (kindred instruments) and perform poorly on percussion instruments. To address this, a model that extracts time-frequency information with low redundancy was proposed for musical instrument identification. Firstly, a cochlear model was used to decompose the musical signal into harmonics, producing an Auditory Spectrum (AS) that approximates human auditory perception and contains time-frequency information. Secondly, multiscale filters were applied to the AS to perform Multiscale Time-Frequency Modulation (MTFM) and capture variations in time and frequency. Then, Multilinear Principal Component Analysis (MPCA) was used to reduce the dimensionality of the modulation output while preserving the structure and intrinsic correlation of the data. Finally, classification was performed with a Support Vector Machine (SVM). Simulation results show that the proposed method achieves an average accuracy of 92.74% on the IOWA database, with misclassification rates of 3% for percussion instruments and 9.12% for kindred instruments, both better than those of the features mentioned above. Compared with dimensionality reduction by Principal Component Analysis (PCA), MPCA improves recognition accuracy by 6.43%. The proposed model is therefore suitable for identifying kindred and percussion instruments.
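The multiscale modulation step can be illustrated with a minimal NumPy sketch. It is not the authors' filter bank: the Gabor-like kernel, the function names (`gabor_kernel`, `filter2d_same`, `multiscale_modulation`), the kernel size and the scale and rate values are all assumptions chosen for illustration, and random values stand in for an auditory spectrum. The sketch only shows how filtering one time-frequency image at several spectral scales and temporal rates yields a higher-order tensor.

```python
import numpy as np


def gabor_kernel(scale, rate, size=9):
    """Zero-mean 2-D Gabor-like kernel (hypothetical stand-in filter).

    scale: spectral modulation in cycles per frequency channel.
    rate:  temporal modulation in cycles per time frame.
    """
    half = size // 2
    offsets = np.arange(-half, half + 1)
    f, t = np.meshgrid(offsets, offsets, indexing="ij")
    envelope = np.exp(-(f**2 + t**2) / (2.0 * (size / 4.0) ** 2))
    carrier = np.cos(2.0 * np.pi * (scale * f + rate * t))
    kernel = envelope * carrier
    return kernel - kernel.mean()  # remove the DC response


def filter2d_same(image, kernel):
    """Circular 2-D convolution via the FFT; output has the shape of `image`."""
    padded = np.zeros(image.shape, dtype=float)
    rows, cols = kernel.shape
    padded[:rows, :cols] = kernel
    # Shift the kernel centre to index (0, 0) so the output is not displaced.
    padded = np.roll(padded, (-(rows // 2), -(cols // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def multiscale_modulation(spectrogram, scales, rates):
    """Return a tensor of shape (len(scales), len(rates), n_freq, n_time)."""
    out = np.empty((len(scales), len(rates)) + spectrogram.shape)
    for i, scale in enumerate(scales):
        for j, rate in enumerate(rates):
            response = filter2d_same(spectrogram, gabor_kernel(scale, rate))
            out[i, j] = np.abs(response)
    return out


# Toy input: random values standing in for an auditory spectrum.
rng = np.random.default_rng(0)
spectrogram = rng.random((32, 40))
tensor = multiscale_modulation(spectrogram, scales=(0.05, 0.1, 0.2),
                               rates=(0.05, 0.1, 0.2, 0.4))
print(tensor.shape)  # (3, 4, 32, 40)
```

Because each kernel has zero mean, a flat spectrum produces no response, so the output reflects only variation along time and frequency.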

Key words: Multiscale Time-Frequency Modulation (MTFM), Multilinear Principal Component Analysis (MPCA), Auditory Spectrum (AS), Support Vector Machine (SVM), musical instrument identification
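The difference between MPCA and PCA can be sketched in a few lines of NumPy. PCA flattens each sample into one long vector, whereas MPCA keeps the tensor shape and learns one projection matrix per mode. The code below is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the fixed iteration count and the random toy data are hypothetical, and the target sizes in `ranks` are arbitrary. In the pipeline described above, the resulting feature vectors would then be passed to a classifier such as an SVM.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode unfolding: move axis `mode` first and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)


def top_eigvecs(unfolded, rank):
    """Leading eigenvectors of the scatter matrix of an unfolded tensor."""
    _, eigvecs = np.linalg.eigh(unfolded @ unfolded.T)
    return eigvecs[:, ::-1][:, :rank]  # eigh sorts eigenvalues ascending


def fit_mpca(samples, ranks, n_iter=3):
    """samples: (n_samples, I1, ..., IN); ranks: target size for each mode.

    Returns the sample mean and one projection matrix (I_n x P_n) per mode.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    n_modes = centered.ndim - 1  # axis 0 indexes the samples
    # Initialise every mode from the unprojected data.
    factors = [top_eigvecs(unfold(centered, m + 1), ranks[m])
               for m in range(n_modes)]
    for _ in range(n_iter):
        for m in range(n_modes):
            # Project all other modes, then re-estimate mode m.
            partial = centered
            for k in range(n_modes):
                if k != m:
                    partial = mode_product(partial, factors[k].T, k + 1)
            factors[m] = top_eigvecs(unfold(partial, m + 1), ranks[m])
    return mean, factors


def transform_mpca(samples, mean, factors):
    """Project samples onto the learned subspaces and flatten to vectors."""
    projected = samples - mean
    for m, factor in enumerate(factors):
        projected = mode_product(projected, factor.T, m + 1)
    return projected.reshape(len(samples), -1)


# Toy data: 20 random third-order tensors standing in for modulation outputs.
rng = np.random.default_rng(0)
demo = rng.normal(size=(20, 6, 5, 4))
mean, factors = fit_mpca(demo, ranks=(3, 2, 2))
features = transform_mpca(demo, mean, factors)
print(features.shape)  # (20, 12)
```

Each sample shrinks from 6 × 5 × 4 = 120 values to 3 × 2 × 2 = 12, while the per-mode projections keep the arrangement of the axes instead of discarding it as vectorised PCA would.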

CLC number: TP391.4

WANG Fei, YU Fengqing. Musical instrument identification based on multiscale time-frequency modulation and multilinear principal component analysis[J]. Journal of Computer Applications, 2018, 38(3): 891-894.

