Improved pitch contour creation and selection algorithm for melody extraction

doi:10.11772/j.issn.1001-9081.2018020311

Abstract

Abstract: In polyphonic music, interference among different sound sources breaks the continuity of the pitch sequence of a single source and lowers pitch estimation accuracy. To address this problem, an improved pitch contour creation and selection algorithm for melody extraction is proposed. First, the pitch salience of each point in the time-frequency spectrum is calculated, and pitch contours are created based on auditory streaming cues and the continuity of pitch salience. Next, to select the melodic pitch contours, non-melodic pitch contours are removed according to the repetitive characteristics of the accompaniment, with the Dynamic Time Warping (DTW) algorithm used to calculate the similarity between melodic and non-melodic pitch contours. Finally, octave errors in the melodic pitch contours are detected from the long-term relationship between adjacent pitch contours, and the melodic pitch contours are smoothed to form the melody pitch line. Simulation experiments on the ORCHSET dataset show that, compared with the original algorithm, the proposed algorithm improves pitch estimation accuracy by 2.86% and overall accuracy by 3.32%, indicating that it effectively addresses the pitch estimation problem.
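The contour-similarity step can be illustrated with a minimal sketch of classic dynamic time warping over two pitch contours. This is not the authors' implementation: the function names `dtw_distance` and `is_repetition`, the absolute-difference local cost, the length normalisation, and the `threshold` value are all assumptions made for illustration, and pitch values are assumed to be given in cents.

```python
import math


def dtw_distance(a, b):
    """Length-normalised DTW cost between two pitch contours (lists of cents).

    Assumption: local cost is the absolute pitch difference; the abstract
    does not state which cost or normalisation the authors use.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("contours must be non-empty")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m] / (n + m)


def is_repetition(candidate, reference, threshold=20.0):
    """Hypothetical decision rule: treat contours as repeats if DTW cost is small."""
    return dtw_distance(candidate, reference) <= threshold
```

Under these assumptions, a contour that repeats an accompaniment pattern at a slightly different tempo still yields a low cost, because the warping path absorbs the timing difference.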

Key words: melody extraction, pitch contour, continuity of pitch salience, Dynamic Time Warping (DTW), octave error

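Chinese abstract (translated): Interference among different sound sources in polyphonic music makes the pitch sequence of a single source discontinuous and reduces pitch estimation accuracy. To address this problem, a melody extraction algorithm with improved pitch contour creation and selection is proposed. The algorithm first calculates the pitch salience of each point in the time-frequency spectrum and creates pitch contours based on auditory streaming cues and the continuity of pitch salience. To further select the melodic pitch contours, non-melodic pitch contours are then removed according to the repetitive characteristics of the accompaniment, mainly by using the dynamic time warping algorithm to calculate the similarity between melodic and non-melodic pitch contours. Finally, the long-term relationship between adjacent pitch contours is used to detect octave errors in the melodic pitch contours, and the melodic pitch contours are smoothed to form the melody pitch line. Simulation experiments on the ORCHSET dataset show that the improved algorithm raises pitch estimation accuracy by 2.86% and overall accuracy by 3.32% over the original algorithm, and can effectively solve the pitch estimation problem.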
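The octave-error step can be sketched as follows. The abstract does not specify the detection rule, so this is a hypothetical heuristic rather than the authors' method: a contour is shifted by one octave (1200 cents) up or down when doing so brings its mean pitch closer to the mean pitch of its adjacent contours. The names `mean_cents` and `correct_octave` are illustrative.

```python
def mean_cents(contour):
    """Mean pitch of a non-empty contour given in cents."""
    return sum(contour) / len(contour)


def correct_octave(contour, neighbours, octave=1200.0):
    """Return the contour, shifted by one octave if that fits its neighbours better.

    Assumption: 'fit' means a smaller distance between the contour's mean pitch
    and the average of the neighbouring contours' mean pitches.
    """
    neighbours = [c for c in neighbours if c]  # ignore empty neighbours
    if not contour or not neighbours:
        return list(contour)
    ref = sum(mean_cents(c) for c in neighbours) / len(neighbours)
    own = mean_cents(contour)
    # Candidate shifts: none first, so ties leave the contour unchanged.
    best_shift = min((0.0, octave, -octave), key=lambda s: abs(own + s - ref))
    return [p + best_shift for p in contour]
```

A real system would likely restrict the neighbours to contours within a time window and weight them by salience; both choices are left out of this sketch.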

Key words (translated from the Chinese): melody extraction, pitch contour, continuity of pitch salience, dynamic time warping, octave error

CLC Number: TP391

LI Qiang, YU Fengqin. Improved pitch contour creation and selection algorithm for melody extraction[J]. Journal of Computer Applications, 2018, 38(8): 2411-2415.

李强, 于凤芹. 改进音高轮廓创建和选择的旋律提取算法[J]. 计算机应用, 2018, 38(8): 2411-2415.
