Music genre classification algorithm based on attention spectral-spatial feature

doi:10.11772/j.issn.1001-9081.2021050740

Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (7): 2072-2077.

• Artificial Intelligence •

Wanjun LIU, Jiaming WANG, Haicheng QU, Libing DONG, Xinyu CAO

School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China

Received: 2021-05-10 Revised: 2021-11-05 Accepted: 2021-11-24 Online: 2021-12-31 Published: 2022-07-10
Contact: Jiaming WANG
About the authors: LIU Wanjun, born in 1959 in Jinzhou, Liaoning, M. S., professor, CCF senior member. His research interests include digital image processing and moving target detection and tracking.
QU Haicheng, born in 1981 in Yantai, Shandong, Ph. D., associate professor, CCF member. His research interests include rapid remote sensing image processing and intelligent big data processing.
DONG Libing, born in 1996 in Huludao, Liaoning, M. S. Her research interests include deep learning and pedestrian detection.
CAO Xinyu, born in 2002 in Jinzhou, Liaoning. Her research interests include deep learning.
Supported by:
National Natural Science Foundation of China (41701479); General Project of Educational Department of Liaoning Province (LJ2019JL010)

Abstract

To improve how well deep convolutional neural networks extract genre features from music spectrograms, a music genre classification model based on spectral-spatial feature attention, DCNN-SSA (Deep Convolutional Neural Network Spectral Spatial Attention), was proposed. DCNN-SSA marks the genre features of different music Mel spectrograms in the spatial domain and modifies the network structure, which improves feature extraction while keeping the model effective and thereby raises classification accuracy. Firstly, the original audio signals were Mel-filtered, simulating the filtering performed by the human ear in order to filter the sound intensity and rhythm changes of the music, and the resulting Mel spectrograms were cut into segments and fed into the network. Then, genre feature extraction was strengthened by deepening the network, changing the convolution structure, and adding a spatial attention mechanism. Finally, the model was trained and validated over multiple batches on the dataset to extract and learn genre features, yielding a model that can classify music genres effectively. Experimental results on the GTZAN dataset show that, compared with other deep learning models, the proposed algorithm improves model convergence and raises genre classification accuracy by 5.36 to 10.44 percentage points.

Key words: music genre classification, deep convolutional neural network, deep learning, spatial attention mechanism, Mel spectrogram
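
The preprocessing step described in the abstract (Mel filtering followed by cutting the spectrogram into network inputs) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sample rate, FFT size, hop length, number of Mel bands, slice width, and all function names below are assumptions chosen only for illustration, and the Hz-to-Mel mapping is the common 2595·log10(1 + f/700) formula.

```python
import numpy as np

def hz_to_mel(f):
    # Common Hz -> Mel mapping (an assumption; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr=22050, n_fft=2048, hop=512, n_mels=128):
    """Frame the signal, take the power spectrum, apply Mel filters, convert to dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T      # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)

def cut_into_slices(spec, width):
    """Cut a (n_mels, n_frames) spectrogram into non-overlapping slices of `width` frames."""
    n = spec.shape[1] // width
    return [spec[:, k * width:(k + 1) * width] for k in range(n)]
```

With these assumed defaults, a 3-second clip at 22 050 Hz yields a 128-band spectrogram that `cut_into_slices` splits into fixed-width pieces; any leftover frames at the end are dropped.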

CLC number: TP181

Wanjun LIU, Jiaming WANG, Haicheng QU, Libing DONG, Xinyu CAO. Music genre classification algorithm based on attention spectral-spatial feature[J]. Journal of Computer Applications, 2022, 42(7): 2072-2077.

Figures and tables: 12

Fig.1 Mel spectrogram calculation process

Fig.2 Mel spectrogram before and after dimension restoration

Fig.3 Structure of DCNN-SSA network model

Fig.4 Structure of spatial attention module
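
The exact structure of the paper's spatial attention module is given only in the figure, which is not reproduced on this page. As a hedged illustration of what a spatial attention mechanism generally does, the sketch below follows the widely used pattern of pooling over channels, convolving the pooled maps, and gating the features with a sigmoid mask. The 7×7 kernel size, the pooling choice, and the function names are assumptions and may differ from DCNN-SSA.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Zero-padded cross-correlation. x: (C_in, H, W); kernel: (C_in, k, k), k odd."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Weighted sum over input channels for kernel offset (i, j).
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out

def spatial_attention(feat, kernel):
    """feat: (C, H, W) feature map; kernel: (2, k, k) weights (learned in practice).

    Returns the reweighted features and the (H, W) attention mask.
    """
    avg = feat.mean(axis=0, keepdims=True)        # average over channels
    mx = feat.max(axis=0, keepdims=True)          # maximum over channels
    pooled = np.concatenate([avg, mx], axis=0)    # (2, H, W)
    mask = 1.0 / (1.0 + np.exp(-conv2d_same(pooled, kernel)))  # sigmoid gate
    return feat * mask[None, :, :], mask
```

On a Mel spectrogram feature map, the mask assigns each time-frequency position a weight between 0 and 1, so positions judged informative for the genre are emphasized and the rest are attenuated.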

Fig.5 Structure of residual module

Fig.6 Experimental results of 37 000 iterations

Fig.7 Ablation experimental results of feature preprocessing

Tab.1 Genre classification accuracy of ablation experiment of feature preprocessing (%)

Preprocessing method	Genre classification accuracy
Conventional Fourier transform	85.35
Mel spectrogram	87.27

Fig.8 Ablation experimental results of main modules of model

Tab.2 Genre classification accuracies in ablation experiment for main modules of model

Experiment	Quadruple convolution	Spatial attention	Residual module	Accuracy/%
a	-	-	-	87.27
b	-	√	-	88.38
c	√	√	-	89.01
d	-	√	√	90.10
e	√	√	√	91.62

Tab.3 Genre classification accuracy comparison of different networks on validation set (%)

Network	Genre classification accuracy
GoogLeNet	81.18
ResNet-34B	84.67
VGGNet19	86.11
AlexNet	86.26
DCNN-SSA	91.62
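
The accuracy gain of 5.36 to 10.44 percentage points reported in the abstract can be checked against the validation-set figures in this table. The short sketch below only redoes that arithmetic; the dictionary and function names are illustrative.

```python
# Validation-set accuracies (%) copied from the table above.
val_acc = {
    "GoogLeNet": 81.18,
    "ResNet-34B": 84.67,
    "VGGNet19": 86.11,
    "AlexNet": 86.26,
    "DCNN-SSA": 91.62,
}

def gains_over_baselines(acc, model="DCNN-SSA"):
    """Percentage-point gain of `model` over every other network in `acc`."""
    return {name: round(acc[model] - a, 2) for name, a in acc.items() if name != model}

gains = gains_over_baselines(val_acc)
smallest, largest = min(gains.values()), max(gains.values())
```

The smallest gain is over AlexNet and the largest is over GoogLeNet, which reproduces the range quoted in the abstract and indicates that the range refers to the validation set rather than the test set.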

Tab.4 Genre classification accuracy comparison of different networks on test set (%)

Network	Genre classification accuracy
GoogLeNet	70.00
ResNet-34B	72.00
VGGNet19	76.00
AlexNet	76.00
DCNN-SSA	82.00
