Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (7): 2072-2077. DOI: 10.11772/j.issn.1001-9081.2021050740

• Artificial Intelligence •

Music genre classification algorithm based on spectral-spatial domain feature attention

LIU Wanjun, WANG Jiaming, QU Haicheng, DONG Libing, CAO Xinyu

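  1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105
  • Received: 2021-05-10 Revised: 2021-11-05 Accepted: 2021-11-24 Online: 2021-12-31 Published: 2022-07-10
  • Corresponding author: WANG Jiaming
  • About the authors: LIU Wanjun (born 1959), male, from Jinzhou, Liaoning; professor, M.S., CCF senior member. Research interests: digital image processing, moving target detection and tracking.
    QU Haicheng (born 1981), male, from Yantai, Shandong; associate professor, Ph.D., CCF member. Research interests: rapid remote sensing image processing, intelligent big data processing.
    DONG Libing (born 1996), female, from Huludao, Liaoning; M.S. Research interests: deep learning, pedestrian detection.
    CAO Xinyu (born 2002), female, from Jinzhou, Liaoning. Research interests: deep learning.
  • Supported by:
    National Natural Science Foundation of China (41701479); General Project of the Education Department of Liaoning Province (LJ2019JL010)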

Music genre classification algorithm based on attention spectral-spatial feature

Wanjun LIU, Jiaming WANG, Haicheng QU, Libing DONG, Xinyu CAO

  1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Received: 2021-05-10 Revised: 2021-11-05 Accepted: 2021-11-24 Online: 2021-12-31 Published: 2022-07-10
  • Contact: Jiaming WANG
  • About the authors: LIU Wanjun, born in 1959, M.S., professor. His research interests include digital image processing, moving target detection and tracking.
    QU Haicheng, born in 1981, Ph.D., associate professor. His research interests include rapid remote sensing image processing, intelligent big data processing.
    DONG Libing, born in 1996, M.S. Her research interests include deep learning, pedestrian detection.
    CAO Xinyu, born in 2002. Her research interests include deep learning.
  • Supported by:
    National Natural Science Foundation of China (41701479); General Project of Educational Department of Liaoning Province (LJ2019JL010)

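Abstract (from the Chinese version):

To improve how well deep convolutional neural networks extract genre features from music spectrograms, a music genre classification model based on spectral-spatial domain feature attention, DCNN-SSA, is proposed. DCNN-SSA marks the genre features of different music Mel spectrograms in the spatial domain and modifies the network structure, which improves feature extraction while keeping the model effective and thereby raises genre classification accuracy. First, the raw audio signal is Mel-filtered, imitating the filtering performed by the human ear, to filter the loudness and rhythm changes of the music; the resulting Mel spectrogram is cut into segments and fed to the network. Next, genre feature extraction is strengthened by deepening the network, changing the convolution structure, and adding a spatial attention mechanism. Finally, the model is trained and validated on the dataset over multiple batches so that it extracts and learns genre features, yielding a model that classifies music genres effectively. Experimental results on the GTZAN dataset show that, compared with other deep learning models, the spatial-attention-based algorithm improves both classification accuracy and model convergence, with accuracy gains of 5.36 to 10.44 percentage points.

Keywords: music genre classification, deep convolutional neural network, deep learning, spatial attention mechanism, Mel spectrum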
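The first step described in the abstract, Mel filtering followed by cutting the spectrogram into segments, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the HTK-style mel formula, the triangular filter shape, and all sizes (8 bands, 64-point FFT, 4-frame segments) are chosen here for readability. The power spectrum is assumed to come from a short-time Fourier transform that is not shown; 22050 Hz is the sampling rate of the GTZAN recordings.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumed convention): 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(top_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            # Triangle: zero outside [lo, hi], peak of 1 at mid.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def apply_filterbank(bank, power_frames):
    """Map each power-spectrum frame (n_bins values) to n_mels band energies."""
    return [[sum(w * p for w, p in zip(row, frame)) for row in bank]
            for frame in power_frames]


def slice_frames(frames, slice_len):
    """Cut a frame sequence into non-overlapping segments; drop the short tail."""
    return [frames[i:i + slice_len]
            for i in range(0, len(frames) - slice_len + 1, slice_len)]


# Hypothetical toy input: 10 flat power-spectrum frames of 33 bins each.
bank = mel_filterbank(n_mels=8, n_fft=64, sample_rate=22050)
power_frames = [[1.0] * 33 for _ in range(10)]
mel_spec = apply_filterbank(bank, power_frames)
segments = slice_frames(mel_spec, slice_len=4)
print(len(segments), len(segments[0]), len(segments[0][0]))  # 2 4 8
```

Each segment is a small time-by-band matrix that a convolutional network can treat like an image. In practice a log or decibel scaling is usually applied before the network; it is omitted here because the abstract does not mention it.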

Abstract:

To improve how well a deep convolutional neural network extracts genre features from music spectrograms, a music genre classification model based on spectral-spatial feature attention, namely DCNN-SSA (Deep Convolutional Neural Network Spectral Spatial Attention), was proposed. In DCNN-SSA, the genre features of different music Mel spectrograms were annotated in the spatial domain and the network structure was changed, improving feature extraction while keeping the model effective and thereby improving the accuracy of music genre classification. Firstly, the original audio signals were Mel-filtered, simulating the filtering operation of the human ear, to filter the sound intensity and rhythm changes of the music, and the generated Mel spectrograms were cut into segments and input into the network. Then, genre feature extraction was enhanced by deepening the network, changing the convolution structure and adding a spatial attention mechanism. Finally, through multiple batches of training and validation on the dataset, the features of music genres were extracted and learned, and a model that can classify music genres was obtained. Experimental results on the GTZAN dataset show that, compared with other deep learning models, the spatial-attention-based algorithm increases music genre classification accuracy by 5.36 to 10.44 percentage points and improves model convergence.

Key words: music genre classification, deep convolutional neural network, deep learning, spatial attention mechanism, Mel spectrogram
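The abstract does not state what form the spatial attention mechanism takes. As one common possibility, the sketch below follows the CBAM-style design: the feature map is pooled across channels by mean and by maximum, the two pooled maps are passed through a small convolution, and a sigmoid turns the result into a per-position mask that rescales every channel. Everything here is an assumption made for illustration, including the function names, the 3×3 kernel size, and the hand-picked kernel weights (in a trained network these would be learned).

```python
import math


def channel_pool(fmap):
    """fmap is C x H x W (nested lists). Return per-position mean and max over C."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    return avg, mx


def conv2d_same(planes, kernels, bias=0.0):
    """Zero-padded cross-correlation of each H x W plane with its own k x k kernel,
    summed into a single H x W map."""
    H, W = len(planes[0]), len(planes[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[bias] * W for _ in range(H)]
    for plane, kern in zip(planes, kernels):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            acc += kern[di][dj] * plane[y][x]
                out[i][j] += acc
    return out


def spatial_attention(fmap, kernels, bias=0.0):
    """Return (reweighted feature map, attention mask)."""
    avg, mx = channel_pool(fmap)
    logits = conv2d_same([avg, mx], kernels, bias)
    # Sigmoid written with tanh so large negative logits do not overflow.
    mask = [[0.5 * (1.0 + math.tanh(v / 2.0)) for v in row] for row in logits]
    out = [[[plane[i][j] * mask[i][j] for j in range(len(plane[0]))]
            for i in range(len(plane))]
           for plane in fmap]
    return out, mask


# Hypothetical 2-channel, 2 x 2 feature map and hand-picked kernels.
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[3.0, 2.0], [1.0, 0.0]]]
centre = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zero = [[0.0] * 3 for _ in range(3)]
reweighted, mask = spatial_attention(fmap, [centre, zero])
```

On a Mel-spectrogram feature map, the two spatial axes correspond to time and frequency band, so the mask highlights time-frequency regions rather than whole channels.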

CLC number: