Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 578-583. DOI: 10.11772/j.issn.1001-9081.2024030264

• Multimedia computing and computer simulation •

Symbolic music generation with pre-training

Yuchen HONG, Jinlong LI

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China
  • Received: 2024-03-13 Revised: 2024-05-06 Accepted: 2024-05-09 Online: 2024-05-24 Published: 2025-02-10
  • Contact: Yuchen HONG
  • About author: LI Jinlong, born in 1976 in Yongzhou, Hunan, China, Ph.D., associate professor. His research interests include deep learning, artificial intelligence, and evolutionary algorithms.

Abstract:

To address the lack of sufficient paired multi-track music score datasets in the field of music representation learning, a pre-training model for music generation was proposed. Firstly, motivated by the fact that multi-track music generation must ensure continuity within each single track and harmony among the tracks at the same time, a Transformer-based multi-generator model, namely MMGPNet (Multi-track Music Generation with Pre-training Network), was proposed as the baseline model. Secondly, in order to make use of the abundant single-track instrument datasets, a music pre-training module was designed on top of the generation model. Finally, a reconstruction task was designed for the pre-training process, in which attributes of musical notes were masked and then reconstructed. Experimental results show that the proposed model accelerates the training process and improves prediction accuracy. Besides, compared with baseline models such as MuseGAN (Multi-track sequential Generative Adversarial Network) and SymphonyNet, the multi-track sequences generated by the proposed model are closer to real music on various music evaluation metrics. The listening test further verifies the effectiveness of the proposed model.
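
A minimal sketch of a masked-attribute reconstruction pre-training objective of the kind described above is given below. It assumes note attributes are tokenized into a single shared vocabulary; the model, masking ratio, and names such as MaskedNoteReconstructor and MASK_ID are illustrative assumptions, not the actual MMGPNet implementation.

```python
# Illustrative sketch of masked-attribute reconstruction pre-training for
# symbolic music. All names and hyperparameters here are assumptions for
# demonstration, not the actual MMGPNet implementation.
import torch
import torch.nn as nn

MASK_ID = 0          # reserved token id used to hide a note attribute
VOCAB_SIZE = 512     # shared vocabulary for pitch/duration/velocity tokens


class MaskedNoteReconstructor(nn.Module):
    """Transformer encoder that predicts the original tokens at masked positions."""

    def __init__(self, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        hidden = self.encoder(self.embed(tokens))
        return self.head(hidden)                  # logits over the vocabulary


def mask_attributes(tokens, ratio=0.15):
    """Randomly replace attribute tokens with MASK_ID; unmasked targets are ignored."""
    inputs, targets = tokens.clone(), tokens.clone()
    mask = torch.rand(tokens.shape) < ratio
    inputs[mask] = MASK_ID
    targets[~mask] = -100                         # ignored by CrossEntropyLoss below
    return inputs, targets


# One pre-training step on a toy batch of single-track note sequences.
model = MaskedNoteReconstructor()
tokens = torch.randint(1, VOCAB_SIZE, (8, 128))   # 8 sequences of 128 attribute tokens
inputs, targets = mask_attributes(tokens)
logits = model(inputs)
loss = nn.CrossEntropyLoss(ignore_index=-100)(
    logits.view(-1, VOCAB_SIZE), targets.view(-1)
)
loss.backward()
```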

Key words: music generation, multi-track music, sequence model, pre-training model, music sequence representation
