Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 578-583. DOI: 10.11772/j.issn.1001-9081.2024030264

• Multimedia computing and computer simulation •

Symbolic music generation with pre-training

Yuchen HONG, Jinlong LI

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China
  • Received: 2024-03-13 Revised: 2024-05-06 Accepted: 2024-05-09 Online: 2024-05-24 Published: 2025-02-10
  • Contact: Yuchen HONG
  • About author: LI Jinlong, born in 1976 in Yongzhou, Hunan, China, Ph.D., associate professor. His research interests include deep learning, artificial intelligence, and evolutionary algorithms.

Abstract:

To address the lack of sufficient paired multi-track music score datasets in the field of music representation learning, a pre-training model for music generation was proposed. Firstly, motivated by the fact that multi-track music generation must ensure continuity within each single track and harmony among the tracks at the same time, a Transformer-based multi-generator model, namely MMGPNet (Multi-track Music Generation with Pre-training Network), was proposed as the baseline model. Secondly, in order to make use of the abundant single-track instrument datasets, a music pre-training module was designed on top of the generation model. Finally, a reconstruction task was designed for the pre-training process, in which attributes of musical notes were masked and then reconstructed. Experimental results show that the proposed model accelerates the training process and improves prediction accuracy. Besides, compared with baseline models such as MuseGAN (Multi-track sequential Generative Adversarial Network) and SymphonyNet, the multi-track sequences generated by the proposed model are closer to real music on various music evaluation metrics. The listening test further verifies the effectiveness of the proposed model.
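
A minimal sketch of a masked-attribute reconstruction pre-training objective of the kind described above is given below. It assumes note attributes are tokenized into a single shared vocabulary; the model, masking ratio, and names such as MaskedNoteReconstructor and MASK_ID are illustrative assumptions, not the actual MMGPNet implementation.

```python
# Illustrative sketch of masked-attribute reconstruction pre-training for
# symbolic music. All names and hyperparameters here are assumptions for
# demonstration, not the actual MMGPNet implementation.
import torch
import torch.nn as nn

MASK_ID = 0          # reserved token id used to hide a note attribute
VOCAB_SIZE = 512     # shared vocabulary for pitch/duration/velocity tokens


class MaskedNoteReconstructor(nn.Module):
    """Transformer encoder that predicts the original tokens at masked positions."""

    def __init__(self, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        hidden = self.encoder(self.embed(tokens))
        return self.head(hidden)                  # logits over the vocabulary


def mask_attributes(tokens, ratio=0.15):
    """Randomly replace attribute tokens with MASK_ID; unmasked targets are ignored."""
    inputs, targets = tokens.clone(), tokens.clone()
    mask = torch.rand(tokens.shape) < ratio
    inputs[mask] = MASK_ID
    targets[~mask] = -100                         # ignored by CrossEntropyLoss below
    return inputs, targets


# One pre-training step on a toy batch of single-track note sequences.
model = MaskedNoteReconstructor()
tokens = torch.randint(1, VOCAB_SIZE, (8, 128))   # 8 sequences of 128 attribute tokens
inputs, targets = mask_attributes(tokens)
logits = model(inputs)
loss = nn.CrossEntropyLoss(ignore_index=-100)(
    logits.view(-1, VOCAB_SIZE), targets.view(-1)
)
loss.backward()
```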

Key words: music generation, multi-track music, sequence model, pre-training model, music sequence representation
