Journal of Computer Applications, 2021, Vol. 41, Issue (12): 3585-3589. DOI: 10.11772/j.issn.1001-9081.2021060909

• The 18th China Conference on Machine Learning (CCML 2021) •

基于Transformer的多轨音乐生成对抗网络

汪涛1, 靳聪2, 李小兵3, 帖云1,3, 齐林1

  1.郑州大学 信息工程学院,郑州 450001
    2.中国传媒大学 信息与通信工程学院,北京 100024
    3.中央音乐学院,北京 100031
  • 收稿日期:2021-05-12 修回日期:2021-07-15 接受日期:2021-07-23 发布日期:2021-12-28 出版日期:2021-12-10
  • 通讯作者: 靳聪
  • 作者简介:汪涛(1995—),男,河南许昌人,硕士研究生,主要研究方向:人工智能音乐作曲、强化学习、深度学习
    李小兵(1967—),男,江西南昌人,教授,博士,CCF会员,主要研究方向:电子音乐、音乐音响制作与编曲
    帖云(1973—),男,河南郑州人,教授,博士,CCF会员,主要研究方向:多媒体系统设计、数字图像处理、模式识别
    齐林(1961—),男,河南郑州人,教授,博士,CCF会员,主要研究方向:数字图像处理、模式识别、人工智能。
  • 基金资助:
    国家重点研发计划项目(2018YFB1403900);中央高校基本科研业务费专项资金资助项目(CUC210B011)

Multi-track music generative adversarial network based on Transformer

Tao WANG1, Cong JIN2, Xiaobing LI3, Yun TIE1,3, Lin QI1

  1. School of Information Engineering, Zhengzhou University, Zhengzhou Henan 450001, China
    2. School of Information and Communication Engineering, Communication University of China, Beijing 100024, China
    3. Central Conservatory of Music, Beijing 100031, China
  • Received: 2021-05-12 Revised: 2021-07-15 Accepted: 2021-07-23 Online: 2021-12-28 Published: 2021-12-10
  • Contact: Cong JIN
  • About author: WANG Tao, born in 1995, M. S. candidate. His research interests include Artificial Intelligence (AI) music composition, reinforcement learning and deep learning.
    LI Xiaobing, born in 1967, Ph. D., professor. His research interests include electronic music, music sound production and arrangement.
    TIE Yun, born in 1973, Ph. D., professor. His research interests include multimedia system design, digital image processing, pattern recognition.
    QI Lin, born in 1961, Ph. D., professor. His research interests include digital image processing, pattern recognition, artificial intelligence.
  • Supported by:
    National Key Research and Development Program of China (2018YFB1403900); Fundamental Research Funds for the Central Universities (CUC210B011)

摘要:

符号音乐的生成在人工智能领域中仍然是一个尚未解决的问题,面临着诸多挑战。经研究发现,现有的多音轨音乐生成方法在旋律、节奏及和谐度上均达不到市场所要求的效果,并且生成的音乐大多不符合基础的乐理知识。为了解决以上问题,提出一种新颖的基于Transformer的多音轨音乐生成对抗网络(Transformer-GAN),以乐理规则为指导来产生具有高音乐性的音乐作品。首先,采用Transformer的译码部分与在Transformer基础之上改编的Cross-Track Transformer(CT-Transformer)分别对单音轨内部及多音轨之间的信息进行学习;然后,使用乐理规则和交叉熵损失相结合的方法引导生成网络的训练,并在训练鉴别网络的同时优化精心设计的目标损失函数;最后,生成具有旋律性、节奏性及和谐性的多音轨音乐作品。实验结果表明,与其他多乐器音乐生成模型相比,在钢琴轨、吉他轨及贝斯轨上,Transformer-GAN的预测精确度(PA)最低分别提升了12%、11%及22%,序列相似度(SS)最低分别提升了13%、6%及10%,休止符指标最低分别提升了8%、4%及17%。由此可见,Transformer-GAN在加入了CT-Transformer及音乐规则奖励模块之后能有效提升音乐的PA、SS等指标,使生成的音乐质量整体上有较大的提升。

关键词: 音乐生成, Transformer, 音乐规则, 目标损失函数, 对抗网络

Abstract:

Symbolic music generation remains an unsolved problem in artificial intelligence and faces many challenges. Existing multi-track music generation methods fail to meet market requirements in terms of melody, rhythm and harmony, and most of the generated music does not conform to basic music theory. To solve these problems, a novel Transformer-based multi-track music Generative Adversarial Network (Transformer-GAN) was proposed to generate highly musical works under the guidance of music rules. Firstly, the decoder part of Transformer and the Cross-Track Transformer (CT-Transformer), adapted from Transformer, were used to learn the information within a single track and between multiple tracks respectively. Then, a combination of music rules and cross-entropy loss was employed to guide the training of the generative network, and the well-designed objective loss function was optimized while training the discriminative network. Finally, multi-track music works with melody, rhythm and harmony were generated. Experimental results show that, compared with other multi-instrument music generation models, on the piano, guitar and bass tracks, Transformer-GAN improves Prediction Accuracy (PA) by at least 12%, 11% and 22% respectively, Sequence Similarity (SS) by at least 13%, 6% and 10% respectively, and the rest metric by at least 8%, 4% and 17% respectively. It can be seen that, after adding the CT-Transformer and the music rule reward module, Transformer-GAN effectively improves metrics such as PA and SS, and the overall quality of the generated music is considerably improved.

Key words: music generation, Transformer, music rule, objective loss function, adversarial network
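
The abstract states that music rules and cross-entropy loss are combined to guide the generator, but it does not give the formula. The following is only a minimal sketch of one way such a combination could look, not the authors' objective: the rule (share of notes inside a C major scale), the names `rule_reward` and `generator_loss`, and the weight `rule_weight` are all assumptions made for illustration.

```python
import math

# Assumed rule for illustration only: pitch classes of the C major scale.
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target token indices."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def rule_reward(pitches, scale=C_MAJOR):
    """Fraction of MIDI pitches whose pitch class lies in the given scale."""
    if not pitches:
        return 0.0
    return sum((p % 12) in scale for p in pitches) / len(pitches)

def generator_loss(probs, targets, generated_pitches, rule_weight=0.5):
    """Cross-entropy minus a weighted rule reward (hypothetical combination)."""
    return cross_entropy(probs, targets) - rule_weight * rule_reward(generated_pitches)

if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.25, 0.75]]   # predicted distributions per step
    targets = [0, 1]                     # ground-truth token indices
    pitches = [60, 62, 64, 61]           # C, D, E, C#: three of four in scale
    print(generator_loss(probs, targets, pitches))
```

In this sketch a higher rule reward lowers the loss, so a generator trained on it would be pushed towards both matching the data and satisfying the rule; the actual rules and weighting used by Transformer-GAN would have to be taken from the full paper.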

中图分类号: