Journal of Computer Applications, 2023, Vol. 43, Issue (2): 622-629. DOI: 10.11772/j.issn.1001-9081.2021122228

• Frontier and Comprehensive Applications •

Virtual screening of drug synthesis reaction based on multimodal data fusion

Xiaofei SUN1,2, Jingyuan ZHU3, Bin CHEN2,4, Hengzhi YOU3

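  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
    3. School of Science, Harbin Institute of Technology, Shenzhen, Shenzhen Guangdong 518055, China
    4. International Research Institute for Artificial Intelligence, Harbin Institute of Technology, Shenzhen, Shenzhen Guangdong 518055, China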
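  • Received: 2022-01-07 Revised: 2022-03-14 Accepted: 2022-03-17 Online: 2022-05-16 Published: 2023-02-10
  • Corresponding author: Hengzhi YOU
  • About the authors: SUN Xiaofei, born in 1981 in Qixia, Shandong, Ph. D. candidate. His research interests include virtual screening and pattern recognition.
    ZHU Jingyuan, born in 1997 in Huai'an, Jiangsu, M. S. candidate. His research interests include synthetic reaction prediction.
    CHEN Bin, born in 1970 in Guanghan, Sichuan, Ph. D., research fellow. His research interests include machine vision and deep learning.
  • Supported by:
    Shenzhen Science and Technology Research Fund (JCYJ20190806142203709); Talent Development Starting Fund from Shenzhen Government (HA11409030)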

Abstract:

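Drug synthesis reactions, especially asymmetric reactions, are an essential part of modern medicinal chemistry. Chemists have invested enormous manpower and resources in identifying chemical reaction patterns to achieve efficient synthesis and asymmetric catalysis. Recent studies applying quantum mechanical calculations and machine learning algorithms in this field have demonstrated the great potential of learning from existing drug synthesis reaction data by computer and performing accurate virtual screening. However, existing methods are limited to single-modal data sources and, because little data is available, can use only basic machine learning methods, which hinders their general application in a wider range of scenarios. Therefore, two screening models for drug synthesis reactions that fuse multimodal data were proposed for virtual screening of reaction yield and enantioselectivity, together with a 3D conformer descriptor weighted by the Boltzmann distribution that combines the three-dimensional spatial information of molecules with their quantum mechanical properties. The two multimodal data fusion models were trained and validated on two representative organic synthesis reactions (C-N cross-coupling and N,S-acetal formation). On most data splits, the model for the former improved R² by more than 1 percentage point over the baseline methods, and the model for the latter reduced the Mean Absolute Error (MAE) by more than 0.5 percentage points relative to the baseline methods. These results show that models based on multimodal data fusion perform well across different organic reaction screening tasks.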
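The abstract says only that the 3D conformer descriptor is weighted by the Boltzmann distribution. The Python sketch below shows one common way to build such a descriptor, under assumptions that the source does not state: each conformer i has a relative energy ΔE_i in kcal/mol and a fixed-length descriptor vector d_i (for example, computed quantum mechanical properties), the temperature is 298.15 K, and the molecule-level descriptor is the population-weighted average d = Σ_i w_i d_i with w_i = exp(-ΔE_i/RT) / Σ_j exp(-ΔE_j/RT). The function names `boltzmann_weights` and `boltzmann_weighted_descriptor` are hypothetical. This is an illustration of the technique, not the authors' implementation.

```python
import math
from typing import List, Sequence

# Assumption: conformer energies are in kcal/mol, so R is given in kcal/(mol*K).
GAS_CONSTANT_KCAL = 1.987204e-3


def boltzmann_weights(energies: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Return the Boltzmann population w_i of each conformer (the weights sum to 1)."""
    if not energies:
        raise ValueError("at least one conformer energy is required")
    rt = GAS_CONSTANT_KCAL * temperature
    e_min = min(energies)  # shifting by the minimum avoids overflow; weights are unchanged
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def boltzmann_weighted_descriptor(
    descriptors: Sequence[Sequence[float]],
    energies: Sequence[float],
    temperature: float = 298.15,
) -> List[float]:
    """Average per-conformer descriptor vectors d_i using Boltzmann weights w_i."""
    if len(descriptors) != len(energies):
        raise ValueError("need exactly one descriptor vector per conformer")
    weights = boltzmann_weights(energies, temperature)  # also rejects an empty input
    dim = len(descriptors[0])
    if any(len(d) != dim for d in descriptors):
        raise ValueError("all descriptor vectors must have the same length")
    return [sum(w * d[k] for w, d in zip(weights, descriptors)) for k in range(dim)]


if __name__ == "__main__":
    # Hypothetical data: three conformers with two descriptor values each.
    energies = [0.0, 0.5, 2.0]  # relative energies in kcal/mol
    descriptors = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(boltzmann_weights(energies))
    print(boltzmann_weighted_descriptor(descriptors, energies))
```

Subtracting the lowest energy before exponentiating leaves the normalized weights unchanged but keeps the exponentials in a safe numeric range. Low-energy conformers dominate the average: at 298.15 K, a conformer 2 kcal/mol above the minimum has a Boltzmann factor of roughly 0.03 relative to the lowest-energy conformer.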

Key words: drug synthesis reaction, asymmetric reaction, machine learning, multimodal data, 3D molecular descriptor, virtual screening

CLC number: