Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (2): 622-629. DOI: 10.11772/j.issn.1001-9081.2021122228

• Frontier and comprehensive applications •

Virtual screening of drug synthesis reaction based on multimodal data fusion

Xiaofei SUN1,2, Jingyuan ZHU3, Bin CHEN2,4, Hengzhi YOU3

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610041, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
    3. School of Science, Harbin Institute of Technology, Shenzhen, Shenzhen Guangdong 518055, China
    4. International Research Institute for Artificial Intelligence, Harbin Institute of Technology, Shenzhen, Shenzhen Guangdong 518055, China
  • Received:2022-01-07 Revised:2022-03-14 Accepted:2022-03-17 Online:2022-05-16 Published:2023-02-10
  • Contact: Hengzhi YOU
  • About author:SUN Xiaofei, born in 1981, Ph. D. candidate. His research interests include virtual screening, pattern recognition.
    ZHU Jingyuan, born in 1997, M. S. candidate. His research interests include synthetic reaction prediction.
    CHEN Bin, born in 1970, Ph. D., research fellow. His research interests include machine vision, deep learning.
  • Supported by:
    Shenzhen Science and Technology Research Fund (JCYJ20190806142203709); Talent Development Starting Fund from Shenzhen Government (HA11409030)


孙晓飞1,2, 朱静远3, 陈斌2,4, 游恒志3

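  • Corresponding author: 游恒志 (Hengzhi YOU)
  • About author: 孙晓飞 (SUN Xiaofei), born in 1981, male, native of Qixia, Shandong, Ph. D. candidate. His research interests include virtual screening and pattern recognition.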


Drug synthesis reactions, especially asymmetric reactions, are key components of modern pharmaceutical chemistry. Chemists have invested substantial manpower and resources in identifying chemical reaction patterns in order to achieve efficient synthesis and asymmetric catalysis. Recent research applying quantum mechanical computation and machine learning algorithms in this field has demonstrated the great potential of computers to learn from existing drug synthesis reaction data and to perform accurate virtual screening. However, existing methods use only a small amount of single-modal data and, because the data are insufficient, are restricted to common machine learning methods, which hinders their application in a wider range of scenarios. Therefore, two screening models for drug synthesis reactions that integrate multimodal data were proposed for the virtual screening of reaction yield and enantioselectivity. A 3D conformation descriptor based on the Boltzmann distribution was also proposed to combine the 3D spatial information of molecules with their quantum mechanical properties. The two multimodal data fusion models were trained and validated on two representative organic synthesis reactions (C-N cross-coupling and N,S-acetal formation). The R² (R-squared) of the former was more than 1 percentage point higher than those of the baseline methods on most data splits, and the MAE (Mean Absolute Error) of the latter was more than 0.5 percentage points lower than those of the baseline methods on most data splits. These results show that models based on multimodal data fusion perform well across different organic reaction screening tasks.
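
The abstract does not spell out how the Boltzmann-based 3D conformation descriptor is built, so the following is only a minimal illustrative sketch and not the authors' method. It assumes each conformer of a molecule comes with a relative energy in kcal/mol and a feature vector of quantum mechanical properties, and averages those vectors using Boltzmann populations. The function names, the energy unit, and the default temperature are all hypothetical choices made for this example. The two helper metrics at the end are the standard definitions of R² and MAE, the quantities reported above.

```python
import math

# Gas constant in kcal/(mol*K). Assumption: conformer energies are in kcal/mol.
R_KCAL = 0.0019872041


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann populations for a list of conformer energies."""
    e_min = min(energies)  # shift by the minimum energy for numerical stability
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_descriptor(conformer_features, energies, temperature=298.15):
    """Boltzmann-weighted average of per-conformer feature vectors (hypothetical layout)."""
    weights = boltzmann_weights(energies, temperature)
    n_features = len(conformer_features[0])
    return [
        sum(w * feats[j] for w, feats in zip(weights, conformer_features))
        for j in range(n_features)
    ]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Under these assumptions, low-energy conformers dominate the descriptor, while conformers of equal energy contribute equally; a real pipeline would also need a conformer generator and a quantum chemistry package to supply the inputs.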

Key words: drug synthesis reaction, asymmetric reaction, machine learning, multimodal data, 3D molecular descriptor, virtual screening



