Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 3070-3074. DOI: 10.11772/j.issn.1001-9081.2020111752

Special Issue: Frontier and comprehensive applications

Prediction of organic reaction based on gated graph convolutional neural network

LAI Zicheng, ZHANG Yuping, MA Yan   

  1. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
  • Received: 2020-11-10  Revised: 2021-02-04  Online: 2021-10-10  Published: 2021-10-27
  • Supported by:
    This work is partially supported by the Surface Program of the National Natural Science Foundation of China (61802258) and the Youth Program of the National Natural Science Foundation of China (61572326).


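  • Corresponding author: ZHANG Yuping
  • About the authors: LAI Zicheng (born 1995), male, from Pingxiang, Jiangxi, M.S. candidate; research interests: deep reinforcement learning, organic chemical reaction prediction. ZHANG Yuping (born 1963), female, from Ningbo, Zhejiang, professor, Ph.D.; research interests: deep reinforcement learning, artificial intelligence. MA Yan (born 1970), female, from Shanghai, professor, Ph.D.; research interests: machine learning, deep reinforcement learning, artificial intelligence.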

Abstract: With the development of modern pharmaceutical and computer technologies, using artificial intelligence to accelerate drug development has become a research hotspot, and efficient prediction of organic reaction products is a key issue in drug retrosynthesis route planning. To address the uneven distribution of chemical reaction types in the sample dataset, an Active Sampling-training Gated Graph Convolutional Neural-network (ASGGCN) model was proposed. Firstly, the SMILES (Simplified Molecular Input Line Entry System) codes of the reactants were input into the model, and the location of the reaction center was predicted by a Gated Graph Convolutional Neural-network (GGCN) with an attention mechanism. Then, according to chemical constraints and the candidate reaction centers, the possible chemical bond combinations were enumerated to generate candidate reaction products. After that, a gated graph convolutional difference network was used to rank the candidate products and obtain the final reaction product. Compared with the traditional graph convolutional network, the gated graph convolutional network has three weight parameter matrices and fuses their information through gating, so it can obtain richer atom hidden feature information. In addition, the gated graph convolutional network is trained with active sampling, so that the model can handle both poorly performing samples and ordinary samples. Experimental results show that the Top-1 accuracy of reaction product prediction of the proposed model reaches 87.2%, 1.6 percentage points higher than that of the Weisfeiler-Lehman Difference Network (WLDN) model, indicating that the proposed model predicts organic reaction products more accurately.
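
The abstract does not give the layer equations or the sampling rule, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes one plausible reading of "three weight parameter matrices fused through gating" (a self matrix, a neighbour matrix, and a gate matrix) and one plausible form of active sampling (drawing training reactions with probability that grows with their current loss). All function names, parameter names, and the softmax weighting are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_graph_conv(h, adj, w_self, w_nbr, w_gate):
    """One gated message-passing step over atom features (assumed form).

    h      : (n_atoms, d) atom hidden features
    adj    : (n_atoms, n_atoms) 0/1 bond adjacency matrix
    w_self : (d, d) weights for the atom's own features
    w_nbr  : (d, d) weights for the summed neighbour features
    w_gate : (2 * d, d) weights producing the fusion gate
    """
    nbr_sum = adj @ h                      # sum features of bonded atoms
    self_msg = np.tanh(h @ w_self)         # transformed own features
    nbr_msg = np.tanh(nbr_sum @ w_nbr)     # transformed neighbour features
    # The gate decides, per atom and per feature, how much neighbour
    # information replaces the atom's own information.
    gate = sigmoid(np.concatenate([h, nbr_sum], axis=1) @ w_gate)
    return gate * nbr_msg + (1.0 - gate) * self_msg


def active_sampling_probs(losses, temperature=1.0):
    """Sampling probabilities that favour high-loss training samples.

    This softmax-over-losses rule is an assumption for illustration;
    low-loss samples keep a non-zero probability so ordinary samples
    are still revisited.
    """
    losses = np.asarray(losses, dtype=float)
    scaled = (losses - losses.max()) / temperature  # shift for stability
    weights = np.exp(scaled)
    return weights / weights.sum()
```

For a three-atom chain such as the heavy atoms of ethanol (C-C-O), `adj` would be `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`; stacking several such steps lets each atom see a wider neighbourhood. The probabilities returned by `active_sampling_probs` can be passed to `numpy.random.Generator.choice` via its `p` argument to draw the next training batch.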

Key words: drug retrosynthesis, Gated Graph Convolutional Neural-network (GGCN), active sampling, organic reaction, atom hidden feature

