Journal of Computer Applications, 2023, Vol. 43, Issue (2): 622-629. DOI: 10.11772/j.issn.1001-9081.2021122228
• Frontier and comprehensive applications •
Xiaofei SUN1,2, Jingyuan ZHU3, Bin CHEN2,4, Hengzhi YOU3
Received: 2022-01-07
Revised: 2022-03-14
Accepted: 2022-03-17
Online: 2022-05-16
Published: 2023-02-10
Contact: Hengzhi YOU
About author: SUN Xiaofei, born in 1981 in Qixia, Shandong, Ph.D. candidate. His research interests include virtual screening and pattern recognition.
Xiaofei SUN, Jingyuan ZHU, Bin CHEN, Hengzhi YOU. Virtual screening of drug synthesis reaction based on multimodal data fusion[J]. Journal of Computer Applications, 2023, 43(2): 622-629.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021122228
Split | Additives (test) | Number of test reactions | Additives (training) | Number of training reactions |
---|---|---|---|---|
Split 1 | 2, 3, 16, 20, 22 | 879 | 1, 4, 5, 6, 8-15, 17-19, 21, 23 | 2 980 |
Split 2 | 6, 8, 12, 14, 21 | 866 | 1-5, 9-11, 13, 15-20, 22, 23 | 2 993 |
Split 3 | 1, 5, 9, 11, 17 | 824 | 2-4, 6, 8, 10, 12-16, 18-23 | 3 035 |
Split 4 | 4, 10, 15, 18, 23 | 832 | 1-3, 5-6, 8, 9, 11-14, 16, 17, 19-22 | 3 027 |
Tab. 1 Splitting of Buchwald-Hartwig reaction dataset
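The splits in Tab. 1 hold out whole additives, so every test reaction involves an additive that does not occur in training. A minimal sketch of such a split follows. The record layout (a dict with an `additive` key) and the name `split_by_additive` are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

# Held-out additive IDs for each split, copied from Tab. 1.
TEST_ADDITIVES: Dict[int, FrozenSet[int]] = {
    1: frozenset({2, 3, 16, 20, 22}),
    2: frozenset({6, 8, 12, 14, 21}),
    3: frozenset({1, 5, 9, 11, 17}),
    4: frozenset({4, 10, 15, 18, 23}),
}


def split_by_additive(
    reactions: List[dict], split_id: int
) -> Tuple[List[dict], List[dict]]:
    """Return (train, test) so that no test additive appears in train.

    Each reaction is assumed to be a dict with an integer "additive" key
    (hypothetical layout, not the paper's data format).
    """
    held_out = TEST_ADDITIVES[split_id]
    train = [r for r in reactions if r["additive"] not in held_out]
    test = [r for r in reactions if r["additive"] in held_out]
    return train, test
```

Splitting by group rather than at random is what makes the "additive test" columns an out-of-sample evaluation: a random split would place reactions with the same additive on both sides.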
Subset | Catalysts (1-24) | Catalysts (52-70) | Substrates (27-30, 32-35, 37-40, 42-45) | Substrates (31, 36, 41, 46, 47-51) | Number of reactions |
---|---|---|---|---|---|
Training set | √ | × | √ | × | 384 |
Test set "test sub" | √ | × | × | √ | 216 |
Test set "test cat" | × | √ | √ | × | 304 |
Test set "test sub-cat" | × | √ | × | √ | 171 |
Tab. 2 Splitting of N, S-acetal formation dataset
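The subset sizes in Tab. 2 are consistent with a full grid of catalysts and substrate entries: 24 × 16 = 384, 24 × 9 = 216, 19 × 16 = 304 and 19 × 9 = 171. The sketch below assigns each pair to a subset and recounts them. It assumes one reaction per (catalyst, substrate entry) pair and treats the IDs in the table header as plain integers; `assign_subset` and `subset_sizes` are hypothetical helper names.

```python
from collections import Counter
from itertools import product

# ID ranges copied from the header of Tab. 2.
TRAIN_CATALYSTS = frozenset(range(1, 25))    # 1-24
TEST_CATALYSTS = frozenset(range(52, 71))    # 52-70
TRAIN_SUBSTRATES = frozenset(
    list(range(27, 31)) + list(range(32, 36))
    + list(range(37, 41)) + list(range(42, 46))
)
TEST_SUBSTRATES = frozenset([31, 36, 41, 46] + list(range(47, 52)))


def assign_subset(catalyst: int, substrate: int) -> str:
    """Map a (catalyst, substrate entry) pair to its subset name in Tab. 2."""
    if catalyst not in TRAIN_CATALYSTS | TEST_CATALYSTS:
        raise ValueError(f"unknown catalyst ID: {catalyst}")
    if substrate not in TRAIN_SUBSTRATES | TEST_SUBSTRATES:
        raise ValueError(f"unknown substrate ID: {substrate}")
    seen_catalyst = catalyst in TRAIN_CATALYSTS
    seen_substrate = substrate in TRAIN_SUBSTRATES
    if seen_catalyst and seen_substrate:
        return "train"
    if seen_catalyst:
        return "test sub"      # unseen substrate, seen catalyst
    if seen_substrate:
        return "test cat"      # unseen catalyst, seen substrate
    return "test sub-cat"      # both unseen


def subset_sizes() -> Counter:
    """Count pairs per subset over the full catalyst x substrate grid."""
    catalysts = sorted(TRAIN_CATALYSTS | TEST_CATALYSTS)
    substrates = sorted(TRAIN_SUBSTRATES | TEST_SUBSTRATES)
    return Counter(assign_subset(c, s) for c, s in product(catalysts, substrates))
```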
Fig. 7 Performance comparison of six methods using descriptors based on quantum mechanical features and molecular fingerprints in N, S-acetal formation
Method | random 70/30 | additive test 1 | additive test 2 | additive test 3 | additive test 4 |
---|---|---|---|---|---|
Ahneman et al. [ | 0.92 | 0.80 | 0.77 | 0.64 | 0.54 |
MFF[ | 0.93 | 0.85 | 0.71 | 0.64 | 0.18 |
Schwaller et al. [ | 0.95 | 0.84 | 0.84 | 0.75 | 0.49 |
CNN feature fusion model | 0.95 | 0.71 | 0.85 | 0.61 | 0.55 |
Tab. 3 Prediction results (R²) on C-N coupling reaction
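Tab. 3 compares methods by R². The page does not spell out the formula, so the sketch below assumes the usual coefficient of determination, 1 − SS_res/SS_tot, computed over measured and predicted yields. The function name is illustrative.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

Under this definition a model that always predicts the mean of the measured values scores 0 and a perfect model scores 1, which gives a scale for reading the out-of-sample columns.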
Method | random 600/475 | test sub | test cat | test sub-cat |
---|---|---|---|---|
Zahrt et al. [ | 0.152 | 0.161 | 0.211 | 0.238 |
MFF[ | 0.144 | 0.137 | 0.254 | 0.282 |
GCN feature fusion model | 0.147 | 0.135 | 0.248 | 0.236 |
Tab. 4 Prediction results (MAE) on N,S-acetal formation
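Tab. 4 compares methods by mean absolute error (MAE), where lower is better. A minimal sketch of the metric is given below; the function name is illustrative, and the units of the target are whatever the dataset uses, which this page does not state.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```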
1 | ENGEL T. Basic overview of chemoinformatics[J]. Journal of Chemical Information and Modeling, 2006, 46(6): 2267-2277. 10.1021/ci600234z |
2 | WILLETT P. Chemoinformatics: a history[J]. WIREs Computational Molecular Science, 2011, 1(1): 46-56. 10.1002/wcms.1 |
3 | LAVECCHIA A. Machine-learning approaches in drug discovery: methods and applications[J]. Drug Discovery Today, 2015, 20(3): 318-331. 10.1016/j.drudis.2014.10.012 |
4 | MA J S, SHERIDAN R P, LIAW A, et al. Deep neural nets as a method for quantitative structure-activity relationships[J]. Journal of Chemical Information and Modeling, 2015, 55(2): 263-274. 10.1021/ci500747n |
5 | SVETNIK V, LIAW A, TONG C, et al. Random forest: a classification and regression tool for compound classification and QSAR modeling[J]. Journal of Chemical Information and Computer Sciences, 2003, 43(6): 1947-1958. 10.1021/ci034160g |
6 | OMATA K. Screening of new additives of active-carbon-supported heteropoly acid catalyst for Friedel-Crafts reaction by Gaussian process regression[J]. Industrial and Engineering Chemistry Research, 2011, 50(19): 10948-10954. 10.1021/ie102477y |
7 | KAYALA M A, AZENCOTT C A, CHEN J H, et al. Learning to predict chemical reactions[J]. Journal of Chemical Information and Modeling, 2011, 51(9): 2209-2222. 10.1021/ci200207y |
8 | SZYMKUĆ S, GAJEWSKA E P, KLUCZNIK T, et al. Computer-assisted synthetic planning: the end of the beginning[J]. Angewandte Chemie International Edition, 2016, 55(20): 5904-5937. 10.1002/anie.201506101 |
9 | KITE S, HATTORI T, MURAKAMI Y. Estimation of catalytic performance by neural network — product distribution in oxidative dehydrogenation of ethylbenzene[J]. Applied Catalysis A: General, 1994, 114(2): L173-L178. 10.1016/0926-860X(94)80169-X |
10 | LIU B W, RAMSUNDAR B, KAWTHEKAR P, et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models[J]. ACS Central Science, 2017, 3(10): 1103-1113. 10.1021/acscentsci.7b00303 |
11 | RACCUGLIA P, ELBERT K C, ADLER P D F, et al. Machine-learning-assisted materials discovery using failed experiments[J]. Nature, 2016, 533(7601): 73-76. 10.1038/nature17439 |
12 | TODD M H. Computer-aided organic synthesis[J]. Chemical Society Reviews, 2005, 34(3): 247-266. 10.1039/b104620a |
13 | DENMARK S E, GOULD N D, WOLF L M. A systematic investigation of quaternary ammonium ions as asymmetric phase-transfer catalysts. Application of quantitative structure activity/selectivity relationships[J]. The Journal of Organic Chemistry, 2011, 76(11): 4337-4357. 10.1021/jo2005457 |
14 | SIGMAN M S, HARPER K C, BESS E N, et al. The development of multidimensional analysis tools for asymmetric catalysis and beyond[J]. Accounts of Chemical Research, 2016, 49(6): 1292-1301. 10.1021/acs.accounts.6b00194 |
15 | HAMMETT L P. The effect of structure upon the reactions of organic compounds. Benzene derivatives[J]. Journal of the American Chemical Society, 1937, 59(1): 96-103. 10.1021/ja01280a022 |
16 | SANTANILLA A B, REGALADO E L, PEREIRA T, et al. Nanomole-scale high-throughput chemistry for the synthesis of complex molecules[J]. Science, 2015, 347(6217): 49-53. 10.1126/science.1259203 |
17 | COLLINS K D, GENSCH T, GLORIUS F. Contemporary screening approaches to reaction discovery and development[J]. Nature Chemistry, 2014, 6(10): 859-871. 10.1038/nchem.2062 |
18 | SANDFORT F, STRIETH-KALTHOFF F, KÜHNEMUND M, et al. A structure-based platform for predicting chemical reactivity[J]. Chem, 2020, 6(6): 1379-1390. 10.1016/j.chempr.2020.02.017 |
19 | WEI J N, DUVENAUD D, ASPURU-GUZIK A. Neural networks for the prediction of organic chemistry reactions[J]. ACS Central Science, 2016, 2(10): 725-732. 10.1021/acscentsci.6b00219 |
20 | COLEY C W, BARZILAY R, JAAKKOLA T S, et al. Prediction of organic reaction outcomes using machine learning[J]. ACS Central Science, 2017, 3(5): 434-443. 10.1021/acscentsci.7b00064 |
21 | AHNEMAN D T, ESTRADA J G, LIN S S, et al. Predicting reaction performance in C-N cross-coupling using machine learning[J]. Science, 2018, 360(6385): 186-190. 10.1126/science.aar5169 |
22 | ZAHRT A F, HENLE J J, ROSE B T, et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning[J]. Science, 2019, 363(6424): No.eaau5631. 10.1126/science.aau5631 |
23 | SCHWALLER P, VAUCHER A C, LAINO T, et al. Prediction of chemical reaction yields using deep learning[J]. Machine Learning: Science and Technology, 2021, 2(1): No.015016. 10.1088/2632-2153/abc81d |
24 | CHEN H M, ENGKVIST O, WANG Y H, et al. The rise of deep learning in drug discovery[J]. Drug Discovery Today, 2018, 23(6): 1241-1250. 10.1016/j.drudis.2018.01.039 |
25 | ALTAE-TRAN H, RAMSUNDAR B, PAPPU A S, et al. Low data drug discovery with one-shot learning[J]. ACS Central Science, 2017, 3(4): 283-293. 10.1021/acscentsci.6b00367 |
26 | LIPKOWITZ K B, PRADHAN M. Computational studies of chiral catalysts: a comparative molecular field analysis of an asymmetric Diels-Alder reaction with catalysts containing bisoxazoline or phosphinooxazoline ligands[J]. The Journal of Organic Chemistry, 2003, 68(12): 4648-4656. 10.1021/jo0267697 |
27 | KOZLOWSKI M C, DIXON S L, PANDA M, et al. Quantum mechanical models correlating structure with selectivity: predicting the enantioselectivity of β-amino alcohol catalysts in aldehyde alkylation[J]. Journal of the American Chemical Society, 2003, 125(22): 6614-6615. 10.1021/ja0293195 |
28 | DIXON S, MERZ K M, Jr., LAURI G, et al. QMQSAR: utilization of a semiempirical probe potential in a field-based QSAR method[J]. Journal of Computational Chemistry, 2005, 26(1): 23-34. 10.1002/jcc.20142 |
29 | HUANG J, IANNI J C, ANTOLINE J E, et al. De novo chiral amino alcohols in catalyzing asymmetric additions to aryl aldehydes[J]. Organic Letters, 2006, 8(8): 1565-1568. 10.1021/ol0600640 |
30 | PHUAN P W, IANNI J C, KOZLOWSKI M C. Is the A-ring of sparteine essential for high enantioselectivity in the asymmetric lithiation-substitution of N-Boc-pyrrolidine?[J]. Journal of the American Chemical Society, 2004, 126(47): 15473-15479. 10.1021/ja046321i |
31 | MA J L, CHEN B, SUN X F. General target detection framework based on improved Faster R-CNN[J]. Journal of Computer Applications, 2021, 41(9): 2712-2719. 10.11772/j.issn.1001-9081.2020111852 |
32 | XIONG Z P, WANG D Y, LIU X H, et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism[J]. Journal of Medicinal Chemistry, 2020, 63(16): 8749-8760. 10.1021/acs.jmedchem.9b00959 |
33 | VITAKU E, SMITH D T, NJARDARSON J T. Analysis of the structural diversity, substitution patterns, and frequency of nitrogen heterocycles among US FDA approved pharmaceuticals[J]. Journal of Medicinal Chemistry, 2014, 57(24): 10257-10274. 10.1021/jm501100b |
34 | Cambridge structural database[EB/OL]. [2022-03-2]. 10.1007/978-1-4614-1533-6_100393 |
35 | MILO A, BESS E N, SIGMAN M S. Interrogating selectivity in catalysis using molecular vibrations[J]. Nature, 2014, 507(7491): 210-214. 10.1038/nature13019 |
36 | MILO A, NEEL A J, TOSTE F D, et al. A data-intensive approach to mechanistic elucidation applied to chiral anion catalysis[J]. Science, 2015, 347(6223): 737-743. 10.1126/science.1261043 |