Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (1): 325-332. DOI: 10.11772/j.issn.1001-9081.2021071218

• Frontier and comprehensive applications •

Multi-aspect multi-attention fusion of molecular features for drug-target affinity prediction

Runze WANG1, Yueqin ZHANG1, Qiqi QIN1, Zehua ZHANG1, Xumin GUO2

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan Shanxi 030600, China
  2. Department of Computer and Information Engineering, Shanxi Youth Vocational College, Taiyuan Shanxi 030032, China
  • Received: 2021-07-14 Revised: 2021-08-16 Accepted: 2021-08-23 Online: 2021-08-16 Published: 2022-01-10
  • Contact: Yueqin ZHANG
  • About the authors: WANG Runze, born in 1997, M. S. candidate. His research interests include graph representation learning and biometric identification.
    ZHANG Yueqin, born in 1963, M. S., professor. Her research interests include data mining and intelligent information processing.
    QIN Qiqi, born in 1996, M. S. candidate. Her research interests include recommendation systems.
    ZHANG Zehua, born in 1981, Ph. D., lecturer. His research interests include soft computing, machine learning, biometric identification, social networks, and complex network pattern analysis.
    GUO Xumin, born in 1987, M. S., lecturer. His research interests include big data.
  • Supported by:
    the National Natural Science Foundation of China (61702356); Industry-University Cooperation Education Program of Ministry of Education; Research Support Project for Returned Overseas Students in Shanxi Province

多视角多注意力融合分子特征的药物-靶标亲和力预测

王润泽1, 张月琴1, 秦琪琦1, 张泽华1, 郭旭敏2

  1. 太原理工大学 信息与计算机学院,太原 030600
  2. 山西青年职业学院 计算机与信息工程系,太原 030032
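  • Corresponding author: ZHANG Yueqin (张月琴)
  • Author biographies: WANG Runze (王润泽), born in 1997 in Dezhou, Shandong, male, M. S. candidate, CCF member. Main research interests: graph representation learning, biometric identification.
    ZHANG Yueqin (张月琴), born in 1963 in Taiyuan, Shanxi, female, professor, M. S., CCF member. Main research interests: data mining, intelligent information processing.
    QIN Qiqi (秦琪琦), born in 1996 in Changzhi, Shanxi, female, M. S. candidate. Main research interests: recommendation systems.
    ZHANG Zehua (张泽华), born in 1981 in Taiyuan, Shanxi, male, lecturer, Ph. D., CCF member. Main research interests: soft computing, machine learning, biometric identification, social networks, complex network pattern analysis.
    GUO Xumin (郭旭敏), born in 1987 in Taiyuan, Shanxi, male, lecturer, M. S. Main research interests: big data.
  • Funding:
    National Natural Science Foundation of China (61702356); Industry-University Cooperation Collaborative Education Program of the Ministry of Education; Research Support Project for Returned Overseas Students in Shanxi Province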

Abstract:

Deep learning has recently attracted great attention for the task of Drug-Target Affinity (DTA) prediction. However, most existing works embed a single molecular structure as a vector and ignore the information gain that multi-aspect fusion of molecular features contributes to the final feature representation. To address the feature incompleteness of single-structure molecular representations, an end-to-end deep learning method based on attentive fusion of multi-aspect molecular features was proposed for DTA prediction. Its core modules are Multi-aspect molecular structure embedding (Mas) and Multi-attention feature fusion (Mat). Firstly, the multi-aspect molecular structures were embedded into the feature vector space by the Mas module. Secondly, an attention mechanism at the molecular feature level was incorporated through the Mat module to perform weighted fusion of the molecular features from different aspects. Thirdly, the two feature representations were concatenated (feature cascade) according to Drug-Target Interaction (DTI). Finally, a fully connected neural network was used to predict the affinity by regression. Experiments on the Davis and KIBA datasets were carried out to evaluate the influence of training ratio, multi-aspect feature incorporation, multi-attention fusion, and related parameters on affinity prediction performance. Compared with the GraphDTA method, the proposed method reduced the Mean Square Error (MSE) by 4.8% and 6% on the Davis and KIBA datasets, respectively. Experimental results show that attentive fusion of multi-aspect molecular features can capture the molecular features that are more relevant to linkages at protein target sites.

Key words: Drug-Target Affinity (DTA) prediction, multi-attention molecular feature fusion, multi-aspect molecular structure embedding, molecular feature level, attention mechanism
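
The four steps named in the abstract (embedding, attention-weighted fusion, feature cascade, regression) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architecture, so the dot-product scoring against a `query` vector, the single linear layer standing in for the fully connected network, and all function and variable names below are assumptions chosen for illustration. The aspect vectors are random placeholders rather than real Mas embeddings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fusion(aspect_vectors, query):
    """Fuse same-length aspect vectors with attention weights.

    Assumption: each aspect is scored by its dot product with `query`
    (a stand-in for a learned parameter). The scores are normalized with
    softmax and used as weights in a weighted sum of the aspect vectors.
    """
    scores = [sum(a * q for a, q in zip(vec, query)) for vec in aspect_vectors]
    weights = softmax(scores)
    dim = len(query)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, aspect_vectors))
        for i in range(dim)
    ]
    return fused, weights


def predict_affinity(drug_aspects, target_vector, query, fc_weights, fc_bias):
    """Fuse drug aspects, concatenate with target features, apply one linear layer."""
    fused, _ = attentive_fusion(drug_aspects, query)
    joint = fused + target_vector  # list concatenation (the "feature cascade")
    return sum(w * x for w, x in zip(fc_weights, joint)) + fc_bias


def mse(predictions, labels):
    """Mean square error between predicted and measured affinities."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Three hypothetical aspect embeddings of one drug molecule.
    drug_aspects = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
    target_vector = [rng.uniform(-1, 1) for _ in range(dim)]
    query = [rng.uniform(-1, 1) for _ in range(dim)]
    fc_weights = [rng.uniform(-1, 1) for _ in range(2 * dim)]
    fused, weights = attentive_fusion(drug_aspects, query)
    print("attention weights:", weights)
    print("predicted affinity:",
          predict_affinity(drug_aspects, target_vector, query, fc_weights, 0.0))
```

In a trained model the `query` vector and the regression weights would be learned jointly by minimizing the MSE; here they are fixed so that the data flow stays visible. The softmax keeps the aspect weights positive and summing to one, so the fused vector is a convex combination of the aspect vectors.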

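Abstract (translated from the Chinese original):

Deep learning has recently received great attention on the Drug-Target Affinity (DTA) task. However, existing works mostly embed a single molecular structure as a vector, ignoring the information gain that multi-aspect fusion of molecular features provides to the final feature representation. To address the feature incompleteness of single-structure molecules, an end-to-end deep learning method for DTA prediction based on attentive fusion of multi-aspect molecular features is proposed, whose core modules are Multi-aspect molecular structure embedding (Mas) and Multi-attention feature fusion (Mat). First, the Mas module embeds the multi-aspect molecular structures into the feature vector space. Then, the Mat module incorporates an attention mechanism at the molecular feature level to perform weighted fusion of the molecular features from different aspects. Next, the features of the two are concatenated according to Drug-Target Interaction (DTI). Finally, a fully connected neural network predicts the affinity by regression. Experiments on the Davis and KIBA datasets verify the influence of training ratio, multi-aspect feature incorporation, multi-attention fusion, and related parameters on affinity prediction performance. Compared with the GraphDTA method, the Mean Square Error (MSE) of the proposed method is reduced by 4.8% and 6% on the Davis and KIBA datasets, respectively. The experimental results show that attentive fusion of multi-aspect molecular features can capture molecular features that are more relevant to linkages at protein target sites.

Key words (translated from the Chinese original): drug-target affinity prediction, multi-attention molecular feature fusion, multi-aspect molecular structure embedding, molecular feature level, attention mechanism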

CLC Number: