Journal of Computer Applications (《计算机应用》), 2023, Vol. 43, Issue (7): 2125-2132. DOI: 10.11772/j.issn.1001-9081.2022060872

• Artificial intelligence •

基于语义与全局双重注意力机制的长链非编码RNA-疾病关联预测模型

张奕1,2, 蔡钢生1, 王真梅1

  1. 桂林理工大学 信息科学与工程学院, 广西 桂林 541006
  2. 广西嵌入式技术与智能系统重点实验室(桂林理工大学), 广西 桂林 541006
  • 收稿日期:2022-06-16 修回日期:2022-08-22 接受日期:2022-08-30 发布日期:2022-09-22 出版日期:2023-07-10
  • 通讯作者: 蔡钢生
  • 作者简介:张奕(1977—),女,江西九江人,教授,博士,主要研究方向:机器学习、推荐系统;
    蔡钢生(1995—),男,广东揭阳人,硕士研究生,主要研究方向:机器学习、生物信息学;
    王真梅(1996—),女,广西玉林人,硕士研究生,主要研究方向:生物信息学。
  • 基金资助:
    国家自然科学基金资助项目(62166014);广西自然科学基金资助项目(2020GXNSFAA297255)

Long non-coding RNA-disease association prediction model based on semantic and global dual attention mechanism

Yi ZHANG1,2, Gangsheng CAI1, Zhenmei WANG1

  1. College of Information Science and Engineering, Guilin University of Technology, Guilin Guangxi 541006, China
  2. Guangxi Key Laboratory of Embedded Technology and Intelligent System (Guilin University of Technology), Guilin Guangxi 541006, China
  • Received: 2022-06-16 Revised: 2022-08-22 Accepted: 2022-08-30 Online: 2022-09-22 Published: 2023-07-10
  • Contact: Gangsheng CAI
  • About author: ZHANG Yi, born in 1977, Ph.D., professor. Her research interests include machine learning and recommender systems.
    CAI Gangsheng, born in 1995, M.S. candidate. His research interests include machine learning and bioinformatics.
    WANG Zhenmei, born in 1996, M.S. candidate. Her research interests include bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (62166014); Guangxi Natural Science Foundation (2020GXNSFAA297255)

摘要:

针对现有长链非编码RNA (lncRNA)-疾病关联预测模型在综合利用异构生物网络的交互、语义信息上存在局限性的问题,提出一种基于语义与全局双重注意力机制的lncRNA-疾病关联预测模型(SGALDA)。首先,基于相似性和已知关联构建一个lncRNA-疾病-微小RNA(miRNA)异构网络,并基于消息传递类型设计特征提取模块来提取和融合异构网络上同质、异质节点的邻域特征,以捕捉异构网络上的多层面交互关系。其次,基于元路径将异构网络分解为多个语义子网络,并分别在各个子网络上应用图卷积网络(GCN)来提取节点的语义特征,以捕捉异构网络上的高阶交互关系。然后,基于语义与全局双重注意力机制融合节点的语义和邻域特征,以获得更具代表性的节点特征。最后,利用lncRNA节点特征和疾病节点特征的内积运算重建lncRNA-疾病关联。5折交叉验证结果显示,SGALDA的受试者工作特征曲线下面积(AUROC)为0.994 5±0.000 2,PR曲线下面积(AUPR)为0.916 7±0.001 1,在所有对比模型中均为最高,验证了SGALDA良好的预测性能。对乳腺癌、胃癌的案例研究进一步证实了SGALDA识别潜在lncRNA-疾病关联的能力,说明SGALDA有潜力成为一种可靠的lncRNA-疾病关联预测模型。

关键词: 关联预测, 异构网络, 元路径, 双重注意力, 图卷积网络, 长链非编码RNA

Abstract:

To address the limitations of existing long non-coding RNA (lncRNA)-disease association prediction models in comprehensively utilizing the interaction and semantic information of heterogeneous biological networks, an lncRNA-Disease Association prediction model based on Semantic and Global dual Attention mechanism (SGALDA) was proposed. Firstly, an lncRNA-disease-microRNA (miRNA) heterogeneous network was constructed based on similarity and known associations, and a feature extraction module based on message passing types was designed to extract and fuse the neighborhood features of homogeneous and heterogeneous nodes, so as to capture the multi-level interactions on the heterogeneous network. Secondly, the heterogeneous network was decomposed into multiple semantic sub-networks based on meta-paths, and a Graph Convolutional Network (GCN) was applied on each sub-network to extract the semantic features of nodes, so as to capture the high-order interactions on the heterogeneous network. Thirdly, a semantic and global dual attention mechanism was used to fuse the semantic and neighborhood features of the nodes to obtain more representative node features. Finally, lncRNA-disease associations were reconstructed by taking the inner product of lncRNA node features and disease node features. The 5-fold cross-validation results show that the Area Under Receiver Operating Characteristic curve (AUROC) of SGALDA is 0.9945±0.0002 and the Area Under Precision-Recall curve (AUPR) is 0.9167±0.0011, both the highest among all the comparison models, which verifies the good prediction performance of SGALDA. Case studies on breast cancer and stomach cancer further confirm the ability of SGALDA to identify potential lncRNA-disease associations, indicating that SGALDA has the potential to be a reliable lncRNA-disease association prediction model.

Key words: association prediction, heterogeneous network, meta-path, dual attention, Graph Convolutional Network (GCN), long non-coding RNA (lncRNA)
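
The semantic branch outlined in the abstract (meta-path sub-networks, a GCN on each sub-network, attention-based fusion, inner-product reconstruction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy association matrices, the fixed weights, the helper names, and the simplified attention score (mean of query · tanh(h)) are all hypothetical. The neighborhood feature extraction module and the global attention step are omitted, and a real model would learn its weights with a tensor library.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def metapath_adjacency(assoc):
    """Link two row-nodes that share at least one column-node,
    e.g. lncRNA-disease-lncRNA (assumed form of a semantic sub-network)."""
    prod = matmul(assoc, transpose(assoc))
    n = len(prod)
    return [[1.0 if i != j and prod[i][j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a standard GCN layer."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(A_norm X W)."""
    h = matmul(matmul(normalize_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in h]

def semantic_attention(embeddings, query):
    """Fuse per-meta-path embeddings with softmax weights over meta-paths.
    Simplified score: mean over nodes of query . tanh(h)."""
    scores = [sum(sum(math.tanh(v) * q for v, q in zip(row, query))
                  for row in emb) / len(emb) for emb in embeddings]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    n, d = len(embeddings[0]), len(embeddings[0][0])
    fused = [[sum(w * emb[i][k] for w, emb in zip(weights, embeddings))
              for k in range(d)] for i in range(n)]
    return fused, weights

def inner_product_decoder(lnc_feats, dis_feats):
    """Association scores: sigmoid of lncRNA-disease feature inner products."""
    raw = matmul(lnc_feats, transpose(dis_feats))
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in raw]

# Hypothetical toy data: 3 lncRNAs, 2 diseases, 2 miRNAs.
lnc_dis = [[1, 0], [1, 1], [0, 1]]   # lncRNA-disease associations
lnc_mir = [[1, 0], [0, 1], [0, 1]]   # lncRNA-miRNA associations
dis_mir = [[1, 1], [0, 1]]           # disease-miRNA associations
# Fixed illustrative weights; shared across meta-paths only for brevity.
w_l = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]
w_d = [[0.3, 0.2], [-0.1, 0.5]]
query = [0.7, -0.4]

lnc_views = [gcn_layer(metapath_adjacency(a), identity(3), w_l)
             for a in (lnc_dis, lnc_mir)]
dis_views = [gcn_layer(metapath_adjacency(a), identity(2), w_d)
             for a in (transpose(lnc_dis), dis_mir)]
lnc_emb, w_lnc = semantic_attention(lnc_views, query)
dis_emb, w_dis = semantic_attention(dis_views, query)
scores = inner_product_decoder(lnc_emb, dis_emb)
for row in scores:
    print([round(s, 3) for s in row])
```

Each row of `scores` corresponds to one lncRNA and each column to one disease. In a trained model these scores would be compared against held-out known associations, for example in the 5-fold cross-validation reported in the abstract.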

CLC number: