Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (5): 1324-1329.DOI: 10.11772/j.issn.1001-9081.2021030508

• Artificial intelligence •

Short text classification method by fusing corpus features and graph attention network

Shigang YANG, Yongguo LIU

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
  • Received: 2021-04-06 Revised: 2021-06-18 Accepted: 2021-06-21 Online: 2022-06-11 Published: 2022-05-10
  • Contact: Yongguo LIU
  • About the authors: YANG Shigang, born in 1998, M.S. candidate. His research interests include text classification.
    LIU Yongguo, born in 1974, Ph.D., professor. His research interests include digital medicine, computational health, artificial intelligence, and big data.
  • Supported by:
    National Key Research and Development Program of China (2017YFC1703905); National Natural Science Foundation of China (81803851); Key Research and Development Program of Sichuan Province (2020YFS0372); Application Basic Research and Development Program of Sichuan Province (2021YJ0184)

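  • Author biographies (translated from the Chinese record): YANG Shigang (born 1998), male, from Guang'an, Sichuan, M.S. candidate; main research interest: text classification.
    LIU Yongguo (born 1974), male, from Mianyang, Sichuan, professor, Ph.D.; main research interests: digital medicine, computational health, artificial intelligence, big data. E-mail: liuyg@uestc.edu.cn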

Abstract:

Short text classification is an important research problem in Natural Language Processing (NLP) and is widely used in news classification, sentiment analysis, comment analysis and other fields. To address data sparsity in short text classification, a new graph attention network named Node-Edge GAT (NE-GAT) was proposed, which builds on the Graph Attention Network (GAT) and fuses node and edge weight features drawn from the corpus. Firstly, a heterogeneous graph was constructed for each corpus, the Gravity Model (GM) was used to evaluate the importance of word nodes, and edge weights were obtained from the Pointwise Mutual Information (PMI) between nodes. Secondly, a text-level graph was constructed for each sentence, and node importance and edge weights were integrated into the node update process. Experimental results show that the average accuracy of the proposed model on the test sets reaches 75.48%, which is higher than that of models such as Text Graph Convolutional Network (Text-GCN), Text-Level-Graph Neural Network (TL-GNN) and Text classification method for INductive word representations via Graph neural networks (Text-ING). Compared with the original GAT, the proposed model improves the average accuracy by 2.32 percentage points, which verifies its effectiveness.
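
The abstract states that edge weights come from the PMI between word nodes but does not give the estimation details. The following is a minimal sketch of one common way to estimate PMI from sliding-window co-occurrence counts, not the authors' implementation. The function name `pmi_edge_weights`, the window size, and the choice to keep only pairs with positive PMI are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edge_weights(docs, window_size=3):
    """Return {(word_a, word_b): PMI} for word pairs with positive PMI.

    docs is a list of token lists. window_size and the positive-PMI
    filter are illustrative assumptions, not taken from the paper.
    """
    word_windows = Counter()  # number of windows containing each word
    pair_windows = Counter()  # number of windows containing each word pair
    total_windows = 0

    for tokens in docs:
        if not tokens:
            continue
        # A short text may be shorter than the window: use a single window.
        n_windows = max(1, len(tokens) - window_size + 1)
        for start in range(n_windows):
            # Unique words, sorted so each pair has one canonical key.
            window = sorted(set(tokens[start:start + window_size]))
            total_windows += 1
            word_windows.update(window)
            pair_windows.update(combinations(window, 2))

    weights = {}
    for (a, b), n_ab in pair_windows.items():
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), with each
        # probability estimated as a fraction of all windows.
        pmi = math.log(n_ab * total_windows / (word_windows[a] * word_windows[b]))
        if pmi > 0:
            weights[(a, b)] = pmi
    return weights


if __name__ == "__main__":
    corpus = [["good", "movie"], ["good", "movie"], ["bad", "plot"], ["bad", "plot"]]
    for pair, weight in sorted(pmi_edge_weights(corpus).items()):
        print(pair, round(weight, 4))
```

In this sketch, a word pair that co-occurs more often than its individual frequencies would predict receives a positive weight, while pairs at or below chance are dropped, so the resulting corpus-level graph stays sparse.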

Key words: short text classification, Graph Attention Network (GAT), corpus feature, Gravity Model (GM), Pointwise Mutual Information (PMI)

CLC Number: