Journal of Computer Applications, 2025, Vol. 45, Issue (12): 3757-3763. DOI: 10.11772/j.issn.1001-9081.2024121814

• Artificial intelligence •

Multi-label classification method integrating external semantic knowledge

Jincai YANG, Qixu BAN, Xusheng YANG, Xianjun SHEN   

  1. School of Computer Science, Central China Normal University, Wuhan Hubei 430079, China
  • Received: 2024-12-25 Revised: 2025-03-15 Accepted: 2025-03-20 Online: 2025-03-27 Published: 2025-12-10
  • Contact: Qixu BAN
  • About author:YANG Jincai, born in 1967, Ph. D., professor. His research interests include database and information systems, Chinese information processing, natural language processing, artificial intelligence.
    BAN Qixu, born in 1998, M. S. candidate. His research interests include natural language processing.
    YANG Xusheng, born in 1996, Ph. D. candidate. His research interests include natural language processing.
    SHEN Xianjun, born in 1973, Ph. D., professor. His research interests include artificial intelligence and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (61977032); National Social Science Foundation of China (19BYY092)

融合外部语义知识的多标签分类方法

杨进才, 班启旭, 杨旭生, 沈显君   

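  1. School of Computer Science, Central China Normal University, Wuhan 430079
  • Corresponding author: BAN Qixu (班启旭)
  • About the authors: YANG Jincai (杨进才), male, born in 1967 in Xianning, Hubei, professor, Ph. D., CCF member. Main research interests: database and information systems, Chinese information processing, natural language processing, artificial intelligence.
    BAN Qixu (班启旭), male, born in 1998 in Xinxiang, Henan, M. S. candidate. Main research interests: natural language processing.
    YANG Xusheng (杨旭生), male, born in 1996 in Qinhuangdao, Hebei, Ph. D. candidate. Main research interests: natural language processing.
    SHEN Xianjun (沈显君), male, born in 1973 in Xiantao, Hubei, professor, Ph. D., CCF member. Main research interests: artificial intelligence, bioinformatics.
  • Funding:
    National Natural Science Foundation of China (61977032); National Social Science Foundation of China (19BYY092)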

Abstract:

Text classification is a crucial task in the Natural Language Processing (NLP) field, and multi-label classification is particularly challenging because of its large label space. To address this issue, a multi-label classification method integrating external semantic knowledge, named HSGIN (Heterogeneous Semantic Gated Interaction Network), was proposed, using values markers in children’s books as a case study. Firstly, text features were extracted through SBERT (Sentence Embeddings from Siamese BERT (Bidirectional Encoder Representations from Transformers)) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. Then, entities and relations in the Knowledge Graph (KG) were modeled jointly using a Heterogeneous Graph Transformer (HGT), and label features were extracted using prior knowledge and semantic associations. Finally, an attention mechanism was employed to fuse the text features and label features, generating distinct label feature representations. These representations were fed into a Gated Graph Neural Network (GGNN) to capture semantic dependencies and interaction patterns among labels for prediction. Experimental results show that, compared with BERT, the state-of-the-art comparison method, the proposed method achieves increases of 2.66, 0.47, and 1.16 percentage points in precision, recall, and F1 score, respectively. These results verify the effectiveness of the proposed method. In addition, precise analysis of values markers in children’s books helps in choosing healthy books for children.

Key words: Multi-Label Text Classification (MLTC), Knowledge Graph (KG), Heterogeneous Graph Transformer (HGT) architecture, Gated Graph Neural Network (GGNN), label correlation
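
The fusion and label-interaction stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the random arrays stand in for the SBERT + Bi-LSTM text features and the HGT label features, the scaled dot-product attention, the shared output vector `w_out`, the row-normalized label graph `adj`, and all function and parameter names are assumptions made for illustration. Only the overall flow (attention fusion of text and label features, followed by gated propagation over labels and a per-label score) follows the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def label_attention(text_feats, label_feats):
    """Fuse text features (T, d) with label features (K, d).

    Each label attends over the T text positions and receives its own
    weighted summary of the text, giving a (K, d) matrix.
    """
    d = text_feats.shape[1]
    scores = label_feats @ text_feats.T / np.sqrt(d)  # (K, T)
    attn = softmax(scores, axis=1)                    # each row sums to 1
    return attn @ text_feats, attn

def ggnn_step(h, adj, p):
    """One gated propagation step over the label graph (GRU-style update)."""
    m = adj @ h @ p["W_msg"]                          # messages from neighbours
    z = sigmoid(m @ p["W_z"] + h @ p["U_z"])          # update gate
    r = sigmoid(m @ p["W_r"] + h @ p["U_r"])          # reset gate
    h_new = np.tanh(m @ p["W_h"] + (r * h) @ p["U_h"])
    return (1.0 - z) * h + z * h_new

def predict(text_feats, label_feats, adj, p, steps=2):
    h, attn = label_attention(text_feats, label_feats)
    for _ in range(steps):
        h = ggnn_step(h, adj, p)
    # Assumed readout: one shared vector maps each label state to a score.
    return sigmoid(h @ p["w_out"]), attn

# Toy inputs: T text positions, K labels, feature size d (all hypothetical).
rng = np.random.default_rng(0)
T, K, d = 12, 5, 8
names = ["W_msg", "W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
params = {n: rng.normal(scale=0.1, size=(d, d)) for n in names}
params["w_out"] = rng.normal(scale=0.1, size=d)

text_feats = rng.normal(size=(T, d))    # stand-in for SBERT + Bi-LSTM output
label_feats = rng.normal(size=(K, d))   # stand-in for HGT label embeddings
adj = rng.random((K, K))
adj = adj / adj.sum(axis=1, keepdims=True)  # row-normalized label graph

probs, attn = predict(text_feats, label_feats, adj, params)
print(probs.shape, attn.shape)
```

Each entry of `probs` is an independent per-label probability, so thresholding it (for example at 0.5, another assumption) yields the predicted label set for one text.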

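Abstract:

Text classification is an important task in natural language processing (NLP), and its multi-label variant is difficult because of the large label space. To address this problem, taking values markers in children’s books as an example, a multi-label classification method integrating external semantic knowledge, HSGIN (Heterogeneous Semantic Gated Interaction Network), is proposed. First, text features are extracted with SBERT (Sentence embeddings from Siamese BERT (Bidirectional Encoder Representations from Transformers)) and a bidirectional long short-term memory (Bi-LSTM) network. Next, a Heterogeneous Graph Transformer (HGT) jointly models the entities and relations in a knowledge graph (KG), and label features are extracted using prior knowledge and semantic associations. Finally, the text features and label features are fused through attention to obtain distinct label feature representations, and a Gated Graph Neural Network (GGNN) is introduced to capture semantic dependencies and interaction patterns among labels and to make predictions. Experimental results show that, compared with BERT, the currently advanced comparison method, the proposed method improves precision, recall, and F1 score by 2.66, 0.47, and 1.16 percentage points, respectively. These results verify the effectiveness of the proposed method; in addition, accurate analysis of values markers in children’s books helps in selecting healthy reading material for children.

Key words: multi-label text classification, knowledge graph, heterogeneous graph transformer architecture, gated graph neural network, label correlation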

CLC Number: