Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (4): 1050-1057. DOI: 10.11772/j.issn.1001-9081.2025040517

• Artificial Intelligence •

Key phrase extraction model based on multi-perspective information enhancement and hierarchical weighting

Jie HU1,2,3, Pengcheng LI1, Jun SUN1,2,3, Jiaao ZHANG1

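  1. School of Computer Science, Hubei University, Wuhan Hubei 430062, China
  2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan Hubei 430062, China
  3. Engineering Research Center of Hubei Province in Intelligent Government Affairs and Application of Artificial Intelligence (Hubei University), Wuhan Hubei 430062, China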
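  • Received: 2025-05-12  Revised: 2025-07-15  Accepted: 2025-07-17  Online: 2025-07-22  Published: 2026-04-10
  • Corresponding author: Jun SUN
  • About the authors: HU Jie, born in 1977 in Hanchuan, Hubei, female, Ph. D., professor. Her research interests include complex semantic big data management and natural language processing.
    LI Pengcheng, born in 2000 in Zaoyang, Hubei, male, M. S. candidate. His research interests include natural language processing.
    ZHANG Jiaao, born in 2001 in Luohe, Henan, male, M. S. candidate. His research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61977021)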

Abstract:

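Existing unsupervised key phrase extraction models capture complex context and multi-level semantic information poorly, and therefore fail to acquire multi-dimensional information. To address this, a key phrase extraction model based on multi-perspective information enhancement and hierarchical weighting was proposed. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model was used to encode the text and the candidate phrases into embedding representations; the text embeddings were then refined by weighted average pooling, and their global similarity to each candidate phrase was calculated, achieving global information enhancement and improving the understanding of semantic associations. Secondly, a graph-based, boundary-aware local centrality calculation method was proposed to strengthen the capture of local information. Finally, multiple factors were fused to compute weights, so that the importance of each candidate phrase was evaluated from several dimensions. Experimental results on six public datasets, including Inspec, SemEval 2017, and SemEval-2010, show that, compared with the baseline model PromptRank, the proposed model improves F1@5 by 0.87 to 2.68 percentage points, F1@10 by 1.11 to 2.24 percentage points, and F1@15 by 0.54 to 2.25 percentage points, indicating that the overall performance of the proposed model is effectively improved.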
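As a rough illustration of the global information enhancement step described in the abstract, the sketch below pools token vectors into a single text embedding with a weighted average and ranks candidate phrases by their similarity to it. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy 3-dimensional vectors stand in for BERT embeddings, and the token weights, the helper names `weighted_average_pool`, `cosine_similarity` and `rank_by_global_similarity`, and the use of cosine similarity are all hypothetical, since the abstract does not specify them.

```python
import math


def weighted_average_pool(token_vectors, weights):
    """Return the weighted mean of the token vectors (weights are normalised)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(token_vectors[0])
    pooled = [0.0] * dim
    for vec, w in zip(token_vectors, weights):
        for i in range(dim):
            pooled[i] += (w / total) * vec[i]
    return pooled


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_global_similarity(doc_vector, candidates):
    """Sort (phrase, score) pairs by descending similarity to the text embedding."""
    scored = [(p, cosine_similarity(doc_vector, v)) for p, v in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Assumption: toy vectors and weights, standing in for BERT token embeddings
# and for whatever weighting scheme the paper actually uses.
token_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
token_weights = [3.0, 1.0, 0.0]
doc_vector = weighted_average_pool(token_vectors, token_weights)

candidates = {
    "phrase a": [1.0, 0.0, 0.0],
    "phrase b": [0.0, 1.0, 0.0],
    "phrase c": [0.0, 0.0, 1.0],
}
ranking = rank_by_global_similarity(doc_vector, candidates)
for phrase, score in ranking:
    print(f"{phrase}: {score:.3f}")
```

Because the first token carries most of the weight, the candidate aligned with it ranks first. In the model described above, this global score would be only one of several factors combined with local centrality and other weights.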

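Key words: unsupervised learning, key phrase extraction, information enhancement, BERT (Bidirectional Encoder Representations from Transformers) model, hierarchical weighting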
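The F1@5, F1@10 and F1@15 scores quoted in the abstract compare the top K predicted phrases with the gold key phrases of a document. The sketch below shows one common way to compute such a score. It assumes exact string matching and a prediction list without duplicates; the function name `f1_at_k` and the toy phrases are hypothetical, and published evaluations often add stemming or other normalisation that is omitted here.

```python
def f1_at_k(predicted, gold, k):
    """F1 between the top-k predicted phrases and the gold phrases.

    Assumptions: exact string match, no duplicate predictions, and precision
    computed over the phrases actually returned (at most k).
    """
    top_k = predicted[:k]
    gold_set = set(gold)
    if not top_k or not gold_set:
        return 0.0
    correct = len(set(top_k) & gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(top_k)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 of the top 5 predictions appear among 4 gold phrases.
predicted = ["a", "b", "c", "d", "e", "f"]
gold = ["b", "e", "x", "y"]
score = f1_at_k(predicted, gold, k=5)
print(f"F1@5 = {100 * score:.2f}")
```

Here precision is 2/5 and recall is 2/4, so F1@5 is 4/9. A gain stated in percentage points, as in the abstract, is the difference between two such scores expressed on a 0 to 100 scale.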

CLC number: