Journal of Computer Applications


Key phrase extraction model based on multi-perspective information enhancement and hierarchical weighting

  

  • Received:2025-05-12 Revised:2025-07-15 Accepted:2025-07-17 Online:2025-07-22 Published:2025-07-22


HU Jie1,2,3, LI Pengcheng1, SUN Jun1,2,3*, ZHANG Jia'ao1

  1. School of Computer Science, Hubei University, Wuhan 430062, China; 2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan 430062, China; 3. Hubei Engineering Research Center of Smart Government and Artificial Intelligence Application, Wuhan 430062, China


  • Corresponding author: SUN Jun
  • Supported by:
    Research on Knowledge-Graph-Based Intelligent Recommendation Methods for Major-Specific College Entrance Examination Applications

Abstract: Existing unsupervised key phrase extraction models capture complex context and multi-level semantic information insufficiently and cannot obtain multi-dimensional information. To address this, a key phrase extraction model based on multi-perspective information enhancement and hierarchical weighting was proposed. Firstly, the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model was used to encode the text and the candidate phrases to obtain embedding representations; the text embedding was optimized through weighted average pooling, and the global similarity between the text and the candidate phrases was computed, thereby achieving global information enhancement and improving the understanding of semantic associations. Secondly, a graph-based boundary-aware local centrality computation method was proposed to strengthen the acquisition of local information. Finally, multiple factors were fused in the weight computation to evaluate the importance of candidate phrases from multiple dimensions. The model was validated on six public datasets: Inspec, SemEval2017, SemEval2010, DUC2001, NUS, and Krapivin. Compared with the baseline model PromptRank, the proposed model improves F1@5 on the six datasets by 2.16, 2.68, 2.10, 1.44, 1.90, and 0.87 percentage points respectively, F1@10 by 2.24, 1.68, 1.95, 1.73, 1.30, and 1.11 percentage points, and F1@15 by 2.25, 1.54, 1.79, 1.50, 0.64, and 0.92 percentage points. The experimental results demonstrate that the overall performance of the proposed model is effectively improved.

Key words: unsupervised learning, key phrase extraction, information enhancement, BERT (Bidirectional Encoder Representations from Transformers) model, hierarchical weighting
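As a rough illustration of the global-information-enhancement step described in the abstract, the sketch below pools token embeddings with a weighted average and ranks candidate phrases by cosine similarity to the pooled document vector. The random vectors stand in for real BERT outputs, and the pooling weights and phrase names are hypothetical; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def weighted_average_pool(token_embs, weights):
    """Pool token embeddings into a single document vector
    using normalized per-token weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ token_embs  # shape: (dim,)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dim = 8

# Stand-ins for BERT token embeddings of a 5-token document.
token_embs = rng.normal(size=(5, dim))
doc_vec = weighted_average_pool(token_embs, weights=[1, 2, 2, 1, 1])

# Stand-ins for candidate-phrase embeddings; score each phrase
# by its global similarity to the pooled document vector.
candidates = {
    "phrase_a": rng.normal(size=dim),
    "phrase_b": rng.normal(size=dim),
}
scores = {p: cosine(doc_vec, v) for p, v in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In the full model this global score would be only one of several fused factors (the abstract also mentions boundary-aware local centrality and hierarchical weighting), so the ranking above reflects the global-similarity component alone.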
