Journal of Computer Applications


Chinese medical named entity recognition fused with prompt and metric learning

  

  • Received:2025-11-04 Revised:2026-01-05 Accepted:2026-01-08 Online:2026-01-12 Published:2026-01-12


牛莉1,2, 刘纳1,2*, 张念帅1,2, 康伟业1,2, 马鑫妍1,2

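  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021;
  2. Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission (North Minzu University), Yinchuan 750021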

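  • Corresponding author: 刘纳
  • Supported by:
    National Natural Science Foundation of China; Ningxia Key Research and Development Program (Special Project for Talent Introduction); Natural Science Foundation of Ningxia; North Minzu University scientific research project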

Abstract: Chinese medical Named Entity Recognition (NER) aims to extract medical entities from unstructured medical text and assign them to predefined medical entity categories. To address sample scarcity, semantic diversity, and cross-context distribution differences in Chinese medical NER, a unified framework integrating prompt learning and metric learning was proposed, called Prompted-by-Synonyms cross-set-aligned Uncertainty-Guided aggregation (PS-UG). First, a synonym-aware interrogative prompt template was constructed based on the Unified Medical Language System (UMLS) to expose label semantics explicitly and improve category separability. Second, a bidirectional cross-set attention and gated fusion mechanism was designed to achieve accurate semantic alignment between the support set and the query set. Finally, character- and word-level representations were mapped to diagonal Gaussian distributions, and the symmetric Kullback-Leibler (KL) divergence was used for category discrimination as a metric that is consistent between training and inference. Experiments were conducted on the CCKS2019 and IMCS-V2-NER medical datasets and the CLUENER2020 general-domain dataset under 5-way 1~2-shot and 5-way 5~10-shot settings. The results show that PS-UG achieves significant improvements over the Prompt-based Metric Learning (ProML) baseline, with respective gains of 8.99 and 7.38 percentage points on CCKS2019, 5.03 and 4.24 percentage points on IMCS-V2-NER, and 6.14 and 2.24 percentage points on CLUENER2020.
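
The final step of the abstract, discriminating categories with a symmetric KL divergence between diagonal Gaussians, can be illustrated with a minimal sketch. This is not the PS-UG implementation: the function names, the class labels, the toy means and variances, and the choice to average the two KL directions are all assumptions made for illustration, and the abstract does not say how class-level distributions are built from the support set.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal Gaussians given as lists of means and variances."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); p and q are (means, variances) pairs.

    Averaging (rather than summing) the two directions is an assumption here.
    """
    mu_p, var_p = p
    mu_q, var_q = q
    forward = kl_diag_gauss(mu_p, var_p, mu_q, var_q)
    backward = kl_diag_gauss(mu_q, var_q, mu_p, var_p)
    return 0.5 * (forward + backward)

def classify(query, class_gaussians):
    """Return the label whose Gaussian has the smallest symmetric KL to the query."""
    return min(class_gaussians, key=lambda label: symmetric_kl(query, class_gaussians[label]))

# Hypothetical 2-dimensional class distributions (labels and values are made up).
class_gaussians = {
    "disease": ([1.0, 0.0], [0.5, 0.5]),
    "drug": ([-1.0, 2.0], [0.5, 0.5]),
}
query = ([0.8, 0.1], [0.4, 0.6])
print(classify(query, class_gaussians))
```

Because the same divergence can serve as both the training objective and the inference-time distance, a sketch like this reflects the "consistent between training and inference" property the abstract describes; in practice the means and variances would come from a learned encoder rather than hand-written lists.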

Key words: Named Entity Recognition (NER), few-shot learning, prompt learning, metric learning, Unified Medical Language System (UMLS) knowledge injection


