Chinese medical named entity recognition fused with prompt and metric learning

doi:10.11772/j.issn.1001-9081.2025101291

Journal of Computer Applications

Received: 2025-11-04; Revised: 2026-01-05; Accepted: 2026-01-08; Online: 2026-01-12; Published: 2026-01-12

Li NIU^1,2, Na LIU^1,2*, Nianshuai ZHANG^1,2, Weiye KANG^1,2, Xinyan MA^1,2

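1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021;
2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission (North Minzu University), Yinchuan 750021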

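Corresponding author: Na LIU
Supported by:
National Natural Science Foundation of China; Ningxia Key Research and Development Program (Talent Introduction Special Project); Natural Science Foundation of Ningxia; North Minzu University Scientific Research Project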

Abstract

Abstract: The Chinese medical Named Entity Recognition (NER) task aims to extract medical entities from unstructured medical text and assign them to predefined medical entity categories. To address sample scarcity, semantic diversity, and cross-context distribution differences in Chinese medical NER, a unified framework integrating prompt learning and metric learning was proposed, called Prompted-by-Synonyms cross-set-aligned Uncertainty-Guided aggregation (PS-UG). First, a synonym-aware interrogative prompt template was constructed based on the Unified Medical Language System (UMLS) to explicitly expose label semantics and improve category separability. Second, a bidirectional cross-set attention and gating fusion mechanism was designed to achieve accurate semantic alignment between the support set and the query set. Finally, character- and word-level representations were mapped to diagonal Gaussian distributions, and symmetric Kullback-Leibler (KL) divergence was used for category discrimination as a metric that is consistent between training and inference. Experiments were conducted on the CCKS2019 and IMCS-V2-NER medical datasets and the CLUENER2020 general-domain dataset under 5-way 1~2-shot and 5-way 5~10-shot settings. The experimental results show that PS-UG achieves significant improvements over the Prompt-based Metric Learning (ProML) baseline in both settings: 8.99 and 7.38 percentage points, respectively, on CCKS2019; 5.03 and 4.24 percentage points on IMCS-V2-NER; and 6.14 and 2.24 percentage points on CLUENER2020.

Key words: Named Entity Recognition (NER), few-shot learning, prompt learning, metric learning, Unified Medical Language System (UMLS) knowledge injection
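
The final step described in the abstract, comparing diagonal Gaussian representations with symmetric KL divergence, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy two-dimensional vectors, the labels `disease` and `drug`, the nearest-class decision rule, and the choice to define the symmetric divergence as the sum KL(p||q) + KL(q||p) (rather than, say, its half) are all assumptions made for illustration.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetric KL, taken here (an assumption) as KL(p||q) + KL(q||p)."""
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))

def nearest_class(query, class_gaussians):
    """Return the label whose Gaussian is closest to the query Gaussian.

    query: (mean, variance) pair; class_gaussians: {label: (mean, variance)}.
    """
    q_mu, q_var = query
    return min(
        class_gaussians,
        key=lambda label: symmetric_kl(q_mu, q_var, *class_gaussians[label]),
    )

# Hypothetical class distributions and one query token representation.
classes = {
    "disease": ([1.0, 0.0], [1.0, 1.0]),
    "drug": ([-1.0, 0.0], [1.0, 1.0]),
}
query = ([0.9, 0.1], [1.0, 1.0])
predicted = nearest_class(query, classes)
```

Because the divergence is symmetric, the same distance can score support-to-query and query-to-support comparisons alike, which is one plausible reading of why the abstract calls the metric consistent between training and inference.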

CLC Number: TP391.1

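Li NIU, Na LIU, Nianshuai ZHANG, Weiye KANG, Xinyan MA. Chinese medical named entity recognition fused with prompt and metric learning [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025101291.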
