Entity disambiguation method for private domain data QA via continuous learning

doi:10.11772/j.issn.1001-9081.2026020209

Journal of Computer Applications

Received: 2026-03-09  Revised: 2026-03-19  Online: 2026-04-22  Published: 2026-04-22


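Zhiyuan CUI (崔智源), Junbo ZHANG (张钧波)

Southwest Jiaotong University

Corresponding author: Zhiyuan CUI
Funding:
Research on Operation Management and Resilience Recovery of Urban Lifeline Engineering under Insufficient Information (National Natural Science Foundation of China); 2025 Interdisciplinary - Li Bo + Zhang Junbo (Nova Program of Science and Technology, Beijing High-Level Innovation and Entrepreneurship Talent Support Program)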

Abstract

To address ambiguous entity references in private domain data question answering, which arise from the semantic gap between colloquial queries and database schemas, this paper proposes a Continuous Learning based Progressive Entity Disambiguation framework (CL-PED). First, a memory-driven rewriting module transforms queries into standardized expressions. Second, a hybrid retrieval strategy quickly filters out unambiguous candidates. Next, an Execution-Decision architecture, in which large and small models collaborate, performs deep ambiguity discrimination at low cost. Finally, a continuous learning mechanism based on experience replay converts user feedback into weakly supervised signals for fine-tuning the model, achieving closed-loop evolution. Evaluation on datasets from real-world application scenarios shows that the proposed method improves query clarity classification accuracy by 7.19 percentage points over Retrieval-Augmented Generation (RAG) methods. In the continuous learning experiments, the framework achieves an average accuracy of 84.22% in incremental learning scenarios, 0.78 percentage points higher than full retraining, while requiring only 46.6% of its training time, effectively balancing the absorption of new knowledge with the retention of old knowledge.
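
The experience-replay step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the buffer design, so everything here is an assumption made for illustration: the `FeedbackSample` schema, the `ReplayBuffer` class, first-in-first-out eviction, uniform sampling, and the `replay_ratio` parameter are hypothetical and are not taken from CL-PED itself. The sketch only shows the general idea of mixing new feedback with replayed old samples before each fine-tuning round.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackSample:
    """One weakly supervised example derived from user feedback (hypothetical schema)."""
    query: str
    label: str  # e.g. "clear" or "ambiguous"


class ReplayBuffer:
    """Fixed-capacity store of past samples; the oldest entries are evicted first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, samples) -> None:
        self._items.extend(samples)

    def sample(self, k: int) -> list:
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(k, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def build_training_batch(new_samples, buffer: ReplayBuffer, replay_ratio: float = 0.5) -> list:
    """Mix new feedback with replayed old samples, then store the new ones."""
    n_replay = int(len(new_samples) * replay_ratio)
    batch = list(new_samples) + buffer.sample(n_replay)
    buffer.add(new_samples)  # new samples become replay candidates for later rounds
    return batch


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    round1 = [FeedbackSample("sales of product A last month", "clear"),
              FeedbackSample("how is the project going", "ambiguous")]
    round2 = [FeedbackSample("headcount in the east region", "clear"),
              FeedbackSample("show me the numbers", "ambiguous")]
    print(len(build_training_batch(round1, buffer)))  # buffer is empty, so only new samples
    print(len(build_training_batch(round2, buffer)))  # new samples plus replayed old ones
```

Each round's batch would then be passed to whatever fine-tuning routine the system uses. Replaying a fraction of old samples is a common way to limit forgetting without retraining on the full history, which matches the trade-off the abstract reports between accuracy and training time.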

Key words: private domain data question answering, entity disambiguation, continuous learning, small and large model synergy, experience replay


CLC Number:

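Cite this article: Zhiyuan CUI, Junbo ZHANG. Entity disambiguation method for private domain data QA via continuous learning [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2026020209.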
