Journal of Computer Applications
崔智源,张钧波
Abstract: In private-domain data question answering, the semantic gap between colloquial queries and database schemas leaves entity references ambiguous. To address this, this paper proposes a Continuous Learning-based Progressive Entity Disambiguation framework (CL-PED). First, a memory-driven rewriting module transforms queries into standardized expressions. Second, a hybrid retrieval strategy quickly filters out unambiguous candidates. Next, an Execution-Decision architecture, in which large and small models collaborate, performs deep ambiguity discrimination at low cost. Finally, a continuous learning mechanism based on experience replay converts user feedback into weakly supervised signals for fine-tuning the model, achieving closed-loop evolution. Evaluation on datasets from real-world application scenarios shows that the proposed method improves query clarity classification accuracy by 7.19 percentage points over Retrieval-Augmented Generation (RAG) methods. In the continuous learning experiments, the proposed method achieves an average accuracy of 84.22% in incremental learning scenarios, 0.78 percentage points higher than full retraining, while requiring only 46.6% of its training time, effectively balancing the absorption of new knowledge with the retention of old knowledge.
Key words: private domain data question answering, entity disambiguation, continuous learning, small and large model synergy, experience replay
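The abstract does not describe how the experience replay mechanism is implemented. As a rough illustration only, the sketch below shows one common form of experience replay: a fixed-size memory filled by reservoir sampling, with each fine-tuning batch mixing new feedback examples and replayed old ones. The names `FeedbackExample`, `ReplayBuffer`, `build_training_batch`, and the `replay_ratio` parameter are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FeedbackExample:
    """One weakly supervised example derived from user feedback (hypothetical schema)."""
    query: str
    is_clear: bool  # weak label: whether the query was judged unambiguous


@dataclass
class ReplayBuffer:
    """Fixed-size memory of past examples, filled by reservoir sampling."""
    capacity: int
    seed: int = 0
    items: list = field(default_factory=list)
    seen: int = 0  # total number of examples offered to the buffer

    def __post_init__(self):
        self._rng = random.Random(self.seed)

    def add(self, example):
        """Keep each example seen so far with equal probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        """Draw up to k stored examples without replacement."""
        return self._rng.sample(self.items, min(k, len(self.items)))


def build_training_batch(new_examples, buffer, replay_ratio=0.5):
    """Mix new feedback with replayed old examples, then store the new ones.

    replay_ratio is an assumed knob: replayed examples per new example.
    """
    n_replay = int(len(new_examples) * replay_ratio)
    batch = list(new_examples) + buffer.sample(n_replay)
    for example in new_examples:
        buffer.add(example)
    return batch
```

Replaying a small sample of earlier examples alongside new feedback is one way to limit forgetting without retraining on the full history, which is the trade-off the reported accuracy and training-time figures refer to.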
CLC Number: TP181; TP391.1
崔智源, 张钧波. Continuous learning-based entity disambiguation method for private domain data question answering[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2026020209.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2026020209