

High school English reading comprehension model integrating knowledge augmentation and contrastive learning

WANG Yuqi1, ZHANG Yangsen2, WANG Pu1

  1. Beijing Information Science and Technology University
  2. Computer School, Beijing Information Science and Technology University, Beijing 100192, China
  • Received: 2025-08-04  Revised: 2025-09-09  Online: 2025-11-05  Published: 2025-11-05
  • Corresponding author: WANG Yuqi

Abstract: To address the problems of information overload, weak content matching, and insufficient interpretability in the application of multiple-choice models to high school English examinations, a Multiple-Choice Reading Comprehension model integrating Knowledge augmentation and Contrastive Learning (KCL-MCRC) was proposed. Firstly, a Large Language Model (LLM) was employed to generate article summaries and extract clue sentences, reducing irrelevant input through knowledge augmentation. Secondly, BERT (Bidirectional Encoder Representations from Transformers) was used to obtain embedding representations of the article, the question, and the options. Thirdly, a contrastive learning strategy based on mean pooling and absolute difference was adopted to derive information-difference vectors between the question and options on one side and the article on the other; these vectors were concatenated with the original embeddings to jointly determine answer selection. Finally, an LLM was used to annotate question types, and the accuracy of KCL-MCRC on each question type was evaluated to analyze the sources of the performance gains and further verify the model's effectiveness. On the RACE-H dataset, a series of experiments was conducted, including model comparisons, an ablation study of KCL-MCRC, effectiveness comparisons across question types, and per-type ablation studies of the knowledge augmentation and contrastive learning modules. The results show that KCL-MCRC achieves the best overall accuracies of 65.75% on the validation set and 64.12% on the test set, surpassing the second-best models MDT (Multi-Decision-Transformer) by 1.02 and 0.88 percentage points and MMA (Multi-stage Maximization Attention) by 0.35 and 0.92 percentage points, respectively. Both knowledge augmentation and contrastive learning contribute substantially to the performance improvement, with the most significant accuracy gains on three question types: logical reasoning, main idea summarization, and author's attitude. This verifies the effectiveness of designing modules targeted at specific question types.
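The fusion step at the heart of the model, in which mean-pooled BERT embeddings are combined with absolute-difference vectors before each option is scored, can be illustrated with a short sketch. This is a minimal, hypothetical rendering assuming a standard bert-base-uncased encoder; the helper names (mean_pool, score_option, predict), the concatenation layout, and all hyperparameters are illustrative assumptions rather than the paper's released implementation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
hidden = encoder.config.hidden_size  # 768 for bert-base

def mean_pool(text: str) -> torch.Tensor:
    """Encode text with BERT and mean-pool over non-padding tokens."""
    batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        states = encoder(**batch).last_hidden_state          # (1, seq_len, hidden)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (1, seq_len, 1)
    return (states * mask).sum(dim=1) / mask.sum(dim=1)      # (1, hidden)

# Fusion head over [article; question; option; |a - q|; |a - o|].
scorer = nn.Linear(hidden * 5, 1)

def score_option(article: str, question: str, option: str) -> torch.Tensor:
    a, q, o = mean_pool(article), mean_pool(question), mean_pool(option)
    # Absolute differences serve as the "information difference" vectors
    # that the abstract concatenates with the original embeddings.
    fused = torch.cat([a, q, o, (a - q).abs(), (a - o).abs()], dim=-1)
    return scorer(fused)  # (1, 1) unnormalized score for this option

def predict(article: str, question: str, options: list[str]) -> int:
    """Return the index of the highest-scoring option (0..3 for RACE-H)."""
    scores = torch.cat([score_option(article, question, o) for o in options])
    return int(scores.argmax().item())
```

In practice the scorer would be trained end-to-end with a cross-entropy loss over the four option scores per question; the sketch only shows the forward pass described in the abstract.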
