Journal of Computer Applications (《计算机应用》), sole official website


Intelligent undergraduate teaching evaluation system based on large language models

SHEN Bin (沈斌)1,2, CHEN Xiaoning (陈晓宁)1, CHENG Hua (程华)2,3, FANG Yiquan (房一泉)1, WANG Huifeng (王慧锋)2

  1. Informatization and Data Management Center, East China University of Science and Technology 2. School of Information Science and Engineering, East China University of Science and Technology 3. Undergraduate Academic Affairs Office, East China University of Science and Technology
  • Received: 2025-03-31 Revised: 2025-05-07 Online: 2025-05-13 Published: 2025-05-13
  • Corresponding author: CHEN Xiaoning
  • About the authors: SHEN Bin, born in 1989 in Shaoxing, Zhejiang, M. S., engineer. His research interests include deep learning, and artificial intelligence and its applications. CHEN Xiaoning, born in 1983 in Fuzhou, Fujian, M. S., engineer. His research interests include large language models, information security, and artificial intelligence applications. CHENG Hua, born in 1975 in Huangshan, Anhui, Ph. D., professor. His research interests include natural language processing, and artificial intelligence and its applications. FANG Yiquan, born in 1975 in Yangzhou, Jiangsu, M. S., professor-level senior engineer. Her research interests include information security. WANG Huifeng, born in 1969 in Harbin, Heilongjiang, Ph. D., professor. Her research interests include intelligent perception.
  • Supported by:
    Ministry of Education Project: Research on the Digital and Intelligent Transformation of University Evaluation (P250603)

Abstract: Undergraduate teaching audit and evaluation is an important means of quality assurance in higher education, and how scientifically and rationally it is carried out directly affects the level of talent cultivation in universities. However, the traditional manual review model is inefficient and highly subjective when faced with massive heterogeneous data, which makes it difficult to meet the demands of undergraduate teaching evaluation for accuracy and standardization. To address this, an undergraduate teaching evaluation system based on large language models and a multi-agent architecture, Zhipingbao (智评宝, SmartEval), is proposed. The system parses input content through a semantic understanding module, decomposes and schedules tasks with a planner, and combines a retrieval-augmented generation module with three types of agents (question answering, summarization, and diagnosis) to automate the whole process from data collection through indicator analysis to decision support. Experiments on the "1+3+3" series of reports from the 2023 undergraduate teaching evaluation of selected universities show that, compared with mainstream large language models such as GLM-4 and Qwen2.5, SmartEval has significant advantages in question-answering accuracy, ROUGE-L score for summarization, and F1 score for diagnosis. A consistency test against the judgments of an expert group further validates the reliability of its results.

Key words: undergraduate teaching evaluation, large language model, agent, digital intelligence, educational informatization
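
The abstract reports a ROUGE-L score for the summarization agent without stating how it was computed. As a minimal sketch of the standard LCS-based definition, the Python below computes a sentence-level ROUGE-L F-measure. The function names, the whitespace tokenizer, and the default `beta=1.0` are assumptions made for illustration, not details taken from the paper; Chinese text would need a word segmenter or character-level tokens instead of `split()`.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the match on equal tokens, else carry the best so far.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure (whitespace tokens: an assumption)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # LCS is "the cat on the mat" (5 tokens); both texts have 6 tokens.
    print(round(rouge_l("the cat sat on the mat", "the cat is on the mat"), 4))  # 0.8333
```

With `beta=1.0` the score is the harmonic mean of LCS precision and recall; published evaluations often use a library implementation with its own tokenization and stemming, so absolute values may differ from this sketch.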

CLC number: