Journal of Computer Applications, 2026, Vol. 46, Issue (3): 993-1003. DOI: 10.11772/j.issn.1001-9081.2025030334

• Frontier and comprehensive applications •

Intelligent undergraduate teaching evaluation system based on large language models

Bin SHEN1,2, Xiaoning CHEN1, Hua CHENG2,3, Yiquan FANG1, Huifeng WANG2

  1. Informatization and Data Management Center, East China University of Science and Technology, Shanghai 200237, China
  2. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  3. Undergraduate Academic Affairs Office, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2025-04-01 Revised: 2025-05-08 Accepted: 2025-05-09 Online: 2025-05-13 Published: 2026-03-10
  • Contact: Xiaoning CHEN
  • About the authors: SHEN Bin, born in 1989, M.S., engineer. His research interests include deep learning, and artificial intelligence and its applications.
    CHENG Hua, born in 1975, Ph.D., professor. His research interests include natural language processing, and artificial intelligence and its applications.
    FANG Yiquan, born in 1975, M.S., professor-level senior engineer. Her research interests include information security.
    WANG Huifeng, born in 1969, Ph.D., professor. Her research interests include intelligent perception.
  • Supported by:
    Ministry of Education Project: Research on the Digital and Intelligent Transformation of University Evaluation (P250603)

基于大语言模型的本科教学评估智能系统

沈斌1,2, 陈晓宁1, 程华2,3, 房一泉1, 王慧锋2

  1. 华东理工大学 信息化与数据管理中心,上海 200237
  2. 华东理工大学 信息科学与工程学院,上海 200237
  3. 华东理工大学 教务处,上海 200237
  • 通讯作者: 陈晓宁
  • 作者简介:沈斌(1989—),男,浙江绍兴人,工程师,硕士,主要研究方向:深度学习、人工智能及其应用
    程华(1975—),男,安徽黄山人,教授,博士,主要研究方向:自然语言处理、人工智能及其应用
    房一泉(1975—),女,江苏扬州人,正高级工程师,硕士,主要研究方向:信息安全
    王慧锋(1969—),女,黑龙江哈尔滨人,教授,博士,主要研究方向:智能感知。
  • 基金资助:
    教育部课题-高校评估数智化研究(P250603)

Abstract:

Undergraduate teaching audit and evaluation is a critical means of quality assurance in higher education, and how scientifically and rationally it is implemented directly affects the level of talent cultivation in universities. However, traditional manual review is inefficient and subjective when faced with massive heterogeneous data, making it difficult to meet the demands for accuracy and standardization in undergraduate teaching evaluation. Therefore, SmartEval, an intelligent undergraduate teaching evaluation system based on Large Language Models (LLMs) and a multi-agent architecture, is proposed. In the system, input content is parsed by a semantic understanding module, tasks are decomposed and scheduled by a planner, and a Retrieval-Augmented Generation (RAG) module is combined with three types of agents (question-answering, summarization, and diagnostic) to automate the entire “data collection-metric analysis-decision support” process end to end. Experimental results on the “1+3+3” series reports from the 2023 undergraduate teaching evaluation of selected universities show that SmartEval significantly outperforms mainstream LLMs such as GLM-4 and Qwen2.5 in question-answering accuracy, ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation L) score for summarization, and F1 score for diagnostics. Furthermore, consistency tests against expert groups validate the reliability of SmartEval's results.

Key words: undergraduate teaching evaluation, Large Language Model (LLM), agent, digital intelligence, educational informatization
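
The abstract reports a ROUGE-L score for the summarization agent. As a rough illustration of what that metric measures, the following is a minimal sketch of sentence-level ROUGE-L based on the longest common subsequence (LCS). It is not the authors' evaluation code: the function names, the whitespace tokenization, and the `beta` weighting parameter are assumptions made for this example, and Chinese text would need a word segmenter or character-level tokens instead of `split()`.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure on whitespace tokens (assumed tokenization)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

With `beta=1.0` the score is the harmonic mean of LCS-based precision and recall; larger values of `beta` weight recall more heavily.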

摘要:

本科教学审核评估作为高等教育质量保障的重要手段,它的科学合理实施直接影响高校人才培养水平。然而,传统的人工审阅模式在面对海量异构数据时效率低下且主观性强,难以满足本科教学评估对精准性和标准化的需求。因此,提出一种基于大语言模型(LLM)和多智能体架构的本科教学评估系统——智评宝(SmartEval)。该系统通过语义理解模块解析输入内容,并利用计划器进行任务分解与调度,同时结合检索增强生成(RAG)模块及问答、摘要与诊断三类智能体实现对“数据采集-指标分析-决策支持”全流程的自动化处理。在2023年度部分高校本科教学评估的“1+3+3”系列报告基础上的实验结果表明,与GLM-4和Qwen2.5等主流LLM相比,SmartEval在问答准确率、摘要ROUGE-L(Recall-Oriented Understudy for Gisting Evaluation L)值以及诊断F1值等指标上均表现出显著优势。进一步地,通过与专家组的一致性检验比对,验证了SmartEval结果的可靠性。

关键词: 本科教学评估, 大语言模型, 智能体, 数智化, 教育信息化
