As a critical means of higher education quality assurance, the scientific and rational implementation of undergraduate teaching audit and evaluation directly affects the quality of talent cultivation in universities. However, traditional manual review is inefficient and subjective when faced with massive heterogeneous data, and struggles to meet the demands for accuracy and standardization in undergraduate teaching evaluation. This paper therefore proposes SmartEval, an intelligent undergraduate teaching evaluation system based on Large Language Models (LLMs) and a multi-agent architecture. In the system, a semantic understanding module parses input content, a planner decomposes and schedules tasks, and a Retrieval-Augmented Generation (RAG) module is integrated with three types of agents (question answering, summarization, and diagnostics) to automate the entire "data collection-metric analysis-decision support" process end to end. Experiments on the "1+3+3" series of reports from the 2023 undergraduate teaching evaluation of selected universities demonstrate that SmartEval significantly outperforms mainstream LLMs such as GLM-4 and Qwen2.5 in question-answering accuracy, ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation, Longest common subsequence) score for summarization, and F1 score for diagnostics. Furthermore, consistency tests against expert panels validate the reliability of SmartEval's results.
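The planner-plus-specialized-agents control flow described above can be sketched as follows. This is a minimal illustration under assumptions: all class, function, and task names here (`Task`, `planner`, `run_pipeline`, the agent stubs) are hypothetical and stand in for the paper's actual components; a real system would back each agent with an LLM call and the planner with LLM-based task decomposition over retrieved (RAG) context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    kind: str      # "qa", "summarize", or "diagnose"
    payload: str   # text or query routed to the agent

# Stub agents: a real implementation would prompt an LLM with
# context retrieved by the RAG module instead of returning strings.
def qa_agent(payload: str) -> str:
    return f"answer({payload})"

def summarization_agent(payload: str) -> str:
    return f"summary({payload})"

def diagnostic_agent(payload: str) -> str:
    return f"diagnosis({payload})"

AGENTS: Dict[str, Callable[[str], str]] = {
    "qa": qa_agent,
    "summarize": summarization_agent,
    "diagnose": diagnostic_agent,
}

def planner(request: str) -> List[Task]:
    # Toy decomposition: emit one task per agent type. A real planner
    # would decide which subtasks the request actually needs.
    return [Task(kind, request) for kind in ("qa", "summarize", "diagnose")]

def run_pipeline(request: str) -> Dict[str, str]:
    # Dispatch each planned task to its specialized agent and
    # collect the results keyed by task kind.
    return {t.kind: AGENTS[t.kind](t.payload) for t in planner(request)}

results = run_pipeline("2023 undergraduate teaching evaluation report")
print(results["qa"])          # answer(2023 undergraduate teaching evaluation report)
print(sorted(results))        # ['diagnose', 'qa', 'summarize']
```

The design point this sketch captures is the separation of concerns: the planner owns task decomposition and scheduling, while each agent owns one capability, so new agent types can be registered without changing the dispatch logic.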