Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (2): 378-385.DOI: 10.11772/j.issn.1001-9081.2025020228

• Artificial intelligence •

Chinese automated essay scoring based on joint learning of multi-scale features using graph neural network

Hongjian WEN1, Ruijiao HU1, Baowen WU1, Jiaxing SUN2, Huan LI1, Qing ZHANG2, Jie LIU2,3

  1. School of Artificial Intelligence, Wenshan University, Wenshan, Yunnan 663000, China
    2. School of Artificial Intelligence and Computer Science, North China University of Technology, Beijing 100144, China
    3. Research Center for Language Intelligence of China (Capital Normal University), Beijing 100089, China
  • Received:2025-03-10 Revised:2025-06-06 Accepted:2025-06-10 Online:2025-08-08 Published:2026-02-10
  • Contact: Jie LIU
  • About author:WEN Hongjian, born in 1982, M. S., lecturer. His research interests include natural language processing.
    HU Ruijiao, born in 1985, M. S., lecturer. Her research interests include natural language processing.
    WU Baowen, born in 1980, M. S., associate professor. Her research interests include natural language processing.
    SUN Jiaxing, born in 1999, M. S. candidate. His research interests include natural language processing.
    LI Huan, born in 1963, associate professor. Her research interests include natural language processing.
    ZHANG Qing, born in 1984, Ph. D., lecturer. His research interests include natural language processing.
    LIU Jie, born in 1970, Ph. D., professor. His research interests include natural language processing. Email:liujxxxy@126.com
  • Supported by:
    National Science and Technology Innovation 2030 — Major Project of “New Generation Artificial Intelligence”(2020AAA0109700);National Natural Science Foundation of China(62076167);Beijing Natural Science Foundation(4252035)

Abstract:

Existing Automated Essay Scoring (AES) methods based on Pre-trained Language Models (PLMs) tend to represent essay quality directly with the global semantic features extracted from the PLM, while neglecting the associations between essay quality and finer-grained features. To address this problem, focusing on Chinese AES research, essay quality was analyzed and evaluated from multiple textual perspectives, and a Chinese AES method was proposed that jointly learns multi-scale essay features using a Graph Neural Network (GNN). Firstly, discourse features at both the sentence level and the paragraph level were extracted with the GNN. Then, joint feature learning was performed on these discourse features and the global semantic features of the essay, so as to score essays more accurately. Finally, a Chinese AES dataset was constructed to provide a data foundation for Chinese AES research. Experimental results on the constructed dataset show that the proposed method improves the average Quadratic Weighted Kappa (QWK) coefficient across six essay topics by 1.1 percentage points over R2-BERT (Bidirectional Encoder Representations from Transformers model with Regression and Ranking), validating the effectiveness of joint multi-scale feature learning in AES tasks. Meanwhile, ablation results further demonstrate the contribution of essay features at different scales to scoring performance. To demonstrate the advantage of small models in specific task scenarios, a comparison was conducted with the currently popular large language models GPT-3.5-turbo and DeepSeek-V3.
The results show that a BERT (Bidirectional Encoder Representations from Transformers) model using the proposed method achieves an average QWK across the six essay topics that is 65.8 and 45.3 percentage points higher than GPT-3.5-turbo and DeepSeek-V3, respectively, supporting the observation that Large Language Models (LLMs) underperform on domain-specific discourse-level essay scoring tasks due to the lack of large-scale supervised fine-tuning data.
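All of the comparisons above are reported in Quadratic Weighted Kappa (QWK), the standard agreement metric for AES: it measures agreement between predicted and human scores on an ordinal scale, penalizing disagreements by the squared distance between ratings. As a reference point (independent of the paper's own code, which is not shown here), a minimal pure-Python QWK might look like:

```python
def quadratic_weighted_kappa(y_true, y_pred, num_ratings):
    """QWK between two raters over ordinal ratings 0..num_ratings-1."""
    # Observed rating co-occurrence matrix.
    observed = [[0.0] * num_ratings for _ in range(num_ratings)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    total = float(len(y_true))
    # Marginal rating histograms of each rater.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_ratings))
                 for j in range(num_ratings)]
    numerator = denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            # Quadratic penalty grows with the distance between ratings.
            weight = (i - j) ** 2 / (num_ratings - 1) ** 2
            # Agreement expected by chance from the marginals.
            expected = hist_true[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator

# Perfect agreement yields QWK = 1.0; chance-level agreement yields ~0.
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4))  # 1.0
```

The "percentage point" differences in the abstract (e.g. +1.1 over R2-BERT) refer to QWK expressed on a 0-100 scale.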

Key words: Chinese Automated Essay Scoring (AES), pre-trained language model, graph neural network, Chinese AES dataset, multi-feature learning


CLC Number: