Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (2): 378-385.DOI: 10.11772/j.issn.1001-9081.2025020228

• Artificial intelligence •

Chinese automated essay scoring based on joint learning of multi-scale features using graph neural network

Hongjian WEN1, Ruijiao HU1, Baowen WU1, Jiaxing SUN2, Huan LI1, Qing ZHANG2, Jie LIU2,3

  1. School of Artificial Intelligence, Wenshan University, Wenshan, Yunnan 663000, China
    2. School of Artificial Intelligence and Computer Science, North China University of Technology, Beijing 100144, China
    3. Research Center for Language Intelligence of China (Capital Normal University), Beijing 100089, China
  • Received:2025-03-10 Revised:2025-06-06 Accepted:2025-06-10 Online:2025-08-08 Published:2026-02-10
  • Contact: Jie LIU
  • About author:WEN Hongjian, born in 1982, M. S., lecturer. His research interests include natural language processing.
    HU Ruijiao, born in 1985, M. S., lecturer. Her research interests include natural language processing.
    WU Baowen, born in 1980, M. S., associate professor. Her research interests include natural language processing.
    SUN Jiaxing, born in 1999, M. S. candidate. His research interests include natural language processing.
    LI Huan, born in 1963, associate professor. Her research interests include natural language processing.
    ZHANG Qing, born in 1984, Ph. D., lecturer. His research interests include natural language processing.
    LIU Jie, born in 1970, Ph. D., professor. His research interests include natural language processing. Email:liujxxxy@126.com
  • Supported by:
    National Science and Technology Innovation 2030 — Major Project of “New Generation Artificial Intelligence”(2020AAA0109700);National Natural Science Foundation of China(62076167);Beijing Natural Science Foundation(4252035)

Abstract:

Existing Automated Essay Scoring (AES) methods based on Pre-trained Language Models (PLMs) tend to represent essay quality directly with the global semantic features extracted from the PLM, while neglecting the associations between essay quality and finer-grained features. To address this problem, focusing on Chinese AES research, essay quality was analyzed and evaluated from multiple textual perspectives, and a Chinese AES method was proposed that jointly learns multi-scale essay features using a Graph Neural Network (GNN). Firstly, discourse features at both the sentence level and the paragraph level were extracted with the GNN. Then, joint feature learning was performed on these discourse features and the global semantic features of the essay, so as to score essays more accurately. Finally, a Chinese AES dataset was constructed to provide a data foundation for Chinese AES research. Experimental results on the constructed dataset show that the proposed method improves the average Quadratic Weighted Kappa (QWK) coefficient across six essay topics by 1.1 percentage points over R2-BERT (Bidirectional Encoder Representations from Transformers model with Regression and Ranking), validating the effectiveness of joint multi-scale feature learning in AES tasks. Meanwhile, ablation results further demonstrate the contribution of essay features at different scales to scoring performance. To demonstrate the advantage of small models in specific task scenarios, a comparison was conducted with the currently popular large language models GPT-3.5-turbo and DeepSeek-V3.
The results show that a BERT (Bidirectional Encoder Representations from Transformers) model using the proposed method achieves an average QWK across the six essay topics that is 65.8 and 45.3 percentage points higher than GPT-3.5-turbo and DeepSeek-V3, respectively, supporting the observation that Large Language Models (LLMs) underperform on domain-specific discourse-level essay scoring tasks due to the lack of large-scale supervised fine-tuning data.
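All of the comparisons above are reported in Quadratic Weighted Kappa (QWK), the standard agreement metric for AES: it measures agreement between predicted and human scores on an ordinal scale, penalizing disagreements by the squared distance between ratings. As a reference point (independent of the paper's own code, which is not shown here), a minimal pure-Python QWK might look like:

```python
def quadratic_weighted_kappa(y_true, y_pred, num_ratings):
    """QWK between two raters over ordinal ratings 0..num_ratings-1."""
    # Observed rating co-occurrence matrix.
    observed = [[0.0] * num_ratings for _ in range(num_ratings)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    total = float(len(y_true))
    # Marginal rating histograms of each rater.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_ratings))
                 for j in range(num_ratings)]
    numerator = denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            # Quadratic penalty grows with the distance between ratings.
            weight = (i - j) ** 2 / (num_ratings - 1) ** 2
            # Agreement expected by chance from the marginals.
            expected = hist_true[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator

# Perfect agreement yields QWK = 1.0; chance-level agreement yields ~0.
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4))  # 1.0
```

The "percentage point" differences in the abstract (e.g. +1.1 over R2-BERT) refer to QWK expressed on a 0-100 scale.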

Key words: Chinese Automated Essay Scoring (AES), pre-trained language model, graph neural network, Chinese AES dataset, multi-feature learning


CLC Number: