Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (8): 2205-2211. DOI: 10.11772/j.issn.1001-9081.2020101572

Special Issue: Artificial intelligence

• Artificial intelligence •

Automated English essay scoring method based on multi-level semantic features

ZHOU Xianbing1, FAN Xiaochao1,2, REN Ge1, YANG Yong1   

  1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi Xinjiang 830054, China;
    2. School of Computer Science and Technology, Dalian University of Technology, Dalian Liaoning 116024, China
  • Received: 2020-10-12 Revised: 2021-01-22 Online: 2021-08-10 Published: 2021-01-27
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (62066044) and the Scientific Research Program of Colleges and Universities in Xinjiang Uygur Autonomous Region (XJEDU2016S066).

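  • Corresponding author: YANG Yong
  • About the authors: ZHOU Xianbing (born 1995), male, from Huanggang, Hubei, master's student, CCF member; main research interests: natural language processing and essay scoring. FAN Xiaochao (born 1982), male (Xibe ethnicity), from Tacheng, Xinjiang, lecturer, Ph.D.; main research interest: text sentiment analysis. REN Ge (born 1986), female, from Lankao, Henan, lecturer, M.S.; main research interests: data mining and network information security. YANG Yong (born 1979), male, from Hanzhong, Shaanxi, associate professor, Ph.D.; main research interests: natural language processing and software engineering.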

Abstract: Automated Essay Scoring (AES) technology analyzes and scores essays automatically, and has become one of the hot research problems in applying natural language processing to education. Current AES methods treat deep and shallow semantic features separately and ignore the impact of multi-level semantic fusion on essay scoring. To address this problem, a neural network model based on Multi-Level Semantic Features (MLSF) was proposed for AES. Firstly, a Convolutional Neural Network (CNN) was used to capture local semantic features, and a hybrid neural network was used to capture global semantic features, so that the semantic features of the essay were obtained at the deep level. Secondly, topic-level features were obtained from the document-level essay topic vector. At the same time, because grammatical errors and language richness are difficult for deep learning models to mine, a small number of handcrafted features were constructed to obtain the linguistic features of the essay at the shallow level. Finally, the essay was scored automatically through feature fusion. Experimental results show that the proposed model improves performance significantly on all subsets of the public dataset of the Kaggle ASAP (Automated Student Assessment Prize) competition, with an average Quadratic Weighted Kappa (QWK) of 79.17%, validating the effectiveness of the model on the AES task.
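
The abstract reports performance as Quadratic Weighted Kappa (QWK). As a reminder of what that metric measures, the following is a minimal sketch of the standard QWK definition in plain Python. It is not taken from the paper; the function name and the `min_rating`/`max_rating` parameters are assumptions introduced here for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two lists of integer scores (illustrative sketch).

    Returns 1.0 for perfect agreement, about 0.0 for chance-level
    agreement, and negative values for worse-than-chance agreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    for score in list(rater_a) + list(rater_b):
        if not min_rating <= score <= max_rating:
            raise ValueError("score outside [min_rating, max_rating]")

    n = max_rating - min_rating + 1
    total = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts essays scored
    # i by rater A and j by rater B (offsets from min_rating).
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty; the usual 1/(n-1)^2 factor cancels
            # in the ratio, so it is omitted here.
            weight = (i - j) ** 2
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave every essay the same single score.
        return 1.0
    return 1.0 - numerator / denominator
```

For example, `quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)` gives 1.0, while fully reversed scores give -1.0. A reported value of 79.17% corresponds to 0.7917 on this scale.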

Key words: English essay, Automated Essay Scoring (AES), Multi-Level Semantic Feature (MLSF), deep semantic understanding, feature fusion, natural language processing
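
The feature fusion step named in the abstract and keywords can be pictured with a small sketch. The abstract does not specify how fusion or the final scoring layer is implemented, so everything below is an assumption made for illustration: fusion is taken to be plain concatenation of the three feature groups, followed by a single linear layer with a sigmoid whose output is rescaled to the score range. The function name `fuse_and_score` and all of its parameters are hypothetical.

```python
import math


def fuse_and_score(deep_features, topic_features, handcrafted_features,
                   weights, bias, min_score, max_score):
    """Concatenate three feature groups and map them to an essay score.

    Hypothetical sketch: concatenation plus a linear layer with a
    sigmoid is one common way to fuse features, not necessarily the
    method used in the paper.
    """
    # Feature-level fusion by concatenation (assumed).
    fused = (list(deep_features) + list(topic_features)
             + list(handcrafted_features))
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")

    # Linear layer followed by a sigmoid, giving a value in (0, 1).
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    normalized = 1.0 / (1.0 + math.exp(-logit))

    # Rescale to the scoring range of the essay prompt.
    return min_score + normalized * (max_score - min_score)
```

With all weights and the bias set to zero, the sigmoid outputs 0.5, so the function returns the midpoint of the score range. In a trained model the weights would be learned from scored essays.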

