Chinese semantic error recognition model based on hierarchical information enhancement
Yuqi ZHANG, Ying SHA
Journal of Computer Applications    2025, 45 (12): 3771-3778.   DOI: 10.11772/j.issn.1001-9081.2024111694

Semantic errors in Chinese differ from simple spelling and grammatical errors in that they are more inconspicuous and complex. Chinese Semantic Error Recognition (CSER) aims to determine whether a Chinese sentence contains a semantic error; as a prerequisite task for semantic review, the performance of the recognition model is crucial for downstream semantic error correction. To address the issue that existing CSER models ignore the differences between syntactic structure and contextual structure when integrating syntactic information, a Hierarchical Information Enhancement Graph Convolutional Network (HIE-GCN) model was proposed, which embeds the hierarchical information of syntactic-tree nodes into the context encoder, thereby reducing the gap between syntactic structure and contextual structure. Firstly, a traversal algorithm was used to extract the hierarchical information of the nodes in the syntactic tree. Secondly, this hierarchical information was embedded into the BERT (Bidirectional Encoder Representations from Transformers) model to generate character features, a Graph Convolutional Network (GCN) used these character features as node representations in the syntactic graph, and the feature vector of the entire sentence was obtained after graph convolution. Finally, a fully connected layer performed one-class or multi-class semantic error recognition. Results of recognition and correction experiments on the FCGEC (Fine-grained corpus for Chinese Grammatical Error Correction) and NaCGEC (Native Chinese Grammatical Error Correction) datasets show that, in the recognition task on FCGEC, compared with the baseline models, HIE-GCN improves accuracy by at least 0.10 percentage points and F1 score by at least 0.13 percentage points in one-class error recognition, and improves accuracy by at least 1.05 percentage points and F1 score by at least 0.53 percentage points in multi-class error recognition.
Ablation results verify the effectiveness of hierarchical information embedding. Compared with Large Language Models (LLMs) such as GPT and Qwen, the proposed model achieves significantly better overall recognition performance. In the correction experiment, compared with a sequence-to-sequence direct correction model, the recognition-correction two-stage pipeline improves correction precision by 8.01 percentage points. It is also found that, when correcting with the LLM GLM4, providing the model with hints about the sentence's error type increases correction precision by 4.62 percentage points.
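The pipeline described in the abstract (tree traversal for node levels, level embeddings added to encoder features, one round of graph convolution, pooling, classification head) can be sketched as follows. This is a minimal illustration under assumed shapes and random weights, not the authors' implementation: the toy dependency tree, the `node_depths`/`gcn_layer` names, and all dimensions are hypothetical, and the random character features stand in for real BERT outputs.

```python
# Hypothetical sketch of hierarchical-information extraction plus one GCN layer.
import numpy as np
from collections import deque

def node_depths(parents):
    """Traverse a syntactic tree (parent-pointer form) and return each
    node's depth, i.e. its hierarchical level; the root has depth 0."""
    n = len(parents)
    children = [[] for _ in range(n)]
    root = None
    for i, p in enumerate(parents):
        if p is None:
            root = i
        else:
            children[p].append(i)
    depths = [0] * n
    q = deque([root])          # breadth-first traversal
    while q:
        u = q.popleft()
        for v in children[u]:
            depths[v] = depths[u] + 1
            q.append(v)
    return depths

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy 5-character sentence; node 0 is the syntactic root.
parents = [None, 0, 0, 1, 1]
depths = node_depths(parents)

rng = np.random.default_rng(0)
d_model, n = 8, len(parents)
char_feats = rng.normal(size=(n, d_model))   # stand-in for BERT character features
level_emb = rng.normal(size=(4, d_model))    # learnable hierarchical-level embeddings
H = char_feats + level_emb[depths]           # inject hierarchy into encoder features

A = np.zeros((n, n))                         # undirected syntactic-tree adjacency
for i, p in enumerate(parents):
    if p is not None:
        A[i, p] = A[p, i] = 1.0

W = rng.normal(size=(d_model, d_model))
sent_vec = gcn_layer(A, H, W).mean(axis=0)   # pooled sentence representation
logits = sent_vec @ rng.normal(size=(d_model, 2))  # error / no-error head
```

In a real model the level embeddings and GCN weights would be trained jointly with BERT, and the output layer would have one logit per error class for the multi-class setting.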
