Joint entity-relation extraction method for ancient Chinese books based on prompt learning and global pointer network
Bin LI, Min LIN, Siriguleng, Yingjie GAO, Yurong WANG, Shujun ZHANG
Journal of Computer Applications, 2025, 45(1): 75-81. DOI: 10.11772/j.issn.1001-9081.2023121843

Joint entity-relation extraction methods based on the "pre-training + fine-tuning" paradigm rely on large-scale annotated data. In small-sample scenarios such as ancient Chinese books, where data annotation is difficult and costly, fine-tuning is inefficient and extraction performance is poor. Moreover, entity nesting and relation overlapping are common in ancient Chinese books and limit the effect of joint entity-relation extraction, while pipeline extraction methods suffer from error propagation, which further degrades results. To address these problems, a joint entity-relation extraction method for ancient Chinese books based on prompt learning and a global pointer network was proposed. Firstly, a span-extraction reading-comprehension prompt learning method was used to inject domain knowledge into the Pre-trained Language Model (PLM), unifying the optimization objectives of pre-training and fine-tuning, and the input sentences were encoded. Then, global pointer networks were used to predict the boundaries of subjects and objects as well as the subject-object boundaries under different relations, and these predictions were decoded jointly and aligned into entity-relation triples, completing the construction of the PTBG (Prompt Tuned BERT with Global pointer) model. As a result, the entity nesting and relation overlapping problems were solved, and the error propagation of pipeline decoding was avoided. Finally, on this basis, the influence of different prompt templates on extraction performance was analyzed. Experimental results on the Records of the Grand Historian dataset show that, compared with the OneRel model before and after injecting domain knowledge, the PTBG model improves the F1 value by 1.64 and 1.97 percentage points respectively. This indicates that the PTBG model extracts entity-relation triples from ancient Chinese books more effectively, and it provides new research ideas and approaches for low-resource, small-sample deep learning scenarios.
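The core of the method described above is the global pointer layer: for each relation type, one pointer matrix scores subject-start/object-start token pairs, another scores subject-end/object-end pairs, and matches that agree on the relation are aligned into triples. The following is a minimal PyTorch sketch of this scoring-and-decoding idea, not the authors' released code; names such as GlobalPointerHead, hidden, and num_rels are illustrative assumptions.

    import torch
    import torch.nn as nn

    class GlobalPointerHead(nn.Module):
        """Score every token pair (i, j) under every relation type."""
        def __init__(self, hidden: int, num_rels: int, head_dim: int = 64):
            super().__init__()
            self.num_rels, self.head_dim = num_rels, head_dim
            # Project encoder states to per-relation query/key vectors.
            self.qk = nn.Linear(hidden, num_rels * head_dim * 2)

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            # h: (batch, seq_len, hidden) from the prompt-tuned PLM encoder.
            b, n, _ = h.shape
            qk = self.qk(h).view(b, n, self.num_rels, 2, self.head_dim)
            q, k = qk[..., 0, :], qk[..., 1, :]              # (b, n, R, d) each
            # Bilinear score for every pair (i, j) and relation r.
            return torch.einsum("bird,bjrd->brij", q, k) / self.head_dim ** 0.5

    def decode_triples(head_scores, tail_scores, threshold=0.0):
        # head_scores, tail_scores: (R, n, n) for one sentence; an entry
        # (r, i, j) above threshold marks a subject/object start (resp. end) pair.
        tails = {tuple(t) for t in (tail_scores > threshold).nonzero().tolist()}
        triples = []
        for r, ss, os_ in (head_scores > threshold).nonzero().tolist():
            for r2, se, oe in tails:
                if r2 == r and se >= ss and oe >= os_:
                    # ((subject span), relation id, (object span))
                    triples.append(((ss, se), r, (os_, oe)))
        return triples

Because every token pair is scored directly rather than tagged sequentially, nested entities and overlapping relations fall out of simple thresholding, which is consistent with how the abstract claims the nesting and overlap problems are avoided.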

Prompt learning method for ancient text sentence segmentation and punctuation based on a span-extracted prototypical network
Yingjie GAO, Min LIN, Siriguleng, Bin LI, Shujun ZHANG
Journal of Computer Applications, 2024, 44(12): 3815-3822. DOI: 10.11772/j.issn.1001-9081.2023121719

Given that the automatic sentence segmentation and punctuation task in ancient book information processing relies on large-scale annotated corpora, and that high-quality, large-scale training samples are expensive and difficult to obtain, a prompt learning method for ancient text sentence segmentation and punctuation based on a span-extracted prototypical network was proposed. Firstly, structured prompt information was incorporated into the support set to form an effective prompt template, improving the model's learning efficiency. Then, a punctuation position extractor was combined with a prototypical network classifier, effectively reducing the impact of misjudgments and the interference of non-punctuation labels in the traditional sequence labeling method. Experimental results show that on the Records of the Grand Historian dataset, the F1 score of the proposed method is 2.47 percentage points higher than that of the Siku-BERT-BiGRU-CRF (Siku - Bidirectional Encoder Representation from Transformer - Bidirectional Gated Recurrent Unit - Conditional Random Field) method. In addition, on the public multi-domain ancient text dataset CCLUE, the precision and F1 score of the method reach 91.60% and 93.12% respectively, indicating that it can segment and punctuate multi-domain ancient text effectively and automatically with a small number of training samples. The proposed method thus offers a new idea and approach for in-depth research on automatic sentence segmentation and punctuation of multi-domain ancient texts, as well as for improving model learning efficiency.
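The few-shot classification step described above works by nearest-prototype matching: each punctuation class prototype is the mean embedding of its support examples, and each candidate position extracted from a query sentence is assigned to the closest prototype. Below is a minimal PyTorch sketch of that step, assuming pre-computed PLM embeddings and that every class appears in the support set; the function names are illustrative, not taken from the paper.

    import torch

    def build_prototypes(support_emb, support_labels, num_classes):
        """Prototype of each punctuation class = mean of its support embeddings."""
        return torch.stack([
            support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)
        ])                                             # (num_classes, dim)

    def classify_positions(query_emb, protos):
        """Assign each extracted candidate position to its nearest prototype."""
        # Negative squared Euclidean distance serves as the class score.
        dists = torch.cdist(query_emb, protos) ** 2    # (num_positions, num_classes)
        return (-dists).argmax(dim=-1)

    # Illustrative usage with random embeddings standing in for PLM outputs:
    dim, num_classes = 768, 4
    support_emb = torch.randn(20, dim)
    support_labels = torch.randint(0, num_classes, (20,))
    protos = build_prototypes(support_emb, support_labels, num_classes)
    preds = classify_positions(torch.randn(5, dim), protos)

Classifying only the positions proposed by the extractor, rather than labeling every token, is what reduces the interference from non-punctuation labels that the abstract attributes to traditional sequence labeling.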
