Journal of Computer Applications, 2022, Vol. 42, Issue (10): 3011-3017. DOI: 10.11772/j.issn.1001-9081.2021091565

• Artificial intelligence

Named entity recognition based on BERT and joint learning for judgment documents

Lanlan ZENG, Yisong WANG, Panfeng CHEN   

  1. College of Computer Science and Technology, Guizhou University, Guiyang Guizhou 550025, China
  • Received: 2021-09-03  Revised: 2021-12-02  Accepted: 2022-01-04  Online: 2022-04-15  Published: 2022-10-10
  • Contact: Yisong WANG
  • About the authors: ZENG Lanlan, born in 1997, female, from Bijie, Guizhou, M. S. candidate. Her research interests include natural language processing, knowledge representation and reasoning.
    WANG Yisong, born in 1975, male (Tujia ethnicity), from Tongren, Guizhou, Ph. D., professor. His research interests include knowledge representation and reasoning, artificial intelligence, machine learning. Email: yswang@gzu.edu.cn
    CHEN Panfeng, born in 1982, male, from Huanggang, Hubei, Ph. D. candidate. His research interests include artificial intelligence, pattern recognition, knowledge representation and reasoning.
  • Supported by:
    National Natural Science Foundation of China (U1836205)

Abstract:

Correctly identifying the entities in judgment documents is an important foundation for building legal knowledge graphs and realizing smart courts. However, commonly used Named Entity Recognition (NER) models do not adequately handle two problems in judgment documents: the representation of polysemous words and errors in recognizing entity boundaries. To improve the recognition of the various entity types in judgment documents, a BiLSTM-CRF (Bidirectional Long Short-Term Memory with a sequential Conditional Random Field) model based on Joint Learning and BERT (Bidirectional Encoder Representation from Transformers), named JLB-BiLSTM-CRF, was proposed. Firstly, the input character sequence was encoded by BERT to enhance the representational power of the word vectors. Then, a BiLSTM network was used to model the information in long texts, and the NER task and the Chinese Word Segmentation (CWS) task were trained jointly to improve the recognition rate of entity boundaries. Experimental results show that the model achieves a precision of 94.36%, a recall of 94.94%, and an F1 score of 94.65% on the test set, which are 1.05, 0.48 and 0.77 percentage points higher, respectively, than those of the BERT-BiLSTM-CRF model, verifying the effectiveness of JLB-BiLSTM-CRF for NER in judgment documents.
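The reported figures can be cross-checked with a little arithmetic: F1 is the harmonic mean of precision and recall, and subtracting the stated gains gives the implied BERT-BiLSTM-CRF baseline. The minimal sketch below assumes only the numbers quoted in the abstract; the baseline values are derived here rather than quoted from the paper, and the abstract does not say whether its F1 is micro-averaged over entity types.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)

# JLB-BiLSTM-CRF test-set results as reported in the abstract (percent).
jlb = {"P": 94.36, "R": 94.94, "F1": 94.65}
# Reported gains over BERT-BiLSTM-CRF (percentage points).
gain = {"P": 1.05, "R": 0.48, "F1": 0.77}

# Baseline scores implied by subtracting the gains (derived, not quoted).
baseline = {k: round(jlb[k] - gain[k], 2) for k in jlb}

print(baseline)  # {'P': 93.31, 'R': 94.46, 'F1': 93.88}
# Recomputing F1 from P and R reproduces the reported F1 for both models.
print(round(f1_score(jlb["P"], jlb["R"]), 2))            # 94.65
print(round(f1_score(baseline["P"], baseline["R"]), 2))  # 93.88
```

Both recomputed F1 values match, so the precision, recall and F1 figures in the abstract are mutually consistent.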

Key words: judgment document, Bidirectional Long Short-Term Memory (BiLSTM) network, BERT (Bidirectional Encoder Representation from Transformers), joint learning, Named Entity Recognition (NER)
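Joint training of NER and CWS works because both tasks label the same character sequence, and entity boundaries usually coincide with word boundaries. The sketch below illustrates this under stated assumptions and is not the authors' implementation: it assumes character-level BMES tags for CWS, BIO tags for NER, and a plain weighted sum of the two task losses with a hypothetical weight `lam`. The abstract specifies none of these choices, and the example sentence is hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def cws_tags(words: Sequence[str]) -> List[str]:
    """Character-level BMES tags (Begin/Middle/End/Single) for a segmented sentence."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def word_boundaries(words: Sequence[str]) -> Set[int]:
    """Character offsets at which some word starts or ends."""
    bounds: Set[int] = {0}
    pos = 0
    for w in words:
        pos += len(w)
        bounds.add(pos)
    return bounds

def entity_spans(bio: Sequence[str]) -> List[Tuple[int, int, str]]:
    """(start, end, type) spans decoded from BIO tags; end is exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start, etype = -1, ""
    for i, tag in enumerate(list(bio) + ["O"]):  # trailing "O" closes a final span
        continues = tag.startswith("I-") and tag[2:] == etype and start >= 0
        if not continues and start >= 0:
            spans.append((start, i, etype))
            start, etype = -1, ""
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def joint_loss(ner_loss: float, cws_loss: float, lam: float = 0.5) -> float:
    """Hypothetical joint objective: NER loss plus a weighted CWS loss."""
    return ner_loss + lam * cws_loss

# Hypothetical segmented sentence: "defendant" / placeholder name / "theft".
words = ["被告人", "张三", "盗窃"]
ner = ["O", "O", "O", "B-PER", "I-PER", "O", "O"]

print(cws_tags(words))  # ['B', 'M', 'E', 'B', 'E', 'B', 'E']
spans = entity_spans(ner)
print(spans)  # [(3, 5, 'PER')]
bounds = word_boundaries(words)
# The PER entity starts and ends exactly on word boundaries.
print(all(s in bounds and e in bounds for s, e, _ in spans))  # True
print(joint_loss(1.0, 0.5))  # 1.25
```

Summing the losses lets gradients from the segmentation labels also update a shared encoder (in this paper, BERT followed by a BiLSTM), which is one common way to realize joint learning; other weightings or training schedules are equally plausible.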
