Journal of Computer Applications, 2026, Vol. 46, Issue (5): 1460-1467. DOI: 10.11772/j.issn.1001-9081.2025050558

• Artificial intelligence

Judicial element extraction method by integrating global and local semantics

Yuqian HUANG1,2,3, Hui HUANG1,2,3, Yongbin QIN1,2,3, Ruizhang HUANG1,2,3, Yanping CHEN1,2,3, Yulin ZHOU1,2,3, Qian SUN4

  1. Text Computing and Cognitive Intelligence Engineering Research Center, Ministry of Education (Guizhou University), Guiyang, Guizhou 550025, China
    2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang, Guizhou 550025, China
    3. College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
    4. School of Information Engineering, Mianyang Teachers' College, Mianyang, Sichuan 621000, China
  • Received: 2025-05-21; Revised: 2025-06-16; Accepted: 2025-06-26; Online: 2025-07-10; Published: 2026-05-10
  • Contact: Yongbin QIN
  • About author: HUANG Yuqian, born in 2001, from Wuhan, Hubei, M. S. candidate. Her research interests include natural language processing and information extraction.
    HUANG Hui, born in 1994, from Guiyang, Guizhou, Ph. D. His research interests include natural language processing and intelligent question answering.
    HUANG Ruizhang, born in 1979, from Tianjin, Ph. D., professor, CCF member. Her research interests include data fusion analysis, text mining, web mining, and knowledge discovery.
    CHEN Yanping, born in 1980, from Changshun, Guizhou, Ph. D., professor, CCF member. His research interests include artificial intelligence and natural language processing.
    ZHOU Yulin, born in 1997, from Chishui, Guizhou, Ph. D. candidate. His research interests include big data and natural language processing.
    SUN Qian, born in 1996, from Chengdu, Sichuan, M. S., lecturer. Her research interests include natural language processing.
  • Supported by:
    National Key Research and Development Program of China (2023YFC3304500); Guizhou Provincial Postgraduate Research Fund (2024YJSKYJJ041); Sci-tech Innovation (Seedling Project) Cultivation and Small Creations Project of Science and Technology Department of Sichuan Province (MZGC20240152); Science and Technology Support Program of Guizhou Province ([2023] Qian Ke He Support General 448)

Abstract:

Judicial information extraction aims to identify fine-grained key elements in judicial documents, helping legal professionals handle large volumes of paperwork efficiently. Compared with general domains, elements in judicial documents are typically longer and more semantically dispersed, and the fine-grained requirement places especially strict demands on extracting local details. A model therefore must both handle long-range dependencies and precisely capture fine-grained semantic information within a local range. To address this challenge, a judicial element extraction method integrating global and local semantics was proposed. Firstly, element labels were concatenated with the content of judicial documents, and deep embeddings were generated using the BERT (Bidirectional Encoder Representations from Transformers) model. Secondly, a self-attention mechanism was introduced to strengthen the model's understanding of global context, while an adaptive multi-head attention mechanism dynamically adjusted attention weights to capture richer and more accurate local semantic features. Finally, to improve the model's generalization in identifying element boundaries, a combined loss function was designed that integrates binary cross-entropy loss and a KL (Kullback-Leibler) divergence loss with Gaussian-smoothed boundaries. Experimental results show that, compared with sequence labeling methods, span-based extraction methods, and other methods, the proposed method improves the F1 score on both the LAIC2023 and CAIL2021 judicial element extraction datasets. Specifically, it outperforms the second-best model, DiffusionNER, by 2.88 percentage points on LAIC2023, and the second-best Machine Reading Comprehension (MRC) model by 1.01 percentage points on CAIL2021.
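The first step, concatenating an element label with the document text, resembles the MRC-style input in which the label plays the role of a query. The sketch below is a minimal illustration under assumptions the abstract does not state: a `[CLS] label [SEP] document [SEP]` layout, pre-tokenized input, and inclusive gold spans. `build_input` is a hypothetical helper, not the authors' code. It shows how segment IDs separate the label from the document and why gold span indices must be shifted by the label length plus two special tokens.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def build_input(
    label_tokens: List[str],
    doc_tokens: List[str],
    span: Tuple[int, int],
) -> Tuple[List[str], List[int], Tuple[int, int]]:
    """Hypothetical helper: join an element label and a document into one sequence.

    Assumed layout: [CLS] label [SEP] document [SEP].
    `span` is an inclusive (start, end) token span inside `doc_tokens`.
    Returns the joint tokens, segment ids, and the span shifted into the joint sequence.
    """
    tokens = [CLS] + label_tokens + [SEP] + doc_tokens + [SEP]
    # Segment 0: [CLS], label, first [SEP]; segment 1: document and final [SEP].
    segment_ids = [0] * (len(label_tokens) + 2) + [1] * (len(doc_tokens) + 1)
    # Document tokens begin after [CLS], the label, and the first [SEP].
    offset = len(label_tokens) + 2
    start, end = span
    return tokens, segment_ids, (start + offset, end + offset)


if __name__ == "__main__":
    label = ["defendant"]
    doc = "the defendant Zhang San was sentenced".split()
    tokens, segments, (s, e) = build_input(label, doc, (2, 3))
    print(tokens[s:e + 1])  # ['Zhang', 'San']
```

In a real pipeline a BERT tokenizer would produce the tokens (character-level for Chinese text); whitespace splitting is used here only to keep the example self-contained.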
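The abstract does not specify how the adaptive multi-head attention is constructed. Purely to illustrate the distinction between global and local context, the sketch below swaps in a simpler technique: single-head scaled dot-product self-attention (with the inputs used directly as queries, keys, and values, without learned projections) computed once over all positions (global) and once under a fixed window mask (local), with the two outputs blended by a scalar gate. The window size, the gate value, and all function names are assumptions; a learned, input-dependent gate would presumably be closer to the "dynamically adjusted" weights described above.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(x: Matrix, window: Optional[int] = None) -> Matrix:
    """Single-head scaled dot-product self-attention weights (queries = keys = x).

    window=None -> global: each position attends to all positions.
    window=w    -> local: position i attends only to j with |i - j| <= w.
    """
    d = len(x[0])
    rows = []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            if window is not None and abs(i - j) > window:
                scores.append(float("-inf"))  # masked out of the local view
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        rows.append(softmax(scores))
    return rows


def attend(weights: Matrix, values: Matrix) -> Matrix:
    # Weighted sum of value vectors for every query position.
    dim = len(values[0])
    return [[sum(w * v[c] for w, v in zip(row, values)) for c in range(dim)]
            for row in weights]


def gated_fusion(global_out: Matrix, local_out: Matrix, gate: float) -> Matrix:
    # Fixed scalar gate in [0, 1]; a learned, token-wise gate is an assumed alternative.
    return [[gate * g + (1.0 - gate) * loc for g, loc in zip(gr, lr)]
            for gr, lr in zip(global_out, local_out)]


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    global_w = attention_weights(x)
    local_w = attention_weights(x, window=1)
    fused = gated_fusion(attend(global_w, x), attend(local_w, x), gate=0.5)
    print(local_w[0])  # positions 2 and 3 receive zero weight
    print(fused)
```

The diagonal is never masked, so every softmax row has at least one finite score and the weights remain well defined.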
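The combined boundary loss can be sketched as follows, assuming one logit per token for a given boundary type (start or end), binary cross-entropy on sigmoid probabilities against a one-hot boundary label, and a KL term KL(t ‖ softmax(z)) whose target t_i is proportional to exp(-(i - b)^2 / (2σ^2)) around the gold boundary b. The total is BCE + λ·KL. The smoothing width `sigma` (σ), the weight `kl_weight` (λ), and all function names are hypothetical; the paper's exact formulation and weighting may differ.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_cross_entropy(logits: List[float], labels: List[float],
                         eps: float = 1e-12) -> float:
    # Mean BCE over positions, with probabilities clamped away from 0 and 1.
    total = 0.0
    for x, y in zip(logits, labels):
        p = min(max(sigmoid(x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(logits)


def gaussian_target(length: int, boundary: int, sigma: float) -> List[float]:
    # t_i proportional to exp(-(i - b)^2 / (2 sigma^2)), normalized to sum to 1.
    w = [math.exp(-((i - boundary) ** 2) / (2.0 * sigma ** 2)) for i in range(length)]
    total = sum(w)
    return [v / total for v in w]


def log_softmax(logits: List[float]) -> List[float]:
    # Log-probabilities over positions, computed with the log-sum-exp trick.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(target: List[float], log_pred: List[float]) -> float:
    # KL(target || pred); positions with zero target mass contribute nothing.
    return sum(t * (math.log(t) - lp) for t, lp in zip(target, log_pred) if t > 0)


def boundary_loss(logits: List[float], boundary: int,
                  sigma: float = 1.0, kl_weight: float = 1.0) -> float:
    """Hypothetical combined loss for one boundary type (start or end)."""
    labels = [1.0 if i == boundary else 0.0 for i in range(len(logits))]
    bce = binary_cross_entropy(logits, labels)
    target = gaussian_target(len(logits), boundary, sigma)
    kl = kl_divergence(target, log_softmax(logits))
    return bce + kl_weight * kl


if __name__ == "__main__":
    good = [-4.0, -2.0, 4.0, -2.0, -4.0]  # peak on the gold boundary (index 2)
    bad = [4.0, -2.0, -4.0, -2.0, -4.0]   # peak two tokens away
    print(boundary_loss(good, 2), boundary_loss(bad, 2))
```

Because the Gaussian target spreads some probability mass onto tokens adjacent to the gold boundary, the KL term penalizes a prediction that is off by one token less than one that is far away; this is one plausible reading of how the smoothing helps boundary generalization.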

Key words: information extraction, judicial documents, attention-based feature fusion, global semantics, local semantics
