Journal of Computer Applications, 2025, Vol. 45, Issue (6): 1741-1747. DOI: 10.11772/j.issn.1001-9081.2024070917

• The 12th CCF Conference on Big Data •

Legal case retrieval method integrating temporal behavior chain and event type

Lilin ZHAN1,2,3, Yongbin QIN1,2,3, Ruizhang HUANG1,2,3, Hua WANG1,2,3, Yanping CHEN1,2,3

  1. Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry (Guizhou University), Guiyang Guizhou 550025, China
  2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang Guizhou 550025, China
  3. College of Computer Science and Technology, Guizhou University, Guiyang Guizhou 550025, China
  • Received: 2024-06-29 Revised: 2024-07-25 Accepted: 2024-08-02 Online: 2024-08-22 Published: 2025-06-10
  • Contact: Yongbin QIN
  • About author: ZHAN Lilin, born in 2002 in Panzhou, Guizhou, M. S. candidate, CCF member. His research interests include natural language processing and information retrieval.
    QIN Yongbin, born in 1980 in Yantai, Shandong, Ph. D., professor, CCF senior member. His research interests include big data management and application and multi-source data fusion. E-mail: ybqin@gzu.edu.cn
    HUANG Ruizhang, born in 1979 in Tianjin, Ph. D., professor, CCF member. Her research interests include big data, data mining, and information extraction.
    WANG Hua, born in 1981 in Duyun, Guizhou, Ph. D. candidate, CCF member. His research interests include information retrieval and data mining.
    CHEN Yanping, born in 1980 in Changshun, Guizhou, Ph. D., professor, CCF member. His research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62066008); Key Project of Science and Technology Foundation of Guizhou Province ([2024]003)

Abstract:

Existing Legal Case Retrieval (LCR) methods make little effective use of case elements and are therefore easily misled when cases merely share a similar semantic structure. To address this problem, an LCR method integrating temporal behavior chain and event type is proposed. First, sequence labeling is used to identify the legal event types in the case description, and a temporal behavior chain is constructed from the behavioral elements in the case text. This highlights the key elements of the case and focuses the model on its core content, which counters the tendency of existing methods to be misled by semantic-structure similarity. Second, segmented encoding is used to construct a similarity vector representation matrix of the temporal behavior chains, which enhances the semantic interaction between the behavioral elements of different cases. Finally, an aggregation scorer measures case relevance from three perspectives: temporal behavior chain, legal event type, and crime type, which makes the case matching score more reasonable. Experimental results on LeCaRD (Legal Case Retrieval Dataset) show that, compared with the SAILER (Structure-Aware pre-traIned language model for LEgal case Retrieval) method, the proposed method improves P@5 by 4 percentage points, P@10 by 3 percentage points, MAP by 4 percentage points, and NDCG@30 by 0.8 percentage points. The method therefore uses case elements effectively to avoid interference from semantic-structure similarity and provides a reliable basis for LCR.
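The reported metrics (P@k, MAP, NDCG@k) can be illustrated with a minimal sketch. This is not the authors' evaluation script. The function names, the `threshold` parameter for converting graded labels into binary relevance, and the linear-gain form of NDCG are all assumptions made for illustration. The sample labels are hypothetical, not results from the paper.

```python
import math

def precision_at_k(labels, k, threshold=1):
    """Fraction of the top-k ranked cases whose label is >= threshold."""
    return sum(1 for g in labels[:k] if g >= threshold) / k

def average_precision(labels, threshold=1):
    """Mean of the precision values at each rank holding a relevant case.

    Assumes every relevant candidate appears somewhere in `labels`,
    as when re-ranking a fixed candidate pool.
    """
    hits, total = 0, 0.0
    for rank, g in enumerate(labels, start=1):
        if g >= threshold:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings, threshold=1):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, threshold) for r in rankings) / len(rankings)

def ndcg_at_k(labels, k):
    """NDCG@k with linear gain and log2 discount (one common variant)."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels of one ranked candidate list (best rank first).
labels = [3, 0, 2, 1, 0]
p5 = precision_at_k(labels, 5, threshold=2)
ap = average_precision(labels, threshold=2)
ndcg5 = ndcg_at_k(labels, 5)
```

Under these assumptions, an improvement of 4 percentage points in P@5 means an absolute change such as 0.50 to 0.54, not a 4% relative gain.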

Key words: case element, behavioral element, event type, temporal behavior chain, aggregation scorer

CLC Number: