Legal case retrieval method integrating temporal behavior chain and event type

doi:10.11772/j.issn.1001-9081.2024070917

Journal of Computer Applications, 2025, Vol. 45, Issue (6): 1741-1747. DOI: 10.11772/j.issn.1001-9081.2024070917

Section: The 12th CCF Conference on Big Data

Lilin ZHAN¹,²,³, Yongbin QIN¹,²,³, Ruizhang HUANG¹,²,³, Hua WANG¹,²,³, Yanping CHEN¹,²,³

1. Text Computing and Cognitive Intelligence Engineering Research Center of National Education Ministry (Guizhou University), Guiyang Guizhou 550025, China
2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang Guizhou 550025, China
3. College of Computer Science and Technology, Guizhou University, Guiyang Guizhou 550025, China

Received: 2024-06-29 Revised: 2024-07-25 Accepted: 2024-08-02 Online: 2024-08-22 Published: 2025-06-10
Contact: Yongbin QIN (ybqin@gzu.edu.cn)
About the authors:
ZHAN Lilin, born in 2002 in Panzhou, Guizhou, M. S. candidate, CCF member. His research interests include natural language processing and information retrieval.
QIN Yongbin, born in 1980 in Yantai, Shandong, Ph. D., professor, CCF senior member. His research interests include big data management and application, and multi-source data fusion.
HUANG Ruizhang, born in 1979 in Tianjin, Ph. D., professor, CCF member. Her research interests include big data, data mining, and information extraction.
WANG Hua, born in 1981 in Duyun, Guizhou, Ph. D. candidate, CCF member. His research interests include information retrieval and data mining.
CHEN Yanping, born in 1980 in Changshun, Guizhou, Ph. D., professor, CCF member. His research interests include artificial intelligence and natural language processing.
Supported by:
National Natural Science Foundation of China (62066008); Key Project of Science and Technology Foundation of Guizhou Province ([2024]003)

Abstract

Existing Legal Case Retrieval (LCR) methods make little effective use of case elements and are therefore easily misled by similarity in the semantic structure of case content. To address this problem, an LCR method integrating temporal behavior chain and event type was proposed. Firstly, a sequence labeling method was adopted to identify the legal event types in the case description, and a temporal behavior chain was constructed from the behavioral elements in the case text. This highlights the key elements of the case, so that the model focuses on the core content of the case rather than on surface semantic structure. Secondly, a similarity vector representation matrix of the temporal behavior chains was constructed by segmented coding to enhance the semantic interaction of behavioral elements between cases. Finally, an aggregation scorer measured the relevance of cases from three perspectives (temporal behavior chain, legal event type, and crime type), making the case matching score more reasonable. Experimental results show that on LeCaRD (Legal Case Retrieval Dataset), compared with the SAILER (Structure-Aware pre-traIned language model for LEgal case Retrieval) method, the proposed method improves P@5 by 4 percentage points, P@10 by 3 percentage points, MAP by 4 percentage points, and NDCG@30 by 0.8 percentage points. The method thus uses case elements effectively to avoid interference from semantic-structure similarity of case content, and can provide a reliable basis for LCR.

Key words: case element, behavioral element, event type, temporal behavior chain, aggregation scorer

CLC Number: TP391.1

Cite this article:

Lilin ZHAN, Yongbin QIN, Ruizhang HUANG, Hua WANG, Yanping CHEN. Legal case retrieval method integrating temporal behavior chain and event type[J]. Journal of Computer Applications, 2025, 45(6): 1741-1747.

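Figures and tables (9)

Tab. 1 Examples of temporal behavior chain and event type

Case description (excerpt)	Temporal behavior chain	Event types
The defendant urgently needed money to buy a house, so he threatened the plaintiff with a knife to hand over the wallet… (918 characters) Subsequently, he beat the plaintiff, injuring him.	hold → threaten → hand over → beat → injured	{armed with weapon/firearm, threat/coercion, bodily harm, injury}
The defendant urgently needed money to buy a house, so he sneaked into the plaintiff's home and stole the wallet… (908 characters) Subsequently, he was discovered by the plaintiff.	sneak in → steal → discover	{breaking into a home/room, theft of property}
The defendant abducted the plaintiff in an alley and then stabbed him with a knife, injuring the plaintiff, who is currently hospitalized… (1 608 characters)	abduct → hold → stab → injured	{kidnapping, armed with weapon/firearm, bodily harm, injury}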
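The contrast in Table 1 can be made concrete with a small sketch. It stores each case as a behavior chain plus a set of event types (English glosses of the table entries) and compares event-type sets with Jaccard overlap. The `CaseElements` class and the `jaccard` helper are hypothetical names, and Jaccard overlap is only an illustrative stand-in, not the scoring used by the proposed method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CaseElements:
    """Hypothetical container for the two kinds of case elements in Table 1."""
    behavior_chain: tuple[str, ...]   # verbs in the order they occur
    event_types: frozenset[str]       # unordered legal event types


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard overlap of two sets (illustrative only, not the paper's scorer)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


case1 = CaseElements(
    ("hold", "threaten", "hand over", "beat", "injured"),
    frozenset({"armed with weapon/firearm", "threat/coercion", "bodily harm", "injury"}),
)
case2 = CaseElements(
    ("sneak in", "steal", "discover"),
    frozenset({"breaking into a home/room", "theft of property"}),
)
case3 = CaseElements(
    ("abduct", "hold", "stab", "injured"),
    frozenset({"kidnapping", "armed with weapon/firearm", "bodily harm", "injury"}),
)

# Cases 1 and 2 open with almost the same sentence but share no event type.
sim_12 = jaccard(case1.event_types, case2.event_types)
# Cases 1 and 3 are worded differently but share three of five event types.
sim_13 = jaccard(case1.event_types, case3.event_types)
```

Under this toy measure, the textually similar pair (cases 1 and 2) scores 0, while the textually different pair (cases 1 and 3) scores 0.6, which is the kind of signal that case elements are meant to contribute.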

Fig. 1 Overall framework of the proposed method

Fig. 2 Construction of the temporal behavior chain

Tab. 2 Experimental parameter settings

Parameter	Value	Parameter	Value
Batch size	1	weight_decay	0.01
Learning rate	3×10^-5	epoch	500
Maximum input length	510	Behavior chain segment length	254
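As a rough illustration of how the segment length and the maximum input length in Table 2 might interact, the sketch below splits two behavior chains into segments of at most 254 tokens and pairs every query segment with every candidate segment. The pairing scheme, the function names, and the reading of 510 as two segments plus two separator tokens are all assumptions; this page does not describe how segments are actually encoded.

```python
from __future__ import annotations


def split_into_segments(tokens: list[str], segment_len: int = 254) -> list[list[str]]:
    """Cut a token list into consecutive segments of at most `segment_len` tokens."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]


def pair_segments(query_tokens: list[str], cand_tokens: list[str],
                  segment_len: int = 254) -> list[list[tuple[list[str], list[str]]]]:
    """Build an m x n grid of (query segment, candidate segment) pairs.

    Assumption: each pair would be fed to an encoder separately, and the
    resulting vectors would fill one cell of a similarity matrix.
    """
    q_segs = split_into_segments(query_tokens, segment_len)
    c_segs = split_into_segments(cand_tokens, segment_len)
    return [[(q, c) for c in c_segs] for q in q_segs]


# Toy chains: 300 and 600 placeholder tokens.
grid = pair_segments(["q"] * 300, ["c"] * 600)
rows, cols = len(grid), len(grid[0])
# Longest pair plus two separator tokens (an assumed input layout).
longest_pair = max(len(q) + len(c) for row in grid for q, c in row) + 2
```

With these toy lengths the grid has 2 rows and 3 columns, and the longest pair occupies exactly 510 positions under the assumed layout.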

Tab. 3 Comparison of LCR experimental results

Method	P@5	P@10	MAP	NDCG@10	NDCG@20	NDCG@30
BM25	0.30	0.29	0.37	0.666	0.748	0.857
BERT	0.31	0.33	0.41	0.736	0.794	0.868
BERT-Crime	0.43	0.39	0.56	0.772	0.817	0.880
Lawformer	0.46	0.40	0.48	0.768	0.819	0.909
BERT-PLI	0.32	0.36	0.44	0.743	0.807	0.891
BERT-LF	0.49	0.45	0.59	0.816	0.864	0.919
SAILER	0.46	0.44	0.56	0.839	0.880	0.924
Proposed method	0.50	0.47	0.60	0.842	0.882	0.932
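For readers unfamiliar with the metrics in Table 3, the following sketch gives textbook definitions of P@k, average precision (whose mean over queries is MAP), and NDCG@k. It assumes binary relevance flags for P@k and MAP, graded relevance with a linear gain for NDCG, and that the ranked list contains every relevant case; the exact variants used in the experiments are not stated on this page.

```python
from __future__ import annotations

import math


def precision_at_k(relevant_flags: list[int], k: int) -> float:
    """Fraction of the top-k ranked cases that are relevant (flags are 0 or 1)."""
    return sum(relevant_flags[:k]) / k


def average_precision(relevant_flags: list[int]) -> float:
    """Mean of precision values taken at each rank where a relevant case appears."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: list[list[int]]) -> float:
    """MAP: average of per-query average precision."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: list[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


# Toy ranking: relevant cases at ranks 1 and 3.
flags = [1, 0, 1, 0, 0]
p5 = precision_at_k(flags, 5)
ap = average_precision(flags)
perfect = ndcg_at_k([3, 3, 2, 1, 0], 5)
reversed_order = ndcg_at_k([0, 1, 2, 3], 4)
```

In the toy ranking, P@5 is 2/5 and the average precision is (1/1 + 2/3)/2 = 5/6; a list already sorted by relevance has NDCG equal to 1, and any worse ordering scores below 1.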

Tab. 4 Ablation experimental results

Method	P@5	P@10	MAP	NDCG@10	NDCG@20	NDCG@30
w/o temporal behavior chain	0.49	0.44	0.54	0.822	0.877	0.921
w/o event type	0.49	0.45	0.55	0.835	0.872	0.922
w/o temporal behavior chain and event type	0.42	0.43	0.49	0.820	0.830	0.910
w/o segmented coding	0.44	0.42	0.54	0.826	0.882	0.929
Proposed method	0.50	0.47	0.60	0.842	0.882	0.932
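As a quick arithmetic check on the reported numbers, the sketch below computes percentage-point differences between the proposed method and two reference rows: SAILER from Table 3, and the ablation that removes both the temporal behavior chain and the event type from Table 4. The values are copied from those tables; the helper name `gain_pp` is hypothetical.

```python
METRICS = ("P@5", "P@10", "MAP", "NDCG@10", "NDCG@20", "NDCG@30")

# Rows copied from Tables 3 and 4.
proposed = (0.50, 0.47, 0.60, 0.842, 0.882, 0.932)
sailer = (0.46, 0.44, 0.56, 0.839, 0.880, 0.924)
no_chain_no_event = (0.42, 0.43, 0.49, 0.820, 0.830, 0.910)


def gain_pp(better, baseline):
    """Difference in percentage points per metric, rounded to one decimal."""
    return {m: round((b - a) * 100, 1) for m, b, a in zip(METRICS, better, baseline)}


vs_sailer = gain_pp(proposed, sailer)
vs_ablation = gain_pp(proposed, no_chain_no_event)
```

The comparison with SAILER reproduces the gains quoted in the abstract (4, 3, 4, and 0.8 points on P@5, P@10, MAP, and NDCG@30). Removing both case elements costs 8 points of P@5 and 11 points of MAP, the largest drop among the ablations.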

Tab. 5 Parameter analysis experimental results

$\alpha$	$\beta$	$\theta$	P@5	P@10	MAP	NDCG@10	NDCG@20	NDCG@30
0.1	0.1	0.8	0.50	0.45	0.59	0.835	0.882	0.932
0.1	0.2	0.7	0.47	0.45	0.58	0.840	0.878	0.931
0.1	0.3	0.6	0.50	0.47	0.60	0.842	0.882	0.932
0.1	0.4	0.5	0.48	0.45	0.57	0.819	0.876	0.928
0.2	0.6	0.2	0.44	0.45	0.55	0.846	0.884	0.930
0.2	0.5	0.3	0.45	0.45	0.55	0.845	0.880	0.929
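Every row of Table 5 has weights that sum to 1, which suggests (but does not confirm) that the aggregation scorer combines its three perspectives as a weighted sum. The sketch below illustrates that reading. The weighted-sum form, the assignment of α, β, and θ to the behavior chain, event type, and crime type scores, and the function name `aggregate_score` are all assumptions; the defaults are the row of Table 5 whose results match the proposed method.

```python
def aggregate_score(chain_sim: float, event_sim: float, crime_sim: float,
                    alpha: float = 0.1, beta: float = 0.3, theta: float = 0.6) -> float:
    """Assumed convex combination of three per-perspective similarity scores.

    Which weight belongs to which perspective is NOT stated on this page;
    the mapping used here is purely illustrative.
    """
    if abs(alpha + beta + theta - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return alpha * chain_sim + beta * event_sim + theta * crime_sim


# Toy candidates: (behavior chain, event type, crime type) similarities in [0, 1].
candidates = {
    "case_a": (0.9, 0.2, 0.0),
    "case_b": (0.4, 0.6, 1.0),
}
scores = {name: aggregate_score(*sims) for name, sims in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these toy inputs, `case_b` ranks first because its crime-type and event-type agreement outweighs the higher chain similarity of `case_a` under the assumed weights.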

Fig. 3 Heat map of easily confused cases

Fig. 4 Comparison of experimental results of methods with and without the vector matrix of the temporal behavior chain
