Journal of Computer Applications

Judgment document summarization method combining large language model and dynamic prompts

ZHANG Binbin1,2,3, QIN Yongbin1,2,3, HUANG Ruizhang1,2,3, CHEN Yanping1,2,3   

  1. College of Computer Science and Technology, Guizhou University; 2. State Key Laboratory of Public Big Data (Guizhou University); 3. Text Computing & Cognitive Intelligence Engineering Research Center of the National Education Ministry, Guizhou University
  • Received: 2024-09-28  Revised: 2024-12-29  Online: 2025-03-21  Published: 2025-03-21
  • About the authors: ZHANG Binbin, born in 1999, M.S. candidate. His research interests include natural language processing and judicial summarization. QIN Yongbin, born in 1980, Ph.D., professor. His research interests include big data management and multi-source data fusion. HUANG Ruizhang, born in 1979, Ph.D., professor. Her research interests include big data, data mining, and information extraction. CHEN Yanping, born in 1980, Ph.D., professor. His research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Key R&D Program (2023YFC3304500); Key Project of the Science and Technology Foundation of Guizhou Province ([2024]003).

  • Corresponding author: QIN Yongbin
  • Author profiles: ZHANG Binbin (born 1999), male, from Renhuai, Guizhou, M.S. candidate, CCF student member; research interests: natural language processing, judicial summarization. QIN Yongbin (born 1980), male, from Yantai, Shandong, professor, Ph.D., CCF senior member; research interests: big data management, multi-source data fusion. HUANG Ruizhang (born 1979), female, from Tianjin, professor, Ph.D., CCF member; research interests: big data, data mining, information extraction. CHEN Yanping (born 1980), male, from Changshun, Guizhou, professor, Ph.D., CCF member; research interests: artificial intelligence, natural language processing.

Abstract: Judgment documents have complex case structures, redundant case facts, and widely distributed case details, which make it difficult for existing large language models to attend to structural information and factual errors, resulting in summaries with missing structural information and factual inconsistencies. To address this, a judgment document summarization method combining a large language model with dynamic prompts, DPCM (Dynamic Prompt Correction Method), was proposed. First, a large language model performed one-shot learning to generate a judgment document summary. Second, the high-dimensional similarity between the original text and the summary was computed to detect possible structural omissions or factual inconsistencies in the summary. If a problem was detected, the faulty summary was concatenated with the original text, a prompt was added, and one-shot learning was performed again to generate a corrected summary, which was then re-checked for similarity. If the problem persisted, this generation-and-detection process was repeated. Finally, through this iteration, the prompt was adjusted dynamically to optimize the generated summary step by step. Experimental results on the CAIL2021 public judicial summarization dataset show that, compared with methods such as Least-To-Most Prompting, Zero-Shot Reasoners, and Self-Consistency CoT, the proposed method achieves improvements on the ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, FactCC (F1), and FactCC (Acc) metrics.
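The generate–check–correct loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `one_shot_summarize` stands in for the LLM one-shot call and `similarity` for the high-dimensional similarity check (in practice, e.g., cosine similarity over text embeddings); the function names, threshold, and round limit are all assumptions.

```python
def one_shot_summarize(document: str, prompt: str) -> str:
    """Stand-in for the one-shot LLM call; here it simply truncates."""
    return document[:60]

def similarity(document: str, summary: str) -> float:
    """Stand-in for the high-dimensional similarity check
    (token overlap here; embeddings in the actual method)."""
    doc_tokens, sum_tokens = set(document.split()), set(summary.split())
    if not sum_tokens:
        return 0.0
    return len(doc_tokens & sum_tokens) / len(sum_tokens)

def dpcm_summarize(document: str, threshold: float = 0.5,
                   max_rounds: int = 5) -> str:
    """Sketch of the DPCM loop: generate, detect, dynamically re-prompt."""
    prompt = "Summarize the judgment document."
    summary = one_shot_summarize(document, prompt)
    for _ in range(max_rounds):
        # High similarity: no structural omission or factual
        # inconsistency detected, so accept the summary.
        if similarity(document, summary) >= threshold:
            break
        # Otherwise, concatenate the faulty summary with the original
        # text, strengthen the prompt, and regenerate (dynamic correction).
        prompt = ("The previous summary omitted structure or was "
                  "factually inconsistent. Previous summary: " + summary)
        summary = one_shot_summarize(prompt + "\n" + document, prompt)
    return summary
```

With real components, the loop terminates either when the similarity check passes or after a fixed number of correction rounds, so a failed correction never loops forever.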

Key words: large language model, dynamic prompt, judgment document summarization, structural omission, factual inconsistency
