Journal of Computer Applications, 2025, Vol. 45, Issue (3): 746-754. DOI: 10.11772/j.issn.1001-9081.2024060833

• Column: Frontier Research and Typical Applications of Large Models

Multi-strategy retrieval-augmented generation method for military domain knowledge question answering systems

Yanping ZHANG¹, Meifang CHEN²,³, Changhai TIAN³, Zibo YI³, Wenpeng HU³, Wei LUO³, Zhunchen LUO³

  1. School of Mathematics and Physics, Hebei University of Engineering, Handan Hebei 056038, China
    2.School of Information and Electrical Engineering,Hebei University of Engineering,Handan Hebei 056038,China
    3.Information Research Center of Military Science,PLA Academy of Military Science,Beijing 100142,China
  • Received: 2024-06-24; Revised: 2024-08-16; Accepted: 2024-08-20; Online: 2024-09-02; Published: 2025-03-10
  • Contact: Changhai TIAN
  • About the authors: ZHANG Yanping, born in 1980, from Handan, Hebei, Ph. D., professor. Her research interests include bioinformatics and biomedical statistics.
    CHEN Meifang, born in 1999, from Putian, Fujian, M. S. candidate. Her research interests include information retrieval.
    YI Zibo, born in 1991, from Jingzhou, Hubei, Ph. D., assistant research fellow. His research interests include information retrieval and intelligent mining and service of scientific and technological information.
    HU Wenpeng, born in 1992, from Rizhao, Shandong, Ph. D., assistant research fellow. His research interests include machine learning and natural language processing.
    LUO Wei, born in 1979, from Tongcheng, Anhui, M. S., professor of engineering. His research interests include intelligent mining and service of scientific and technological information.
    LUO Zhunchen, born in 1984, from Changsha, Hunan, Ph. D., senior engineer. His research interests include intelligent mining and service of scientific and technological information, and natural language processing.
  • Supported by: Youth Program of the National Natural Science Foundation of China (62206308)

Abstract:

Military domain knowledge question answering systems based on Retrieval-Augmented Generation (RAG) have gradually become an important tool for modern intelligence personnel to collect and analyze intelligence. To address two problems in current RAG application strategies, namely the poor portability of hybrid retrieval and the semantic drift easily induced by unnecessary query rewriting, a Multi-Strategy Retrieval-Augmented Generation (MSRAG) method was proposed. Firstly, a retrieval model was adaptively matched to the characteristics of the user's query to recall relevant text. Secondly, a text filter was used to extract the key text fragments that can answer the question. Thirdly, the text filter judged the validity of the retrieved content to decide whether to trigger query rewriting based on synonym expansion, and the initial query was merged with the rewritten information and fed to the retrieval controller for a more targeted second retrieval. Finally, the key text fragments that can answer the question were combined with the question through prompt engineering and fed into the answer generation model, and the generated response was returned to the user. Experimental results show that, compared with the convex linear combination RAG method, MSRAG improves ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation Longest common subsequence) by 14.35 percentage points on the military domain dataset (Military) and by 5.83 percentage points on the Medical dataset. These results indicate that MSRAG has strong generality and portability, mitigates the semantic drift caused by unnecessary query rewriting, and effectively helps large language models generate more accurate answers.
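The four MSRAG steps described above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation: the toy corpus, the digit-based routing rule in `route`, the keyword-overlap and character-bigram scorers (crude stand-ins for a lexical retriever such as BM25 and a dense embedding retriever), the overlap threshold in `text_filter`, and the `SYNONYMS` table are all hypothetical. The call to the answer generation model is left out; the sketch stops at the prompt that would be sent to it.

```python
import re

# Hypothetical toy corpus standing in for the retrieval collection.
CORPUS = [
    "The Type 055 destroyer displaces about 12000 tonnes.",
    "Aircraft carriers project air power far from shore.",
    "A frigate is a warship smaller than a destroyer.",
]
# Hypothetical synonym table used for synonym-expansion query rewriting.
SYNONYMS = {"tonnage": ["displaces", "tonnes"], "warship": ["destroyer", "frigate"]}
STOPWORDS = {"a", "the", "is", "of", "what", "which", "than"}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def content(text: str) -> set[str]:
    return set(tokens(text)) - STOPWORDS


def sparse_score(query: str, doc: str) -> float:
    """Keyword overlap: a crude stand-in for a lexical retriever such as BM25."""
    q = content(query)
    return len(q & set(tokens(doc))) / (len(q) or 1)


def char_bigrams(text: str) -> set[str]:
    s = " ".join(tokens(text))
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dense_score(query: str, doc: str) -> float:
    """Character-bigram Jaccard: a crude stand-in for embedding similarity."""
    a, b = char_bigrams(query), char_bigrams(doc)
    return len(a & b) / (len(a | b) or 1)


def route(query: str):
    """Step 1 (hypothetical rule): queries containing digits, e.g. model
    numbers, go to exact keyword matching; all others to the 'dense' scorer."""
    return sparse_score if re.search(r"\d", query) else dense_score


def retrieve(query: str, k: int = 2) -> list[str]:
    scorer = route(query)
    return sorted(CORPUS, key=lambda d: scorer(query, d), reverse=True)[:k]


def text_filter(query: str, passages: list[str], min_overlap: int = 2) -> list[str]:
    """Step 2 (hypothetical criterion): keep sentences that share at least
    `min_overlap` content words with the query."""
    q = content(query)
    sentences = [s for p in passages for s in re.split(r"(?<=[.!?])\s+", p) if s]
    return [s for s in sentences if len(q & set(tokens(s))) >= min_overlap]


def expand(query: str) -> str:
    """Merge the original query with synonyms of its words."""
    extra = [syn for t in tokens(query) for syn in SYNONYMS.get(t, [])]
    return f"{query} {' '.join(extra)}" if extra else query


def msrag_context(query: str) -> tuple[list[str], bool]:
    fragments = text_filter(query, retrieve(query))
    if fragments:  # filter judged the content valid: skip rewriting
        return fragments, False
    # Step 3: content judged invalid, so rewrite via synonyms and retrieve again.
    expanded = expand(query)
    return text_filter(expanded, retrieve(expanded)), True


def build_prompt(query: str, fragments: list[str]) -> str:
    """Step 4: merge the key fragments and the original question into one prompt."""
    context = "\n".join(f"- {f}" for f in fragments) or "- (no relevant text found)"
    return f"Answer using only the context.\nContext:\n{context}\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    q = "tonnage of warship"
    frags, rewritten = msrag_context(q)
    print("rewritten:", rewritten)
    print(build_prompt(q, frags))
```

The design point the sketch mirrors is that rewriting is conditional: a query whose first retrieval already yields valid fragments (for example, one naming the Type 055 destroyer in the toy corpus) is passed on unchanged, and only a query whose fragments all fail the filter is expanded and retrieved again.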

Key words: Retrieval-Augmented Generation (RAG), military knowledge question answering, information retrieval, text filtering, query rewriting
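ROUGE-L, the metric behind the reported gains, scores a generated answer by the longest common subsequence (LCS) it shares with a reference answer. The Python sketch below assumes the F-measure form with β = 1 (the F1 variant many evaluation toolkits report); the abstract states neither the variant nor the tokenization, so character-level tokens for Chinese text are likewise only an assumption here. A gain of 14.35 percentage points means the score, expressed on a 0 to 100 scale, rose by 14.35 in absolute terms.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise keep the better of left and up.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: list[str], reference: list[str], beta: float = 1.0) -> float:
    """ROUGE-L F-measure: (1 + beta^2) * P * R / (R + beta^2 * P)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    # Character-level tokens for Chinese text (an assumption, see above).
    print(rouge_l(list("军事问答"), list("军事知识问答")))
```

Raising `beta` above 1 weights recall more heavily, which is closer to the recall-oriented setting of the original ROUGE definition.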

CLC number: