Journal of Computer Applications (《计算机应用》), official website


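Large model information extraction for aeronautical data link

TANG Kui¹, GAO Shuanliang², LI Yang², LI Geng², ZHANG Jiaxing², ZHU Yuanfang², FU Yongming¹, MAO Ruiqi¹, LIANG Hongru²

  1. Unit 93216 of the Chinese People's Liberation Army; 2. College of Computer Science, Sichuan University
  • Received: 2025-09-22  Revised: 2025-11-06  Online: 2025-12-04  Published: 2025-12-04
  • Corresponding author: GAO Shuanliang
  • About authors: TANG Kui, born in 1978, from Huangshi, Hubei, M. S., associate research fellow. His research interests include information systems and software engineering. GAO Shuanliang, born in 2000, from Ulanqab, Inner Mongolia, M. S. candidate. His research interests include natural language processing. LI Yang, born in 2003, from Suining, Sichuan, M. S. candidate. His research interests include natural language processing. LI Geng, born in 2003, from Liaocheng, Shandong, undergraduate student. His research interests include natural language processing. ZHANG Jiaxing, born in 2004, from Tangshan, Hebei, undergraduate student. His research interests include multimodal large models. ZHU Yuanfang, born in 2004, from Xinyu, Jiangxi, undergraduate student. Her research interests include natural language processing. FU Yongming, born in 1987, from Jincheng, Shanxi, Ph. D., engineer. His research interests include aviation communications. MAO Ruiqi, born in 1996, from Changde, Hunan, Ph. D., engineer. His research interests include aviation communications. LIANG Hongru, born in 1992, from Xinle, Hebei, Ph. D., associate professor. Her research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China, General Program (8217XXXXX); Open Fund of National Key Laboratory for Multi-domain Data Collaborative Processing and Control (CLDL-20240201)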

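Abstract: Mission-critical military scenarios such as aeronautical data link command parsing require an information extraction system that is both highly accurate and structurally consistent, so that missions are executed safely and reliably. However, traditional extraction methods based on rules or few-shot prompting produce severely hallucinated results when they face complex semantics, flexible sentence patterns, and highly constrained output structures. To address this problem, a Large Language Model (LLM)-based Retrieval-Augmented Information Extraction (RAG-IE) algorithm was proposed, which combines retrieval augmentation with information extraction. First, a meta-language library oriented to the target information types was constructed, and standardized extraction templates were derived from it. Then, efficient template retrieval was achieved through an inverted index combined with keyword matching and semantic expansion strategies. Finally, prompt design was used to guide the LLM in converting natural language into structured information, yielding highly controllable and accurate extraction from natural language commands in the aeronautical data link. Experimental results show that the proposed algorithm outperforms existing information extraction algorithms and greatly reduces hallucinations in the information extracted by LLMs. Moreover, the algorithm is largely domain-independent: by rebuilding the meta-language library, it can be extended to other scenarios that require highly reliable extraction, providing a general technical pathway toward intelligent and verifiable information processing systems.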

Key words: information extraction, retrieval augmentation, Large Language Model (LLM), prompt engineering, few-shot learning
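The three steps summarized in the abstract are a meta-language library of templates, inverted-index template retrieval with keyword matching and semantic expansion, and prompt-guided conversion to structured output. They can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation, and several parts are invented for it:

- The template IDs (`ALT_CHANGE`, `HDG_CHANGE`, `SPD_CHANGE`), keywords, and field names are hypothetical.
- A small hand-written `SYNONYMS` table stands in for semantic expansion. The abstract does not say how expansion is done; embedding similarity would be another option.
- Ranking by keyword hit count is an assumption.
- The exact key-set check in `validate` is an added assumption, used to show one way of enforcing structural consistency.

Commands are in English for simplicity; Chinese commands would first need word segmentation. The LLM call itself is replaced by a canned reply.

```python
import json
import re
from collections import defaultdict

# Hypothetical meta-language library: each entry is one command type with the
# keywords that index it and the fields its structured output must contain.
# IDs, keywords, and field names are illustrative, not taken from the paper.
TEMPLATES = {
    "ALT_CHANGE": {"keywords": ["climb", "descend", "altitude"],
                   "fields": ["aircraft_id", "altitude_m"]},
    "HDG_CHANGE": {"keywords": ["turn", "heading"],
                   "fields": ["aircraft_id", "heading_deg"]},
    "SPD_CHANGE": {"keywords": ["speed", "accelerate", "slow"],
                   "fields": ["aircraft_id", "speed_kmh"]},
}

# Hypothetical synonym table used as a stand-in for semantic expansion:
# a query word is mapped to an indexed keyword before lookup.
SYNONYMS = {"ascend": "climb", "rise": "climb", "drop": "descend",
            "bearing": "heading", "velocity": "speed", "decelerate": "slow"}


def tokenize(text):
    """Lowercase and keep alphabetic runs (English-only simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(templates):
    """Map each keyword to the set of template IDs that list it."""
    index = defaultdict(set)
    for tid, spec in templates.items():
        for kw in spec["keywords"]:
            index[kw].add(tid)
    return index


def retrieve(command, index, top_k=1):
    """Rank templates by keyword hits after synonym expansion."""
    scores = defaultdict(int)
    for tok in tokenize(command):
        term = SYNONYMS.get(tok, tok)  # expand synonyms to indexed keywords
        for tid in index.get(term, ()):  # dict.get does not add missing keys
            scores[tid] += 1
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def build_prompt(command, template_id):
    """Constrain the model to the retrieved template's keys."""
    fields = TEMPLATES[template_id]["fields"]
    return (
        "Convert the command below into a JSON object.\n"
        f"Command type: {template_id}\n"
        f"Use exactly these keys: {', '.join(fields)}\n"
        "Output JSON only; use null for any value the command does not state.\n"
        f"Command: {command}"
    )


def validate(raw_output, template_id):
    """Return the parsed object only if its keys match the template exactly."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    expected = set(TEMPLATES[template_id]["fields"])
    if not isinstance(data, dict) or set(data) != expected:
        return None
    return data


if __name__ == "__main__":
    index = build_inverted_index(TEMPLATES)
    cmd = "CA1234 ascend to altitude 8400 m"
    tid = retrieve(cmd, index)[0]
    print(build_prompt(cmd, tid))
    # An LLM call would go here; a canned reply stands in for the model.
    reply = '{"aircraft_id": "CA1234", "altitude_m": 8400}'
    print(validate(reply, tid))
```

Run as a script, the sketch first retrieves `ALT_CHANGE`: "ascend" expands to "climb", and "altitude" matches directly. It then prints the assembled prompt and finally the accepted object `{'aircraft_id': 'CA1234', 'altitude_m': 8400}`. A reply that adds an unrequested key such as `heading_deg` would be rejected with `None`. This check constrains only the structure of the output. It does not verify that the extracted values are correct.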
