Journal of Computer Applications

Help information extraction model for flood events in social media data

  

  • Received: 2023-08-10 Revised: 2023-10-11 Online: 2023-12-18 Published: 2023-12-18
  • Supported by:
    National Natural Science Foundation of China; National Key R&D Program project; Natural Science Foundation of Liaoning Province; Project of the Educational Department of Liaoning Province

Extraction model for help-seeking information on flood events in social media data

孙焕良,王思懿,刘俊岭,许景科   

  1. Shenyang Jianzhu University
  • Corresponding author: 孙焕良
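  • Funding:
    National Natural Science Foundation of China project; National Key R&D Program project; Natural Science Foundation of Liaoning Province project; Educational Department of Liaoning Province project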

Abstract: Floods bring a sudden surge of risk and damage, so identifying help requests posted on social media accurately and promptly is of great significance. Because such posts are inconsistent in form and differ in priority, extracting the desired information from social media precisely and automatically is a challenging task. To address this problem, ChatFlowFlood, an information extraction model obtained by fine-tuning a Large Language Model (LLM) with the TencentPretrain framework, is developed. Combined with a knowledge system built from Formal Concept Analysis (FCA), word co-occurrence relationships and contextual semantics, the model extracts on-site disaster information such as locations and material shortages. Beyond information extraction, the Fuzzy Analytic Hierarchy Process (FAHP) and the CRITIC method are used to rate rescue priority, giving decision makers a clearer understanding of how urgent each situation is. Experimental results show that ChatFlowFlood reaches an F1 score of 85.69% on Chinese social media data, indicating practical value for emergency management and social rescue.

Key words: Chinese social media, Named Entity Recognition (NER), Large Language Model (LLM), instruction fine-tuning, flood event
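
The F1 score quoted in the abstract is the harmonic mean of precision and recall. The abstract does not state how extracted items are matched against the annotations, so the following is only a minimal sketch that assumes exact-match scoring over (entity type, text) pairs; the entity labels and example strings are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation of one extracted item: (entity type, surface text).
Entity = Tuple[str, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between gold and predicted entities."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: one help post with three annotated entities.
gold = {
    ("location", "Building 3"),
    ("shortage", "drinking water"),
    ("shortage", "medicine"),
}
predicted = {
    ("location", "Building 3"),
    ("shortage", "drinking water"),
    ("shortage", "boats"),
}

# Two of the three predictions are correct, and two of the three gold entities are found.
p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Exact matching is the strictest common choice; a partial-overlap criterion would generally yield a higher score on the same predictions.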

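Abstract: Flood events bring enormous risk and damage, and identifying help-seeking information in flood-related social media posts promptly and effectively is of great significance for rescue work. Help-related posts on social media platforms are inconsistent in form and differ in importance, which makes it challenging to extract the required information automatically and accurately and to label the severity level. Therefore, ChatFlowFlood, an information extraction model obtained by instruction fine-tuning a large-scale pre-trained language model on the TencentPretrain framework, is constructed. Combined with a flood-event knowledge system built from Formal Concept Analysis (FCA), word co-occurrence relationships and contextual semantic information, the model can accurately and automatically extract information such as the situation of trapped people and scarce supplies with only a small amount of manual labeling. On the basis of the information extraction model, the Fuzzy Analytic Hierarchy Process (FAHP) and the CRITIC method are used to combine subjective and objective assessment of the rescue priority of each help request, helping decision makers understand how urgent the disaster situation is. Experimental results show that ChatFlowFlood reaches an F1 score of 85.69% on Chinese social media data, which is of practical significance and application value for disaster handling and social rescue work.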

Key words: Chinese social media, named entity recognition, large-scale language model, instruction fine-tuning, flood event
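
The abstract says that rescue priority is rated by combining FAHP with the CRITIC method, but gives no formulas. The sketch below shows one way such a combination could look, under stated assumptions: the CRITIC weights follow the standard formulation (standard deviation of each min-max normalised criterion multiplied by its summed conflict, 1 minus the Pearson correlation, with the other criteria); the subjective weights are a hypothetical stand-in for an FAHP result, which is not implemented here; and the linear blend with `alpha` is an assumption, not the paper's stated rule. The criteria, the numbers, and the convention that larger values mean more urgent are illustrative only.

```python
from math import sqrt
from typing import List, Sequence


def min_max_normalise(column: Sequence[float]) -> List[float]:
    """Scale one criterion to [0, 1]; a constant column is mapped to zeros."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0] * len(column)
    return [(value - low) / (high - low) for value in column]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; defined as 0.0 here when either input is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def critic_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Objective weights: contrast (std. dev.) times conflict (sum of 1 - r)."""
    n = len(matrix)
    if n < 2:
        raise ValueError("CRITIC needs at least two alternatives")
    columns = [min_max_normalise(col) for col in zip(*matrix)]
    information = []
    for col_j in columns:
        mean_j = sum(col_j) / n
        std_j = sqrt(sum((v - mean_j) ** 2 for v in col_j) / (n - 1))
        conflict_j = sum(1.0 - pearson(col_j, col_k) for col_k in columns)
        information.append(std_j * conflict_j)
    total = sum(information)
    if total == 0.0:
        raise ValueError("criteria carry no contrast or conflict")
    return [c / total for c in information]


def combine_weights(subjective: Sequence[float], objective: Sequence[float],
                    alpha: float = 0.5) -> List[float]:
    """Assumed blending rule: convex combination of the two weight vectors."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * s + (1.0 - alpha) * o for s, o in zip(subjective, objective)]


def priority_scores(matrix: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted sum of normalised criteria for each help post."""
    columns = [min_max_normalise(col) for col in zip(*matrix)]
    return [sum(w * v for w, v in zip(weights, row)) for row in zip(*columns)]


# Hypothetical criteria per post: people trapped, water depth (m), hours waiting.
posts = [
    [5, 1.2, 6],
    [1, 0.3, 2],
    [3, 0.8, 1],
    [0, 0.5, 4],
]
subjective = [0.5, 0.3, 0.2]  # stand-in for FAHP-derived weights (assumption)
objective = critic_weights(posts)
combined = combine_weights(subjective, objective)
scores = priority_scores(posts, combined)
# Indices of the posts, most urgent first.
ranking = sorted(range(len(posts)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

Blending the two weight vectors lets expert judgment set the baseline importance of each criterion while the data-driven CRITIC weights raise the influence of criteria that actually discriminate between posts; a multiplicative combination is a common alternative.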
