Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (8): 2437-2445. DOI: 10.11772/j.issn.1001-9081.2023081080

• Artificial intelligence

Help-seeking information extraction model for flood event in social media data

Huanliang SUN1,2, Siyi WANG1,2, Junling LIU1,2, Jingke XU1,2,3

  1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang Liaoning 110168, China
    2. Liaoning Province Big Data Management and Analysis Laboratory of Urban Construction (Shenyang Jianzhu University), Shenyang Liaoning 110168, China
    3. Shenyang Branch of National Special Computer Engineering Technology Research Center, Shenyang Liaoning 110168, China
  • Received: 2023-08-10 Revised: 2023-10-11 Accepted: 2023-10-17 Online: 2023-12-18 Published: 2024-08-10
  • Contact: Huanliang SUN
  • About author: SUN Huanliang, born in 1969, Ph. D., professor. His research interests include spatial data management and data mining.
    WANG Siyi, born in 1998, M. S. candidate. Her research interests include natural language processing and knowledge graphs.
    LIU Junling, born in 1972, Ph. D., associate professor. Her research interests include spatio-temporal data query and data mining.
    XU Jingke, born in 1976, Ph. D., professor. His research interests include spatio-temporal databases and data mining.
  • Supported by:
    This work is partially supported by National Key R&D Program (2021YFF0306303); Project of Educational Department of Liaoning Province (LJKZ0582).

Model for extracting help-seeking information on flood events from social media data

SUN Huanliang (孙焕良)1,2, WANG Siyi (王思懿)1,2, LIU Junling (刘俊岭)1,2, XU Jingke (许景科)1,2,3

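  1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168
    2. Liaoning Province Big Data Management and Analysis Laboratory of Urban Construction (Shenyang Jianzhu University), Shenyang 110168
    3. Shenyang Branch of National Special Computer Engineering Technology Research Center, Shenyang 110168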
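  • Corresponding author: SUN Huanliang (孙焕良)
  • About the authors: SUN Huanliang (孙焕良), born in 1969, male, from Wangkui, Heilongjiang, professor, doctoral supervisor, Ph. D., CCF senior member. His main research interests are spatial data management and data mining. sunhl@sjzu.edu.cn
    WANG Siyi (王思懿), born in 1998, female, from Daqing, Heilongjiang, M. S. candidate, CCF member. Her main research interests are natural language processing and knowledge graphs.
    LIU Junling (刘俊岭), born in 1972, female, from Shenyang, Liaoning, associate professor, Ph. D., CCF member. Her main research interests are spatio-temporal data query and data mining.
    XU Jingke (许景科), born in 1976, male, from Haicheng, Liaoning, professor, Ph. D., CCF member. His main research interests are spatio-temporal databases and data mining.
  • Funding:
    National Key R&D Program (2021YFF0306303); Project of Educational Department of Liaoning Province (LJKZ0582)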

Abstract:

Because information posted on social media is inconsistent and varies in importance, extracting the desired information from it precisely and automatically is a challenging task. To address this problem, a knowledge system for flood events was built by combining Formal Concept Analysis (FCA), word co-occurrence relationships and contextual semantics. Using the constructed knowledge system, a Large Language Model (LLM) was instruction-fine-tuned with the TencentPretrain framework to produce ChatFlowFlood, an information extraction model. With only a few manual annotations, the model could extract live disaster information such as locations and material shortages. Based on the information extraction model, the Fuzzy Analytic Hierarchy Process (FAHP) and CRITIC (CRiteria Importance Through Intercriteria Correlation) methods were combined to evaluate the rescue priority of help-seeking information both subjectively and objectively, which helps decision makers understand the urgency of the disaster. The experimental results show that, on Chinese social media data, the FBERT metric of the ChatFlowFlood model is improved by 73.09% compared with the ChatFlow-7B model.

Key words: Chinese social media, Named Entity Recognition (NER), Large Language Model (LLM), instruction fine-tuning, flood event
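
The abstract does not spell out how the FAHP and CRITIC weights are computed or merged, so the following is only a minimal sketch of the general idea under stated assumptions. It computes objective CRITIC weights from a decision matrix (rows are help-seeking messages, columns are criteria), takes subjective weights as given (standing in for an FAHP result), and merges the two by normalized multiplication. The criteria, the sample values, the merging rule and all function names are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def min_max_normalize(matrix):
    """Scale each column to [0, 1]; every criterion is assumed to be benefit-type."""
    normalized_cols = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        normalized_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*normalized_cols)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def critic_weights(matrix):
    """CRITIC: weight_j is proportional to std_j * sum_k (1 - r_jk)."""
    cols = list(zip(*min_max_normalize(matrix)))
    info = []
    for col_j in cols:
        conflict = sum(1 - pearson(col_j, col_k) for col_k in cols)
        info.append(pstdev(col_j) * conflict)
    total = sum(info)
    if total == 0:
        raise ValueError("all criteria are constant; CRITIC weights are undefined")
    return [c / total for c in info]


def combine_weights(subjective, objective):
    """Assumed merging rule: element-wise product, renormalized to sum to 1."""
    products = [s * o for s, o in zip(subjective, objective)]
    total = sum(products)
    return [p / total for p in products]


def priority_scores(matrix, weights):
    """Weighted sum of normalized criteria for each message."""
    return [sum(w * v for w, v in zip(weights, row))
            for row in min_max_normalize(matrix)]


if __name__ == "__main__":
    # Hypothetical criteria: trapped people, water depth (m), hours waiting.
    messages = [[5, 1.2, 3], [1, 0.4, 10], [12, 2.0, 1], [2, 0.8, 6]]
    fahp = [0.5, 0.3, 0.2]  # placeholder subjective weights, not a real FAHP result
    weights = combine_weights(fahp, critic_weights(messages))
    print(priority_scores(messages, weights))
```

Multiplicative merging is only one common way to combine subjective and objective weights; a linear blend of the two weight vectors would be an equally plausible reading of "combined".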

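Chinese abstract:

Unstructured information posted on social media platforms suffers from data inconsistency and varying importance, which makes it challenging to automatically and accurately extract the required information and label the disaster level. To address this, a flood event knowledge system was constructed by combining Formal Concept Analysis (FCA), word co-occurrence relationships and contextual semantic information. Using the constructed knowledge system, a Large Language Model (LLM) was instruction-fine-tuned with the TencentPretrain framework to build the ChatFlowFlood information extraction model, which can accurately and automatically extract information such as trapped situations and scarce supplies with only a small amount of manual annotation. On top of the information extraction model, the Fuzzy Analytic Hierarchy Process (FAHP) and the CRITIC (CRiteria Importance Through Intercriteria Correlation) method were combined to rate the rescue priority of help-seeking information both subjectively and objectively, helping decision makers understand the urgency of the disaster. Experimental results show that, on Chinese social media data, the FBERT metric of the ChatFlowFlood model is improved by 73.09% compared with the ChatFlow-7B model.

Chinese key words: Chinese social media, named entity recognition, large language model, instruction fine-tuning, flood event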

CLC Number: