Help-seeking information extraction model for flood event in social media data

doi:10.11772/j.issn.1001-9081.2023081080

Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (8): 2437-2445.DOI: 10.11772/j.issn.1001-9081.2023081080

• Artificial intelligence •

Huanliang SUN^1,2, Siyi WANG^1,2, Junling LIU^1,2, Jingke XU^1,2,3

^1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang Liaoning 110168, China
^2. Liaoning Province Big Data Management and Analysis Laboratory of Urban Construction (Shenyang Jianzhu University), Shenyang Liaoning 110168, China
^3. Shenyang Branch of National Special Computer Engineering Technology Research Center, Shenyang Liaoning 110168, China

Received: 2023-08-10 Revised: 2023-10-11 Accepted: 2023-10-17 Online: 2023-12-18 Published: 2024-08-10
Contact: Huanliang SUN
About author: SUN Huanliang, born in 1969, Ph. D., professor. His research interests include spatial data management, data mining.
WANG Siyi, born in 1998, M. S. candidate. Her research interests include natural language processing, knowledge graph.
LIU Junling, born in 1972, Ph. D., associate professor. Her research interests include spatio-temporal data query, data mining.
XU Jingke, born in 1976, Ph. D., professor. His research interests include spatio-temporal database, data mining.
Supported by:
This work is partially supported by National Key R&D Program (2021YFF0306303); Project of Educational Department of Liaoning Province (LJKZ0582).

社交媒体数据中水灾事件求助信息提取模型

孙焕良^1,2, 王思懿^1,2, 刘俊岭^1,2, 许景科^1,2,3

^1.沈阳建筑大学计算机科学与工程学院，沈阳 110168
^2.辽宁省城市建设大数据管理与分析重点实验室（沈阳建筑大学），沈阳 110168
^3.国家特种计算机工程技术研究中心沈阳分中心，沈阳 110168

通讯作者: 孙焕良
作者简介:孙焕良（1969—），男，黑龙江望奎人，教授，博士生导师，博士，CCF高级会员，主要研究方向：空间数据管理、数据挖掘 sunhl@sjzu.edu.cn
王思懿（1998—），女，黑龙江大庆人，硕士研究生，CCF会员，主要研究方向：自然语言处理、知识图谱
刘俊岭（1972—），女，辽宁沈阳人，副教授，博士，CCF会员，主要研究方向：时空数据查询、数据挖掘
许景科（1976—），男，辽宁海城人，教授，博士，CCF会员，主要研究方向：时空数据库、数据挖掘。
基金资助:
国家重点研发计划项目(2021YFF0306303);辽宁省教育厅项目(LJKZ0582)

Abstract:

Inconsistent data and varying information importance make it challenging to extract the desired information from social media precisely and automatically. To address this problem, a knowledge system of flood events was built by combining Formal Concept Analysis (FCA), word co-occurrence relationships and contextual semantics. Using this knowledge system, a Large Language Model (LLM) was instruction fine-tuned with the TencentPretrain framework to develop ChatFlowFlood, an information extraction model that extracts real-time disaster information such as locations and material shortages with only a few manual annotations. On top of the information extraction model, the Fuzzy Analytic Hierarchy Process (FAHP) and CRITIC (CRiteria Importance Through Intercriteria Correlation) methods were combined to evaluate the rescue priority of help-seeking information both subjectively and objectively, helping decision makers understand the urgency of the disaster. Experimental results on Chinese social media data show that, compared with the ChatFlow-7B model, the F_BERT index of the ChatFlowFlood model is improved by 73.09% in relative terms (from 49.75% to 86.11%).

Key words: Chinese social media, Named Entity Recognition (NER), Large Language Model (LLM), instruction fine-tuning, flood event
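
The abstract names CRITIC as the objective half of the rescue-priority evaluation but gives no formulas. The following is a minimal sketch of the standard CRITIC weighting, not the authors' implementation: each indicator's weight is proportional to its standard deviation times its total conflict (one minus the Pearson correlation) with the other indicators. The names `critic_weights` and `combine`, the sample `scores` matrix, and the sample subjective weights are all hypothetical. Indicators are assumed to be benefit-type (higher means more urgent) and non-constant. The linear blend with FAHP-style subjective weights is an assumption, since the combination rule used in the paper is not stated on this page.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def critic_weights(rows):
    """Objective indicator weights via CRITIC.

    rows: one list of indicator scores per help-seeking message.
    """
    columns = [list(col) for col in zip(*rows)]
    # Min-max normalise each indicator so scales are comparable.
    norm = []
    for col in columns:
        lo, hi = min(col), max(col)
        norm.append([(v - lo) / (hi - lo) for v in col])
    # Information content = contrast (std. dev.) * conflict with other indicators.
    info = []
    for j, cj in enumerate(norm):
        conflict = sum(1.0 - pearson(cj, ck) for k, ck in enumerate(norm) if k != j)
        info.append(stdev(cj) * conflict)
    total = sum(info)
    return [v / total for v in info]


def combine(subjective, objective, alpha=0.5):
    """Hypothetical linear blend of subjective (e.g. FAHP) and objective weights."""
    mixed = [alpha * s + (1 - alpha) * o for s, o in zip(subjective, objective)]
    total = sum(mixed)
    return [m / total for m in mixed]


# Hypothetical scores: 4 messages x 3 indicators (e.g. people, illness, depth).
scores = [[1, 3, 2], [2, 1, 4], [4, 2, 1], [3, 4, 3]]
weights = critic_weights(scores)
blended = combine([0.5, 0.3, 0.2], weights)
print(weights, blended)
```

An indicator that varies a lot and disagrees with the others receives a larger objective weight, which is what lets CRITIC balance expert judgement from FAHP.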

摘要：

由于社交媒体平台上所发布的非结构化信息存在数据不一致、重要程度不同等问题，使自动准确抽取所需信息并标注受灾级别成为一个有挑战性的工作。因此，结合形式概念分析（FCA）、词共现关系和上下文语义信息构建了水灾事件知识体系。利用所构建的知识体系，基于TencentPretrain框架对大规模语言预训练模型（LLM）进行指令微调，构建了ChatFlowFlood信息抽取模型，可以在少量人工标记情况下，准确自动抽取被困情况、紧缺物资等信息；在信息抽取模型的基础上，通过模糊层次分析法（FAHP）和CRITIC法（CRiteria Importance Through Intercriteria Correlation）主客观结合评定求助信息的救援优先级，帮助决策者理解灾情紧急程度。实验结果表明，在中文社交媒体数据上，与ChatFlow-7B模型相比，ChatFlowFlood模型的F_BERT指标提升了73.09%。

关键词: 中文社交媒体, 命名实体识别, 大规模语言模型, 指令微调, 水灾事件

CLC Number:

TP391

Huanliang SUN, Siyi WANG, Junling LIU, Jingke XU. Help-seeking information extraction model for flood event in social media data[J]. Journal of Computer Applications, 2024, 44(8): 2437-2445.

孙焕良, 王思懿, 刘俊岭, 许景科. 社交媒体数据中水灾事件求助信息提取模型[J]. 计算机应用, 2024, 44(8): 2437-2445.

Figures/Tables 12

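No.	Text	Information to extract	Level
1	An elderly person with a fracture is trapped in Wangzongdian Village, Cuimiao Town. The rescue team cannot get through and has no solution; seeking help from the public. Assault boats, speedboats and life jackets are urgently needed	Location: Wangzongdian Village, Cuimiao Town; Person: elderly person; Status: elderly person trapped with a fracture; Materials: assault boats, speedboats, life jackets	Urgent
2	Eight children are trapped at Fengquan Yingcai Kindergarten, with no electricity or water; the water is nearly two meters deep. 18337358185, please contact a rescue team to get the children out	Location: Fengquan Yingcai Kindergarten; Person: children; Number: 8; Lifeline: no electricity or water, water nearly two meters deep	Important
3	Rescue team members are working in the hard-hit area of Weihui and need board and lodging arranged for tonight, 20 people. Phone 18238798911	Location: hard-hit area of Weihui; Material type: board and lodging; Material quantity: 20 people	General
4	At the junction of Xun County and Hua County: villagers who have not left, do not wait and see, leave right away		Irrelevant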

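Example	Class label
…Zhengzhou forecasts the heaviest rainfall from 11:00 on the 28th to 11:00 on the 29th…	Flood alert
May the Chinese people be safe; at the critical moment, show China's strength…	Public opinion information
Here in Zhengzhou there are more than thirty quilts, several boxes of masks and disinfection supplies…	Rescue event
The more than 700 firefighters guarding the Jialu River urgently need undershirts, underwear, socks and other supplies; the water in the village is over 2 meters high, and with no water or electricity it has become a hard-hit area.	Help-seeking event
We are professional, hence the premium value: 千丝发语® premium hairstyle customization…	Irrelevant information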

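Level-1 indicator	Level-2 indicator
Casualty situation	Number of people affected
Casualty situation	Illness condition
Casualty situation	Age
Flooding	Inundation status
Flooding	Inundation depth
Building collapse and damage	Cracks
Building collapse and damage	Explosion
Material needs	Rescue supplies
Material needs	Daily necessities
Material needs	Medical supplies
Energy disruption	Water
Energy disruption	Electricity
Energy disruption	Gas
Energy disruption	Communications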

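Degree	Affected population	Age	Illness condition	Inundation depth
Ⅰ	(0, 10]	Adult	Chronic disease (hypertension, hyperlipidemia, diabetes, etc.)	Part of the legs or feet
Ⅱ	(10, 100]	Adult, pregnant woman	Infectious disease (cold, fever)	Legs to waist, walking is difficult
Ⅲ	(100, 500]	Child	Impaired mobility: needs a wheelchair or other aids	Above the waist, causing complete loss of limb function
Ⅳ	>500	Elderly person over 70	Completely unable to walk: paralysis, neurological disorders, etc.	The floor where the person is located is completely submerged; escape from the disaster environment is impossible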
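
The affected-population column of the grading table above uses half-open intervals, so a lookup has to treat the upper bounds 10, 100 and 500 as inclusive. A minimal sketch of that one column, assuming grades Ⅰ to Ⅳ are encoded as the integers 1 to 4; the function name `population_grade` is hypothetical and the other columns (age, illness, depth) are not modelled here.

```python
def population_grade(affected: int) -> int:
    """Map an affected-population count to grade 1..4 (I..IV).

    Intervals follow the table: (0, 10], (10, 100], (100, 500], >500.
    """
    if affected <= 0:
        raise ValueError("affected population must be positive")
    # Upper bounds are inclusive, matching the ']' in the table.
    for grade, upper in enumerate((10, 100, 500), start=1):
        if affected <= upper:
            return grade
    return 4


for n in (8, 10, 11, 500, 501):
    print(n, population_grade(n))
```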

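Entity	Number of entity categories	Entity	Number of entity categories
Location	448	Number of people	149
Person type	182	Material type	356
Person status	189	Material quantity	37
Person age	45	Lifeline status	241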

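Model	Prompt template	Input text	Annotation result	Answer
ChatFlow-7B	{text} Extract the text information, such as location, person type, person status, number of people, material needs, material quantity, lifeline status	Seven people are trapped at the Saite Gas Cylinder Factory at the entrance of Beizhuang Village, Dakuai Town, Fengquan District. They have been soaked for two days and one night without food, and two of them are 70-year-old elderly people in urgent need of rescue; water and gas have already been cut off. Extract the text information, such as location, person type, person status, person age, number of people, material type, lifeline status	(Location: Saite Gas Cylinder Factory at the entrance of Beizhuang Village, Dakuai Town, Fengquan District), (Person type: elderly), (Person status: trapped, soaked for two days and one night without food), (Person age: 70), (Number of people: seven), (Material type: food), (Material quantity: ), (Lifeline status: power and gas already cut off)	Location: Beizhuang Village, Dakuai Town, Fengquan District, Person type: none, Person status: seven people trapped, Material type: none, Lifeline status: none
ChatFlowFlood	{text} Extract information	Seven people are trapped at the Saite Gas Cylinder Factory at the entrance of Beizhuang Village, Dakuai Town, Fengquan District. They have been soaked for two days and one night without food, and two of them are 70-year-old elderly people in urgent need of rescue; water and gas have already been cut off. Extract information	Same annotation result as for the ChatFlow-7B model	Location: Saite Gas Cylinder Factory at the entrance of Beizhuang Village, Dakuai Town, Fengquan District, Person type: elderly, Person status: trapped, soaked for two days and one night without food, Person age: 70, Number of people: seven, Material type: no food, Material quantity: , Lifeline status: water and gas already cut off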

Model	Labeled data size	P_BERT/%	R_BERT/%	F_BERT/%
BERT-FLAT	800	63.47	61.54	62.49
BERT-MECT	800	75.81	69.23	72.37
BERT-IDCNN+CRF	800	70.43	73.12	71.75
BERT-BiLSTM-CRF	800	75.05	72.29	73.64
ChatGLM-6B	500	70.36	73.62	71.95
Chinese-LLaMA-Alpaca-7B	500	65.04	61.95	63.46
ChatFlow-7B	0	49.18	50.33	49.75
ChatFlowFlood	500	86.17	86.06	86.11
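
The figures in the table above can be cross-checked with simple arithmetic. The sketch below assumes that the reported F_BERT is the harmonic mean of the reported P_BERT and R_BERT, which the ChatFlowFlood and ChatFlow-7B rows are consistent with, and that the 73.09% improvement is a relative gain over the ChatFlow-7B baseline. It does not compute BERTScore itself, which requires BERT embeddings of the model output and the reference; the function names are hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# P_BERT and R_BERT values copied from the results table.
flood_f = f_score(86.17, 86.06)   # ChatFlowFlood
base_f = f_score(49.18, 50.33)    # ChatFlow-7B

print(f"ChatFlowFlood F: {flood_f:.2f}")
print(f"ChatFlow-7B F:   {base_f:.2f}")
print(f"Relative gain from table F values: {relative_gain(86.11, 49.75):.2f}%")
```

Read this way, the improvement is 36.36 percentage points in absolute terms, and 73.09% only relative to the baseline score.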
