Machine reading comprehension model based on event representation

Journal of Computer Applications, 2022, Vol. 42, Issue (7): 1979-1984. DOI: 10.11772/j.issn.1001-9081.2021050719

• Artificial Intelligence •

Machine reading comprehension model based on event representation

Yuanlong WANG, Xiaomin LIU, Hu ZHANG

School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China

Received: 2021-05-07  Revised: 2022-02-21  Accepted: 2022-02-25  Online: 2022-03-15  Published: 2022-07-10
Contact: Yuanlong WANG
About the authors: WANG Yuanlong, born in 1983, a native of Datong, Shanxi, Ph.D., associate professor, CCF member. His research interests include natural language processing and machine learning.
LIU Xiaomin, born in 2000, a native of Shuozhou, Shanxi, M.S. candidate. Her research interests include natural language processing.
ZHANG Hu, born in 1979, a native of Datong, Shanxi, Ph.D., associate professor, CCF member. His research interests include natural language processing.
Supported by:
National Natural Science Foundation of China (61806117)

Abstract:

Grasping the main clues of a text is essential to truly understanding it during reading comprehension. To answer main-clue questions in machine reading comprehension, a method based on event representation was proposed. Firstly, a textual event graph was extracted from the reading material by using clue phrases; this step covered event representation, event element extraction and event relation extraction. Secondly, the TextRank algorithm was used to select the clue-related events, taking into account the time elements and emotional elements of the events as well as the importance of each word in the document. Finally, the answers to the questions were constructed from the selected clue events. Experimental results on a collected test set of 339 main-clue questions show that the proposed method outperforms the TextRank-based sentence ranking method on both the BiLingual Evaluation Understudy (BLEU) and Consensus-based Image Description Evaluation (CIDEr) metrics: BLEU-4 increases by 4.1 percentage points and CIDEr by 9 percentage points.
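The event-selection step described in the abstract can be illustrated with a minimal sketch. The page does not give the authors' exact formulation, so everything here is an assumption: the function name `rank_events`, the feature names `has_time`, `has_emotion` and `word_importance`, the additive prior, and the toy graph are all hypothetical. The sketch runs a standard weighted TextRank (a PageRank-style iteration) over an undirected event graph, with a per-event prior standing in for the time, emotion and word-importance signals.

```python
from typing import Dict, List, Tuple


def rank_events(
    edges: List[Tuple[str, str, float]],
    prior: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Weighted TextRank over an undirected event graph with a node prior.

    Assumes positive edge weights and that every event has at least one edge.
    """
    nodes = sorted(prior)
    # Symmetric adjacency: neighbours[u][v] is the weight of the u-v relation.
    neighbours: Dict[str, Dict[str, float]] = {n: {} for n in nodes}
    for u, v, w in edges:
        neighbours[u][v] = neighbours[u].get(v, 0.0) + w
        neighbours[v][u] = neighbours[v].get(u, 0.0) + w
    out_weight = {n: sum(neighbours[n].values()) for n in nodes}

    # Normalise the prior so that the scores stay a probability distribution.
    total = sum(prior.values())
    base = {n: prior[n] / total for n in nodes}
    score = dict(base)
    for _ in range(iterations):
        score = {
            n: (1 - damping) * base[n]
            + damping
            * sum(score[m] * w / out_weight[m] for m, w in neighbours[n].items())
            for n in nodes
        }
    return score


# Hypothetical event features (not taken from the paper).
features = {
    "e1": {"has_time": 1, "has_emotion": 0, "word_importance": 0.4},
    "e2": {"has_time": 1, "has_emotion": 1, "word_importance": 0.8},
    "e3": {"has_time": 0, "has_emotion": 1, "word_importance": 0.3},
    "e4": {"has_time": 0, "has_emotion": 0, "word_importance": 0.2},
}
# Hypothetical prior: a simple sum of the three signals plus a constant.
prior = {
    e: 1.0 + f["has_time"] + f["has_emotion"] + f["word_importance"]
    for e, f in features.items()
}
# Hypothetical event relations: e2 is related to every other event.
edges = [("e1", "e2", 1.0), ("e2", "e3", 1.0), ("e2", "e4", 1.0)]

scores = rank_events(edges, prior)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this toy graph the hub event `e2` ranks first, both because it has the largest prior and because every other event passes its score to it. A real system would also need to handle isolated events, which this sketch deliberately excludes.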

Key words: natural language processing, reading comprehension, question of main clues, event representation, textual event graph

CLC Number: TP391.1

Cite this article: Yuanlong WANG, Xiaomin LIU, Hu ZHANG. Machine reading comprehension model based on event representation[J]. Journal of Computer Applications, 2022, 42(7): 1979-1984.

References (24)

1	CHEN D Q, BOLTON J, MANNING C D. A thorough examination of the CNN/Daily Mail reading comprehension task[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2016: 2359-2367. DOI: 10.18653/v1/p16-1223
2	CUI Y M, LIU T, CHEN Z P, et al. Consensus attention-based neural networks for Chinese reading comprehension[C]// Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers. [S.l.]: The COLING 2016 Organizing Committee, 2016: 1777-1786. DOI: 10.18653/v1/p17-1055
3	RICHARDSON M, BURGES C J C, RENSHAW E. MCTest: a challenge dataset for the open-domain machine comprehension of text[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2013: 193-203.
4	LAI G K, XIE Q Z, LIU H X, et al. RACE: large-scale reading comprehension dataset from examinations[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2017: 785-794. DOI: 10.18653/v1/d17-1082
5	RAJPURKAR P, ZHANG J, LOPYREV K, et al. SQuAD: 100,000+ questions for machine comprehension of text[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2016: 2383-2392. DOI: 10.18653/v1/d16-1264
6	JOSHI M, CHOI E, WELD D S, et al. TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2017: 1601-1611. DOI: 10.18653/v1/p17-1147
7	HE W, LIU K, LIU J, et al. DuReader: a Chinese reading comprehension dataset from real-world applications[C]// Proceedings of the 2018 Workshop on Machine Reading for Question Answering. Stroudsburg, PA: Association for Computational Linguistics, 2018: 37-46. DOI: 10.18653/v1/w18-2605
8	YANG Z L, QI P, ZHANG S Z, et al. HotpotQA: a dataset for diverse, explainable multi-hop question answering[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 2369-2380. DOI: 10.18653/v1/d18-1259
9	KOČISKÝ T, SCHWARZ J, BLUNSOM P, et al. The NarrativeQA reading comprehension challenge[J]. Transactions of the Association for Computational Linguistics, 2018, 6: 317-328. DOI: 10.1162/tacl_a_00023
10	GUO S R, ZHANG H, QIAN Y L, et al. Semantic relevancy between sentences for Chinese reading comprehension on college entrance examinations[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(6): 575-579, 585.
11	WANG Y L, LI R, ZHANG H, et al. Causal options in Chinese reading comprehension[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(3): 272-278.
12	WANG Z Q, LI R, LIANG J Y, et al. Research on question answering for reading comprehension based on Chinese discourse frame semantic parsing[J]. Chinese Journal of Computers, 2016, 39(4): 795-807. DOI: 10.11897/SP.J.1016.2016.00795
13	TAN H Y, ZHAO H H, LI R. Sentence fusion for complex problems in reading comprehension[J]. Journal of Chinese Information Processing, 2017, 31(1): 8-16.
14	ZHANG Z B, WANG S G, CHEN X, et al. Question expansion for machine reading comprehension of opinion[J]. Journal of Chinese Information Processing, 2020, 34(6): 89-96, 105. DOI: 10.3969/j.issn.1003-0077.2020.06.012
15	TAN H Y, QU B X. An approach to multi-type question machine reading comprehension[J]. Journal of Chinese Information Processing, 2020, 34(6): 81-88. DOI: 10.3969/j.issn.1003-0077.2020.06.011
16	YANG Z Z, LI C Z, ZHANG H, et al. Question answering for overview questions based on CFN and discourse topic[J]. Journal of Chinese Information Processing, 2020, 34(12): 73-81. DOI: 10.3969/j.issn.1003-0077.2020.12.011
17	ZWAAN R A, RADVANSKY G A, HILLIARD A E, et al. Constructing multidimensional situation models during reading[J]. Scientific Studies of Reading, 1998, 2(3): 199-220. DOI: 10.1207/s1532799xssr0203_2
18	HUANG L F, JI H, CHO K, et al. Zero-shot transfer learning for event extraction[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 2160-2170. DOI: 10.18653/v1/p18-1201
19	CHAMBERS N, JURAFSKY D. Unsupervised learning of narrative event chains[C]// Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2008: 789-797.
20	PALMER M, GILDEA D, KINGSBURY P. The proposition bank: an annotated corpus of semantic roles[J]. Computational Linguistics, 2005, 31(1): 71-106. DOI: 10.1162/0891201053630264
21	PRASAD R, MILTSAKAKI E, DINESH N, et al. The Penn Discourse TreeBank 2.0 annotation manual[R/OL]. (2007-12-17) [2021-02-12].
22	KRUENGKRAI C, TORISAWA K, HASHIMOTO C, et al. Improving event causality recognition with multiple background knowledge sources using multi-column convolutional neural networks[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2017: 3466-3473. DOI: 10.1609/aaai.v31i1.11005
23	KANG D, GANGAL V, LU A, et al. Detecting and explaining causes from text for a time series event[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2017: 2758-2767. DOI: 10.18653/v1/d17-1292
24	XU L H, LIN H F, PAN Y, et al. Constructing the affective lexicon ontology[J]. Journal of the China Society for Scientific and Technical Information, 2008, 27(2): 180-185. DOI: 10.3969/j.issn.1000-0135.2008.02.004

Frame	Lexical units
Time measurement (时间测量)	秒，分钟，小时，天，周，月，年，世纪
Time vector (时间向量)	先，先期，以还，后，前，以前，以后，之后，之前，从此，从头，从小，先，从，后来，今后，先前
Time span (时间跨度)	日子，日月，时间，时候，时期，周，时代，时辰，时光，时长，时段，平生，生平
Calendric unit (历法单位)	秒，分，时，天，日，周，月，年，世纪，早上，凌晨，黄昏，中午，下午，晚上，年代，春，夏，秋，冬，春季，夏季，秋季，冬季，公元，学期，学年，今日，今天，今年，周末
Duration scenario (时量场景)	期间，时段，时期，过程
Temporal subregion (时间亚区)	开始，早期，结束，晚期，中间，开端，后，前，早，晚
Relative time (相对时间)	后，后来，前，前期，之前，之后，以前，同时，过后，过去，当，迟，晚，早，准时，跟着，接着，迟到，提前
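As an illustration of how a frame lexicon like the one above might be used to spot the time elements of an event, here is a minimal lookup sketch. The page does not describe the authors' matching procedure, so the function `find_time_elements`, the choice of plain substring matching, and the example sentence are assumptions; only a subset of the lexical units from the table is included.

```python
from typing import Dict, List, Tuple

# A subset of the frames and lexical units listed in the table above.
TIME_FRAMES: Dict[str, List[str]] = {
    "Time measurement": ["秒", "分钟", "小时", "天", "周", "月", "年", "世纪"],
    "Temporal subregion": ["开始", "早期", "结束", "晚期", "中间", "开端"],
    "Relative time": ["后来", "之前", "之后", "以前", "同时", "接着"],
}


def find_time_elements(sentence: str) -> List[Tuple[str, str]]:
    """Return (frame, lexical unit) pairs whose unit occurs in the sentence.

    Uses naive substring matching, so single-character units can over-match
    (for example 天 inside 天气); this is a simplification for the sketch.
    """
    hits = []
    for frame, units in TIME_FRAMES.items():
        for unit in units:
            if unit in sentence:
                hits.append((frame, unit))
    return hits


# Hypothetical example sentence: "Three years later, he returned to his hometown."
example = "三年之后，他回到了故乡。"
print(find_time_elements(example))
```

Substring matching keeps the sketch dependency-free. A practical system would more likely segment the sentence into words first and match whole tokens against the lexicon.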

Method	BLEU-1	BLEU-2	BLEU-3	BLEU-4	CIDEr
Sentence ranking method	56.3	38.2	23.4	12.4	46.3
Method of Ref. [16]	57.0	39.2	23.9	13.1	47.0
Proposed method	64.2	45.6	27.1	16.5	55.3
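The gains quoted in the abstract can be checked directly against the table above. The short sketch below only recomputes differences from the reported scores; the dictionary keys and the helper name `gains` are illustrative choices, not taken from the paper.

```python
# Scores copied from the results table above.
RESULTS = {
    "sentence_ranking": {"BLEU-1": 56.3, "BLEU-2": 38.2, "BLEU-3": 23.4, "BLEU-4": 12.4, "CIDEr": 46.3},
    "ref16": {"BLEU-1": 57.0, "BLEU-2": 39.2, "BLEU-3": 23.9, "BLEU-4": 13.1, "CIDEr": 47.0},
    "proposed": {"BLEU-1": 64.2, "BLEU-2": 45.6, "BLEU-3": 27.1, "BLEU-4": 16.5, "CIDEr": 55.3},
}


def gains(baseline: str, system: str = "proposed") -> dict:
    """Improvement of `system` over `baseline`, in percentage points."""
    return {
        metric: round(RESULTS[system][metric] - RESULTS[baseline][metric], 1)
        for metric in RESULTS[system]
    }


print(gains("sentence_ranking"))
print(gains("ref16"))
```

Against the sentence ranking baseline this gives 4.1 points on BLEU-4 and 9.0 points on CIDEr, matching the abstract. Against the method of Ref. [16] the corresponding gains are 3.4 and 8.3 points.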
