Journal of Computer Applications (《计算机应用》), 2024, Vol. 44, Issue (7): 2004-2010. DOI: 10.11772/j.issn.1001-9081.2023081178

• Artificial Intelligence •

Boundary-aware approach to machine reading comprehension (面向机器阅读理解的边界感知方法)

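LIU Qing (刘青)1,2,3, CHEN Yanping (陈艳平)1,2,3, ZOU Anqi (邹安琪)1,2,3, HUANG Ruizhang (黄瑞章)1,2,3, QIN Yongbin (秦永彬)1,2,3

  1. Text Computing and Cognitive Intelligence Engineering Research Center of Ministry of Education, Guizhou University, Guiyang 550025
  2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang 550025
  3. College of Computer Science and Technology, Guizhou University, Guiyang 550025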
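  • Received: 2023-09-01 Revised: 2023-09-20 Accepted: 2023-10-09 Online: 2024-07-18 Published: 2024-07-10
  • Corresponding author: CHEN Yanping (陈艳平)
  • About the authors: LIU Qing (刘青), born in 1996, female, native of Hengyang, Hunan, M.S. candidate. Research interests: natural language processing, machine reading comprehension.
    ZOU Anqi (邹安琪), born in 1996, male, native of Anshun, Guizhou, Ph.D. candidate. Research interests: natural language processing, intelligent question answering.
    HUANG Ruizhang (黄瑞章), born in 1979, female, native of Tianjin, professor, Ph.D., CCF member. Research interests: big data and data mining, information extraction.
    QIN Yongbin (秦永彬), born in 1980, male, native of Yantai, Shandong, professor, Ph.D., CCF senior member. Research interests: big data management and application, multi-source data fusion and application.
    Primary contact: CHEN Yanping (陈艳平), born in 1980, male, native of Changshun, Guizhou, professor, Ph.D., CCF member. Research interests: artificial intelligence, natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62166007); Key Technology Research and Development Program of Guizhou Province ([2022]277)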

Boundary-aware approach to machine reading comprehension

Qing LIU1,2,3, Yanping CHEN1,2,3, Anqi ZOU1,2,3, Ruizhang HUANG1,2,3, Yongbin QIN1,2,3

  1. Text Computing and Cognitive Intelligence Engineering Research Center of Ministry of Education, Guizhou University, Guiyang Guizhou 550025, China
  2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang Guizhou 550025, China
  3. College of Computer Science and Technology, Guizhou University, Guiyang Guizhou 550025, China
  • Received: 2023-09-01 Revised: 2023-09-20 Accepted: 2023-10-09 Online: 2024-07-18 Published: 2024-07-10
  • Corresponding author: Yanping CHEN
  • About the authors: LIU Qing, born in 1996, M.S. candidate. Her research interests include natural language processing and machine reading comprehension.
    ZOU Anqi, born in 1996, Ph.D. candidate. His research interests include natural language processing and intelligent question answering.
    HUANG Ruizhang, born in 1979, Ph.D., professor. Her research interests include big data and data mining, and information extraction.
    QIN Yongbin, born in 1980, Ph.D., professor. His research interests include big data management and application, and multi-source data fusion and application.
    Primary contact: CHEN Yanping, born in 1980, Ph.D., professor. His research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62166007); Key Technology Research and Development Program of Guizhou Province ([2022]277)

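Abstract (translated from the Chinese):

To address the inaccurate boundary prediction of existing answer-acquisition methods based on pre-trained language models, a boundary-aware method for span-extraction Machine Reading Comprehension (MRC) is proposed. First, special characters are introduced at the question input stage to mark the question boundary, so that the question boundary is perceived through enhanced question semantics. Second, at the answer prediction stage, an answer boundary regressor is built to enable semantic interaction between the perceived question-boundary semantics and the semantics of the predicted answer boundary output by the model. Finally, the semantic information obtained from this interaction is used to adjust predicted answer boundaries that deviate, thereby calibrating the predicted answer. Experimental results show that, compared with SpanBERT (Span-based Bidirectional Encoder Representation from Transformers), the method improves the F1 value by 0.2 percentage points and the Exact Match (EM) value by 0.9 percentage points on the public dataset SQuAD (Stanford Question Answering Dataset) 1.1; improves both the F1 value and the EM value by 0.7 percentage points on the HotpotQA (Hotpot Question Answering) dataset; and improves the F1 value by 2.8 percentage points and the EM value by 3.3 percentage points on the NewsQA (News Question Answering) dataset. The method thus effectively strengthens the perception of question boundary information and calibrates predicted answer boundaries, which helps text data to be understood and analyzed better and improves system accuracy in applications such as intelligent question answering and intelligent customer service.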
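
The first step of the method, marking the question boundary with special characters at input time, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the special characters, so the marker strings `[QS]` and `[QE]`, the function name `build_input`, and the use of whitespace splitting in place of a real subword tokenizer are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical marker strings; the abstract does not say which special
# characters the method actually uses.
Q_START = "[QS]"
Q_END = "[QE]"


def build_input(question: str, passage: str) -> Tuple[List[str], Tuple[int, int]]:
    """Wrap the question in boundary markers and place it before the passage.

    Returns the token sequence and the positions of the two boundary markers.
    Whitespace splitting stands in for a real subword tokenizer (assumption).
    """
    q_tokens = question.split()
    p_tokens = passage.split()
    tokens = ["[CLS]", Q_START, *q_tokens, Q_END, "[SEP]", *p_tokens, "[SEP]"]
    q_start_idx = 1                      # position of the opening marker
    q_end_idx = q_start_idx + len(q_tokens) + 1  # position of the closing marker
    return tokens, (q_start_idx, q_end_idx)
```

In a sketch like this, the encoder representations at the two marker positions would be the natural place to read off question-boundary information for a later component such as the answer boundary regressor; how the paper actually wires this up is not stated in the abstract.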

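Keywords (translated from the Chinese): machine reading comprehension, question boundary awareness, answer boundary regression, span extraction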

Abstract:

Existing answer-acquisition methods based on pre-trained language models may predict answer boundaries inaccurately. To mitigate this issue, a boundary-aware approach for span-extraction Machine Reading Comprehension (MRC) was proposed. Firstly, special characters were introduced to mark the question boundary during the question input stage, enhancing the semantic information of the question to improve boundary perception. Secondly, during the answer prediction stage, an answer boundary regressor was constructed to facilitate semantic interaction between the perceived question boundary and the predicted answer boundary output by the model. Lastly, biased predicted answer boundaries were further adjusted based on the post-interaction semantic information to calibrate the predicted answers. Experimental results show that, compared with SpanBERT (Span-based Bidirectional Encoder Representation from Transformers), the proposed method improved the F1 value by 0.2 percentage points and the Exact Match (EM) value by 0.9 percentage points on the public dataset SQuAD (Stanford Question Answering Dataset) 1.1. On the HotpotQA (Hotpot Question Answering) dataset, it improved both the F1 value and the EM value by 0.7 percentage points. On the NewsQA (News Question Answering) dataset, it improved the F1 value by 2.8 percentage points and the EM value by 3.3 percentage points. These results indicate that the method effectively enhances the model's perception of question boundary information and calibrates predicted answer boundaries. This supports better comprehension and analysis of text data and improves system accuracy in applications such as intelligent question answering and intelligent customer service.
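
The two reported metrics respond differently to boundary errors, which is why boundary calibration can move EM more than F1. The sketch below shows one common way to compute them for extracted spans, following the widely used SQuAD-style convention (lowercasing, removing punctuation and English articles, token-overlap F1). Whether the paper's evaluation uses exactly this normalization is an assumption; the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "in Paris France" against the reference "Paris" overshoots the span on both sides: it scores 0 on EM but still earns partial F1 credit, so trimming the boundaries to the exact span raises EM from 0 to 1 for that question.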

Keywords: Machine Reading Comprehension (MRC), question boundary awareness, answer boundary regression, span extraction

CLC number: