Journal of Computer Applications (《计算机应用》): sole official website


Masked data augmentation based on graph convolutional networks

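Hu Xinrong (胡新荣)1, Chen Jingxue (陈静雪)1,2, Huang Zijian (黄子键)1, Wang Bangchao (王帮超)3, Yao Xun (姚迅)1, Liu Junping (刘军平)4, Zhu Qiang (朱强)1, Yang Jie (杨捷)5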

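  1. Wuhan Textile University
  2. School of Computer Science and Artificial Intelligence
  3. School of Mathematics and Computer Science, Wuhan Textile University
  4. School of Mathematics and Computer Science, Wuhan Textile University
  5. University of Wollongong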
  • Received: 2023-11-27 Revised: 2024-03-21 Accepted: 2024-04-10 Online: 2024-04-12 Published: 2024-04-12
  • Corresponding author: Wang Bangchao (王帮超)
  • Supported by:
    CCF-Zhipu Large Model Fund Project

Graph convolutional network-based masked data augmentation

  • Received: 2023-11-27 Revised: 2024-03-21 Accepted: 2024-04-10 Online: 2024-04-12 Published: 2024-04-12
  • Supported by:
    CCF-Zhipu AI Large Model of 202212

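Abstract (translated from the Chinese): To address inaccurate information in the original data, low sample quality, and poor model generalization in multiple-choice question answering (MCQA), a masked data augmentation method based on graph convolutional networks (GCN) is proposed. The method uses GCN as its basic framework. First, the words in an article are abstracted as graph nodes, and question-candidate answer pair (QA) nodes are used to connect them, establishing links with the relevant article nodes. Second, the similarity between nodes is computed and masking is applied to nodes in the graph to generate augmented samples. Next, GCN is used to expand the features of the augmented samples, improving the model's ability to represent information. Finally, a scorer is introduced to score the original and augmented samples, and is combined with a curriculum learning strategy to improve the accuracy of answer prediction. The proposed method was evaluated comprehensively on three datasets: RACE-M, RACE-H, and DREAM. Experimental results show that, compared with EAM, the best baseline model on the RACE dataset, the proposed method improves accuracy by an average of 0.8 and 0.4 percentage points, respectively; compared with STM, the best baseline model on the DREAM dataset, it improves accuracy by an average of 1.4 percentage points. The comparative experiments demonstrate the effectiveness of the proposed method on MCQA tasks and offer new insights for further research on and application of data augmentation in this field.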

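Keywords (translated from the Chinese): multiple-choice question answering, data augmentation, graph convolutional network, scorer, curriculum learning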
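The masking and feature-expansion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which nodes are masked, so masking the k word nodes least similar to the QA node is an assumption, as are the function names `cosine`, `mask_nodes` and `gcn_layer`, the toy two-dimensional features, and the star-shaped graph. The propagation step uses the standard GCN rule ReLU(D^-1/2 (A + I) D^-1/2 H W).

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mask_nodes(features, qa_index, k):
    """Zero out the k word nodes least similar to the QA node.

    ASSUMPTION: the selection criterion (least similar first) is illustrative;
    the abstract only states that similarity is computed and nodes are masked.
    """
    qa = features[qa_index]
    scored = [(cosine(f, qa), i) for i, f in enumerate(features) if i != qa_index]
    masked = {i for _, i in sorted(scored)[:k]}
    augmented = [[0.0] * len(f) if i in masked else list(f)
                 for i, f in enumerate(features)]
    return augmented, masked


def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops, then normalise symmetrically by node degree.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, then apply the weight matrix and ReLU.
    dim_in, dim_out = len(weight), len(weight[0])
    agg = [[sum(norm[i][j] * feats[j][d] for j in range(n)) for d in range(dim_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(dim_in)))
             for o in range(dim_out)] for i in range(n)]


# Toy graph: three word nodes (0, 1, 2) and one QA node (3) linked to every word.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
adjacency = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weight matrix

augmented, masked = mask_nodes(features, qa_index=3, k=1)
print(sorted(masked))  # → [1]
expanded = gcn_layer(adjacency, augmented, identity)
print(expanded[1])  # the masked node now carries features propagated from the QA node
```

In this toy case the masked word node starts with an all-zero feature vector and receives a nonzero representation from its QA neighbour after one propagation step, which is one way to read the "feature expansion" the abstract mentions.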

Abstract: A masked data augmentation method based on graph convolutional networks (GCN) is proposed to address inaccurate information, low sample quality, and poor model generalization in multiple-choice question answering (MCQA). In this method, the words in an article are abstracted as graph nodes and connected through question-candidate answer pair (QA) nodes, which establishes relationships with the relevant article nodes. The similarity between nodes is then computed and masking is applied to the graph nodes to generate augmented samples. GCN is used to expand the features of the augmented samples, which enhances the model's ability to represent information. In addition, a scorer evaluates the original and augmented samples, and its scores are combined with a curriculum learning strategy to improve the accuracy of answer prediction. The proposed method was comprehensively evaluated on three datasets: RACE-M, RACE-H, and DREAM. Experimental results show that, compared with the typical baseline model EAM on the RACE dataset, the proposed method improves accuracy by an average of 0.8 and 0.4 percentage points, respectively, and that, compared with the typical baseline model STM on the DREAM dataset, it improves accuracy by an average of 1.4 percentage points. Comparative experiments demonstrate the effectiveness of the proposed method on MCQA tasks and offer new insights for further research on and application of data augmentation in this field.

Key words: Multiple-Choice Question Answering, Data Augmentation, Graph Convolutional Network, Scorer, Curriculum Learning
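The scoring and curriculum learning step can be sketched in the same hedged spirit. The abstract specifies neither the scorer nor the training schedule, so the code below assumes that higher-scored samples are presented first and that later stages add lower-scored samples cumulatively. The function `curriculum_stages`, the `quality` field, and the sample identifiers are hypothetical, and the lambda stands in for a learned scorer.

```python
def curriculum_stages(samples, scorer, num_stages):
    """Rank samples by score (highest first) and return cumulative training pools.

    ASSUMPTION: "highest score first, cumulative stages" is one common
    curriculum schedule; the abstract does not say which schedule is used.
    """
    ranked = sorted(samples, key=scorer, reverse=True)
    stages = []
    for stage in range(1, num_stages + 1):
        # Integer ceiling of len(ranked) * stage / num_stages.
        cutoff = (len(ranked) * stage + num_stages - 1) // num_stages
        stages.append(ranked[:cutoff])
    return stages


# Original and augmented samples with made-up scores, for illustration only.
samples = [
    {"id": "orig-1", "quality": 0.9},
    {"id": "aug-1a", "quality": 0.2},
    {"id": "aug-1b", "quality": 0.6},
    {"id": "orig-2", "quality": 0.4},
]

stages = curriculum_stages(samples, scorer=lambda s: s["quality"], num_stages=2)
for number, pool in enumerate(stages, start=1):
    print(number, [s["id"] for s in pool])
# → 1 ['orig-1', 'aug-1b']
# → 2 ['orig-1', 'aug-1b', 'orig-2', 'aug-1a']
```

A model would be trained on each pool in turn, so that low-scoring augmented samples only enter training once the higher-scoring ones have been seen.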

CLC number: