Journal of Computer Applications: 1-6. DOI: 10.11772/j.issn.1001-9081.2024050667

• Artificial intelligence •

WH-CoT: 6W2H-based chain-of-thought prompting framework on large language models

Mengke CHEN1,2, Yun BIAN1(), Yunhao LIANG1,2, Haiquan WANG1,2   

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610213, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-05-27 Revised: 2024-06-26 Accepted: 2024-07-01 Online: 2025-01-24 Published: 2024-12-31
  • Contact: Yun BIAN

  • About the authors: CHEN Mengke (born 1998), male, from Bazhong, Sichuan, M.S. candidate; research interests: natural language processing, large language models.
    BIAN Yun (born 1988), female, from Jiuquan, Gansu, engineer, Ph.D. candidate; research interests: natural language processing, large language models.
    LIANG Yunhao (born 2000), male, from Chengdu, Sichuan, M.S. candidate; research interests: natural language processing, large language models.
    WANG Haiquan (born 1999), male, from Jiaozuo, Henan, M.S. candidate; research interests: natural language processing, large language models.

Abstract:

Concerning the limitations of Chain-of-Thought (CoT) prompting, such as insufficient integration of human strategies and poor performance on small-scale Large Language Models (LLMs), a CoT prompting framework based on the 6W2H (Why, What, Which, When, Where, Who, How, How much) problem decomposition strategy, WH-CoT (6W2H Chain-of-Thought), was proposed. Firstly, the task dataset was clustered and sampled using the Sentence-BERT model and divided into training and test sets. Then, in the training set, element extraction, problem decomposition, answer paragraph construction, and answer generation were performed on all samples to form CoTs, thereby constructing a task-specific corpus. Finally, during the reasoning stage, demonstration samples were sampled adaptively from the corpus and added to the prompt, allowing the model to combine the prompt to generate answers to test questions. For the Qwen-turbo model on arithmetic reasoning tasks, the average accuracy of WH-CoT improved by 3.35 and 4.27 percentage points over the mainstream Zero-Shot-CoT and Manual-CoT, respectively; on multi-hop reasoning tasks, the total performance improvement ratio of WH-CoT on EM (Exact Match) was 36 and 111 percentage points higher than those of Zero-Shot-CoT and Manual-CoT, respectively. In addition, for the small and medium-scale Qwen-14B-Chat and Qwen-7B-Chat models, the total performance improvement ratios of WH-CoT were higher than those of Zero-Shot-CoT and Manual-CoT on both EM and F1. It can be seen that by further integrating human strategies with machine intelligence, WH-CoT effectively improves the reasoning performance of LLMs of different sizes on both arithmetic reasoning and multi-hop reasoning tasks.

Key words: 6W2H(Why, What, Which, When, Where, Who, How, How much), Chain-of-Thought (CoT) prompting, prompt learning, Large Language Model (LLM), In-Context Learning (ICL), adaptive sampling
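The inference stage described in the abstract, in which demonstration samples are adaptively drawn from the task corpus by similarity to the test question and assembled into a prompt, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bag-of-characters embedding is a stand-in for Sentence-BERT, and the corpus field names, prompt wording, and top-k cosine selection are illustrative assumptions.

```python
import math

def embed(text):
    # Stand-in for a Sentence-BERT embedding: a normalized bag-of-letters
    # vector, used here only so the example runs without external models.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def sample_demonstrations(corpus, question, k=2):
    # Adaptive sampling: pick the k corpus entries whose questions are
    # most similar to the test question (hypothetical selection rule).
    q = embed(question)
    ranked = sorted(corpus, key=lambda it: cosine(embed(it["question"]), q),
                    reverse=True)
    return ranked[:k]

def build_prompt(demos, question):
    # Each demonstration carries its 6W2H-style chain of thought (element
    # extraction -> sub-questions -> answer paragraph -> answer).
    parts = [f"Q: {d['question']}\nCoT: {d['cot']}\nA: {d['answer']}"
             for d in demos]
    parts.append(f"Q: {question}\nCoT:")
    return "\n\n".join(parts)

# Toy corpus standing in for the task-specific corpus built in training.
corpus = [
    {"question": "How many apples are left?",
     "cot": "Count the apples, then subtract those taken.", "answer": "3"},
    {"question": "Where was the treaty signed?",
     "cot": "Recall the location of the signing.", "answer": "Paris"},
]
demos = sample_demonstrations(corpus, "How many oranges are left?", k=1)
prompt = build_prompt(demos, "How many oranges are left?")
```

The prompt ends with an open `CoT:` slot so the model continues with its own chain of thought before emitting the final answer, mirroring the demonstration format.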

