Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1395-1402. DOI: 10.11772/j.issn.1001-9081.2024050705

• China Conference on Data Mining 2024 (CCDM 2024)

Psychological counseling human-machine dialogue dataset construction for dialogue generation and mental disorder detection

Bo XU1, Dezhi HAO1, Erchen YU1, Hongfei LIN1, Linlin ZONG2

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  2. School of Software, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Received: 2024-05-29 Revised: 2024-08-02 Accepted: 2024-08-20 Online: 2024-09-25 Published: 2025-05-10
  • Contact: Bo XU
  • About the authors: XU Bo, born in 1988 in Dalian, Liaoning, Ph.D., associate professor, CCF member. His research interests include mental health computing and natural language processing.
    HAO Dezhi, born in 1998 in Linyi, Shandong, M.S. His research interests include psychological counseling human-machine dialogue.
    YU Erchen, born in 2001 in Anshan, Liaoning, M.S. candidate. His research interests include multimodal affective computing.
    LIN Hongfei, born in 1962 in Tongliao, Inner Mongolia, Ph.D., professor. His research interests include natural language processing.
    ZONG Linlin, born in 1987 in Cangzhou, Hebei, Ph.D., associate professor. Her research interests include multimodal affective computing.
  • Supported by:
    Liaoning Provincial Social Science Planning Fund (L21CXW003)

Abstract:

To address the lack of publicly available data for building effective dialogue models for psychological counseling human-machine dialogue, a psychological counseling dialogue dataset was constructed for dialogue generation and mental disorder detection. First, a multi-turn dialogue dataset containing 3,268 doctor-patient conversations was collected from an online medical consultation platform, together with extensive metadata including the hospital, the medical department, the disease category, and the patient's self-description. Second, a knowledge-enhanced dialogue model named Empathy Bidirectional and Auto-Regressive Transformers (EmBART) was proposed to strengthen the empathic capability of the dialogue model. Finally, the usability of the dataset was evaluated experimentally on two tasks: psychological response generation and mental disorder detection. In psychological response generation, EmBART trained on this dataset performed well on all metrics in both automatic and human evaluations, reducing perplexity by 2.31 compared with the baseline model CDial-GPT (Chinese Dialogue Generative Pre-trained Transformer). In mental disorder detection, CPT (Chinese Pre-trained unbalanced Transformer) and RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers approach) trained on this dataset showed strong detection performance. The experimental results indicate that the dataset is useful for generating empathic dialogue and detecting mental disorders, and that it provides a data foundation for future research on psychological counseling human-machine dialogue.

Key words: psychological counseling dialogue, mental disorder detection, dialogue generation, empathic response, emotion analysis
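
The perplexity comparison reported in the abstract can be made concrete with a minimal sketch. It assumes the common definition of perplexity as the exponential of the mean negative log-likelihood per token; the abstract does not state how the authors computed it. The function name `perplexity` and the per-token probabilities are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities that two models assign to the same
# three-token reference response (illustrative values only).
baseline_log_probs = [math.log(0.10), math.log(0.20), math.log(0.05)]
improved_log_probs = [math.log(0.20), math.log(0.25), math.log(0.10)]

baseline_ppl = perplexity(baseline_log_probs)
improved_ppl = perplexity(improved_log_probs)

# A lower perplexity means the model assigns higher probability to the reference.
print(f"baseline={baseline_ppl:.2f} improved={improved_ppl:.2f} "
      f"reduction={baseline_ppl - improved_ppl:.2f}")
```

Under this definition, a reduction in perplexity is an absolute difference between two such values, so its size depends on the tokenizer and test set used.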

CLC Number: