Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (9): 2549-2554. DOI: 10.11772/j.issn.1001-9081.2020010119

• Artificial intelligence •

Intent recognition dataset for dialogue systems in power business

LIAO Shenglan1, YIN Shi1, CHEN Xiaoping1, ZHANG Bo2, OUYANG Yu2, ZHANG Heng3   

  1. College of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230026, China;
  2. State Grid Anhui Electric Power Company Limited, Hefei Anhui 230022, China;
  3. State Grid Fuyang Power Supply Company, Fuyang Anhui 236000, China
  • Received: 2020-02-13 Revised: 2020-03-15 Online: 2020-09-10 Published: 2020-03-24
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1613216).

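  • Corresponding author: LIAO Shenglan
  • About the authors: LIAO Shenglan, born in 1995 in Lu'an, Anhui, is an M.S. candidate. Her research interests include semantic analysis, text classification and dialogue systems. YIN Shi, born in 1994 in Chongqing, is a Ph.D. candidate. His research interests include multimodal semantic recognition, emotion recognition and natural language processing. CHEN Xiaoping, born in 1955 in Chongqing, Ph.D., is a professor and a CCF member. His research interests include formal modeling of agents and key technologies of multi-robot systems. ZHANG Bo, born in 1966 in Shouxian, Anhui, M.S., is a senior economist. His research interests include "Internet Plus" power marketing service management. OUYANG Yu, born in 1971 in Longhui, Hunan, M.S., is a senior engineer whose research interests include power marketing informatization. ZHANG Heng, born in 1991 in Fuyang, Anhui, is an assistant engineer whose research interests include power marketing services.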

Abstract: To support the intelligent dialogue system of customer service robots in power supply business halls, a relatively large-scale dataset of user intents in power business was constructed. The dataset contains 9 577 user queries together with their category labels. First, real voice data collected from power supply business halls were cleaned, processed and filtered. Then, so that the data could drive research on deep learning models for intent classification, professionals labeled and augmented the data to a high standard, drawing on background knowledge of the power business. During labeling, 35 service category labels were defined according to the power business. To test the practicability and effectiveness of the dataset, several classical intent classification models were evaluated on it, and the resulting models were embedded in the dialogue system. The classical text classification model, the Recurrent Convolutional Neural Network (Text-RCNN), achieved 87.1% accuracy on this dataset. Experimental results show that the proposed dataset can effectively drive research on dialogue systems for power business and improve user satisfaction.

Key words: intent recognition, text classification, Chinese dataset, dialogue system, service robot, power business
