1. College of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China; 2. State Grid Anhui Electric Power Company Limited, Hefei, Anhui 230022, China; 3. State Grid Fuyang Power Supply Company, Fuyang, Anhui 236000, China
Abstract: A large-scale dataset of power-business user intents was constructed for the intelligent dialogue systems of customer service robots in power supply business halls. The dataset contains 9,577 user queries with their category labels. First, real voice data collected from power supply business halls were cleaned, processed and filtered. To make the data suitable for driving research on deep learning models for intent classification, professionals performed high-quality labeling and augmentation of the data according to their background knowledge of the power business. During labeling, 35 service category labels were defined according to the power business. To test the practicability and effectiveness of the proposed dataset, several classical intent classification models were evaluated on it, and the resulting models were deployed in the dialogue system. The classical text classification model Text-RCNN (Text Recurrent Convolutional Neural Network) achieved 87.1% accuracy on this dataset. Experimental results show that the proposed dataset can effectively drive research on power-business dialogue systems and improve user satisfaction.
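As an illustration of the evaluation described above (train an intent classifier on labeled queries, then report accuracy on held-out queries), the following is a minimal sketch only. It uses a multinomial naive Bayes baseline rather than the Text-RCNN model reported in the abstract, and it assumes whitespace-tokenized text. The class name, function name, example queries and the two labels `payment` and `outage` are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative baseline)."""

    def fit(self, queries, labels):
        # Count how often each label occurs and which tokens occur under it.
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for query, label in zip(queries, labels):
            self.token_counts[label].update(query.split())
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, query):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            for tok in query.split():
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, queries, labels):
    """Fraction of queries whose predicted intent matches the gold label."""
    correct = sum(model.predict(q) == y for q, y in zip(queries, labels))
    return correct / len(labels)


# Hypothetical toy data, NOT drawn from the dataset described above.
train_queries = ["pay electricity bill", "bill payment failed",
                 "power outage in my area", "no power since morning"]
train_labels = ["payment", "payment", "outage", "outage"]
test_queries = ["how to pay my bill", "outage near my home"]
test_labels = ["payment", "outage"]

model = NaiveBayesIntentClassifier().fit(train_queries, train_labels)
print(accuracy(model, test_queries, test_labels))
```

On the real data the same accuracy measure would be computed over the 35 service categories; Chinese queries would first need word segmentation, which this sketch does not cover.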