
Research on College Enrollment Consultation Algorithm Based on Deep Autoencoders

FENG Shizhou 1,2, ZHOU Shangbo 3

  1. College of Computer Science, Chongqing University
  2. Foreign Trade and Business College, Chongqing Normal University
  3. College of Computer Science, Chongqing University, Chongqing 400030
  • Received: 2017-05-16 Revised: 2017-06-15 Online: 2017-06-15
  • Corresponding author: ZHOU Shangbo

Abstract: Online enrollment consultation at colleges and universities is usually handled either by manual replies or by question-answering systems based on keyword matching. Manual replies are inefficient, and keyword-matching systems often return irrelevant answers. In addition, consultation texts tend to be short, so their vector representations are prone to the high-dimensional sparsity problem. To address these issues, we propose an enrollment consultation algorithm based on Stacked Denoising Sparse Autoencoders (SDSAE). First, an autoencoder network extracts features from the short texts and reduces their dimensionality; dataset augmentation and noise injection are introduced to cope with the small size and class imbalance of the training set and to improve the generalization ability of the algorithm. Once the low-dimensional feature representation of a short text has been obtained, the text is classified with the BP (back-propagation) algorithm. Experimental results show that the proposed algorithm outperforms BP, SVM, ELM, and other algorithms and significantly improves the classification of enrollment consultation texts. It offers a new approach to designing intelligent consultation systems for colleges and universities and has good application prospects in the field of college enrollment consultation.

Key words: deep learning, autoencoder, neural network, enrollment, text classification
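
The feature-extraction step described in the abstract can be illustrated with a minimal NumPy sketch of greedy layer-wise training for a stack of denoising sparse autoencoders. This is not the authors' implementation: the function names (`train_dsae_layer`, `train_stack`, `encode`), the masking-noise scheme, the KL-divergence sparsity penalty, the hyperparameters, and the random toy data are all assumptions chosen for illustration. The BP classifier that the abstract applies to the low-dimensional features is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dsae_layer(X, n_hidden, noise=0.3, rho=0.05, beta=0.1,
                     lr=0.5, epochs=200, seed=0):
    """Train one denoising sparse autoencoder layer (illustrative only).

    Returns the encoder weights, encoder bias, and the per-epoch loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Denoising: randomly zero a fraction of the inputs (masking noise),
        # but reconstruct the clean input X.
        X_noisy = X * (rng.random(X.shape) >= noise)
        H = sigmoid(X_noisy @ W1 + b1)        # hidden code
        X_hat = sigmoid(H @ W2 + b2)          # reconstruction
        # Sparsity: KL divergence between target rho and mean activation.
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        losses.append(0.5 * np.sum((X_hat - X) ** 2) / n + beta * kl)
        # Backpropagation of the reconstruction and sparsity terms.
        d_out = (X_hat - X) * X_hat * (1 - X_hat) / n
        d_sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        d_hid = (d_out @ W2.T + d_sparse) * H * (1 - H)
        W2 -= lr * (H.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, losses

def train_stack(X, sizes, **kwargs):
    """Greedy layer-wise training: each layer learns from the previous code."""
    layers, inp = [], X
    for n_hidden in sizes:
        W, b, _ = train_dsae_layer(inp, n_hidden, **kwargs)
        layers.append((W, b))
        inp = sigmoid(inp @ W + b)
    return layers

def encode(X, layers):
    """Map inputs to the low-dimensional code of the last layer."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

# Toy stand-in for sparse bag-of-words vectors of short texts (random data).
rng = np.random.default_rng(42)
X = (rng.random((40, 30)) < 0.1).astype(float)
layers = train_stack(X, sizes=[16, 8])
Z = encode(X, layers)
print(Z.shape)  # (40, 8)
```

In this sketch the 30-dimensional sparse input is reduced to an 8-dimensional dense code; in a full pipeline those codes would be passed to a classifier trained with back-propagation, as the abstract describes.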

CLC number: