Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (11): 3323-3329. DOI: 10.11772/j.issn.1001-9081.2017.11.3323

• Frontier, Interdisciplinary and Comprehensive Applications •

College enrollment consultation algorithm based on deep autoencoders

FENG Shizhou1,2, ZHOU Shangbo2

  1. College of Foreign Trade and Business, Chongqing Normal University, Chongqing 401520, China;
  2. College of Computer Science, Chongqing University, Chongqing 400030, China
  • Received: 2017-05-16 Revised: 2017-06-15 Online: 2017-11-10 Published: 2017-11-11
  • Corresponding author: ZHOU Shangbo
  • About the authors: FENG Shizhou (born 1981), male, from Guanghan, Sichuan; assistant researcher, M.S., CCF member; main research interests: artificial neural networks and data mining. ZHOU Shangbo (born 1963), male, from Ningming, Guangxi; professor, Ph.D.; main research interests: artificial neural networks, information security, image processing, and computer simulation.
  • Supported by:
    the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1501703) and the Key Project of Basic Science and Frontier Technology Research of Chongqing (cstc2015jcyjBX0124).

Abstract: Online college enrollment consultation is usually handled by manual replies or by keyword-matching Question and Answer (Q&A) systems; manual replies are inefficient, and Q&A systems often return irrelevant answers. In addition, consultation texts are typically short, so their vectorized representations tend to be high-dimensional and sparse. To address these problems, an enrollment consultation algorithm based on Stacked Denoising Sparse AutoEncoders (SDSAE) is proposed. First, an autoencoder network extracts features from the short texts and reduces their dimensionality; dataset augmentation and noise injection are introduced to cope with the small, class-imbalanced training set and to improve the generalization ability of the algorithm. Once a low-dimensional feature representation of the short texts has been obtained, the texts are classified with the Back Propagation (BP) algorithm. Experimental results show that the proposed algorithm classifies better than algorithms such as BP, Support Vector Machine (SVM), and Extreme Learning Machine (ELM), and that it significantly improves the classification of enrollment consultation texts.
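The abstract names the building blocks of SDSAE but gives no formulas. As a rough illustration only, the sketch below shows one common way to write the loss of a single denoising sparse autoencoder layer and how trained encoders could be stacked. Everything specific here is an assumption and not taken from the paper: masking noise as the corruption, squared reconstruction error, a KL-divergence sparsity penalty, the hyperparameters `drop_prob`, `rho`, and `beta`, and all function names.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_out, n_in, rng):
    """Hypothetical helper: small random weights (list of rows) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def corrupt(x, drop_prob, rng):
    """Masking noise (assumed): zero each component with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def affine_sigmoid(x, weights, bias):
    """One dense layer: sigmoid(W x + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def kl_sparsity(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def dsae_loss(batch, encoder, decoder, drop_prob=0.3, rho=0.05, beta=0.1, rng=None):
    """Mean reconstruction error on corrupted inputs plus a sparsity penalty."""
    if rng is None:
        rng = random.Random(0)
    enc_w, enc_b = encoder
    dec_w, dec_b = decoder
    recon_err = 0.0
    mean_act = [0.0] * len(enc_w)
    for x in batch:
        hidden = affine_sigmoid(corrupt(x, drop_prob, rng), enc_w, enc_b)
        x_hat = affine_sigmoid(hidden, dec_w, dec_b)
        # The target is the clean input x, not its corrupted copy.
        recon_err += sum((a - b) ** 2 for a, b in zip(x_hat, x)) / 2.0
        mean_act = [m + h for m, h in zip(mean_act, hidden)]
    n = len(batch)
    mean_act = [m / n for m in mean_act]
    return recon_err / n + beta * sum(kl_sparsity(rho, r) for r in mean_act)


def encode_stack(x, layers):
    """Feed x through a list of (weights, bias) encoders, one per stacked layer."""
    for weights, bias in layers:
        x = affine_sigmoid(x, weights, bias)
    return x


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy bag-of-words vectors standing in for short consultation texts.
    batch = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1], [0, 0, 1, 1, 0, 0]]
    encoder = init_layer(3, 6, rng)
    decoder = init_layer(6, 3, rng)
    print("loss:", dsae_loss(batch, encoder, decoder, rng=rng))
    print("code:", encode_stack(batch[0], [encoder]))
```

In a full pipeline of this kind, each layer would be trained to minimize such a loss, the encoders would then be stacked with `encode_stack`, and the resulting low-dimensional codes would be passed to a classifier trained with back propagation. The sketch leaves out training, the classifier, and the dataset augmentation step.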

Key words: deep learning, autoencoder, neural network, text classification

CLC number: