Journal of Computer Applications, 2019, Vol. 39, Issue (3): 644-650. DOI: 10.11772/j.issn.1001-9081.2018081757

SVD-CNN barrage text classification algorithm combined with improved active learning

QIU Ningjia, CONG Lin, ZHOU Sicheng, WANG Peng, LI Yanfang   

  1. College of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130022, China
  • Received: 2018-08-23  Revised: 2018-10-29  Online: 2019-03-10  Published: 2019-03-11
  • Contact: QIU Ningjia
  • Supported by:
    This work is partially supported by the Major Science and Technology Bidding Project of Jilin Province (20170203004GX) and the Provincial Industrial Innovation Project of Jilin Province (2017C051).

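  • About the authors: QIU Ningjia (born 1984), male, from Nanyang, Henan, lecturer, Ph.D., CCF member; main research interests: data mining and algorithm analysis. CONG Lin (born 1992), female, from Jilin City, Jilin, master's student; main research interest: data mining. ZHOU Sicheng (born 1994), male, from Changchun, Jilin, master's student; main research interest: data mining. WANG Peng (born 1973), male, from Baotou, Inner Mongolia, professor, Ph.D., CCF member; main research interest: data mining. LI Yanfang (born 1965), female, from Changchun, Jilin, professor, Ph.D.; main research interests: databases and data mining, software engineering, and information systems.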

Abstract: The pooling layer of the traditional Convolutional Neural Network (CNN) model loses much semantic information when it reduces the dimension of text features. To address this problem, a Convolutional Neural Network model based on the Singular Value Decomposition algorithm (SVD-CNN) was proposed. Firstly, an improved Active Learning algorithm based on Density Center point sampling (DC-AL) was used to select and label the samples that contribute most to the classification model, yielding a high-quality training set at a low labeling cost. Secondly, an SVD-CNN classification model for barrage (danmaku, i.e. bullet-screen comment) text was built, in which SVD replaces the pooling layer of the traditional CNN model for feature extraction and dimension reduction, and the barrage text classification task was completed on this basis. Finally, the model parameters were optimized with the Partial Sampling Gradient Descent (PSGD) algorithm. To verify the effectiveness of the improved algorithm, the proposed model was compared with common text classification models on multiple barrage data sets. The experimental results show that the improved algorithm better preserves the semantic features of the text, keeps the training process stable, and speeds up model convergence. Overall, the proposed algorithm achieves better classification performance than traditional algorithms on multiple barrage texts.
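
The abstract does not specify how SVD takes the place of the pooling layer, so the following is only a minimal sketch of one possible reading, not the authors' procedure. It assumes a convolutional feature map laid out as rows = text positions and columns = filters, and compares max pooling (one value per filter) with projecting each row onto the top-k right singular vectors. The function names, the toy feature map, the choice k = 2, and the use of power iteration in place of a library SVD routine are all illustrative assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def top_singular_pairs(matrix, k, iters=200):
    """Approximate the k largest singular values and right singular vectors.

    Uses power iteration on A^T A followed by deflation. The all-ones start
    vector is an illustrative choice; it fails if it happens to be orthogonal
    to the singular vector being sought.
    """
    a = [list(map(float, row)) for row in matrix]  # copy; deflation edits it
    n_cols = len(a[0])
    sigmas, vectors = [], []
    for _ in range(k):
        a_t = [list(col) for col in zip(*a)]
        v = [1.0 / math.sqrt(n_cols)] * n_cols
        for _step in range(iters):
            w = matvec(a_t, matvec(a, v))  # w = A^T A v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        av = matvec(a, v)
        sigmas.append(math.sqrt(sum(x * x for x in av)))
        vectors.append(v)
        # Deflate: A <- A - (A v) v^T removes the component just found.
        a = [[x - av_i * v_j for x, v_j in zip(row, v)]
             for row, av_i in zip(a, av)]
    return sigmas, vectors


def svd_reduce(feature_map, k):
    """Project every row onto the top-k right singular vectors (hypothetical
    stand-in for the SVD-based reduction step)."""
    _, vectors = top_singular_pairs(feature_map, k)
    return [[sum(x * y for x, y in zip(row, v)) for v in vectors]
            for row in feature_map]


def max_pool(feature_map):
    """Conventional max-over-positions pooling: one value per filter."""
    return [max(column) for column in zip(*feature_map)]


# Toy feature map (assumed data): 4 text positions x 3 filters.
feature_map = [
    [3.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]

pooled = max_pool(feature_map)            # keeps one number per filter
reduced = svd_reduce(feature_map, k=2)    # keeps k numbers per position
sigmas, _ = top_singular_pairs(feature_map, k=2)
```

In this sketch, max pooling discards where in the text each filter fired, while the SVD projection keeps a k-dimensional description of every position along the directions of largest variation. Whether this matches the reduction used in the paper would need to be checked against the full text.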

Key words: Convolutional Neural Network (CNN), Singular Value Decomposition (SVD), Active Learning (AL), gradient descent, text classification
