Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (3): 639-643. DOI: 10.11772/j.issn.1001-9081.2018081759


Suggestion sentence classification method based on PU learning

ZHANG Pu, LIU Chang, LI Xiao   

  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2018-08-23 Revised: 2018-09-30 Online: 2019-03-10 Published: 2019-03-11
  • Contact: ZHANG Pu (张璞)
  • Supported by:
    This work is partially supported by the Youth Program of Humanities and Social Science Foundation of the Ministry of Education in China (17YJCZH247) and the Humanities and Social Science Foundation of the Chongqing Municipal Education Commission (17SKG055).

基于PU学习的建议语句分类方法

张璞, 刘畅, 李逍   

  1. 重庆邮电大学 计算机科学与技术学院, 重庆 400065
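  • About the authors: ZHANG Pu (born 1977), male, from Zhaotong, Yunnan, associate professor, Ph.D., CCF member; main research interests: text mining, sentiment analysis. LIU Chang (born 1993), male, from Xiaogan, Hubei, master's student; main research interests: text mining, sentiment analysis. LI Xiao (born 1994), male, from Xiaogan, Hubei, master's student; main research interests: text mining, sentiment analysis.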
  • 基金资助:
    教育部人文社会科学研究青年基金资助项目(17YJCZH247);重庆市教委人文社会科学研究项目(17SKG055)。

Abstract: Suggestion mining is an emerging research task with important application value. Traditional suggestion sentence classification methods suffer from complex rules, a heavy labeling workload, high feature dimensionality and data sparsity. To address these problems, a suggestion sentence classification method based on PU (Positive and Unlabeled) learning was proposed. Firstly, suggestion sentences were selected from an unlabeled review set by a simple rule to form a positive example set. Then, to reduce the feature dimension and alleviate data sparsity, a reliable negative example set was constructed with the Spy technique in the feature space of an autoencoder neural network. Finally, a Multi-Layer Perceptron (MLP) was trained on the positive example set and the reliable negative example set to classify the remaining unlabeled samples. On a Chinese dataset, the F1 value and the accuracy of the proposed method reached 81.98% and 82.67% respectively. The experimental results show that the proposed method can classify suggestion sentences effectively without manually labeling the data.
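
The second step of the pipeline described in the abstract, selecting reliable negatives with the Spy technique, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small hand-written logistic regression stands in for the scoring classifier, the autoencoder feature space is replaced by synthetic 2-D features, and the function names, `spy_ratio`, and `noise` values are all hypothetical.

```python
import math
import random


def predict_proba(w, b, x):
    """Sigmoid of the linear score, clipped to avoid overflow in exp."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.3):
    """Tiny logistic regression fitted by batch gradient descent.

    Assumption: this is only a stand-in scorer for the sketch.
    """
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def spy_reliable_negatives(P, U, spy_ratio=0.15, noise=0.05, seed=0):
    """Return indices of U that score below the spy-derived threshold."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(P) * spy_ratio))
    spy_idx = set(rng.sample(range(len(P)), n_spies))
    spies = [p for i, p in enumerate(P) if i in spy_idx]
    rest = [p for i, p in enumerate(P) if i not in spy_idx]

    # Spies are hidden among the unlabeled data and treated as negatives.
    X = rest + U + spies
    y = [1] * len(rest) + [0] * (len(U) + len(spies))
    w, b = train_logreg(X, y)

    # Threshold: a low quantile of the spies' scores (tolerates noisy spies).
    spy_scores = sorted(predict_proba(w, b, s) for s in spies)
    threshold = spy_scores[int(noise * len(spy_scores))]
    return [i for i, u in enumerate(U) if predict_proba(w, b, u) < threshold]


# Synthetic 2-D features (assumption): positives near (2, 2), negatives near (-2, -2).
data_rng = random.Random(42)


def cluster(cx, cy, n):
    return [[data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3)] for _ in range(n)]


P = cluster(2, 2, 40)                          # rule-selected positive examples
U = cluster(2, 2, 20) + cluster(-2, -2, 40)    # hidden positives, then true negatives
rn = spy_reliable_negatives(P, U)
print(len(rn), "reliable negatives out of", len(U), "unlabeled samples")
```

In a full pipeline, the samples indexed by `rn` would serve as the negative class and `P` as the positive class when training the final classifier (an MLP in the paper), which then labels the remaining unlabeled samples.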

Key words: suggestion mining, suggestion sentence classification, PU (Positive and Unlabeled) learning, autoencoder, Multi-Layer Perceptron (MLP)

摘要: 建议挖掘作为一项新兴研究任务,具有重要的应用价值。针对传统建议语句分类方法所存在的规则复杂、标注工作量大、特征维度高、数据稀疏等问题,提出一种基于PU学习的建议语句分类方法。首先,使用简单规则从无标注评论集合中选择建议语句的正例集合;然后,为了降低特征维度,缓解数据稀疏性,在自编码神经网络(Autoencoder)特征空间中使用Spy技术划分可靠反例集合;最后,利用正例集合和可靠反例集合来训练多层感知机(MLP)对剩余的无标注样例进行分类。该方法在中文数据集上的F1值和准确率值分别达到81.98%和82.67%,实验结果表明,该方法能够有效地对建议语句进行分类,且不需要对数据进行人工标注。

关键词: 建议挖掘, 建议语句分类, PU学习, 自编码器, 多层感知机
