Suggestion sentence classification method based on PU learning

Journal of Computer Applications, 2019, Vol. 39, Issue (3): 639-643. DOI: 10.11772/j.issn.1001-9081.2018081759

ZHANG Pu, LIU Chang, LI Xiao

College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2018-08-23 Revised: 2018-09-30 Online: 2019-03-11 Published: 2019-03-10
Contact: ZHANG Pu
Supported by:
This work is partially supported by the Youth Program of the Humanities and Social Science Foundation of the Ministry of Education of China (17YJCZH247) and the Humanities and Social Science Foundation of the Chongqing Municipal Education Commission (17SKG055).

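About the authors: ZHANG Pu, born in 1977 in Zhaotong, Yunnan, Ph.D., associate professor, CCF member. His research interests include text mining and sentiment analysis. LIU Chang, born in 1993 in Xiaogan, Hubei, M.S. candidate. His research interests include text mining and sentiment analysis. LI Xiao, born in 1994 in Xiaogan, Hubei, M.S. candidate. His research interests include text mining and sentiment analysis.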

Abstract

As a new research task, suggestion mining has important application value. Traditional suggestion sentence classification methods suffer from complex rules, a heavy labeling workload, high feature dimensionality, and data sparsity. To address these problems, a suggestion sentence classification method based on PU (Positive and Unlabeled) learning was proposed. First, suggestion sentences were selected from an unlabeled review set with a simple rule to form a positive example set. Then, to reduce the feature dimensionality and alleviate data sparsity, a reliable negative example set was constructed with the Spy technique in the feature space of an autoencoder neural network. Finally, a Multi-Layer Perceptron (MLP) was trained on the positive example set and the reliable negative example set to classify the remaining unlabeled samples. On a Chinese dataset, the proposed method reached an F1 value of 81.98% and an accuracy of 82.67%. The experimental results show that the proposed method can classify suggestion sentences effectively without manually labeling the data.

Key words: suggestion mining, suggestion sentence classification, PU (Positive and Unlabeled) learning, autoencoder, Multi-Layer Perceptron (MLP)
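
The three steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the 2-D synthetic points stand in for autoencoder features, a small logistic regression stands in for both the Spy-stage classifier and the MLP, and the 15% spy ratio and the "lowest spy score" threshold are choices made here for the example. All function and variable names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(xs, ys, lr=0.1, epochs=500):
    """Full-batch gradient descent for a 2-feature logistic regression.
    (Assumed stand-in classifier; the paper trains an MLP in the last step.)"""
    w, b = [0.0, 0.0], 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def score(model, x):
    w, b = model
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.15, rng=None):
    """Spy technique: hide some positives ("spies") in the unlabeled set,
    train positives-vs-rest, and treat unlabeled points that score below
    every spy as reliable negatives."""
    if rng is None:
        rng = random.Random(0)
    n_spies = max(1, int(len(positives) * spy_ratio))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    kept = [p for i, p in enumerate(positives) if i not in spy_idx]
    xs = kept + unlabeled + spies
    ys = [1] * len(kept) + [0] * (len(unlabeled) + len(spies))
    model = train_logreg(xs, ys)
    threshold = min(score(model, s) for s in spies)  # assumed cutoff rule
    reliable = [i for i, u in enumerate(unlabeled) if score(model, u) < threshold]
    return reliable, threshold


rng = random.Random(42)


def blob(center, n):
    """Synthetic 2-D features scattered uniformly around (center, center)."""
    return [(center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)) for _ in range(n)]


# Step 1 (assumed already done): sentences matched by a simple rule.
positives = blob(2.0, 40)
# Unlabeled set: indices 0..29 are hidden positives, 30..89 are true negatives.
unlabeled = blob(2.0, 30) + blob(-2.0, 60)

# Step 2: pick reliable negatives with the Spy technique.
reliable, threshold = spy_reliable_negatives(positives, unlabeled, rng=rng)

# Step 3: train on positives + reliable negatives, classify the rest.
final_model = train_logreg(
    positives + [unlabeled[i] for i in reliable],
    [1] * len(positives) + [0] * len(reliable),
)
reliable_set = set(reliable)
rest = [i for i in range(len(unlabeled)) if i not in reliable_set]
predicted = {i: score(final_model, unlabeled[i]) >= 0.5 for i in rest}
print(len(reliable), "reliable negatives;", len(rest), "samples left to classify")
```

Taking the lowest spy score as the cutoff is the strictest reading of the technique. Hidden positives that happen to score below every spy will still be mislabeled as reliable negatives, so a real system would likely tune the spy ratio and cutoff on held-out data.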

CLC Number: TP391

ZHANG Pu, LIU Chang, LI Xiao. Suggestion sentence classification method based on PU learning[J]. Journal of Computer Applications, 2019, 39(3): 639-643.
