Analysis on distinguishing product reviews based on top-k emerging patterns
LIU Lu1, WANG Yining1, DUAN Lei1,2, NUMMENMAA Jyrki3, YAN Li1, TANG Changjie1
1. School of Computer Science, Sichuan University, Chengdu Sichuan 610065, China; 2. West China School of Public Health, Sichuan University, Chengdu Sichuan 610041, China; 3. School of Information Sciences, University of Tampere, Tampere FI-33014, Finland
Abstract: With the growth of e-commerce, online shopping websites provide customer reviews to help shoppers choose among products. However, the number of reviews is huge, and their content is typically redundant and non-standard, so it is difficult for users to read all reviews in a short time and identify the distinguishing characteristics of a product. To address this problem, a method for mining top-k emerging patterns was proposed and applied to the reviews of different products. Based on the proposed method, a prototype called ReviewScope was designed and implemented. ReviewScope finds the significant comments about a given product to serve as a basis for purchase decisions, and presents the results visually. A case study on a real-world data set from JD.com demonstrates that ReviewScope is effective, flexible and user-friendly.
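The abstract does not describe the mining procedure in detail, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each review has already been segmented into a set of terms, and it ranks term sets by the standard emerging-pattern growth rate (support in the target product's reviews divided by support in the other product's reviews). The function names, the `min_support` and `max_len` parameters, the brute-force enumeration, and the toy reviews are all illustrative assumptions.

```python
from itertools import combinations


def support(pattern, reviews):
    """Fraction of reviews (each a set of terms) containing every term in pattern."""
    if not reviews:
        return 0.0
    return sum(1 for review in reviews if pattern <= review) / len(reviews)


def top_k_emerging_patterns(target, background, k=3, max_len=2, min_support=0.3):
    """Return the k term sets whose support grows most from background to target.

    Brute-force enumeration for illustration only; a practical miner would
    prune the search space instead of listing every combination.
    """
    vocabulary = sorted(set().union(*target))
    scored = []
    for size in range(1, max_len + 1):
        for combo in combinations(vocabulary, size):
            pattern = frozenset(combo)
            s_target = support(pattern, target)
            if s_target < min_support:
                continue  # too rare in the target reviews to be informative
            s_background = support(pattern, background)
            # Growth rate is infinite when the pattern never occurs in the background.
            growth = float("inf") if s_background == 0 else s_target / s_background
            scored.append((growth, s_target, tuple(sorted(pattern))))
    # Highest growth rate first, then highest support, then alphabetical order.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return scored[:k]


# Hypothetical, hand-made reviews for two products (not data from the paper).
product_a = [
    {"battery", "long"},
    {"battery", "long", "screen"},
    {"screen", "bright"},
    {"battery", "long"},
]
product_b = [
    {"screen", "bright"},
    {"battery", "short"},
    {"screen", "bright", "price"},
    {"price", "high"},
]

for growth, supp, pattern in top_k_emerging_patterns(product_a, product_b):
    print(pattern, "growth rate:", growth, "support:", supp)
```

In this toy example the term sets that distinguish product A are the ones about long battery life, because they are frequent in its reviews and rare or absent in those of product B. Swapping the two arguments yields the patterns that distinguish product B instead.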