BTopicMiner: a domain-specific topic mining system for Chinese microblogs
LI Jin (1,2), ZHANG Hua (3), WU Hao-xiong (3), XIANG Jun (3)
1. Information Management Department, Central China Normal University, Wuhan Hubei 430074, China
2. School of Information Engineering, Hubei University for Nationalities, Enshi Hubei 445000, China
3. School of Information Engineering, Hubei University for Nationalities, Enshi Hubei 445000, China
Abstract: As microblog applications grow rapidly, automatically extracting the popular topics that interest users from massive volumes of microblog posts has become a challenging research problem. This paper proposes a topic extraction algorithm for Chinese microblogs based on an extended topic model. To address the data sparsity of microblogs, content-related microblog posts are first clustered to form synthetic documents. Based on the assumption that posting relationships among microblogs imply topical correlation, the traditional LDA (Latent Dirichlet Allocation) topic model is extended to model these posting relationships. Finally, after topics are extracted with the extended LDA model, Mutual Information (MI) is used to compute topic vocabulary for topic recommendation. In addition, a prototype domain-specific topic mining system, named BTopicMiner, was implemented to verify the effectiveness of the proposed algorithm. Experimental results show that the proposed algorithm extracts topics from microblogs more accurately, and that the semantic similarity between the topic vocabulary computed automatically with MI and manually selected topic vocabulary exceeds 75%.
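The abstract does not state which MI formula is used to select topic vocabulary. As a minimal sketch, the code below assumes the common expected-MI criterion from text feature selection: a binary word-occurrence variable against a binary topic-membership variable, counted over documents. It also assumes each synthetic document has already been labeled with its dominant topic (for example, the most probable topic from the topic model). The function names `mutual_information` and `topic_vocabulary`, the `labels` input, and the positive-association filter are hypothetical illustrations, not the paper's implementation.

```python
import math
from collections import Counter


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Expected MI (in bits) between word occurrence and topic membership.

    n11: docs in the topic that contain the word
    n10: docs outside the topic that contain the word
    n01: docs in the topic without the word
    n00: docs outside the topic without the word
    """
    n = n11 + n10 + n01 + n00
    n1_, n0_ = n11 + n10, n01 + n00  # docs with / without the word
    n_1, n_0 = n11 + n01, n10 + n00  # docs in / outside the topic
    cells = [(n11, n1_, n_1), (n10, n1_, n_0), (n01, n0_, n_1), (n00, n0_, n_0)]
    mi = 0.0
    for n_cell, n_row, n_col in cells:
        if n_cell > 0:  # convention: 0 * log 0 = 0
            mi += (n_cell / n) * math.log2(n * n_cell / (n_row * n_col))
    return mi


def topic_vocabulary(docs, labels, topic, k=5):
    """Return the k words with the highest MI for `topic` (hypothetical helper).

    docs:   list of token lists (segmented synthetic documents)
    labels: assumed dominant-topic id for each document
    """
    n_topic = sum(1 for t in labels if t == topic)
    n_other = len(labels) - n_topic
    in_topic, out_topic = Counter(), Counter()  # document frequencies
    for tokens, t in zip(docs, labels):
        (in_topic if t == topic else out_topic).update(set(tokens))
    scores = {}
    for w in set(in_topic) | set(out_topic):
        n11, n10 = in_topic[w], out_topic[w]
        # Assumed filter: keep words whose document rate inside the topic
        # exceeds the rate outside it, since MI alone also rewards words
        # that are conspicuously absent from the topic.
        if n11 * n_other <= n10 * n_topic:
            continue
        scores[w] = mutual_information(n11, n10, n_topic - n11, n_other - n10)
    # Highest MI first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


# Toy corpus: two "earthquake" documents (topic 0), two "phone" documents (topic 1).
docs = [
    ["地震", "救援", "伤亡"],
    ["地震", "震级", "救援"],
    ["手机", "发布", "价格"],
    ["手机", "价格", "评测"],
]
labels = [0, 0, 1, 1]
print(topic_vocabulary(docs, labels, topic=0, k=2))  # ['地震', '救援']
```

Words that appear in every document of a topic and in no other document reach the maximum score (1 bit for an even two-way split). Words that occur only once inside the topic score lower. Document frequency is used instead of raw term counts so that a single post repeating a word cannot dominate the ranking.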