Microblog Events Detection and Tracking based on RIHDBSCAN using Cloud Framework
FENG Yong 1,2, HAN Nan 1,2, JIA Dongfeng 1,2
1. College of Computer Science, Chongqing University, Chongqing 400044, China; 2. Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education (Chongqing University), Chongqing 400044, China
Abstract: To extract events from the large volume of short posts on microblogging services, a complete event detection and tracking algorithm built on a cloud framework is proposed. First, posts are represented in the Vector Space Model (VSM) based on their numbers of forwards and comments. Keywords are then extracted using RIHDBSCAN (Incremental Hierarchical DBSCAN based on Representative posts) to detect and track events. Because a single node cannot process such a large amount of data quickly and efficiently, the algorithm is deployed on Hadoop, a cloud computing platform. Experiments on real microblog data collected from the Sina microblogging platform show that the proposed method outperforms TF-IDF (Term Frequency-Inverse Document Frequency) and UF-ITUF (User Frequency-Inverse Thread User Frequency), and that the cloud framework improves processing speed. The method is therefore suitable for data analysis and mining on very large datasets.
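The abstract does not give the weighting formula, so the following is only a minimal sketch of how forward and comment counts could enter a VSM representation. The function names (`post_weight`, `to_vector`, `keyword_scores`) and the logarithmic weight are assumptions made for illustration, not the authors' definitions, and the RIHDBSCAN clustering step itself is not shown.

```python
import math
from collections import Counter, defaultdict

def post_weight(forwards, comments):
    # Hypothetical influence weight (an assumption, not the paper's formula):
    # grows slowly with the number of forwards and comments.
    return 1.0 + math.log1p(forwards + comments)

def to_vector(tokens, forwards, comments):
    # Sparse VSM vector: raw term counts scaled by the post's weight.
    w = post_weight(forwards, comments)
    return {term: count * w for term, count in Counter(tokens).items()}

def keyword_scores(posts):
    # Sum the weighted vectors of all posts to rank candidate keywords.
    # Each post is a (tokens, forwards, comments) tuple.
    scores = defaultdict(float)
    for tokens, forwards, comments in posts:
        for term, value in to_vector(tokens, forwards, comments).items():
            scores[term] += value
    return dict(scores)

posts = [
    (["earthquake", "rescue"], 120, 45),  # widely forwarded post
    (["lunch", "noodles"], 0, 1),         # low-engagement post
    (["earthquake", "update"], 30, 10),
]
ranked = sorted(keyword_scores(posts).items(), key=lambda kv: kv[1], reverse=True)
```

Under this assumed weighting, terms from heavily forwarded or commented posts rank above terms that appear equally often in posts with little engagement.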
FENG Yong, HAN Nan, JIA Dongfeng. Microblog Events Detection and Tracking based on RIHDBSCAN using Cloud Framework [J]. Journal of Computer Applications, 2013, 33(12): 3559-3562.