Journal of Computer Applications, 2022, Vol. 42, Issue (11): 3364-3370. DOI: 10.11772/j.issn.1001-9081.2022010045

• CCF Bigdata 2021

Popularity prediction method of Twitter topics based on evolution patterns

Weifan XIE1,2, Yan GUO1, Guangsheng KUANG1,3, Zhihua YU1, Yuanhai XUE1, Huawei SHEN1

  1. Data Intelligence System Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China
    3. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China
  • Received: 2022-01-17 Revised: 2022-03-03 Accepted: 2022-03-07 Online: 2022-04-18 Published: 2022-11-10
  • Contact: Yan GUO
  • About author: XIE Weifan, born in 1997, M. S. candidate. His research interests include popularity prediction.
    GUO Yan, born in 1974, Ph. D., senior engineer. Her research interests include network information acquisition, network content processing.
    KUANG Guangsheng, born in 1995, M. S. candidate. His research interests include natural language processing, data fusion.
    YU Zhihua, born in 1973, Ph. D., chief senior engineer. His research interests include internet public opinion analysis.
    XUE Yuanhai, born in 1987, Ph. D., senior engineer. His research interests include information retrieval, big data.
    SHEN Huawei, born in 1982, Ph. D., research fellow. His research interests include social computing, data mining, machine learning.
  • Supported by:
    National Natural Science Foundation of China (U21B2046)

基于演化模式的推特话题流行度预测方法

解伟凡1,2, 郭岩1, 匡广生1,3, 余智华1, 薛源海1, 沈华伟1

  1. 中国科学院计算技术研究所 数据智能系统研究中心, 北京 100190
    2.中国科学院大学 计算机科学与技术学院, 北京 101408
    3.中国科学院大学 人工智能学院, 北京 101408
  • 通讯作者: 郭岩
  • 作者简介:解伟凡(1997—),男,山西运城人,硕士研究生,主要研究方向:流行度预测
    郭岩(1974—),女,陕西西安人,高级工程师,博士,主要研究方向:网络信息获取、网络内容处理 guoy@ict.ac.cn
    匡广生(1995—),男,江西赣州人,硕士研究生,主要研究方向:自然语言处理、数据融合
    余智华(1973—),男,江西吉安人,正高级工程师,博士,主要研究方向:网络舆情分析
    薛源海(1987—),男,云南玉溪人,高级工程师,博士,主要研究方向:信息检索、大数据
    沈华伟(1982—),男,河南周口人,研究员,博士,主要研究方向:社会计算、数据挖掘、机器学习。

Abstract:

To address the problem that previous popularity prediction methods did not take into account the differences between evolution patterns or the timeliness of prediction, a popularity prediction method for Twitter topics based on evolution patterns was proposed. Firstly, the K-SC (K-Spectral Centroid) algorithm was used to cluster the popularity sequences of a large number of historical topics, and 6 evolution patterns were obtained. Then, for each evolution pattern, a Fully Connected Network (FCN) was trained as the prediction model on the historical topic data of that pattern. Finally, to select the prediction model for a topic to be predicted, the Amplitude-Alignment Dynamic Time Warping (AADTW) algorithm was proposed to calculate the similarity between the known popularity sequence of the topic and each evolution pattern, and the prediction model of the most similar evolution pattern was selected to predict the popularity. In the task of predicting the popularity of the next 5 hours from the known popularity of the first 20 hours, the Mean Absolute Percentage Error (MAPE) of the proposed method was 58.2% lower than that of the Auto-Regressive Integrated Moving Average (ARIMA) method and 31.0% lower than that of a method using a single fully connected network. Experimental results show that the model group based on evolution patterns can predict the popularity of Twitter topics more accurately than a single model.

Key words: Twitter topic, evolution pattern, popularity prediction, social network, time series
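
The model-selection step and the MAPE metric described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of AADTW, so the code below substitutes classic dynamic time warping applied to peak-normalized sequences as a rough stand-in for amplitude alignment; this is an assumption, not the authors' algorithm. The function names (`dtw_distance`, `peak_normalize`, `select_pattern`, `mape`) and the toy centroids are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW distance with absolute-difference local cost (not AADTW)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def peak_normalize(seq):
    """Scale a sequence so its peak is 1 (assumed stand-in for amplitude alignment)."""
    seq = np.asarray(seq, dtype=float)
    peak = seq.max()
    return seq / peak if peak > 0 else seq


def select_pattern(known, centroids):
    """Return the index of the pattern centroid closest to the known prefix."""
    known = peak_normalize(known)
    distances = [
        # Compare only against the part of each centroid covering the known hours.
        dtw_distance(known, peak_normalize(c[: len(known)]))
        for c in centroids
    ]
    return int(np.argmin(distances))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be nonzero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


# Toy example: two hypothetical evolution patterns (rising and decaying).
centroids = [[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]]
known = [10, 20, 30, 40]  # known popularity prefix of a new topic
pattern = select_pattern(known, centroids)  # index of the per-pattern model to use
```

In a full pipeline along the lines of the abstract, the returned index would pick one of the per-pattern prediction models, and `mape` would score its forecast against the observed popularity.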

摘要:

针对以往流行度预测方法未利用演化模式之间的差异和忽略预测时效性的问题,提出了一种基于演化模式的推特话题流行度预测方法。首先,基于K-SC算法对大量历史话题的流行度序列进行聚类,并得到6类演化模式;然后,使用各类演化模式下的历史话题数据分别训练全连接网络(FCN)作为预测模型;最后,为选择待预测话题的预测模型,提出幅度对齐的动态时间规整(AADTW)算法来计算待预测话题的已知流行度序列与各演化模式的相似度,并选取相似度最高的演化模式的预测模型进行流行度预测。在根据已知前20 h的流行度预测后5 h的流行度的任务中,与差分整合移动平均自回归(ARIMA)方法以及使用单一的全连接网络进行预测的方法相比,所提方法的预测结果的平均绝对百分比误差(MAPE)分别降低了58.2%和31.0%。实验结果表明,基于演化模式得到的模型群相较于单一模型能更加准确地预测推特话题流行度。

关键词: 推特话题, 演化模式, 流行度预测, 社交网络, 时间序列
