Journal of Computer Applications


Predictive model of Weibo public opinion heat analysis by integrating BERT and X-means algorithm

JIANG Zhangtao, LI Xin, ZHANG Shihao, ZHAO Xinyang   

  1. Department of Information Network Security, People's Public Security University of China
  • Received: 2024-09-24 Revised: 2025-01-24 Online: 2025-03-14 Published: 2025-03-14
  • About author: JIANG Zhangtao, born in 2000, M. S. candidate. His research interests include cyber security and technical intelligence. LI Xin, born in 1977, Ph.D., professor. His research interests include artificial intelligence and cyber security. ZHANG Shihao, born in 1992, Ph.D. candidate, lecturer. His research interests include data mining and social network analysis. ZHAO Xinyang, born in 2002, M. S. candidate. His research interests include information hiding.
  • Supported by:
    Fundamental Research Funds for the Central Universities (2020JKF316)

  • Corresponding author: ZHANG Shihao
  • Native places and memberships: JIANG Zhangtao (Jinan, Shandong; CCF member); LI Xin (Ganzhou, Jiangxi; CCF member); ZHANG Shihao (Linfen, Shanxi); ZHAO Xinyang (Linyi, Shandong).

Abstract: In public opinion discovery and prediction on social media platforms such as Weibo, "fake hotspots" manufactured by internet trolls reduce the accuracy of analysis. To reflect Weibo public opinion heat faithfully, BXpre was proposed. BXpre is a Weibo public opinion heat analysis and prediction model that integrates BERT and the X-means algorithm. First, Weibo posts and the data of interacting users were preprocessed, and a fine-tuned StructBERT model was used to classify the posts and the user data separately. The classification results determined the relevance between each interacting user and the original post. This relevance served as the reference value for weighting that user's contribution to the growth of the post's heat. Second, interacting users were clustered on their features with the X-means algorithm, and trolls were filtered out according to the homogeneity of the resulting clusters. A weight penalty mechanism for troll samples was introduced and combined with label relevance to build a Weibo heat index model. Finally, the second derivative of the prior heat value with respect to time was computed, and future changes in Weibo heat were predicted from its cosine similarity with the real data. In this way, BXpre was designed to integrate the attribute features of participating users with the temporal features of heat changes, improving prediction accuracy. Experimental results showed that, across different user scales, the Weibo public opinion heat rankings produced by BXpre were closer to the real data. Under mixed-scale test conditions, the prediction correlation index reached 90.88%. This was an average improvement of 12.94 percentage points over three traditional methods. It also exceeded ChatGPT and Wenxin Yiyan (ERNIE Bot) by 9.76 and 11.95 percentage points, respectively.
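The clustering step relies on X-means. This algorithm grows the number of clusters by splitting a cluster only when a Bayesian information criterion (BIC) improves. The following is a minimal NumPy sketch, not the authors' implementation. It assumes plain Euclidean user feature vectors, starts from a single cluster, and scores splits with a simplified spherical-Gaussian BIC. The `homogeneous_clusters` helper and its `max_spread` and `min_size` thresholds are hypothetical stand-ins for the homogeneity-based troll filter. The demo features are synthetic.

```python
import numpy as np


def _assign(X, centers):
    """Index of the nearest center for every row of X (squared Euclidean)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, init, n_iter=100):
    """Lloyd's k-means started from the given initial centers."""
    centers = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        labels = _assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, _assign(X, centers)


def bic(X, centers, labels):
    """BIC of a hard-assignment spherical Gaussian model (higher is better)."""
    R, M = X.shape
    K = len(centers)
    sse = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(K))
    var = max(sse / max(M * (R - K), 1), 1e-12)  # shared per-dimension variance
    counts = np.bincount(labels, minlength=K)
    counts = counts[counts > 0]
    log_lik = ((counts * np.log(counts / R)).sum()
               - R * M / 2 * np.log(2 * np.pi * var) - sse / (2 * var))
    n_params = K * (M + 1)  # K*M means, K-1 mixing weights, 1 shared variance
    return log_lik - n_params / 2 * np.log(R)


def xmeans(X, k_max=10):
    """Start from one cluster; split a cluster whenever its local BIC improves."""
    centers, labels = kmeans(X, X.mean(axis=0, keepdims=True))
    while len(centers) < k_max:
        budget = k_max - len(centers)  # how many more splits are allowed
        proposal = []
        for j, c in enumerate(centers):
            pts = X[labels == j]
            spread = pts.std(axis=0) if len(pts) >= 4 else np.zeros_like(c)
            if budget > 0 and not np.allclose(spread, 0):
                # Try a 2-means split seeded on either side of the parent center.
                child_c, child_l = kmeans(pts, [c - spread, c + spread])
                parent_bic = bic(pts, c[None, :], np.zeros(len(pts), dtype=int))
                if bic(pts, child_c, child_l) > parent_bic:
                    proposal.extend(child_c)
                    budget -= 1
                    continue
            proposal.append(c)
        if len(proposal) == len(centers):
            break  # no cluster benefited from splitting
        centers, labels = kmeans(X, np.array(proposal))  # global refinement
    return centers, labels


def homogeneous_clusters(X, centers, labels, max_spread, min_size=5):
    """Hypothetical troll heuristic: sizeable clusters of near-identical users."""
    flagged = []
    for j, c in enumerate(centers):
        pts = X[labels == j]
        if len(pts) >= min_size and np.linalg.norm(pts - c, axis=1).mean() < max_spread:
            flagged.append(j)
    return flagged


# Synthetic two-dimensional user features (illustration only).
rng = np.random.default_rng(0)
organic = rng.normal(0.0, 1.0, size=(200, 2))   # varied ordinary users
trolls = rng.normal(10.0, 0.05, size=(50, 2))   # near-identical accounts
X = np.vstack([organic, trolls])
centers, labels = xmeans(X)
flagged = homogeneous_clusters(X, centers, labels, max_spread=0.5)
print(len(centers), flagged)
```

On this synthetic data the tight group should be the only flagged cluster. Real user features would need scaling before Euclidean distances become meaningful.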
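The abstract does not give the heat index formula, so the following sketch is illustrative only. It makes three assumptions. Relevance is taken to be the cosine similarity between the class-probability vectors that a classifier, such as a fine-tuned StructBERT model, assigns to the post and to each user's text. A flagged troll's contribution is multiplied by a hypothetical `troll_penalty` factor. Heat is the cumulative sum of weighted interactions per time bucket. The function names, the 0.1 penalty, and the data are all assumptions.

```python
import numpy as np


def label_relevance(post_probs, user_probs):
    """Cosine similarity between a post's label distribution and each user's.

    Hypothetical stand-in for StructBERT-based relevance: both inputs are
    class-probability vectors assumed to come from some fine-tuned classifier.
    """
    post = np.asarray(post_probs, dtype=float)
    users = np.atleast_2d(np.asarray(user_probs, dtype=float))
    num = users @ post
    den = np.linalg.norm(users, axis=1) * np.linalg.norm(post)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def heat_series(buckets, relevance, is_troll, n_buckets, troll_penalty=0.1):
    """Cumulative heat per time bucket from relevance-weighted interactions.

    Each interaction adds its relevance score; interactions from users flagged
    as trolls are scaled by the hypothetical troll_penalty factor.
    """
    weights = np.where(is_troll, troll_penalty, 1.0) * relevance
    per_bucket = np.bincount(buckets, weights=weights, minlength=n_buckets)
    return np.cumsum(per_bucket)


# Synthetic example: one post and three interactions over three time buckets.
post = [0.8, 0.1, 0.1]            # label probabilities of the original post
users = [[0.8, 0.1, 0.1],         # on-topic comment
         [0.1, 0.8, 0.1],         # off-topic comment
         [0.8, 0.1, 0.1]]         # on-topic comment from a flagged troll
rel = label_relevance(post, users)
heat = heat_series(np.array([0, 1, 1]), rel,
                   np.array([False, False, True]), n_buckets=3)
print(rel.round(3), heat.round(3))
```

In this sketch the off-topic comment adds only about a quarter of a unit of heat. The troll's on-topic comment adds one tenth.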
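For the prediction step, the second derivative of a heat curve sampled at regular intervals can be approximated with second finite differences. Two such profiles can then be compared with cosine similarity. The sketch below assumes the comparison is made between the second-difference profiles of the model's prior heat and of the observed heat. The curves are toy data, and `curvature` and `cosine_similarity` are illustrative names, not the paper's code.

```python
import numpy as np


def curvature(heat, dt=1.0):
    """Second finite difference (h[t+1] - 2*h[t] + h[t-1]) / dt**2 of a heat curve."""
    return np.diff(np.asarray(heat, dtype=float), n=2) / dt**2


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / den) if den > 0 else 0.0


# Toy heat curves sampled at equal time steps (synthetic, for illustration).
model_heat = [0, 1, 4, 9, 16, 25]        # prior heat from the model: accelerating
real_heat = [0, 2, 8, 18, 32, 50]        # observed heat: also accelerating
cooling_heat = [0, 10, 18, 24, 28, 30]   # observed heat: decelerating

print(cosine_similarity(curvature(model_heat), curvature(real_heat)))     # 1.0
print(cosine_similarity(curvature(model_heat), curvature(cooling_heat)))  # -1.0
```

A similarity near 1 means the two curves accelerate and decelerate together. A value near -1, as with the cooling curve, means their trends diverge.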

Key words: Weibo public opinion heat analysis and prediction, BERT model, X-means algorithm, troll detection, social network analysis
