Analysis and prediction model of Weibo public opinion heat by integrating BERT and X-means algorithm
Zhangtao JIANG, Xin LI, Shihao ZHANG, Xinyang ZHAO
Journal of Computer Applications    2025, 45 (10): 3138-3145.   DOI: 10.11772/j.issn.1001-9081.2024091371

In public opinion discovery and prediction on social media platforms such as Weibo, “fake hotspots” created by internet trolls reduce analysis accuracy. To reflect Weibo public opinion heat accurately, a heat analysis and prediction model integrating BERT (Bidirectional Encoder Representations from Transformers) and the X-means algorithm, called BXpre, was proposed. The model combines attribute features of the participating users with time-domain features of the heat changes to improve heat prediction accuracy. Firstly, original Weibo posts and interacting-user data were preprocessed, and a fine-tuned StructBERT model was used to classify these data and determine the relevance between interacting users and the original posts. This relevance was used as a reference value for calculating each user’s contribution weight to the heat growth of a post. Secondly, interacting users were clustered by their features using the X-means algorithm, and trolls were filtered out based on the resulting cluster states. A weight penalty mechanism targeting troll samples was then introduced, and a Weibo heat index model was constructed by combining it with label relevance. Finally, the cosine similarity between the second derivative of the prior heat value over time and the real data was calculated to predict future changes in Weibo heat. Experimental results show that the Weibo public opinion heat rankings produced by BXpre are closer to the real data under different user scales. Under mixed-scale test conditions, the prediction correlation index of BXpre reaches 90.88%, which is 12.71, 14.80, and 11.30 percentage points higher than three traditional methods based on the LSTM (Long Short-Term Memory) network, the XGBoost (eXtreme Gradient Boosting) algorithm, and TDR (Temporal Difference Ranking), respectively, and 9.76 and 11.95 percentage points higher than ChatGPT and Wenxin Yiyan, respectively.
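
The final prediction step compares the curvature of heat curves through cosine similarity. The abstract does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: heat values are sampled at equal time intervals, the second derivative is approximated by a discrete second difference, and both series are differenced before comparison. The function names (`second_difference`, `cosine_similarity`, `curvature_similarity`) and the sample values are hypothetical and do not come from the paper.

```python
import math


def second_difference(series):
    """Approximate the second derivative of an equally spaced series.

    Uses the central difference x[t+1] - 2*x[t] + x[t-1] (unit time step assumed).
    """
    return [
        series[i + 1] - 2 * series[i] + series[i - 1]
        for i in range(1, len(series) - 1)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def curvature_similarity(prior_heat, observed_heat):
    """Compare how two heat curves accelerate or decelerate over time.

    Assumption: both curves are second-differenced before the comparison.
    """
    return cosine_similarity(
        second_difference(prior_heat),
        second_difference(observed_heat),
    )


# Hypothetical heat values at five consecutive time steps.
prior = [1.0, 4.0, 9.0, 16.0, 25.0]
observed = [2.0, 8.0, 18.0, 32.0, 50.0]
score = curvature_similarity(prior, observed)
```

Because cosine similarity ignores magnitude, two curves that accelerate in the same way score close to 1 even when one post is far more popular than the other, while a curve that decelerates where the other accelerates scores close to -1.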
