Journal of Computer Applications, 2025, Vol. 45, Issue (10): 3138-3145. DOI: 10.11772/j.issn.1001-9081.2024091371
• Artificial intelligence •
Analysis and prediction model of Weibo public opinion heat by integrating BERT and X-means algorithm
Zhangtao JIANG, Xin LI, Shihao ZHANG, Xinyang ZHAO
Received: 2024-09-27
Revised: 2025-02-14
Accepted: 2025-02-17
Online: 2025-03-14
Published: 2025-10-10
Contact: Shihao ZHANG
About author: JIANG Zhangtao, born in 2000 in Jinan, Shandong, male, M. S. candidate, CCF member. His research interests include cyber security and technical intelligence.
Zhangtao JIANG, Xin LI, Shihao ZHANG, Xinyang ZHAO. Analysis and prediction model of Weibo public opinion heat by integrating BERT and X-means algorithm[J]. Journal of Computer Applications, 2025, 45(10): 3138-3145.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024091371
No. | IP location | Device name | Interaction time | Following count | Follower count | Relevance | Originality rate/% | Gender |
---|---|---|---|---|---|---|---|---|
1 | Liaoning | iPhone client | 2024-03-17 13:43 | 52 | 8 | 0.247 | 20 | Female |
2 | Liaoning | iPhone 13 | 2024-03-17 13:41 | 179 | 12 | 0.234 | 60 | Female |
3 | Hubei | iPhone client | 2024-03-17 10:03 | 45 | 179 | 0.238 | 30 | Female |
4 | Jiangsu | Huawei P30 Pro | 2024-03-17 05:23 | 126 | 17 | 0.167 | 10 | Female |
5 | Guangdong | iPhone client | 2024-03-16 23:32 | 148 | 10 | 0.266 | 0 | Female |
6 | Jiangsu | Huawei Mate 60 | 2024-03-16 23:29 | 65 | 3 | 0.244 | 10 | Female |
7 | Jilin | iPhone client | 2024-03-16 23:21 | 111 | 24 | 0.167 | 10 | Male |
8 | Guangdong | iPad Pro | 2024-03-16 22:55 | 94 | 13 | 0.185 | 0 | Female |
9 | Shandong | Huawei Mate 40 | 2024-03-16 20:09 | 74 | 24 | 0.249 | 0 | Female |
10 | Hebei | Redmi K40 | 2024-03-16 18:36 | 26 | 15 | 0.259 | 0 | Female |
Tab. 1 Sample data of interacting users
Weibo ID | Weibo content | Predicted rank (LSTM) | Predicted rank (XGBoost) | Predicted rank (TDR) | Predicted rank (BXpre) | True rank |
---|---|---|---|---|---|---|
X727 | University celebrates its anniversary… | 18 | 16 | 4 | 2 | 1 |
X450 | With love in the heart… | 15 | 14 | 1 | 3 | 2 |
X083 | At work I ran into… | 3 | 12 | 3 | 1 | 3 |
X632 | Identity V and… | 6 | 18 | 5 | 4 | 4 |
X054 | Found that the desk bought online… | 5 | 6 | 2 | 5 | 5 |
X117 | Today's scenery… | 1 | 2 | 6 | 6 | 6 |
X682 | Iced Americano tutorial | 19 | 20 | 7 | 7 | 7 |
X171 | Felt a hint of… | 7 | 8 | 8 | 8 | 8 |
X778 | Woman complains about spending a high… | 10 | 5 | 9 | 9 | 9 |
X827 | Man at his rural home… | 2 | 3 | 12 | 12 | 10 |
X020 | Huolala moving-service advertisement | 4 | 4 | 10 | 11 | 11 |
X214 | Happiness comes from simple… | 8 | 17 | 14 | 13 | 12 |
X703 | People who love to smile have luck… | 12 | 9 | 13 | 14 | 13 |
X744 | If you were given 100 million… | 20 | 15 | 15 | 16 | 14 |
X377 | Love is a subconscious… | 16 | 10 | 16 | 15 | 15 |
X660 | Life is full of hope… | 11 | 13 | 17 | 18 | 16 |
X581 | Intelligent driving meets… | 13 | 7 | 18 | 17 | 17 |
X397 | Food and life… | 9 | 1 | 19 | 10 | 18 |
X429 | Fresh university graduates… | 17 | 19 | 11 | 20 | 19 |
X450 | Video of Landi taking medicine | 14 | 11 | 20 | 19 | 20 |
Tab. 2 Comparison of predicted and true rankings of different models
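The rankings in Tab. 2 can be scored with the Spearman rank correlation, the metric named in the header of Tab. 4. The following is a minimal sketch, not the authors' evaluation code: the rank lists are transcribed from Tab. 2, and the function name `spearman_no_ties` is an illustrative assumption. It uses the closed form for tie-free rankings, which applies here because every column is a permutation of 1 to 20.

```python
# Rankings transcribed from Tab. 2, rows ordered by true rank 1..20.
true_rank = list(range(1, 21))
predicted = {
    "LSTM":    [18, 15, 3, 6, 5, 1, 19, 7, 10, 2, 4, 8, 12, 20, 16, 11, 13, 9, 17, 14],
    "XGBoost": [16, 14, 12, 18, 6, 2, 20, 8, 5, 3, 4, 17, 9, 15, 10, 13, 7, 1, 19, 11],
    "TDR":     [4, 1, 3, 5, 2, 6, 7, 8, 9, 12, 10, 14, 13, 15, 16, 17, 18, 19, 11, 20],
    "BXpre":   [2, 3, 1, 4, 5, 6, 7, 8, 9, 12, 11, 13, 14, 16, 15, 18, 17, 10, 20, 19],
}


def spearman_no_ties(pred, true):
    """Spearman's rho for two tie-free rankings: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(true)
    d2 = sum((p - t) ** 2 for p, t in zip(pred, true))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical helper output: one coefficient per model, printed as a percentage.
rho = {model: spearman_no_ties(ranks, true_rank) for model, ranks in predicted.items()}
for model, value in rho.items():
    print(f"{model}: {100 * value:.2f}%")
```

On this 20-post sample, BXpre comes out at roughly 93.5% and TDR at roughly 92.6%. These figures describe only the sample in Tab. 2 and are not expected to match the per-scale results in Tab. 4, which are computed on other data.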
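User scale | Number of groups | Group topics |
---|---|---|
[0,5 000) | 30 | Entertainment celebrities; sports; news; food; emotion and psychology; travel; fashion and beauty; photography; shopping; workplace; parenting and family; campus; anime and games; science and technology; history and humanities; military; health and wellness; economy and finance; environmental protection; TV dramas |
[5 000,10 000) | 15 | Entertainment and culture; sports; news; food and travel; fashion and lifestyle; photography and art; emotion and mental health; career development; family and parenting; science and technology; economy and finance; environmental protection; history and culture; society and current affairs; leisure and recreation |
[10 000,15 000) | 10 | Entertainment and culture; sports and health; news and current affairs; food and travel; fashion and lifestyle; family and emotion; education and campus; technology and economy; humanities and society; military and security |
[15 000,20 000) | 8 | Entertainment and culture; lifestyle and consumption; sports and health; society and current affairs; humanities and education; emotion and psychology; workplace and economy; technology and innovation |
[20 000,25 000) | 6 | Entertainment and culture; social news; sports and health; economy and workplace; education and family; travel and emotion |
[25 000,30 000) | 5 | Entertainment and culture; society and life; education and knowledge; health and sports; environment and exploration |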
Tab. 3 Grouping under different user scales
Weibo interaction user scale | Spearman/% (LSTM) | Spearman/% (XGBoost) | Spearman/% (TDR) | Spearman/% (BXpre) |
---|---|---|---|---|
[0,5 000) | 87.36 | 80.62 | 92.42 | 94.60 |
[5 000,10 000) | 95.66 | 94.88 | 81.81 | 98.52 |
[10 000,15 000) | 62.19 | 64.80 | 78.15 | 92.32 |
[15 000,20 000) | 50.70 | 47.47 | 74.49 | 92.30 |
[20 000,25 000) | 84.71 | 81.94 | 82.39 | 87.64 |
[25 000,30 000) | 71.14 | 45.99 | 72.32 | 82.53 |
Mixed | 78.17 | 76.08 | 79.58 | 90.88 |
Tab. 4 Prediction results under different user scales
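To see where BXpre gains most, the values in Tab. 4 can be reduced to a margin over the strongest baseline in each user-scale bucket. This is a small illustrative sketch using only the numbers in the table; the names `table4` and `margin_over_best_baseline` are assumptions introduced for the example.

```python
# Spearman coefficients (%) transcribed from Tab. 4, ordered as LSTM, XGBoost, TDR, BXpre.
table4 = {
    "[0, 5000)":      (87.36, 80.62, 92.42, 94.60),
    "[5000, 10000)":  (95.66, 94.88, 81.81, 98.52),
    "[10000, 15000)": (62.19, 64.80, 78.15, 92.32),
    "[15000, 20000)": (50.70, 47.47, 74.49, 92.30),
    "[20000, 25000)": (84.71, 81.94, 82.39, 87.64),
    "[25000, 30000)": (71.14, 45.99, 72.32, 82.53),
    "Mixed":          (78.17, 76.08, 79.58, 90.88),
}


def margin_over_best_baseline(scores):
    """Difference (percentage points) between BXpre and the best of the three baselines."""
    *baselines, bxpre = scores
    return round(bxpre - max(baselines), 2)


margins = {scale: margin_over_best_baseline(s) for scale, s in table4.items()}
widest = max(margins, key=margins.get)

for scale, margin in margins.items():
    print(f"{scale}: +{margin:.2f}")
print("Largest margin:", widest)
```

Read this way, the margin is smallest in the buckets below 10 000 users and in [20 000, 25 000), and largest in [15 000, 20 000), where the best baseline (TDR) trails BXpre by 17.81 percentage points.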
Model architecture | Predicted-ranking correlation under mixed-scale conditions |
---|---|
BXpre(without StructBERT) | 82.30 |
BXpre(without X-means) | 83.92 |
BXpre(without X-means & StructBERT) | 78.46 |
BXpre(BERT) | 88.53 |
BXpre(K-means) | 87.12 |
BXpre | 90.88 |
Tab. 5 Ablation results and performance of alternative algorithm schemes
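The ablation rows in Tab. 5 are easiest to compare as drops relative to the full BXpre model. The sketch below does only that arithmetic on the table values; the variable names are illustrative assumptions, and no claim is made about how the variants were built.

```python
# Correlation scores transcribed from Tab. 5 (mixed-scale condition).
full_model = 90.88
variants = {
    "BXpre(without StructBERT)": 82.30,
    "BXpre(without X-means)": 83.92,
    "BXpre(without X-means & StructBERT)": 78.46,
    "BXpre(BERT)": 88.53,
    "BXpre(K-means)": 87.12,
}

# Drop of each variant relative to the full model, in the table's own units.
drops = {name: round(full_model - score, 2) for name, score in variants.items()}

# List variants from largest to smallest loss.
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: -{drop:.2f}")
```

Removing both components costs the most (12.42), while swapping StructBERT for BERT (2.35) or X-means for K-means (3.76) costs less than removing either component outright.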
Algorithm | Predicted-ranking correlation under mixed-scale conditions |
---|---|
ChatGPT | 81.12 |
ERNIE Bot (文心一言) | 78.93 |
BXpre | 90.88 |
Tab. 6 Prediction performance of large language models