Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (3): 615-619. DOI: 10.11772/j.issn.1001-9081.2017071846

Network public opinion prediction by empirical mode decomposition-autoregression based on extreme gradient boosting model

MO Zan, ZHAO Bing, HUANG Yanying   

  1. School of Management, Guangdong University of Technology, Guangzhou, Guangdong 510520, China
  • Received: 2017-07-31 Revised: 2017-09-18 Online: 2018-03-10 Published: 2018-03-07
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (711710); the "Twelfth Five-Year" National Science and Technology Support Program Major Issues (2011BAD13B11); the Guangdong Provincial Regional Demonstration Project for Marine Economic Innovation and Development (GD2013-D01-001).

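  • Corresponding author: ZHAO Bing
  • About the authors: MO Zan (1962-), male, born in Guangzhou, Guangdong, professor, Ph.D., whose main research interests are e-commerce and management information systems; ZHAO Bing (1993-), female, born in Zhoukou, Henan, master's student, whose main research interests are machine learning and data mining; HUANG Yanying (1991-), female, born in Shaoguan, Guangdong, master's student, whose main research interests are machine learning and data mining.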

Abstract: With the arrival of the big data era, network public opinion data is characterized by massive information volume and wide domain coverage. For such complex data, traditional single models have limited predictive power and cannot effectively predict public opinion trends. To address this problem, an improved combination model based on the Empirical Mode Decomposition-AutoRegression (EMD-AR) model was proposed, called the EMD-ARXG (Empirical Mode Decomposition-AutoRegression based on eXtreme Gradient boosting) model, and applied to the prediction of complex network public opinion trends. In this model, the Empirical Mode Decomposition (EMD) algorithm was employed to decompose the time series, and an AutoRegression (AR) model was then fitted to each decomposed component to establish sub-models. Finally, the sub-models were reconstructed to complete the modelling process. In addition, to reduce the fitting error of the AR model, the residual error was learned by eXtreme Gradient Boosting (XGBoost), and each sub-model was iteratively updated to improve its prediction accuracy. To verify the prediction performance of the EMD-ARXG model, it was compared with a wavelet neural network model and a back propagation neural network model based on EMD. The experimental results show that the EMD-ARXG model is superior to the other two models on three statistical indicators: Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) and Theil Inequality Coefficient (TIC).

Key words: trend fitting, network public opinion prediction, Empirical Mode Decomposition (EMD), AutoRegression (AR), eXtreme Gradient Boosting (XGBoost), residual learning
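
The three indicators named in the abstract can be sketched in a few lines of Python. This is a minimal illustration using the commonly cited textbook definitions; the abstract does not give the exact formulas used in the paper, so the forms below (in particular MAPE returned as a fraction rather than a percentage, and the TIC denominator) are assumptions, and the function names and sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: sqrt of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned as a fraction.

    Multiply by 100 for a percentage. Undefined if any actual value is 0.
    """
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def tic(actual, predicted):
    """Theil Inequality Coefficient (assumed form):
    RMSE / (sqrt(mean(actual^2)) + sqrt(mean(predicted^2))).

    Ranges from 0 (perfect fit) to 1; undefined if both series are all zeros.
    """
    n = len(actual)
    root_mean_sq_actual = math.sqrt(sum(a * a for a in actual) / n)
    root_mean_sq_predicted = math.sqrt(sum(p * p for p in predicted) / n)
    return rmse(actual, predicted) / (root_mean_sq_actual + root_mean_sq_predicted)


# Hypothetical values for illustration only; not data from the paper.
observed = [120.0, 135.0, 150.0, 142.0]
forecast = [118.0, 138.0, 147.0, 145.0]
scores = {
    "RMSE": rmse(observed, forecast),
    "MAPE": mape(observed, forecast),
    "TIC": tic(observed, forecast),
}
```

For all three indicators a lower value indicates a closer fit, which is the sense in which one model is reported as superior to another.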
