Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (6): 1675-1681. DOI: 10.11772/j.issn.1001-9081.2017112786

Variance reduced stochastic variational inference algorithm for topic modeling of large-scale data

LIU Zhanghu, CHENG Chunling   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210003, China
  • Received: 2017-11-27 Revised: 2018-02-02 Online: 2018-06-10 Published: 2018-06-13

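  • Corresponding author: LIU Zhanghu
  • About the authors: LIU Zhanghu (born 1992), male, from Yancheng, Jiangsu, M.S. candidate; main research interests: machine learning and natural language processing. CHENG Chunling (born 1972), female, from Xi'an, Shaanxi, professor, Ph.D., CCF member; main research interests: data management, and resource management and optimization in cloud computing.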

Abstract: Stochastic Variational Inference (SVI) has been successfully applied to many types of models, including topic models. SVI scales to large data sets by mapping the inference problem to an optimization problem involving stochastic gradients, but the inherent noise of those gradients produces large variance, which hinders fast convergence. To address this problem, an improved algorithm, Variance Reduced SVI (VR-SVI), was proposed. Firstly, a sliding window method was used to recalculate the noise term in the stochastic gradient and construct a new stochastic gradient, reducing the influence of noise on the gradient. Then, it was proved that the proposed algorithm reduces the variance of the stochastic gradient relative to SVI. Finally, the influence of the window size on the algorithm was discussed, and the convergence of the algorithm was analyzed. The experimental results show that the proposed VR-SVI algorithm not only reduces the variance of the stochastic gradient but also saves computation time and achieves fast convergence.

Key words: Stochastic Variational Inference (SVI), sliding window, stochastic gradient, variance reduction, topic modeling

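The abstract's central idea, that averaging over a sliding window lowers the variance of a stochastic gradient, can be illustrated with a minimal sketch. This is not the paper's VR-SVI update, whose exact construction is not given here. It is a generic toy, and it assumes a fixed true gradient with additive Gaussian noise. The names `noisy_gradient`, `windowed_gradients`, and `window_size` are hypothetical.

```python
import numpy as np
from collections import deque


def noisy_gradient(rng, true_grad, noise_std):
    """Toy stand-in for a minibatch gradient: true gradient plus zero-mean noise."""
    return true_grad + rng.normal(0.0, noise_std, size=true_grad.shape)


def windowed_gradients(grads, window_size):
    """Replace each gradient by the mean of the most recent `window_size` gradients."""
    window = deque(maxlen=window_size)  # oldest entry is dropped automatically
    smoothed = []
    for g in grads:
        window.append(g)
        smoothed.append(np.mean(list(window), axis=0))
    return np.array(smoothed)


rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])  # assumed constant for this illustration
window_size = 10

raw = np.array([noisy_gradient(rng, true_grad, 1.0) for _ in range(2000)])
smooth = windowed_gradients(raw, window_size)

# Total variance across coordinates; skip the warm-up steps where the window is not full.
raw_var = raw.var(axis=0).sum()
smooth_var = smooth[window_size:].var(axis=0).sum()
print(f"raw variance: {raw_var:.3f}, windowed variance: {smooth_var:.3f}")
```

In this toy setting the windowed estimate stays unbiased because the true gradient never changes. In an actual SVI run the gradient drifts between iterations, so a larger window trades lower variance for more bias, which is presumably why the abstract discusses the choice of window size.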
