Multi-source text topic mining model based on Dirichlet multinomial allocation model
XU Liyang, HUANG Ruizhang, CHEN Yanping, QIAN Zhisen, LI Wanying
Journal of Computer Applications
2018, 38 (11): 3094-3099.
DOI: 10.11772/j.issn.1001-9081.2018041359
With the rapid growth in the number of text data sources, topic mining for multi-source text data has become a research focus in text mining. Traditional topic models are mainly designed for single-source data, so applying them directly to multi-source data has many limitations. To address this, a multi-source topic model based on the Dirichlet Multinomial Allocation (DMA) model, named MSDMA (Multi-Source Dirichlet Multinomial Allocation), was proposed; it takes into account both the differences between sources in the word distributions of topics and the nonparametric clustering property of DMA. The main contributions of the proposed model are as follows: 1) it takes the characteristics of each source into account when modeling topics, and can learn source-specific word distributions for each topic; 2) it can improve topic discovery performance on sources with high noise and low information content through knowledge sharing; 3) it can automatically learn the number of topics within each source, without requiring that number to be specified in advance. Experimental results on a simulated dataset and two real datasets indicate that the proposed model extracts topic information more effectively and efficiently than state-of-the-art topic models.
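The third contribution rests on a general property of DMA-style priors: when the candidate topics share a symmetric Dirichlet prior with total mass alpha, only some of them end up occupied, so the effective number of topics is determined by the data rather than fixed by hand. The following is a minimal sketch of that prior alone, not of the authors' MSDMA model or its inference procedure. The function name `sample_dma_assignments` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def sample_dma_assignments(num_docs, max_topics, alpha, seed=0):
    """Draw topic assignments from a collapsed symmetric Dirichlet-multinomial prior.

    Illustrative only: this samples from the prior (a Polya urn scheme) and
    ignores document words, sources, and everything specific to MSDMA.
    """
    rng = random.Random(seed)
    counts = [0] * max_topics  # number of documents assigned to each candidate topic
    assignments = []
    for _ in range(num_docs):
        # Predictive weight of topic k: its current size plus prior mass alpha / K.
        weights = [n_k + alpha / max_topics for n_k in counts]
        k = rng.choices(range(max_topics), weights=weights)[0]
        counts[k] += 1
        assignments.append(k)
    return assignments, counts


# Assumed toy settings: 200 documents, 50 candidate topics, alpha = 1.0.
assignments, counts = sample_dma_assignments(num_docs=200, max_topics=50, alpha=1.0)
occupied = sum(1 for n_k in counts if n_k > 0)
print(f"{occupied} of {len(counts)} candidate topics are occupied")
```

In this sketch, larger values of alpha tend to spread documents over more topics, while smaller values concentrate them in a few; a full model would combine these prior weights with the likelihood of each document's words.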