Text classification with incomplete data based on Bernoulli mixture model

Journal of Computer Applications


CAI Chong-Chao, WANG Shi-Tong

Received: 2006-11-28 Revised: 2007-01-21 Online: 2007-05-01 Published: 2007-05-01
Contact: CAI Chong-Chao

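Affiliations: School of Information Engineering, Jiangnan University, Wuxi, Jiangsu Province (master's student, 2005 cohort); School of Information Engineering, Jiangnan University, Wuxi, Jiangsu Province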

Abstract: Constructing text classifiers from incomplete data is an important problem. An improved method based on the Bernoulli mixture model and the Expectation Maximization (EM) algorithm is presented. First, initial estimates of the likelihood function parameters are obtained from the labeled data using the Bernoulli mixture model and the naive Bayes algorithm. Then, a weighted EM algorithm is used to estimate the parameters of the classifier's prior probability model, which yields the final classifier. Experimental results show that the method outperforms naive Bayes text classification in both precision and recall.

Key words: incomplete data, text classification, naive Bayes classification, Bernoulli mixture model, EM algorithm
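
The two-stage procedure summarized in the abstract can be illustrated with a minimal sketch. It assumes a multivariate Bernoulli naive Bayes model with Laplace smoothing and a single weight `lam` that discounts the contribution of unlabeled documents during EM. The paper's actual weighting scheme and update rules are not given on this page, so the function names, the parameter `lam`, and the toy data below are all illustrative assumptions rather than the authors' implementation.

```python
import math


def m_step(docs, resp, weights, n_classes, n_words):
    """Weighted, Laplace-smoothed estimates of priors and word probabilities."""
    class_mass = [0.0] * n_classes
    word_mass = [[0.0] * n_words for _ in range(n_classes)]
    for doc, r, w in zip(docs, resp, weights):
        for c in range(n_classes):
            class_mass[c] += w * r[c]
            for t in doc:  # doc is the set of word indices present
                word_mass[c][t] += w * r[c]
    total = sum(class_mass)
    priors = [(1.0 + class_mass[c]) / (n_classes + total)
              for c in range(n_classes)]
    theta = [[(1.0 + word_mass[c][t]) / (2.0 + class_mass[c])
              for t in range(n_words)] for c in range(n_classes)]
    return priors, theta


def posterior(doc, priors, theta):
    """E-step for one document: P(class | doc) under the Bernoulli model."""
    logs = []
    for c, prior in enumerate(priors):
        lp = math.log(prior)
        for t, p in enumerate(theta[c]):
            lp += math.log(p if t in doc else 1.0 - p)
        logs.append(lp)
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]


def fit(labeled, labels, unlabeled, n_classes, n_words, lam=0.5, n_iter=20):
    """Initialize from labeled data, then refine with weighted EM."""
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Stage 1: initial parameters from the labeled documents only.
    priors, theta = m_step(labeled, hard, [1.0] * len(labeled),
                           n_classes, n_words)
    docs = labeled + unlabeled
    # Assumption: unlabeled documents are down-weighted by lam.
    weights = [1.0] * len(labeled) + [lam] * len(unlabeled)
    # Stage 2: alternate E-steps on unlabeled data with weighted M-steps.
    for _ in range(n_iter):
        soft = [posterior(d, priors, theta) for d in unlabeled]
        priors, theta = m_step(docs, hard + soft, weights, n_classes, n_words)
    return priors, theta


def predict(doc, priors, theta):
    """Return the index of the most probable class for a document."""
    post = posterior(doc, priors, theta)
    return max(range(len(post)), key=post.__getitem__)


# Toy data (hypothetical): words 0-1 suggest class 0, words 2-3 suggest class 1.
labeled = [{0, 1}, {2, 3}]
labels = [0, 1]
unlabeled = [{0}, {1}, {0, 1}, {2}, {3}, {2, 3}]
priors, theta = fit(labeled, labels, unlabeled, n_classes=2, n_words=4)
print(predict({0}, priors, theta), predict({3}, priors, theta))
```

Setting `lam` to 0 reduces the sketch to a plain naive Bayes classifier trained on the labeled documents only, while `lam` equal to 1 treats unlabeled documents on a par with labeled ones; intermediate values trade off the two sources of evidence.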


CAI Chong-Chao, WANG Shi-Tong. Text classification with incomplete data based on Bernoulli mixture model[J]. Journal of Computer Applications.

