Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (01): 223-227. DOI: 10.3724/SP.J.1087.2012.00223

• Artificial Intelligence •

Learning Naive Bayes Parameters Gradually on a Series of Contracting Spaces

OUYANG Ze-hua, GUO Hua-ping, FAN Ming

  1. School of Information Engineering, Zhengzhou University, Zhengzhou Henan 450052, China
  • Received: 2011-06-21  Revised: 2011-08-14  Online: 2012-02-06  Published: 2012-01-01
  • Corresponding author: OUYANG Ze-hua
  • About the authors: OUYANG Ze-hua (1987-), male, born in Shangqiu, Henan, M.S. candidate, whose research interests include data mining and machine learning; GUO Hua-ping (1982-), male, born in Xinyang, Henan, Ph.D. candidate, whose research interests include data mining and machine learning; FAN Ming (1948-), male, born in Xinyang, Henan, professor, Ph.D. supervisor and senior member of CCF, whose research interests include databases, data mining and machine learning.
  • Supported by:

    National Natural Science Foundation of China (60901078)

Abstract: Locally Weighted Naive Bayes (LWNB) is a good improvement of Naive Bayes (NB), and Discriminative Frequency Estimate (DFE) remarkably improves the generalization accuracy of NB. Inspired by LWNB and DFE, this paper proposed a Gradually Contracting Spaces (GCS) algorithm to learn the parameters of NB. Given a test instance, GCS found a series of contracting subspaces of the global space, which contains all training instances. These subspaces had two properties: 1) every subspace contained the test instance; 2) every subspace was contained in any larger one. GCS used the training instances in these contracting subspaces to gradually learn the parameters of NB with a modified version of DFE (MDFE), and then used the learned NB to classify the test instance. The essential difference between GCS and LWNB is that GCS trained NB with all the training data and could be implemented as an eager (non-lazy) version. A decision tree version of GCS, named GCS-T, was implemented in this paper. The experimental results show that GCS-T achieves higher generalization accuracy than C4.5 and several Bayesian classification algorithms such as Naive Bayes, BayesNet, NBTree, LWNB and Hidden Naive Bayes (HNB), and that GCS-T classifies remarkably faster than LWNB.
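
The abstract describes GCS only at a high level. The Python sketch below illustrates one plausible reading of the idea; it is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: attributes are discrete and integer-encoded; the contracting subspaces are supplied as an ordered list from the global space down to the smallest subspace containing the test instance (in GCS-T these would correspond to the nodes on the test instance's decision-tree path); and the update adds 1 - P(true class | x) to the counts as a stand-in for the paper's MDFE. All names (SketchGCSNaiveBayes, fit_on_contracting_spaces, etc.) are hypothetical.

import numpy as np

class SketchGCSNaiveBayes:
    """Naive Bayes whose counts are learned gradually on a series of
    contracting subspaces with a DFE-style discriminative update.
    Illustrative sketch only; names and details are assumptions."""

    def __init__(self, n_classes, n_values_per_attr, laplace=1.0):
        self.n_classes = n_classes
        self.n_values = list(n_values_per_attr)   # number of values of each discrete attribute
        self.laplace = laplace
        self.class_count = np.zeros(n_classes)
        # one conditional count table per attribute: shape (n_classes, n_values[j])
        self.cond_count = [np.zeros((n_classes, v)) for v in self.n_values]

    def _posterior(self, x):
        # P(c | x) under the current counts, with Laplace smoothing
        log_p = np.log(self.class_count + self.laplace)
        for j, v in enumerate(x):
            num = self.cond_count[j][:, v] + self.laplace
            den = self.cond_count[j].sum(axis=1) + self.laplace * self.n_values[j]
            log_p += np.log(num) - np.log(den)
        p = np.exp(log_p - log_p.max())
        return p / p.sum()

    def _dfe_update(self, x, y):
        # discriminative frequency update: add 1 - P(true class | x) to the counts
        w = 1.0 - self._posterior(x)[y]
        self.class_count[y] += w
        for j, v in enumerate(x):
            self.cond_count[j][y, v] += w

    def fit_on_contracting_spaces(self, subspaces, n_passes=1):
        # subspaces: list of (X, y) pairs ordered from the global space (all
        # training instances) down to the smallest subspace containing the
        # test instance; instances of smaller subspaces are thus seen again
        for X, y in subspaces:
            for _ in range(n_passes):
                for xi, yi in zip(X, y):
                    self._dfe_update(xi, yi)
        return self

    def predict(self, x):
        # classify with the gradually learned Naive Bayes
        return int(np.argmax(self._posterior(x)))

Under these assumptions, classifying one test instance would amount to collecting the subspace list (e.g. from a trained decision tree's root-to-leaf path), calling fit_on_contracting_spaces on it, and then calling predict on the test instance.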

Key words: Naive Bayes (NB), local model, global model, decision tree, NBTree

CLC Number: