Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (01): 223-227. DOI: 10.3724/SP.J.1087.2012.00223

• Artificial Intelligence •

Learning Naive Bayes Parameters Gradually on a Series of Contracting Spaces

OUYANG Ze-hua, GUO Hua-ping, FAN Ming

  1. School of Information Engineering, Zhengzhou University, Zhengzhou Henan 450052, China
  • Received: 2011-06-21  Revised: 2011-08-14  Online: 2012-02-06  Published: 2012-01-01
  • Corresponding author: OUYANG Ze-hua
  • About the authors: OUYANG Ze-hua (1987-), male, born in Shangqiu, Henan, M.S. candidate, whose research interests include data mining and machine learning; GUO Hua-ping (1982-), male, born in Xinyang, Henan, Ph.D. candidate, whose research interests include data mining and machine learning; FAN Ming (1948-), male, born in Xinyang, Henan, professor, Ph.D. supervisor and senior member of CCF, whose research interests include databases, data mining and machine learning.
  • Supported by:

    National Natural Science Foundation of China (60901078)

Abstract: Locally Weighted Naive Bayes (LWNB) is a good improvement of Naive Bayes (NB), and Discriminative Frequency Estimate (DFE) remarkably improves the generalization accuracy of NB. Inspired by LWNB and DFE, this paper proposed a Gradually Contracting Spaces (GCS) algorithm to learn the parameters of NB. Given a test instance, GCS found a series of contracting subspaces of the global space, which contains all training instances. These subspaces had two properties: 1) every subspace contained the test instance; 2) every subspace was contained in any larger one. GCS used the training instances in these contracting subspaces to gradually learn the parameters of NB with a modified version of DFE (MDFE), and then used the learned NB to classify the test instance. The essential difference between GCS and LWNB is that GCS trained NB with all the training data and could be implemented as an eager (non-lazy) version. A decision tree version of GCS, named GCS-T, was implemented in this paper. The experimental results show that GCS-T achieves higher generalization accuracy than C4.5 and several Bayesian classification algorithms such as Naive Bayes, BayesNet, NBTree, LWNB and Hidden Naive Bayes (HNB), and that GCS-T classifies remarkably faster than LWNB.
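
The abstract describes GCS only at a high level. The Python sketch below illustrates one plausible reading of the idea; it is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: attributes are discrete and integer-encoded; the contracting subspaces are supplied as an ordered list from the global space down to the smallest subspace containing the test instance (in GCS-T these would correspond to the nodes on the test instance's decision-tree path); and the update adds 1 - P(true class | x) to the counts as a stand-in for the paper's MDFE. All names (SketchGCSNaiveBayes, fit_on_contracting_spaces, etc.) are hypothetical.

import numpy as np

class SketchGCSNaiveBayes:
    """Naive Bayes whose counts are learned gradually on a series of
    contracting subspaces with a DFE-style discriminative update.
    Illustrative sketch only; names and details are assumptions."""

    def __init__(self, n_classes, n_values_per_attr, laplace=1.0):
        self.n_classes = n_classes
        self.n_values = list(n_values_per_attr)   # number of values of each discrete attribute
        self.laplace = laplace
        self.class_count = np.zeros(n_classes)
        # one conditional count table per attribute: shape (n_classes, n_values[j])
        self.cond_count = [np.zeros((n_classes, v)) for v in self.n_values]

    def _posterior(self, x):
        # P(c | x) under the current counts, with Laplace smoothing
        log_p = np.log(self.class_count + self.laplace)
        for j, v in enumerate(x):
            num = self.cond_count[j][:, v] + self.laplace
            den = self.cond_count[j].sum(axis=1) + self.laplace * self.n_values[j]
            log_p += np.log(num) - np.log(den)
        p = np.exp(log_p - log_p.max())
        return p / p.sum()

    def _dfe_update(self, x, y):
        # discriminative frequency update: add 1 - P(true class | x) to the counts
        w = 1.0 - self._posterior(x)[y]
        self.class_count[y] += w
        for j, v in enumerate(x):
            self.cond_count[j][y, v] += w

    def fit_on_contracting_spaces(self, subspaces, n_passes=1):
        # subspaces: list of (X, y) pairs ordered from the global space (all
        # training instances) down to the smallest subspace containing the
        # test instance; instances of smaller subspaces are thus seen again
        for X, y in subspaces:
            for _ in range(n_passes):
                for xi, yi in zip(X, y):
                    self._dfe_update(xi, yi)
        return self

    def predict(self, x):
        # classify with the gradually learned Naive Bayes
        return int(np.argmax(self._posterior(x)))

Under these assumptions, classifying one test instance would amount to collecting the subspace list (e.g. from a trained decision tree's root-to-leaf path), calling fit_on_contracting_spaces on it, and then calling predict on the test instance.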

Key words: Naive Bayes (NB), local model, global model, decision tree, NBTree

CLC Number: