Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (11): 3072-3074. DOI: 10.3724/SP.J.1087.2011.03072

• Database Technology •

Naive Bayesian classification algorithm based on attribute clustering under different classification

PENG Xing-yuan, LIU Qiong-sun

  1. College of Mathematics and Statistics, Chongqing University, Chongqing 401331, China
  • Received: 2011-05-10 Revised: 2011-07-06 Online: 2011-11-16 Published: 2011-11-01
  • Contact: PENG Xing-yuan
  • About the authors: PENG Xing-yuan (born 1985), female, from Suining, Sichuan, master's student; main research interests: data analysis and statistical decision-making. LIU Qiong-sun (born 1956), female, from Chongqing, professor; main research interests: intelligent computing, data mining, and applied statistics.
  • Supported by:
    Fundamental Research Funds for the Central Universities

Abstract: The Naive Bayesian (NB) classification algorithm is simple and effective, but its conditional independence assumption ignores the correlations that exist among attribute variables. To account for the effect of this assumption on classification performance, a new grouping technique that clusters the conditional attributes is proposed. The technique not only avoids the traditional NB assumption that all conditional attributes are mutually independent, but also reflects the different degrees of dependence among conditional attributes under different classes. Simulation experiments on several UCI data sets demonstrate the effectiveness of the new algorithm.

Key words: Naive Bayesian (NB), attribute correlation intensity, clustering algorithm, chi-square (χ²) statistic
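
The abstract and key words indicate that attribute correlation is measured with a chi-square statistic and that the dependence among attributes can differ from class to class. The abstract does not spell out how the statistic drives the clustering, so the following is only a minimal sketch of the first ingredient: a pairwise Pearson chi-square statistic between discrete attributes, computed separately on the samples of each class. The function names, the toy data, and the choice to stop before any grouping step are assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter
from itertools import combinations


def chi_square(xs, ys):
    """Pearson chi-square statistic for two equal-length discrete sequences."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))   # observed joint counts
    row = Counter(xs)              # marginal counts of the first attribute
    col = Counter(ys)              # marginal counts of the second attribute
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def per_class_attribute_association(rows, labels):
    """Return {class: {(i, j): chi-square}} using only the rows of that class.

    Hypothetical helper: a grouping or clustering step would consume these
    per-class association scores, but that step is not shown here.
    """
    result = {}
    for c in sorted(set(labels)):
        subset = [r for r, y in zip(rows, labels) if y == c]
        n_attrs = len(subset[0])
        result[c] = {
            (i, j): chi_square([r[i] for r in subset], [r[j] for r in subset])
            for i, j in combinations(range(n_attrs), 2)
        }
    return result


# Toy data, made up for illustration: three discrete attributes, two classes.
rows = [
    ("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q"),
    ("a", "x", "p"), ("a", "y", "p"), ("b", "x", "q"), ("b", "y", "q"),
]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

assoc = per_class_attribute_association(rows, labels)
for c, pairs in assoc.items():
    print(c, {k: round(v, 3) for k, v in pairs.items()})
```

On this toy data, attributes 0 and 1 are strongly associated within class `pos` but not within class `neg`, while attributes 0 and 2 show the opposite pattern. A single grouping of attributes shared by all classes would miss that difference, which is the motivation the abstract gives for grouping attributes per class.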

CLC Number: