Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (11): 3268-3272. DOI: 10.11772/j.issn.1001-9081.2014.11.3268

• Artificial Intelligence •

Classification method for interval uncertain data based on improved naive Bayes

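LI Wenjin1, XIONG Xiaofeng1, MAO Yimin2

  1. School of Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000
  2. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000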
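  • Received: 2014-06-05 Revised: 2014-06-30 Online: 2014-11-01 Published: 2014-12-01
  • Corresponding author: LI Wenjin
  • About the authors: LI Wenjin (born 1988), male, from Ganyu, Jiangsu, master's student; main research interest: data mining. XIONG Xiaofeng (born 1965), male, from Ganzhou, Jiangxi, professor; main research interests: modeling and application software, applied mathematical statistics. MAO Yimin (born 1970), female, from Yining, Xinjiang, associate professor, Ph.D.; main research interest: data mining.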
  • Supported by: National Natural Science Foundation of China; Natural Science Foundation of Jiangxi Province

Classification method for interval uncertain data based on improved naive Bayes

LI Wenjin1, XIONG Xiaofeng1, MAO Yimin2

  1. School of Science, Jiangxi University of Science and Technology, Ganzhou Jiangxi 341000, China;
  2. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou Jiangxi 341000, China
  • Received: 2014-06-05 Revised: 2014-06-30 Online: 2014-11-01 Published: 2014-12-01
  • Contact: LI Wenjin

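Abstract (translated from the Chinese):

Naive Bayes based on the Parzen window has two shortcomings when classifying interval uncertain data: high computational complexity and large space requirements. To address this problem, an improved classification method for interval uncertain data, IU-PNBC, is proposed. First, the Parzen window is used to estimate the class-conditional probability density function (CCPDF) of the interval samples. Next, an approximating function for the CCPDF is obtained by algebraic interpolation. Finally, the approximating interpolation function is used to compute the posterior probability of a sample, which is then used for prediction. Artificially generated simulation data and UCI standard datasets were used to verify the rationality of the algorithm's assumptions and the effect of the number of interpolation points on the classification accuracy of IU-PNBC. The experimental results show that when the number of interpolation points exceeds 15, the classification accuracy of IU-PNBC tends to stabilize, and that the more interpolation points are used, the higher the accuracy. The algorithm avoids the dependence of the original Parzen window estimate on the training samples and effectively reduces computational complexity. Because its running time and space requirements are far lower than those of naive Bayes based on the Parzen window, it is well suited to classifying large volumes of interval uncertain data.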

Abstract:

Naive Bayes (NB) based on Parzen Window Estimation (PWE) suffers from high computational complexity and large storage requirements when classifying interval uncertain data. To address this problem, an improved classification method for interval uncertain data, named IU-PNBC, was proposed. Firstly, the Class-Conditional Probability Density Function (CCPDF) of the interval samples was estimated by PWE. Secondly, an approximating function for the CCPDF was obtained by algebraic interpolation. Finally, the posterior probability of each sample was computed from the interpolation function and used for classification. Artificially generated simulation data and UCI standard datasets were used to verify the rationality of the algorithm's assumptions and to examine the effect of the number of interpolation points on the classification accuracy of IU-PNBC. The experimental results show that when more than 15 interpolation points are used, the accuracy of IU-PNBC tends to stabilize, and that accuracy increases with the number of interpolation points. IU-PNBC avoids the dependence of the original PWE on the training samples and effectively reduces the computational complexity. Because its running time and storage requirements are far lower than those of PWE-based NB, IU-PNBC is suitable for classifying large volumes of interval uncertain data.
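The three steps in the abstract (Parzen-window estimate of the CCPDF, interpolation of that estimate, posterior computation from the interpolant) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions that do not come from the abstract: each interval sample is treated as uniform over its interval, the kernel is Gaussian with a hand-picked bandwidth `h`, piecewise-linear interpolation via `np.interp` stands in for the paper's algebraic interpolation, and a test interval is scored by averaging the interpolated density over a few points inside it. The names `parzen_interval_density` and `InterpolatedIntervalNB` are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without requiring SciPy


def _norm_cdf(z):
    """Standard normal CDF, applied element-wise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def parzen_interval_density(x, lo, hi, h):
    """Gaussian Parzen estimate at points x from 1-D interval samples.

    Assumption (not from the paper): sample i is uniform on [lo_i, hi_i]
    with hi_i > lo_i, so its contribution is the Gaussian kernel averaged
    over the interval, which has a closed form in the normal CDF.
    """
    x = np.asarray(x, dtype=float)[:, None]                    # shape (m, 1)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    mass = _norm_cdf((x - lo) / h) - _norm_cdf((x - hi) / h)   # shape (m, n)
    return (mass / (hi - lo)).mean(axis=1)


class InterpolatedIntervalNB:
    """Naive Bayes whose per-class, per-attribute densities are tabulated
    once at n_nodes points and then replaced by an interpolant."""

    def __init__(self, n_nodes=20, h=0.5):
        self.n_nodes = n_nodes
        self.h = h

    def fit(self, lo, hi, y):
        lo, hi, y = np.asarray(lo, float), np.asarray(hi, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        pad = 3.0 * self.h  # extend the node range so the tails are covered
        self.nodes_ = [
            np.linspace(lo[:, j].min() - pad, hi[:, j].max() + pad, self.n_nodes)
            for j in range(lo.shape[1])
        ]
        # The expensive Parzen sums are evaluated only at the nodes.
        self.tables_ = [
            [parzen_interval_density(self.nodes_[j], lo[y == c, j],
                                     hi[y == c, j], self.h)
             for j in range(lo.shape[1])]
            for c in self.classes_
        ]
        return self

    def predict(self, lo, hi, n_eval=5):
        lo, hi = np.asarray(lo, float), np.asarray(hi, float)
        scores = np.tile(self.log_prior_, (lo.shape[0], 1)).astype(float)
        for k in range(len(self.classes_)):
            for j in range(lo.shape[1]):
                # n_eval points inside each test interval, shape (n_eval, n)
                pts = np.linspace(lo[:, j], hi[:, j], n_eval)
                dens = np.interp(pts.ravel(), self.nodes_[j],
                                 self.tables_[k][j], left=0.0, right=0.0)
                dens = dens.reshape(pts.shape).mean(axis=0)
                scores[:, k] += np.log(dens + 1e-300)  # guard against log(0)
        return self.classes_[scores.argmax(axis=1)]
```

After `fit`, prediction touches only the stored node tables, not the training intervals, which is the kind of saving in time and storage the abstract describes. Here `lo` and `hi` are arrays of shape `(n_samples, n_attributes)` holding the interval endpoints.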

CLC Number: