Classification method for interval uncertain data based on improved naive Bayes

Journal of Computer Applications, 2014, Vol. 34, Issue (11): 3268-3272. DOI: 10.11772/j.issn.1001-9081.2014.11.3268

LI Wenjin¹, XIONG Xiaofeng¹, MAO Yimin²

1. School of Science, Jiangxi University of Science and Technology, Ganzhou Jiangxi 341000, China
2. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou Jiangxi 341000, China

Received: 2014-06-05 Revised: 2014-06-30 Online: 2014-11-01 Published: 2014-12-01
Corresponding author: LI Wenjin
About the authors: LI Wenjin (born 1988), male, from Ganyu, Jiangsu, master's student; main research interest: data mining. XIONG Xiaofeng (born 1965), male, from Ganzhou, Jiangxi, professor; main research interests: modeling and application software, applied mathematical statistics. MAO Yimin (born 1970), female, from Yining, Xinjiang, associate professor, Ph.D.; main research interest: data mining.
Supported by: a project of the National Natural Science Foundation of China; a project of the Natural Science Foundation of Jiangxi Province

Abstract:

Naive Bayes (NB) classification based on Parzen Window Estimation (PWE) suffers from high computational complexity and large storage requirements when applied to interval uncertain data. To address this problem, an improved classification method for interval uncertain data, named IU-PNBC, was proposed. First, the Class-Conditional Probability Density Function (CCPDF) of the interval samples was estimated using PWE. Second, an approximating function for the CCPDF was obtained by algebraic interpolation. Finally, the posterior probability of each sample was computed from the interpolated approximation and used for prediction. Artificially generated simulation data and UCI standard datasets were used to verify the reasonableness of the algorithm's assumptions and to examine the effect of the number of interpolation points on the classification accuracy of IU-PNBC. The experimental results show that once the number of interpolation points exceeds 15, the accuracy of IU-PNBC tends to be stable, and accuracy is higher with more interpolation points; that IU-PNBC removes the original Parzen window estimate's dependence on the training samples and effectively reduces computational complexity; and that, because its running time and storage requirement are far lower than those of PWE-based NB, IU-PNBC is suited to classification problems on large interval uncertain datasets.
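The three steps named in the abstract (Parzen window estimate, interpolation, posterior) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how an interval is modelled, which kernel is used, or which algebraic interpolation scheme is applied, so the sketch assumes each interval `[lo, hi]` is uniformly distributed, uses a Gaussian kernel with a fixed bandwidth `h`, and swaps in piecewise-linear interpolation via `numpy.interp`. The function names `parzen_interval_density`, `fit`, and `predict`, and the parameters `m`, `h`, and `k`, are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def _norm_cdf(z):
    # Standard normal CDF, applied element-wise.
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def parzen_interval_density(x, lo, hi, h):
    """Gaussian Parzen estimate at points x.

    Assumption: every training interval [lo_i, hi_i] (with hi_i > lo_i) is a
    uniform distribution, so its kernel-smoothed density has a closed form.
    """
    x = np.asarray(x, dtype=float)[:, None]
    mass = _norm_cdf((x - lo) / h) - _norm_cdf((x - hi) / h)
    return (mass / (hi - lo)).mean(axis=1)


def fit(X_lo, X_hi, y, m=20, h=0.5):
    """Tabulate the class-conditional density of each attribute at m nodes."""
    model = {}
    for c in np.unique(y):
        rows = y == c
        tables = []
        for j in range(X_lo.shape[1]):
            lo, hi = X_lo[rows, j], X_hi[rows, j]
            nodes = np.linspace(lo.min() - 3 * h, hi.max() + 3 * h, m)
            tables.append((nodes, parzen_interval_density(nodes, lo, hi, h)))
        model[c] = (rows.mean(), tables)  # (class prior, per-attribute tables)
    return model


def predict(model, x_lo, x_hi, k=5):
    """Naive Bayes decision for one interval sample, using only the tables."""
    best, best_score = None, -math.inf
    for c, (prior, tables) in model.items():
        score = math.log(prior)
        for j, (nodes, vals) in enumerate(tables):
            # Average the interpolated density over k points of the test interval.
            pts = np.linspace(x_lo[j], x_hi[j], k)
            lik = np.interp(pts, nodes, vals, left=0.0, right=0.0).mean()
            score += math.log(lik + 1e-300)  # guard against log(0)
        if score > best_score:
            best, best_score = c, score
    return best
```

After `fit`, the training intervals are no longer needed: `predict` touches only the `m` tabulated nodes per class and attribute, so its cost depends on `m` rather than on the number of training samples. That is the kind of saving the abstract attributes to IU-PNBC, although the exact figures reported there apply to the authors' method, not to this sketch.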

CLC number: TP18

Cite this article: LI Wenjin, XIONG Xiaofeng, MAO Yimin. Classification method for interval uncertain data based on improved naive Bayes[J]. Journal of Computer Applications, 2014, 34(11): 3268-3272.
