An entropy-based algorithm for discretization of continuous variables (一种基于熵的连续属性离散化算法)

doi:10.3724/SP.J.1087.2005.0637

Journal of Computer Applications (计算机应用), 2005, Vol. 25, Issue (03): 637-638.

HE Yue (贺跃)¹, ZHENG Jian-jun (郑建军)², ZHU Lei (朱蕾)¹

1. School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China; 2. School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China

Online: 2005-03-01 Published: 2005-03-01

Abstract

The key to discretizing continuous variables is choosing a reasonable number and placement of split points. To improve the efficiency of unsupervised discretization, an entropy-based method for discretizing continuous variables is proposed. The method exploits the properties of the information content (entropy) of a continuous variable: it partitions the variable using only the variable's own values, minimizing both the loss of entropy and the number of intervals, and seeks the best balance between entropy loss and a moderate number of intervals in order to obtain an optimized discretization. Experiments show that the algorithm is effective.

Key words: entropy, continuous variable, discretization, classification
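The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes a greedy bottom-up scheme: start with one interval per distinct value, repeatedly merge the adjacent pair whose union loses the least entropy, and keep the partition that best trades entropy against interval count. The balance rule used here (the height of the entropy curve above the straight line joining the coarsest and finest partitions) and the names `entropy`, `merge_loss`, and `discretize` are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of the interval frequencies."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def merge_loss(a, b, total):
    """Entropy lost when two intervals holding a and b samples are merged."""
    return -(a * math.log2(a / (a + b)) + b * math.log2(b / (a + b))) / total


def discretize(values):
    """Return cut points c such that x <= c falls in the left interval."""
    freq = sorted(Counter(values).items())   # start: one interval per distinct value
    edges = [v for v, _ in freq]             # upper edge of each interval
    counts = [c for _, c in freq]
    total, k_max, h_max = sum(counts), len(counts), entropy(counts)
    if k_max < 3:                            # nothing to trade off
        return edges[:-1]

    best_score, best_cuts = 0.0, edges[:-1]
    while len(counts) > 1:
        # Merge the adjacent pair whose union costs the least entropy.
        i = min(range(len(counts) - 1),
                key=lambda j: merge_loss(counts[j], counts[j + 1], total))
        counts[i:i + 2] = [counts[i] + counts[i + 1]]
        del edges[i]                         # the merged interval keeps the right edge
        # Hypothetical balance rule (an assumption, not from the paper): height
        # of the entropy curve above the chord from (1 interval, 0 bits) to
        # (k_max intervals, h_max bits).
        k = len(counts)
        score = entropy(counts) - h_max * (k - 1) / (k_max - 1)
        if score > best_score:
            best_score, best_cuts = score, edges[:-1]
    return best_cuts


cuts = discretize([1, 2, 3, 4, 5, 6, 7, 8])
print(cuts)
```

Because the sketch uses only the frequencies of the variable's own values and no class labels, it stays unsupervised, in line with the abstract. A different balance criterion can be substituted by changing the `score` line.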

CLC Number: TP301.6

HE Yue, ZHENG Jian-jun, ZHU Lei. An entropy-based algorithm for discretization of continuous variables[J]. Journal of Computer Applications, 2005, 25(03): 637-638.

