Semi-supervised ordinal regression algorithm integrating the nearest neighbor rule

Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (4): 1022-1025.

He Haijiang (何海江)¹, He Wende (何文德)², Liu Huafu (刘华富)²

1. Changsha University
2.

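Received: 2009-09-17 Revised: 2009-11-16 Online: 2010-04-15 Published: 2010-04-01
Corresponding author: He Haijiang (何海江)
Supported by:
Scientific Research Fund of Hunan Provincial Education Department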

Towards semi-supervised ordinal regression with nearest neighbor

Abstract

Abstract: Supervised ordinal regression algorithms usually require a large number of labeled samples. In practice, however, labeling samples with ordinal ranks is time-consuming and labor-intensive, and sometimes infeasible. To address this, a semi-supervised ordinal regression algorithm integrating the nearest neighbor rule was proposed, which learns from both labeled and unlabeled examples. For each labeled example, the method first selects from the unlabeled dataset the several instances most similar to it and assigns them the same rank; the nearest neighbor rule is used to score the similarity of two instances. A supervised ordinal regression algorithm then trains the ranking model on both the labeled and the newly labeled examples. Experimental results on several datasets show that the method yields statistically significant improvements in ranking measures. In addition, a discount factor λ was introduced to assess the credibility of the newly labeled examples, and the effects of λ and of the labeled dataset size on the method's performance were discussed.

Key words: semi-supervised ordinal regression, nearest neighbor, unlabeled sample, discount factor
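
The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pseudo_label_by_nearest_neighbors`, the Euclidean distance, the parameter `k` (neighbors taken per labeled example), the rule for resolving an unlabeled instance claimed by several labeled examples, and the use of λ as a per-sample weight (`discount`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def pseudo_label_by_nearest_neighbors(X_lab, y_lab, X_unl, k=2, discount=0.5):
    """Give each labeled sample's k nearest unlabeled samples its ordinal rank.

    Returns augmented features, ranks, and per-sample weights
    (1.0 for original labels, `discount` for newly assigned ones).
    """
    X_lab = np.asarray(X_lab, dtype=float)
    y_lab = np.asarray(y_lab)
    X_unl = np.asarray(X_unl, dtype=float)

    # Pairwise Euclidean distances, shape (n_labeled, n_unlabeled).
    # Assumption: the abstract does not name the similarity measure.
    dists = np.linalg.norm(X_lab[:, None, :] - X_unl[None, :, :], axis=2)

    best_dist = np.full(len(X_unl), np.inf)  # distance to closest claimant so far
    best_rank = np.zeros(len(X_unl), dtype=y_lab.dtype)

    for i in range(len(X_lab)):
        nearest = np.argsort(dists[i], kind="stable")[:k]
        for j in nearest:
            # Assumed conflict rule: the closest labeled claimant wins.
            if dists[i, j] < best_dist[j]:
                best_dist[j] = dists[i, j]
                best_rank[j] = y_lab[i]

    chosen = np.isfinite(best_dist)  # unlabeled samples that received a rank
    X_aug = np.vstack([X_lab, X_unl[chosen]])
    y_aug = np.concatenate([y_lab, best_rank[chosen]])
    weights = np.concatenate(
        [np.ones(len(X_lab)), np.full(int(chosen.sum()), discount)]
    )
    return X_aug, y_aug, weights


# Toy data: two labeled points with ranks 1 and 3, four unlabeled points.
X_aug, y_aug, w = pseudo_label_by_nearest_neighbors(
    [[0.0], [10.0]], [1, 3], [[0.5], [9.0], [5.2], [20.0]], k=1, discount=0.5
)
```

The augmented set `(X_aug, y_aug)` would then be passed to any supervised ordinal regression learner. Under the weighting assumption above, `w` would serve as its per-sample weights, so that newly labeled examples count less than the original ones.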

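He Haijiang, He Wende, Liu Huafu. Semi-supervised ordinal regression algorithm integrating the nearest neighbor rule[J]. Journal of Computer Applications, 2010, 30(4): 1022-1025.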
