Journal of Computer Applications (计算机应用) ›› 2010, Vol. 30 ›› Issue (4): 1022-1025.

• Artificial Intelligence •

Semi-supervised ordinal regression algorithm integrating the nearest neighbor rule

何海江1,何文德2,刘华富2   

  1. Changsha University (长沙学院)
    2.
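  • Received: 2009-09-17 Revised: 2009-11-16 Online: 2010-04-15 Published: 2010-04-01
  • Corresponding author: 何海江
  • Supported by:
    Scientific Research Project of the Hunan Provincial Department of Education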

Towards semi-supervised ordinal regression with nearest neighbor

  • Received:2009-09-17 Revised:2009-11-16 Online:2010-04-15 Published:2010-04-01
  • Supported by:
    Scientific Research Fund of Hunan Provincial Education Department

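Abstract: Supervised ordinal regression algorithms need a sufficient number of labeled samples, yet in practice annotating the ordinal labels of samples is time-consuming and laborious, and sometimes hard to complete at all. To address this, a semi-supervised ordinal regression algorithm integrating the nearest neighbor rule is proposed. Based on nearest neighbors, for each labeled sample, the several samples in the unlabeled dataset most similar to it are selected and assigned the same ordinal label; a supervised ordinal regression algorithm is then trained on the labeled samples together with the newly labeled samples. Experimental results on multiple datasets show that the method significantly improves ordinal regression performance. In addition, a discount factor λ is introduced to assess the credibility of the newly labeled samples, and the influence of λ and of the labeled dataset size on the method is discussed.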

Key words: semi-supervised ordinal regression, nearest neighbor, unlabeled sample, discount factor

Abstract: Supervised ordinal regression algorithms often require a large number of labeled samples. In real applications, however, labeling instances is time-consuming and labor-intensive, and sometimes infeasible. Therefore, a semi-supervised ordinal regression algorithm is proposed that learns from both labeled and unlabeled examples. For each labeled example, the method first selects the instances in the unlabeled dataset that are most similar to it and assigns them the same rank. At this stage, the nearest neighbor rule is used to score the similarity of two instances. A supervised ordinal regression algorithm then trains the ranking model on both the labeled and the newly labeled examples. Experimental results on several datasets show that the method produces statistically significant improvements in ranking measures. In addition, a discount factor λ is introduced to evaluate the credibility of the newly labeled examples, and the effects of λ and of the labeled dataset size on the method's performance are discussed.

Key words: semi-supervised ordinal regression, nearest neighbor, unlabeled sample, discount factor
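
The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `propagate_labels`, the parameter `k` (number of neighbors taken per labeled sample), the use of Euclidean distance, the rule for resolving conflicts when one unlabeled sample is picked by several labeled samples, and the use of λ (`lam`) as a per-sample weight are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def propagate_labels(X_lab, y_lab, X_unl, k=2, lam=0.5):
    """Pseudo-label unlabeled samples from nearby labeled samples.

    Returns (X_aug, y_aug, w_aug). Labeled rows keep weight 1.0; newly
    labeled rows get weight lam (assumed use of the discount factor).
    """
    X_lab = np.asarray(X_lab, dtype=float)
    y_lab = np.asarray(y_lab)
    X_unl = np.asarray(X_unl, dtype=float)

    # Pairwise Euclidean distances, shape (n_labeled, n_unlabeled).
    dist = np.linalg.norm(X_lab[:, None, :] - X_unl[None, :, :], axis=2)
    k = min(k, X_unl.shape[0])

    best = {}  # unlabeled index -> (distance, ordinal label)
    for i in range(X_lab.shape[0]):
        # The k unlabeled samples closest to labeled sample i.
        for j in np.argsort(dist[i])[:k].tolist():
            d = dist[i, j]
            # Assumed conflict rule: keep the label of the closest labeled sample.
            if j not in best or d < best[j][0]:
                best[j] = (d, y_lab[i])

    idx = np.array(sorted(best), dtype=int)
    y_new = np.array([best[j][1] for j in idx.tolist()], dtype=y_lab.dtype)

    X_aug = np.vstack([X_lab, X_unl[idx]])
    y_aug = np.concatenate([y_lab, y_new])
    w_aug = np.concatenate([np.ones(len(y_lab)), np.full(len(idx), lam)])
    return X_aug, y_aug, w_aug


# Toy data: two labeled samples with ordinal labels 1 and 3.
X_lab = [[0.0, 0.0], [10.0, 10.0]]
y_lab = [1, 3]
X_unl = [[0.1, 0.0], [9.0, 10.0], [5.0, 5.0], [0.0, 0.5]]
X_aug, y_aug, w_aug = propagate_labels(X_lab, y_lab, X_unl, k=1, lam=0.5)
```

Under these assumptions, the augmented set `(X_aug, y_aug)` would then be passed to any supervised ordinal regression learner, with `w_aug` supplied as sample weights if the learner supports them.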