Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (11): 3108-3111. DOI: 10.3724/SP.J.1087.2011.03108

• Artificial intelligence •

Semi-supervised learning listwise ranking functions for document retrieval

HE Hai-jiang, LONG Yue-jin

  1. Department of Computer Science and Technology, Changsha University, Changsha Hunan 410003, China
  • Received: 2011-04-25  Revised: 2011-07-13  Online: 2011-11-16  Published: 2011-11-01
  • Contact: HE Hai-jiang

Semi-supervised listwise learning-to-rank algorithm for document retrieval (适应文档检索的半监督多样本排序学习算法)

何海江,龙跃进   

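  1. Department of Computer Science, Changsha University (长沙学院), Changsha 410003
  • Corresponding author: HE Hai-jiang (何海江)
  • About the authors: HE Hai-jiang (何海江, born 1970), male, native of Wangcheng, Hunan; associate professor and CCF member; main research interests: machine learning and Web mining;
    LONG Yue-jin (龙跃进, born 1958), male, native of Huitong, Hunan; lecturer; main research interests: data mining and databases.
  • Funding:
    Scientific Research Project of the Education Department of Hunan Province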

Abstract: An iterative co-ranking algorithm was proposed to extend learning to rank from the supervised setting to the semi-supervised setting. The approach employed two listwise rankers, each of which produced a document permutation for every unlabeled query. The likelihood listwise loss was used to measure the difference score between the two learners for a given query. Unlabeled queries with a large difference score were then added to the training dataset for the next iteration, with the ideal document permutation for each listwise ranker defined by the other learner. Experimental results on the public dataset LETOR show that the proposed method improves the ranking performance of a supervised listwise ranking algorithm. The effect of the labeling ratio was also discussed.

Key words: document retrieval, semi-supervised, rank learning, likelihood loss, co-training

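Abstract (translated from Chinese): To address the shortage of labeled training data, a co-training listwise learning-to-rank algorithm is proposed that mines implicit ranking information from unlabeled data. The algorithm uses two kinds of listwise rankers and builds two different ranking functions from the currently available labeled dataset. Each unlabeled query therefore has two different document permutations. The likelihood loss is used to compute the similarity of these two permutations, and queries whose permutations have low similarity are labeled, which gives both listwise rankers additional training data. Experimental results on LETOR, a public learning-to-rank dataset, confirm that the co-training ranking algorithm is effective. The effect of the labeling ratio on the algorithm is also discussed.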

Key words (translated from Chinese): document retrieval, semi-supervised, learning to rank, likelihood loss, co-training
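
The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact difference-score formula, so the symmetric measure below is an assumption, and the names `likelihood_loss`, `ranking`, `difference_score` and `select_pseudo_labels` are hypothetical. The likelihood loss is taken to be the negative log Plackett-Luce likelihood of a permutation, and each ranker is represented only by the scores it assigns to a query's documents.

```python
import math


def likelihood_loss(scores, permutation):
    """Negative log Plackett-Luce likelihood of `permutation` (best first)."""
    loss = 0.0
    for i, doc in enumerate(permutation):
        tail = [scores[d] for d in permutation[i:]]
        top = max(tail)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(s - top) for s in tail))
        loss += log_z - scores[doc]
    return loss


def ranking(scores):
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda d: -scores[d])


def difference_score(scores_a, scores_b):
    """Assumed disagreement measure: extra loss each ranker incurs on the
    other's permutation. It is 0 when both rankers order documents alike."""
    perm_a, perm_b = ranking(scores_a), ranking(scores_b)
    gap_a = likelihood_loss(scores_a, perm_b) - likelihood_loss(scores_a, perm_a)
    gap_b = likelihood_loss(scores_b, perm_a) - likelihood_loss(scores_b, perm_b)
    return gap_a + gap_b


def select_pseudo_labels(unlabeled, k):
    """Pick the k queries with the largest difference score.

    `unlabeled` maps query id -> (scores_a, scores_b). Each result is
    (query id, target permutation for ranker A, target permutation for
    ranker B), where each ranker's target comes from the other ranker.
    """
    ranked = sorted(unlabeled,
                    key=lambda q: difference_score(*unlabeled[q]),
                    reverse=True)
    return [(q, ranking(unlabeled[q][1]), ranking(unlabeled[q][0]))
            for q in ranked[:k]]
```

In a full co-ranking loop, the selected queries and their borrowed permutations would be appended to each ranker's training set before both rankers are retrained. The number of queries `k` added per iteration is likewise an assumed parameter.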