Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (5): 1313-1318. DOI: 10.11772/j.issn.1001-9081.2016.05.1313

• Artificial Intelligence •

Relevance model estimation based on stable semantic clustering

SUN Xinyu (孙芯宇)1, WU Jiang (吴江)1, PU Qiang (蒲强)2

  1. School of Economic Information Engineering, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China;
  2. School of Information Science and Engineering, Chengdu University, Chengdu, Sichuan 610106, China
  • Received: 2015-10-21 Revised: 2016-01-07 Online: 2016-05-10 Published: 2016-05-09
  • Corresponding author: PU Qiang
  • About the authors: SUN Xinyu (born 1991), female, from Chengde, Hebei, M.S. candidate; main research interests: text mining and personalized user recommendation. WU Jiang (born 1980), male, from Quzhou, Zhejiang, associate professor, Ph.D.; main research interest: data mining. PU Qiang (born 1971), male, from Neijiang, Sichuan, associate professor, Ph.D.; main research interests: information retrieval, statistical language models, and location-based services.
  • Supported by:
    the Humanities and Social Sciences Research Youth Fund of the Ministry of Education of China (11YJCZH084), the Science and Technology Support Program of the Science and Technology Department of Sichuan Province (2014GZ0013, 2014SZ0107), and the Natural Science Key Project of the Education Department of Sichuan Province (13ZA0297).

Abstract: To address the problem that relevance models estimated from unstable clusters degrade retrieval performance, a Stable Semantic Relevance Model (SSRM) was proposed. First, a feedback data set was built from the top N documents returned for the user's initial query. Next, the stable number of semantic clusters in that data set was detected. Then the SSRM was estimated from the stable semantic clusters most similar to the user query. Finally, the retrieval performance of the model was verified by experiments. On five subsets of the TREC data sets, SSRM improved Mean Average Precision (MAP) by at least 32.11% and 0.41% over the Relevance Model (RM) and the Semantic Relevance Model (SRM), respectively, and by at least 23.64%, 19.59% and 8.03% over the clustering-based retrieval methods Cluster-Based Document Model (CBDM), LDA-Based Document Model (LBDM) and Resampling, respectively. The experimental results show that SSRM helps improve retrieval performance.

Key words: information retrieval, semantic clustering, stability validation, Independent Component Analysis (ICA), relevance model estimation
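
The last two steps described in the abstract (choosing the cluster closest to the query, then estimating a relevance model from it) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the cluster labels of the top-N feedback documents are already given, whereas the paper obtains them through stability validation of semantic clusters. It also assumes a single best cluster chosen by cosine similarity, additive smoothing, and a standard RM1-style estimate, P(w|R) proportional to the sum over documents of P(w|D) P(Q|D). All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def doc_language_model(tokens, vocab, alpha=0.01):
    """Additively smoothed unigram model P(w|D) over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_cluster_relevance_model(query, docs, labels):
    """Pick the cluster most similar to the query, then estimate an
    RM1-style term distribution from the documents in that cluster.

    query  -- list of query tokens
    docs   -- list of token lists (the top-N feedback documents)
    labels -- cluster label per document (ASSUMED given; the paper
              derives stable semantic clusters, which is not shown here)
    """
    vocab = sorted({w for d in docs for w in d} | set(query))
    models = [doc_language_model(d, vocab) for d in docs]

    # Group document indices by cluster label.
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)

    # Represent each cluster by its summed term counts and compare to the query.
    query_vec = Counter(query)

    def centroid(indices):
        total = Counter()
        for i in indices:
            total.update(docs[i])
        return total

    best = max(clusters, key=lambda lab: cosine(query_vec, centroid(clusters[lab])))

    # RM1-style estimate restricted to the selected cluster (an assumption):
    # P(w|R) is proportional to sum_D P(w|D) * prod_q P(q|D).
    rm = dict.fromkeys(vocab, 0.0)
    for i in clusters[best]:
        query_likelihood = math.prod(models[i][q] for q in query)
        for w in vocab:
            rm[w] += models[i][w] * query_likelihood

    norm = sum(rm.values())
    return best, {w: p / norm for w, p in rm.items()}
```

Restricting the estimate to one cluster keeps off-topic feedback documents from contributing probability mass, which is the intuition behind cluster-based relevance models; how many clusters to keep and how to smooth are design choices this sketch fixes arbitrarily.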

CLC number: