Journal of Computer Applications, 2015, Vol. 35, Issue (10): 2733-2736. DOI: 10.11772/j.issn.1001-9081.2015.10.2733

• Papers of the 15th China Conference on Machine Learning (CCML2015) •

Heuristic algorithms for paper index ranking based on sparse matrix

WAN Xiaosong, WANG Zhihai, YUAN Jidong

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2015-06-01 Revised: 2015-07-01 Online: 2015-10-10 Published: 2015-10-14
  • Corresponding author: WANG Zhihai (born 1963), male, from Anyang, Henan; professor, Ph.D., CCF member; research interests: data mining, machine learning; zhhwang@bjtu.edu.cn
  • About the authors: WAN Xiaosong (born 1991), female, from Tonghua, Jilin; M.S. candidate; research interests: data mining, machine learning. YUAN Jidong (born 1989), male, from Jiaozuo, Henan; Ph.D. candidate; research interests: data stream mining, pattern recognition.
  • Supported by:
    National Natural Science Foundation of China (61370130).

Abstract: To improve the precision of academic paper retrieval, and thereby facilitate academic research, a set of ranking strategies for the academic paper retrieval problem is proposed. First, heuristic methods for paper index ranking based on web page ranking algorithms are described; a Hash indexing technique is used to reduce the memory consumed by the sparse matrix computation. Second, the intensive equilibrium value of the citation graph among papers is defined, and a large number of experiments clarify the relationship between the number of iterations required by the different ranking algorithms and this intensive equilibrium value. Finally, the proposed heuristic algorithms for paper index ranking are applied to the Science Citation Index (SCI) database and compared with the original ordering by descending citation count. The experimental results show that, of the three algorithms based on web page ranking techniques, the stochastic process algorithm based on link-structure analysis is the better suited to ranking the papers of a given field retrieved by keyword search.

Key words: page ranking algorithm, sparse matrix, Hash index, paper index ranking, SCI index database
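
The abstract does not spell out the Hash-indexed storage scheme, so the following is only a minimal sketch of the general idea. It assumes a plain PageRank-style power iteration over a citation graph held in Python dictionaries (hash tables), so that only the non-zero entries of the sparse citation matrix are ever stored or visited. The function name `pagerank` and the parameters `damping`, `tol`, and `max_iter` are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def pagerank(citations, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank-style power iteration over a hash-indexed citation graph.

    citations: iterable of (citing, cited) paper-id pairs.
    Returns (scores, iterations_used). Illustrative sketch only.
    """
    out_links = defaultdict(set)  # citing paper -> set of cited papers
    nodes = set()
    for citing, cited in citations:
        out_links[citing].add(cited)
        nodes.update((citing, cited))

    if not nodes:
        return {}, 0

    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for iteration in range(1, max_iter + 1):
        # Rank held by papers that cite nothing is spread uniformly.
        dangling = sum(rank[p] for p in nodes if p not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in nodes}

        # Only stored (non-zero) matrix entries are visited here.
        for citing, cited_set in out_links.items():
            share = damping * rank[citing] / len(cited_set)
            for cited in cited_set:
                new_rank[cited] += share

        # L1 change between successive iterates decides convergence.
        delta = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if delta < tol:
            break

    return rank, iteration


# Hypothetical toy citation graph: (citing, cited) pairs.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("E", "D"), ("D", "A")]
scores, iterations = pagerank(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With this layout the memory used grows with the number of citation links rather than with the square of the number of papers. The returned iteration count is the kind of quantity the abstract relates to the intensive equilibrium value of the graph, whose definition is not given here.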

CLC number: