Journal of Computer Applications, 2020, Vol. 40, Issue (2): 434-440. DOI: 10.11772/j.issn.1001-9081.2019101730

• 36th CCF National Database Conference (NDBC 2019)

Clustering-based hyperlink prediction

齐鹏飞 (Pengfei QI), 周丽华 (Lihua ZHOU), 杜国王 (Guowang DU), 黄皓 (Hao HUANG), 黄通 (Tong HUANG)

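  1. School of Information, Yunnan University, Kunming 650091
  • Received: 2019-09-18  Revised: 2019-10-10  Accepted: 2019-10-24  Online: 2019-11-04  Published: 2020-02-10
  • Corresponding author: ZHOU Lihua
  • About the authors: QI Pengfei (born 1992), male, from Zhengzhou, Henan; M. S.; research interest: social network analysis.
    DU Guowang (born 1994), male, from Zhengzhou, Henan; Ph. D. candidate; research interests: data mining, social network analysis.
    HUANG Hao (born 1994), male, from Chengdu, Sichuan; M. S. candidate; research interest: social network analysis.
    HUANG Tong (born 1994), male, from Guiyang, Guizhou; M. S. candidate; research interest: social network analysis.
  • Funding:
    National Natural Science Foundation of China (61762090); Yunnan Provincial Natural Science Foundation (2016FA026); Yunnan Innovation Research Team Project (2018HC019); Yunnan Provincial Higher Education Technology Innovation Team Project (IRTSTYN); Yunnan Provincial Department of Education Foundation (2019Y0006)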

Clustering-based hyperlink prediction

Pengfei QI, Lihua ZHOU, Guowang DU, Hao HUANG, Tong HUANG

  1. School of Information, Yunnan University, Kunming Yunnan 650091, China
  • Received: 2019-09-18  Revised: 2019-10-10  Accepted: 2019-10-24  Online: 2019-11-04  Published: 2020-02-10
  • Contact: Lihua ZHOU
  • About the authors: QI Pengfei, born in 1992, M. S. His research interests include social network analysis.
    DU Guowang, born in 1994, Ph. D. candidate. His research interests include data mining, social network analysis.
    HUANG Hao, born in 1994, M. S. candidate. His research interests include social network analysis.
    HUANG Tong, born in 1994, M. S. candidate. His research interests include social network analysis.
  • Supported by:
    the National Natural Science Foundation of China (61762090); the Yunnan Provincial Natural Science Foundation (2016FA026); the Yunnan Innovation Research Team Project (2018HC019); the Yunnan Provincial Higher Education Technology Innovation Team Project (IRTSTYN); the Yunnan Province Education Department Foundation (2019Y0006)

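Abstract:

Hyperlink prediction uses the properties of an observed network to recover the links missing from it. Existing hyperlink prediction algorithms usually make predictions over the entire network, so their results miss link types that have few training samples, and the range of predicted types is incomplete. To address this problem, a clustering-based hyperlink prediction algorithm, C-CMM, was proposed: the dataset is first partitioned into clusters, and a model is then built for each cluster to perform hyperlink prediction. The proposed algorithm makes full use of the information contained in the observed samples of each cluster and broadens the range of types covered by the prediction results. Experimental results on three real-world datasets show that, compared with several state-of-the-art link prediction algorithms, C-CMM achieves higher prediction accuracy and efficiency, and its predictions cover a more comprehensive range of types.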
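The coverage problem described above can be illustrated with a toy example. The sketch below assumes that each candidate hyperlink already carries a cluster label and a score from its cluster's model; the data, the function names (`global_top_k`, `per_cluster_top_k`, `coverage`) and the even-split budget rule are hypothetical and are not taken from C-CMM. Ranking all candidates together lets the dominant cluster fill the whole budget, while ranking within each cluster ensures that the sparse cluster is also represented.

```python
from collections import defaultdict

# Each candidate is (hyperlink, cluster_label, score). Scores are assumed to come
# from the model of the cluster the candidate belongs to (toy data).
candidates = [
    (frozenset({1, 2, 3}), "A", 0.95),
    (frozenset({1, 2, 4}), "A", 0.90),
    (frozenset({2, 3, 4}), "A", 0.85),
    (frozenset({1, 3, 4}), "A", 0.80),
    (frozenset({7, 8, 9}), "B", 0.40),  # sparse cluster: fewer, lower-scored candidates
    (frozenset({7, 8}), "B", 0.35),
]


def global_top_k(cands, k):
    """Rank every candidate together and keep the k highest scores."""
    return sorted(cands, key=lambda c: c[2], reverse=True)[:k]


def per_cluster_top_k(cands, k):
    """Rank candidates within each cluster separately.

    Hypothetical budget rule: split k evenly across clusters (at least 1 each).
    """
    by_cluster = defaultdict(list)
    for cand in cands:
        by_cluster[cand[1]].append(cand)
    share = max(1, k // len(by_cluster))
    picked = []
    for label in sorted(by_cluster):
        ranked = sorted(by_cluster[label], key=lambda c: c[2], reverse=True)
        picked.extend(ranked[:share])
    return picked


def coverage(predictions):
    """Set of cluster labels (link types) that appear among the predictions."""
    return {label for _, label, _ in predictions}


print(coverage(global_top_k(candidates, 4)))
print(coverage(per_cluster_top_k(candidates, 4)))
```

With `k = 4`, the global ranking returns only cluster `A`, whereas the per-cluster ranking returns two hyperlinks from each of `A` and `B`. When there are more clusters than `k`, the `max(1, ...)` floor yields more than `k` predictions, so a real allocation rule would need to handle that case.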

Key words: information network, hyperlink prediction, clustering

Abstract:

Hyperlink prediction aims to use the inherent properties of an observed network to recover the links missing from it. Existing hyperlink prediction algorithms usually make predictions over the entire network, so link types with insufficient training samples may be missed and the detected link types are incomplete. To address this problem, a clustering-based hyperlink prediction algorithm named C-CMM was proposed. First, the dataset was partitioned into clusters; then a model was constructed for each cluster to perform hyperlink prediction. The proposed algorithm makes full use of the information contained in the observed samples of each cluster and widens the range of types covered by the prediction results. Experimental results on three real-world datasets show that the proposed algorithm outperforms several state-of-the-art link prediction algorithms in prediction accuracy and efficiency, and provides more comprehensive prediction coverage.
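The two-stage pipeline (cluster first, then build a model and predict per cluster) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: hyperlinks are modelled as sets of node IDs, the clustering step is a simple greedy Jaccard-threshold ("leader") clustering, and the per-cluster model is a stand-in scorer (similarity to the closest observed hyperlink in the cluster) rather than CMM, which the abstract does not describe. The names `leader_clustering`, `score`, `predict` and the threshold `tau` are hypothetical.

```python
from typing import Dict, FrozenSet, List

Hyperedge = FrozenSet[int]


def jaccard(a: Hyperedge, b: Hyperedge) -> float:
    """Jaccard similarity between two hyperlinks (node sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_clustering(edges: List[Hyperedge], tau: float) -> List[List[Hyperedge]]:
    """Stage 1 (stand-in clustering): one greedy pass in which each hyperlink
    joins the first cluster whose leader (first member) has Jaccard >= tau,
    otherwise it starts a new cluster."""
    clusters: List[List[Hyperedge]] = []
    for e in edges:
        for cluster in clusters:
            if jaccard(e, cluster[0]) >= tau:
                cluster.append(e)
                break
        else:
            clusters.append([e])
    return clusters


def score(candidate: Hyperedge, cluster: List[Hyperedge]) -> float:
    """Stage 2 (stand-in per-cluster model): similarity to the closest
    observed hyperlink in this cluster."""
    return max(jaccard(candidate, e) for e in cluster)


def predict(observed: List[Hyperedge], candidates: List[Hyperedge],
            tau: float = 0.3, per_cluster: int = 1) -> Dict[int, List[Hyperedge]]:
    """Score unseen candidates separately for each cluster and keep the top
    `per_cluster` of them, so every cluster contributes predictions."""
    observed_set = set(observed)
    unseen = [c for c in candidates if c not in observed_set]
    predictions: Dict[int, List[Hyperedge]] = {}
    for idx, cluster in enumerate(leader_clustering(observed, tau)):
        ranked = sorted(unseen, key=lambda c: score(c, cluster), reverse=True)
        predictions[idx] = ranked[:per_cluster]
    return predictions


observed = [frozenset({1, 2, 3}), frozenset({1, 2, 4}),
            frozenset({2, 3, 4}), frozenset({7, 8, 9})]
candidates = [frozenset({1, 3, 4}), frozenset({7, 8, 10}), frozenset({1, 2, 3})]
print(predict(observed, candidates))
```

Here the observed hyperlinks form two clusters, and each cluster proposes its own missing hyperlink: cluster 0 proposes {1, 3, 4} and cluster 1 proposes {7, 8, 10}. Because each cluster is ranked independently, the small cluster still gets a prediction even though it has only one training sample.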

Key words: information network, hyperlink prediction, clustering
