基于聚类的超链路预测

doi:10.11772/j.issn.1001-9081.2019101730

《计算机应用》唯一官方网站 ›› 2020, Vol. 40 ›› Issue (2): 434-440.DOI: 10.11772/j.issn.1001-9081.2019101730

• 第36届CCF中国数据库学术会议(NDBC 2019) • 上一篇下一篇

基于聚类的超链路预测

齐鹏飞, 周丽华(), 杜国王, 黄皓, 黄通

云南大学信息学院，昆明 650091

收稿日期:2019-09-18 修回日期:2019-10-10 接受日期:2019-10-24 发布日期:2019-11-04 出版日期:2020-02-10
通讯作者: 周丽华
作者简介:齐鹏飞（1992—），男，河南郑州人，硕士，主要研究方向：社会网络分析
杜国王（1994—），男，河南郑州人，博士研究生，主要研究方向：数据挖掘、社会网络分析
黄皓（1994—），男，四川成都人，硕士研究生，主要研究方向：社会网络分析
黄通（1994—），男，贵州贵阳人，硕士研究生，主要研究方向：社会网络分析。
基金资助:
国家自然科学基金资助项目(61762090);云南省自然科学基金资助项目(2016FA026);云南省创新研究团队项目(2018HC019);云南省高等学校科技创新团队项目（IRTSTYN）;云南省教育厅基金资助项目(2019Y0006)

Clustering-based hyperlink prediction

Pengfei QI, Lihua ZHOU(), Guowang DU, Hao HUANG, Tong HUANG

School of Information，Yunnan University，Kunming Yunnan 650091，China

Received:2019-09-18 Revised:2019-10-10 Accepted:2019-10-24 Online:2019-11-04 Published:2020-02-10
Contact: Lihua ZHOU
About author:QI Pengfei， born in 1992， M. S. His research interests include social network analysis.
DU Guowang， born in 1994， Ph. D. candidate. His research interests include data mining， social network analysis.
HUANG Hao， born in 1994， M. S. candidate. His research interests include social network analysis.
HUANG Tong， born in 1994， M. S. candidate. His research interests include social network analysis.
Supported by:
the National Natural Science Foundation of China(61762090);the Yunnan Provincial Natural Science Foundation(2016FA026);the Yunnan Innovation Research Team Project(2018HC019);the Yunnan Provincial Higher Education Technology Innovation Team Project （IRTSTYN）, the Yunnan Province Education Department Foundation(2019Y0006)

摘要/Abstract

摘要：

超链路预测是利用已观测到网络的特性来复现网络中缺失的链路。现有的超链路预测算法通常利用整个网络来进行预测，预测结果会遗漏训练样本数据较少的链路类别，导致预测种类不够全面。为了解决这个问题，提出了基于聚类的超链路预测算法C-CMM，首先对数据集进行聚类分簇，进而对每一个簇建立模型进行超链路预测。所提算法能够充分利用各个簇的观察样本所蕴含的信息，扩大预测结果覆盖的类别。在三个真实数据集上的实验结果表明，C-CMM和多个先进的链路预测算法相比具有更高的预测精度和效率，同时其预测覆盖种类也更加全面。

关键词: 信息网络, 超链路预测, 聚类

Abstract:

Hyperlink prediction aims to utilize inherent properties of observed network to reproduce the missing links in the network. Existing hyperlink prediction algorithms often make predictions based on entire network， and some link types with insufficient training samples data may be missed， resulting in imcomplete link types to be detected. To address this problem， a clustering-based hyperlink prediction algorithm named C-CMM was proposed. Firstly， the dataset was divided into clusters， and then the model was constructed for each cluster to perform hyperlink prediction. The proposed algorithm can make full use of the information contained in the observation samples of each cluster， and widen the coverage range of the prediction results. Experimental results on three real-world datasets show that the proposed algorithm outperforms a great number of state-of-the-art link prediction algorithms in prediction accuracy and efficiency， and has the prediction coverage more comprehensive.

Key words: information network, hyperlink prediction, clustering

中图分类号:

TP393

齐鹏飞, 周丽华, 杜国王, 黄皓, 黄通. 基于聚类的超链路预测[J]. 计算机应用, 2020, 40(2): 434-440.

Pengfei QI, Lihua ZHOU, Guowang DU, Hao HUANG, Tong HUANG. Clustering-based hyperlink prediction[J]. Journal of Computer Applications, 2020, 40(2): 434-440.

图/表 12

图1 超链路预测

Fig. 1 Hyperlink prediction

表1 三个数据集上的召回率和AUC

Tab. 1 Recall and AUC on three datasets

算法	鸡尾酒数据集		食谱数据集		DBLP数据集
算法	召回率	AUC	召回率	AUC	召回率	AUC
CN	0.32	0.42	0.28	0.44	0.43	0.45
C-CN	0.34	0.43	0.29	0.43	0.48	0.47
Katz	0.34	0.44	0.31	0.45	0.44	0.45
C-Katz	0.38	0.47	0.32	0.47	0.50	0.48
HPLSF	0.46	0.62	0.35	0.62	0.53	0.62
C-HPLSF	0.47	0.63	0.36	0.62	0.57	0.63
Random	0.37	0.50	0.34	0.51	0.50	0.50
C-Random	0.43	0.61	0.35	0.51	0.51	0.51
BS	0.53	0.60	0.37	0.62	0.56	0.62
C-BS	0.55	0.60	0.38	0.61	0.61	0.63
CMM	0.56	0.62	0.39	0.63	0.65	0.64
C-CMM	0.64	0.67	0.41	0.66	0.75	0.70

图2 C-CMM和CMM预测到的top-5个真实存在的鸡尾酒

Fig. 2 Top-5 real cocktails predicted by C-CMM and CMM

图3 C-CMM和CMM预测到的top-10个真实存在的菜谱

Fig. 3 Top-10 real menus predicted by C-CMM and CMM

图4 C-CMM和CMM预测到的top-3个非真实存在的鸡尾酒

Fig. 4 Top-3 non-real cocktails predicted by C-CMM and CMM

图5 C-CMM和CMM预测到的top-3个非真实存在的菜谱

Fig. 5 Top-3 non-real menus predicted by C-CMM and CMM

图6 设定不同簇数目时不同聚类结果的轮廓系数

Fig. 6 Contour coefficients of different clustering results under different number of clusters

图7 C-CMM算法设定不同秩时的召回条数

Fig. 7 Recalls under different number of ranks of C-CMM algorithm

图8 六种算法分别随机删除10~80条超链路后的召回条数对比

Fig. 8 Comparison of the number of recalls after randomly deleteing 10 to 80 hyperlinks by six algorithms

图9 六种算法分别随机删除50~400条超链路后的召回条数对比

Fig. 9 Comparison of the number of recalls after randomly deleteing 50 to 400 hyperlinks by six algorithms

图10 六种算法分别随机删除500~4 000条超链路后的召回条数对比

Fig. 10 Comparison of the number of recalls after randomly deleteing 500 to 4 000 hyperlinks by six algorithms

表2 运行时间比较 ( s)

Tab. 2 Comparison of running time

数据集	C-CMM	CMM	BS	HPLSF	Random	Katz	CN
鸡尾酒数据集	6.11	6.51	5.63	5.63	3.17	4.35	3.57
食谱数据集	91.28	273.48	45.64	55.47	47.39	49.56	42.07

参考文献 21

1	吕琳媛.复杂网络链路预测［J］.电子科技大学学报，2010，39（5）： 651-661. 10.3969/j.issn.1001-0548.2010.05.002
	LYU L Y. Link prediction on complex networks［J］. Journal of University of Electronic Science and Technology of China， 2010， 39（5）： 651-661. 10.3969/j.issn.1001-0548.2010.05.002
2	LYU L， ZHOU T. Link prediction in complex networks： a survey［J］. Physica A： Statistical Mechanics and its Applications， 2011， 390（6）： 1150-1170. 10.1016/j.physa.2010.11.027
3	LEY M. The DBLP computer science bibliography： evolution， research issues， perspectives［C］// Proceedings of the 2002 International Symposium on String Processing and Information Retrieval， LNCS2476. Berlin： Springer， 2002： 1-10.
4	ZHANG M， CUI Z， JIANG S， et al. Beyond link prediction： predicting hyperlinks in adjacency space［C］// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2018： 4430-4437. 10.1609/aaai.v34i03.5701
5	LIBEN-NOWELL D， KLEINBERG J. The link-prediction problem for social networks［J］. Journal of the American society for Information Science and Technology， 2007， 58（7）： 1019-1031. 10.1002/asi.20591
6	CHEN Z， CHEN M， WEINBERGER K， et al. Marginalized denoising for link prediction and multi-label learning［C］// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2015： 1707-1713. 10.1609/aaai.v34i04.5761
7	SONG D， MEYER D A， TAO D. Top-k link recommendation in social networks［C］// Proceedings of the 2015 IEEE International Conference on Data Mining. Piscataway： IEEE， 2015： 389-398. 10.1109/icdm.2015.136
8	WU L， GE Y， LIU Q， et al. Modeling users’ preferences and social links in social networking services： a joint-evolving perspective［C］// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2016： 279-286. 10.1109/tkde.2017.2663422
9	ZHANG M， CHEN Y. Weisfeiler-Lehman neural machine for link prediction［C］// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2017： 575-583. 10.1145/3097983.3097996
10	黄立威，李德毅，马于涛，等.一种基于元路径的异质信息网络链路预测模型［J］.计算机学报，2014，37（4）：848-858.
	HUANG L W， LI D Y， MA Y T， et al. A meta path-based link prediction model for heterogeneous information networks［J］. Chinese Journal of Computers， 2014， 37（4）： 848-858.
11	KATZ L. A new status index derived from sociometric analysis［J］. Psychometrika， 1953， 18（1）： 39-43. 10.1007/bf02289026
12	WANG H， CHEN E， LIU Q， et al. A united approach to learning sparse attributed network embedding［C］// Proceedings of the 2018 IEEE International Conference on Data Mining. Piscataway： IEEE， 2018： 557-566. 10.1109/icdm.2018.00071
13	CHEN H， YIN H， WANG W， et al. PME： projected metric embedding on heterogeneous networks for link prediction［C］// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2018： 1177-1186. 10.1145/3219819.3219986
14	ZHANG Z， YANG H， BU J， et al. ANRL： Attributed Network Representation Learning via deep neural networks［C］// Proceedings of the 27th International Joint Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2018： 3155-3161. 10.24963/ijcai.2018/438
15	MENG Z， LIANG S， BAO H， et al. Co-embedding attributed networks［C］// Proceedings of the 12th ACM International Conference on Web Search and Data Mining. New York： ACM， 2019： 393-401. 10.1145/3289600.3291015
16	CEN Y， ZOU X， ZHANG J， et al. Representation learning for attributed multiplex heterogeneous network［C］// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2019： 1358-1368. 10.1145/3292500.3330964
17	XU Y， ROCKMORE D， KLEINBAUM A M. Hyperlink prediction in hypernetworks using latent social features［C］// Proceedings of the 2013 International Conference on Discovery Science， LNCS8140. Berlin： Springer， 2013： 324-339.
18	NEVILLE J， JENSEN D， FRIEDLAND L， et al. Learning relational probability trees［C］// Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2003： 625-630. 10.1145/956750.956830
19	LEE D D， SEUNG H S. Learning the parts of objects by non-negative matrix factorization［J］. Nature， 1999， 401（6755）： 788-791. 10.1038/44565
20	GHAHRAMANI Z， HELLER K A. Bayesian sets［C］// Proceedings of the Conference on Neural Information Processing Systems. Cambridge， MA： MIT Press， 2005： 435-442. 10.1145/1102351.1102389
21	ROUSSEEUW P J. Silhouettes： a graphical aid to the interpretation and validation of cluster analysis［J］. Journal of Computational and Applied Mathematics， 1987， 20： 53-65. 10.1016/0377-0427(87)90125-7

[1]	李顺勇, 李师毅, 胥瑞, 赵兴旺. 基于自注意力融合的不完整多视图聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(9): 2696-2703.
[2]	王清, 赵杰煜, 叶绪伦, 王弄潇. 统一框架的增强深度子空间聚类方法[J]. 《计算机应用》唯一官方网站, 2024, 44(7): 1995-2003.
[3]	董瑶, 付怡雪, 董永峰, 史进, 陈晨. 不完整多视图聚类综述[J]. 《计算机应用》唯一官方网站, 2024, 44(6): 1673-1682.
[4]	蒋小霞, 黄瑞章, 白瑞娜, 任丽娜, 陈艳平. 基于事件表示和对比学习的深度事件聚类方法[J]. 《计算机应用》唯一官方网站, 2024, 44(6): 1734-1742.
[5]	黄天宇, 李远兴, 陈昊, 郭紫佳, 魏明军. 地空协同场景下加权模糊聚类用户簇划分方法[J]. 《计算机应用》唯一官方网站, 2024, 44(5): 1555-1561.
[6]	徐童童, 解滨, 张春昊, 张喜梅. 融合转移概率矩阵的多阶最近邻图聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(5): 1527-1538.
[7]	高麟, 周宇, 邝得互. 进化双层自适应局部特征选择[J]. 《计算机应用》唯一官方网站, 2024, 44(5): 1408-1414.
[8]	丁雨, 张瀚霖, 罗荣, 孟华. 基于信念子簇切割的模糊聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(4): 1128-1138.
[9]	孙林, 刘梦含. 基于自适应布谷鸟优化特征选择的K-means聚类[J]. 《计算机应用》唯一官方网站, 2024, 44(3): 831-841.
[10]	孙滔, 段张甜, 朱浩楠, 郭沛豪, 孙鹤立. 基于新奇度量的社交事件推荐方法[J]. 《计算机应用》唯一官方网站, 2024, 44(3): 760-766.
[11]	张卓, 陈花竹. 基于一致性和多样性的多尺度自表示学习的深度子空间聚类[J]. 《计算机应用》唯一官方网站, 2024, 44(2): 353-359.
[12]	杨成昊, 胡节, 王红军, 彭博. 基于注意力机制的不完备多视图聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(12): 3784-3789.
[13]	尹春勇, 周永成. 双端聚类的自动调整聚类联邦学习[J]. 《计算机应用》唯一官方网站, 2024, 44(10): 3011-3020.
[14]	朱云华, 孔兵, 周丽华, 陈红梅, 包崇明. 图对比学习引导的多视图聚类网络[J]. 《计算机应用》唯一官方网站, 2024, 44(10): 3267-3274.
[15]	徐雪冉, 杨庚, 黄喻先. 横向联邦学习中差分隐私聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(1): 217-222.

基于聚类的超链路预测

Clustering-based hyperlink prediction

RichHTML

PDF

可视化

摘要/Abstract

引用本文

使用本文

图/表 12

参考文献 21

相关文章 15

编辑推荐

Metrics