主题关注网络的表示学习

doi:10.11772/j.issn.1001-9081.2019081529

《计算机应用》唯一官方网站 ›› 2020, Vol. 40 ›› Issue (2): 441-447.DOI: 10.11772/j.issn.1001-9081.2019081529

• 第36届CCF中国数据库学术会议(NDBC 2019) • 上一篇下一篇

主题关注网络的表示学习

郭景峰¹^,², 董慧¹^,²(), 张庭玮¹^,², 陈晓³

^1.燕山大学信息科学与工程学院，河北秦皇岛 066004
^2.河北省计算机虚拟技术与系统集成重点实验室，河北秦皇岛 066004
^3.河北科技师范学院网络技术中心，河北秦皇岛 066004

收稿日期:2019-08-12 修回日期:2019-09-10 接受日期:2019-10-24 发布日期:2019-11-04 出版日期:2020-02-10
通讯作者: 董慧
作者简介:郭景峰（1962—），男，黑龙江哈尔滨人，教授，博士，CCF会员，主要研究方向：数据库、数据挖掘、社会网络分析
张庭玮（1995—），男，河北秦皇岛人，硕士研究生，主要研究方向：网络表示学习、社会网络分析
陈晓（1983—），女，河北秦皇岛人，助理研究员，博士，CCF会员，主要研究方向：图挖掘、网络表示学习、社会网络分析。
基金资助:
国家自然科学基金资助项目(61472340);河北省青年科学基金资助项目(F2017209070);河北科技师范学院博士研究启动基金（自然科学）资助项目(2019YB011);河北省自然科学基金资助项目(F2019203157);河北省高等学校科学技术研究项目重点项目(ZD2019004)

Representation learning for topic-attention network

Jingfeng GUO¹^,², Hui DONG¹^,²(), Tingwei ZHANG¹^,², Xiao CHEN³

^1.College of Information Science and Engineering，Yanshan University，Qinhuangdao Hebei 066004，China
^2.Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province，Qinhuangdao Hebei 066004，China
^3.Network Technology Center，Hebei Normal University of Science and Technology，Qinhuangdao Hebei 066004，China

Received:2019-08-12 Revised:2019-09-10 Accepted:2019-10-24 Online:2019-11-04 Published:2020-02-10
Contact: Hui DONG
About author:GUO Jingfeng， born in 1962， Ph. D.， professor. His research interests include database， data mining， social network analysis.
ZHANG Tingwei， born in 1995， M. S. candidate. His research interests include network representation learning， social network analysis.
CHEN Xiao， born in 1983， Ph. D.， research assistant. Her research interests include graph mining， network representation learning， social network analysis.
Supported by:
the National Natural Science Foundation of China(61472340);the National Youth Science Foundation of Hebei(F2017209070);the Doctoral Research Start-up Fund（Natural Science） of Hebei Normal University of Science and Technology(2019YB011);the Natural Science Foundation of Hebei(F2019203157);the Key Projects of Science and Technology Research in Colleges and Universities of Hebei Province(ZD2019004)

摘要/Abstract

摘要：

针对异质网络表示学习仅从结构方面考虑社交关系而忽略语义这一问题，结合用户间的社交关系和用户对主题的偏好两个方面，提出基于主题关注网络的表示学习算法。首先，针对主题关注网络的特点，结合集对分析理论的同异反（确定与不确定）思想，给出转移概率模型；然后，在转移概率模型的基础上提出了一种基于两类节点的随机游走算法，以得到相对高质量的随机游走序列；最后，基于序列中两类节点建模得到主题关注网络的嵌入向量空间表示。理论分析和在豆瓣数据集上的实验结果表明，结合转移概率模型的随机游走算法能更全面地分析网络中节点的连接关系，当划分社区的个数为13时，所提算法的模块度为0.699 8，相比metapath2vec算法提高了近5%，可以更详细地捕获网络中的信息。

关键词: 主题关注网络, 集对分析, 转移概率, 随机游走, 表示学习

Abstract:

Concerning the problem that heterogeneous network representation learning only considers social relations in structure and ignores semantics， combining the social relationship between users and the preference of users for topics， a representation learning algorithm based on topic-attention network was proposed. Firstly， according to the characteristics of the topic-attention network and combining with the idea of the identical-discrepancy-contrary （determination and uncertainty） of set pair analysis theory， the transition probability model was given. Then， a random walk algorithm based on two types of nodes was proposed by using the transition probability model， so as to obtain the relatively high-quality random walk sequence. Finally， the embedding vector space representation of the topic-attention network was obtained by modeling based on two types of nodes in the sequences. Theoretical analysis and experimental results on the Douban dataset show that the random walk algorithm combined with the transition probability model is more comprehensive in analyzing the connection relationship between nodes in the network. The modularity of the proposed algorithm is 0.699 8 when the number of the communities is 13， which is nearly 5% higher than that of metapath2vec algorithm， and can capture more detailed information in the network.

Key words: topic-attention network, set pair analysis, transition probability, random walk, representation learning

中图分类号:

TP391

郭景峰, 董慧, 张庭玮, 陈晓. 主题关注网络的表示学习[J]. 计算机应用, 2020, 40(2): 441-447.

Jingfeng GUO, Hui DONG, Tingwei ZHANG, Xiao CHEN. Representation learning for topic-attention network[J]. Journal of Computer Applications, 2020, 40(2): 441-447.

图/表 14

参考文献 19

1	涂存超，杨成，刘知远，等. 网络表示学习综述［J］. 中国科学：信息科学， 2017， 47（8）： 980-996. 10.1360/n112017-00145
	TU C C， YANG C， LIU Z Y， et al. Network representation learning： an overview［J］. SCIENTIA SINICA Informationis， 2017， 47（8）： 980-996. 10.1360/n112017-00145
2	ZHAO Y， LIU Z， SUN M. Representation learning for measuring entity relatedness with rich information［C］// Proceedings of the 24th International Joint Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2015： 1412-1418. 10.1609/aaai.v33i01.3301817
3	TANG J， QU M， MEI Q. PTE： predictive text embedding through large-scale heterogeneous text networks［C］// Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2015： 1165-1174. 10.1145/2783258.2783307
4	CHEN H， YIN H， WANG W， et al. PME： projected metric embedding on heterogeneous networks for link prediction［C］// Proceeding of the 24th ACM SIGKDD International Conference on Knowledge Discover and Data Mining. New York： ACM， 2018： 1177-1186. 10.1145/3219819.3219986
5	DONG Y， CHAWLA N V， SWAMI A. metapath2vec： scalable representation learning for heterogeneous networks ［C］// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2017： 135-144. 10.1145/3097983.3098036
6	FU T Y， LEE W C， LEI Z. HIN2Vec： explore meta-paths in heterogeneous information networks for representation learning［C］// Proceeding of the 2017 ACM International Conference on Information and Knowledge Management. New York： ACM， 2017： 1797-1806. 10.1145/3132847.3132953
7	ZHU D， CUI P， WANG D， et al. Deep variational network embedding in Wasserstein space［C］// Proceeding of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2018： 2827-2836. 10.1145/3219819.3220052
8	DENG H， HAN J， ZHAO B， et al. Probabilistic topic models with biased propagation on heterogeneous information networks［C］// Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2011： 1271-1279. 10.1145/2020408.2020600
9	LI A Q， AHMED A， RAVI S， et al. Reducing the sampling complexity of topic models［C］// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2014： 891-900. 10.1145/2623330.2623756
10	陈晓，郭景峰，范超智. 基于联系度的主题关注网络社区发现方法研究［J］.计算机工程与应用，2017，53（17）：85-93. （CHEN X， GUO J F， FAN C Z. Community discovering based on connection degree in topic-attention network［J］. Journal of Computer Engineering and Applications， 2017： 53（17）： 85-93.）
11	SUN Y， HAN J， YAN X， et al. PathSim： meta path-based top-K similarity search in heterogeneous information networks［C］// Proceedings of the 37th International Conference on Very Large Data Bases. New York： ACM， 2011： 992-1003. 10.14778/3402707.3402736
12	PEROZZI B， AL-RFOU R， SKIENA S. DeepWalk： online learning of social representations［C］// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2014： 701-710. 10.1145/2623330.2623732
13	RONG X. word2vec parameter learning explained［EB/OL］. ［2018-05-05］. .
14	GROVER A， LESKOVEC J. Node2vec： scalable feature learning for networks［C］// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discover and Data Mining. New York： ACM， 2016： 855-864. 10.1145/2939672.2939754
15	TANG J， QU M， WANG M， et al. LINE： large-scale information network embedding［C］// Proceedings of the 24th International Conference on World Wide Web. New York： ACM， 2015： 1067-1077. 10.1145/2736277.2741093
16	许磊，黄玲，王昌栋.保持Motif结构的网络表示学习［J］.计算机科学与探索，2019，13（8）：1261-1271. 10.3389/fnbot.2019.00011
	XU L， HUANG L， WANG C D. Motif-preserving network representation learning［J］. Journal of Frontiers of Computer Science and Technology， 2019， 13（8）： 1261-1271. 10.3389/fnbot.2019.00011
17	WANG D， CUI P， ZHU W. Structural deep network embedding［C］// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2016： 1225-1234. 10.1145/2939672.2939753
18	SUN Y Z， NORICK B， HAN J W， et al. Integrating meta-path selection with user-guided object clustering in heterogeneous information networks［C］// Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2012： 1348-1356. 10.1145/2339530.2339738
19	赵克勤. 集对分析及其初步应用［M］. 杭州：浙江科学技术出版社， 2000： 1-198.
	ZHAO K Q. Set Pair Analysis and Preliminary Application of Set Pair［M］. Hangzhou： Zhejiang Science and Technology Press， 2000： 1-198.

节点对	（u₁，u₄）	（u₅，t₃）	（t₃，u₅）
S	1	1	1
∣CNT（u_i，u_j）₁∣	0	0	0
∣CNU（u_i，u_j）₁∣	2	2	0
∣CNT（u_i，u_j）_1∩2∣	1	0	0
∣CNT（u_i，u_j）_2∩1∣	1	0	0
∣CNT（u_i，u_j）₂∣	1	0	0

节点对	（u₁，u₄）	（u₅，t₃）	（t₃，u₅）
S	1	1	1
∣CNT（u_i，u_j）₁∣	0	0	0
∣CNU（u_i，u_j）₁∣	2	2	0
∣CNT（u_i，u_j）_1∩2∣	1	0	0
∣CNT（u_i，u_j）_2∩1∣	1	0	0
∣CNT（u_i，u_j）₂∣	1	0	0

v_current	u₂	u₃	u₄	u₅	u₆	t₁	t₂	t₃
u₁	0.19	0.08	0.18	0.05	0.06	0.12	0.23	0.10
t₃	0.00	0.25	0.25	0.25	0.25	0.00	0.00	0.00

v_current	u₂	u₃	u₄	u₅	u₆	t₁	t₂	t₃
u₁	0.19	0.08	0.18	0.05	0.06	0.12	0.23	0.10
t₃	0.00	0.25	0.25	0.25	0.25	0.00	0.00	0.00

v_next	P（v_next∣v_current）	v_next	P（v_next∣v_current）
u₅	0.029 847	u₉	0.025 764
u₆	0.029 360	u₁₀	0.005 381
u₇	0.029 360	u₁₁	0.029 847
u₈	0.034 477	u₁₂	0.020 588

主题关注网络的表示学习

Representation learning for topic-attention network

RichHTML

PDF

可视化

摘要/Abstract

引用本文

使用本文

图/表 14

参考文献 19

相关文章 15

编辑推荐

Metrics

算法	K
算法	5	6	7	8	9	10	11	12	13
DeepWalk	0.546 5	0.552 1	0.545 7	0.530 1	0.551 2	0.565 4	0.578 5	0.482 1	0.574 4
Node2vec	0.552 4	0.582 5	0.594 3	0.563 2	0.605 8	0.635 6	0.622 4	0.627 9	0.626 3
MPNE	0.499 5	0.554 8	0.572 1	0.572 3	0.558 8	0.584 6	0.629 5	0.617 7	0.634 5
PTE	0.551 7	0.586 2	0.593 3	0.587 6	0.602 5	0.639 6	0.628 7	0.639 5	0.645 8
matepath2vec	0.558 3	0.573 6	0.589 3	0.598 1	0.599 4	0.640 3	0.631 2	0.643 1	0.650 2
TANE	0.5665	0.5976	0.5970	0.612 7	0.6092	0.6435	0.6325	0.6568	0.6998

实验组序号m	α	β	γ₁	γ₂	δ	χ₁	χ₂	ε	θ	Q_m
1	0.75	0.25	0.00	0.00	0.00	0	1	0.7	0.3	0.63
2	0.40	0.30	0.10	0.10	0.10	1/3	2/3	0.7	0.3	0.70
3	0.25	0.25	1/6	1/6	1/6	1/2	1/2	0.7	0.3	0.52
4	0.10	0.30	0.20	0.20	0.20	2/3	1/3	0.7	0.3	0.37
5	0.00	0.25	0.25	0.25	0.25	1	0	0.7	0.3	0.16

数据集	α	β	γ₁	γ₂	δ	χ₁	χ₂	ε	θ
Karate	0.35	0.25	0.15	0.15	0.1	2/3	1/3	0.7	0.3
豆瓣	0.40	0.30	0.10	0.10	0.1	2/3	1/3	0.7	0.3

[1]	李顺勇, 李师毅, 胥瑞, 赵兴旺. 基于自注意力融合的不完整多视图聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(9): 2696-2703.
[2]	杜郁, 朱焱. 构建预训练动态图神经网络预测学术合作行为消失[J]. 《计算机应用》唯一官方网站, 2024, 44(9): 2726-2731.
[3]	王清, 赵杰煜, 叶绪伦, 王弄潇. 统一框架的增强深度子空间聚类方法[J]. 《计算机应用》唯一官方网站, 2024, 44(7): 1995-2003.
[4]	黎施彬, 龚俊, 汤圣君. 基于Graph Transformer的半监督异配图表示学习模型[J]. 《计算机应用》唯一官方网站, 2024, 44(6): 1816-1823.
[5]	徐童童, 解滨, 张春昊, 张喜梅. 融合转移概率矩阵的多阶最近邻图聚类算法[J]. 《计算机应用》唯一官方网站, 2024, 44(5): 1527-1538.
[6]	朱云华, 孔兵, 周丽华, 陈红梅, 包崇明. 图对比学习引导的多视图聚类网络[J]. 《计算机应用》唯一官方网站, 2024, 44(10): 3267-3274.
[7]	王春雷, 王肖, 刘凯. 多模态知识图谱表示学习综述[J]. 《计算机应用》唯一官方网站, 2024, 44(1): 1-15.
[8]	黄懿蕊, 罗俊玮, 陈景强. 基于对比学习和GIF标记的多模态对话回复检索[J]. 《计算机应用》唯一官方网站, 2024, 44(1): 32-38.
[9]	王菁怡, 李超, 宋衡, 李迪, 朱俊武. 基于随机游走算法的频谱组合拍卖机制[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2352-2357.
[10]	王静红, 周志霞, 王辉, 李昊康. 双路自编码器的属性网络表示学习[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2338-2344.
[11]	张琨, 杨丰玉, 钟发, 曾广东, 周世健. 基于混合代码表示的源代码脆弱性检测[J]. 《计算机应用》唯一官方网站, 2023, 43(8): 2517-2526.
[12]	张忠平, 郭鑫, 张玉停, 张睿博. 基于全息图平稳分布因子的离群点检测算法[J]. 《计算机应用》唯一官方网站, 2023, 43(6): 1705-1712.
[13]	富坤, 郝玉涵, 孙明磊, 刘赢华. 基于优化图结构自编码器的网络表示学习[J]. 《计算机应用》唯一官方网站, 2023, 43(10): 3054-3061.
[14]	毕以镇, 马焕, 张长青. 增广模态收益动态评估方法[J]. 《计算机应用》唯一官方网站, 2023, 43(10): 3099-3106.
[15]	杜航原, 郝思聪, 王文剑. 结合图自编码器与聚类的半监督表示学习方法[J]. 《计算机应用》唯一官方网站, 2022, 42(9): 2643-2651.

算法	向量维度d	算法	向量维度d
Deepwalk	64	PTE	100
node2vec	128	metapath2vec	100
MPNE	64	TANE	64

算法	向量维度d	算法	向量维度d
Deepwalk	64	PTE	100
node2vec	128	metapath2vec	100
MPNE	64	TANE	64