计算机应用 ›› 2020, Vol. 40 ›› Issue (10): 2929-2935.DOI: 10.11772/j.issn.1001-9081.2020020207

• 数据科学与技术 • 上一篇    下一篇

基于网络嵌入的稀疏子图发现算法

孙鹤立1,2, 何亮1, 何方1, 孙苗苗1, 贾晓琳1   

  1. 1. 西安交通大学 计算机科学与技术学院, 西安 710049;
    2. 西安交通大学 新闻与新媒体学院, 西安 710049
  • 收稿日期:2020-02-28 修回日期:2020-04-26 出版日期:2020-10-10 发布日期:2020-05-21
  • 通讯作者: 贾晓琳
  • 作者简介:孙鹤立(1983-),女,辽宁铁岭人,副教授,博士生导师,博士,主要研究方向:大规模图数据挖掘、复杂网络分析、社会网络分析、道路网络、移动数据挖掘;何亮(1975-),男,陕西西安人,讲师,博士,主要研究方向:数据挖掘、软件工程;何方(1994-),男,湖北黄冈人,硕士,主要研究方向:图数据挖掘、复杂网络分析;孙苗苗(1995-),女,河南驻马店人,硕士研究生,主要研究方向:图数据挖掘、复杂网络分析;贾晓琳(1963-),女,陕西西安人,高级工程师,博士,主要研究方向:数据挖掘、软件工程。
  • 基金资助:
    国家自然科学基金资助项目(61672417)。

Network embedding based tenuous subgraph finding

SUN Heli1,2, HE Liang1, HE Fang1, SUN Miaomiao1, JIA Xiaolin1   

  1. 1. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an Shaanxi 710049, China;
    2. School of Journalism and New Media, Xi'an Jiaotong University, Xi'an Shaanxi 710049, China
  • Received:2020-02-28 Revised:2020-04-26 Online:2020-10-10 Published:2020-05-21
  • Supported by:
    This work is partially supported by theFoundation:National Natural Science Foundation of China (61672417).

摘要: 针对稀疏子图发现问题中使用高维稀疏向量表示网络信息存在的时间和空间消耗大的问题,提出一种基于网络嵌入的稀疏子图发现(TGF)算法。该算法首先通过网络嵌入的方法将网络结构映射到低维空间中,得到节点的低维向量表示;然后定义向量空间中的稀疏子集发现问题,将稀疏子图发现问题转化为稀疏子集发现问题;迭代搜索局部密度最低的样本点并对其进行扩张,最终找到一个满足条件的最大稀疏子集。实验结果表明,在Synthetic_1000数据集上与TERA(Triangle and Edge Reduction Algorithm)和WK(Weight of K-hop)算法相比,TGF算法的搜索效率是TERA的1 353倍,是WK算法的4倍,并且在k-line、k-triangle和k-density指标上也取得了较优的结果。

关键词: 社交网络, 图挖掘, 网络嵌入, 稀疏子图, 弱社交

Abstract: Concerning the high time and space complexity caused by using high-dimensional tenuous vectors to represent network information in tenuous subgraph finding problem, a Tenuous subGraph Finding (TGF) algorithm based on network embedding was proposed. Firstly, the network structure was mapped to the low-dimensional space by the network embedding method in order to obtain the low-dimensional vector representation of nodes. Then, the tenuous subset finding problem in the vector space was defined, and the tenuous subgraph finding problem was transformed into the tenuous subset finding problem. Finally, the sample points with lowest local density were searched iteratively and expanded to figure out the largest tenuous subset satisfying the given conditions. Experimental results on Synthetic_1000 dataset show that, the search efficiency of TGF algorithm is 1 353 times that of Triangle and Edge Reduction Algorithm (TERA) and 4 times of that Weight of K-hop (WK) algorithm, and it achieves better results in k-line, k-triangle and k-density indexes

Key words: social network, graph mining, network embedding, tenuous subgraph, weak social interation

中图分类号: