Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3047-3053.DOI: 10.11772/j.issn.1001-9081.2022101535

• Artificial intelligence • Previous Articles    

Graph summarization algorithm based on node similarity grouping and graph compression

Yu HONG1, Hongchang CHEN2, Jianpeng ZHANG2(), Ruiyang HUANG2   

  1. 1.Cyberspace Security College,Zhengzhou University,Zhengzhou Henan 450002,China
    2.Institute of Information Technology,Information Engineering University,Zhengzhou Henan 450002,China
  • Received:2022-10-17 Revised:2023-01-31 Accepted:2023-01-31 Online:2023-04-12 Published:2023-10-10
  • Contact: Jianpeng ZHANG
  • About author:HONG Yu, born in 1998, M. S. candidate. His research interests include graph data mining.
    CHEN Hongchang, born in 1964, Ph. D., professor. His research interests include big data analysis, communication and information systems.
    HUANG Ruiyang, born in 1986, Ph. D., associate research fellow. His research interests include knowledge graph, text mining.
  • Supported by:
    National Natural Science Foundation of China(62002384);China Postdoctoral Science Foundation(2020M683760)

基于节点相似性分组与图压缩的图摘要算法

宏宇1, 陈鸿昶2, 张建朋2(), 黄瑞阳2   

  1. 1.郑州大学 网络空间安全学院,郑州 450002
    2.信息工程大学 信息技术研究所,郑州 450002
  • 通讯作者: 张建朋
  • 作者简介:宏宇(1998—),男,河北廊坊人,硕士研究生,主要研究方向:图数据挖掘
    陈鸿昶(1964—),男,河南新密人,教授,博士生导师,博士,主要研究方向:大数据分析、通信与信息系统
    黄瑞阳(1986—),男,福建漳州人,副研究员,博士,主要研究方向:知识图谱、文本挖掘。
  • 基金资助:
    国家自然科学基金资助项目(62002384);中国博士后科学基金资助项目(2020M683760)

Abstract:

To solve the problem that the current graph summarization methods have high compression ratios and the graph compression algorithms cannot be directly used in downstream tasks, a fusion algorithm of graph summarization and graph compression was proposed, which called Graph Summarization algorithm based on Node Similarity grouping and graph Compression (GSNSC). Firstly, the nodes were initialized as super nodes, and the super nodes were grouped according to the similarity. Secondly, the super nodes of each group were merged until the specified number of times or nodes were reached. Thirdly, super edges and corrected edges were added between the super nodes for reconstructing the original graph. Finally, for the graph compression part, the cost of compressing and summarizing the adjacent edges of each super node were judged, and the less expensive one in these two was selected to execute. Experiments of graph compression ratio and graph query were conducted on six datasets such as Web-NotreDame, Web-Google and Web-Berkstan. Experimental results on six datasets show that, the proposed algorithm has the compression ratio reduced by at least 23 percentage points compared with SLUGGER (Scalable Lossless sUmmarization of Graphs with HiERarchy) algorithm, and the compression ratio decreased by at least 13 percentage points compared with SWeG (Summarization of Web-scale Graphs) algorithm. Experimental results on Web-NotreDame dataset show that the degree error of the proposed algorithm is reduced by 41.6% compared with that of SWeG algorithm. The above verifies that the proposed algorithm has better graph compression ratio and graph query accuracy.

Key words: graph summarization, graph compression, graph query, super edge, Minimum Description Length (MDL)

摘要:

针对当前图摘要方法压缩率较高,图压缩算法无法直接被用于下游任务分析的问题,提出一种图摘要与图压缩的融合算法,即基于节点相似性分组与图压缩的图摘要算法(GSNSC)。首先,初始化节点为超节点,并根据相似度对超节点分组;其次,将每个组的超节点合并,直到达到指定次数或指定节点数;再次,在超节点之间添加超边和校正边以恢复原始图;最后,对于图压缩部分,判断对每个超节点的邻接边压缩和摘要的代价,并选择二者中代价较小的执行。在Web-NotreDame、Web-Google和Web-Berkstan等6个数据集上进行了图压缩率和图查询实验。实验结果表明,在6个数据集上,与SLUGGER(Scalable Lossless sUmmarization of Graphs with HiERarchy)算法相比,所提算法的压缩率至少降低了23个百分点;与SWeG(Summarization of Web-scale Graphs)算法相比,所提算法的压缩率至少降低了13个百分点;在Web-NotreDame数据集上,所提算法的度误差比SWeG降低了41.6%。以上验证了所提算法具有更好的图压缩率和图查询准确度。

关键词: 图摘要, 图压缩, 图查询, 超边, 最小描述长度

CLC Number: