Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (6): 1855-1861. DOI: 10.11772/j.issn.1001-9081.2023050702

Special Issue: Data science and technology

• Data science and technology •

Academic anomaly citation group detection based on local extended community detection

Xinrui LIN, Xiaofei WANG, Yan ZHU

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
  • Received: 2023-06-05 Revised: 2023-06-28 Accepted: 2023-07-19 Online: 2023-08-01 Published: 2024-06-10
  • Contact: Yan ZHU
  • About the authors: LIN Xinrui, born in 2001, M. S. candidate. Her research interests include software engineering.
    WANG Xiaofei, born in 1997, M. S. candidate. Her research interests include graph anomaly detection.
  • Supported by:
    Sichuan Science and Technology Program (2019YFSY0032)

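Chinese title (translated): Detection of academic anomaly citation groups based on local extended community detection

LIN Xinrui (林欣蕊), WANG Xiaofei (王晓菲), ZHU Yan (朱焱)

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756
  • Corresponding author: ZHU Yan (朱焱)
  • About the authors: LIN Xinrui (born 2001), female, from Dazhou, Sichuan, M. S. candidate; main research interest: software engineering.
    WANG Xiaofei (born 1997), female, from Yantai, Shandong, M. S. candidate; main research interest: graph anomaly detection.
  • Funding:
    Sichuan Science and Technology Program (2019YFSY0032)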

Abstract:

Some scholars in academic social networks may form anomaly citation groups and excessively cite each other's papers for profit. Most existing anomaly group detection algorithms separate community detection from node representation learning, which limits detection performance. To address this issue, a Group Anomaly Detection algorithm based on Local extended community detection (GADL) was proposed. Author anomaly citation features were extracted from semantic information such as the research field and title content of papers. An extension metric function was defined based on node transition similarity, node community membership, citation anomaly and Breadth First Search (BFS) depth. Anomaly community detection and anomaly node detection were combined and jointly optimized in a unified framework to obtain the best anomaly detection performance. Compared with the ALP algorithm, the proposed algorithm improved the Area Under Curve (AUC) by 6.07%, 5.35% and 3.38% on the ACM, DBLP1 and DBLP2 datasets, respectively. Experimental results on real datasets show that GADL can effectively detect academic anomaly citations.
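
The abstract names the four ingredients of the extension metric function but does not give its formula. The following is only a minimal Python sketch of how a seed-based local community expansion of this kind might look, not the GADL implementation. Everything specific in it is an assumption made for illustration: the function name `expand_community`, the linear weighting, the threshold, the use of neighbourhood Jaccard overlap as a stand-in for node transition similarity, and the toy graph and anomaly scores.

```python
from collections import deque


def expand_community(adj, anomaly, seed, max_depth=2, threshold=0.5,
                     weights=(0.3, 0.3, 0.3, 0.1)):
    """Grow a local community from `seed` by BFS (hypothetical scoring).

    adj     : dict mapping each author to the set of neighbouring authors
    anomaly : dict mapping each author to a citation anomaly score in [0, 1]
    """
    w_sim, w_mem, w_anom, w_depth = weights  # assumed weights, not from the paper
    community = {seed}
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for cand in sorted(adj[node]):
            if cand in visited:
                continue
            visited.add(cand)
            nbrs = adj[cand]
            # Stand-in for node transition similarity: Jaccard overlap
            # between the neighbourhoods of the candidate and its parent.
            union = nbrs | adj[node]
            sim = len(nbrs & adj[node]) / len(union) if union else 0.0
            # Community membership: share of the candidate's neighbours
            # that already belong to the community.
            mem = len(nbrs & community) / len(nbrs) if nbrs else 0.0
            # BFS depth term: candidates further from the seed score lower.
            cand_depth = depth + 1
            depth_term = 1.0 / cand_depth
            score = (w_sim * sim + w_mem * mem
                     + w_anom * anomaly.get(cand, 0.0)
                     + w_depth * depth_term)
            if score >= threshold:
                community.add(cand)
                queue.append((cand, cand_depth))
    return community


# Toy co-citation graph: a, b, c cite each other heavily; d and e are peripheral.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
anomaly = {"a": 0.9, "b": 0.9, "c": 0.8, "d": 0.1, "e": 0.0}
group = expand_community(adj, anomaly, "a")
print(sorted(group))
```

In this toy example the tightly connected, high-anomaly authors are admitted, while the peripheral author with a low anomaly score falls below the threshold. In the framework the abstract describes, the node-level anomaly scores would be learned jointly with the community expansion rather than supplied as fixed inputs as they are here.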

Key words: academic social network, graph anomaly detection, academic anomaly citation, Graph Neural Network (GNN), local extended community detection

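Abstract (translated from the Chinese):

Some scholars in academic social networks may form anomaly citation groups and excessively cite one another's papers for profit. Most existing anomaly group detection algorithms separate community detection from node representation learning, which limits the final detection performance. To address this, a Group Anomaly Detection algorithm based on Local extended community detection (GADL) is proposed. The algorithm extracts author anomaly citation features from semantic information such as the research field and title content of papers; defines an extension metric function based on node transition similarity, node community membership, citation anomaly degree and Breadth First Search (BFS) depth; and combines anomaly community detection with anomaly node detection, jointly optimizing the two in a unified framework to obtain the best anomaly detection performance. On the ACM, DBLP1 and DBLP2 datasets, the proposed algorithm improves on the ALP algorithm by 6.07%, 5.35% and 3.38%, respectively. Experimental results on real datasets show that the proposed algorithm can effectively detect anomalous academic citations.

Key words (translated from the Chinese): academic social network, graph anomaly detection, academic anomaly citation, graph neural network, local extended community detection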

CLC Number: