Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (10): 3195-3202. DOI: 10.11772/j.issn.1001-9081.2024091388

• Data Science and Technology •

Attribute-based entity alignment algorithm for data stored in a decentralized manner across large-scale institutions

CAO Zeyi1,2, CHANG Yan1,2,3, LAI Renxin1,2, ZHANG Shibin1,2,3, QIN Zhi1,2,3, YAN Lili1,2,3, ZHANG Xuejian1,2, DI Yuanhao1,2

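  1. School of Cybersecurity (Xin Gu Industrial College), Chengdu University of Information Technology, Chengdu 610054
    2. Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Chengdu University of Information Technology), Chengdu 610054
    3. National Engineering Research Center for Advanced Microprocessor Technology (Industrial Control and Security Branch Center), Chengdu 610225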
  • Received: 2024-10-07 Revised: 2024-12-19 Accepted: 2024-12-20 Online: 2025-03-14 Published: 2025-10-10
  • Corresponding author: CHANG Yan
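  • About the authors: CAO Zeyi (born in 1998), male, from Chengdu, Sichuan, M.S. candidate. Research interests: data mining, data fusion, knowledge graphs, entity alignment.
    CHANG Yan (born in 1979), female, from Alxa, Inner Mongolia, professor, Ph.D., CCF member. Research interests: quantum computing, information security, blockchain. Email: cyttkl@cuit.edu.cn
    LAI Renxin (born in 1998), male, from Deyang, Sichuan, M.S. candidate. Research interests: data mining.
    ZHANG Shibin (born in 1971), male, from Chongqing, professor, Ph.D. Research interests: quantum computing, information security, blockchain.
    QIN Zhi (born in 1977), male, from Ziyang, Sichuan, associate professor, M.S. Research interests: network and information security, blockchain, Internet of Things.
    YAN Lili (born in 1980), female, from Chengdu, Sichuan, professor, Ph.D. Research interests: quantum computing, information security.
    ZHANG Xuejian (born in 2000), male, from Tangshan, Hebei, M.S. candidate. Research interests: quantum computing, information security.
    DI Yuanhao (born in 2000), male, from Yibin, Sichuan, M.S. candidate. Research interests: data mining.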
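  • Funding:
    National Key Research and Development Program of China (2022YFB3103103); National Natural Science Foundation of China (62272068); National Natural Science Foundation of China (62076042); National Natural Science Foundation of China (62102049); Sichuan Science and Technology Program (2023YFS0419); Key Research and Development Support Program of Chengdu (2021-YF09-00114-GX); Key Research and Development Support Program of Chengdu (2019-YF005-02028-GX); Key Research and Development Program of Sichuan Province (2021YFSY0012); Key Research and Development Program of Sichuan Province (2020YFG0307); Key Research and Development Program of Sichuan Province (2021YFG0332)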

Attribute-based entity alignment algorithm for decentralized data storage in large-scale institutions

Zeyi CAO1,2, Yan CHANG1,2,3, Renxin LAI1,2, Shibin ZHANG1,2,3, Zhi QIN1,2,3, Lili YAN1,2,3, Xuejian ZHANG1,2, Yuanhao DI1,2

  1. School of Cybersecurity (Xin Gu Industrial College), Chengdu University of Information Technology, Chengdu Sichuan 610054, China
    2. Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Chengdu University of Information Technology), Chengdu Sichuan 610054, China
    3. SUGON Industrial Control and Security Center, Chengdu Sichuan 610225, China
  • Received:2024-10-07 Revised:2024-12-19 Accepted:2024-12-20 Online:2025-03-14 Published:2025-10-10
  • Contact: Yan CHANG
  • About author:CAO Zeyi, born in 1998, M. S. candidate. His research interests include data mining, data fusion, knowledge graph, entity alignment.
    CHANG Yan, born in 1979, Ph. D., professor. Her research interests include quantum computing, information security, blockchain.
    LAI Renxin, born in 1998, M. S. candidate. His research interests include data mining.
    ZHANG Shibin, born in 1971, Ph. D., professor. His research interests include quantum computing, information security, blockchain.
    QIN Zhi, born in 1977, M. S., associate professor. His research interests include network and information security, blockchain, internet of things.
    YAN Lili, born in 1980, Ph. D., professor. Her research interests include quantum computing, information security.
    ZHANG Xuejian, born in 2000, M. S. candidate. His research interests include quantum computing, information security.
    DI Yuanhao, born in 2000, M. S. candidate. His research interests include data mining.
  • Supported by:
    National Key Research and Development Plan of China (2022YFB3103103); National Natural Science Foundation of China (62272068); Sichuan Science and Technology Program (2023YFS0419); Key Research and Development Support Program of Chengdu (2021-YF09-00114-GX); Key Research and Development Program of Sichuan Province (2021YFSY0012)

Abstract (translated from the Chinese version):

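Data entities held in decentralized storage across large-scale institutions suffer from redundancy, missing information, and inconsistency, and must be integrated and fused through entity alignment. Most existing entity alignment methods rely on the structural information of entities and align them through subgraph matching, but decentrally stored data carries little structural information, so alignment quality is poor. To solve this problem and to support the identification of important data, an attribute-based entity alignment model built on a single-layer graph neural network is proposed. First, a single-layer graph neural network is used to avoid interference from second-order neighbor nodes. Second, an attribute weighting method based on information entropy is designed so that the importance of attributes can be distinguished quickly at the initial stage. Finally, an attention-based encoder is constructed that combines local and global perspectives to capture the importance of different attributes in alignment, representing entity information more comprehensively. Experimental results show that on two decentralized-storage datasets, the proposed model raises the top-1 hit rate (Hits@1) by 5.24 and 2.03 percentage points, respectively, over the second-best model. The proposed model therefore aligns entities better than the other entity alignment methods compared.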
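
The abstract names an information-entropy-based attribute weighting step but does not give its formula. The following is only a minimal sketch of one plausible reading, under the assumption that an attribute whose values are more evenly spread across entities (higher Shannon entropy) is more discriminative and so receives a larger initial weight. The function names `attribute_entropy` and `entropy_weights`, the record layout, and the sample records are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of one attribute's values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_weights(records):
    """Assign each attribute a weight proportional to its entropy (assumed scheme).

    records: list of dicts mapping attribute name -> value; None marks a missing value.
    Returns a dict of weights that sum to 1 (uniform if every entropy is zero).
    """
    columns = {}
    for rec in records:
        for name, value in rec.items():
            if value is not None:  # skip missing values
                columns.setdefault(name, []).append(value)
    entropies = {name: attribute_entropy(vals) for name, vals in columns.items()}
    total = sum(entropies.values())
    if total == 0:
        return {name: 1.0 / len(entropies) for name in entropies}
    return {name: h / total for name, h in entropies.items()}


# Hypothetical toy records: "country" is constant, "id_no" is unique per entity.
records = [
    {"name": "Alice", "country": "CN", "id_no": "A1"},
    {"name": "Bob", "country": "CN", "id_no": "B2"},
    {"name": "Alice", "country": "CN", "id_no": "C3"},
    {"name": "Dan", "country": "CN", "id_no": "D4"},
]
weights = entropy_weights(records)
```

In this toy data the constant attribute carries no information and gets zero weight, while the unique identifier gets the largest weight. The model described in the abstract refines attribute importance further with an attention-based encoder; that stage is not sketched here.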

Keywords: important data identification, data fusion, information entropy, entity alignment, attention mechanism

Abstract:

Data entities stored in a decentralized manner across large-scale institutions suffer from redundancy, missing information, and inconsistency, and therefore require integration through entity alignment. Most existing entity alignment methods rely on the structural information of entities and perform alignment through subgraph matching. However, decentrally stored data lacks structural information, which leads to poor alignment results. To address this issue and support the identification of important data, an attribute-based entity alignment model built on a single-layer graph neural network was proposed. Firstly, a single-layer graph neural network was used to avoid interference from second-order neighbor nodes. Secondly, an attribute weighting method based on information entropy was designed to distinguish the importance of attributes quickly in the initial stage. Finally, an attention-based encoder was constructed to represent the importance of different attributes in alignment from both local and global perspectives, thereby representing entity information more comprehensively. Experimental results show that on two decentralized-storage datasets, the proposed model improves Hits@1 by 5.24 and 2.03 percentage points, respectively, over the second-best model, demonstrating better alignment performance than the other entity alignment methods compared.
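
For readers unfamiliar with the metric, Hits@1 is the fraction of source entities whose correct counterpart is ranked first among all candidates. The sketch below computes Hits@k from a similarity matrix; the function name `hits_at_k`, the matrix layout, and the toy scores are illustrative assumptions, not the authors' evaluation code or data.

```python
def hits_at_k(similarity, gold, k=1):
    """Fraction of source entities whose gold target is among the k highest-scored candidates.

    similarity[i][j]: score between source entity i and candidate target j.
    gold[i]: index of the correct target for source entity i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from highest to lowest similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold[i] in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Hypothetical scores for three source entities and three candidates.
similarity = [
    [0.9, 0.1, 0.2],
    [0.3, 0.4, 0.8],
    [0.2, 0.7, 0.6],
]
gold = [0, 2, 2]

hits1 = hits_at_k(similarity, gold, k=1)  # two of three ranked first
hits2 = hits_at_k(similarity, gold, k=2)  # all three within the top two
```

A gain reported in percentage points is the absolute difference between two such rates multiplied by 100, not a relative improvement.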

Key words: important data identification, data fusion, information entropy, entity alignment, attention mechanism

CLC number: