Journal of Computer Applications

Attribute-based entity alignment algorithm for decentralized stored data in large-scale organization

CAO Zeyi1,2, CHANG Yan1,2,3, LAI Renxin1,2, ZHANG Shibin1,2,3, QIN Zhi1,2,3, YAN Lili1,2,3, ZHANG Xuejian1,2, DI Yuanhao1,2   

  1. 1. School of Cybersecurity (Xin Gu Industrial College), Chengdu University of Information Technology
    2. Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Chengdu University of Information Technology)
    3. SUGON Industrial Control and Security Center
  • Received: 2024-09-28 Revised: 2024-12-19 Online: 2025-03-14 Published: 2025-03-14
  • About the authors: CAO Zeyi, born in 1998, M. S. candidate. His research interests include data mining, data fusion, knowledge graph, entity alignment. CHANG Yan, born in 1979, Ph. D., professor. Her research interests include quantum computing, information security, blockchain. LAI Renxin, born in 1998, M. S. candidate. His research interests include data mining. ZHANG Shibin, born in 1971, Ph. D., professor. His research interests include quantum computing, information security, blockchain. QIN Zhi, born in 1977, M. S., associate professor. His research interests include quantum computing, information security, blockchain. YAN Lili, born in 1980, Ph. D., professor. Her research interests include quantum computing, information security. ZHANG Xuejian, born in 2000, M. S. candidate. His research interests include quantum computing, information security. DI Yuanhao, born in 2000, M. S. candidate. His research interests include data mining.
  • Supported by:
    National Key Research and Development Plan of China (2022YFB3103103); National Natural Science Foundation of China (62272068, 62076042, 62102049); Sichuan Science and Technology Program (2023YFS0419); Key Research and Development Support Plan of Chengdu (2021-YF09-00114-GX, 2019-YF05 02028-GX); Key Research and Development Project of Sichuan Province (2021YFSY0012, 2020YFG0307, 2021YFG0332)

面向大规模机构分散存储数据的基于属性的实体对齐算法

曹泽毅1,2,昌燕1,2,3,赖仁鑫1,2,张仕斌1,2,3,秦智1,2,3,闫丽丽1,2,3,张雪健1,2,狄元灏1,2   

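  1. 1. School of Cybersecurity (Xin Gu Industrial College), Chengdu University of Information Technology 2. Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Chengdu University of Information Technology) 3. National Engineering Research Center for Advanced Microprocessor Technology (Industrial Control and Security Sub-center)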
  • Corresponding author: CAO Zeyi
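  • About the authors: CAO Zeyi, born in 1998, male, from Chengdu, Sichuan, M. S. candidate. His research interests include data mining, data fusion, knowledge graph, entity alignment. CHANG Yan, born in 1979, female, from Alxa, Inner Mongolia, Ph. D., professor, CCF member. Her research interests include quantum computing, information security, blockchain. LAI Renxin, born in 1998, male, from Deyang, Sichuan, M. S. candidate. His research interests include data mining. ZHANG Shibin, born in 1971, male, from Chongqing, Ph. D., professor. His research interests include quantum computing, information security, blockchain. QIN Zhi, born in 1977, male, from Ziyang, Sichuan, M. S., associate professor. His research interests include network and information security, blockchain, Internet of Things. YAN Lili, born in 1980, female, from Chengdu, Sichuan, Ph. D., professor. Her research interests include quantum computing, information security. ZHANG Xuejian, born in 2000, male, from Tangshan, Hebei, M. S. candidate. His research interests include quantum computing, information security. DI Yuanhao, born in 2000, male, from Yibin, Sichuan, M. S. candidate. His research interests include data mining.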
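  • Supported by:
    National Key Research and Development Plan of China (2022YFB3103103); National Natural Science Foundation of China (62272068, 62076042, 62102049); Sichuan Science and Technology Program (2023YFS0419); Key Research and Development Support Plan of Chengdu (2021-YF09-00114-GX, 2019-YF005 02028-GX); Key Research and Development Project of Sichuan Province (2021YFSY0012, 2020YFG0307, 2021YFG0332)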

Abstract: Data entities stored in a decentralized manner across large-scale organizations suffer from redundancy, missing information, and inconsistency, and must be integrated through entity alignment. Most existing entity alignment methods rely on the structural information of entities and align them through subgraph matching; however, decentralized stored data carries little structural information, so these methods align poorly. To address this problem and support the identification of important data, an attribute-based entity alignment model built on a single-layer graph neural network is proposed. Firstly, the model uses a single-layer graph neural network to avoid interference from second-order neighbor nodes. Secondly, an attribute weighting method based on information entropy is designed to quickly distinguish the importance of attributes at the initial stage. Finally, an attention-based encoder is constructed to represent the importance of different attributes in alignment from both local and global perspectives, giving a more comprehensive representation of entity information. Experimental results show that, on two decentralized storage datasets, the proposed model improves accuracy by 5.24 and 2.03 percentage points, respectively, over the best results of the baseline models, outperforming other entity alignment methods.
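
The entropy-based attribute weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: the function names `attribute_entropy` and `entropy_weights`, the toy records, and the choice to give higher weight to attributes whose values have higher Shannon entropy (on the reasoning that more varied values discriminate better between entities) and to normalize the weights so they sum to 1. It is not the authors' implementation.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of the non-missing values of one attribute."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    counts = Counter(present)
    total = len(present)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_weights(entities, attributes):
    """Turn per-attribute entropies into weights that sum to 1 (assumed scheme)."""
    entropies = {
        a: attribute_entropy([e.get(a) for e in entities]) for a in attributes
    }
    total = sum(entropies.values())
    if total == 0:
        # Every attribute is constant or missing: fall back to uniform weights.
        return {a: 1.0 / len(attributes) for a in attributes}
    return {a: h / total for a, h in entropies.items()}


# Hypothetical toy records standing in for decentralized stored entities.
entities = [
    {"name": "Alice", "city": "Chengdu", "gender": "F"},
    {"name": "Bob", "city": "Chengdu", "gender": "M"},
    {"name": "Carol", "city": "Deyang", "gender": "F"},
    {"name": "Dave", "city": "Chengdu", "gender": "M"},
]
weights = entropy_weights(entities, ["name", "city", "gender"])
```

In this toy data, `name` takes four distinct values and so receives the largest weight, while `city` is dominated by one value and receives the smallest. In a full model such weights would only serve as an initial importance estimate, which the attention-based encoder could then refine.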

Key words: important data identification, data fusion, information entropy, entity alignment, attention mechanism

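Abstract: Data entities stored in a decentralized manner by large-scale organizations suffer from redundancy, missing values, and inconsistency, and need to be integrated and fused through entity alignment. Most existing entity alignment methods depend on the structural information of entities and align them through subgraph matching, but decentralized stored data lacks structural information, which leads to poor alignment. To solve these problems and support the identification of important data, an attribute-based entity alignment model using a single-layer graph neural network is proposed. First, the model uses a single-layer graph neural network to avoid interference from the information of second-order neighbor nodes. Second, an attribute weighting method based on information entropy is designed to quickly distinguish the importance of attributes at the initial stage. Finally, an encoder based on the attention mechanism is constructed, which combines local and global perspectives to represent the importance of different attributes in alignment and represents entity information more comprehensively. Experimental results show that, on two decentralized storage datasets, the accuracy of the proposed model is 5.24 and 2.03 percentage points higher, respectively, than the best results among the baseline models, and its alignment performance is better than that of other entity alignment methods.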

Key words: important data identification, data fusion, information entropy, entity alignment, attention mechanism
