Journal of Computer Applications


Topic-prior-guided dual-context entity alignment model

  

  • Received: 2025-12-22; Revised: 2026-03-03; Online: 2026-03-20; Published: 2026-03-20

主题先验引导的双上下文实体对齐模型

翟社平,杨乐童,刘雪,杨锐   

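  1. Xi'an University of Posts and Telecommunications
  • Corresponding author: 杨乐童
  • Supported by:
    National Natural Science Foundation of China; National Undergraduate Innovation and Entrepreneurship Training Program; Shaanxi Provincial Undergraduate Innovation and Entrepreneurship Training Program; Key Research and Development Program of Shaanxi Province; Key Research and Development Program of Shaanxi Province; Scientific Research Program of the Shaanxi Provincial Department of Education; Shaanxi Provincial Social Science Foundation; Communication Soft Science Project of the Ministry of Industry and Information Technology; Xi'an Social Science Planning Fund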

Abstract: Entity alignment (EA) across multi-source heterogeneous knowledge graphs (KGs) aims to identify semantically equivalent entities in different graphs. Two problems were addressed: hard negative samples (entities that are highly similar in semantics but not equivalent) are difficult to separate, and relying only on local neighborhood structure or textual similarity causes matching errors. To this end, a Topic-Prior-guided Dual-Context entity alignment model (TPDC) was proposed. First, a topic model was built from entity attribute texts to derive entity-level topic distributions, which served as topic priors to guide candidate pool construction and negative-sample difficulty stratification, thereby constraining the alignment search space to a semantically concentrated candidate subspace. Second, a dual-context encoding network with a neighbor-context branch and a relation-context branch was designed to jointly capture the fine-grained structural semantics of multi-hop neighbors and relation paths. Finally, a curriculum contrastive learning strategy was introduced that gradually raises the sampling ratio and loss weight of hard negatives in an easy-to-hard order, improving discrimination in late training. Experimental results on the three subsets of DBP15K show that, relative to the second-best baseline on each subset, Hits at 1 (Hits@1) increased by 0.04 and 5.18 percentage points on two of the subsets, Hits at 10 (Hits@10) increased by 0.32, 1.13, and 0.05 percentage points, and Mean Reciprocal Rank (MRR) increased by 0.026, 0.106, and 0.053, indicating better overall ranking quality and robust handling of hard negatives.

Key words: Knowledge Graph (KG), Entity Alignment (EA), Topic Prior, Path-Enhanced Dual-Context, Curriculum Contrastive Learning
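The abstract reports results as Hits@1, Hits@10, and MRR. As a reminder of how these ranking metrics are conventionally defined, the following is a minimal sketch; the function names and the example ranks are illustrative assumptions, not the paper's evaluation code or data. Each rank is the 1-based position of the correct counterpart entity in the model's candidate ranking.

```python
def hits_at_k(ranks, k):
    """Fraction of test entities whose correct counterpart is ranked in the top k.

    `ranks` holds 1-based ranks (hypothetical input format).
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Made-up ranks for five test entities, for illustration only.
example_ranks = [1, 3, 2, 15, 1]
print(hits_at_k(example_ranks, 1))
print(hits_at_k(example_ranks, 10))
print(mean_reciprocal_rank(example_ranks))
```

Hits@k is usually quoted as a percentage, which is why the gains above are stated in percentage points, whereas MRR is a value in (0, 1] and its gains are stated as raw differences.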

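Abstract (Chinese version): Entity alignment (EA) in multi-source heterogeneous knowledge graphs (KGs) is a key task for identifying semantically equivalent entities across graphs. In practical scenarios, entities that are highly similar in semantics but not equivalent tend to form typical negative samples, and relying only on local neighborhood structure or textual similarity easily leads to matching errors and makes the decision boundary hard to delimit precisely. To address these problems, a topic-prior-guided dual-context entity alignment model (TPDC) was proposed. A topic model was built from entity attribute texts to generate entity-level topic distributions, which served as topic priors to guide candidate pool construction and negative-sample difficulty grading, constraining the alignment search space, at the global semantic level, to a semantically concentrated candidate subspace. A dual-context encoding network composed of a neighbor-context branch and a relation-context branch was then designed to jointly model the fine-grained structural semantics of multi-hop neighbors and relation paths. Finally, a curriculum contrastive learning strategy was introduced that gradually raises the sampling ratio of hard negative samples and increases their loss weight in an easy-to-hard order, so that in late training the model focuses more on distinguishing hard negatives that are semantically close but not equivalent. Experimental results on the three subsets of DBP15K show that Hits@1 improved by 0.04 and 5.18 percentage points over the second-best baseline on two of the subsets, and Hits@10 improved by 0.32, 1.13, and 0.05 percentage points over the second-best baseline on each subset. In addition, Mean Reciprocal Rank (MRR) improved by 0.026, 0.106, and 0.053 over the second-best baseline on each subset, further verifying the advantage of TPDC in overall ranking quality and demonstrating its effectiveness and robustness in handling hard negative samples.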

Key words (Chinese version): knowledge graph, entity alignment, topic prior, path-enhanced dual-context, curriculum contrastive learning
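The abstract describes two mechanisms that a small example may make more concrete: grading negative samples by difficulty using topic priors, and raising the share of hard negatives from easy to hard during training. The sketch below is only one possible reading. The abstract does not specify the similarity measure, the threshold, or the schedule shape, so the cosine similarity, the `hard_threshold` value, the linear schedule, and all function names here are assumptions made for illustration.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def stratify_negatives(anchor_topics, candidates, hard_threshold=0.8):
    """Split candidate negatives into (easy, hard) by topic similarity to the anchor.

    `candidates` maps an entity name to its topic distribution.
    The threshold is a hypothetical value, not taken from the paper.
    """
    easy, hard = [], []
    for name, topics in candidates.items():
        if cosine(anchor_topics, topics) >= hard_threshold:
            hard.append(name)
        else:
            easy.append(name)
    return easy, hard


def hard_ratio(epoch, total_epochs, start=0.1, end=0.7):
    """Hard-negative sampling ratio, rising linearly over training (assumed schedule)."""
    progress = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * progress


# Illustrative three-topic distributions; the entity names are made up.
anchor = [0.7, 0.2, 0.1]
pool = {"near_duplicate": [0.6, 0.3, 0.1], "unrelated": [0.05, 0.05, 0.9]}
easy, hard = stratify_negatives(anchor, pool)
print(easy, hard)
print([round(hard_ratio(e, 5), 2) for e in range(5)])
```

A loss weight for hard negatives could follow the same kind of schedule; whether TPDC uses a linear, stepwise, or other progression is not stated in the abstract.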
