Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (8): 2139-2143. DOI: 10.11772/j.issn.1001-9081.2016.08.2139

• The 6th China Conference on Data Mining (CCDM 2016)

Deep Web entity matching method using twice-merging

CHEN Lijun

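  1. Network Communication Institute, Zhejiang Yuexiu University of Foreign Languages, Shaoxing, Zhejiang 312000, China
  • Received: 2016-03-01 Revised: 2016-04-25 Published: 2016-08-10 Online: 2016-08-10
  • Corresponding author: CHEN Lijun
  • About the author: CHEN Lijun (born 1979), female, from Yueqing, Zhejiang; lecturer, M.S. Main research interests: Deep Web data mining, intelligent information processing, and educational information technology.
  • Supported by:
    National Education Information Technology Research Project (136241401); Scientific Research Project of Zhejiang Yuexiu University of Foreign Languages (N201375).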

Deep Web entity matching method based on twice-merging

CHEN Lijun   

  1. Network Communication Institute, Zhejiang Yuexiu University of Foreign Languages, Shaoxing, Zhejiang 312000, China
  • Received: 2016-03-01 Revised: 2016-04-25 Online: 2016-08-10 Published: 2016-08-10
  • Supported by:
    This work is partially supported by the National Education Information Technology Research Project (136241401) and the Scientific Research Project of Zhejiang Yuexiu University of Foreign Languages (N201375).

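Abstract: To address the shortcomings of the Weighted Edge Pruning (WEP) method in accuracy and matching efficiency, a Deep Web entity matching method based on twice-merging is proposed, which introduces the concepts of self-matching and merging. First, the attribute values of each object are extracted and the objects are regrouped by attribute value, so that objects sharing an attribute value are gathered together and the blocks are partitioned effectively. Second, the matching degree between objects within each block is computed and used for pruning, self-matching detection, and merging, which yields preliminary clusters. Finally, starting from the preliminary clusters, the messages passed between objects within a cluster and the attribute similarity values of the objects are used to mine further matching relationships, triggering a new round of cluster merging and updating. Experimental results show that, compared with WEP, the proposed method raises matching accuracy by using self-matching detection to distinguish matching relationships automatically and to select a suitable matching strategy, so that the merging process is refined step by step; it also raises running efficiency, because blocking and pruning effectively shrink the matching space.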

Keywords: twice-merging, Deep Web, entity matching, cluster, similarity value

Abstract: To address the limitations of the Weighted Edge Pruning (WEP) method in accuracy and matching efficiency, a Deep Web entity matching method based on twice-merging was proposed by introducing the concepts of self-matching and merging. Firstly, the attribute values of each object were extracted and the objects were regrouped so that objects with the same attribute value were gathered together; in this way, all objects could be divided into blocks efficiently. Secondly, the matching values between objects within the same block were calculated and used for pruning, self-matching detection, and merging of explicit matches to generate preliminary clusters. Finally, based on these preliminary clusters, further matching relationships were discovered by using the messages passed between objects within a cluster and the objects' attribute similarity values, which triggered a new round of cluster merging and updating. Experimental results show that, compared with the WEP method, the proposed method improves matching accuracy: self-matching detection automatically distinguishes matching relationships and selects a suitable matching strategy, so the merging process is refined gradually. It also improves system efficiency, because blocking and pruning effectively reduce the matching space.

Key words: twice-merging, Deep Web, entity matching, cluster, similarity value
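
The three stages named in the abstract (blocking by attribute value, in-block pruning and merging, and a second round of cluster merging) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the matching formula, the self-matching test, or the message-passing rule, so the sketch assumes Jaccard similarity over attribute-value sets, uses a plain threshold for pruning, and replaces the message-passing step with a mean cross-cluster similarity test. All function names, the `DisjointSet` helper, and the thresholds `prune_below` and `merge_at` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two attribute-value sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_blocks(objects):
    """Stage 1: group object ids that share an attribute value."""
    blocks = defaultdict(set)
    for obj_id, values in objects.items():
        for value in values:
            blocks[value].add(obj_id)
    return blocks


class DisjointSet:
    """Union-find structure used to merge matched items into clusters."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def clusters(self):
        groups = defaultdict(set)
        for item in self.parent:
            groups[self.find(item)].add(item)
        return list(groups.values())


def first_merge(objects, blocks, prune_below=0.7):
    """Stage 2: score pairs inside each block, prune weak pairs, merge the rest."""
    ds = DisjointSet(objects)
    for members in blocks.values():
        for a, b in combinations(sorted(members), 2):
            if jaccard(objects[a], objects[b]) >= prune_below:
                ds.union(a, b)
    return ds.clusters()


def second_merge(objects, clusters, merge_at=0.45):
    """Stage 3 (simplified stand-in): merge clusters with high mean similarity."""
    ds = DisjointSet(range(len(clusters)))
    for i, j in combinations(range(len(clusters)), 2):
        scores = [jaccard(objects[a], objects[b])
                  for a in clusters[i] for b in clusters[j]]
        if sum(scores) / len(scores) >= merge_at:
            ds.union(i, j)
    return [set().union(*(clusters[k] for k in group))
            for group in ds.clusters()]
```

Here `objects` is assumed to be a dictionary that maps each object id to the set of its attribute values. Calling `build_blocks`, `first_merge`, and `second_merge` in that order returns the final clusters as sets of ids. The looser second threshold lets objects that narrowly missed the first cut join a cluster once that cluster has formed, which is the intuition behind merging twice.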

CLC number: