Journal of Computer Applications, 2012, Vol. 32, Issue (06): 1688-1691. DOI: 10.3724/SP.J.1087.2012.01688

• Artificial Intelligence •

Deep Web query interface schema matching based on matching degree and semantic similarity

FENG Yong1,2, ZHANG Yang1,3

  1. College of Computer Science, Chongqing University, Chongqing 400030, China
  2. Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing 400030, China
  3. Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education, Chongqing 400030, China
  • Received: 2011-12-15 Revised: 2012-02-14 Online: 2012-06-04 Published: 2012-06-01
  • Contact: FENG Yong
  • About the authors: FENG Yong (born 1977), male, from Pingdu, Shandong; associate professor, Ph.D.; research interests: knowledge discovery and knowledge engineering, semantic information processing. ZHANG Yang (born 1986), male, from Xiangtan, Hunan; master's student; research interests: semantic information processing.
  • Supported by:
    National Natural Science Foundation of China; Key Project of Chongqing Higher Education Teaching Reform Research; Fundamental Research Funds for the Central Universities; Phase III Construction Project of the "211 Project"

Abstract: Query interface schema matching is a key step in Deep Web information integration. The Dual Correlation Mining (DCM) method uses association mining to solve complex interface schema matching problems, but it falls short in matching efficiency and matching accuracy. To address these shortcomings, a new schema matching method based on matching degree and semantic similarity is proposed. The method first stores the association relationships among attributes in a matrix, then uses matching degree to calculate the degree of correlation between attributes, and finally uses semantic similarity to calculate the similarity of candidate matches. Experiments on the BAMM data set of the University of Illinois show that the proposed method achieves higher matching efficiency and accuracy than DCM and its improved variants, indicating that it handles schema matching between query interfaces better.
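The three steps named in the abstract (association matrix, matching degree, semantic similarity) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the matching degree is taken to be a simple negative co-occurrence score (two attributes that rarely appear in the same interface are candidate synonyms), and the semantic similarity is a stand-in that consults a caller-supplied synonym set and falls back to token overlap. The names `cooccurrence_matrix`, `matching_degree`, `semantic_similarity`, `find_matches`, and both thresholds are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(interfaces):
    """Step 1: store attribute associations in a matrix.

    matrix[i][j] counts the interfaces that contain both attrs[i] and attrs[j];
    the diagonal matrix[i][i] counts the interfaces that contain attrs[i].
    """
    attrs = sorted({a for schema in interfaces for a in schema})
    index = {a: i for i, a in enumerate(attrs)}
    matrix = [[0] * len(attrs) for _ in attrs]
    for schema in interfaces:
        present = sorted(index[a] for a in set(schema))
        for i in present:
            matrix[i][i] += 1
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return attrs, matrix


def matching_degree(matrix, i, j):
    """Step 2 (assumed formula, not the paper's): 1.0 when the two attributes
    never co-occur, 0.0 when the rarer one always appears with the other."""
    fi, fj, fij = matrix[i][i], matrix[j][j], matrix[i][j]
    if fi == 0 or fj == 0:
        return 0.0
    return 1.0 - fij / min(fi, fj)


def semantic_similarity(a, b, synonyms):
    """Step 3 (stand-in): synonym-set lookup, else Jaccard overlap of word tokens."""
    if a == b or frozenset((a, b)) in synonyms:
        return 1.0
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def find_matches(interfaces, synonyms, min_degree=0.8, min_similarity=0.5):
    """Keep attribute pairs that pass both the matching-degree and similarity thresholds."""
    attrs, matrix = cooccurrence_matrix(interfaces)
    matches = []
    for i, j in combinations(range(len(attrs)), 2):
        if matching_degree(matrix, i, j) < min_degree:
            continue  # attributes co-occur too often to be alternatives
        if semantic_similarity(attrs[i], attrs[j], synonyms) >= min_similarity:
            matches.append((attrs[i], attrs[j]))
    return matches


# Toy book-search interfaces (hypothetical data, not taken from BAMM).
interfaces = [
    {"title", "author", "isbn"},
    {"title", "writer", "isbn"},
    {"title", "author", "price"},
    {"title", "writer", "price"},
]
synonyms = {frozenset(("author", "writer"))}
print(find_matches(interfaces, synonyms))  # [('author', 'writer')]
```

In this toy data, `isbn` and `price` also never co-occur and therefore pass the matching-degree step, but the similarity step rejects them. This shows why a semantic check on candidate matches can improve accuracy over co-occurrence statistics alone.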

Key words: Deep Web, schema matching, matching degree, semantic similarity