Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (07): 1733-1736. DOI: 10.3724/SP.J.1087.2011.01733

• Database Technology •

Deep Web data annotation method based on result schema

Ming LI, Xiu-lan LI

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
  • Received: 2010-12-28 Revised: 2011-02-01 Online: 2011-07-01 Published: 2011-07-01
  • Corresponding author: Xiu-lan LI
  • About the authors: Ming LI (born 1959), male, from Xinji, Hebei, professor; main research interests: data mining and intelligent information processing. Xiu-lan LI (born 1986), female, from Dingxi, Gansu, master's student; main research interests: information retrieval and software engineering.
  • Supported by:

    Natural Science Foundation of Gansu Province

Abstract: Comprehensive and accurate annotation of Deep Web query results is a key problem in Deep Web data integration, but existing Web database annotation methods do not yet solve it well. To address this, a Deep Web data annotation method based on result schema is proposed. First, data preprocessing is completed by parsing the result pages and extracting structured data. Then a correct semantic mapping is established between the integrated result schema and the data to be annotated, which determines the annotation information for the Deep Web data. Experiments on Web databases from four domains show that the proposed method can effectively annotate Deep Web query result data.

Key words: Deep Web, result schema, data annotation, data extraction
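
As a rough illustration of the mapping step described in the abstract, the sketch below labels the columns of already-extracted records against a result schema. The schema contents, the regex-based matching, the 0.6 threshold, and all function names are assumptions made for illustration. The abstract does not specify how the paper computes the semantic mapping, so this should not be read as the authors' algorithm.

```python
import re
from typing import Dict, List, Optional

# Hypothetical integrated result schema for a book domain: each attribute
# name maps to a regex that its values are assumed to match.
RESULT_SCHEMA: Dict[str, str] = {
    "price": r"^[$¥]\s?\d+(\.\d{1,2})?$",
    "year": r"^(19|20)\d{2}$",
    "isbn": r"^\d{9}[\dX]$|^\d{13}$",
}


def match_ratio(values: List[str], pattern: str) -> float:
    """Fraction of values in one column that match the pattern."""
    if not values:
        return 0.0
    hits = sum(1 for v in values if re.match(pattern, v.strip()))
    return hits / len(values)


def annotate_columns(records: List[List[str]],
                     schema: Dict[str, str],
                     threshold: float = 0.6) -> List[Optional[str]]:
    """Map each column of the extracted records to a schema attribute.

    A column gets the attribute with the highest match ratio, or None
    when no attribute reaches the threshold (an assumed cut-off).
    """
    labels: List[Optional[str]] = []
    for column in zip(*records):  # transpose rows into columns
        scores = {attr: match_ratio(list(column), pat)
                  for attr, pat in schema.items()}
        if not scores:
            labels.append(None)
            continue
        best = max(scores, key=lambda attr: scores[attr])
        labels.append(best if scores[best] >= threshold else None)
    return labels


def annotate_records(records: List[List[str]],
                     schema: Dict[str, str],
                     threshold: float = 0.6) -> List[Dict[str, str]]:
    """Attach the chosen labels to every record, one dict per record."""
    labels = annotate_columns(records, schema, threshold)
    return [
        {(label or f"unknown_{i}"): value
         for i, (label, value) in enumerate(zip(labels, row))}
        for row in records
    ]


if __name__ == "__main__":
    # Toy records standing in for data units extracted from a result page.
    extracted = [
        ["Data Mining", "$45.00", "2006"],
        ["Web Data Management", "$59.90", "2011"],
    ]
    for record in annotate_records(extracted, RESULT_SCHEMA):
        print(record)
```

Columns that match no schema attribute are kept under a placeholder key rather than dropped, so unlabelled data units remain visible for later inspection.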

CLC Number: