Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (05): 1351-1354. DOI: 10.3724/SP.J.1087.2011.01351

• Database Technology •

Deep Web query interface identification approach based on label coding

WANG Yan, SONG Bao-yan, ZHANG Jia-yang, ZHANG Hong-mei, LI Xiao-guang

  1. School of Information Science and Technology, Liaoning University, Shenyang Liaoning 110036, China
  • Received: 2010-09-29 Revised: 2010-12-03 Online: 2011-05-01 Published: 2011-05-01
  • Corresponding author: WANG Yan
  • About the authors: WANG Yan (born 1978), female, from Fushun, Liaoning; lecturer and Ph.D. candidate; main research interests: Deep Web, data stream processing. SONG Bao-yan (born 1965), female, from Kaiyuan, Liaoning; professor, Ph.D.; main research interests: Deep Web, RFID data processing, data grid, data stream processing. ZHANG Jia-yang (born 1983), female, from Shenyang, Liaoning; master's student; main research interest: Deep Web.
  • Supported by:

    National Natural Science Foundation of China (60873068; 60703068); Phase III construction project of the "211 Project" of Liaoning University.

Abstract: Existing query interface identification methods suffer from complex computation, costly maintenance, and matching ambiguity. After a thorough study of these methods, this paper proposes a Deep Web query interface identification approach based on label coding. The approach codes and groups labels according to the directionality and irregularity of the query interface layout, then treats each label group as an independent unit for identifying feature information. It provides identification methods for simple attributes and composite attributes, as well as a method for handling isolated texts. The text that matches an element is determined by constraints on the label subscripts, which greatly reduces the number of texts that must be considered when matching texts to elements and avoids the matching ambiguity caused by large numbers of heuristic algorithms. Two rounds of clustering effectively resolve the hierarchical nesting of the interface.

Key words: label coding, Deep Web, identification of feature information, query interface, identification technology