Journal of Computer Applications, 2011, Vol. 31, Issue (05): 1351-1354. DOI: 10.3724/SP.J.1087.2011.01351
• Database technology •
WANG Yan, SONG Bao-yan, ZHANG Jia-yang, ZHANG Hong-mei, LI Xiao-guang
王妍,宋宝燕,张佳旸,张洪梅,李晓光
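Supported by:
National Natural Science Foundation of China (60873068; 60703068); Phase III construction project of the "211 Project" at Liaoning University.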
Abstract: Existing approaches to identifying query interfaces are costly to compute and maintain, and they suffer from matching ambiguity. After a thorough study of these approaches, this paper proposes a Deep Web query interface identification approach based on label coding. The approach codes and groups labels according to the directionality and irregularity of the query interface layout, then treats each label group as an independent unit when identifying feature information. Identification methods for simple attributes and composite attributes, and a processing method for isolated texts, are proposed. Constraints on the label subscripts determine which texts match an element, which greatly reduces the number of texts considered when matching texts to elements and avoids the matching ambiguity caused by large numbers of heuristic algorithms. Two rounds of clustering effectively resolve the hierarchical nesting of the interface.
Key words: label coding, Deep Web, identification of feature information, query interface, identification technology
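The abstract does not give the actual coding scheme, so the following is only a minimal sketch of the general idea, under stated assumptions: each label carries a hypothetical `(row, col)` subscript, labels sharing a row form one group, and within a group an element may only match the nearest text with a smaller column subscript. The names `Label`, `group_by_row`, and `match_group` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Label:
    kind: str   # "text" or "element" (assumed two-way classification)
    value: str  # visible text or form-control name
    row: int    # assumed row subscript of the label code
    col: int    # assumed column subscript of the label code


def group_by_row(labels):
    """Group labels that share a row subscript, ordered by column."""
    ordered = sorted(labels, key=lambda l: (l.row, l.col))
    return [list(g) for _, g in groupby(ordered, key=lambda l: l.row)]


def match_group(group):
    """Match texts to elements inside one label group.

    Subscript constraint (assumption): an element matches only the
    closest preceding text in its own group. One text followed by
    several elements is treated as a composite attribute; a text
    followed by no element is reported as an isolated text.
    """
    attributes, isolated = [], []
    text, elements = None, []

    def flush():
        # Close the attribute that is currently being collected.
        if elements:
            attributes.append((text, tuple(elements)))
        elif text is not None:
            isolated.append(text)

    for label in group:
        if label.kind == "text":
            flush()
            text, elements = label.value, []
        else:
            elements.append(label.value)
    flush()
    return attributes, isolated


if __name__ == "__main__":
    form = [
        Label("text", "Title", 0, 0),
        Label("element", "title_input", 0, 1),
        Label("text", "Price", 1, 0),
        Label("element", "price_min", 1, 1),
        Label("element", "price_max", 1, 2),
        Label("text", "Search tips", 2, 0),
    ]
    for group in group_by_row(form):
        print(match_group(group))
```

Because each group is processed independently, an element is compared against only the texts in its own group rather than every text on the page, which is the kind of reduction in candidate texts the abstract describes. The two-round clustering used for nested interfaces is not sketched here, since the abstract gives no detail on it.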
WANG Yan, SONG Bao-yan, ZHANG Jia-yang, ZHANG Hong-mei, LI Xiao-guang. Deep Web query interface identification approach based on label coding[J]. Journal of Computer Applications, 2011, 31(05): 1351-1354.
王妍, 宋宝燕, 张佳旸, 张洪梅, 李晓光. 基于标签编码的Deep Web查询接口识别方法[J]. 计算机应用, 2011, 31(05): 1351-1354.
URL: http://www.joca.cn/EN/10.3724/SP.J.1087.2011.01351
http://www.joca.cn/EN/Y2011/V31/I05/1351