Extracting Result Schema Based on Query Instances in the Deep Web

Cite this article:	NIE Tiezheng, YU Ge, SHEN Derong, KOU Yue, LIU Wei. Extracting Result Schema Based on Query Instances in the Deep Web[J]. Wuhan University Journal of Natural Sciences, 2007, 12(5): 835-839. DOI: 10.1007/s11859-007-0043-7

Authors:	NIE Tiezheng, YU Ge, SHEN Derong, KOU Yue, LIU Wei

Affiliation:	College of Information Science and Engineering, Northeastern University, Shenyang 110004, Liaoning, China

Funding:	Supported by the National Natural Science Foundation of China (60673139, 60473073, 60573090)

Abstract:	Deep Web sources contain a large amount of high-quality, query-related structured data. One of the challenges in the Deep Web is extracting the result schemas of Deep Web sources. To address this challenge, this paper describes a novel approach that extracts both the result data and the result schema of a Web database. The approach first models the query interface of a Deep Web source and fills it in with a specific query instance. The result pages of the Deep Web source are then parsed into a tree structure to retrieve the subtrees that contain elements of the query instance. Next, the result schema of the Deep Web source is extracted by matching the subtrees' nodes with the query instance, using a two-phase schema extraction method to obtain a more accurate result schema. Finally, experiments on real Deep Web sources show the utility of the approach, which achieves high precision and recall.
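The abstract only outlines the method, so the following is a minimal Python sketch of the general idea (parse a result page into a tree, locate the subtree that contains the query instance values, and label its nodes with the matching attribute names). It is not the authors' two-phase algorithm. The page content, the attribute names, and the helper functions `child_texts`, `find_record_subtree`, `label_positions`, and `extract_rows` are all assumptions made for illustration, and the page is assumed to be well-formed XML, which real result pages usually are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical result page (assumed well-formed; real pages need an HTML parser).
PAGE = """
<html><body>
  <div>Search results</div>
  <table>
    <tr><td>Database Systems</td><td>Jane Doe</td><td>2005</td></tr>
    <tr><td>Web Mining</td><td>John Roe</td><td>2006</td></tr>
  </table>
</body></html>
"""

# Hypothetical query instance: attribute name -> value submitted in the form.
QUERY_INSTANCE = {"title": "Web Mining", "author": "John Roe"}


def child_texts(node):
    """Return the stripped text content of each direct child of node."""
    return ["".join(child.itertext()).strip() for child in node]


def find_record_subtree(root, instance):
    """Return the smallest subtree whose children cover every instance value."""
    candidates = []
    for node in root.iter():
        texts = child_texts(node)
        if all(any(value in t for t in texts) for value in instance.values()):
            candidates.append(node)
    # The candidate with the least text is the tightest enclosing subtree.
    return min(candidates, key=lambda n: len("".join(n.itertext())), default=None)


def label_positions(record, instance):
    """Map child positions of the record subtree to query attribute names."""
    schema = {}
    for pos, text in enumerate(child_texts(record)):
        for attr, value in instance.items():
            if value in text:
                schema[pos] = attr
    return schema


def extract_rows(root, record, schema):
    """Apply the inferred labels to every subtree shaped like the record."""
    rows = []
    for node in root.iter(record.tag):
        texts = child_texts(node)
        if len(texts) == len(record):  # same number of fields as the record
            rows.append({attr: texts[pos] for pos, attr in schema.items()})
    return rows


if __name__ == "__main__":
    root = ET.fromstring(PAGE.strip())
    record = find_record_subtree(root, QUERY_INSTANCE)
    schema = label_positions(record, QUERY_INSTANCE)
    print(schema)
    print(extract_rows(root, record, schema))
```

In this sketch only the columns that match a query value receive a label; the third column stays unlabeled, which hints at why a further refinement step, such as the second phase mentioned in the abstract, may be needed in practice.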
Keywords (Chinese record):	Deep Web; data mining; address; image; extraction
Article ID:	1007-1202(2007)05-0835-05
Received:	2007-02-27
Revised:	2007-02-27

Biography:	NIE Tiezheng (1980–), male, Ph.D. candidate, research direction: Deep Web, schema matching.

Keywords:	Deep Web; schema extraction; result schema; query instance
This article is indexed in databases including VIP (维普) and SpringerLink.
