
Extracting Result Schema Based on Query Instances in the Deep Web
Cite this article: NIE Tiezheng, YU Ge, SHEN Derong, KOU Yue, LIU Wei. Extracting Result Schema Based on Query Instances in the Deep Web[J]. Wuhan University Journal of Natural Sciences, 2007, 12(5): 835-839. DOI: 10.1007/s11859-007-0043-7
Authors: NIE Tiezheng, YU Ge, SHEN Derong, KOU Yue, LIU Wei
Affiliation: College of Information Science and Engineering, Northeastern University, Shenyang 110004, Liaoning, China
Funding: Supported by the National Natural Science Foundation of China (60673139, 60473073, 60573090)
Abstract: Deep Web sources contain a large amount of high-quality, query-related structured data. One of the challenges in the Deep Web is extracting the result schemas of Deep Web sources. To address this challenge, this paper describes a novel approach that extracts both the result data and the result schema of a Web database. The approach first models the query interface of a Deep Web source and fills it in with a specific query instance. The result pages of the Deep Web source are then represented as a tree structure in order to retrieve the subtrees that contain elements of the query instance. Next, the result schema of the Deep Web source is extracted by matching the subtrees' nodes with the query instance, using a two-phase schema extraction method to obtain a more accurate result schema. Finally, experiments on real Deep Web sources show the utility of the approach, which achieves high precision and recall.
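The pipeline described in the abstract (parse a result page into a tree, find the subtrees that contain the query instance, then label nodes by matching them against the instance) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the names `Node`, `TreeBuilder`, `minimal_subtrees` and `extract_schema`, the sample page, and the use of case-insensitive substring matching are all hypothetical. The paper's two-phase extraction method is not reproduced here.

```python
from html.parser import HTMLParser


class Node:
    """One element of the result-page tree (hypothetical structure)."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], ""

    def full_text(self):
        # Own text followed by the text of all descendants.
        parts = [self.text.strip()] + [c.full_text() for c in self.children]
        return " ".join(filter(None, parts))


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree; assumes reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if any.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def minimal_subtrees(node, values):
    """Deepest nodes whose text contains every query-instance value."""
    text = node.full_text().lower()
    if not all(v.lower() in text for v in values):
        return []
    found = []
    for child in node.children:
        found.extend(minimal_subtrees(child, values))
    return found or [node]


def text_nodes(node):
    """Yield nodes that carry text of their own, in document order."""
    if node.text.strip():
        yield node
    for child in node.children:
        yield from text_nodes(child)


def extract_schema(subtree, instance):
    """Label each text node with the query attribute whose value it contains."""
    schema = []
    for i, n in enumerate(text_nodes(subtree)):
        text = n.text.strip()
        label = next((attr for attr, value in instance.items()
                      if value.lower() in text.lower()), None)
        # Nodes that match no query value get a placeholder attribute name.
        schema.append((label or f"unknown_{i}", text))
    return schema


# Hypothetical result page returned for the query instance below.
PAGE = """
<html><body>
<h1>Search results</h1>
<table>
<tr><td>Data Integration</td><td>Alice Smith</td><td>2005</td></tr>
<tr><td>Web Data Integration</td><td>Alice Smith</td><td>2006</td></tr>
</table>
</body></html>
"""
instance = {"title": "Data Integration", "author": "Alice Smith"}

records = minimal_subtrees(parse(PAGE), list(instance.values()))
schema = extract_schema(records[0], instance)
print(schema)
```

In this sketch each table row is recovered as one record subtree, and the cells that contain a query value inherit the corresponding attribute name from the query interface. Cells that match nothing (the year, here) are left with placeholder names, which is the kind of gap a further refinement phase would have to resolve.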

Chinese keywords: Deep Web; data mining; address; image extraction
Article ID: 1007-1202(2007)05-0835-05
Received: 2007-02-27
Revised: 2007-02-27

Biography: NIE Tiezheng (1980–), male, Ph.D. candidate, research direction: Deep Web, schema matching.
Keywords: Deep Web; schema extraction; result schema; query instance
This article is indexed in VIP (维普), SpringerLink, and other databases.