Web database sampling approach based on attribute correlation |
| |
Authors: | Jianwei Tian, Shijun Li, Xiaoyue Tang |
| |
Affiliation: | (1) Department of Computer Science & Engineering, Arizona State University, Tempe, AZ 85287, USA |
| |
Abstract: | This paper presents a novel approach that uses attribute correlation to sample nonuniform hidden databases. We propose a method for calculating attribute dependency and construct a sampling template according to that dependency. We then use the sampling template to generate initial sampling queries and propose a bottom-up algorithm to search the template. Extensive experiments on real deep Web sites and controlled databases show that the sampling method performs well in both quality and efficiency. |
| |
Keywords: | attributes correlation; hidden database; sampling template; mutual information |
This article has been indexed by CNKI, VIP, SpringerLink, and other databases.
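
The keywords name mutual information, and the abstract describes calculating a dependency between attributes. The record does not give the paper's exact dependency formula, so the following is only a minimal sketch of plain empirical mutual information between two categorical attributes, computed over a set of already-retrieved tuples. The function name `mutual_information`, the dictionary row layout, and the toy car data are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(rows, a, b):
    """Empirical mutual information (in bits) between attributes a and b.

    rows is a list of dicts mapping attribute names to categorical values.
    This is the textbook estimator, not necessarily the paper's measure.
    """
    n = len(rows)
    joint = Counter((r[a], r[b]) for r in rows)   # counts of (x, y) pairs
    count_a = Counter(r[a] for r in rows)         # marginal counts for a
    count_b = Counter(r[b] for r in rows)         # marginal counts for b
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_a[x] * count_b[y]))
    return mi


# Hypothetical tuples: "model" is fully determined by "make",
# while "color" is independent of "make".
rows = [
    {"make": "Honda", "model": "Civic", "color": "red"},
    {"make": "Honda", "model": "Civic", "color": "blue"},
    {"make": "Ford", "model": "Focus", "color": "red"},
    {"make": "Ford", "model": "Focus", "color": "blue"},
]

mi_make_model = mutual_information(rows, "make", "model")
mi_make_color = mutual_information(rows, "make", "color")
```

In this toy data the strongly dependent pair (`make`, `model`) scores higher than the independent pair (`make`, `color`), which is the kind of signal a dependency-based ordering of attributes could rely on.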