A New Framework for Focused Web Crawling

Article ID:	1007-1202(2006)05-1394-04
Received:	2006-03-10

Peng Tao, He Fengling, Zuo Wanli. A new framework for focused Web crawling[J]. Wuhan University Journal of Natural Sciences, 2006, 11(5): 1394-1397. DOI: 10.1007/BF02829273

Authors:	Peng Tao, He Fengling, Zuo Wanli

Affiliation:	College of Computer Science and Technology / Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, Jilin, China

Abstract:	Focused crawlers are important tools for supporting applications such as specialized Web portals, online searching, and Web search engines. A topic-driven crawler chooses the best URLs and the most relevant pages to pursue during Web crawling, but irrelevant pages are difficult to deal with. This paper presents a novel focused crawler framework. In our focused crawler, we propose a method to overcome some of the limitations in dealing with irrelevant pages. We also describe the implementation of our focused crawler and present some important metrics and an evaluation function for ranking page relevance. The experimental results show that our crawler can obtain more "important" pages and achieves high precision and recall.

Foundation item:	Supported by the National Natural Science Foundation of China (60373099)

Biography:	PENG Tao (1977-), male, Ph.D. candidate; research direction: Web mining, machine learning, and Web search engines.

Keywords:	focused crawlers; irrelevant pages; relevance metrics; Web
Indexed in:	CNKI, VIP, Wanfang Data, SpringerLink, and other databases.
