A New Framework for Focused Web Crawling
Cite this article: PENG Tao, HE Fengling, ZUO Wanli. A New Framework for Focused Web Crawling[J]. Wuhan University Journal of Natural Sciences, 2006, 11(5): 1394-1397. DOI: 10.1007/BF02829273
Authors: PENG Tao, HE Fengling, ZUO Wanli
Affiliation: College of Computer Science and Technology/Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, Jilin, China
Abstract: Focused crawlers are important tools to support applications such as specialized Web portals, online searching, and Web search engines. A topic-driven crawler chooses the best URLs and relevant pages to pursue during Web crawling. Irrelevant pages are difficult to deal with. This paper presents a novel focused crawler framework. In our focused crawler, we propose a method to overcome some of the limitations of dealing with irrelevant pages. We also describe the implementation of our focused crawler and present some important metrics and an evaluation function for ranking page relevance. The experimental results show that our crawler obtains more "important" pages and achieves high precision and recall.
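The abstract does not spell out the framework's evaluation function, so the following is only a minimal sketch of the general idea it describes: a best-first frontier that pursues the highest-scoring URLs first, plus the precision and recall measures used to judge the result. The cosine-similarity score, the `fetch` callable, the `threshold` value, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(text, topic_terms):
    """Cosine similarity between a page's term counts and the topic terms.

    Assumed stand-in for a relevance metric; not the paper's function.
    """
    page = Counter(text.lower().split())
    topic = Counter(topic_terms)
    dot = sum(page[t] * topic[t] for t in topic)
    norm = (math.sqrt(sum(v * v for v in page.values()))
            * math.sqrt(sum(v * v for v in topic.values())))
    return dot / norm if norm else 0.0


def focused_crawl(seed, fetch, topic_terms, threshold=0.2, max_pages=10):
    """Best-first crawl: always expand the frontier URL with the best score.

    `fetch(url)` is a hypothetical callable returning (text, outlinks).
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        fetched += 1
        score = cosine_relevance(text, topic_terms)
        if score >= threshold:
            relevant.append(url)
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # Outlinks inherit the parent's score as their priority, so
                # links on irrelevant pages are deferred, not discarded.
                heapq.heappush(frontier, (-score, link))
    return relevant


def precision_recall(retrieved, relevant_truth):
    """Precision and recall of the retrieved URLs against a known relevant set."""
    hits = len(set(retrieved) & set(relevant_truth))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_truth) if relevant_truth else 0.0
    return precision, recall
```

In this sketch, links found on an irrelevant page stay in the frontier at low priority, which is one simple way to still reach relevant pages that sit behind irrelevant ones. Whether the paper handles irrelevant pages this way is not stated in the abstract.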

Keywords: focused crawlers; irrelevant pages; relevance metrics; Web
Article ID: 1007-1202(2006)05-1394-04
Received: 2006-03-10

Foundation item: Supported by the National Natural Science Foundation of China (60373099)
Biography: PENG Tao (1977-), male, Ph.D. candidate; research interests: Web mining, machine learning, and Web search engines.
This article is indexed in CNKI, VIP, Wanfang Data, SpringerLink, and other databases.