Realization of Focused Crawler Based on Page Segmentation

Cite this article: LI Xiao-ya, HE Feng-ling, ZUO Wan-li. Realization of Focused Crawler Based on Page Segmentation[J]. Journal of Jilin University (Science Edition), 2007, 45(6): 959-965.
Authors: LI Xiao-ya  HE Feng-ling  ZUO Wan-li
Institution: College of Computer Science and Technology, Jilin University, Changchun 130012, China
Abstract: General-purpose search engines currently return an excessive number of results, many of which are only weakly related to the topic of interest. To address this, the paper proposes a method for implementing a focused crawler based on page-segmentation techniques and realizes a prototype system, Crawler1. Experimental results show that the system performs well: more than 55% of the pages it crawls are relevant to the target topic.

Keywords: topic-specific search  focused crawling  relevance analysis  page segmentation
Article ID: 1671-5489(2007)06-0959-07
Received: 2006-11-17
Revised: 2006-11-17
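
The record gives only the high-level idea of the approach. As a rough, hypothetical sketch of how a block-based focused crawler of this kind could work (this is not the authors' Crawler1; the helper names such as split_into_blocks, the block_threshold value, the seed URL, and the use of the requests and BeautifulSoup libraries are all illustrative assumptions), the Python outline below segments each fetched page into text blocks, scores each block against the topic keywords with a term-frequency cosine similarity, and only enqueues links that occur in blocks judged relevant.

```python
# Illustrative focused-crawler sketch (not the paper's Crawler1 implementation).
# Idea: segment each page into blocks, score blocks against the topic, and only
# follow links found inside blocks that look relevant to the topic.

import heapq
import math
import re
from collections import Counter
from urllib.parse import urljoin

import requests                   # assumed third-party dependency
from bs4 import BeautifulSoup     # assumed third-party dependency


def term_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(v1, v2):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0


def split_into_blocks(soup):
    """Very rough page segmentation: treat block-level tags as candidate blocks."""
    for tag in soup.find_all(["p", "div", "td", "li"]):
        text = tag.get_text(" ", strip=True)
        if text:
            yield text, tag.find_all("a", href=True)


def focused_crawl(seed_urls, topic_keywords, max_pages=50, block_threshold=0.1):
    """Crawl from the seeds, preferring links that appear in topic-relevant blocks."""
    topic_vec = term_vector(" ".join(topic_keywords))
    frontier = [(-1.0, url) for url in seed_urls]    # max-heap via negated priority
    heapq.heapify(frontier)
    visited, relevant = set(), []

    while frontier and len(visited) < max_pages:
        _priority, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")

        page_score = 0.0
        for block_text, links in split_into_blocks(soup):
            score = cosine(term_vector(block_text), topic_vec)
            page_score = max(page_score, score)
            if score < block_threshold:
                continue                              # skip links in irrelevant blocks
            for a in links:
                heapq.heappush(frontier, (-score, urljoin(url, a["href"])))

        if page_score >= block_threshold:
            relevant.append((url, page_score))

    return relevant


if __name__ == "__main__":
    pages = focused_crawl(["https://example.com/"], ["focused", "crawler", "search"])
    for url, score in pages:
        print(f"{score:.3f}  {url}")
```

The point of segmenting the page rather than scoring it as a whole is that a single relevant block (and the links inside it) can be followed even when the rest of the page is off-topic, while links embedded in irrelevant blocks such as navigation bars or advertisements are ignored.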
