Realization of Focused Crawler Based on Page Segmentation

Cite this article: LI Xiao-ya, HE Feng-ling, ZUO Wan-li. Realization of Focused Crawler Based on Page Segmentation[J]. Journal of Jilin University (Science Edition), 2007, 45(6): 959-965.
Authors: LI Xiao-ya  HE Feng-ling  ZUO Wan-li
Institution: College of Computer Science and Technology, Jilin University, Changchun 130012, China
Abstract: General-purpose search engines currently return an excessive number of results, many of which are only weakly related to the topic of interest. To address this, the paper proposes a method for implementing a focused crawler based on page-segmentation techniques and realizes a prototype system, Crawler1. Experimental results show that the system performs well: more than 55% of the pages it crawls are relevant to the target topic.

Keywords: topic-specific search  focused crawling  relevance analysis  page segmentation
Article ID: 1671-5489(2007)06-0959-07
Received: 2006-11-17
Revised: 2006-11-17
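
The record gives only the high-level idea of the approach. As a rough, hypothetical sketch of how a block-based focused crawler of this kind could work (this is not the authors' Crawler1; the helper names such as split_into_blocks, the block_threshold value, the seed URL, and the use of the requests and BeautifulSoup libraries are all illustrative assumptions), the Python outline below segments each fetched page into text blocks, scores each block against the topic keywords with a term-frequency cosine similarity, and only enqueues links that occur in blocks judged relevant.

```python
# Illustrative focused-crawler sketch (not the paper's Crawler1 implementation).
# Idea: segment each page into blocks, score blocks against the topic, and only
# follow links found inside blocks that look relevant to the topic.

import heapq
import math
import re
from collections import Counter
from urllib.parse import urljoin

import requests                   # assumed third-party dependency
from bs4 import BeautifulSoup     # assumed third-party dependency


def term_vector(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(v1, v2):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0


def split_into_blocks(soup):
    """Very rough page segmentation: treat block-level tags as candidate blocks."""
    for tag in soup.find_all(["p", "div", "td", "li"]):
        text = tag.get_text(" ", strip=True)
        if text:
            yield text, tag.find_all("a", href=True)


def focused_crawl(seed_urls, topic_keywords, max_pages=50, block_threshold=0.1):
    """Crawl from the seeds, preferring links that appear in topic-relevant blocks."""
    topic_vec = term_vector(" ".join(topic_keywords))
    frontier = [(-1.0, url) for url in seed_urls]    # max-heap via negated priority
    heapq.heapify(frontier)
    visited, relevant = set(), []

    while frontier and len(visited) < max_pages:
        _priority, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")

        page_score = 0.0
        for block_text, links in split_into_blocks(soup):
            score = cosine(term_vector(block_text), topic_vec)
            page_score = max(page_score, score)
            if score < block_threshold:
                continue                              # skip links in irrelevant blocks
            for a in links:
                heapq.heappush(frontier, (-score, urljoin(url, a["href"])))

        if page_score >= block_threshold:
            relevant.append((url, page_score))

    return relevant


if __name__ == "__main__":
    pages = focused_crawl(["https://example.com/"], ["focused", "crawler", "search"])
    for url, score in pages:
        print(f"{score:.3f}  {url}")
```

The point of segmenting the page rather than scoring it as a whole is that a single relevant block (and the links inside it) can be followed even when the rest of the page is off-topic, while links embedded in irrelevant blocks such as navigation bars or advertisements are ignored.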
