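Shark-Search algorithm based on web page segmentation
Cite this article: CHEN Jun, CHEN Zhu-min. Shark-Search algorithm based on web page segmentation[J]. Journal of Shandong University (Natural Science), 2007, 42(9): 62-66.
Authors: CHEN Jun  CHEN Zhu-min
Affiliations: 1. Network Center, Shandong University, Jinan 250100, Shandong, China
2. School of Computer Science and Technology, Shandong University, Jinan 250061, Shandong, China
Funding: Sub-project of the National Key Technology R&D Program; Specialized Research Fund for Doctoral Programs of Shandong Province
Abstract: Shark-Search is a classic focused crawling algorithm. To address its poor performance when crawling Web pages that contain many noisy links, a Shark-Search algorithm based on web page segmentation is proposed. The algorithm selects and filters links more effectively by working at several granularities: page, block, and link. Experiments show that the improved algorithm markedly outperforms the traditional Shark-Search algorithm in both precision and sum of information.

Keywords: Shark-Search algorithm  focused crawling  page segmentation  relevance computation
Article ID: 1671-9352(2007)09-0062-05
Revised: 2007-06-28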

Improved Shark-Search algorithm based on page segmentation
CHEN Jun, CHEN Zhu-min. Improved Shark-Search algorithm based on page segmentation[J]. Journal of Shandong University, 2007, 42(9): 62-66.
Authors:CHEN Jun  CHEN Zhu-min
Institution:1. Network Center, Shandong University, Jinan 250100, Shandong, China; 2. School of Computer Science and Technology, Shandong University, Jinan 250061, Shandong, China
Abstract: The Shark-Search algorithm is one of the classical algorithms for focused crawling. However, it performs poorly on Web pages that contain many noisy links. An improved Shark-Search algorithm based on page segmentation is proposed, which evaluates relevance at three granularities: page, block, and single link. Experiments show that the improved algorithm achieves markedly higher precision and sum of information than the traditional Shark-Search algorithm.
Keywords:Shark-Search algorithm  focused crawling  page segmentation  relevance computation
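
To illustrate the three-granularity idea described in the abstract, here is a minimal Python sketch. It is not the authors' method: the abstract gives no formulas, so the cosine similarity over raw term counts, the weights `w_page`, `w_block`, and `w_link`, the `threshold`, the function names, and the example URLs are all assumptions chosen for illustration. It also ignores any score inheritance between parent and child pages.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a stand-in for real term extraction."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two texts using raw term counts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def score_links(topic, blocks, w_page=0.2, w_block=0.3, w_link=0.5,
                threshold=0.1):
    """Score every link by combining page-, block-, and link-level relevance.

    blocks: list of (block_text, [(anchor_text, url), ...]) for one page.
    The weights and threshold are hypothetical, not taken from the paper.
    Returns [(url, score)] sorted by descending score, with links scoring
    below the threshold filtered out.
    """
    page_text = " ".join(text for text, _ in blocks)
    page_rel = cosine(topic, page_text)
    scored = []
    for block_text, links in blocks:
        block_rel = cosine(topic, block_text)
        for anchor, url in links:
            link_rel = cosine(topic, anchor)
            score = w_page * page_rel + w_block * block_rel + w_link * link_rel
            if score >= threshold:
                scored.append((url, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical page with one on-topic block and one noisy (advertising) block.
blocks = [
    ("focused crawler topic crawling research",
     [("focused crawling survey", "http://example.org/survey")]),
    ("advertisement buy cheap tickets now",
     [("cheap tickets", "http://example.org/ads")]),
]
ranked = score_links("focused crawling", blocks)
for url, score in ranked:
    print(url, round(score, 3))
```

In this toy page the link in the advertising block gets credit only from the page-level term, so it falls below the threshold and is filtered out, while the link in the on-topic block is kept. A page-only score would treat both links alike, which is the weakness on noisy pages that the abstract points to.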
This article is indexed in databases including VIP and Wanfang Data.