
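Shark-Search algorithm based on web page segmentation
Citation:CHEN Jun, CHEN Zhu-min. Shark-Search algorithm based on web page segmentation[J]. Journal of Shandong University (Natural Science), 2007, 42(9): 62-66
Authors:CHEN Jun  CHEN Zhu-min
Affiliation:Network Center, Shandong University, Jinan 250100, Shandong, China; School of Computer Science and Technology, Shandong University, Jinan 250061, Shandong, China
Funding:Sub-project of the National Key Technology R&D Program; Shandong Province Specialized Research Fund for the Doctoral Program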
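Abstract:The Shark-Search algorithm is a classical focused crawling algorithm. Its performance is unsatisfactory when crawling Web pages that contain many noisy links. To address this problem, a Shark-Search algorithm based on web page segmentation is proposed, which selects and filters links more effectively at multiple granularities: page, block, and link. Experiments show that the improved Shark-Search algorithm substantially outperforms the traditional Shark-Search algorithm in both precision and sum of information.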

Keywords:Shark-Search algorithm  focused crawling  page segmentation  relevance computation
Article ID:1671-9352(2007)09-0062-05
Revised:2007-06-28

Improved Shark-Search algorithm based on page segmentation
CHEN Jun, CHEN Zhu-min. Improved Shark-Search algorithm based on page segmentation[J]. Journal of Shandong University (Natural Science), 2007, 42(9): 62-66
Authors:CHEN Jun  CHEN Zhu-min
Affiliation:1. Network Center, Shandong University, Jinan 250100, Shandong, China; 2. School of Computer Science and Technology, Shandong University, Jinan 250061, Shandong, China
Abstract:The Shark-Search algorithm is one of the classical algorithms for focused crawling. However, it performs poorly on Web pages that contain many noisy links. This paper proposes an improved Shark-Search algorithm based on page segmentation, which evaluates link relevance at three granularities: page, block, and individual link. Experiments show that the improved algorithm achieves substantially higher precision and a larger sum of information than the traditional Shark-Search algorithm.
Keywords:Shark-Search algorithm   focused crawling   page segmentation   relevance computation
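
The abstract describes scoring links at three granularities (page, block, link) and filtering noisy links, but it does not give the formulas. The following is only a minimal Python sketch of that idea under stated assumptions, not the authors' method: the names `Block`, `Link`, `score_links`, the weights `w_page`, `w_block`, `w_link`, the `min_block` threshold, the plain term-count cosine similarity, and the example URLs are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity of two texts over raw term counts (0.0 if either is empty)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Link:
    url: str
    anchor: str  # anchor text of the link


@dataclass
class Block:
    text: str  # visible text of one page block (assumed already segmented)
    links: list = field(default_factory=list)


def score_links(topic, blocks, w_page=0.2, w_block=0.3, w_link=0.5, min_block=0.0):
    """Return {url: score} for links in blocks that pass the block-level filter.

    Assumption: the score is a weighted sum of page, block, and anchor-text
    relevance; the paper's actual combination may differ.
    """
    page_rel = cosine(topic, " ".join(b.text for b in blocks))
    scores = {}
    for block in blocks:
        block_rel = cosine(topic, block.text)
        if block_rel <= min_block:
            continue  # assumed noise filter: drop every link in an off-topic block
        for link in block.links:
            link_rel = cosine(topic, link.anchor)
            scores[link.url] = w_page * page_rel + w_block * block_rel + w_link * link_rel
    return scores


# Hypothetical page: one on-topic content block and one navigation/ad block.
example_blocks = [
    Block("focused crawler for topic specific web pages",
          [Link("http://example.com/a", "focused crawler paper"),
           Link("http://example.com/b", "contact us")]),
    Block("login register advertise",
          [Link("http://example.com/ads", "advertise here")]),
]
frontier = sorted(score_links("focused crawler", example_blocks).items(),
                  key=lambda item: item[1], reverse=True)
```

In this sketch, `frontier` holds the surviving links in descending score order, which a crawler could use as its fetch priority; links from the off-topic block never enter it.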
This article is indexed in the VIP (维普) and Wanfang Data (万方数据) databases, among others.