
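Topic Crawler Algorithm With Link Structure (融合链接结构的主题爬虫算法)
Cite this article: LIU Shaotao (刘韶涛), LI Hongsheng (李洪胜). 融合链接结构的主题爬虫算法[J]. 华侨大学学报(自然科学版), 2017, 0(2): 195-200.
Authors: LIU Shaotao (刘韶涛)  LI Hongsheng (李洪胜)
Affiliation: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, Fujian, China
Abstract: Starting from an analysis of Best-First, a content-based link selection algorithm, the paper introduces the HITS (hyperlink induced topic search) algorithm, which captures the value of links, and proposes a new link selection strategy. By combining the two algorithms, the new crawler considers link structure in addition to page content, so that both topic relevance and authority are maintained during downloading and the "myopia" of the crawler in the crawling stage is alleviated. The results show that the new crawling strategy performs better than the Best-First algorithm alone.

Keywords: Best-First algorithm  link structure  HITS algorithm  crawling strategy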

Topic Crawler Algorithm With Link Structure
LIU Shaotao, LI Hongsheng. Topic Crawler Algorithm With Link Structure[J]. Journal of Huaqiao University (Natural Science), 2017, 0(2): 195-200.
Authors:LIU Shaotao  LI Hongsheng
Institution:College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
Abstract:Building on an analysis of the content-based Best-First link selection algorithm, this paper introduces the HITS (hyperlink induced topic search) algorithm, which reflects the value of links, and proposes a new link selection strategy. By combining the two algorithms, the new crawler considers not only page content but also link structure, so that topic relevance and authority are both maintained during downloading and the "short-sighted" behavior of the crawling stage is eased. Experimental results show that the new crawling strategy performs better than the Best-First algorithm alone.
Keywords:Best-First algorithm  link structure  HITS algorithm  crawling strategy
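
The abstract describes the combined strategy only in outline, so the following is a minimal sketch of how a Best-First content score and a HITS authority score might be merged when ordering a crawl frontier. It is not the authors' implementation: the linear weight `alpha`, the bag-of-words cosine measure, and all function names (`cosine_relevance`, `hits`, `rank_frontier`) are assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(text, topic_terms):
    """Cosine similarity between a text's term counts and a binary topic vector."""
    counts = Counter(text.lower().split())
    if not counts or not topic_terms:
        return 0.0
    dot = sum(counts[t] for t in topic_terms)
    norm = math.sqrt(sum(c * c for c in counts.values())) * math.sqrt(len(topic_terms))
    return dot / norm


def normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return scores if norm == 0 else {k: v / norm for k, v in scores.items()}


def hits(graph, iterations=20):
    """Standard HITS power iteration on a dict {page: [pages it links to]}."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                new_auth[v] += hub[u]
        # Hub of a page: sum of the updated authority scores of its targets.
        new_hub = {u: sum(new_auth[v] for v in graph.get(u, [])) for u in nodes}
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def rank_frontier(pages, graph, topic_terms, alpha=0.5):
    """Order uncrawled URLs, best first.

    pages: {url: text} for pages already downloaded.
    graph: {url: [outlinks]} extracted from those pages.
    alpha: assumed weight between content relevance and link authority.
    """
    auth, _ = hits(graph)
    heap = []
    for parent, targets in graph.items():
        # Best-First: a link inherits the relevance of the page it was found on.
        rel = cosine_relevance(pages.get(parent, ""), topic_terms)
        for child in targets:
            if child in pages:
                continue  # already downloaded
            score = alpha * rel + (1 - alpha) * auth[child]
            heapq.heappush(heap, (-score, child))  # max-heap via negated score
    seen, ordered = set(), []
    while heap:
        _, url = heapq.heappop(heap)
        if url not in seen:  # keep only the best score per URL
            seen.add(url)
            ordered.append(url)
    return ordered


if __name__ == "__main__":
    pages = {"a": "web crawler topic crawler", "b": "cooking recipes"}
    graph = {"a": ["c", "d"], "b": ["d", "e"]}
    print(rank_frontier(pages, graph, {"crawler", "topic"}))
```

In this toy graph, page `d` is linked from both downloaded pages, so it receives the highest authority score and is ranked ahead of `c`, even though both were found on the same relevant page `a`. A pure Best-First ordering would score `c` and `d` identically, which is one way to read the "short-sighted" behavior the abstract refers to.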
This article is indexed in CNKI and other databases.