
High-efficiency Detective Algorithm Research for Web-Page-Content-Based Duplication Elimination
Cite this article: WANG Zu-xi. High-efficiency Detective Algorithm Research for Web-Page-Content-Based Duplication Elimination[J]. Journal of Jiamusi University(Natural Science Edition), 2010, 28(1): 22-24
Author: WANG Zu-xi
Affiliation: Hunan Chemical Industry Vocation Technology Institute, Zhuzhou 412004, Hunan, China
Abstract: Building on an analysis of current mainstream web page deduplication techniques, this paper proposes an improved, efficient content-based detection algorithm for web page deduplication. The algorithm uses the page's tag tree structure to select the several largest text blocks, concatenates them, and generates a single MD5 fingerprint that represents the page. Fingerprints are then compared to identify near-duplicate pages and eliminate them. Experiments show that the method detects near-duplicate pages accurately.

Keywords: search engine  web page deduplication  MD5 fingerprint  algorithm analysis
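
The pipeline outlined in the abstract (tag tree, largest text blocks, concatenation, MD5 fingerprint, comparison) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the set of block-level tags, the parameter k, the use of character count as block size, and the choice to join the selected blocks in document order are all assumptions made for illustration, and the names TextBlockCollector and page_fingerprint are hypothetical.

```python
import hashlib
from html.parser import HTMLParser

# Assumption: these tags delimit "text blocks"; the paper's exact rule is not given.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "blockquote", "pre"}
# Text inside these tags is never page content, so it is ignored.
SKIP_TAGS = {"script", "style"}


class TextBlockCollector(HTMLParser):
    """Collects the text found between block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished text blocks, in document order
        self._buf = []         # text fragments of the block being built
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def _flush(self):
        # Normalise whitespace so formatting differences do not change the hash.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def page_fingerprint(html, k=3):
    """Return the MD5 hex digest of the k largest text blocks of a page."""
    collector = TextBlockCollector()
    collector.feed(html)
    collector.close()
    blocks = collector.blocks
    # Pick the indices of the k longest blocks (size = character count, an assumption).
    largest = sorted(range(len(blocks)), key=lambda i: len(blocks[i]), reverse=True)[:k]
    # Assumption: join in document order so the result does not depend on size ranking.
    joined = "".join(blocks[i] for i in sorted(largest))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def is_duplicate(html_a, html_b, k=3):
    """Two pages count as duplicates when their fingerprints are equal."""
    return page_fingerprint(html_a, k) == page_fingerprint(html_b, k)
```

Under these assumptions, two pages that share the same main text but differ in short navigation or footer blocks receive the same fingerprint, because only the largest blocks are hashed. Since MD5 is an exact hash, any difference inside the selected blocks yields a different fingerprint.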
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.