
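A Web Page De-duplication Algorithm Based on Noise Removal (一种基于噪音清除的网页削重算法)
Cite this article: LV Zheng, CHEN Kan. A Web Page De-duplication Algorithm Based on Noise Removal [J]. Journal of Xinyang Teachers College (Natural Science Edition), 2007, 20(1): 105-108.
Authors: LV Zheng (吕争), CHEN Kan (陈侃)
Affiliation: Xinyang Vocational and Technical College, Xinyang 464000, Henan, China
Funding: National Key Basic Research Program of China (973 Program)
Abstract: A web page de-duplication algorithm based on noise removal is proposed. First, the vector space model is applied and each page is represented only by <feature term, weight> pairs, which reduces the time and space complexity of de-duplication. Second, a set of heuristic rules removes the "noise" contained in web pages, so that irrelevant information no longer interferes with the core content of a page.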

Keywords: search engine; Web mining; noise removal
Article ID: 1003-0972(2007)01-0105-04
Received: 2006-03-22
Revised: 2006-10-31

A Web Pages Near-replicas Detection Algorithm Based on Noise Reduction
LV Zheng, CHEN Kan. A Web Pages Near-replicas Detection Algorithm Based on Noise Reduction [J]. Journal of Xinyang Teachers College (Natural Science Edition), 2007, 20(1): 105-108.
Authors:LV Zheng  CHEN Kan
Institution:Xinyang Vocational and Technical College, Xinyang 464000, China
Abstract: A near-replica detection algorithm for Web pages is introduced. The algorithm has two key points. First, each Web page is represented using the vector space model, which reduces the time and space complexity of near-replica detection. Second, a set of heuristics is used to remove noise automatically. Experimental results show that the algorithm is more effective than the existing Web page near-replica detection algorithm used in search engines.
Keywords:MD5
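
The two steps named in the abstract (noise removal by heuristics, then a vector space representation built from feature-term and weight pairs) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, since the abstract does not give the details: the noise rules (drop short lines, lines containing "|", and lines mentioning "copyright"), the weighting (relative term frequency), the cosine similarity measure, and the 0.9 threshold are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical noise rule: lines containing a "|" separator or the word
# "copyright" are treated as navigation or footer text. The paper's actual
# heuristics are not stated in the abstract.
NOISE_PATTERN = re.compile(r"copyright|\|", re.IGNORECASE)


def remove_noise(text, min_words=4):
    """Keep only lines that look like body text under the assumed rules."""
    kept = []
    for line in text.splitlines():
        if len(line.split()) < min_words or NOISE_PATTERN.search(line):
            continue
        kept.append(line)
    return "\n".join(kept)


def term_weights(text):
    """Represent a page as {feature term: weight}, using relative term frequency."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_near_duplicate(page_a, page_b, threshold=0.9):
    """Compare two pages after noise removal; the threshold is an assumed value."""
    vec_a = term_weights(remove_noise(page_a))
    vec_b = term_weights(remove_noise(page_b))
    return cosine(vec_a, vec_b) >= threshold


page_a = (
    "Home | News | Contact\n"
    "The quick brown fox jumps over the lazy dog near the river\n"
    "Copyright 2007 Site A all rights reserved"
)
page_b = (
    "Index | About\n"
    "The quick brown fox jumps over the lazy dog near the river\n"
    "Copyright 2006 Another Company all rights reserved here"
)
page_c = "A completely different article about vector space models and search engines"

print(is_near_duplicate(page_a, page_b))
print(is_near_duplicate(page_a, page_c))
```

In this toy input, pages A and B share the same body line and differ only in navigation and footer text, so stripping the noise first is what lets the comparison focus on the core content.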
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.