Similar Detection Algorithm Research Based on the Features Keyword of Web Page (基于网页特征关键词的近似检测算法)
Cite this article: YAN Liang, LI Xian-guo. Similar Detection Algorithm Research Based on the Features Keyword of Web Page[J]. Science Technology and Engineering (科学技术与工程), 2009, 9(4).
Authors: YAN Liang (闫亮), LI Xian-guo (李先国)
Institution: School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, P.R. China
Abstract: To handle near-duplicate pages among the massive volume of Web text crawled by a search engine, a similarity computation model is built on top of an inverted index, using feature keywords extracted from the main topical content of each page. When a new page document enters the repository, the keywords it contains are used to quickly narrow the set of pages against which similarity must be computed, which reduces the number of pages to be processed and improves computational efficiency. Experimental results indicate that the algorithm is effective, and a small-scale evaluation gave good results.

Keywords: near-duplicate web pages; search engine; web page deduplication
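
The abstract names the two steps of the approach (narrowing candidates through an inverted index, then computing similarity) but gives neither the similarity formula nor a threshold. The following is a minimal Python sketch of that general idea, not the authors' implementation: the class `KeywordIndex`, its method names, the use of Jaccard similarity over keyword sets, and the default threshold of 0.8 are all assumptions made for illustration.

```python
from collections import defaultdict


class KeywordIndex:
    """Inverted index from feature keyword to the ids of pages containing it."""

    def __init__(self, threshold=0.8):
        # Hypothetical cutoff; the abstract does not state a threshold.
        self.threshold = threshold
        self.postings = defaultdict(set)   # keyword -> {page_id, ...}
        self.keywords = {}                 # page_id -> frozenset of keywords

    def add(self, page_id, keywords):
        """Store a page and register it under each of its keywords."""
        kws = frozenset(keywords)
        self.keywords[page_id] = kws
        for kw in kws:
            self.postings[kw].add(page_id)

    def candidates(self, keywords):
        """Return ids of stored pages sharing at least one keyword."""
        found = set()
        for kw in keywords:
            found |= self.postings.get(kw, set())
        return found

    def find_near_duplicates(self, keywords):
        """Score only the candidate pages, not the whole collection."""
        kws = frozenset(keywords)
        hits = []
        for page_id in self.candidates(kws):
            other = self.keywords[page_id]
            # Jaccard similarity is an assumed stand-in for the paper's model.
            score = len(kws & other) / len(kws | other)
            if score >= self.threshold:
                hits.append((page_id, score))
        return sorted(hits, key=lambda hit: -hit[1])


index = KeywordIndex(threshold=0.5)
index.add("a", ["search", "engine", "crawler", "index"])
index.add("b", ["football", "league", "score"])
print(index.find_near_duplicates(["search", "engine", "crawler", "ranking"]))
```

In this sketch a new page is compared only with pages that share at least one keyword with it, so page "b" is never scored for the query above. That is the efficiency gain the abstract attributes to the inverted index.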
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.