
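Identifying Search Engine Ranking Spam: An Analysis Based on Text Content and Link Structure
Cite this article: Wang Hongwei, Wang Wei, Meng Yuan. Identifying search engine ranking spam: An analysis based on text content and link structure[J]. Systems Engineering – Theory & Practice, 2015, 35(2): 445-457.
Authors: Wang Hongwei, Wang Wei, Meng Yuan
Affiliation: School of Economics and Management, Tongji University, Shanghai 200092
Funding: National Natural Science Foundation of China (70971099, 71371144); Shanghai Philosophy and Social Sciences Planning Project, General Program (2013BGL004); Fundamental Research Funds for the Central Universities (1200219198)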
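Abstract: Search engine ranking spam raises a page's search ranking by artificially increasing the relevance between the page and search queries. Based on the characteristics of spam pages, this paper therefore introduces the concept of a spam tendency rate to measure how likely a page is to be spam. Because web spam is carried out through several techniques, the paper first measures the likelihood of content spam from the noun density of the page content itself: since most search keywords are nouns, pages whose proportion of nouns exceeds a certain threshold are more likely to contain content spam. Second, it measures the likelihood of link spam from the page's link features: a link spam coefficient is computed iteratively starting from blacklisted pages, with weights set according to each page's distance from those blacklisted pages. Finally, the page's overall spam tendency rate combines these two aspects. With PageRank, TrustRank, and BadRank as baselines, the experimental results confirm the hypothesis about the part of speech of search terms and the effectiveness of the link spam detection algorithm.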
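The content-spam measure described above can be illustrated with a minimal Python sketch. It assumes the page has already been segmented and part-of-speech tagged with a PKU/ICTCLAS-style tag set in which every noun tag starts with `n`. The threshold of 0.5 and the linear mapping from noun density to a score in [0, 1] are hypothetical choices for illustration; the abstract does not give the paper's actual threshold or formula.

```python
from typing import Iterable, Tuple


def noun_density(tagged_tokens: Iterable[Tuple[str, str]]) -> float:
    """Share of tokens whose part-of-speech tag marks a noun.

    Assumes PKU/ICTCLAS-style tags, where noun tags start with "n"
    (n, nr, ns, nt, nz, ...).
    """
    tags = [pos for _, pos in tagged_tokens]
    if not tags:
        return 0.0
    nouns = sum(1 for pos in tags if pos.startswith("n"))
    return nouns / len(tags)


def content_spam_tendency(
    tagged_tokens: Iterable[Tuple[str, str]], threshold: float = 0.5
) -> float:
    """Hypothetical score in [0, 1]: 0 at or below `threshold`,
    rising linearly to 1 when every token is a noun."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    density = noun_density(tagged_tokens)
    if density <= threshold:
        return 0.0
    return (density - threshold) / (1.0 - threshold)


if __name__ == "__main__":
    # Placeholder tokens with hypothetical tags.
    normal = [("engine", "n"), ("runs", "v"), ("very", "d"), ("fast", "a")]
    stuffed = [("cheap", "a"), ("phone", "n"), ("phone", "n"), ("deals", "n")]
    print(content_spam_tendency(normal))   # density 0.25, below threshold
    print(content_spam_tendency(stuffed))  # density 0.75, above threshold
```

A piecewise-linear score keeps pages under the threshold at zero, matching the idea that only pages with an unusually high proportion of nouns are suspicious, while still ranking suspicious pages by how far they exceed it.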

Keywords: search engine; search engine optimization; page ranking; ranking spam; text content; link structure
Received: 2013-07-16

Countering page ranking spam for search engine based on text content and link structure analysis
WANG Hong-wei, WANG Wei, MENG Yuan. Countering page ranking spam for search engine based on text content and link structure analysis[J]. Systems Engineering – Theory & Practice, 2015, 35(2): 445-457.
Authors: WANG Hong-wei; WANG Wei; MENG Yuan
Institution: School of Economics and Management, Tongji University, Shanghai 200092, China
Abstract: Search engine ranking spam improves a page's search ranking by increasing the apparent relevance between the page and search requests. Based on the characteristics of spam pages, this paper introduces the concept of a spam tendency rate to measure how likely a page is to be spam. Because web spam can be achieved through a variety of channels, the paper first measures a content spam tendency rate based on noun density: since most search keywords are nouns, pages whose proportion of nouns exceeds a certain threshold are more likely to be spam. Second, it measures a link spam tendency rate based on link characteristics: the rate is calculated iteratively starting from blacklisted pages, and weights are set according to each page's distance from those pages. Finally, the overall spam tendency rate of a page is determined by combining both aspects. With PageRank, TrustRank, and BadRank as baselines, the experimental results verify the assumption about the part of speech of search keywords and the effectiveness of the link spam detection algorithm.
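The link-spam measure and the final combination can be sketched as a BadRank-style propagation: suspicion starts at blacklisted pages and flows backwards along hyperlinks, shrinking by a factor `decay` per hop, so pages farther from the blacklist receive smaller weights. This is a minimal sketch under stated assumptions, not the paper's formula: the averaging rule, `decay`, the convergence tolerance, and the equal-weight blend controlled by `alpha` are all hypothetical.

```python
from typing import Dict, List, Set


def link_spam_tendency(
    out_links: Dict[str, List[str]],
    blacklist: Set[str],
    decay: float = 0.5,
    tol: float = 1e-9,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Propagate suspicion backwards from blacklisted pages (hypothetical rule).

    A blacklisted page scores 1.0. Every other page scores `decay` times the
    average score of the pages it links to, so a page whose shortest path to
    the blacklist is k hops long scores at most decay**k.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1) for the iteration to converge")
    pages = set(out_links) | set(blacklist)
    for targets in out_links.values():
        pages.update(targets)
    score = {p: (1.0 if p in blacklist else 0.0) for p in pages}
    for _ in range(max_iter):
        new_score = {}
        for p in pages:
            targets = out_links.get(p, [])
            if p in blacklist:
                new_score[p] = 1.0
            elif targets:
                avg = sum(score[q] for q in targets) / len(targets)
                new_score[p] = decay * avg
            else:
                new_score[p] = 0.0  # no out-links: nothing to inherit
        delta = max((abs(new_score[p] - score[p]) for p in pages), default=0.0)
        score = new_score
        if delta < tol:
            break
    return score


def spam_tendency(content: float, link: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of the content and link spam tendencies."""
    return alpha * content + (1.0 - alpha) * link


if __name__ == "__main__":
    # "a" links straight to a blacklisted page; "b" and "c" are 2 and 3 hops away.
    graph = {"a": ["spam"], "b": ["a"], "c": ["b", "d"], "d": []}
    scores = link_spam_tendency(graph, blacklist={"spam"})
    for page in sorted(scores):
        print(page, scores[page])
```

On this toy graph the scores settle at 1.0 for `spam`, 0.5 for `a`, 0.25 for `b`, 0.0625 for `c` (its suspicion is halved again because only one of its two links leads toward the blacklist) and 0.0 for `d`. Averaging over out-links means a page that links to one bad page among many is penalized less than a page that links only to bad pages.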
Keywords: search engine; search engine optimization; page ranking; ranking spam; text content; link structure
This article is indexed in CNKI and other databases.