
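A Spam Comment Identification Method for Blogs
Citation: DENG Bing-na, WANG Yu, LIU Yu. 一种应用于博客的垃圾评论识别方法 (A spam comment identification method for blogs) [J]. 郑州大学学报(自然科学版) (Journal of Zhengzhou University, Natural Science Edition), 2011(1): 65-69, 74.
Authors: 邓冰娜 (DENG Bing-na), 王煜 (WANG Yu), 刘宇 (LIU Yu)
Affiliation: Department of Mathematics and Computer Science, Hebei University, Shijiazhuang, Hebei 071002
Funding: Key Scientific Research Project of the Hebei Provincial Department of Education (No. ZH200804)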
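Abstract: To address the proliferation of spam comments on blogs, a new method for identifying blog spam comments was proposed. Short comments were first identified using common Internet expressions (网络常用语). The comments were then identified over K rounds using an improved similarity formula, and after each round the weights of the topic words were adjusted and the set of topic words was expanded. Once all comments had been identified, the comments labeled as spam were filtered a second time using the common Internet expressions and the topic words, in order to recover the legitimate comments among them. Experimental results showed that the method improved, to some extent, both the precision and the recall of spam-comment identification.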
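The K-round stage of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the "improved similarity formula", so plain cosine similarity between a comment's term frequencies and the topic-word weights is swapped in. It also assumes that comments similar enough to the topic are legitimate and that comments still unmatched after K rounds are spam. The function names and the parameters `threshold`, `boost`, `new_weight` and `top_n` are hypothetical, and comments are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def weighted_cosine(tokens, topic_weights):
    """Cosine similarity between a comment's term-frequency vector and the
    topic-word weight vector (a stand-in for the paper's improved formula)."""
    tf = Counter(tokens)
    dot = sum(tf[w] * wt for w, wt in topic_weights.items())
    norm_c = math.sqrt(sum(v * v for v in tf.values()))
    norm_t = math.sqrt(sum(wt * wt for wt in topic_weights.values()))
    if norm_c == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_c * norm_t)


def k_round_identify(comments, topic_weights, k=3, threshold=0.2,
                     boost=0.1, new_weight=0.5, top_n=2):
    """comments: dict of comment id -> token list (hypothetical interface).
    Returns (labels, weights); labels maps id -> "legit" or "spam"."""
    weights = dict(topic_weights)
    labels = {}
    pending = dict(comments)
    for _ in range(k):
        # Accept every pending comment that is similar enough to the topic.
        accepted = {cid: toks for cid, toks in pending.items()
                    if weighted_cosine(toks, weights) >= threshold}
        if not accepted:
            break
        for cid in accepted:
            labels[cid] = "legit"
            del pending[cid]
        counts = Counter(t for toks in accepted.values() for t in toks)
        # Weight adjustment: boost topic words that occur in accepted comments.
        for w in weights:
            if counts[w]:
                weights[w] += boost
        # Topic-word expansion: add the most frequent words not yet in the set.
        new_words = [t for t, _ in counts.most_common() if t not in weights]
        for t in new_words[:top_n]:
            weights[t] = new_weight
    # Comments still unmatched after K rounds are labeled spam.
    for cid in pending:
        labels[cid] = "spam"
    return labels, weights


if __name__ == "__main__":
    comments = {
        "c1": ["python", "list", "sort"],
        "c2": ["sort", "key", "sort"],
        "c3": ["buy", "cheap", "watches"],
    }
    print(k_round_identify(comments, {"python": 1.0, "list": 1.0}))
```

In this toy run, "c2" shares no word with the initial topic words; it is only accepted in the second round because "sort" was added to the topic set after "c1" was accepted, which is the effect the per-round expansion is meant to have.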

Keywords: blog spam comments; similarity; semantic information

A Research on Identifying Comments Spam for Blog Comments
DENG Bing-na, WANG Yu, LIU Yu. A Research on Identifying Comments Spam for Blog Comments [J]. Journal of Zhengzhou University (Natural Science), 2011(1): 65-69, 74.
Authors: DENG Bing-na, WANG Yu, LIU Yu
Institution: Department of Mathematics and Computer Science, Hebei University, Shijiazhuang 071002, China
Abstract: A new method for identifying spam comments on blogs was proposed. Short comments were first identified using common Internet expressions, and the comments were then identified over K rounds using an improved similarity formula. After each round, the weights of the key words were adjusted and the set of key words was extended. Once all comments had been classified, the comments identified as spam were filtered again using the common Internet expressions and the key words, so that more legitimate comments were recovered. Experimental results showed that the method improved recognition accuracy to some extent.
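The two filters built on common Internet expressions (the first pass over short comments and the second pass over comments labeled as spam) might look roughly like this. It is a sketch only. The abstract does not publish the expression lexicon, the length cutoff, or the exact matching rules, so `COMMON_EXPRESSIONS`, `max_chars`, and the rule "a matching expression or topic word makes a comment legitimate" are hypothetical assumptions. Comments are again assumed to be pre-segmented token lists.

```python
# Hypothetical lexicon of common Internet expressions (网络常用语);
# the paper's actual list is not given in the abstract.
COMMON_EXPRESSIONS = frozenset({"顶", "沙发", "支持", "好文", "路过"})


def prefilter_short(comments, expressions=COMMON_EXPRESSIONS, max_chars=4):
    """First pass (assumed rule): a short comment, at most max_chars characters,
    that contains a common expression is labeled legitimate; every other
    comment is passed on to the similarity stage."""
    legit, remaining = {}, {}
    for cid, toks in comments.items():
        if len("".join(toks)) <= max_chars and expressions.intersection(toks):
            legit[cid] = toks
        else:
            remaining[cid] = toks
    return legit, remaining


def second_pass(labels, comments, topic_words, expressions=COMMON_EXPRESSIONS):
    """Second filter (assumed rule): relabel a "spam" comment as "legit" if it
    contains any common expression or any topic word."""
    keep = expressions | set(topic_words)
    return {cid: "legit" if lab == "spam" and keep.intersection(comments[cid]) else lab
            for cid, lab in labels.items()}


if __name__ == "__main__":
    comments = {
        "a": ["顶"],
        "b": ["支持", "楼主", "观点", "很好"],
        "c": ["代开", "发票"],
        "d": ["博客", "写得", "真好"],
    }
    legit, remaining = prefilter_short(comments)
    print(sorted(legit), sorted(remaining))
    print(second_pass({cid: "spam" for cid in remaining}, comments, ["博客"]))
```

A single shared word is a deliberately loose rescue criterion chosen for brevity; a real filter would likely need a stricter test, since spam can also contain topic words.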
Keywords: blog comment spam; similarity; semantic information
This article is indexed in VIP (维普) and other databases.