
A Method of Illegal and Harmful Text Fast Filter Based on Multi-Centroid Vector (基于多质心的不良文本快速过滤方法)
Cite this article: HUANG Jia-yu and LIU Lian-fang. A Method of Illegal and Harmful Text Fast Filter Based on Multi-Centroid Vector[J]. Journal of Guangxi Academy of Sciences (广西科学院学报), 2010, 26(4): 436-438
Authors: HUANG Jia-yu (黄家裕) and LIU Lian-fang (刘连芳)
Affiliation: Pingsoft New Technology Co., Ltd. of Nanning (南宁市平方软件新技术有限责任公司), Nanning, Guangxi 530007, China
Abstract: Rocchio classification is easily affected by the distribution of a class's samples and by noise, which can cause it to wrongly enlarge the range of the class. To address this problem, the paper proposes clustering the training samples and using the centroid vectors of the resulting clusters, rather than a single centroid vector, as the group of decision vectors for filtering. The method preserves filtering efficiency while achieving higher recall and accuracy than single-centroid Rocchio filtering.

Keywords: illegal and harmful text; fast filter; multi-centroid vector; Rocchio; K-means
Received: 2010-09-28
Revised: 2010-10-18
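
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the document representation, the similarity measure, the number of clusters, or the decision rule, so the cosine similarity, the parameter `k`, the `threshold` value, and all function names below are assumptions made for illustration only.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def kmeans_centroids(samples, k, iters=20, seed=0):
    """Cluster harmful training vectors with a basic K-means loop
    (cosine similarity, assumed) and return the k unit-length centroids."""
    rng = random.Random(seed)
    data = [normalize(v) for v in samples]
    centroids = rng.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            best = max(range(k), key=lambda j: cosine(v, centroids[j]))
            clusters[best].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            normalize(mean(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids


def is_harmful(doc, centroids, threshold=0.8):
    """Flag a document if it is close enough to ANY centroid.
    The threshold is a hypothetical value, not taken from the paper."""
    d = normalize(doc)
    return max(cosine(d, c) for c in centroids) >= threshold


# Toy harmful samples forming two separate groups in a 3-term vector space.
harmful_samples = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.05, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.05, 1.0, 0.0],
]
multi_centroids = kmeans_centroids(harmful_samples, k=2)
single_centroid = kmeans_centroids(harmful_samples, k=1)
```

With `k=1` the sketch reduces to a single-centroid filter: the lone centroid lies between the two groups, so a document that resembles neither group can still score highly against it. With `k=2` each group gets its own centroid, and a document is flagged only when it is close to one of them. Filtering cost stays at `k` similarity computations per document.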
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.