
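Gini-Based Fuzzy kNN Classifier
Cite this article: SHANG Wen-qian, QU You-li, HUANG Hou-kuan, ZHU Hai-bin, LIN Yong-min, DONG Hong-bin. Gini-Based Fuzzy kNN Classifier [J]. Journal of Guangxi Normal University (Natural Science Edition), 2006, 24(4): 87-90.
Authors: SHANG Wen-qian  QU You-li  HUANG Hou-kuan  ZHU Hai-bin  LIN Yong-min  DONG Hong-bin
Affiliations: 1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
2. Department of Computer Science and Mathematics, Nipissing University, North Bay, ON P1B 8L7, Canada
Funding: National Natural Science Foundation of China (60503017), Beijing Jiaotong University Science Foundation (2004RC008)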
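Abstract: With the growth of the Web, large numbers of documents have appeared online, and automatic text categorization has become a key technology for handling massive data. Among the many text categorization algorithms, kNN has been shown to be one of the best. For most text categorization tasks, text preprocessing is the bottleneck, and its quality directly affects categorization performance. This paper introduces a new text preprocessing algorithm based on the Gini index. It also uses fuzzy set theory to improve the decision rule of kNN. Combining the two gives a fuzzy kNN that shows better categorization performance than traditional kNN. Experimental results show that the improvement is effective and feasible.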

Keywords: text categorization  kNN  fuzzy kNN  text preprocessing  Gini index
Article ID: 1001-6600(2006)04-0087-04
Received: 2006-05-31
Revised: 2006-05-31
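
The abstract names a Gini-index-based preprocessing step but does not give its formula. As a rough illustration only, the sketch below scores each term by a common Gini-style purity measure, the sum over classes of P(class | term)^2 estimated from document frequencies, and keeps the highest-scoring terms. The scoring formula, the function names `gini_purity_scores` and `select_terms`, and the toy corpus are all assumptions made for this example, not the authors' method.

```python
from collections import Counter, defaultdict


def gini_purity_scores(docs, labels):
    """Score each term by sum_c P(c | term)^2 (assumed Gini-style purity).

    docs is a list of token lists and labels the matching class labels.
    A score of 1.0 means the term occurs in documents of one class only.
    """
    term_class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency, not raw term count
            term_class_df[term][label] += 1

    scores = {}
    for term, per_class in term_class_df.items():
        total = sum(per_class.values())
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def select_terms(scores, top_n):
    """Return the top_n terms by score, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Toy corpus (hypothetical data for illustration).
docs = [
    ["goal", "match", "team"],
    ["team", "election", "vote"],
    ["vote", "policy"],
    ["goal", "score"],
]
labels = ["sports", "politics", "politics", "sports"]

scores = gini_purity_scores(docs, labels)
kept = select_terms(scores, top_n=6)
```

In this toy corpus, "team" appears in one document of each class, so it receives the lowest purity score and is the only term dropped when six terms are kept.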

Fuzzy kNN Text Classifier Based on Gini Index
SHANG Wen-qian, QU You-li, HUANG Hou-kuan, ZHU Hai-bin, LIN Yong-min, DONG Hong-bin. Fuzzy kNN Text Classifier Based on Gini Index [J]. Journal of Guangxi Normal University (Natural Science Edition), 2006, 24(4): 87-90.
Authors: SHANG Wen-qian  QU You-li  HUANG Hou-kuan  ZHU Hai-bin  LIN Yong-min  DONG Hong-bin
Institution: 1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
2. Department of Computer Science and Mathematics, Nipissing University, North Bay, ON P1B 8L7, Canada
Abstract: With the development of the Web, large numbers of documents are available on the Internet, and automatic text categorization is becoming more and more important for dealing with massive data. Among the many text categorization algorithms, kNN has proved to be one of the best. For kNN and other classifiers, however, text preprocessing before categorization is a bottleneck, and its results directly affect categorization performance. This paper presents a new text preprocessing algorithm based on the Gini index. It also adopts fuzzy set theory to improve the decision rule of the kNN algorithm. The combination of these two methods makes the fuzzy kNN classifier show better categorization performance than the classical kNN algorithm. Experimental results show that the algorithm is effective and feasible.
Keywords:text categorization  kNN  fuzzy kNN  text preprocessing  Gini index
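
The abstract says that fuzzy set theory is used to improve the kNN decision rule, without stating the rule itself. The sketch below shows one widely used fuzzy kNN formulation, in which each of the k nearest neighbours contributes to its class with a weight of 1 / d^(2/(m-1)) and the weights are normalised into class memberships. Using this particular rule, Euclidean distance, crisp training labels, the function name `fuzzy_knn_memberships`, and the parameters `k`, `m`, and `eps` are assumptions for illustration; the paper's actual rule may differ.

```python
import math
from collections import defaultdict


def fuzzy_knn_memberships(query, train_vectors, train_labels, k=3, m=2.0, eps=1e-9):
    """Return {label: membership} for query; memberships sum to 1.

    Assumes crisp training labels and inverse-distance weights with
    fuzzifier m > 1. eps avoids division by zero for exact matches.
    """
    # Keep the k training points closest to the query.
    neighbours = sorted(
        (math.dist(query, vec), label)
        for vec, label in zip(train_vectors, train_labels)
    )[:k]

    exponent = 2.0 / (m - 1.0)
    weights = [(1.0 / (d ** exponent + eps), label) for d, label in neighbours]
    total = sum(w for w, _ in weights)

    memberships = defaultdict(float)
    for w, label in weights:
        memberships[label] += w / total
    return dict(memberships)


# Toy data (hypothetical): two well-separated classes in 2-D.
train_vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["A", "A", "B", "B"]

memberships = fuzzy_knn_memberships((1.0, 0.0), train_vectors, train_labels, k=3)
predicted = max(memberships, key=memberships.get)
```

Unlike a majority vote, the result is a graded membership per class, so a distant neighbour of class "B" lowers the confidence in "A" only slightly instead of counting as a full vote.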
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.