
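An Improved Density-Based KNN Text Classification Algorithm
Cite this article: MAO Jian, LIU Jinming, CAO Yong. An improved density-based KNN text classification algorithm[J]. Journal of ZhangZhou Teachers College, 2012, 0(2): 45-48
Authors: MAO Jian, LIU Jinming, CAO Yong
Affiliations: [1] School of Computer Engineering, Jimei University, Xiamen, Fujian 361021; [2] Huawei Technologies Co., Ltd., Shenzhen, Guangdong 518129
Abstract: KNN is a widely used artificial intelligence algorithm that is simple, effective and easy to implement for text classification. However, the time complexity of KNN classification is proportional to the number of training samples, and an uneven distribution density of the training samples lowers classification accuracy. This paper proposes an improved algorithm built on KNN. It analyzes the distribution density of the training samples and prunes samples in high-density regions, which reduces the number of samples and evens out their distribution, thereby improving classification accuracy. Experiments show that the density-based improved KNN text classification algorithm lowers time complexity while also achieving good accuracy and recall.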
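The abstract takes the baseline KNN classifier as given. The following is a minimal sketch of such a baseline, assuming whitespace-tokenized term-frequency vectors, cosine similarity and a similarity-weighted vote; none of these choices are specified in the abstract, and the helper names (`tf_vector`, `cosine`, `knn_classify`) are hypothetical. It shows why classification cost grows with the training set: every query is compared against every training document.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector; lowercase whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=3):
    """Return the label with the largest similarity-weighted vote among the
    k training documents most similar to `query`.

    `training` is a non-empty list of (vector, label) pairs. Every pair is
    scored, so one query costs N similarity computations for N samples.
    """
    q = tf_vector(query)
    scored = sorted(((cosine(q, vec), label) for vec, label in training),
                    key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Toy training set (made-up examples, not data from the paper).
train = [(tf_vector(text), label) for text, label in [
    ("stock market shares fell", "finance"),
    ("bank interest rates rise", "finance"),
    ("team wins football match", "sports"),
    ("player scores goal in match", "sports"),
]]
print(knn_classify("football player scores", train, k=3))  # sports
```

The full scan over all N training vectors in `knn_classify` is the linear per-query cost the abstract refers to; shrinking N shrinks that scan directly.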

Keywords: K-nearest neighbor  text classification  sample reduction

An Improved KNN Text Categorization Algorithm Based on Density
MAO Jian,LIU Jin-ming,CAO Yong. An Improved KNN Text Categorization Algorithm Based on Density[J]. Journal of ZhangZhou Teachers College(Philosophy & Social Sciences), 2012, 0(2): 45-48
Authors:MAO Jian  LIU Jin-ming  CAO Yong
Affiliation: 1. Computer Engineering College of Jimei University, Xiamen Fujian 361021, China; 2. Huawei Technologies Co., Ltd., Shenzhen Guangdong 518129, China
Abstract: The KNN algorithm is widely used in the field of artificial intelligence. As a text categorization algorithm, it is simple, effective, and easy to implement. However, the time complexity of KNN is directly proportional to the number of training samples, and categorization accuracy decreases when the training samples are unevenly distributed. An improved KNN algorithm is proposed that raises text categorization accuracy by adjusting the distribution of the training samples: it analyzes their distribution density and reduces the number of samples in high-density areas. Experiments show that, in text categorization, the algorithm has lower time complexity than common KNN and also achieves better accuracy and recall.
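The abstract does not say how density is measured or which high-density samples are cut. The sketch below is one plausible reading under stated assumptions, not the authors' method: a sample's density is taken to be the number of same-class samples whose cosine similarity to it reaches a threshold (`sim_threshold`), and dense regions are thinned greedily so that a dense sample is kept only while fewer than `max_neighbors` similar same-class samples have been kept. Both parameters and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector; lowercase whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def local_density(i, training, sim_threshold):
    """Hypothetical density measure: number of other same-class samples whose
    cosine similarity to sample i is at least `sim_threshold`."""
    vec, label = training[i]
    return sum(1 for j, (other, other_label) in enumerate(training)
               if j != i and other_label == label
               and cosine(vec, other) >= sim_threshold)


def prune_dense(training, sim_threshold=0.5, max_neighbors=2):
    """Thin out high-density regions of a KNN training set (hypothetical rule).

    Samples whose local density is at most `max_neighbors` are always kept.
    A denser sample is kept only if fewer than `max_neighbors` similar
    same-class samples have already been kept; otherwise it is discarded.
    """
    kept = []
    for i, (vec, label) in enumerate(training):
        if local_density(i, training, sim_threshold) <= max_neighbors:
            kept.append((vec, label))  # sparse region: left untouched
            continue
        close = sum(1 for kept_vec, kept_label in kept
                    if kept_label == label
                    and cosine(vec, kept_vec) >= sim_threshold)
        if close < max_neighbors:
            kept.append((vec, label))
    return kept


# Made-up toy data: four near-duplicate finance texts and one sports text.
train = [(tf_vector(text), label) for text, label in [
    ("stock market shares fell", "finance"),
    ("stock market shares rose", "finance"),
    ("stock market shares fell sharply", "finance"),
    ("stock market shares fell again", "finance"),
    ("team wins football match", "sports"),
]]
pruned = prune_dense(train, sim_threshold=0.5, max_neighbors=2)
print(len(train), "->", len(pruned))  # 5 -> 3
```

In this sketch pruning is a one-off preprocessing step costing O(N²) similarity computations; afterwards each KNN query scans only the reduced set, and the isolated sports sample survives while the cluster of near-duplicates is thinned. How the authors actually define density and choose their thresholds is not given in the abstract.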
Keywords:KNN  Text Categorization  Sample Reduction
This article is indexed in VIP and other databases.