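An acceleration strategy for the kNN text classification algorithm based on projection pursuit
Cite this article: ZHANG Yong, MENG Xiaofei. An acceleration strategy for the kNN text classification algorithm based on projection pursuit[J]. Science Technology and Engineering, 2014, 14(36).
Authors: ZHANG Yong, MENG Xiaofei
Affiliation: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050
Abstract: In traditional k-nearest neighbors (kNN) text classification, texts represented in the vector space model have very high dimensionality and the number of training texts is huge, so kNN is usually regarded as an effective but not efficient text classification algorithm. To address the low efficiency of traditional kNN, an acceleration strategy based on the idea of projection pursuit is proposed. The basic idea is to shrink the training set by projection and to reduce the dimensionality of the texts while searching for the k nearest neighbors, lowering the computational cost of the algorithm from both directions. Experimental data show that the optimized kNN algorithm is considerably faster than traditional kNN while maintaining classification accuracy.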

Keywords: kNN  text classification  projection pursuit  dimensionality reduction  training set reduction
Received: 2014/8/12
Revised: 2014/8/12

Accelerated k-nearest neighbors text classification algorithm based on projection pursuit
ZHANG Yong and MENG Xiaofei. Accelerated k-nearest neighbors text classification algorithm based on projection pursuit[J]. Science Technology and Engineering, 2014, 14(36).
Authors: ZHANG Yong and MENG Xiaofei
Abstract: In traditional k-nearest neighbors (kNN) text classification, each text is represented as a vector in the vector space model. Because the feature vectors are very high-dimensional and the number of training texts is very large, kNN is generally regarded as an effective but inefficient algorithm for text categorization. To address this low classification efficiency, this paper proposes an acceleration strategy for traditional kNN based on projection pursuit. The basic idea is to reduce the size of the training set by projection and, at the same time, to reduce the dimensionality of the texts during the search for the k nearest neighbors, lowering the computational overhead of the algorithm on both fronts. Experimental results show that the proposed strategy greatly improves the running time of traditional kNN with little degradation in accuracy.
Keywords:kNN  text classification  projection pursuit  dimensionality reduction  training set reduction  
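
The two-part idea in the abstract (shrink the candidate training set by projection, then search for neighbors in a reduced space) can be illustrated with a minimal sketch. The abstract does not state the projection index, the pruning rule, or the distance measure used in the paper, so everything below is an assumption: principal directions from an SVD stand in for directions found by a projection-pursuit search, Euclidean distance is used, and the names `fit_directions`, `project`, `knn_predict`, and `n_candidates` are hypothetical.

```python
import numpy as np

def fit_directions(X_train, n_components):
    """Return the training mean and a few orthonormal projection directions.

    Assumption: top principal directions are used here as a simple
    stand-in for directions chosen by a projection-pursuit search.
    """
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, directions):
    """Map high-dimensional vectors into the low-dimensional space."""
    return (X - mean) @ directions.T

def knn_predict(query, Z_train, y_train, mean, directions, k=3, n_candidates=20):
    """Classify one query vector using the pre-projected training set Z_train."""
    z_query = project(query, mean, directions)

    # Step 1 (training set reduction): keep only the training points whose
    # 1-D projection on the first direction is closest to the query's.
    n_candidates = min(max(n_candidates, k), len(Z_train))
    gap = np.abs(Z_train[:, 0] - z_query[0])
    candidates = np.argpartition(gap, n_candidates - 1)[:n_candidates]

    # Step 2 (dimensionality reduction): compute distances in the reduced
    # space instead of the original high-dimensional space.
    dist = np.linalg.norm(Z_train[candidates] - z_query, axis=1)
    nearest = candidates[np.argsort(dist)[:k]]

    # Majority vote among the k nearest candidates.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage with synthetic vectors in place of real document vectors.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (40, 50)),
                     rng.normal(5.0, 1.0, (40, 50))])
y_train = np.array([0] * 40 + [1] * 40)

mean, directions = fit_directions(X_train, n_components=5)
Z_train = project(X_train, mean, directions)  # computed once, reused per query
print(knn_predict(np.full(50, 5.0), Z_train, y_train, mean, directions))
```

In this sketch the training set is projected once, so each query costs one projection plus distance computations over `n_candidates` short vectors rather than over every full-length training vector. Both the candidate filter and the reduced-space distance are approximations, which is consistent with the abstract's note that accuracy may degrade slightly.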
This article is indexed in CNKI, Wanfang Data, and other databases.