
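Research on the Application of the Transductive Confidence Machine for Nearest Neighbors in Text Classification
Cite this article: DU Qiuchao, ZHAO Hong. Research on the application of the transductive confidence machine for nearest neighbors in text classification[J]. Journal of Beijing Jiaotong University (Natural Science Edition), 2008, 32(5).
Authors: DU Qiuchao, ZHAO Hong
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University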
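Abstract: The transductive confidence machine for nearest neighbors is a new classification algorithm based on the theory of algorithmic randomness. It not only predicts the class of a sample but also attaches a confidence value to each prediction, which is valuable in practical applications of classifiers. However, the classifier must evaluate every sample to be classified against every class in turn, which greatly increases the amount of computation. This cost is especially pronounced in text classification with many classes and large volumes of data. Building on a detailed study of the algorithm, this paper improves it by means of cluster analysis and applies both the original and the improved algorithm to text classification. Experiments show that the improved algorithm reaches accuracy similar to the original algorithm while computing nearly 40% faster.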

Keywords: text classification; confidence; K-nearest neighbors; clustering
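The abstract does not give the algorithm's formulas. The sketch below follows the commonly cited TCM-KNN formulation and is an assumption, not the authors' code. For each sample, the strangeness is the sum of its k smallest distances to samples of the same class divided by the sum of its k smallest distances to samples of other classes. The test sample is tentatively given each class label in turn, and a p-value is computed for each label. The predicted label is the one with the largest p-value, confidence is 1 minus the second-largest p-value, and credibility is the largest p-value. Euclidean distance and the function names are illustrative choices. The per-label loop makes visible the "evaluate every sample against every class" cost that the abstract describes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumption; text vectors are often compared with cosine distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def strangeness(idx: int, xs: List[Vector], ys: List[str], k: int) -> float:
    """Sum of the k smallest same-class distances over the sum of the k smallest other-class distances."""
    same: List[float] = []
    other: List[float] = []
    for j, (x, y) in enumerate(zip(xs, ys)):
        if j == idx:
            continue
        d = euclidean(xs[idx], x)
        (same if y == ys[idx] else other).append(d)
    same.sort()
    other.sort()
    den = sum(other[:k])
    return sum(same[:k]) / den if den > 0 else math.inf


def tcm_knn_predict(
    train_x: List[Vector], train_y: List[str], test_x: Vector, k: int = 3
) -> Tuple[str, float, float, Dict[str, float]]:
    """Try every class label for test_x; return (label, confidence, credibility, p-values)."""
    p_values: Dict[str, float] = {}
    for label in sorted(set(train_y)):
        # Tentatively label the test sample and recompute all strangeness values.
        # This per-class loop (O(n^2) distances each time) is the expensive part.
        xs = list(train_x) + [test_x]
        ys = list(train_y) + [label]
        alphas = [strangeness(i, xs, ys, k) for i in range(len(xs))]
        # p-value: fraction of samples (test sample included) at least as strange as the test sample.
        p_values[label] = sum(1 for a in alphas if a >= alphas[-1]) / len(alphas)
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    # Confidence = 1 - second-largest p-value; credibility = largest p-value.
    return best_label, 1.0 - second_p, best_p, p_values
```

Consider two well-separated groups labeled "a" and "b" and a test point that lies inside group "a". Labeling the test point "a" makes it no stranger than the training samples, so that label gets the largest p-value. Labeling it "b" makes it the strangest sample, so that label gets a small p-value, and the prediction comes with high confidence.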

Text Classifications Using Transductive Confidence Machine for K Nearest Neighbors
DU Qiuchao,ZHAO Hong.Text Classifications Using Transductive Confidence Machine for K Nearest Neighbors[J].JOURNAL OF BEIJING JIAOTONG UNIVERSITY,2008,32(5).
Authors:DU Qiuchao  ZHAO Hong
Abstract: The transductive confidence machine for K nearest neighbors is a new algorithm based on the theory of algorithmic randomness. It not only predicts the class of a sample but also provides a confidence value for every prediction, which is valuable in practical applications of classifiers. However, every test sample must be evaluated against every class, so the algorithm requires a large amount of computation, especially in text classification with many classes and large volumes of data. In this paper the algorithm is improved with a clustering method, and it is then applied to text classification. The results show that the improved algorithm achieves accuracy similar to the original algorithm, while its computation speed is improved by nearly 40%.
Keywords:text classification  confidence  K-NN  cluster
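The abstract states only that the improvement uses clustering to reduce the per-class computation. It does not say how. The sketch below is therefore purely an illustration of one simple variant, not the authors' method. Each class is summarized by a single centroid, which is a one-cluster-per-class clustering. Only the m classes whose centroids lie nearest the test sample are kept, and the TCM-KNN p-values are computed for those candidate labels only. The pruned classes are treated as having p-value 0. The names `candidate_classes` and `pruned_tcm_knn`, and the parameter `m`, are hypothetical. In practice the centroids would be computed once, offline. They are recomputed here only to keep the example short.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Vector]) -> List[float]:
    """Mean vector of a group of points (one cluster per class)."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def candidate_classes(train_x: List[Vector], train_y: List[str], test_x: Vector, m: int) -> List[str]:
    """Hypothetical pruning step: keep the m classes whose centroid is nearest to test_x."""
    groups: Dict[str, List[Vector]] = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)
    ranked = sorted(groups, key=lambda c: dist(centroid(groups[c]), test_x))
    return ranked[:m]


def strangeness(i: int, xs: List[Vector], ys: List[str], k: int) -> float:
    """k smallest same-class distances summed, divided by k smallest other-class distances summed."""
    same = sorted(dist(xs[i], xs[j]) for j in range(len(xs)) if j != i and ys[j] == ys[i])
    other = sorted(dist(xs[i], xs[j]) for j in range(len(xs)) if ys[j] != ys[i])
    den = sum(other[:k])
    return sum(same[:k]) / den if den > 0 else math.inf


def p_value(train_x: List[Vector], train_y: List[str], test_x: Vector, label: str, k: int) -> float:
    xs = list(train_x) + [test_x]
    ys = list(train_y) + [label]
    alphas = [strangeness(i, xs, ys, k) for i in range(len(xs))]
    return sum(a >= alphas[-1] for a in alphas) / len(alphas)


def pruned_tcm_knn(
    train_x: List[Vector], train_y: List[str], test_x: Vector, k: int = 3, m: int = 2
) -> Tuple[str, float, float]:
    """Run the per-label p-value loop only over candidate classes; pruned classes count as p = 0."""
    p = {c: p_value(train_x, train_y, test_x, c, k) for c in candidate_classes(train_x, train_y, test_x, m)}
    ranked = sorted(p.values(), reverse=True)
    best = max(p, key=p.get)
    second = ranked[1] if len(ranked) > 1 else 0.0
    # Returns (label, confidence, credibility), as in the unpruned version.
    return best, 1.0 - second, p[best]
```

In this variant, the saving comes from running the expensive p-value loop m times rather than once per class. The trade-off is that a class pruned by the centroid test can never be predicted, so `m` controls how much accuracy may be traded for speed.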
This article is indexed in CNKI, VIP, Wanfang Data and other databases.