An Improved KNN Text Categorization Algorithm Based on DBSCAN


Cite this article:	Gou Heping (苟和平). An Improved KNN Text Categorization Algorithm Based on DBSCAN [J]. Science Technology and Engineering, 2013, 13(1): 219-222.

Author:	Gou Heping (苟和平)

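Affiliation:	1. Department of Information Technology, Qiongtai Teachers College, Haikou 571100; 2. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070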

Funding:	Key Project of Science and Technology Research, Ministry of Education

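Abstract:	To classify a sample, the K-nearest neighbor (KNN) algorithm must compute the similarity between that sample and every sample in the training set. When the training set is large, this is computationally expensive and classification efficiency drops. An improved algorithm based on DBSCAN clustering is therefore proposed. DBSCAN clustering is used to eliminate noisy data from the training samples. In addition, the samples in the core sample set are pruned according to a sample similarity threshold and their density, which reduces the number of training samples whose similarity to the sample being classified must be computed. Experiments show that the algorithm effectively reduces the classification workload while keeping classification ability essentially unchanged.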
Keywords:	K-nearest neighbor; text classification; sample pruning
Received:	8/24/2012 1:04:14 AM
Revised:	9/26/2012 9:13:40 PM
An Improved KNN Text Categorization Algorithm Based on DBSCAN

Gou Heping. An Improved KNN Text Categorization Algorithm Based on DBSCAN [J]. Science Technology and Engineering, 2013, 13(1): 219-222.

Authors:	Gou Heping

Institution:	1. Department of Information Technology, Qiongtai Teachers College, Haikou 571100, P.R. China; 2. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, P.R. China

Abstract:	To find the k nearest neighbors of a test sample, the KNN algorithm must calculate the similarity between that sample and every training sample in the sample space, so the computational overhead grows as the number of training samples increases. To address this problem, this paper proposes an improved algorithm based on DBSCAN that reduces the number of training samples. Noisy data in the sample space are removed with the DBSCAN algorithm. In addition, highly similar samples in the core sample set of the training data are pruned according to a similarity threshold and sample density. The improved method is shown to reduce computational overhead effectively.

Keywords:	KNN; text classification; sample reduction
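
The pipeline outlined in the abstract (drop DBSCAN noise, prune near-duplicate core samples, then run KNN on what remains) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's actual procedure. The abstract does not give the exact pruning rule, so the greedy densest-first rule below is a guess. The function names and the parameters `eps_sim`, `min_pts`, and `prune_sim` are hypothetical, as is the use of cosine similarity on dense vectors.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def reduce_training_set(X, y, eps_sim=0.9, min_pts=3, prune_sim=0.999):
    """Return sorted indices of training samples kept after noise removal and pruning."""
    n = len(X)
    # DBSCAN-style neighbourhoods, using similarity >= eps_sim in place of
    # distance <= eps. Each sample counts itself as a neighbour.
    neigh = [[j for j in range(n) if cosine(X[i], X[j]) >= eps_sim]
             for i in range(n)]
    core = {i for i in range(n) if len(neigh[i]) >= min_pts}
    # Border samples are non-core samples inside some core neighbourhood.
    # Everything else is treated as noise and discarded.
    border = {i for i in range(n)
              if i not in core and any(j in core for j in neigh[i])}
    # ASSUMED pruning rule (not taken from the paper): visit core samples
    # from densest to sparsest and drop any core sample that is a
    # near-duplicate of an already kept sample of the same class.
    kept_core = []
    for i in sorted(core, key=lambda i: (-len(neigh[i]), i)):
        if not any(y[i] == y[k] and cosine(X[i], X[k]) >= prune_sim
                   for k in kept_core):
            kept_core.append(i)
    return sorted(kept_core + list(border))


def knn_predict(x, X, y, keep, k=3):
    """Majority vote among the k most similar retained training samples."""
    nearest = sorted(keep, key=lambda i: -cosine(x, X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]
```

In use, `reduce_training_set` would be run once over the training vectors, and the returned index list passed as `keep` to `knn_predict` for each query, so that similarities are computed only against the retained samples.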
This article is indexed in CNKI, Wanfang Data, and other databases.