A KNN Algorithm for Classifying Unbalanced Data Sets



Cite this article:	DU Juan (杜娟). A KNN algorithm for classifying unbalanced data sets [J]. Science Technology and Engineering (科学技术与工程), 2011, 11(12): 2680-2685.

Author:	DU Juan (杜娟)

Affiliation:	School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318

Funding:	Heilongjiang Province Graduate Innovative Research Fund Project (YJSCX2006-38HLJ)

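Abstract:	When the traditional K-nearest neighbor (KNN) classification algorithm handles unbalanced sample data, the classifier's predictions are biased toward the majority class and the classification error for the minority class is large. To address this problem, the traditional KNN algorithm is improved at the data level. First, the K-means clustering algorithm groups the minority-class samples into clusters, and the samples in each cluster serve as an initial population for a genetic algorithm. Genetic crossover and mutation operations then generate new samples, whose validity is verified. The result is a training set in which every class has roughly the same number of samples. Experimental results show that this method effectively improves the KNN algorithm's classification of the minority class. The method is also applicable to other unbalanced data set classification problems where minority-class accuracy is the main concern.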
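The abstract does not specify the clustering parameters, the genetic operators, or the validity test, so the following is only a minimal Python sketch of the data-level idea under stated assumptions: Lloyd's k-means groups the minority class, each cluster serves as one population, two parents from the same cluster produce a child by arithmetic crossover followed by Gaussian mutation, and a hypothetical check keeps the child only if most of its k nearest original neighbours are minority samples. Binary labels are assumed, and every name and parameter (`genetic_oversample`, `n_clusters`, `sigma`, `is_valid`, and so on) is illustrative rather than the author's.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean (keep it if the cluster is empty).
        centers = [[sum(col) / len(c) for col in zip(*c)] if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def crossover(p1, p2, rng):
    """Assumed operator: arithmetic crossover, a random convex mix of two parents."""
    alpha = rng.random()
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]


def mutate(child, sigma, rng, rate=0.2):
    """Assumed operator: Gaussian noise added to each gene with probability `rate`."""
    return [g + rng.gauss(0, sigma) if rng.random() < rate else g for g in child]


def is_valid(sample, data, labels, minority, k=3):
    """Hypothetical validity check: most of the sample's k nearest
    original neighbours must belong to the minority class."""
    nearest = sorted(range(len(data)), key=lambda i: dist(sample, data[i]))[:k]
    return 2 * sum(labels[i] == minority for i in nearest) > k


def genetic_oversample(data, labels, minority, n_clusters=2, sigma=0.1,
                       max_tries=10_000, seed=0):
    """Grow the minority class until it matches the majority class
    (binary labels assumed)."""
    rng = random.Random(seed)
    minority_pts = [p for p, y in zip(data, labels) if y == minority]
    needed = (len(data) - len(minority_pts)) - len(minority_pts)
    # Each k-means cluster of minority samples acts as one initial population.
    populations = kmeans(minority_pts, n_clusters, rng)
    new_samples = []
    for _ in range(max_tries):
        if len(new_samples) >= needed:
            break
        population = rng.choice(populations)
        if len(population) < 2:
            continue  # crossover needs two parents from the same cluster
        p1, p2 = rng.sample(population, 2)
        child = mutate(crossover(p1, p2, rng), sigma, rng)
        if is_valid(child, data, labels, minority):
            new_samples.append(child)
    return data + new_samples, labels + [minority] * len(new_samples)


# Toy 2-D data: 8 majority points (label 0) near the origin,
# 3 minority points (label 1) near (2, 2).
data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.0],
        [0.0, 0.3], [0.2, 0.3], [0.1, 0.1], [0.3, 0.2],
        [2.0, 2.0], [2.1, 2.2], [2.2, 1.9]]
labels = [0] * 8 + [1] * 3
X, y = genetic_oversample(data, labels, minority=1)
print(y.count(0), y.count(1))  # → 8 8
```

Restricting crossover to parents from the same cluster keeps offspring inside one local region of the minority class, which appears to be the purpose of clustering first. The balanced `X`, `y` would then be passed to an ordinary KNN classifier.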
Keywords:	KNN; up-sampling; unbalanced data set; clustering; genetic crossover; genetic mutation
Received:	2/13/2011 4:31:02 PM
Revised:	2/13/2011 4:31:02 PM
A KNN algorithm for Unbalanced Data Set

DU Juan. A KNN algorithm for Unbalanced Data Set [J]. Science Technology and Engineering, 2011, 11(12): 2680-2685.

Authors:	DU Juan

Institution:	DU Juan, LIU Zhi-gang, YI Zhi-an (Computer and Information Technology College, Northeast Petroleum University, Daqing 163318, P.R. China)

Abstract:	When the traditional K-Nearest Neighbor (KNN) categorization algorithm is trained on imbalanced data sets, its predictions are biased toward the class with more training samples, and the classification error for the class with fewer training samples is severe. To address this problem, the traditional KNN algorithm was improved at the data level: the class with fewer training samples was grouped using the K-means clustering algorithm, and the samples in each cluster served as the initial population of a genetic alg...
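Why plain KNN drifts toward the class with more training samples can be seen from the vote alone. The following Python sketch uses made-up 1-D toy data and an illustrative `knn_predict` helper (neither comes from the paper): a query lying between the two minority samples is classified correctly with k = 1 but outvoted with k = 5, and the vote flips back once a few hand-picked minority points, standing in for up-sampled ones, are added near it.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classic KNN: majority vote among the k nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 1-D data: 8 majority samples (label 0) at 0..7,
# 2 minority samples (label 1) at 9 and 10.
train_x = [[float(v)] for v in range(8)] + [[9.0], [10.0]]
train_y = [0] * 8 + [1] * 2
query = [9.5]  # lies exactly between the two minority samples

print(knn_predict(train_x, train_y, query, k=1))  # → 1
print(knn_predict(train_x, train_y, query, k=5))  # → 0 (3 majority votes beat 2 minority votes)

# Hand-picked extra minority points stand in for synthetic up-sampled samples.
balanced_x = train_x + [[8.8], [9.2], [9.8], [10.2]]
balanced_y = train_y + [1] * 4
print(knn_predict(balanced_x, balanced_y, query, k=5))  # → 1
```

The imbalance does not change any distance; it only changes how many majority samples fall inside the k-neighbourhood, which is why a data-level fix that adds minority samples can correct the vote.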

Keywords:	KNN; up-sampling; imbalanced data sets; clustering; genetic crossover; genetic mutation
This article is indexed in CNKI, Wanfang Data, and other databases.
