
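The Improved SMOTE Algorithm Fusion Canopy and K-means for Imbalanced Data Sets
Cite this article: GUO Chao-you, XU Zhe, MA Yan-kun, CAO Meng-meng. The Improved SMOTE Algorithm Fusion Canopy and K-means for Imbalanced Data Sets[J]. Science Technology and Engineering, 2020, 20(22): 9069-9074.
Authors: GUO Chao-you, XU Zhe, MA Yan-kun, CAO Meng-meng
Affiliation: College of Power Engineering, Naval University of Engineering, Wuhan 430033 (all four authors)
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Abstract: The SMOTE algorithm and random forest handle classification on imbalanced data sets fairly well, but their performance on minority-class samples still needs improvement. To address this, the improved C-K-SMOTE algorithm is designed by fusing two clustering algorithms, Canopy and K-means. The Canopy algorithm is first used for fast approximate clustering, and the K-means algorithm is then used for precise clustering, yielding refined clusters. Finally, the SMOTE algorithm is used to increase the number of minority-class samples so that the data tend toward balance. Imbalanced data sets from the public KEEL (knowledge extraction on evolutionary learning) database are selected, and the algorithm is validated experimentally in combination with a random forest classification model. The experiments show that the C-K-SMOTE algorithm can effectively balance imbalanced data sets.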

Keywords: Canopy algorithm; K-means algorithm; SMOTE algorithm; C-K-SMOTE algorithm; random forest; imbalanced data sets; classification problem
Received: 2019/11/6
Revised: 2019/12/9

The Improved SMOTE Algorithm Fusion Canopy and K-means for Imbalanced Data Sets
GUO Chao-you, XU Zhe, MA Yan-kun, CAO Meng-meng. The Improved SMOTE Algorithm Fusion Canopy and K-means for Imbalanced Data Sets[J]. Science Technology and Engineering, 2020, 20(22): 9069-9074.
Authors: GUO Chao-you, XU Zhe, MA Yan-kun, CAO Meng-meng
Institution: College of Power Engineering, Naval University of Engineering
Abstract: The SMOTE algorithm combined with random forest handles classification on imbalanced data sets fairly well, but its classification performance on minority-class samples still needs improvement. To address this, the improved C-K-SMOTE algorithm is designed by combining the Canopy and K-means clustering algorithms. The Canopy algorithm first performs fast approximate clustering, and the K-means algorithm then performs precise clustering to obtain refined clusters. Finally, the SMOTE algorithm increases the number of minority-class samples so that the data tend toward balance. Imbalanced data sets from the public KEEL (Knowledge Extraction on Evolutionary Learning) database are selected, and the algorithm is validated experimentally with a random forest classification model. The experiments show that the C-K-SMOTE algorithm can effectively balance imbalanced data sets.
Keywords: Canopy algorithm; K-means algorithm; SMOTE algorithm; C-K-SMOTE algorithm; random forest; imbalanced data sets; classification problem
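
The three stages named in the abstract (Canopy for a coarse pass, K-means for refinement, SMOTE for oversampling) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not say how the clusters feed into SMOTE, so the sketch assumes that clustering is applied to the minority class only, that the Canopy centers seed K-means, and that each synthetic sample is interpolated between two minority points in the same cluster. The function names, the threshold `t2`, and the neighbour count `k` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Coarse pass: pick canopy centers using the tight threshold t2.

    Only the centers are needed here (to seed K-means), so the loose
    Canopy threshold that defines canopy membership is omitted.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points within t2 of a center can no longer start a new canopy.
        remaining = [p for p in remaining if dist(center, p) > t2]
    return centers


def kmeans(points, centers, iters=20):
    """Lloyd's iterations, seeded with the given centers."""
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's center in place
        centers = updated
    return labels, centers


def ck_smote(minority, n_new, t2, k=3, seed=0):
    """Generate n_new synthetic minority samples (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    labels, _ = kmeans(minority, canopy_centers(minority, t2))
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    # Interpolation needs at least two points in a cluster.
    usable = [c for c in clusters.values() if len(c) >= 2]
    synthetic = []
    while usable and len(synthetic) < n_new:
        cluster = rng.choice(usable)
        i = rng.randrange(len(cluster))
        base = cluster[i]
        # k nearest neighbours of the base point inside its own cluster.
        neighbours = sorted(
            (q for j, q in enumerate(cluster) if j != i),
            key=lambda q: dist(base, q),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        # SMOTE step: a random point on the segment from base to other.
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic
```

Calling `ck_smote(minority, n_new, t2)` with a list of minority-class feature tuples returns the synthetic samples to append to the training set before fitting a classifier such as a random forest. Restricting interpolation to one cluster keeps synthetic points from landing in the gap between separate minority regions, which is one plausible motivation for clustering before SMOTE.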
This article is indexed in CNKI, Wanfang Data, and other databases.