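An imbalanced data classification method combining an improved random subspace method with decision trees
Cite this article: HU Xiao-sheng. An imbalanced data classification method combining an improved random subspace method with decision trees [J]. Journal of Foshan University (Natural Science Edition), 2013(5): 22-26.
Author: HU Xiao-sheng
Affiliation: School of Electronic and Information Engineering, Foshan University, Foshan 528000, Guangdong, China
Funding: Foshan Science and Technology Development Special Fund Project (2011AA100061); Foshan Industry-University-Research Special Fund Project (2012HC100272); Foshan Education Bureau Research Project on an Intelligent Evaluation Index System (DX20120220)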
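Abstract: A classification algorithm is proposed that combines an improved random subspace method with the C4.5 decision tree algorithm. Decision trees built with C4.5 serve as the base classifiers of the ensemble. At the start of each iteration, SMOTE sampling is combined with the random subspace method to generate synthetic examples that differ markedly in feature space and data distribution, giving the base classifiers diverse, balanced training sets. The final decision is fused by majority voting. Experimental results show that the method achieves high recognition rates on both the minority class and the majority class.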

Keywords: imbalanced data classification; random subspace method; decision tree; ensemble learning

Imbalanced data classification improvement with combination of random subspace method and decision tree
HU Xiao-sheng.Imbalanced data classification improvement with combination of random subspace method and decision tree[J].Journal of Foshan University(Natural Science Edition),2013(5):22-26.
Authors:HU Xiao-sheng
Institution: School of Electronic and Information Engineering, Foshan University, Foshan 528000, China
Abstract: This paper proposes a hybrid method that combines an improved random subspace method (RSM) with the C4.5 decision tree algorithm. Decision trees built with C4.5 serve as the base classifiers. At the beginning of each iteration, as in RSM, a subset of the features is removed from the training data; SMOTE is then applied to the reduced dataset, which is used to train the base classifier. This gives the base classifiers training sets with greater variance and diversity. The final decision is obtained by fusing the base classifiers' outputs through majority voting. Experimental results show that the proposed method achieves better classification performance than other approaches on both the minority and the majority class, and that it is effective and feasible for imbalanced datasets.
Keywords:imbalanced data classification  random subspace method  decision tree  ensemble learning
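
The loop described in the abstract (draw a random feature subspace, apply SMOTE within it, train a tree, then combine the trees by majority vote) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say what the "improved" RSM changes, the tree below splits on plain information gain without pruning rather than using C4.5's gain ratio, and every function name and parameter value (`n_estimators`, `subspace_ratio`, `k`) is hypothetical. It assumes a binary problem with at least two minority samples.

```python
import math
import random
from collections import Counter

def smote(minority, n_new, k, rng):
    """Interpolate between a minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=4):
    """Entropy-based binary tree; a simplified stand-in for C4.5 (assumption)."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so neither side of the split is ever empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all rows identical: no split possible
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_ensemble(X, y, n_estimators=11, subspace_ratio=0.5, k=3, seed=0):
    rng = random.Random(seed)
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    n_new = max(counts.values()) - counts[minority_label]  # balance the classes
    n_feats = max(1, round(subspace_ratio * len(X[0])))
    ensemble = []
    for _ in range(n_estimators):
        feats = rng.sample(range(len(X[0])), n_feats)       # random subspace
        Xs = [[row[f] for f in feats] for row in X]
        minority = [r for r, lab in zip(Xs, y) if lab == minority_label]
        synth = smote(minority, n_new, k, rng)              # SMOTE inside the subspace
        tree = build_tree(Xs + synth, list(y) + [minority_label] * n_new)
        ensemble.append((feats, tree))
    return ensemble

def predict(ensemble, x):
    """Majority vote over the base trees."""
    votes = Counter(predict_tree(tree, [x[f] for f in feats]) for feats, tree in ensemble)
    return votes.most_common(1)[0][0]
```

Because SMOTE runs after the features are dropped, nearest neighbours are computed in each subspace separately, so every tree sees different synthetic minority points as well as different features. Calling `fit_ensemble(X, y)` on lists of numeric rows and labels returns the ensemble, and `predict(ensemble, x)` classifies one full-length row.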
This article is indexed in CNKI, VIP, and other databases.