
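Research on key issues in imbalanced data classification based on a new ensemble classifier
Cite this article: ZHAI Yun, YANG Bing-ru, QU Wu, SUI Hai-feng. Research on key issues in imbalanced data classification based on a new ensemble classifier[J]. Systems Engineering and Electronics, 2011, 33(1): 196-201. DOI: 10.3969/j.issn.1001-506X.2011.01.40
Authors: ZHAI Yun  YANG Bing-ru  QU Wu  SUI Hai-feng
Affiliation: 1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083; 2. College of Computer Science, Liaocheng University, Liaocheng, Shandong 252059
Funding: Supported by the National Natural Science Foundation of China (60675030, 60875029)
Abstract: To address the problem of imbalanced data classification, a resampling algorithm based on differentiated sampling rates (differentiated sampling rate algorithm, DSRA) is proposed, and a new ensemble classifier (SVM-Ripper ensemble classifier, SREC) is designed on top of DSRA. SREC uses its own classifier selection strategy, classifier integration strategy, and classification decision scheme, and achieves relatively high classification accuracy. SREC is also used to study the key issues that affect imbalanced data classification. The results show that the imbalanced data classification problem is essentially caused by several factors together: imbalance between the positive and negative classes, imbalance within a class, sample size, and the degree of imbalance. Only by considering these factors jointly can the problem be addressed well.

Keywords: data mining  imbalanced data classification  ensemble classifier  key issues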

Study on source of classification in imbalanced datasets based on new ensemble classifier
ZHAI Yun, YANG Bing-ru, QU Wu, SUI Hai-feng. Study on source of classification in imbalanced datasets based on new ensemble classifier[J]. Systems Engineering and Electronics, 2011, 33(1): 196-201. DOI: 10.3969/j.issn.1001-506X.2011.01.40
Authors: ZHAI Yun  YANG Bing-ru  QU Wu  SUI Hai-feng
Affiliation: 1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China; 2. College of Computer Science, Liaocheng University, Liaocheng 252059, China
Abstract: To address classification on imbalanced datasets, this paper presents a differentiated sampling rate algorithm (DSRA) for resampling and, on this basis, proposes an SVM-Ripper ensemble classifier (SREC). SREC employs a distinctive classifier selection strategy, classifier integration approach, and classification decision-making method, and achieves relatively high classification accuracy. SREC is also used to study the sources of difficulty in classifying an imbalanced dataset. The simulation results show that the difficulty arises from a combination of imbalance between classes, imbalance within a class, sample size, and the degree of imbalance, and that only a comprehensive consideration of these factors can better address classification on imbalanced datasets.
Keywords: data mining  classification in imbalanced datasets  ensemble classifier  source
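
The abstract names resampling with differentiated sampling rates but gives no details of DSRA itself. The sketch below is therefore not the authors' algorithm. It only illustrates the general idea of resampling each class at its own rate and measuring the degree of imbalance before and after. The function names (`imbalance_ratio`, `resample_with_rates`), the rates, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def resample_with_rates(samples, labels, rates, seed=0):
    """Resample each class at its own rate (hypothetical helper, not DSRA).

    rates maps a class label to a sampling rate. A rate below 1 undersamples
    that class without replacement; a rate above 1 keeps every item and draws
    extra copies with replacement. Labels missing from rates keep rate 1.0.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_samples, out_labels = [], []
    for y, items in by_class.items():
        target = max(1, round(len(items) * rates.get(y, 1.0)))
        if target <= len(items):
            chosen = rng.sample(items, target)
        else:
            chosen = items + rng.choices(items, k=target - len(items))
        out_samples.extend(chosen)
        out_labels.extend([y] * target)
    return out_samples, out_labels


# Toy data, made up for illustration: 90 negative and 10 positive examples.
samples = list(range(100))
labels = [0] * 90 + [1] * 10

# Assumed rates: halve the majority class, triple the minority class.
new_samples, new_labels = resample_with_rates(samples, labels, {0: 0.5, 1: 3.0})
print(imbalance_ratio(labels), imbalance_ratio(new_labels))
```

Under these assumed rates the class counts move from 90 and 10 to 45 and 30. Such a step changes only the imbalance between classes. Imbalance within a class and overall sample size, which the abstract also lists as factors, would need separate handling.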
This article is indexed in databases including CNKI and Wanfang Data.