An improved random forest algorithm based on unbalanced data
Cite this article: WEI Zhengtao, YANG Youlong, BAI Jing. An improved random forest algorithm based on unbalanced data[J]. Journal of Chongqing University (Natural Science Edition), 2018, 41(4): 54-62. DOI: 10.11835/j.issn.1000-582X.2018.04.007
Authors: WEI Zhengtao  YANG Youlong  BAI Jing
Affiliation: School of Mathematics and Statistics, Xidian University, Xi'an 710126, P. R. China
Funding: Supported by the National Natural Science Foundation of China (61573266).
Abstract: As an ensemble classifier, the random forest algorithm offers good classification performance and suits a wide range of classification settings, but it also has flaws. For example, it does not distinguish the positive and negative classes well when handling unbalanced data. To address this problem, we improve the Bootstrap resampling method by imposing constraints on the sampling results, which reduces the effect of sampling on the class imbalance while preserving the randomness of the algorithm as far as possible. We then weight each decision tree by the imbalance coefficient of its generated data, giving decision trees that are sensitive to unbalanced data a greater say in the voting stage and thereby improving the overall algorithm's classification performance on unbalanced data. With these two improvements, the classification accuracy of the random forest improves markedly when the number of decision trees is insufficient.

Keywords: unbalanced data set  random forest  conditional Bootstrap resampling  weighted decision tree
Received: 2017-10-20
This article is indexed in CNKI, Wanfang Data, and other databases.
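The two ideas named in the abstract, a constrained Bootstrap draw and imbalance-weighted voting, can be illustrated with a minimal sketch. The abstract does not state the actual constraint or the weighting formula, so everything concrete here is an assumption made for illustration: the imbalance coefficient is taken to be the minority-to-majority count ratio, a draw is accepted only if it is no more imbalanced than the full data, and a tree's weight is the coefficient of its own sample. The function names are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def imbalance_coefficient(labels):
    """Minority-to-majority count ratio (ASSUMED definition, not from the paper)."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0  # only one class present: treat as maximally imbalanced
    return min(counts.values()) / max(counts.values())


def conditional_bootstrap(labels, rng, max_tries=100):
    """Draw bootstrap indices, rejecting draws more imbalanced than the full data.

    The acceptance rule is an ASSUMPTION standing in for the paper's constraint.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    n = len(labels)
    target = imbalance_coefficient(labels)
    for _ in range(max_tries):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        if imbalance_coefficient([labels[i] for i in idx]) >= target:
            return idx
    return idx  # no draw met the constraint: fall back to the last draw


def weighted_vote(predictions, weights):
    """Return the label whose votes carry the largest total weight."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Usage: five constrained samples from a 10:90 data set, one weight per sample.
rng = random.Random(0)
labels = [1] * 10 + [0] * 90
samples = [conditional_bootstrap(labels, rng) for _ in range(5)]
weights = [imbalance_coefficient([labels[i] for i in s]) for s in samples]

# Two lightly weighted trees vote 0, one heavily weighted tree votes 1.
print(weighted_vote([0, 0, 1], [0.1, 0.1, 0.5]))  # prints 1
```

Rejection sampling keeps each accepted draw a genuine Bootstrap sample, which is one way to limit added imbalance without giving up randomness. Other constraints or weighting schemes would fit the abstract's description equally well.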