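Classification of imbalanced data with a Boosting-based ensemble learning algorithm
Cite this article: LI Yijing, GUO Haixiang, LI Yanan, LIU Xiao. Classification of imbalanced data with a Boosting-based ensemble learning algorithm[J]. Systems Engineering - Theory & Practice, 2016, 36(1): 189-199.
Authors: LI Yijing, GUO Haixiang, LI Yanan, LIU Xiao
Affiliations: 1. College of Economics and Management, China University of Geosciences, Wuhan 430074; 2. Research Center for Digital Business Management, China University of Geosciences, Wuhan 430074
Funding: National Natural Science Foundation of China (71103163, 71103164, 71301153, 71573237); Program for New Century Excellent Talents in University, Ministry of Education (NCET-13-1012); Fundamental Research Funds for the Central Universities (CUG120111, CUG110411, G2012002A, CUG140604); Open Fund of the Key Laboratory of Tectonics and Petroleum Resources, Ministry of Education (TPR-2011-11)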
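Abstract: To address the classification of multi-class imbalanced data, this paper proposes a new classification method, the BPSO-Adaboost-KNN algorithm, which approaches the problem from two directions: feature selection on the data set and ensemble learning. The algorithm uses AUCarea, a visualizable metric designed for multi-class problems, as its evaluation measure. To test its performance, 10 data sets drawn from UCI and KEEL were used. The results show that, after effectively extracting the key features, the algorithm improves the stability of Adaboost, and its classification accuracy on the ten data sets is 20% to 40% higher than that of a KNN classifier used alone. Compared with other state-of-the-art ensemble classification algorithms, BPSO-Adaboost-KNN achieves better or comparable results. Finally, the algorithm is applied to identifying the oil-bearing properties of petroleum reservoirs. It successfully extracts three key attributes (acoustic wave, porosity, and oil saturation) and substantially improves classification accuracy over traditional classification algorithms: accuracy exceeds 98% on all five Jianghan Oilfield wells, oilsk81 to oilsk85, which is 20% higher than KNN alone. It performs especially well on the oil layer and the poor oil layer, the two classes most easily misclassified.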

Keywords: imbalanced data; feature extraction; classification; oil reservoir
Received: 2014-06-26

A boosting based ensemble learning algorithm in imbalanced data classification
LI Yijing, GUO Haixiang, LI Yanan, LIU Xiao. A boosting based ensemble learning algorithm in imbalanced data classification[J]. Systems Engineering - Theory & Practice, 2016, 36(1): 189-199.
Authors: LI Yijing, GUO Haixiang, LI Yanan, LIU Xiao
Institution: 1. College of Economics and Management, China University of Geosciences, Wuhan 430074, China; 2. Research Center for Digital Business Management, China University of Geosciences, Wuhan 430074, China
Abstract: This paper addresses multi-class imbalanced data classification and proposes BPSO-Adaboost-KNN, an algorithm that combines feature selection with ensemble learning. The algorithm uses a visual AUCarea metric to evaluate classifier performance on multi-class problems. The proposed algorithm is tested on 10 data sets from UCI and KEEL. The results show that, after extracting the key features, it improves the stability of Adaboost, and its classification accuracy on the ten data sets is 20% to 40% higher than that of the KNN classifier. Compared with three other state-of-the-art ensemble algorithms, BPSO-Adaboost-KNN obtains equal or better results. Finally, the proposed algorithm is applied to recognizing the oil-bearing properties of reservoirs, where it successfully selects three key attributes (acoustic wave, porosity, and oil saturation). Classification accuracy exceeds 98% on the Jianghan well logging data oilsk81 to oilsk85, which is 20% higher than the KNN classifier. The algorithm is particularly effective at distinguishing the oil layer from the poor oil layer, the classes most easily misclassified.
Keywords: imbalanced data; feature selection; classification; oil reservoir
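
The abstract names the building blocks (a selected feature subset, Adaboost, and a KNN base classifier) but not how they are wired together. The sketch below is one plausible reading, not the authors' implementation: it uses SAMME-style multi-class boosting, and because KNN accepts no sample weights, each round trains on a weight-proportional resample. The BPSO search itself is omitted; `mask` is a hypothetical stand-in for the feature subset it would return. All function names, parameters, and the toy three-class data are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def apply_mask(X, mask):
    """Keep only columns whose mask bit is 1 (stand-in for a BPSO-selected subset)."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def fit_adaboost_knn(X, y, rounds=10, k=3, seed=0):
    """SAMME-style boosting with KNN; each round resamples the data by weight."""
    rng = random.Random(seed)
    n, n_classes = len(X), len(set(y))
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, sample_X, sample_y)
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=weights, k=n)
        sX, sy = [X[i] for i in idx], [y[i] for i in idx]
        preds = [knn_predict(sX, sy, x, k) for x in X]
        err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
        if err >= 1.0 - 1.0 / n_classes:
            continue  # no better than random guessing: discard this round
        err = max(err, 1e-10)  # avoid division by zero on a perfect round
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, sX, sy))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(alpha) if p != t else w
                   for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    if not ensemble:  # fall back to plain KNN on the full data
        ensemble.append((1.0, X, y))
    return ensemble


def predict(ensemble, x, k=3):
    """Alpha-weighted vote over the KNN members of the ensemble."""
    scores = Counter()
    for alpha, sX, sy in ensemble:
        scores[knn_predict(sX, sy, x, k)] += alpha
    return scores.most_common(1)[0][0]


# Made-up imbalanced toy data: 30 / 10 / 5 samples, plus one pure-noise column.
data_rng = random.Random(42)
centers = {"A": (0.0, 0.0, 30), "B": (5.0, 0.0, 10), "C": (10.0, 10.0, 5)}
raw_X, y = [], []
for label, (cx, cy, count) in centers.items():
    for _ in range(count):
        raw_X.append([data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5),
                      data_rng.uniform(-50, 50)])
        y.append(label)

mask = (1, 1, 0)  # hypothetical selection result: drop the noise column
X = apply_mask(raw_X, mask)
ensemble = fit_adaboost_knn(X, y, rounds=10, k=3, seed=0)
print(predict(ensemble, [10.0, 10.0]))
```

Resampling is only one way to feed boosting weights to KNN; a distance-weighted vote that scales each neighbor by its sample weight would be an alternative, and the abstract does not say which the paper uses.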