
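Research on the over-fitting problem in cost-sensitive learning
Cite this article: LI Zou-chun, ZHOU Xiu-mei, YUAN Ding-rong. Research on the over-fitting problem in cost-sensitive learning [J]. Journal of Guangxi University (Natural Science Edition), 2009, 34(6).
Authors: LI Zou-chun (李作春)  ZHOU Xiu-mei (周秀梅)  YUAN Ding-rong (袁鼎荣)
Affiliations: 1. College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004
2. College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004; Centre for Quantum Computation and Intelligent Systems, University of Technology, Sydney, NSW 2006, Australia
Funding: National Natural Science Foundation of China project; National 973 Program project; Australian ARC grant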
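Abstract: The goal of a cost-sensitive learning algorithm is to minimize the sum of all costs. Like other learning algorithms, it must face the challenging problem of over-fitting: the classifier fits the training data well but performs poorly on test or real-world data. To address this weakness of cost-sensitive learning, two strategies for overcoming over-fitting are proposed. The first, a filtering strategy, is built for TCSDT classification: the filtered probability estimates are used to compute the potential misclassification cost of each split attribute, which defers the preferential selection of split attributes with a large potential misclassification cost; cross-validation is then used to determine the value of m. The second strategy, threshold pruning, differs from the standard-error-based Laplace pruning method: it uses a predefined set of cost-related thresholds to decide whether a leaf node of the decision tree is pruned. The two strategies can be used independently or together to avoid over-fitting the data in TCSDT classification. Experiments show that the two proposed algorithms have advantages in both cost-sensitive and non-cost-sensitive learning, effectively reduce over-fitting, and are highly robust. Results on UCI datasets show that the fitting ability of the algorithms is on average more than 10% better than that of existing methods.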
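
The abstract does not give the formula behind the filtered probability estimates or define m. As a rough illustration only, the sketch below assumes an m-estimate style correction and a two-class cost setting. The function names `m_estimate` and `leaf_cost`, the `base_rate` prior, and the cost parameters are hypothetical and are not taken from the paper. It shows how a correction of this kind gives a tiny, apparently pure node a non-zero potential misclassification cost, so that such a split looks less attractive.

```python
def m_estimate(pos, neg, base_rate, m):
    """Assumed filtered estimate of P(positive) at a node.

    Computes (pos + m * base_rate) / (pos + neg + m); m = 0 gives the raw frequency.
    This formula is an assumption, not the paper's stated definition.
    """
    total = pos + neg + m
    if total <= 0:
        raise ValueError("node must contain examples or m must be positive")
    return (pos + m * base_rate) / total


def leaf_cost(pos, neg, fp_cost, fn_cost, base_rate=0.5, m=0.0):
    """Return (label, potential misclassification cost) for a node.

    The node is labeled with whichever class gives the lower expected cost
    under the (assumed) filtered probability estimate.
    """
    p = m_estimate(pos, neg, base_rate, m)
    n = pos + neg
    cost_if_positive = (1.0 - p) * n * fp_cost  # expected cost of false positives
    cost_if_negative = p * n * fn_cost          # expected cost of false negatives
    if cost_if_positive <= cost_if_negative:
        return ("positive", cost_if_positive)
    return ("negative", cost_if_negative)


# A node holding a single positive example looks cost-free without the correction,
# but carries some potential cost once the estimate is pulled toward the prior.
raw = leaf_cost(1, 0, fp_cost=1.0, fn_cost=5.0, m=0.0)
filtered = leaf_cost(1, 0, fp_cost=1.0, fn_cost=5.0, base_rate=0.5, m=2.0)
```

In this sketch, m would be chosen by cross-validation, as the abstract states for the paper's own method. Threshold pruning is not sketched here because the abstract does not say how the threshold set is defined.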

Keywords: cost-sensitive  learning algorithm  over-fitting  machine learning

A study on over-fitting in cost-sensitive learning
LI Zou-chun, ZHOU Xiu-mei, YUAN Ding-rong. A study on over-fitting in cost-sensitive learning [J]. Journal of Guangxi University (Natural Science Edition), 2009, 34(6).
Authors:LI Zou-chun  ZHOU Xiu-mei  YUAN Ding-rong
Abstract: Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple costs are taken into account. Like other learning algorithms, they must face a significant challenge: over-fitting. In cost-sensitive learning, a classifier can produce good results on the training data yet usually fails to yield an optimal model when applied to unseen data in real-world applications. This paper addresses data over-fitting by designing two simple and efficient strategies for cost-sensitive decision tree methods. To evaluate the proposed approaches, extensive experiments are conducted on several UCI datasets across different cost ratios. The experimental results show that the proposed algorithms outperform existing algorithms at reducing data over-fitting.
Keywords:cost-sensitive  learning algorithm  over-fitting  machine learning
This article is indexed in Wanfang Data (万方数据) and other databases.