Combination Feature Selection Based on Relief
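Cite this article: ZHANG Li-xin, WANG Jia-xin, ZHAO Yan-nan, YANG Ze-hong. Combination feature selection based on Relief[J]. Journal of Fudan University (Natural Science), 2004, 43(5): 893-898.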
Authors: ZHANG Li-xin (张丽新), WANG Jia-xin (王家廞), ZHAO Yan-nan (赵雁南), YANG Ze-hong (杨泽红)
Affiliation: State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084
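Abstract: ReliefF is widely recognized as an effective filter-type feature evaluation method, but a major drawback is that it cannot identify redundant features. We propose two Relief-based combination feature selection algorithms, ReCorre and ReSBSW. Both first use ReliefF to filter out irrelevant features and then remove redundant features, ReCorre by correlation analysis (Correlation) and ReSBSW by a Wrapper algorithm with sequential backward search (SBS). Experiments on real and artificial datasets analyze and compare the performance of Relief, ReCorre and ReSBSW. The results support the following conclusions. ReliefF reduces dimensionality well on datasets with many irrelevant features, but on real data with complex relationships among features it removes only a few irrelevant features and also discards some relevant ones. ReliefF cannot handle redundant features, whereas ReCorre removes most of the redundant features left after ReliefF. ReSBSW achieves better generalization performance, but its computational cost is high, so it is unsuitable for large-scale datasets.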
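The ReCorre pipeline described above (a Relief filter followed by correlation-based redundancy removal) can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it substitutes basic two-class Relief for ReliefF, uses every sample as a query instead of random sampling, measures redundancy with the Pearson coefficient, and treats the thresholds `w_min` and `r_max` and all function names as assumptions.

```python
# Minimal sketch of a ReCorre-style pipeline. Assumptions (not from the paper):
# basic two-class Relief instead of full ReliefF, numeric features, at least
# two samples per class, Pearson correlation, hypothetical thresholds.

def relief_weights(X, y):
    """Basic Relief: reward features that differ on the nearest miss and
    penalise features that differ on the nearest hit. Every sample is used
    as a query (no random sampling), so the result is deterministic."""
    n, d = len(X), len(X[0])
    # Per-feature range so each diff lies in [0, 1]; constant features get span 1.
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest hit
        m = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest miss
        for j in range(d):
            w[j] += (diff(j, X[i], X[m]) - diff(j, X[i], X[h])) / n
    return w


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def recorre(X, y, w_min=0.0, r_max=0.9):
    """Stage 1: drop features whose Relief weight is not above w_min (irrelevant).
    Stage 2: scan survivors by descending weight and keep a feature only if its
    |correlation| with every already-kept feature is at most r_max."""
    w = relief_weights(X, y)
    relevant = sorted((j for j in range(len(w)) if w[j] > w_min),
                      key=lambda j: w[j], reverse=True)
    cols = {j: [row[j] for row in X] for j in relevant}
    kept = []
    for j in relevant:
        if all(abs(pearson(cols[j], cols[k])) <= r_max for k in kept):
            kept.append(j)
    return sorted(kept)
```

For example, if feature 0 separates the classes, feature 1 equals 2 × feature 0 + 1 (a redundant copy) and feature 2 is constant, the Relief stage discards feature 2 and the correlation stage keeps only one of features 0 and 1. Keeping the higher-weighted member of each correlated pair is an illustrative choice; the abstract does not specify the rule.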

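Keywords: feature selection; algorithm; redundancy; dataset; large-scale data; generalization performance; search; real-world; conclusion; computational cost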
Article ID: 0427-7104(2004)05-0893-06

Combination Feature Selection Based on Relief
ZHANG Li-xin, WANG Jia-xin, ZHAO Yan-nan, YANG Ze-hong (Department of Computer Science and Technology, Tsinghua University, Beijing, China). Combination Feature Selection Based on Relief[J]. Journal of Fudan University (Natural Science), 2004, 43(5): 893-898.
Authors: ZHANG Li-xin, WANG Jia-xin, ZHAO Yan-nan, YANG Ze-hong
Institution: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: ReliefF is a well-performing filter-type feature evaluation method, but it cannot discriminate redundant features. Two combination feature selection algorithms based on Relief are proposed: ReCorre and ReSBSW. Both first use ReliefF to filter out irrelevant features and then remove redundant features, using correlation analysis and a Wrapper-style sequential backward search (SBS), respectively. Experiments on real and artificial datasets are used to analyze and compare Relief, ReCorre and ReSBSW. The results lead to the following conclusions. ReliefF reduces dimensionality well on datasets with many irrelevant features, but on real datasets with complex relationships among features it removes relatively few irrelevant features and may remove relevant ones. ReCorre removes most of the redundant features remaining after ReliefF, while ReSBSW achieves better generalization performance at a high computational cost and is therefore not suited to large-scale datasets.
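The SBS wrapper stage of ReSBSW can be sketched under stated assumptions. The abstract names neither the learner nor the stopping rule, so this sketch hypothetically uses a 1-nearest-neighbour classifier scored by leave-one-out accuracy and stops once every single-feature removal lowers that score; `loo_1nn_accuracy` and `sbs` are illustrative names. The input `feats` stands for the features that survived the Relief filtering step.

```python
# Minimal sketch of the SBS wrapper stage of a ReSBSW-style pipeline.
# Assumptions (not from the paper): the wrapper's learner is 1-NN scored by
# leave-one-out accuracy, and the search stops as soon as every
# single-feature removal lowers that accuracy.

def loo_1nn_accuracy(X, y, feats):
    """Hypothetical wrapper evaluator: leave-one-out accuracy of 1-NN
    (squared Euclidean distance) using only the columns in feats."""
    if not feats:
        return 0.0
    n = len(X)
    correct = 0
    for i in range(n):
        best_k, best_d = None, float("inf")
        for k in range(n):
            if k == i:
                continue  # leave sample i out
            d = sum((X[i][j] - X[k][j]) ** 2 for j in feats)
            if d < best_d:
                best_k, best_d = k, d
        correct += y[best_k] == y[i]
    return correct / n


def sbs(X, y, feats, evaluate=loo_1nn_accuracy):
    """Sequential backward search: start from all candidate features and
    repeatedly drop the feature whose removal gives the best wrapper score,
    as long as that score does not fall below the current one."""
    current = list(feats)
    score = evaluate(X, y, current)
    while len(current) > 1:
        candidates = [[f for f in current if f != r] for r in current]
        scores = [evaluate(X, y, c) for c in candidates]
        best = max(range(len(candidates)), key=lambda i: scores[i])
        if scores[best] < score:
            break  # every removal hurts accuracy: stop
        current, score = candidates[best], scores[best]
    return current, score
```

Each pass re-evaluates the learner once per remaining feature, so in the worst case a search over d features costs on the order of d²/2 wrapper evaluations, each of which retrains and tests the classifier. This is consistent with the abstract's observation that ReSBSW is too expensive for large-scale datasets.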
Keywords: feature selection; genetic algorithm; ReliefF; Wrapper; large-scale dataset
This article is indexed in CNKI, VIP, Wanfang Data and other databases.