Feature Selection Based on Difference and Similitude in Data Mining
Cite this article: WU Ming, YAN Puliu. Feature Selection Based on Difference and Similitude in Data Mining[J]. Wuhan University Journal of Natural Sciences, 2007, 12(3): 467-470. DOI: 10.1007/s11859-006-0077-2
Authors: WU Ming, YAN Puliu
Affiliation: School of Electronic Information, Wuhan University, Wuhan 430072, Hubei, China
Funding: Supported by the National Natural Science Foundation of China (90204008) and the Chan-Guang Plan of Wuhan City (20055003059-3)
Abstract: Feature selection is a preprocessing step in data mining, and heuristic search algorithms are commonly used for it. Many of these algorithms are based on discernibility matrices, which capture only the differences within an information system. Because similar characteristics are not revealed in a discernibility matrix, the resulting rules may not be the simplest ones. Difference-similitude (DS) methods take both difference and similitude into account, but their existing search strategy can cause some important features to be ignored. To solve this problem, this paper proposes an improved DS-based algorithm that defines an attribute rank function considering both difference and similitude during feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is $O(|C|^2|U|^2)$, where $|C|$ is the number of condition attributes and $|U|$ the number of objects.
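The abstract names the ingredients of DS-based feature selection but not their formulas. The Python sketch below is one illustrative reading under stated assumptions: a difference set is the discernibility-matrix entry for a pair of objects with different decisions, a similitude set lists the attributes on which a same-decision pair agrees, and the rank function (a weighted mix of difference coverage and similitude frequency, controlled by the hypothetical parameter `sim_weight`) is invented for illustration. It is not the rank function defined in the paper, and the names `ds_sets` and `select_features` are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def ds_sets(table, decisions):
    """Collect difference and similitude sets over all object pairs.

    ASSUMED definitions (illustrative, not quoted from the paper):
      * difference set: for two objects with different decisions, the
        condition attributes on which they differ (a discernibility entry);
      * similitude set: for two objects with the same decision, the
        condition attributes on which they agree.
    """
    n_attrs = len(table[0])
    diff, sim = [], []
    for i, j in combinations(range(len(table)), 2):
        if decisions[i] != decisions[j]:
            entry = frozenset(a for a in range(n_attrs) if table[i][a] != table[j][a])
            if entry:  # an empty entry means the pair is inconsistent on C; skip it
                diff.append(entry)
        else:
            sim.append(frozenset(a for a in range(n_attrs) if table[i][a] == table[j][a]))
    return diff, sim


def select_features(table, decisions, sim_weight=0.1):
    """Greedy selection driven by a HYPOTHETICAL attribute rank function."""
    diff, sim = ds_sets(table, decisions)
    sim_count = Counter(a for s in sim for a in s)
    n_sim = max(1, len(sim))
    selected, remaining = [], diff
    while remaining:  # each round removes the chosen attribute's sets: at most |C| rounds
        diff_count = Counter(a for s in remaining for a in s)

        def rank(a):
            # share of still-uncovered difference sets containing a, plus a
            # weighted share of similitude sets containing a (assumed form)
            return diff_count[a] / len(remaining) + sim_weight * sim_count[a] / n_sim

        # highest rank wins; ties go to the lower attribute index
        best = max(diff_count, key=lambda a: (rank(a), -a))
        selected.append(best)
        remaining = [s for s in remaining if best not in s]
    return sorted(selected)


# Toy decision table: objects x0..x3, condition attributes a0..a2.
TABLE = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
DECISIONS = [0, 1, 0, 1]
print(select_features(TABLE, DECISIONS))  # → [1]
```

In this toy table attribute a1 alone separates the two decision classes, so the greedy loop stops after one round. Building the pair sets costs $O(|C||U|^2)$ and the loop runs at most $|C|$ rounds over those sets, so this sketch is $O(|C|^2|U|^2)$, the same order the abstract reports; that agreement does not mean the sketch reproduces the authors' algorithm.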

Keywords: feature selection; difference; similitude; data mining
Article ID: 1007-1202(2007)03-0467-04
Received: 2006-07-15
Revised: 2006-07-15

Keywords: knowledge reduction; feature selection; rough set; difference set; similitude set; attribute rank function
This article is indexed in CNKI, VIP, SpringerLink, and other databases.