Feature Selection Based on Difference and Similitude in Data Mining

Cite this article:	WU Ming, YAN Puliu. Feature Selection Based on Difference and Similitude in Data Mining[J]. Wuhan University Journal of Natural Sciences, 2007, 12(3): 467-470. DOI: 10.1007/s11859-006-0077-2

Authors:	WU Ming, YAN Puliu

Affiliation:	School of Electronic Information, Wuhan University, Wuhan 430072, Hubei, China

Funding:	Supported by the National Natural Science Foundation of China (90204008) and the Chan-Guang Plan of Wuhan City (20055003059-3)

Abstract:	Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many of these algorithms are based on discernibility matrices, which consider only the differences in an information system. Because a discernibility matrix does not reveal similar characteristics, the resulting rules may not be the simplest. Difference-similitude (DS) methods take both difference and similitude into account, but the existing search strategy can cause some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is O(|C|^2 |U|^2).
Keywords (translated from Chinese):	feature selection; difference; similitude; data mining
Article ID:	1007-1202(2007)03-0467-04
Received:	2006-07-15
Revised:	2006-07-15
Keywords:	knowledge reduction; feature selection; rough set; difference set; similitude set; attribute rank function
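
The abstract's central idea, ranking attributes by both difference and similitude, can be illustrated with a minimal Python sketch. It does not reproduce the paper's algorithm: the abstract does not give the attribute rank function, so the `rank` function below is a hypothetical stand-in. The set definitions are also assumptions: a difference set holds the attributes on which two objects with different decisions disagree, and a similitude set holds the attributes on which two objects with the same decision agree. All function names are illustrative.

```python
from itertools import combinations


def difference_and_similitude_sets(rows, decisions):
    """Build difference and similitude sets over all pairs of objects.

    Assumed definitions (not taken from the paper):
    - difference set: attributes that differ between two objects
      with different decisions;
    - similitude set: attributes that agree between two objects
      with the same decision.
    """
    all_attrs = frozenset(range(len(rows[0])))
    diff_sets, sim_sets = [], []
    for i, j in combinations(range(len(rows)), 2):
        differing = frozenset(
            a for a, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y
        )
        agreeing = all_attrs - differing
        if decisions[i] != decisions[j]:
            if differing:  # skip inconsistent pairs with nothing to discern
                diff_sets.append(differing)
        elif agreeing:
            sim_sets.append(agreeing)
    return diff_sets, sim_sets


def rank(attr, uncovered, sim_sets):
    """Hypothetical rank: difference coverage first, similitude support second."""
    return (
        sum(attr in d for d in uncovered),
        sum(attr in s for s in sim_sets),
    )


def greedy_select(rows, decisions):
    """Greedily pick attributes until every difference set is covered."""
    diff_sets, sim_sets = difference_and_similitude_sets(rows, decisions)
    selected = []
    uncovered = list(diff_sets)
    while uncovered:
        candidates = [a for a in range(len(rows[0])) if a not in selected]
        best = max(candidates, key=lambda a: rank(a, uncovered, sim_sets))
        selected.append(best)
        uncovered = [d for d in uncovered if best not in d]
    return selected


# Toy decision table: attribute 0 alone determines the decision.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
decisions = [0, 0, 1, 1]
print(greedy_select(rows, decisions))  # [0]
```

In this toy table, attribute 0 appears in all four difference sets and in both similitude sets, so it is selected first and covers every difference set on its own. A rank based only on a discernibility matrix would use the first component of the score; the second component is where similitude enters in this sketch.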
This article is indexed in CNKI, VIP (维普), SpringerLink, and other databases.
