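A Rough-Set-Based Subspace Clustering Algorithm for High-Dimensional Categorical Data
Cite this article:SUN Hao-jun, YOU Jun-bin, WU Ting-fa. A Rough-Set-Based Subspace Clustering Algorithm for High-Dimensional Categorical Data[J]. Journal of Shantou University (Natural Science Edition), 2012(4):46-53.
Authors:SUN Hao-jun  YOU Jun-bin  WU Ting-fa
Affiliation:College of Engineering, Shantou University, Shantou 515063, Guangdong, China
Funding:National Natural Science Foundation of China (61170130)
Abstract:Most existing clustering algorithms that perform well are designed for low-dimensional data. Because the distribution of high-dimensional data differs greatly from the low-dimensional case, these algorithms fail on it. To address the clustering of high-dimensional categorical data, a rough-set-based subspace clustering algorithm is proposed. Class boundaries are described by the upper and lower approximations of rough set theory, which fix the boundary range, and a consistency degree is then used to adjust the boundaries. Clustering follows a growing-subspace strategy that iteratively searches for subspace clusters from low to high dimensions. Comparative experiments on the soybean and zoo data sets show that the algorithm is both feasible and accurate.

Keywords:high-dimensional categorical data  growing subspace  rough set  clustering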

An Algorithm for High Dimensional Categorical Clustering Using Rough Set Theory
SUN Hao-jun,YOU Jun-bin,WU Ting-fa.An Algorithm for High Dimensional Categorical Clustering Using Rough Set Theory[J].Journal of Shantou University(Natural Science Edition),2012(4):46-53.
Authors:SUN Hao-jun  YOU Jun-bin  WU Ting-fa
Institution:(College of Engineering, Shantou University, Shantou 515063, Guangdong, China)
Abstract:Most existing clustering algorithms that perform well are designed for low-dimensional data. Because the distribution of high-dimensional data differs greatly from the low-dimensional case, these algorithms fail on it. To address the clustering of high-dimensional categorical data, a subspace clustering algorithm based on rough set theory is presented. Class boundaries are described by the rough set upper and lower approximations, which determine the boundary range, and a consistency degree is then used to adjust the boundaries. The clustering process follows a growing-subspace strategy that iteratively searches for subspace clusters from low to high dimensions. Comparative experiments on the soybean and zoo data sets show that the algorithm is feasible and achieves high precision.
Keywords:high dimension categorical data  growth subspace  information entropy  rough set  clustering
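
The abstract describes class boundaries through rough-set upper and lower approximations, evaluated over subspaces that grow from low to high dimension. The following is a minimal sketch of that one ingredient only, assuming the standard rough-set definitions. The function names, the toy records, and the attribute names are hypothetical illustrations, not the authors' implementation, and the consistency-degree adjustment is not reproduced because the abstract does not define it.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Group row indices that share the same values on `attrs` (indiscernibility)."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of `target`, a set of row indices."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical toy records (not taken from the soybean or zoo data sets).
rows = [
    {"legs": "4", "hair": "yes", "milk": "yes"},
    {"legs": "4", "hair": "yes", "milk": "yes"},
    {"legs": "4", "hair": "no",  "milk": "no"},
    {"legs": "2", "hair": "no",  "milk": "no"},
    {"legs": "4", "hair": "yes", "milk": "no"},
]
target = {0, 1, 2}  # assumed candidate cluster, given as row indices

# Grow the subspace one attribute at a time and watch the boundary region.
for attrs in (["legs"], ["legs", "hair"], ["legs", "hair", "milk"]):
    lower, upper = approximations(rows, attrs, target)
    print(attrs, "boundary:", sorted(upper - lower))
```

In this toy example the boundary region (upper minus lower approximation) shrinks as attributes are added, which is the intuition behind describing a cluster's uncertain members by the gap between the two approximations.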
This article is indexed in CNKI, VIP, and other databases.