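A Subspace Clustering Algorithm for High-Dimensional Categorical Data
Cite this article: SUN Haojun, LI Jingtao, ZHANG Lei, ZHANG Chongrui, XIAO Ting. A Subspace Clustering Algorithm for High-Dimensional Categorical Data[J]. Journal of Shantou University (Natural Science Edition), 2014(3): 51-59.
Authors: SUN Haojun  LI Jingtao  ZHANG Lei  ZHANG Chongrui  XIAO Ting
Affiliation: College of Engineering, Shantou University, Shantou 515063, Guangdong, China
Funding: National Natural Science Foundation of China (61170130)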
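Abstract: Subspace clustering is a clustering approach that localizes the search to the relevant dimensions, and it effectively overcomes the difficulty of clustering in the full space when the dimensionality of the data is too high. For high-dimensional categorical data, this paper proposes a bottom-up hierarchical subspace clustering algorithm. The algorithm builds a global most-similar linear list that records, for each cluster, the similarity to its most similar cluster. During clustering, the most similar clusters are selected and merged, and the list is maintained to yield the next most similar clusters. In an information-entropy sense, the algorithm can search for the subspace of each cluster fairly accurately. Experiments on two typical categorical data sets, Zoo and Soybean, show that the algorithm outperforms other related clustering algorithms in clustering accuracy and stability.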

Keywords: subspace  clustering  high-dimensional  information entropy

A Subspace Clustering Algorithm for High-Dimensional Categorical Data
SUN Haojun, LI Jingtao, ZHANG Lei, ZHANG Chongrui, XIAO Ting. A Subspace Clustering Algorithm for High-Dimensional Categorical Data[J]. Journal of Shantou University (Natural Science Edition), 2014(3): 51-59.
Authors:SUN Haojun  LI Jingtao  ZHANG Lei  ZHANG Chongrui  XIAO Ting
Institution: College of Engineering, Shantou University, Shantou 515063, Guangdong, China
Abstract: Subspace clustering is a class of clustering algorithms that restricts the search to locally relevant dimensions, which helps overcome the difficulty of clustering high-dimensional data in the full space. This paper proposes a bottom-up hierarchical subspace clustering algorithm for high-dimensional categorical data. The algorithm builds a most similar linear list (MSLL) that records, for each cluster, the similarity to its most similar cluster. During clustering, the two clusters with the maximum similarity are merged, and the MSLL is maintained so that it always holds the current most similar cluster of each cluster. Based on information entropy, the algorithm can locate the subspace of each cluster fairly accurately. Experiments on the Zoo and Soybean data sets show higher accuracy and stability than other related algorithms.
Keywords:subspace  clustering  high-dimensional  information entropy
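
The abstract describes the algorithm only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes a similarity measure the abstract does not specify (the negative increase in size-weighted entropy caused by a merge) and an assumed entropy threshold for choosing a cluster's subspace. All function names and the `max_entropy` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def attribute_entropy(rows, dim):
    """Shannon entropy (in bits) of one categorical attribute within a cluster."""
    counts = Counter(row[dim] for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def weighted_entropy(rows):
    """Cluster size times the sum of per-attribute entropies."""
    return len(rows) * sum(attribute_entropy(rows, d) for d in range(len(rows[0])))


def similarity(a, b):
    """Assumed measure: negative entropy increase caused by merging a and b."""
    return weighted_entropy(a) + weighted_entropy(b) - weighted_entropy(a + b)


def subspace(rows, max_entropy=0.5):
    """Assumed rule: keep dimensions whose within-cluster entropy is low."""
    return [d for d in range(len(rows[0])) if attribute_entropy(rows, d) <= max_entropy]


def cluster(rows, k):
    """Bottom-up merging driven by a most-similar list, stopping at k clusters."""
    clusters = {i: [row] for i, row in enumerate(rows)}

    def best_match(i):
        # (similarity, partner id) of the cluster most similar to cluster i.
        return max(
            ((similarity(clusters[i], clusters[j]), j) for j in clusters if j != i),
            default=(float("-inf"), None),
        )

    # Most-similar list: one entry per cluster.
    msl = {i: best_match(i) for i in clusters}

    while len(clusters) > k:
        i = max(msl, key=lambda c: msl[c][0])  # cluster with the best partner
        j = msl[i][1]
        clusters[i] = clusters[i] + clusters.pop(j)
        del msl[j]
        # Maintain the list instead of rebuilding it from scratch.
        for c in clusters:
            if c == i or msl[c][1] in (i, j):
                msl[c] = best_match(c)  # entry is stale, recompute it
            else:
                s = similarity(clusters[c], clusters[i])
                if s > msl[c][0]:
                    msl[c] = (s, i)  # the merged cluster is now the best partner
    return list(clusters.values())


rows = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q")]
clusters = cluster(rows, k=2)
for members in clusters:
    print(members, "subspace dims:", subspace(members))
```

In this toy data the first two attributes separate the groups while the third varies inside each group, so the third dimension is left out of each cluster's subspace. Only entries that pointed at a merged cluster are recomputed after each merge, which is the purpose of keeping such a list.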
This article is indexed in CNKI, VIP (维普), and other databases.