
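An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set
Cite this article: ZHOU Xiao-yun, SUN Zhi-hui, ZHANG Bai-li. An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set [J]. Journal of Applied Sciences, 2006, 24(4): 396-400.
Authors: ZHOU Xiao-yun  SUN Zhi-hui  ZHANG Bai-li
Affiliation: Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China
Funding: National High Technology Research and Development Program of China (863 Program); Specialized Research Fund for the Doctoral Program of Higher Education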
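Abstract: Clustering algorithms for large-scale, high-dimensional data sets have become a focus of current clustering research. Because of the high dimensionality, clusters are often hidden in certain subspaces of the data space, and traditional clustering algorithms cannot produce meaningful results. In addition, the large amount of random noise in high-dimensional data causes additional efficiency problems. To address these problems, this paper proposes OpCluster, a clustering algorithm built on CLIQUE that uses optimal interval partitioning and data set partitioning, and validates it on simulated data. The experimental results show that OpCluster clusters large-scale, high-dimensional data sets well.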

Keywords: clustering algorithm  subspace clustering  optimal partition  data partition
Article ID: 0255-8297(2006)04-0396-05
Received: 2005-02-25
Revised: 2005-05-24

An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set
ZHOU Xiao-yun,SUN Zhi-hui,ZHANG Bai-li.An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set[J].Journal of Applied Sciences,2006,24(4):396-400.
Authors:ZHOU Xiao-yun  SUN Zhi-hui  ZHANG Bai-li
Institution:Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
Abstract: Clustering large, high-dimensional data sets has long been a serious challenge for clustering algorithms. Traditional clustering algorithms often fail to detect meaningful clusters because most real-world data sets are high-dimensional and their feature space is inherently sparse. Nevertheless, such data sets often contain clusters hidden in various subspaces of the original feature space. In addition, high-dimensional data often contain a significant amount of noise, which causes additional effectiveness problems. To overcome these problems, a new algorithm based on CLIQUE, named OpCluster, is proposed. Experiments on a synthetic data set demonstrate the effectiveness and efficiency of the new approach.
Keywords:clustering algorithms  subspace clustering  optimal partition  data partition
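
The abstract does not describe the internals of OpCluster, so the following is only a minimal sketch of the grid-based, bottom-up idea of CLIQUE, the algorithm it builds on. It is not the authors' method: it uses a fixed equal-width grid rather than OpCluster's optimal interval partitioning, and it stops at 2-D units. The function names, the sample points, and the assumption that coordinates lie in [0, 1) are illustrative; `xi` (intervals per dimension) and `tau` (density threshold) follow the usual CLIQUE parameter names.

```python
from collections import Counter
from itertools import combinations


def cell(dim, value, xi):
    """Map a coordinate in [0, 1) to its (dimension, interval index) grid cell."""
    return (dim, min(int(value * xi), xi - 1))


def dense_units_1d(points, xi, tau):
    """1-D units holding more than a fraction tau of all points."""
    counts = Counter(cell(d, v, xi) for p in points for d, v in enumerate(p))
    return {unit for unit, c in counts.items() if c / len(points) > tau}


def dense_units_2d(points, xi, tau, dense_1d):
    """Apriori-style step: only pairs of dense 1-D units can form a dense 2-D unit."""
    counts = Counter()
    for p in points:
        # Keep only this point's cells that are dense in one dimension.
        cells = [c for c in (cell(d, v, xi) for d, v in enumerate(p)) if c in dense_1d]
        for pair in combinations(cells, 2):
            counts[pair] += 1
    return {unit for unit, c in counts.items() if c / len(points) > tau}


# Hypothetical data: four points cluster in dimensions 0 and 1,
# while dimension 2 is spread out like noise.
points = [
    (0.11, 0.72, 0.05),
    (0.13, 0.75, 0.35),
    (0.15, 0.78, 0.65),
    (0.18, 0.71, 0.95),
    (0.55, 0.22, 0.25),
    (0.90, 0.45, 0.55),
]
d1 = dense_units_1d(points, xi=5, tau=0.5)
d2 = dense_units_2d(points, xi=5, tau=0.5, dense_1d=d1)
print(d1)  # dense 1-D units, as (dimension, interval) pairs
print(d2)  # dense 2-D units, as pairs of 1-D units
```

With these sample points the only dense units fall in the subspace spanned by dimensions 0 and 1, which illustrates the abstract's point that clusters can be hidden in subspaces while the remaining dimensions contribute only noise.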
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.