

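An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set

Cite this article:	ZHOU Xiao-yun, SUN Zhi-hui, ZHANG Bai-li. An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set[J]. Journal of Applied Sciences, 2006, 24(4): 396-400.

Authors:	ZHOU Xiao-yun, SUN Zhi-hui, ZHANG Bai-li

Affiliation:	Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China

Funding:	National High Technology Research and Development Program of China (863 Program); Specialized Research Fund for the Doctoral Program of Higher Education

Abstract:	Clustering algorithms for large-scale, high-dimensional data sets have become a focus of current clustering research. Because of the high dimensionality, clusters are often hidden in certain subspaces of the data space, so traditional clustering algorithms cannot produce meaningful results. In addition, the large amount of random noise in high-dimensional data causes additional efficiency problems. To address these problems, this paper builds on the CLIQUE algorithm and proposes OpCluster, a clustering algorithm based on optimal interval partitioning and data set partitioning, and validates it on simulated data. The experimental results show that OpCluster clusters large-scale, high-dimensional data sets well.
Keywords:	clustering algorithm; subspace clustering; optimal partition; data partition
Article ID:	0255-8297(2006)04-0396-05
Received:	2005-02-25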
Revised:	2005-05-24
An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set

ZHOU Xiao-yun, SUN Zhi-hui, ZHANG Bai-li. An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set[J]. Journal of Applied Sciences, 2006, 24(4): 396-400.

Authors:	ZHOU Xiao-yun, SUN Zhi-hui, ZHANG Bai-li

Institution:	Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China

Abstract:	Clustering large data sets of high dimensionality has long been a serious challenge for clustering algorithms. Traditional clustering algorithms often fail to detect meaningful clusters because most real-world data sets are high-dimensional and their feature space is inherently sparse. Nevertheless, such data sets often contain clusters hidden in various subspaces of the original feature space. In addition, high-dimensional data often contain a significant amount of noise, which causes additional efficiency problems. To overcome these problems, a new algorithm named OpCluster is proposed; it builds on CLIQUE and is based on optimal interval partitioning and data set partitioning. Experiments on a synthetic data set demonstrate the effectiveness and efficiency of the new approach.

Keywords:	clustering algorithms; subspace clustering; optimal partition; data partition
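
For readers unfamiliar with the grid-based subspace search that CLIQUE performs, the following is a minimal Python sketch of its first two passes: cut every dimension into equal-width intervals, keep the 1-D units that hold enough points, then test only those 2-D units whose 1-D projections are both dense. It is not OpCluster. This record does not describe how OpCluster chooses its optimal intervals or partitions the data set, so neither step is implemented here. The function name `dense_units` and the parameters `bins`, `threshold`, `lo`, and `hi` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, bins=4, threshold=0.2, lo=0.0, hi=1.0):
    """Return dense grid units in 1-D and 2-D subspaces (CLIQUE-style sketch).

    A 1-D unit is a (dimension, interval_index) pair; a 2-D unit is a pair
    of such pairs. Equal-width intervals are an assumption of this sketch,
    not the optimal interval partition named in the abstract.
    """
    n = len(points)
    dims = len(points[0])
    width = (hi - lo) / bins

    def cell(value):
        # Map a coordinate to its interval, clamping `hi` into the last bin.
        return min(int((value - lo) / width), bins - 1)

    grid = [[cell(v) for v in p] for p in points]

    # Pass 1: count points per 1-D unit and keep the dense ones.
    counts1 = Counter((d, row[d]) for row in grid for d in range(dims))
    dense1 = {u for u, c in counts1.items() if c / n >= threshold}

    # Pass 2: a 2-D unit can only be dense if both of its 1-D
    # projections are dense, so only those candidates are counted.
    counts2 = Counter()
    for row in grid:
        present = [(d, row[d]) for d in range(dims) if (d, row[d]) in dense1]
        for a, b in combinations(present, 2):
            counts2[(a, b)] += 1
    dense2 = {u for u, c in counts2.items() if c / n >= threshold}
    return dense1, dense2
```

Calling `dense_units(points, bins=4, threshold=0.5)` on a list of equal-length tuples with coordinates in [0, 1] returns the two sets of dense units. A dimension that carries only noise spreads its points across many intervals and contributes no dense unit, which is how this kind of search confines clusters to the subspaces where they actually appear.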
This article is indexed in databases including CNKI, VIP, and Wanfang Data.
