A Clustering Algorithm Based on Grid and Density with Random Sampling

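Citation:	SUN Zhi-wei, ZHAO Zheng, WANG Hong-mei. A Clustering Algorithm Based on Grid and Density with Random Sampling[J]. Journal of Tianjin University (Science and Technology), 2006, 39(5): 621-626.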

Authors:	SUN Zhi-wei (孙志伟), ZHAO Zheng (赵政), WANG Hong-mei (王红梅)

Institution:	School of Electronic Information Engineering, Tianjin University, Tianjin 300072

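Abstract:	To improve the efficiency of density-based clustering algorithms and to handle constraints on non-spatial attributes, a clustering algorithm based on grid and density (GDRS) is proposed. It uses a grid region to represent the neighborhood of a point, and divides non-spatial attributes into numeric and character types. First, a grid method finds reference points that accurately reflect the geometric characteristics of the data space. Then, unclassified reference points are selected at random; for each one, the sparseness of its neighborhood, its relation to other clusters, and the constraints on non-spatial attributes are tested to decide whether it joins a cluster, existing clusters are merged, or a new cluster is formed. Finally, the reference points are mapped back to the data. The algorithm is compared theoretically with DBSCAN and DBRS, and GDRS is compared with DBSCAN on synthetic and real data sets. The experiments show that GDRS retains the advantages of density-based algorithms, namely discovering clusters of arbitrary shape and screening out noise points, while running significantly faster than density-based algorithms.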
Keywords:	data mining; clustering algorithm; density; grid; reference point; random sampling; constraint
Article ID:	0493-2137(2006)05-0621-06
Received:	2005-01-11
Revised:	2005-01-11 to 2005-11-10
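The first step in the abstract, replacing the raw points with grid reference points, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the square cells of side `cell_size` (one case of the equal-area rectangular regions the paper mentions), the choice of each cell's mean as its reference point, and the function name `grid_references` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def grid_references(points: List[Point], cell_size: float) -> Dict[Cell, Tuple[Point, List[int]]]:
    """Map each non-empty grid cell to (reference point, indices of its member points).

    Hypothetical sketch. Assumptions (not from the paper): cells are axis-aligned
    squares of side cell_size, and a cell's reference point is the mean of its points.
    """
    members: Dict[Cell, List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        # Floor division places every coordinate, including negative ones, in exactly one cell.
        members[(int(x // cell_size), int(y // cell_size))].append(i)

    refs: Dict[Cell, Tuple[Point, List[int]]] = {}
    for cell, idx in members.items():
        mean_x = sum(points[i][0] for i in idx) / len(idx)
        mean_y = sum(points[i][1] for i in idx) / len(idx)
        # Keep the member indices so cluster labels can later be mapped back to the points.
        refs[cell] = ((mean_x, mean_y), idx)
    return refs
```

Calling `grid_references([(0.1, 0.2), (0.3, 0.4), (5.0, 5.0)], 1.0)` yields two references, one for cell `(0, 0)` standing for points 0 and 1 and one for cell `(5, 5)` standing for point 2, so later steps work on two references instead of three points.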
A Clustering Algorithm Based on Grid and Density with Random Sampling

SUN Zhi-wei, ZHAO Zheng, WANG Hong-mei. A Clustering Algorithm Based on Grid and Density with Random Sampling[J]. Journal of Tianjin University (Science and Technology), 2006, 39(5): 621-626.

Authors:	SUN Zhi-wei, ZHAO Zheng, WANG Hong-mei

Institution:	School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China

Abstract:	To improve the efficiency of density-based clustering algorithms and to deal with the constraints of non-spatial attributes, a novel spatial clustering algorithm called GDRS is proposed. It is based on grid and density with random sampling, and uses rectangular grid regions of equal area, instead of circles, to express the neighborhood of a point. Non-spatial attributes are classified into numeric and character types. First, references that accurately reflect the spatial characteristics of the data are found by a grid method. Then the algorithm repeatedly picks an unclassified reference at random and, based on the sparseness of its neighborhood, its relation to other clusters, and the constraints of the non-spatial attributes, inserts it into a cluster, merges clusters, or creates a new cluster. Finally, the references are mapped back to the original point data. The proposed algorithm is compared theoretically with DBSCAN and DBRS, and GDRS is compared with DBSCAN on synthetic and real data sets. Both the theoretical analysis and the experimental results show that GDRS can discover clusters of arbitrary shape and screen out noise data, and that its execution efficiency is much higher than that of the traditional DBSCAN algorithm.

Keywords:	data mining; clustering algorithm; density; grid; reference; random sampling; constraint
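The random-sampling loop described in the abstract (pick an unclassified reference, test its neighborhood, then insert, merge, or create a cluster, and finally map references back to points) can be sketched in Python. This is a hypothetical sketch, not GDRS as published: the function `gdrs_sketch`, its parameters (`cell_size`, `min_pts`, `max_diff`, `seed`), the 3x3-cell neighborhood, the density test (at least `min_pts` points in the neighborhood), the equality constraint on the character attribute, and the tolerance constraint on the mean numeric attribute are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Ref = Tuple[int, int, str]  # (grid column, grid row, character attribute)


def gdrs_sketch(points: Sequence[Tuple[float, float]], values: Sequence[float],
                labels: Sequence[str], cell_size: float, min_pts: int,
                max_diff: float, seed: int = 0) -> List[int]:
    """Return a cluster id per point, or -1 for noise.

    Hypothetical GDRS-style sketch; the neighborhood, density test and
    constraint rules are assumptions, not taken from the paper.
    """
    # Step 1: grid references. Points with different character attributes never share
    # a reference, which enforces an equality constraint on that attribute.
    members: Dict[Ref, List[int]] = defaultdict(list)
    for i, ((x, y), lab) in enumerate(zip(points, labels)):
        members[(int(x // cell_size), int(y // cell_size), lab)].append(i)
    mean_val = {r: sum(values[i] for i in idx) / len(idx) for r, idx in members.items()}

    def neighborhood(ref: Ref) -> List[Ref]:
        # 3x3 block of cells around ref (ref included), keeping only references whose
        # mean numeric attribute lies within max_diff of ref's (numeric constraint).
        c, r, lab = ref
        out = []
        for dc in (-1, 0, 1):
            for dr in (-1, 0, 1):
                nb = (c + dc, r + dr, lab)
                if nb in members and abs(mean_val[nb] - mean_val[ref]) <= max_diff:
                    out.append(nb)
        return out

    parent: Dict[int, int] = {}  # union-find over provisional cluster ids

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    cluster_of: Dict[Ref, int] = {}
    order = list(members)
    random.Random(seed).shuffle(order)  # Step 2: visit references in random order
    for ref in order:
        if ref in cluster_of:
            continue  # only unclassified references are picked
        nbs = neighborhood(ref)
        if sum(len(members[n]) for n in nbs) < min_pts:
            continue  # sparse: stays unclassified unless a dense neighbor absorbs it later
        roots = {find(cluster_of[n]) for n in nbs if n in cluster_of}
        if roots:
            cid = min(roots)  # neighborhood touches existing clusters: merge them
            for other in roots:
                parent[other] = cid
        else:
            cid = len(parent)  # no classified neighbor: create a new cluster
            parent[cid] = cid
        for n in nbs:
            cluster_of.setdefault(n, cid)  # insert unclassified neighbors into the cluster

    # Step 3: map references back to the original points, renumbering clusters 0..k-1.
    result = [-1] * len(points)
    final: Dict[int, int] = {}
    for ref, cid in cluster_of.items():
        fid = final.setdefault(find(cid), len(final))
        for i in members[ref]:
            result[i] = fid
    return result
```

Merging goes through a union-find structure, so joining two clusters costs almost nothing regardless of their size. Cluster ids are renumbered at the end; with a different `seed` the numbering, and possibly the assignment of borderline references, can change.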
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
