
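A Clustering Algorithm Based on Grid and Density with Random Sampling
Cite this article: SUN Zhi-wei, ZHAO Zheng, WANG Hong-mei. A Clustering Algorithm Based on Grid and Density with Random Sampling[J]. Journal of Tianjin University (Science and Technology), 2006, 39(5): 621-626.
Authors: SUN Zhi-wei, ZHAO Zheng, WANG Hong-mei
Institution: School of Electronic Information Engineering, Tianjin University, Tianjin 300072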
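Abstract: To improve the efficiency of density-based clustering algorithms and to handle constraints on non-spatial attributes, a clustering algorithm based on grid and density (GDRS) is proposed. It uses grid regions to represent the neighborhood of a point, and it divides non-spatial attributes into numeric and character types. First, a grid method finds reference points that accurately reflect the geometric characteristics of the data space. Next, an unclassified reference point is selected at random. The algorithm then tests three things: the sparseness of its neighborhood, its relationship to existing clusters, and the non-spatial attribute constraints. The outcome decides whether the point is added to a cluster, clusters are merged, or a new cluster is formed. Finally, the reference points are mapped back to the data. The algorithm is compared theoretically with DBSCAN and DBRS, and GDRS is compared experimentally with DBSCAN on synthetic and real data sets. The experiments show that GDRS keeps the advantages of density-based algorithms: it can discover clusters of arbitrary shape and screen out noise points. Its execution efficiency is also markedly higher than that of density-based algorithms.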
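The abstract does not say how reference points are derived from the grid. The following is a minimal sketch of the grid step under stated assumptions, not the authors' implementation. It assumes one reference point per occupied cell, taken as the centroid of the points in that cell. It also assumes a square 3×3-cell neighborhood in place of DBSCAN's circular ε-neighborhood. The names `build_references`, `neighborhood_count` and `cell_size` are hypothetical.

```python
from collections import defaultdict

def build_references(points, cell_size):
    """Assign 2-D points to square grid cells of side `cell_size` and
    return one reference point per occupied cell.

    Assumption (not from the paper): a cell's reference point is the
    centroid of the points inside it.
    """
    members = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        # Floor division gives integer cell coordinates (also for negative values).
        cell = (int(x // cell_size), int(y // cell_size))
        members[cell].append(idx)

    refs = {}
    for cell, idxs in members.items():
        cx = sum(points[i][0] for i in idxs) / len(idxs)
        cy = sum(points[i][1] for i in idxs) / len(idxs)
        refs[cell] = {"centroid": (cx, cy), "count": len(idxs), "members": idxs}
    return refs

def neighborhood_count(refs, cell):
    """Number of points in the 3x3 block of cells centered on `cell`,
    i.e. a square neighborhood used instead of a circular one."""
    gx, gy = cell
    return sum(refs[(gx + dx, gy + dy)]["count"]
               for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (gx + dx, gy + dy) in refs)

if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.5), (5.0, 5.0)]
    refs = build_references(pts, cell_size=1.0)
    print(sorted(refs))                      # [(0, 0), (1, 0), (5, 5)]
    print(neighborhood_count(refs, (0, 0)))  # 3
```

A neighborhood query on the grid only touches the nine surrounding cells rather than every point. This is the kind of saving the abstract attributes to combining grid and density methods.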

Keywords: data mining; clustering algorithm; density; grid; reference point; random sampling; constraint
Article ID: 0493-2137(2006)05-0621-06
Received: 2005-01-11
Revised: 2005-01-11 to 2005-11-10

A Clustering Algorithm Based on Grid and Density with Random Sampling
SUN Zhi-wei,ZHAO Zheng,WANG Hong-mei.A Clustering Algorithm Based on Grid and Density with Random Sampling[J].Journal of Tianjin University(Science and Technology),2006,39(5):621-626.
Authors:SUN Zhi-wei  ZHAO Zheng  WANG Hong-mei
Institution:School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China
Abstract: To improve the efficiency of density-based clustering algorithms and to deal with constraints on non-spatial attributes, a novel spatial clustering algorithm called GDRS is proposed. It is based on grid and density with random sampling. Instead of a circle, it uses rectangular grid regions of equal area to express the neighborhood of a point. Non-spatial attributes are classified into numeric and character types. First, a grid method finds reference points that accurately reflect the spatial characteristics of the data. Then the algorithm repeatedly picks an unclassified reference point at random. Based on the sparseness of its neighborhood, its relation to other clusters, and the constraints on non-spatial attributes, it inserts the point into a cluster, merges clusters, or creates a new cluster. Finally, the reference points are mapped back to the original point data. The proposed algorithm is compared theoretically with DBSCAN and DBRS, and GDRS is compared with DBSCAN on synthetic and real data sets. Both the theoretical analysis and the experimental results show that GDRS can discover clusters of arbitrary shape and screen out noise data. Its execution efficiency is also much higher than that of the traditional DBSCAN algorithm.
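The random-selection phase described above can be sketched as follows. This is an illustrative approximation, not the authors' GDRS, and it rests on several assumptions:

- Each reference point is a grid cell carrying a point count and one non-spatial attribute.
- A reference is dense when its 3×3-cell neighborhood holds at least `min_pts` points.
- Numeric attributes are compatible within a tolerance `tol`, and character attributes only when they are equal.
- Every reference is visited once in random order, rather than only unclassified ones, so that dense border cells still extend their cluster.

All function and parameter names (`satisfies`, `cluster_references`, `map_back`, `min_pts`, `tol`) are hypothetical.

```python
import random

def satisfies(a, b, tol):
    # Assumed constraint form: character (str) attributes must match exactly;
    # numeric attributes must differ by at most `tol`.
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    return abs(a - b) <= tol

def grid_neighbors(refs, cell):
    # Occupied cells in the 3x3 block around `cell` (a rectangular neighborhood).
    gx, gy = cell
    return [(gx + dx, gy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (gx + dx, gy + dy) in refs]

def cluster_references(refs, min_pts, tol, seed=0):
    """refs maps cell -> (point_count, non_spatial_attr).
    Returns cell -> cluster id, or -1 for noise."""
    rng = random.Random(seed)
    label = {}
    parent = {}  # union-find over cluster ids, so clusters can be merged cheaply

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    order = list(refs)
    rng.shuffle(order)  # random visiting order of reference points
    next_id = 0
    for cell in order:
        neigh = grid_neighbors(refs, cell)
        if sum(refs[n][0] for n in neigh) < min_pts:
            label.setdefault(cell, -1)  # sparse: noise unless already claimed
            continue
        attr = refs[cell][1]
        compatible = [n for n in neigh if satisfies(attr, refs[n][1], tol)]
        existing = {find(label[n]) for n in compatible if label.get(n, -1) >= 0}
        if existing:
            cid = min(existing)
            for other in existing:
                parent[other] = cid  # merge clusters that meet at this cell
        else:
            cid = next_id  # no compatible cluster nearby: create a new one
            parent[cid] = cid
            next_id += 1
        for n in compatible:
            if label.get(n, -1) < 0:
                label[n] = cid  # insert unclassified or noise neighbors
    return {c: (find(l) if l >= 0 else -1) for c, l in label.items()}

def map_back(point_cells, cell_labels):
    # Each original point inherits the label of the cell it falls in.
    return [cell_labels.get(c, -1) for c in point_cells]
```

Union-find lets two clusters that meet at a dense reference be merged without relabeling every member. Mapping back then costs one dictionary lookup per point.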
Keywords: data mining; clustering algorithm; density; grid; reference point; random sampling; constraint
This article is indexed in CNKI, VIP, Wanfang Data and other databases.