
An Improved K-means Algorithm Based on Density and Sample Size
Cite this article: ZHAO Da-wei, XIAO Zhou-fang (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, P.R. China). An improved K-means algorithm based on density and sample size [J]. 科技信息 (Science & Technology Information), 2008(28).
Authors: ZHAO Da-wei, XIAO Zhou-fang (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, P.R. China)
Affiliation: Department of Computer Science and Technology, China University of Mining and Technology
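Abstract: This paper studies the original K-means algorithm and improves it so that the algorithm automatically finds a suitable value of k and detects as many isolated points (outliers) as possible. First, n, the maximum possible number of initial clusters for the given sample size, is determined. Next, a sample circle is constructed and divided into n equal parts; each sample point is assigned to the corresponding part according to its position, and the resulting n initial classes are clustered. Finally, small classes that need to be merged are merged using the small-class merging strategy of the DBSCAN algorithm. To test the clustering performance of the improved algorithm, its source code was placed on "weka", the open-source platform developed by the University of Waikato, New Zealand, and comparative experiments against the original K-means algorithm were run on multiple data sets. The experiments show that the improved algorithm is far superior to the original K-means algorithm in both clustering quality and clustering stability.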
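The first three steps of the abstract (estimate n, split the sample circle into n equal parts, cluster from those parts) can be sketched as follows. This is a minimal sketch, not the authors' code, and it rests on assumptions the abstract does not confirm: the data are 2-D, n is taken as floor(sqrt(N)), the "sample circle" is centred on the data centroid and split into n equal angular sectors, and the means of the non-empty sectors seed a standard K-means run. All function names are hypothetical.

```python
import math


def initial_cluster_count(num_points):
    # ASSUMPTION: n = floor(sqrt(N)); the abstract does not give the exact rule.
    return max(1, math.isqrt(num_points))


def sector_partition(points, n):
    # ASSUMPTION: the "sample circle" is centred on the centroid of the 2-D data
    # and split into n equal angular sectors; each point joins the sector that
    # contains its polar angle. Empty sectors are discarded.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    width = 2 * math.pi / n
    sectors = [[] for _ in range(n)]
    for x, y in points:
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)  # angle in [0, 2*pi)
        sectors[min(int(angle / width), n - 1)].append((x, y))  # clamp rounding
    return [s for s in sectors if s]


def mean(group):
    return (sum(x for x, _ in group) / len(group),
            sum(y for _, y in group) / len(group))


def assign(points, centers):
    # Standard K-means assignment step: each point goes to its nearest center.
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def sector_kmeans(points, max_iter=100):
    # Seed K-means with the means of the non-empty sectors, so k is not user-supplied.
    n = initial_cluster_count(len(points))
    centers = [mean(s) for s in sector_partition(points, n)]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Keep the previous center if its cluster became empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in assign(points, centers) if c]
```

Because empty sectors are dropped before K-means starts, the number of clusters returned can be smaller than n, which is one way the value of k could be found "automatically" under these assumptions.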

Keywords: data mining, clustering algorithm, K-means, DBSCAN

Improved K-means Clustering Algorithm Based on Density and Sample Size
ZHAO Da-wei, XIAO Zhou-fang. Improved K-means Clustering Algorithm Based on Density and Sample Size [J]. Science & Technology Information, 2008(28).
Authors: ZHAO Da-wei, XIAO Zhou-fang
Institution: 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221008, P.R. China; 2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, P.R. China
Abstract: This paper studies and improves the K-means algorithm so that it determines a suitable value of k automatically and detects as many isolated points as possible. First, the algorithm calculates the most probable number of clusters. Next, it constructs the sample circle, divides the circle into equal parts, and assigns each sample to a part according to its position. The improved algorithm then performs the clustering. Finally, some small classes are merged using the strategy of the DBSCAN algorithm. To test its performance, the source code of the improved algorithm was added to "weka", the open platform developed by the University of Waikato, New Zealand. Compared with the original K-means algorithm on several data sets, the improved algorithm outperforms it in both clustering quality and stability.
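The final step, merging small classes in the spirit of DBSCAN, can be sketched as below. The abstract does not describe the merging rule, so this is only one plausible reading, assumed here for illustration: a cluster with fewer than `min_size` points is absorbed into the nearest large cluster if one of its points lies within `eps` of a point in that cluster (a DBSCAN-like reachability test); otherwise its points are reported as outliers. The function name and both parameters (`min_size`, loosely analogous to DBSCAN's MinPts, and `eps`) are hypothetical.

```python
import math


def merge_small_clusters(clusters, min_size, eps):
    # HYPOTHETICAL rule, not taken from the paper: clusters with at least
    # min_size points are kept as "large"; each smaller cluster joins the large
    # cluster whose closest point is nearest, provided that gap is <= eps.
    large = [list(c) for c in clusters if len(c) >= min_size]
    small = [c for c in clusters if 0 < len(c) < min_size]
    outliers = []
    for c in small:
        best, best_d = None, math.inf
        for i, big in enumerate(large):
            # Single-link gap between the small cluster and this large cluster.
            d = min(math.dist(p, q) for p in c for q in big)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= eps:
            # Merged points become part of the target cluster, so later small
            # clusters may chain onto them.
            large[best].extend(c)
        else:
            outliers.extend(c)  # no large cluster is reachable: isolated points
    return large, outliers
```

For example, with clusters `[[(0, 0), (0, 1), (1, 0), (1, 1)], [(1.5, 1.5)], [(20, 20)]]`, `min_size=3` and `eps=1.0`, the point (1.5, 1.5) is merged into the first cluster and (20, 20) is returned as an outlier. The single-link distance was chosen because it mirrors how DBSCAN grows a cluster through nearby points rather than through cluster centers.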
Keywords: data mining, clustering algorithm, K-means, DBSCAN
This article is indexed in CNKI, VIP (维普) and other databases.