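A K-means algorithm based on density RPCL
Cite this article: XIE Juan-ying, GUO Wen-juan, XIE Wei-xin, GAO Xin-bo. A K-means algorithm based on density RPCL[J]. Journal of Northwest University (Natural Science Edition), 2012, 0(4): 570-576
Authors: XIE Juan-ying  GUO Wen-juan  XIE Wei-xin  GAO Xin-bo
Affiliations: School of Electronic Engineering, Xidian University; School of Computer Science, Shaanxi Normal University; School of Information Engineering, Shenzhen University
Funding: Natural Science Basic Research Plan of Shaanxi Province (2010JM3004); Fundamental Research Funds for the Central Universities (GK201102007)
Abstract: Aim: To explore a method that determines both the best number of clusters K and the best initial cluster centers for the K-means algorithm, so that its clustering result converges as closely as possible to the global optimum or a near-global optimum. Methods: Rival Penalized Competitive Learning (RPCL) is used as a preprocessing step for K-means, and its learning result supplies the number of clusters and the initial cluster centers. A sample density is defined according to the natural distribution of the samples in the dataset and introduced into the adjustment of node weights in RPCL; the output of this density RPCL is taken as the best number of clusters K and the best initial cluster centers for K-means. Experiments use datasets from the UCI machine learning repository and randomly generated synthetic datasets containing noise points, and the clustering results are analyzed with several clustering evaluation indices. Results: The proposed density RPCL provides K-means with the best number of clusters and the best initial cluster centers. Conclusion: The density RPCL based K-means algorithm clusters well and is highly robust to noisy data.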

Keywords: RPCL  K-means  density  number of clusters  initial cluster centers

A density RPCL based K-means algorithm
XIE Juan-ying, GUO Wen-juan, XIE Wei-xin, GAO Xin-bo. A density RPCL based K-means algorithm[J]. Journal of Northwest University (Natural Science Edition), 2012, 0(4): 570-576
Authors:XIE Juan-ying  GUO Wen-juan  XIE Wei-xin  GAO Xin-bo
Affiliation: 1. School of Electronic Engineering, Xidian University, Xi'an 710071, China; 2. School of Computer Science, Shaanxi Normal University, Xi'an 710062, China; 3. School of Information Engineering, Shenzhen University, Shenzhen 518060, China
Abstract: Aim: To address two open problems of the K-means algorithm: the number of clusters must be given in advance, and the clustering result depends on the initial seeds. Methods: Rival Penalized Competitive Learning (RPCL) is used to uncover the clusters in a dataset and to supply the best initial seeds for K-means. The deficiency of the available RPCL algorithms is analyzed and a density RPCL is proposed, in which the density of a sample is defined according to the distribution of samples in the dataset and is introduced into the adjustment of node weights in RPCL. The density RPCL serves as a preprocessing step that determines the proper number of clusters and the best initial seeds for K-means. The density RPCL based K-means algorithm is tested on well-known datasets from the UCI machine learning repository and on synthetic datasets with noise samples, and the experimental results are analyzed under several different criteria. Results: The density RPCL algorithm uncovers the proper clusters of a dataset and the best initial seeds for K-means. Conclusion: The density RPCL based K-means algorithm achieves better clustering results and is insensitive to noisy data.
Keywords: RPCL  K-means  density  number of clusters  initial centers
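
The pipeline outlined in the abstract (RPCL as a preprocessing step whose surviving units seed K-means) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the density formula or the rule for discarding extra units, so `sample_density` (fraction of samples within a radius), the way density scales the learning rates, the `min_share` pruning rule, and all parameter values are hypothetical stand-ins. The RPCL update itself follows the classic winner-learns, rival-de-learns scheme with a win-frequency term.

```python
import numpy as np

def sample_density(X, radius):
    # Hypothetical density: fraction of samples within `radius` of each point.
    # The paper's actual density definition is not stated in the abstract.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return (d2 <= radius ** 2).mean(axis=1)

def rpcl(X, n_units, density, lr_win=0.05, lr_rival=0.002, epochs=30, seed=0):
    # Classic RPCL updates; scaling both steps by density is an assumption.
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    wins = np.ones(n_units)  # cumulative win counts
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            # Frequency-sensitive distance: frequent winners are handicapped.
            d = (wins / wins.sum()) * ((W - x) ** 2).sum(axis=1)
            winner, rival = np.argsort(d)[:2]
            W[winner] += lr_win * density[i] * (x - W[winner])  # winner learns
            W[rival] -= lr_rival * density[i] * (x - W[rival])  # rival is penalized
            wins[winner] += 1
    return W

def prune_units(X, W, min_share=0.05):
    # Hypothetical rule: keep units that attract at least `min_share` of samples.
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(W))
    keep = counts >= min_share * len(X)
    if not keep.any():
        keep[counts.argmax()] = True
    return W[keep]

def kmeans(X, centers, n_iter=100):
    # Plain Lloyd iterations started from the given seeds.
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

# Toy data: three Gaussian blobs plus a few uniform noise points.
rng = np.random.default_rng(1)
blobs = [rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [4, 0], [2, 4])]
X = np.vstack(blobs + [rng.uniform(-2, 6, size=(10, 2))])

density = sample_density(X, radius=0.5)
units = rpcl(X, n_units=6, density=density)
seeds = prune_units(X, units)
centers, labels = kmeans(X, seeds)
print("estimated K:", len(centers))
```

Starting RPCL with more units than the expected number of clusters is what lets it suggest K: surplus units are pushed away from the data and then dropped before K-means runs. The estimated K on the toy data depends on the assumed learning rates and pruning threshold, so it should be read as a demonstration of the workflow rather than a reproduction of the reported results.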
This article is indexed in CNKI and other databases.