
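A distributed clustering algorithm based on ensemble learning
Cite this article: Ji Genlin, Ling Xiaohan, Yang Ming. A distributed clustering algorithm based on ensemble learning[J]. Journal of Southeast University (Natural Science Edition), 2007, 37(4): 585-588.
Authors: Ji Genlin, Ling Xiaohan, Yang Ming
Affiliation: Department of Computer Science, Nanjing Normal University, Nanjing 210097
Abstract: Based on the idea of ensemble learning, a distributed clustering model is proposed. Distributed processing in the model proceeds in two stages: local clustering at the local sites, followed by global clustering at the global site. Local clustering at a local site is regarded as a learning process on a subset of the data, and all of the local clustering results together form the individual learners of a clustering ensemble system. Global clustering combines the local results by averaging, and a criterion function is defined to measure the accuracy of the ensemble. The K-means algorithm is extended to the distributed setting, yielding DK-means, a distributed K-means algorithm based on the model; the algorithm adapts well to different distributions of the local data. Experimental results show that, under the same conditions, DK-means reaches the accuracy of centralized clustering and is effective and feasible, which in turn validates the distributed clustering model based on ensemble learning.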

Keywords: distributed clustering; data mining; ensemble learning
Article ID: 1001-0505(2007)04-0585-04
Revised: 2006-11-24

Distributed clustering algorithm based on ensemble learning
Ji Genlin, Ling Xiaohan, Yang Ming. Distributed clustering algorithm based on ensemble learning[J]. Journal of Southeast University (Natural Science Edition), 2007, 37(4): 585-588.
Authors: Ji Genlin, Ling Xiaohan, Yang Ming
Institution: Department of Computer Science, Nanjing Normal University, Nanjing 210097, China
Abstract: A distributed clustering model based on ensemble learning is proposed. A typical distributed clustering scenario in the model is a two-stage process: clustering is performed first at the local sites and then at the global site. The local clustering results transmitted to the server site form an ensemble, and the combining schemes of ensemble learning use this ensemble to generate the global clustering result. The model thus converts distributed clustering into a combinatorial optimization problem. As an implementation of the model, a novel distributed K-means algorithm called DK-means is presented. DK-means first clusters the data at each local site with K-means; the global site then applies K-means again to the clustering results it receives from the local sites. The algorithm works well even though the data distribution differs from one local site to another. Experimental results show that DK-means is effective and efficient, which also provides empirical verification of the validity of the model.
Keywords: K-means
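
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each local site transmits only its cluster centroids and cluster sizes, that the global site runs a size-weighted K-means over those centroids, and that initial centroids are chosen by a deterministic farthest-point rule. The function names `kmeans` and `dk_means` are hypothetical.

```python
import numpy as np


def kmeans(points, k, weights=None, n_iter=50):
    """Lloyd's K-means with optional point weights (illustrative only)."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    # Assumption: farthest-point initialisation, chosen here for determinism.
    chosen = [points[0]]
    for _ in range(1, k):
        gaps = np.linalg.norm(points[:, None, :] - np.array(chosen)[None, :, :], axis=2)
        chosen.append(points[gaps.min(axis=1).argmax()])
    centroids = np.array(chosen, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():  # leave an empty cluster's centroid unchanged
                centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])
    # Recompute labels so they match the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def dk_means(local_datasets, k):
    """Stage 1: K-means at every local site. Stage 2: K-means at the global site."""
    summaries, sizes = [], []
    for data in local_datasets:
        centroids, labels = kmeans(data, k)
        counts = np.bincount(labels, minlength=k)
        keep = counts > 0  # only non-empty clusters are sent to the global site
        summaries.append(centroids[keep])
        sizes.append(counts[keep])
    # Assumption: the global site sees centroids and sizes, never the raw data.
    all_centroids = np.vstack(summaries)
    all_sizes = np.concatenate(sizes).astype(float)
    global_centroids, _ = kmeans(all_centroids, k, weights=all_sizes)
    return global_centroids
```

Calling `dk_means([site_a, site_b, site_c], k=3)`, where each argument is an array of shape `(n_points, n_features)`, returns a `(3, n_features)` array of global centroids. Weighting by cluster size is one plausible reading of combining local results by averaging; the paper's exact combining scheme and criterion function are not given in the abstract.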
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.