
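A parallel partitioning-based clustering algorithm based on MapReduce and an improved artificial bee colony algorithm
Cite this article:Tao Tao, Mao Yimin. A parallel partitioning-based clustering algorithm based on MapReduce and an improved artificial bee colony algorithm[J]. Science Technology and Engineering, 2021, 21(21): 8989-8998.
Authors:Tao Tao  Mao Yimin
Institution:School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000
Funding:National Key Research and Development Program of China (No. 2018YFC1504705); National Natural Science Foundation of China (No. 41562019)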
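Abstract:To address the poor parameter-optimization ability, sensitivity to initial centers, and data skew of partitioning-based clustering algorithms in big-data settings, this paper proposes a parallel partitioning-based clustering algorithm built on MapReduce and the artificial bee colony (ABC) algorithm (the partitioning-based clustering algorithm by using improve artificial bee colony based on MapReduce, MR-PBIABC). First, an initialization strategy based on backward learning and the clustering criterion function (BLCCF) is proposed to improve the quality of the solutions found by the ABC search, and the ABC algorithm is combined with the artificial fish colony (AFS) algorithm to form an improved artificial bee colony (IABC) algorithm, which exploits the strong optimal-solution capability of AFS to improve the optimization ability of ABC. Second, the initial cluster centers are obtained with IABC, and a relative entropy strategy (RES) is proposed to measure the distance between artificial fish, ensuring that the initial cluster centers obtained correspond to the optimal artificial fish state and thereby avoiding the initial-center sensitivity caused by selecting the initial cluster centers at random. Third, a data balancing strategy (DBS) is designed that dynamically collects node loads and redistributes load among nodes, resolving data skew on the nodes. Finally, using the MapReduce computing model, the cluster centers are mined in parallel to generate the final clustering result. Experimental results show that MR-PBIABC achieves better clustering quality and effectively improves the efficiency of parallel computation in big-data environments.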
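
The abstract names the BLCCF initialization strategy but does not give its formulas. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that "backward learning" means the standard opposition-based rule x' = lo + hi - x, and that the clustering criterion function is the sum of squared errors (SSE). The names `sse`, `opposition_init`, `n_sources`, and the choice to keep the best half of the pooled candidates are hypothetical and chosen only for illustration.

```python
import numpy as np

def sse(data, centers):
    # Clustering criterion (assumed): sum of squared distances
    # from each point to its nearest center.
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def opposition_init(data, k, n_sources, rng):
    # Each candidate solution is a set of k centers (a k x d array).
    lo, hi = data.min(axis=0), data.max(axis=0)
    cands = rng.uniform(lo, hi, size=(n_sources, k, data.shape[1]))
    # Opposite candidates (assumed rule): reflect within the data bounds.
    opp = lo + hi - cands
    pool = np.concatenate([cands, opp])
    scores = np.array([sse(data, c) for c in pool])
    # Keep the n_sources candidates with the lowest criterion value.
    best = np.argsort(scores)[:n_sources]
    return pool[best], scores[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 2))
    sources, scores = opposition_init(data, k=3, n_sources=5, rng=rng)
    print(sources.shape, scores)
```

Because the pool contains both the random candidates and their opposites, the best retained candidate is never worse under the assumed criterion than the best purely random one, which is the usual motivation for this kind of initialization.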

Keywords:big data  parallel clustering  artificial bee colony (ABC) algorithm  artificial fish colony (AFS) algorithm  MapReduce
Received:2021/1/4
Revised:2021/5/21

The partitioning-based clustering algorithm by using improve artificial bee colony based on MapReduce
Tao Tao,Mao Yimin.The partitioning-based clustering algorithm by using improve artificial bee colony based on MapReduce[J].Science Technology and Engineering,2021,21(21):8989-8998.
Authors:Tao Tao  Mao Yimin
Institution:Jiangxi University of Science and Technology
Abstract:To address the poor parameter optimization ability, sensitivity to initial centers, and data skew of partitioning-based clustering algorithms for big data, this paper proposes a partitioning-based clustering algorithm using an improved artificial bee colony based on MapReduce, named MR-PBIABC. Firstly, the BLCCF (backward learning and the clustering criterion function) initialization strategy is proposed to improve the quality of the solutions found by the artificial bee colony (ABC) search. The ABC algorithm is also combined with the AFS algorithm, exploiting the strong optimal-solution capability of AFS to improve the optimization ability of ABC; the result is the improved artificial bee colony (IABC) algorithm. Secondly, the IABC algorithm is used to obtain the initial cluster centers, which avoids the sensitivity to initial centers caused by selecting them at random. Thirdly, a DBS (data balancing strategy) is designed to handle data skew across data partitions, which improves clustering efficiency. Finally, based on MapReduce, the cluster centers are mined in parallel to generate the final clustering results. The experimental results show that the MR-PBIABC algorithm produces better clustering results and parallelizes more efficiently on big data.
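
The abstract does not describe how the final parallel step is laid out on MapReduce. As a rough illustration only, the sketch below shows one iteration of a generic partitioning-based (k-means style) center update written as a map phase and a reduce phase in plain Python. It does not use Hadoop, it omits the DBS load balancing, and the function names `map_phase` and `reduce_phase` are hypothetical, not taken from the paper.

```python
from collections import defaultdict

def map_phase(points, centers):
    # Map: emit (index of nearest center, (point, 1)) for every point.
    for p in points:
        idx = min(
            range(len(centers)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
        )
        yield idx, (p, 1)

def reduce_phase(pairs, centers):
    # Reduce: sum the points and counts per center, then average.
    sums, counts = {}, defaultdict(int)
    for idx, (p, n) in pairs:
        acc = sums.get(idx, [0.0] * len(p))
        sums[idx] = [s + x for s, x in zip(acc, p)]
        counts[idx] += n
    new_centers = []
    for j, c in enumerate(centers):
        if counts[j]:
            new_centers.append(tuple(s / counts[j] for s in sums[j]))
        else:
            # A center with no assigned points is left unchanged.
            new_centers.append(tuple(c))
    return new_centers

if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 10), (10, 12)]
    centers = [(0, 1), (9, 9)]
    print(reduce_phase(map_phase(points, centers), centers))
```

In a real deployment the map calls would run on separate data partitions and the framework would group the emitted pairs by key before the reduce; here a single generator stands in for that shuffle.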
Keywords:big data  parallelize clustering  ABC algorithm  AFS algorithm  MapReduce
This article is indexed in CNKI, Wanfang Data, and other databases.