
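Research on a Real-Time Streaming Controllable Clustering Algorithm for Correlated Big Data in a Cloud Computing Environment
Cite this article: Li Pengfei, Liu Chunyu, Hai Jun. Research on a Real-Time Streaming Controllable Clustering Algorithm for Correlated Big Data in a Cloud Computing Environment[J]. Science Technology and Engineering, 2018, 18(7).
Authors: Li Pengfei, Liu Chunyu, Hai Jun
Affiliations (in author order): (1) Information and Communication Branch, State Grid East Inner Mongolia Electric Power Co., Ltd.; (2) Information and Communication Branch, State Grid East Inner Mongolia Electric Power Co., Ltd., and School of Computer Science, North China Electric Power University; (3) Information and Communication Branch, State Grid East Inner Mongolia Electric Power Co., Ltd.
Abstract: To address the low efficiency, poor clustering quality, and weak stability of traditional clustering algorithms, a new real-time streaming controllable clustering algorithm for correlated big data in a cloud computing environment is proposed. The definition and characteristics of correlated real-time streaming data are introduced. Coarse clustering, performed with the Canopy algorithm, preprocesses data tuples as they arrive in real time, determines the number of clusters and the positions of their centers, and produces a set of distinct macro-clusters. These macro-clusters are then passed to the K-means algorithm, which performs the fine clustering; the detailed steps of K-means and of the overall fine-clustering procedure are given. Experimental results show that the proposed algorithm offers high efficiency, good clustering quality, and strong stability, and can effectively cluster correlated real-time streaming big data in a cloud computing environment.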

Keywords: cloud computing; correlation; big data; real-time streaming; clustering
Received: 2017/8/3
Revised: 2017/8/3

Research on a real-time streaming controllable clustering algorithm for correlated big data in a cloud computing environment
LI Peng-fei, LIU Chun-yu and HAI Jun. Research on a real-time streaming controllable clustering algorithm for correlated big data in a cloud computing environment[J]. Science Technology and Engineering, 2018, 18(7).
Authors: LI Peng-fei, LIU Chun-yu and HAI Jun
Abstract: In view of the disadvantages of traditional clustering algorithms, such as low efficiency, poor clustering quality and weak stability, a new real-time streaming controllable clustering algorithm for correlated big data in a cloud computing environment is proposed. The definition and characteristics of correlated real-time streaming data are introduced. Coarse clustering with the Canopy algorithm preprocesses the data tuples that arrive in real time, determines the number of clusters and their center points, and forms a set of distinct macro-clusters. The macro-clusters obtained from the coarse clustering are passed to the K-means algorithm, which completes the fine clustering; the detailed steps of the K-means algorithm and of the whole fine-clustering procedure are given. The experimental results show that the proposed algorithm has the advantages of high efficiency, good clustering quality and strong stability, and can effectively cluster correlated real-time streaming big data in a cloud computing environment.
Keywords: cloud computing; correlation; big data; real-time streaming; clustering
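
The abstract describes a two-stage pipeline: a Canopy pass that fixes the number of clusters and their initial centers, followed by K-means refinement. The sketch below illustrates that idea on a single in-memory batch of tuples. It is not the authors' implementation: the function names (`canopy_centers`, `kmeans`), the thresholds `t1` and `t2`, the use of Euclidean distance, and the choice of the canopy mean as the initial center are all assumptions made for illustration, and the streaming and cloud-deployment aspects of the paper are not modeled.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def canopy_centers(points, t1, t2):
    """Coarse pass (Canopy): return one initial center per canopy.

    t1 is the loose threshold, t2 the tight threshold (t1 > t2).
    Both values are hypothetical tuning parameters.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Points within the loose threshold belong to this canopy.
        members = [seed] + [p for p in remaining if math.dist(seed, p) < t1]
        # Assumption: the canopy mean serves as the initial center.
        centers.append(mean(members))
        # Points within the tight threshold cannot seed another canopy.
        remaining = [p for p in remaining if math.dist(seed, p) >= t2]
    return centers


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, centers, max_iter=100):
    """Fine pass (K-means), started from the canopy centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    labels = [nearest(p, centers) for p in points]
    return centers, labels


if __name__ == "__main__":
    batch = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    coarse = canopy_centers(batch, t1=3.0, t2=1.5)
    centers, labels = kmeans(batch, coarse)
    print(len(coarse), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

Because the Canopy pass supplies both the cluster count and the starting centers, K-means does not need a user-chosen k or random initialization, which is one plausible reading of the stability benefit claimed in the abstract.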
This article is indexed in CNKI and other databases.