K-prototypes Clustering Algorithm Based on Spark Framework for Big Data

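Cite this article:	GONG Jing. K-prototypes Clustering Algorithm Based on Spark Framework for Big Data[J]. Journal of Southwest China Normal University (Natural Science), 2019, 44(7): 63-68.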

Author:	GONG Jing (龚静)

Affiliation:	School of Data Science, Tongren University (铜仁学院大数据学院)

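Funding:	Innovative Talent Team Construction Project for Regular Higher Education Institutions, Guizhou Provincial Department of Education (黔教合人才团队字[2015]67号).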

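Abstract:	Big data is characterized by large data volume and attributes of mixed types. Existing MapReduce-based parallel K-prototypes schemes for large-scale mixed data are limited by running time and memory, which makes them unsuitable for processing big data. To address this problem, this paper proposes a new Spark-based K-prototypes clustering method. The method uses a re-aggregation technique and exploits the in-memory operations of the Spark framework to build groupings of large-scale mixed data. Experiments on simulated and real datasets show that the method is feasible and improves on the efficiency of existing K-prototypes methods.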
Keywords:	big data; mixed data; K-prototypes; Spark framework
Received:	2018/6/1
K-prototypes Clustering Algorithm Based on Spark Framework for Big Data

GONG Jing. K-prototypes Clustering Algorithm Based on Spark Framework for Big Data[J]. Journal of Southwest China Normal University (Natural Science), 2019, 44(7): 63-68.

Authors:	GONG Jing

Institution:	School of Data Science, Tongren University, Tongren, Guizhou 554300, China

Abstract:	Big data combines large data volume with attributes of mixed types. Current MapReduce-based parallel K-prototypes schemes for large-scale mixed data are constrained by running time and memory, which makes them unsuitable for processing big data. To solve this problem, this paper proposes a new Spark-based K-prototypes clustering method. The method uses a re-aggregation technique and Spark's in-memory operations to build groupings of large-scale mixed data. Experiments on simulated and real datasets show that the method is feasible and more efficient than existing K-prototypes methods.

Keywords:	big data; mixed data; K-prototypes; Spark framework
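
The abstract does not spell out the algorithm, so the following is only an orientation sketch of the commonly used single-machine K-prototypes building blocks for mixed data: a dissimilarity that adds squared Euclidean distance on numeric attributes to a weighted count of categorical mismatches, and a prototype update that takes the mean of numeric attributes and the mode of categorical ones. The function names, the record layout (a pair of numeric tuple and categorical tuple), and the weight name gamma are assumptions made for illustration. This is not the paper's Spark implementation, and the re-aggregation technique it mentions is not reproduced here.

```python
from collections import Counter


def mixed_distance(x_num, x_cat, proto_num, proto_cat, gamma):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the number of categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    mismatches = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric + gamma * mismatches


def assign(points, prototypes, gamma):
    """Return, for each point, the index of its nearest prototype."""
    return [
        min(
            range(len(prototypes)),
            key=lambda k: mixed_distance(
                p[0], p[1], prototypes[k][0], prototypes[k][1], gamma
            ),
        )
        for p in points
    ]


def update_prototype(members):
    """Build a prototype from cluster members: mean of each numeric
    attribute, most frequent value of each categorical attribute."""
    n = len(members)
    num_dim = len(members[0][0])
    cat_dim = len(members[0][1])
    means = tuple(sum(m[0][j] for m in members) / n for j in range(num_dim))
    modes = tuple(
        Counter(m[1][j] for m in members).most_common(1)[0][0]
        for j in range(cat_dim)
    )
    return means, modes


# Hypothetical toy data: (numeric attributes, categorical attributes).
points = [((1.0, 2.0), ("a",)), ((1.2, 1.8), ("a",)), ((8.0, 9.0), ("b",))]
prototypes = [((1.0, 2.0), ("a",)), ((8.0, 9.0), ("b",))]
labels = assign(points, prototypes, gamma=0.5)
```

In a Spark setting, the assignment step would typically be applied to each record inside a distributed map operation, with prototypes recomputed from aggregated per-cluster statistics; how the paper organizes those steps is not stated in the abstract.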
This article is indexed in CNKI and other databases.