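A MapReduce-based algorithm for mining frequent item sets from data streams
Cite this article:ZHU Fubao, BAI Qingchun, TANG Mengmeng, ZHU Haodong. A MapReduce-based algorithm for mining frequent item sets from data streams[J]. Journal of Central China Normal University (Natural Sciences), 2017, 51(4): 429-434.
Authors:ZHU Fubao  BAI Qingchun  TANG Mengmeng  ZHU Haodong
Institution:School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002
Abstract:To address the low efficiency and high memory consumption of traditional frequent item set computation over data streams, this paper applies parallel computing to design a MapReduce-based algorithm for mining frequent item sets from data streams. First, the data are partitioned into blocks, compressed, and transmitted. Second, the computation of frequent items is distributed across load-balanced data nodes, which helps maintain execution efficiency. Finally, the frequent item sets produced by the individual nodes are merged in a single scheduling pass. Both theoretical analysis and comparative experiments indicate that the algorithm is effective and feasible for computing frequent item sets over data streams in parallel.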

Keywords:MapReduce    frequent item sets    data streams    parallel computing    data mining
Received:2017-07-07

An algorithm for mining frequent item sets from data streams based on MapReduce
ZHU Fubao, BAI Qingchun, TANG Mengmeng, ZHU Haodong. An algorithm for mining frequent item sets from data streams based on MapReduce[J]. Journal of Central China Normal University (Natural Sciences), 2017, 51(4): 429-434.
Authors:ZHU Fubao  BAI Qingchun  TANG Mengmeng  ZHU Haodong
Institution:School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002
Abstract:To address the low efficiency and large memory consumption of traditional frequent item set computation, this paper presents a new frequent item set mining algorithm based on the MapReduce parallel computing model. First, the data are divided into small blocks so that they can be compressed and transmitted. Second, the computation of frequent items is distributed across load-balanced data nodes, which can greatly improve efficiency. Finally, the frequent item sets generated by each node are merged. Theoretical analysis and experimental results show that the algorithm is effective and feasible for mining frequent item sets from data streams in parallel.
Keywords:MapReduce  frequent item sets  data streams  parallel computation  data mining
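
The three stages named in the abstract (block partitioning, per-node counting, and a single merge) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the function names, the round-robin split used as a stand-in for load balancing, the `max_size` cap on item set length, and the absolute `min_support` threshold are all assumptions made for illustration. Compression and network transmission are omitted.

```python
from collections import Counter
from itertools import combinations


def split_blocks(transactions, n_blocks):
    """Partition a batch of transactions into n_blocks blocks (round-robin,
    an assumed stand-in for the paper's load balancing)."""
    return [transactions[i::n_blocks] for i in range(n_blocks)]


def map_block(block, max_size):
    """Map step: count every item set of up to max_size items in one block."""
    counts = Counter()
    for transaction in block:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: merge the per-block counts in one pass, then keep only
    item sets whose total count reaches min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def mine_frequent_itemsets(transactions, n_blocks=2, max_size=2, min_support=2):
    blocks = split_blocks(transactions, n_blocks)
    # In a real MapReduce job each block would be processed on a separate node.
    partials = [map_block(block, max_size) for block in blocks]
    return reduce_counts(partials, min_support)


# Hypothetical batch taken from a stream window.
batch = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
result = mine_frequent_itemsets(batch)
print(result)
```

Because the map step emits plain counts and the reduce step only sums them, the result does not depend on how the batch is split into blocks; only the filtering by `min_support` has to wait until all partial counts are merged.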
This article is indexed in CNKI and other databases.