
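Parallel and incremental updating algorithm for association rules based on MapReduce
Cite this article: YANG Yong, GAO Songsong. Parallel and incremental updating algorithm for association rules based on MapReduce[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, 26(5): 670-678.
Authors: YANG Yong, GAO Songsong
Affiliations: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065
Funding: Natural Science Foundation of Chongqing (CSTC 2007BB2445); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ110522); Research Fund of Chongqing University of Posts and Telecommunications (A2009-26)
Abstract: To address the big data problem and the incremental updating problem that rapid data growth causes in practical association rule mining, the MapReduce programming model is introduced on the basis of the fast updated frequent pattern tree (FUFP-tree) algorithm, and a parallel incremental updating algorithm for association rules oriented to big data, the parallel fast updated frequent pattern tree (PFUFP-tree), is proposed. The algorithm builds a block index over the original transaction data, so that each incremental update scans as little of the original transaction database as possible, which improves mining efficiency. It also adopts an item grouping strategy with dynamic load balancing to optimize the grouping of item sets during parallel computation, which keeps the load balanced across the nodes of the distributed cluster. Experimental results show that the proposed algorithm is effective and efficient, and is suited to dynamically growing big data environments.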

Keywords: association rule; big data; incremental updating; MapReduce; fast updated frequent pattern tree (FUFP-tree)
Received: 2014/6/24
Revised: 2014/9/25
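
The abstract says that a block index over the original transactions lets an incremental update scan only part of the original database, but it does not describe the index itself. The following is a minimal single-machine sketch of one way such an index could work, not the paper's PFUFP-tree implementation. The function names `build_block_index` and `blocks_to_rescan`, the fixed `block_size`, and the item-to-block-id layout are all assumptions made for illustration.

```python
from collections import defaultdict


def build_block_index(transactions, block_size):
    """Map each item to the ids of the blocks whose transactions contain it.

    Hypothetical layout: transactions are split into consecutive blocks of
    `block_size` transactions each; the abstract does not specify this.
    """
    index = defaultdict(set)
    for position, transaction in enumerate(transactions):
        block_id = position // block_size
        for item in transaction:
            index[item].add(block_id)
    return index


def blocks_to_rescan(index, items):
    """Return the sorted ids of blocks that mention at least one given item."""
    needed = set()
    for item in items:
        # .get avoids creating empty entries for items never seen before.
        needed |= index.get(item, set())
    return sorted(needed)


# Toy original database: six transactions, two per block.
original = [
    {"a", "b"}, {"a", "c"},  # block 0
    {"b", "c"}, {"b", "d"},  # block 1
    {"a", "b"}, {"c", "e"},  # block 2
]
index = build_block_index(original, block_size=2)

# Suppose new transactions make item "d" a candidate for becoming frequent:
# only the blocks that actually contain "d" need to be read again.
print(blocks_to_rescan(index, {"d"}))
```

In this sketch, an item that newly becomes a candidate after an update triggers a rescan of only the blocks listed for it, rather than of the whole original database.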

Parallel and incremental updating algorithm for association rules based on MapReduce
YANG Yong and GAO Songsong. Parallel and incremental updating algorithm for association rules based on MapReduce[J]. Journal of Chongqing University of Posts and Telecommunications, 2014, 26(5): 670-678.
Authors: YANG Yong and GAO Songsong
Institution: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China; Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
Abstract: To address the big data problem and the incremental updating problem caused by rapid data growth in practical association rule mining, this paper proposes a parallel incremental updating algorithm for association rules based on the MapReduce parallel programming model and the FUFP-tree algorithm. First, a block index of the original transactions is built. Using this index, the amount of scanning of the original transaction database can be reduced, which improves mining efficiency. Second, a dynamic load-balancing grouping strategy is adopted to solve the item grouping problem in parallel computation, so as to balance the load among the nodes of the distributed cluster. Finally, comparative experiments show that the proposed algorithm is effective and efficient, and that it is suitable for incrementally growing big data environments.
Keywords: association rule; big data; incremental updating; MapReduce; fast updated frequent pattern tree (FUFP-tree)
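
The abstract mentions a load-balancing strategy for grouping items across cluster nodes without giving its details. As a rough illustration of the general idea only, the sketch below uses a standard greedy heuristic (assign the heaviest remaining item to the currently lightest group). This is a static stand-in technique, not the paper's dynamic strategy; the function name `balanced_item_groups` and the per-item load estimates are hypothetical.

```python
import heapq


def balanced_item_groups(item_loads, num_groups):
    """Greedily assign items to groups so that total loads stay close.

    `item_loads` maps each item to an estimated processing cost (an assumed
    input; the abstract does not say how load is estimated).
    """
    # Min-heap of (current total load, group id): the lightest group is on top.
    heap = [(0, group_id) for group_id in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Heaviest items first; ties broken by item name for determinism.
    ordered = sorted(item_loads.items(), key=lambda pair: (-pair[1], pair[0]))
    for item, load in ordered:
        total, group_id = heapq.heappop(heap)
        groups[group_id].append(item)
        heapq.heappush(heap, (total + load, group_id))
    return groups


loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
groups = balanced_item_groups(loads, 2)
for group_id, members in enumerate(groups):
    print(group_id, members, sum(loads[item] for item in members))
```

In a MapReduce setting, each group would typically correspond to the work handed to one reducer, so keeping the group totals close keeps any single node from becoming a bottleneck.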