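Sharing Query Results in MapReduce Framework
Cite this article: Shi Lin. Sharing Query Results in MapReduce Framework[J]. Science Technology and Engineering, 2018, 18(8).
Author: Shi Lin
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program); National High Technology Research and Development Program of China (863 Program)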
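Abstract: Large-scale data analysis today usually executes queries under the MapReduce framework. Because of the redundancy of the MapReduce framework itself and the overlap among queries, reusing the results of existing queries can greatly improve query execution efficiency. Reusing query results requires storing them and managing their matching, which incurs high system overhead and offsets part of the benefit of reuse. Addressing the inefficiency of ReStore, a current state-of-the-art query result reuse system, in managing and matching query results, this paper proposes a forest-structured Job storage management technique and a matching algorithm adapted to it, which improve query matching efficiency and reduce system overhead. To let the system fully reuse the results of executed queries, this paper also proposes a scheme for preprocessing multiple queries: by changing the order in which the queries enter the Pig compiler for compilation, it changes the execution order of the Jobs so that Jobs loading the same dataset execute together, reducing the number of matches against the repository. Experiments show that, in building the storage structure and matching existing results, the proposed method saves 16.3% of the time overhead compared with ReStore and also scales better.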

Keywords: MapReduce framework; ReStore system; system overhead
Received: 2017/8/9
Revised: 2017/11/15

Sharing Query Results in MapReduce Framework
Shi Lin. Sharing Query Results in MapReduce Framework[J]. Science Technology and Engineering, 2018, 18(8).
Authors: Shi Lin
Institution: College of Computer Science and Technology, Taiyuan University of Technology
Abstract: Large-scale data analysis today typically executes queries in the MapReduce framework. Because of the redundancy inherent in the MapReduce framework and the overlap among queries, reusing the results of earlier queries can significantly improve query execution efficiency. Reuse, however, requires storing those results and matching new queries against them, which incurs significant overhead and offsets part of the benefit. To alleviate this problem, this paper takes ReStore, a state-of-the-art system for reusing query results, as an example and improves its efficiency. A forest structure for managing query results is proposed, together with a matching algorithm suited to it. Both improve the efficiency of the system and reduce its overhead. So that the system can fully reuse the results of executed queries, a preprocessing scheme is also proposed: it orders the queries entering the Pig compiler according to their proximity in terms of the datasets they operate on, so that queries operating on the same datasets are executed in sequence and matching is localized. Experiments show that the proposed techniques reduce time cost by 16.3% compared with ReStore and scale better.
Keywords: MapReduce framework; ReStore system; system overhead
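
The preprocessing idea in the abstract (submit queries that operate on the same datasets one after another) can be illustrated with a minimal Python sketch. The abstract does not give the actual algorithm, so this is only one possible reading: the function name `order_queries_by_dataset`, the `(query_id, datasets)` input format, and the grouping rule (exact equality of the dataset set) are all assumptions made for illustration, not the paper's method or any part of ReStore or Pig.

```python
from collections import defaultdict


def order_queries_by_dataset(queries):
    """Reorder queries so that those loading the same datasets are adjacent.

    `queries` is a list of (query_id, datasets) pairs. Both fields are
    hypothetical stand-ins for a Pig script and the inputs it loads.
    """
    groups = defaultdict(list)  # dataset set -> query ids, in arrival order
    for query_id, datasets in queries:
        groups[frozenset(datasets)].append(query_id)

    # Dicts keep insertion order, so groups appear in the order in which
    # each dataset set was first seen; arrival order is kept inside a group.
    return [query_id for ids in groups.values() for query_id in ids]


if __name__ == "__main__":
    submitted = [
        ("q1", {"logs"}),
        ("q2", {"users"}),
        ("q3", {"logs"}),
        ("q4", {"users", "logs"}),
        ("q5", {"users"}),
    ]
    print(order_queries_by_dataset(submitted))
```

Grouping by exact equality is the simplest choice. A scheme based on "proximity" between dataset sets, as the abstract words it, would need a similarity measure that the abstract does not specify.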