Performance optimization research based on Hive (基于Hive的性能优化研究)
Cite as: Wang Kang, Chen Haiguang, Li Dongjing. Performance optimization research based on Hive[J]. Journal of Shanghai Normal University (Natural Sciences), 2017, 46(4): 527-534.
Authors: Wang Kang (王康), Chen Haiguang (陈海光), Li Dongjing (李东静)
Institutions: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China (Wang Kang, Chen Haiguang); College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China (Li Dongjing)
Abstract: This paper studies Hive performance optimization from two angles: MapReduce job scheduling and Hive performance tuning. For MapReduce, it starts from the programming model, analyzes the execution process, and tunes parameters on the map side and the reduce side. It then turns to the Hive framework itself, examining partitioned tables and external tables, compression of commonly used data files, and row-oriented versus column-oriented storage. The experimental results show that Snappy compression and the ORCFile/Parquet storage formats improve query efficiency for column-oriented queries and are well compatible with big data analysis platforms.

Keywords: data warehouse; job optimization; performance optimization; compression; storage format
Received: 2015-12-10
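
As a rough illustration of the Hive-side techniques named in the abstract (external tables, partitioning, columnar storage, Snappy compression), the sketch below builds a HiveQL `CREATE TABLE` statement as a string. It is not taken from the paper: the helper `build_hive_ddl`, the table name `page_views`, its columns, the partition column `dt`, and the path `/data/page_views` are all hypothetical. The table properties `orc.compress` and `parquet.compression` are the standard Hive settings for choosing a codec per format.

```python
def build_hive_ddl(table, columns, partition_col, location, fmt="ORC"):
    """Return HiveQL for an external, partitioned, Snappy-compressed table.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # The table property that selects the compression codec depends on the format.
    compression_prop = {
        "ORC": "orc.compress",
        "PARQUET": "parquet.compression",
    }[fmt]
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({partition_col} STRING) "
        f"STORED AS {fmt} "
        f"LOCATION '{location}' "
        f"TBLPROPERTIES ('{compression_prop}'='SNAPPY')"
    )


# Hypothetical table: names, types, and path are assumptions.
ddl = build_hive_ddl(
    "page_views",
    [("user_id", "BIGINT"), ("url", "STRING")],
    "dt",
    "/data/page_views",
)
print(ddl)
```

Passing `fmt="PARQUET"` switches both the `STORED AS` clause and the compression property, so the two columnar formats mentioned in the abstract can be compared on the same table layout.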
This article is indexed in CNKI and other databases.