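A Small-File Storage Strategy for HDFS Based on a Hybrid Index
Cite this article: XIONG Anping, HUANG Rong, ZOU Yang. A small-file storage strategy for HDFS based on a hybrid index[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2015, 27(1): 97-102. DOI: 10.3979/j.issn.1673-825X.2015.01.017
Authors: XIONG Anping, HUANG Rong, ZOU Yang
Affiliation: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funding: Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ120513); Ministry of Industry and Information Technology 2012 Special Fund for Internet of Things Development (2-5)
Abstract: The Hadoop distributed file system (HDFS) is widely used by many large enterprises because it is stable, efficient, and low-cost. When HDFS stores massive numbers of small files, however, the metadata server node incurs excessive memory overhead, and small files inside merged files are accessed inefficiently. To address these problems, an improved small-file storage strategy based on a hybrid index is proposed. A classifier classifies and marks the small files, an H-B+ tree index is built on the metadata server, and different in-block indexes are built on the storage nodes according to small-file size, with the goal of improving small-file access efficiency. The implementation uses a cache structure to improve the response speed of client access, which also eases the memory load on the metadata server node. Experimental results show that the hybrid-index-based small-file storage strategy effectively improves small-file access efficiency and significantly reduces the memory overhead of the metadata node.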

Keywords: Hadoop distributed file system (HDFS); small files; metadata server; cache; hybrid index
Received: 2014-02-23
Revised: 2014-11-02

A kind of HDFS small files storage strategy based on hybrid index
XIONG Anping, HUANG Rong and ZOU Yang. A kind of HDFS small files storage strategy based on hybrid index[J]. Journal of Chongqing University of Posts and Telecommunications, 2015, 27(1): 97-102. DOI: 10.3979/j.issn.1673-825X.2015.01.017
Authors: XIONG Anping, HUANG Rong, ZOU Yang
Affiliation: Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
Abstract: Thanks to its stability, efficiency, and low-cost storage, the Hadoop distributed file system (HDFS) has been widely adopted by many large enterprises. However, storing massive numbers of small files on HDFS imposes excessive memory overhead on the NameNode, and accessing small files inside merged files is inefficient. To address these two issues, this paper proposes an improved strategy for storing and accessing small files, the Small Files Storage Strategy Based on Hybrid Index. First, the strategy classifies and marks small files with a classifier. Second, it builds an H-B+ tree index on the NameNode and different in-block indexes on the DataNodes, chosen according to small-file size, to improve small-file access efficiency. Finally, a cache structure is used to speed up responses to client access requests and to relieve the memory load on the NameNode. The experimental results indicate that the strategy improves small-file access efficiency and significantly reduces the memory overhead of the NameNode.
Keywords: Hadoop distributed file system (HDFS); small files; metadata server; cache; hybrid index
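
The two-level lookup outlined in the abstract (a global index on the metadata side, an in-block index on the storage side, and a cache in front of the global index) can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not use any HDFS API. The abstract does not describe the internals of the H-B+ tree, so a sorted list searched with `bisect` stands in for it, and the classifier step is omitted. The names `MergedBlock`, `SmallFileStore`, `block_capacity`, `cache_size`, and `index_lookups` are hypothetical, and file names are assumed to be unique.

```python
from bisect import bisect_left
from collections import OrderedDict


class MergedBlock:
    """One merged block: concatenated small files plus an in-block index."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.data = bytearray()
        self.index = {}  # file name -> (offset, length) inside this block

    def append(self, name, payload):
        self.index[name] = (len(self.data), len(payload))
        self.data += payload

    def read(self, name):
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


class SmallFileStore:
    """Toy two-level lookup: global sorted index -> block -> in-block index."""

    def __init__(self, block_capacity=64, cache_size=2):
        self.block_capacity = block_capacity
        self.blocks = [MergedBlock(0)]
        self.names = []      # sorted file names (assumed stand-in for the tree index)
        self.block_ids = []  # block id of the name stored at the same position
        self.cache = OrderedDict()  # LRU cache: file name -> block id
        self.cache_size = cache_size
        self.index_lookups = 0  # lookups that had to consult the global index

    def put(self, name, payload):
        block = self.blocks[-1]
        # Start a new merged block when the current one would overflow.
        if block.data and len(block.data) + len(payload) > self.block_capacity:
            block = MergedBlock(len(self.blocks))
            self.blocks.append(block)
        block.append(name, payload)
        pos = bisect_left(self.names, name)
        self.names.insert(pos, name)
        self.block_ids.insert(pos, block.block_id)

    def _locate(self, name):
        if name in self.cache:  # cache hit: skip the global index
            self.cache.move_to_end(name)
            return self.cache[name]
        self.index_lookups += 1
        pos = bisect_left(self.names, name)
        if pos == len(self.names) or self.names[pos] != name:
            raise KeyError(name)
        block_id = self.block_ids[pos]
        self.cache[name] = block_id
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return block_id

    def get(self, name):
        return self.blocks[self._locate(name)].read(name)


store = SmallFileStore(block_capacity=8)
store.put("a.txt", b"hello")
store.put("b.txt", b"world")  # does not fit in block 0, so block 1 is created
print(store.get("b.txt"))
```

In this sketch the global index stores one entry per small file but only a block id per entry, while byte offsets live with the block itself. Repeated reads of the same file are answered from the cache, so `index_lookups` does not grow on a cache hit.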
This article is indexed in CNKI, Wanfang Data, and other databases.