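An Improved Hadoop Data Placement Strategy
Citation: Lin Wei-wei. An improved Hadoop data placement strategy[J]. Journal of South China University of Technology (Natural Science Edition), 2012, 40(1): 152-158
Author: Lin Wei-wei (林伟伟)
Affiliation: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
Funding: National Natural Science Foundation of China; Natural Science Foundation of Guangdong Province; Science and Technology Planning Project of Guangdong Province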
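Abstract: Under Hadoop's existing default data placement strategy, recovering data from a remote node after the local replica fails costs a large amount of data-transfer time, and choosing placement nodes at random can undermine the load balance of data placement. To address this, this paper proposes an improved data placement strategy. The strategy computes a scheduling evaluation value for each node from its network distance and data load, and uses that value to select the best node on which to place the remote data replica, so that data placement is both load-balanced and achieves good data-transfer performance. The improved replica placement strategy was implemented on the Hadoop platform. The results show that, compared with the system's default strategy, the proposed strategy not only improves the load balance of data placement but also reduces the time needed to place data replicas.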

Keywords: Hadoop; data placement; load balancing; strategy

An Improved Data Placement Strategy for Hadoop
Lin Wei-wei. An Improved Data Placement Strategy for Hadoop[J]. Journal of South China University of Technology (Natural Science Edition), 2012, 40(1): 152-158
Author: Lin Wei-wei
Affiliation: Lin Wei-wei (School of Computer Engineering and Science, South China University of Technology, Guangzhou 510006, Guangdong, China)
Abstract: In the existing default data placement strategy for Hadoop, restoring data from a remote DataNode takes a long time when the local replicas become unavailable, and the random selection of DataNodes for data storage may degrade load balancing. To solve these problems, an improved data placement strategy is proposed. It calculates a scheduling evaluation value for each DataNode from the DataNode's network distance and data load, and then chooses the most appropriate DataNode on which to place the remote replica. The strategy thus achieves both load balancing for data storage and good data transmission performance. The proposed strategy is implemented on the Hadoop platform, and the results show that, compared with the existing default strategy, it improves load balancing for data storage and reduces the time needed for data placement.
Keywords: Hadoop; data placement; load balancing; strategy
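The abstract describes the core step (score every candidate DataNode by network distance and data load, then place the remote replica on the best one) without giving the scoring formula. The sketch below is therefore hypothetical. It assumes the evaluation value is a weighted sum of normalized network distance and disk utilization, with illustrative weights `w_dist` and `w_load` that are not from the paper, and it treats "remote" as "on a different rack from the writer". The distance function follows the tree-distance convention used by Hadoop's network topology (0 for the same node, 2 for the same rack, 4 for the same datacenter, 6 across datacenters for three-level paths), but `DataNode`, `evaluation_value`, and `choose_remote_node` are made-up names, not Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical view of a DataNode: its topology path and disk usage."""
    name: str
    location: str          # topology path such as "/datacenter/rack/node"
    used_bytes: int
    capacity_bytes: int


def network_distance(a: str, b: str) -> int:
    """Edges from each path up to their closest common ancestor
    (0 same node, 2 same rack, 4 same datacenter, 6 otherwise)."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def evaluation_value(node, writer, max_distance, w_dist, w_load):
    """Assumed score: weighted normalized distance plus weighted disk
    utilization. Lower is better."""
    distance = network_distance(writer, node.location) / max_distance
    load = node.used_bytes / node.capacity_bytes
    return w_dist * distance + w_load * load


def choose_remote_node(nodes, writer, w_dist=0.5, w_load=0.5):
    """Return the off-rack DataNode with the lowest evaluation value."""
    writer_rack = writer.rsplit("/", 1)[0]
    remote = [n for n in nodes if n.location.rsplit("/", 1)[0] != writer_rack]
    if not remote:
        raise ValueError("no DataNode outside the writer's rack")
    # Off-rack paths always differ from the writer's, so this is > 0.
    max_distance = max(network_distance(writer, n.location) for n in remote)
    return min(remote,
               key=lambda n: evaluation_value(n, writer, max_distance,
                                              w_dist, w_load))


EXAMPLE_NODES = [
    DataNode("n1", "/d1/r1/n1", 10, 100),  # writer's own rack, skipped
    DataNode("n2", "/d1/r2/n2", 90, 100),  # other rack, heavily loaded
    DataNode("n3", "/d1/r3/n3", 30, 100),  # other rack, lightly loaded
    DataNode("n4", "/d2/r4/n4", 5, 100),   # other datacenter, almost empty
]

if __name__ == "__main__":
    print(choose_remote_node(EXAMPLE_NODES, "/d1/r1/n1").name)  # n3
    print(choose_remote_node(EXAMPLE_NODES, "/d1/r1/n1",
                             w_dist=0.1, w_load=0.9).name)      # n4
```

With equal weights the nearby, lightly loaded node `n3` wins. Shifting weight toward load (`w_dist=0.1`, `w_load=0.9`) makes the nearly empty node in the second datacenter win instead, which illustrates the distance/load trade-off such an evaluation value has to balance.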
This article is indexed in CNKI, Wanfang Data, and other databases.