
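Research on a checkpointing strategy for parallel programs based on memory function partitioning
Cite this article: Xue Ruini, Chen Wenguang, Zheng Weimin. Research on a checkpointing strategy for parallel programs based on memory function partitioning [J]. Journal of Huazhong University of Science and Technology (Nature Science), 2005, 33(Z1): 107-110.
Authors: Xue Ruini  Chen Wenguang  Zheng Weimin
Affiliation: Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Funding: National High Technology Research and Development Program of China (2002AA1Z2103)
Abstract: Existing fault-tolerance systems for parallel programs that rely on checkpointing cannot handle communication environment variables transparently: inter-process communication sockets must be closed before a checkpoint is taken and rebuilt after recovery. To address this, a communication isolation strategy based on partitioning memory by function is proposed. It separates the computation module from the communication module and avoids direct operations on communication sockets, achieving transparent fault tolerance. Experimental results show that the strategy improves the performance of the parallel checkpointing system to some degree, reduces implementation complexity, and increases the reliability of rollback recovery; it is also independent of the parallel system and therefore ports well.

Keywords: fault tolerance; checkpointing; rollback recovery; memory exclusion
Article ID: 1671-4512(2005)S1-0107-04
Revised manuscript received: August 24, 2005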

Checkpointing of parallel applications through differential memory functions
Xue Ruini, Chen Wenguang, Zheng Weimin. Checkpointing of parallel applications through differential memory functions [J]. JOURNAL OF HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY. NATURE SCIENCE, 2005, 33(Z1): 107-110.
Authors: Xue Ruini  Chen Wenguang  Zheng Weimin
Institution: Xue Ruini, Chen Wenguang, Zheng Weimin. Doctoral Candidate, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Abstract: As high-performance computing systems continue to grow in size and popularity, fault tolerance and reliability become limiting factors for application scalability and system availability. Current checkpoint/restart fault-tolerance systems for parallel applications cannot handle the communication environment transparently: sockets must be closed before checkpointing and re-established after recovery, which is difficult to implement and error-prone. "Communication exclusion", based on differentiating memory by function, is proposed to separate the communication and computation modules and so avoid operating on sockets directly. Experimental results indicate a slight improvement in checkpointing performance. The strategy helps reduce implementation complexity and improve recovery reliability, and it is easy to port because it is independent of any particular parallel system.
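The separation of computation and communication described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, which works at the level of process memory; it only shows the idea that a checkpoint can capture computation state while the sockets stay open and untouched. All names here (`ComputeState`, `CommModule`, `checkpoint`, `restore`, `demo`) are hypothetical, and `socket.socketpair()` is used as a stand-in for a connection to a peer process.

```python
import json
import socket


class ComputeState:
    """Hypothetical computation module: data that must survive a failure."""

    def __init__(self, iteration=0, partial_sum=0.0):
        self.iteration = iteration
        self.partial_sum = partial_sum

    def step(self, value):
        self.partial_sum += value
        self.iteration += 1


class CommModule:
    """Hypothetical communication module: owns sockets, never checkpointed."""

    def __init__(self):
        # socketpair() stands in for an established link to a peer process.
        self.local, self.peer = socket.socketpair()

    def close(self):
        self.local.close()
        self.peer.close()


def checkpoint(state):
    # Only computation state is serialized; no socket is closed or saved.
    return json.dumps({"iteration": state.iteration,
                       "partial_sum": state.partial_sum})


def restore(blob):
    # Rebuild the computation state from the saved snapshot.
    return ComputeState(**json.loads(blob))


def demo():
    comm = CommModule()
    state = ComputeState()
    for value in (1.0, 2.0, 3.0):
        state.step(value)
    blob = checkpoint(state)       # taken while the sockets remain open
    state.step(100.0)              # work that is discarded by the rollback
    rolled_back = restore(blob)
    comm.local.sendall(b"ping")    # the link is still usable afterwards
    received = comm.peer.recv(4)
    comm.close()
    return rolled_back.iteration, rolled_back.partial_sum, received
```

In this sketch the rollback returns to the state saved after three steps, and the socket pair carries data after the checkpoint without having been closed or rebuilt. A real system would also have to deal with messages in flight at checkpoint time, which this toy example ignores.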
Keywords: fault tolerance; checkpointing; rollback recovery; memory exclusion
This article is indexed in CNKI, Wanfang Data, and other databases.