Found 20 similar documents (search time: 484 ms)
Achieving faster performance without increasing power and energy consumption is an outstanding challenge for computing systems. This paper develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching constraints on peak power and total energy consumption. Our scheme estimates how the number of processor cores and the CPU frequency setting affect application performance. It then uses the estimate to provide additional compute nodes to memory-bound applications when it is profitable to do so. We implement our algorithm, apply it to 12 representative benchmarks from the NAS Parallel Benchmarks and HPC Challenge (HPCC) suites, and evaluate it on a representative HPC cluster. Experimental results show that our approach effectively mitigates memory contention to improve application performance, and does so without significantly increasing peak power or overall energy consumption. On average, our approach obtains a 12.69% performance improvement over the default resource allocation strategy while using 7.06% less total power, which translates into 17.77% energy savings.
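
The link between the performance, power, and energy figures in the abstract above can be illustrated with a back-of-the-envelope sketch. It assumes that energy equals average power multiplied by runtime and that "performance improvement" means speedup; neither assumption is confirmed by the abstract, and the function name `energy_savings` is hypothetical.

```python
def energy_savings(speedup: float, power_reduction: float) -> float:
    """Fraction of energy saved, ASSUMING energy = average power x runtime.

    speedup          -- new performance / baseline performance (1.1269 = 12.69% faster)
    power_reduction  -- fractional drop in average power (0.0706 = 7.06% less)
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    relative_runtime = 1.0 / speedup          # faster run -> shorter runtime
    relative_power = 1.0 - power_reduction    # lower average power draw
    return 1.0 - relative_power * relative_runtime


if __name__ == "__main__":
    # Averages quoted in the abstract, interpreted under the assumptions above.
    print(f"{energy_savings(1.1269, 0.0706):.2%}")
```

Under these assumptions the averages combine to roughly 17.5%, close to but not identical to the reported 17.77%. A gap of this kind is expected if the paper averages per-benchmark energy savings rather than combining the averaged speedup and power figures.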

DRAM-based memory suffers from increasing row buffer conflicts, which cause significant performance degradation and power consumption. As memory capacity grows, the overheads of row buffer conflicts worsen because bitlines become longer, which results in high row activation and precharge latencies. In this work, we propose a practical approach called Row Buffer Cache (RBC) to mitigate row buffer conflict overheads efficiently. At the core of the RBC architecture, rows with good spatial locality are cached and protected, so that they are not interrupted by accesses to rows with poor locality. The RBC architecture significantly reduces the performance and energy overheads caused by row activation and precharge, and thus improves overall system performance and energy efficiency. We evaluate the RBC architecture using SPEC CPU2006 on a DDR4 memory and compare it with a commodity baseline memory system. Results show that RBC improves overall performance by up to 2.24× (16.1% on average) and reduces memory energy by up to 68.2% (23.6% on average) in single-core simulations. In multi-core simulations, RBC improves overall performance by up to 1.55× (17% on average) and reduces memory energy consumption by up to 35.4% (21.3% on average).

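This paper explores the design space of pre-execution mechanisms for in-order processors and quantitatively analyzes how the benefit of pre-execution varies with cache capacity and memory access latency. Experimental results show that, for in-order processors, both saving and reusing the valid results produced during pre-execution and passing data between pre-executed memory access instructions effectively improve processor performance; the former also effectively reduces energy overhead. Combining the two improves the performance of the baseline processor by 24.07% on average while increasing energy consumption by only 4.93%. The study further finds that pre-execution still delivers a substantial performance gain when the cache is large, and that its advantages in both performance and energy efficiency for in-order processors become more pronounced as memory access latency increases.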

The cost of the central register file and the size of the program code limit the scalability of very long instruction word (VLIW) processors as the number of functional units increases. This paper presents the architectural design of a six-way VLIW digital signal processor (DSP) with clustered register files. The architecture uses a variable-length instruction set and supports dynamic instruction dispatching. The one-level memory system of the processor includes 16-KB instruction and data caches and 16-KB instruction and data on-chip RAM. A compiler based on Open64 was developed for the system. Evaluations show that the processor is suitable for high-performance applications, with high code density and small program code size.

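To address the slow processing speed of single-core embedded processors and the limits on raising their clock frequency, a two-core embedded processor (TEP) model is proposed. To handle the processor's runtime dependence on memory and the associated allocation problem, a scheme is proposed that emulates a distributed memory structure on top of a non-uniform memory structure. For accesses by multiple cores to the shared data memory, an arbitration mechanism for slave units is given, which enables access to shared resources. For the large data volumes and high communication overhead between cores in multimedia-oriented multi-core processors, a transmission scheme based on separating messages from data is proposed. The system was implemented and verified on an FPGA platform. Test results show that the TEP system achieves a large speedup with low resource consumption and communication overhead.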

This paper presents the design and implementation of a low-power digital signal processor (THUCID-SP-1) targeting cochlear implant applications. Multi-level low-power strategies, including algorithm optimization, operand isolation, clock gating, and memory partitioning, are adopted in the processor design to reduce power consumption. Experimental results show that the complexity of the Continuous Interleaved Sampling (CIS) algorithm is reduced by more than 80% and the power dissipation of the hardware alo...

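Portable gamma-ray spectrometers are widely used in field mineral prospecting, environmental radiation monitoring, and scientific experiments. To further reduce the size of portable spectrometers and improve their spectral performance, and thereby broaden their range of application, this work studies a new generation of portable gamma-ray spectrometer based on the latest scintillation crystal materials, semiconductor photoelectric conversion devices, and high-performance microprocessors. For the detector, a GAGG:Ce crystal coupled to a SiPM array was used to design and build a compact spectrometer probe with high efficiency and high energy resolution. For the data acquisition circuit, a high-performance ARM processor with its built-in ADC peripheral replaces the traditional FPGA+ADC architecture, and a dedicated signal-processing program was written for the ARM processor; this enables online spectrum measurement and greatly reduces circuit size and system power consumption. In summary, by combining a GAGG:Ce crystal coupled to SiPMs with an ARM processor, this work produced a low-cost, small, low-power, high-performance pocket-sized spectrometer. The whole spectrometer measures only 80 mm × 40 mm × 40 mm and weighs 200 g. In experimental tests, its operating power is 481 mW and it runs for 26 hours on its built-in battery; the goodness of fit of the linear energy response is 0.996, and the energy resolution is 5.2% (at 662 keV).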

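Digital signal processor (DSP) chips are used in handheld devices, where power consumption is a core parameter. Because ROM is highly reliable, the DSP uses it to store the fixed bootloader, the scientific function library, the utility function library, and the main application, so ROM power consumption has a considerable effect on the whole chip. To address the high power consumption caused by frequent ROM accesses, a low-power optimization method is proposed that optimizes the structure of the ROM storage space, reorganizes its addresses, and optimizes the timing structure of data reads, reducing power consumption without affecting DSP performance. The DSP has been taped out and revised, and its overall power consumption was ultimately reduced by about 11.3%.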

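In traditional computer architectures, main memory is built from dynamic random access memory (DRAM), whose refresh power rises sharply as capacity grows. In response, the industry has turned its attention to emerging non-volatile memory (NVM). NVM retains data after power loss and needs no refresh, but it is still at the research stage: per-chip capacity and price cannot yet match DRAM, and large-scale commercial use is still some way off. A hybrid main memory combining DRAM and NVM is therefore regarded as the next generation of main memory. This paper proposes a Significance-Aware Pages Allocation (SA-PA) hybrid main memory design. It allocates critical pages to DRAM and non-critical pages to phase-change memory (PCM), uses a parallel DRAM and PCM structure, and applies the Reset-Speed technique to improve PCM write speed, thereby reducing system power consumption without unduly degrading system performance. Results show that the proposed SA-PA hybrid main memory reduces system power consumption by 25.78% on average, while system performance drops by only 1.34%.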
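
The page-placement idea in the abstract above (critical pages in DRAM, the rest in PCM) can be sketched as a simple ranking step. This is a minimal illustration, not the SA-PA algorithm itself: the `significance` score, the `Page` type, the `place_pages` function, and the fixed DRAM slot count are all hypothetical, since the abstract does not say how criticality is measured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    page_id: int
    significance: float  # hypothetical criticality score; higher = more critical


def place_pages(pages: List[Page], dram_slots: int) -> Dict[int, str]:
    """Assign the most significant pages to DRAM and the remainder to PCM.

    Assumption: DRAM capacity is expressed as a number of page slots.
    """
    ranked = sorted(pages, key=lambda p: p.significance, reverse=True)
    placement: Dict[int, str] = {}
    for rank, page in enumerate(ranked):
        # The first `dram_slots` pages in the ranking go to DRAM.
        placement[page.page_id] = "DRAM" if rank < dram_slots else "PCM"
    return placement


if __name__ == "__main__":
    demo = [Page(0, 0.1), Page(1, 0.9), Page(2, 0.5)]
    print(place_pages(demo, dram_slots=1))
```

A real design would also have to migrate pages as their criticality changes at run time; this sketch only shows a one-shot placement.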

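Mobile terminals such as mobile phones are adopting dual-core processors that contain both an MPU core and a DSP core. Dual-core processors help improve the performance and reduce the power consumption of mobile terminals, but they make software development more complex, because the MPU and the DSP each require their own programs to be developed. To ease the software development demands of dual-core systems, a DSP scripting language was developed whose runtime environment can work together with the MPU. The design, operation, and evaluation of this system are described.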

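Current energy-saving task scheduling methods usually need to know the attributes of the differentiated tasks in an embedded device in advance. In practice, however, task attributes become available only after a task arrives at the processor, which limits the applicability of these methods. A new energy-saving optimized scheduling method for differentiated multi-tasking in embedded devices is therefore proposed. The processor model, task model, and power model of the embedded device are described. By introducing a speed adjustment factor and, based on slack time and combined with power management techniques, lowering the execution speed of differentiated tasks, the method strikes a reasonable trade-off between scheduling and energy saving. The implementation process of the scheduling method is given. Experimental results show that the proposed method saves energy effectively and delivers good scheduling performance.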
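
The slack-based speed adjustment described in the abstract above can be sketched as follows. This is a generic slack-reclaiming illustration under stated assumptions, not the paper's method: the function names, the `min_speed` floor, and the cubic power model (dynamic power proportional to the cube of speed, so energy for a fixed amount of work scales with the square of speed) are all assumptions.

```python
def scaled_speed(remaining_work: float, time_to_deadline: float,
                 min_speed: float = 0.2) -> float:
    """Normalized processor speed in [min_speed, 1.0].

    remaining_work   -- execution time still needed at full speed
    time_to_deadline -- time left before the task's deadline
    With no slack (or a missed deadline) the task runs at full speed;
    otherwise it is stretched to finish just in time.
    """
    if time_to_deadline <= 0 or remaining_work >= time_to_deadline:
        return 1.0
    return max(min_speed, remaining_work / time_to_deadline)


def relative_energy(speed: float) -> float:
    """Energy relative to full speed, ASSUMING power ~ speed**3.

    Runtime scales as 1/speed, so energy ~ speed**3 * (1/speed) = speed**2.
    """
    return speed ** 2


if __name__ == "__main__":
    s = scaled_speed(remaining_work=2.0, time_to_deadline=4.0)
    print(s, relative_energy(s))
```

With two time units of work and four until the deadline, the task can run at half speed; under the assumed power model this costs a quarter of the full-speed energy. Static power and speed-switching overheads, which a real scheduler must account for, are ignored here.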

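Because on-chip caches account for a growing share of processor chip power consumption, a new Way-Decay Cache (WDC) structure is proposed. The structure uses the gated-Gnd technique to dynamically turn some cache ways off or on, so that the cache can switch between a low-power configuration and the normal configuration, thereby reducing static power consumption. Compared with existing low-power cache structures, it requires little additional logic, is simple to implement, and is feasible in hardware. Experimental results show that the structure reduces cache power consumption with very little impact on overall cache performance.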

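To improve the performance of power units, a hybrid power unit based on supercapacitors combined with a generator set was designed and applied in the automatic aiming system of a towed artillery piece. Experiments show that, compared with a conventional power unit, the new supercapacitor-based power unit offers better performance and environmental adaptability, reduces energy consumption by more than 20%, and is better suited to military equipment operating under field combat conditions.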

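An architecture for a heterogeneous multi-core engineering and scientific computation accelerator coprocessor (ESCA) is proposed, which can serve as a coprocessor that accelerates compute-intensive applications. The mapping and implementation of a parallel still-image JPEG compression encoding algorithm were designed for a hybrid computing system based on the ESCA coprocessor, and the performance of the JPEG compression encoding algorithm was evaluated on a quad-core ESCA processor prototype. Experimental results show that the proposed ESCA processor achieves good acceleration for compute-intensive applications.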

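The runtime environments of multi-core programming models tend to intensify contention for processor cores and scale poorly. To address these drawbacks, an adaptive cooperative scheduling model, ACSM, is proposed. It is based on the idea of dynamic feedback control and treats resource allocation, runtime control, and task execution as an organic whole. ACSM uses a cooperative mechanism that combines centralized and distributed control to dynamically adjust the allocation and management of processor cores both among and within application workloads. Its advantages are that it fully preserves the good programmability and portability of multi-core programming models, removes the need to specify the number of cores explicitly as in conventional multi-core runtime environments, and makes core allocation more efficient and adaptive. Experimental results show that ACSM improves the ease of use of multi-core programming models while reducing harmful contention for processor cores and improving overall system performance and resource utilization. Compared with scheduling algorithms that rely solely on the runtime environment of the multi-core programming model, ACSM shortens application running time by nearly 50%, and the effect becomes more pronounced as the number of applications grows.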

New Generation Processor Architecture Research   (Cited by: 1 in total; 0 self-citations; 1 by others)
With the rapid development of microelectronics and hardware, ever faster microprocessors and new architectures must continue to be used to meet tomorrow's computing needs. New processor microarchitectures are needed to push performance further and to use higher transistor counts effectively. At the same time, processors have been optimized in different respects for different uses, such as high performance, low power consumption, small chip area, and high security. SoC (System on Chip) and SCMP (Single Chip Multi Processor) constitute the main processor system architectures.

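As processors for embedded systems, ARM processors offer low voltage, low power consumption, and high integration, and they are open and extensible. The ARM core has become the preferred processor core for embedded systems. USB mobile storage technology combines USB connectivity with Flash memory to form a fast, high-capacity, convenient new kind of data exchange system. The experiment described here is developed for a development board containing an ARM chip: once the device is connected to a host through the USB interface, it can read and write USB mobile storage devices (such as USB flash drives). Completing the experiment helps deepen the experimenter's understanding of ARM-based development, NAND Flash reading and writing, and the USB protocol, and improves hands-on skills.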

Efficiency of Cache Mechanism for Network Processors   (Cited by: 1 in total; 0 self-citations; 1 by others)
With the explosion of network bandwidth and the ever-changing requirements of diverse network-based applications, the traditional processing architectures, i.e., general purpose processors (GPP) and application specific integrated circuits (ASIC), cannot provide sufficient flexibility and high performance at the same time. Thus, the network processor (NP) has emerged as an alternative to meet these dual demands of today's network processing. The NP combines embedded multi-threaded cores with a rich memory hierarchy that can adapt to different networking circumstances when customized by application developers. In today's NP architectures, multithreading prevails over the cache mechanism, which has achieved great success in GPPs at hiding memory access latencies. This paper focuses on the efficiency of the cache mechanism in an NP. Theoretical timing models of packet processing are established for evaluating cache efficiency, and experiments are performed on real-life network backbone traces. Test results show that the cache mechanism can improve throughput by nearly 70%. Accordingly, the cache mechanism remains efficient and irreplaceable in network processing, despite the existence of multithreading.

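Wireless sensor network (WSN) applications often need to query information, and queries are mainly carried out through data aggregation. Excessive data queries inevitably drain energy quickly. This work analyzes query-based data aggregation algorithms in wireless sensor networks and designs and implements a tree-based data aggregation algorithm. It introduces an energy estimation model and a delay jitter model and proposes an improved tree-based aggregation strategy. A WSN is built on the TOSSIM simulation platform, and, by varying parameter values, the data aggregation algorithm is tested on performance metrics including successful response rate, total energy consumption, total number of transmissions, and total delay. Experiments show that the improved data aggregation algorithm performs well in energy consumption and transmission congestion control.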

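Taking into account the mutual feedback between feed grain use and factors such as per capita income, the structure of consumer demand, the urbanization rate, nutrition targets, and technological progress, a forecasting model of China's feed grain consumption demand is built using system dynamics. From a system perspective, the model forecasts the future dynamic behavior of China's feed grain consumption demand and assesses its trend. The forecast indicates that China's feed grain demand will grow fairly quickly, from 113.071 million t in 2011 to 153.107 million t in 2020, an average annual growth rate of about 3.46%. This indicates that, with technological progress, rising incomes, and growing demand for livestock products, feed grain will become an important factor affecting China's future grain consumption demand.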
