Similar Documents
20 similar documents found (search time: 31 ms)
1.
Deep learning algorithms are the basis of many artificial intelligence applications. These algorithms are both computationally and memory intensive, making them difficult to deploy on embedded systems. Various deep learning accelerators (DLAs) have therefore been proposed and applied to achieve better performance and lower power consumption. However, most deep learning accelerators cannot support multiple data formats. This research proposes MW-DLA, a deep learning accelerator supporting dynamically configurable data widths. The work analyzes the data distribution of different data types in different layers and trains a typical network with per-layer representations. As a result, the proposed MW-DLA achieves 2X performance and reduces memory requirements by more than 50% for AlexNet, with less than 5.77% area overhead.
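The per-layer data-width idea in item 1 can be illustrated with a minimal numpy sketch: measure each layer's weight-magnitude distribution and pick the narrowest fixed-point word that covers almost all of it. The layer names, the 99.9% coverage threshold, and the 8-bit fractional split below are illustrative assumptions, not MW-DLA's actual scheme.

```python
import numpy as np

def minimal_bitwidth(weights, frac_bits=8, coverage=0.999):
    """Smallest signed fixed-point word (sign + integer + fractional bits)
    whose range covers `coverage` of a layer's weight magnitudes."""
    mag = np.quantile(np.abs(weights), coverage)  # ignore rare outliers
    int_bits = max(0, int(np.ceil(np.log2(max(mag, 2.0 ** -frac_bits)))))
    return 1 + int_bits + frac_bits

# Hypothetical per-layer weight distributions: later layers often have a
# smaller dynamic range, so they fit in narrower words.
rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(0, 0.5, 10_000), "fc8": rng.normal(0, 0.05, 10_000)}
widths = {name: minimal_bitwidth(w) for name, w in layers.items()}
```

A per-layer table like `widths` is what a configurable-width accelerator would consume: wide words only where the distribution demands them, narrower words (and less memory traffic) everywhere else.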

2.
Driven by the availability of big data, deep neural network techniques are widely applied in image classification, object detection, speech recognition, natural language processing, and other fields. As the performance of deep neural network models improves, their size and computational demands grow, making them dependent on high-power computing platforms. To address the limited storage resources, memory access bandwidth, and computational resources of real-time embedded systems, this work studies model compression techniques for embedded deep neural network applications, aiming to reduce model size and storage requirements and to optimize model computation. It categorizes and surveys model compression techniques, including model pruning, compact model design, tensor decomposition, approximate computation, and model quantization, and summarizes their state of development, providing a reference for research on deep neural network model compression.

3.
For six signal classes affected by Rician fading (4QAM, 16QAM, 32QAM, 64QAM, 128QAM, and 256QAM), this paper studies the modulation recognition performance of a convolutional neural network (CNN) model and of a model combining feature parameters with a deep neural network (DNN) classifier. The CNN model requires a large labeled dataset and a long training time to achieve good recognition performance, while the feature-parameter/DNN model trains quickly but its classification performance is limited by the design of the feature parameters. To address these problems, the paper studies a method that uses mixed high-order moments as the feature parameter set and a DNN as the classifier to recognize multi-level quadrature amplitude modulation (MQAM) signals. Simulation results show that at low signal-to-noise ratios this method recognizes Rician-faded MQAM signals more accurately than the CNN model, and that its upper bound on classification accuracy is clearly higher than that of methods using high-order cumulants as feature parameters.
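The mixed high-order moments used as features in item 3 can be sketched in a few lines: a moment M_{p,q} of a complex baseband signal is E[x^(p-q) conj(x)^q], normalized by signal power. The particular (p, q) pairs and the 4QAM toy input below are illustrative assumptions; the paper's exact feature set is not given in the abstract.

```python
import numpy as np

def mixed_moment(x, p, q):
    """Mixed high-order moment M_{p,q} = E[x^(p-q) * conj(x)^q],
    normalized by signal power to the appropriate order."""
    power = np.mean(np.abs(x) ** 2)
    return np.mean(x ** (p - q) * np.conj(x) ** q) / power ** (p / 2)

def moment_features(x):
    """A hypothetical feature vector for an MQAM/DNN classifier:
    magnitudes of a few moments that differ across constellation orders."""
    return np.array([abs(mixed_moment(x, p, q)) for p, q in [(4, 0), (4, 2), (6, 3)]])

# Toy noiseless 4QAM symbols at unit power, just to exercise the functions.
rng = np.random.default_rng(1)
syms = (rng.choice([-1, 1], 2000) + 1j * rng.choice([-1, 1], 2000)) / np.sqrt(2)
feats = moment_features(syms)
```

For clean 4QAM every symbol satisfies x**4 == -1 and |x| == 1, so all three feature magnitudes are 1; higher-order constellations yield different values, which is what gives the DNN classifier its separability.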

4.
Optimization strategies for deep-neural-network acoustic modeling in Mandarin speech recognition
This paper introduces deep neural networks as acoustic models into a Mandarin conversational telephone speech recognition system. To address the high character error rate on spontaneous speech, a series of optimizations of DNN-based acoustic modeling is carried out, covering the choice of acoustic feature type, the tuning of meta-parameters during model training, and the improvement of model generalization. For the sparse state prior probability distribution in the training samples, a state-prior smoothing algorithm is proposed that alleviates this data sparsity to some extent; after smoothing, the character error rate drops by more than 1%. On the three conversational telephone test sets used, the optimized model consistently outperforms the unoptimized DNN model, with an average relative character error rate reduction of 15%. The experimental results show that the adopted optimization strategies effectively improve the performance of DNN acoustic models.
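In DNN-HMM decoding, posteriors are divided by state priors, so a state with a near-zero empirical prior blows up the scaled likelihood. The abstract does not give the paper's smoothing formula; the uniform-interpolation scheme below is a hypothetical stand-in that shows the kind of fix involved (the weight `alpha` is an assumed parameter).

```python
import numpy as np

def smooth_priors(counts, alpha=0.9):
    """Smooth sparse HMM-state prior estimates by interpolating the
    empirical distribution with a uniform one. Illustrative only; not
    necessarily the paper's exact smoothing algorithm."""
    empirical = counts / counts.sum()
    uniform = np.full_like(empirical, 1.0 / len(counts))
    return alpha * empirical + (1 - alpha) * uniform

# A state unseen in training (index 2) gets a nonzero prior after
# smoothing, so the posterior-to-prior division no longer diverges.
counts = np.array([5000.0, 120.0, 0.0, 3.0])
priors = smooth_priors(counts)
```

The smoothed vector is still a proper distribution, and every state's prior is floored at `(1 - alpha) / n_states`, which is exactly what keeps the scaled-likelihood computation stable.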

5.
With the rapid development of Internet technology, mining and analyzing massive amounts of online information has become a hot and difficult problem. Recommender systems help users cope with information overload when they lack a clear need or face an overwhelming amount of information, delivering accurate and timely business information (goods, items, services, etc.), and have become a shared focus of industry and academia in recent years. However, with today's diverse data types and broad application scenarios, recommender systems also face challenges such as cold start and sparse matrices. Deep learning, an important branch of machine learning, has developed rapidly in recent years; researchers have achieved major breakthroughs with deep learning methods in speech recognition, image processing, and natural language processing, and deep learning has now attracted many researchers in the recommendation field as well, becoming a new direction there. Integrating deep learning into recommendation methods can effectively address the cold-start and sparse-matrix problems of traditional recommender systems and improve their performance and accuracy.

This paper surveys traditional recommendation methods and the application of neural networks from current deep learning techniques to recommendation. Traditional recommendation methods fall into three classes. 1) Content-based recommendation relies mainly on feature information of users and items; connections between users do not affect the results, so cold start and sparse matrices are not a problem, but the recommendations lack novelty and feature extraction is difficult. 2) Collaborative filtering, currently the most widely used method, needs no information about users or items and makes accurate recommendations based only on interaction data such as clicks, browsing, and ratings; it is simple and effective but suffers from sparse matrices and cold start. 3) Hybrid recommendation combines the strengths of the two preceding methods and can achieve good results, but still faces challenges in handling multi-source heterogeneous auxiliary information such as text and images.

Deep-learning-based recommendation methods fall into four classes by network type: methods based on deep neural networks (DNN), on convolutional neural networks (CNN), on recurrent neural networks (RNN) and long short-term memory networks (LSTM), and on graph neural networks (GNN). Models that integrate deep learning into recommendation have the following advantages: strong representation ability, extracting user and item features directly from content; strong noise robustness, easily handling noisy data; the ability to model dynamic or sequential data; more accurate learning of user or item features; and uniform processing of data, including at large scale. Applying deep learning to recommendation can effectively address the challenges faced by traditional methods and improve recommendation quality.

6.
Aiming at reliability analysis with small-sample data or implicit structural functions, a novel structural reliability analysis model based on a support vector machine (SVM) and a deep-neural-network direct integration method (DNN) is proposed. First, an SVM, with its good small-sample learning ability, is trained on the small-sample data to fit the structural performance function and establish a regular integration region. Second, a DNN approximates the integrand to perform multiple integration over that region. Finally, the structural reliability is obtained from the DNN. Numerical examples demonstrate the effectiveness of the method, which provides a feasible way to perform structural reliability analysis.

7.
To address the slow processing speed and limited clock-frequency scaling of embedded single-core processors, a two-core embedded processor (TEP) model is proposed. For the processor's runtime dependence on memory and its allocation problem, a scheme is proposed that emulates a distributed memory architecture on a non-uniform memory structure. For concurrent accesses by multiple cores to shared data memory, an arbitration mechanism in the slave unit enables access to shared resources. For the large data volumes and high communication overhead between multi-core processors in multimedia applications, a transfer scheme based on separating messages from data is proposed. The system was implemented and verified on an FPGA platform; test results show that the TEP system achieves a large speedup with modest resource consumption and communication overhead.

8.
To address the large bias in detection results caused by imbalanced network-attack samples in intrusion detection systems, this paper proposes an intrusion detection model (DCGAN-DNN) that combines an improved deep convolutional generative adversarial network (DCGAN) with a deep neural network (DNN). The DCGAN generates new attack samples by learning the underlying feature distribution of known attack-sample data. The rectified linear unit (ReLU) activation function used in the DCGAN generator is improved to mitigate mean shift and dying neurons and to improve training stability. The model is evaluated on the CIC-IDS-2017 dataset; compared with traditional oversampling methods, the DCGAN-DNN intrusion detection model achieves higher detection rates for unknown attacks and minority attack classes.
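The abstract says the generator's ReLU was modified to curb mean shift and dying neurons but does not give the exact activation. A leaky variant is one common fix of that kind, sketched below purely as an illustration (the slope value is an assumption, not the paper's formula).

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    """A leaky ReLU variant: negative inputs keep a small gradient path,
    so units cannot 'die', and the output mean shifts less than with
    plain ReLU. Illustrative only; not the paper's exact activation."""
    return np.where(x > 0, x, slope * x)

x = np.linspace(-3, 3, 7)          # symmetric, zero-mean inputs
plain = np.maximum(x, 0)           # standard ReLU zeroes the negative half
leaky = leaky_relu(x)              # leaky variant scales it instead
```

On zero-mean input, plain ReLU's output mean is pushed positive (mean shift), while the leaky variant's output mean stays closer to zero and its gradient is nonzero everywhere.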

9.
In data-driven gear fault diagnosis, features are usually extracted with methods such as the Fourier transform, and the choice of feature extraction method strongly affects the diagnosis results. This paper proposes using a deep neural network to diagnose early gear pitting faults, feeding the raw vibration signal directly into the network and thereby avoiding errors introduced in a separate feature extraction stage. In addition, particle swarm optimization is applied to the deep neural network to make training more stable and improve the diagnosis rate. Principal component analysis is used to reduce the dimensionality of the network output when analyzing the results. Trained and tested on experimentally collected data, the network achieves a diagnosis accuracy above 90%, showing that the proposed method is sound and usable.

10.
Spatial information representation is an important means of strengthening image feature expressiveness; fusing spatial-relation modeling with deep learning can effectively enrich the semantics of deep features and thus improve image retrieval performance. First, a new fine-grained topological representation model is proposed for the spatial relations in complex images; it offers complete topological description capability and provides two inference algorithms for topological invariants, so that the invariants can be derived directly from the representation model without tedious geometric computation. Second, based on this model, an effective similarity measure for topological structures is proposed, laying the foundation for spatial-relation feature representation. Finally, combined with convolutional neural networks, a multi-object image retrieval method fusing complex spatial-relation features with deep features is proposed. Experimental results show that the proposed topological representation model performs well in spatial queries, and that the proposed retrieval framework achieves higher accuracy than existing methods while effectively combining the respective strengths of handcrafted and deep features, creating favorable conditions for improving the interpretability of deep learning methods.

11.
The devastating effects of wildland fire remain an unsolved problem, causing human losses and destroying natural and economic resources. Convolutional neural networks (CNNs) perform very well in object classification and can carry out feature extraction and classification within the same architecture. In this paper, we propose a CNN for identifying fire in videos: a deep-domain-based video fire detection method that extracts a powerful feature representation of fire. Tested on real video sequences, the proposed approach achieves better classification performance than several relevant conventional video-based fire detection methods, indicating that using a CNN to detect fire in videos is effective. To balance efficiency and accuracy, the model is fine-tuned for the nature of the target problem and the fire data. Experimental results on benchmark fire datasets reveal the effectiveness of the proposed framework and validate its suitability for fire detection in closed-circuit television surveillance systems compared with state-of-the-art methods.

12.
Deep learning accelerators (DLAs) have proved to be efficient computational devices for processing deep learning algorithms, and various DLA architectures have been proposed and applied to different applications and tasks. However, for most DLAs, the programming interface is either difficult to use or not efficient enough. Most DLAs require programmers to write instructions directly, which is time-consuming and error-prone. The other prevailing programming interface for DLAs consists of high-performance libraries and deep learning frameworks, which are easy to use and user-friendly, but whose high abstraction level limits control over hardware resources and thus compromises accelerator efficiency. This paper presents a design of a programming interface for DLAs. Various existing DLAs and their programming methods are first analyzed, and a methodology for designing DLA programming interfaces is then proposed: a high-level assembly language (called DLA-AL), together with an assembler and a runtime for DLAs. DLA-AL is composed of a low-level assembly language and a set of high-level blocks. It allows experienced experts to fully exploit the potential of DLAs and achieve near-optimal performance, while end users with little knowledge of the hardware can use DLA-AL to develop deep learning algorithms on DLAs with minimal programming effort.

13.
As performance requirements for bus-based embedded Systems-on-Chip (SoCs) increase, more and more on-chip application-specific hardware accelerators (e.g., filters, FFTs, JPEG encoders, GSMs, and AES encoders) are being integrated into their designs. These accelerators require system-level tradeoffs among performance, area, and scalability. Accelerator parallelization and point-to-point (P2P) interconnect insertion are two effective system-level adjustments: the former boosts computing performance at the cost of area, while the latter provides higher bandwidth at the cost of routability, and the two interact with each other. This paper proposes a design flow that optimizes accelerator parallelization and P2P interconnect insertion simultaneously. To explore the huge optimization space, we develop an effective algorithm whose goal is to reduce total SoC latency under constraints on SoC area and total P2P wire length. Experimental results show that the performance difference between our algorithm and the optimal results is only 2.33% on average, while the algorithm runs in less than 17 s.

14.
Convolutional neural networks (CNNs) are widely used in image processing and usually run on CPU and GPU platforms; however, in the CNN inference stage CPUs are slow and GPUs consume much power. Since field-programmable gate arrays (FPGAs) can balance computing speed and power consumption, this paper designs an FPGA-based parallel acceleration architecture for CNNs, addressing current issues in convolution structure design, pipelining, and memory optimization. First, image and weight data are quantized to 16-bit fixed-point numbers, which reduces the complexity of the multiply-accumulate operations to some extent. Then, exploiting the parallelism of convolution, a highly parallel pipelined convolution circuit is designed to improve convolution performance, and the pipelined memory structure that exchanges data with off-chip memory is optimized to reduce data-transfer time. Experimental results show that the overall accelerator achieves a recognition rate of 94.6% on the ImageNet dataset and has a computing-performance advantage over results reported in related work in recent years.
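The 16-bit fixed-point quantization step in item 14 can be sketched as a scale, round, and clip to `int16`, with dequantization for checking the error. The 12-bit fractional split below is an assumed choice, not necessarily the one used in the paper.

```python
import numpy as np

FRAC_BITS = 12  # assumed split: 1 sign bit, 3 integer bits, 12 fractional bits

def to_fixed16(x, frac_bits=FRAC_BITS):
    """Quantize floating-point weights/activations to 16-bit fixed point."""
    scale = 1 << frac_bits
    return np.clip(np.round(x * scale), -2**15, 2**15 - 1).astype(np.int16)

def from_fixed16(q, frac_bits=FRAC_BITS):
    """Dequantize back to float for error measurement."""
    return q.astype(np.float32) / (1 << frac_bits)

w = np.array([0.7071, -0.25, 1.5, -2.0], dtype=np.float32)
q = to_fixed16(w)
err = np.abs(from_fixed16(q) - w)
# With 12 fractional bits the worst-case rounding error is 2**-13 ~ 1.2e-4.
```

On hardware, the MAC units then operate on `int16` values directly, which is where the area and speed advantage over floating point comes from; the error bound above is what makes the accuracy loss tolerable.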

15.
Convolutional neural networks (CNNs) are widely used in computer vision, natural language processing, and other areas, and real applications generally require low power and high efficiency; energy efficiency has thus become a critical indicator for CNN accelerators. Since asynchronous circuits offer low power consumption, high speed, and freedom from clock-distribution problems, we design and implement an energy-efficient asynchronous CNN accelerator in a 65 nm complementary metal-oxide-semiconductor (CMOS) process. Given the absence of a commercial design-tool flow for asynchronous circuits, we develop a novel design flow that efficiently implements Click-based asynchronous bundled-data circuits down to mask layout with conventional electronic design automation (EDA) tools. We also introduce an adaptive delay-matching method and perform accurate static timing analysis on the circuits to ensure correct timing. An accelerator for a handwriting recognition network (the LeNet-5 model) is implemented. Silicon test results show that the asynchronous accelerator consumes 30% less power in its computing array than the synchronous one and achieves an energy efficiency of 1.538 TOPS/W, 12% higher than that of the synchronous chip.

16.
To improve the quality of emotional speech synthesis, this paper proposes a deep-neural-network-based emotional speech synthesis method that uses emotional training corpora from multiple speakers together with speaker adaptation. The method obtains context-dependent labels for the speech through text analysis and extracts the acoustic features of the emotional speech with the WORLD vocoder. A speaker-independent average-voice DNN model is trained from the context-dependent labels and acoustic features; a speaker-dependent DNN model for the target emotion is then obtained from the target speaker's emotional training speech via speaker-adaptive transformation, and this model is used to synthesize the target emotional speech. Subjective evaluations show that the emotional speech synthesized by this method scores higher than that of the traditional hidden-Markov-model-based method, and objective experiments show that its spectrum is closer to the original speech. The method therefore improves the naturalness and emotional expressiveness of synthesized emotional speech.

17.
Traditional human action recognition based on hand-designed features involves many stages, incurs large time overhead, and is hard to tune as a whole. Taking depth video as the research object, this paper builds a 3D convolutional deep neural network that automatically learns the spatiotemporal features of human actions and uses a Softmax classifier for action recognition. Experimental results show that the proposed method effectively extracts latent action features, matching the best current methods on the MSR-Action3D dataset and performing on par with the baseline on the UTKinect-Action3D dataset. The advantage of the method is that no hand-crafted features are needed: feature extraction and classification form a complete end-to-end system, making the method simpler. The study also shows that the deep convolutional neural network model generalizes well: a model trained on the MSR-Action3D dataset and applied directly to action classification on the UTKinect-Action3D dataset likewise achieves good recognition results.

18.
Concurrent cooperative processing of distributed real-time multiple data streams
Based on the client/server model, this paper analyzes transaction issues in the concurrent processing of multiple data streams in distributed real-time computing environments. Using complex-event-driven techniques, a transaction mechanism suited to concurrent cooperative processing of multiple data streams, ARTs-MDS, is designed; it can self-organize the concurrent tasks that act on distributed real-time data into an atomic transaction unit. The system maintains a persistent request/response queue that separates the physical operations of devices from real-time transactions; by guaranteeing the transactional properties of concurrent cooperative tasks, it achieves atomicity for the overall effect of micro-level operations produced cooperatively from locally collected distributed data. Analysis of test results shows that the system improves real-time responsiveness in the cooperative processing of continuously collected external data and reduces data loss.

19.
Accurate bus arrival time prediction is important, but real bus operation is affected by sudden road conditions, so running speeds are non-stationary. Combining time-series processing techniques with deep learning, this paper builds a complementary ensemble empirical mode decomposition (CEEMD) plus long short-term memory (LSTM) model that predicts bus arrival times from AVL data. The model collects automatic vehicle location data, applies CEEMD after preprocessing to stationarize the bus running speeds, and then uses an LSTM network with Adam-tuned parameters to predict morning-peak arrival times for Fuzhou bus route 303 on a given day. The results show that the optimized model's mean absolute error is 1.69 min lower than that of a single model, and its prediction accuracy exceeds that of the plain LSTM model and of an empirical-mode-decomposition-based arrival time prediction model, providing an effective reference for predicting arrival times on bus routes equipped with onboard automatic vehicle location systems.

20.
Duan Bo, Wang Wendi, Tan Guangming, Meng Dan. High Technology Letters (English edition), 2014, 20(4): 333-345
The wide acceptance of and data deluge in medical imaging processing require faster and more efficient systems. Owing to recent advances in heterogeneous architectures, there has been a resurgence of research into FPGA-based as well as GPGPU-based accelerator design. This paper quantitatively analyzes the workload, computational intensity, and memory performance of a single-particle 3D reconstruction application called EMAN, parallelizes it on CUDA GPGPU architectures, decouples the memory operations from the computing flow, and orchestrates the thread-data mapping to reduce the overhead of off-chip memory operations. It then exploits the trend toward FPGA-based accelerator design by offloading computing-intensive kernels to dedicated hardware modules, and a customized memory subsystem is also designed to facilitate the decoupling and optimization of compute-dominated data access patterns. The proposed accelerator design strategies are evaluated against a parallelized program on a 4-core CPU. The CUDA version on a GTX480 shows a speedup of about 6x, and the stream architecture implemented on a Xilinx Virtex LX330 FPGA achieves a reported speedup of 2.54x. Meanwhile, measured in terms of power efficiency, the FPGA-based accelerator outperforms the 4-core CPU and the GTX480 by 7.3x and 3.4x, respectively.

