Received: 2021-03-26

Field programmable gate array implementation of a convolutional neural network based on a pipeline architecture
CUI JiangWei, ZHOU YongSheng, ZHANG Fan, YIN Qiang, XIANG DeLiang. Field programmable gate array implementation of a convolutional neural network based on a pipeline architecture[J]. Journal of Beijing University of Chemical Technology, 2021, 48(5): 111-118.
Authors:CUI JiangWei  ZHOU YongSheng  ZHANG Fan  YIN Qiang  XIANG DeLiang
Institution:College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Abstract:Convolutional neural networks (CNNs) are widely used in image processing and are usually run on CPU and GPU platforms. However, in the CNN inference stage, CPUs compute slowly and GPUs consume considerable power. A field programmable gate array (FPGA) offers a balance between computation speed and power consumption. To address current problems in convolution structure design, pipeline design, and storage optimization, a parallel FPGA-based acceleration architecture for convolutional neural networks is designed in this work. First, the image data and weight data are quantized to 16-bit fixed-point numbers, which reduces the complexity of the multiply-accumulate operations to a certain extent. Then, exploiting the parallelism inherent in convolution, a highly parallel pipelined convolution circuit is designed, improving computational performance; at the same time, the pipelined storage structure that exchanges data with off-chip memory is optimized to reduce the time spent on data transfers. Experimental results show that the recognition rate of the overall accelerator on the ImageNet dataset reaches 94.6%, and compared with recently reported results in related work, the design offers certain advantages in computational performance.
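As a rough illustration of the 16-bit fixed-point quantization and multiply-accumulate scheme described in the abstract, the sketch below shows the arithmetic in Python. The Q8.8 integer/fraction split, the rounding mode, and the function names are assumptions for illustration; the paper does not specify these details.

```python
def to_fixed16(x: float, frac_bits: int = 8) -> int:
    """Quantize a float to a 16-bit fixed-point integer (assumed Q8.8 split)."""
    q = round(x * (1 << frac_bits))
    return max(-32768, min(32767, q))  # saturate to the int16 range

def from_fixed16(q: int, frac_bits: int = 8) -> float:
    """Map a 16-bit fixed-point value back to an approximate float."""
    return q / (1 << frac_bits)

def mac_fixed16(acc: int, a: int, b: int, frac_bits: int = 8) -> int:
    """Multiply-accumulate on fixed-point operands: the wide integer product
    is shifted right to restore the Q8.8 scale, so the whole operation uses
    only integer arithmetic, which maps well to FPGA DSP slices."""
    return acc + ((a * b) >> frac_bits)

# Example: accumulate 0.5 * 0.25 twice -> 0.25
a, b = to_fixed16(0.5), to_fixed16(0.25)   # 128 and 64 in Q8.8
acc = mac_fixed16(0, a, b)
acc = mac_fixed16(acc, a, b)
print(from_fixed16(acc))  # 0.25
```

Quantization error with this split is bounded by half of the least significant fraction bit (1/512 for Q8.8), which is the sense in which 16-bit fixed point "reduces complexity to a certain extent" relative to floating point while preserving accuracy.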
Keywords:convolutional neural network  field programmable gate array (FPGA)  hardware accelerator  pipeline  parallel structure
Indexed by CNKI and Wanfang Data, among other databases.