Design of a GPU-based parallel quasi-Newton neural network training algorithm
Cite this article: LIU Qiang, LI Jiajun. Design of a GPU-based parallel quasi-Newton neural network training algorithm[J]. Journal of Hohai University (Natural Sciences), 2018, 46(5): 458-463.
Authors: LIU Qiang, LI Jiajun
Institution: School of Microelectronics, Tianjin University; Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology
Foundation: National Natural Science Foundation of China (61574099)
Abstract: To address the need for substantial computing power and efficient optimal-solution search methods in artificial neural network training, a GPU-based parallel implementation of the BFGS quasi-Newton neural network training algorithm is proposed. The implementation divides the BFGS algorithm into functional modules and applies a mixed data-parallel scheme tailored to each module's characteristics, fully utilizing the GPU's processing and memory resources to achieve good acceleration. Experimental results show that, for complex neural network structures, the GPU-based parallel neural network training is up to 80 times faster than the CPU-based implementation; in a microwave-device modeling test, the GPU-based parallel neural network is 430 times faster than the Neuro Modeler software, with a training error of about 1%.

Keywords: neural network; GPU; parallel computing; quasi-Newton method; OpenCL; accelerating algorithm

Parallel BFGS quasi-Newton algorithm of neural network training based on GPU
LIU Qiang, LI Jiajun. Parallel BFGS quasi-Newton algorithm of neural network training based on GPU[J]. Journal of Hohai University (Natural Sciences), 2018, 46(5): 458-463.
Authors:LIU Qiang and LI Jiajun
Institution: School of Microelectronics, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin 300072, China
Abstract: A key challenge in applying neural networks is training, which is essentially an iterative optimization process over large amounts of data. The training process requires significant computing power and efficient search methods for optimal solutions. To meet these requirements, a parallel BFGS quasi-Newton training algorithm based on GPU is proposed in this paper. To maximize parallelism, the BFGS quasi-Newton algorithm is divided into different functional modules, and each module is designed with a specific parallel scheme suited to its characteristics. In addition, the processing and memory resources of the GPU are fully utilized to achieve better parallelization. Experimental results show that the GPU implementation accelerates neural network training by up to 80 times compared to the CPU implementation for a complicated neural network structure, while the speed-up ratio reaches 430 in the modeling test of a microwave device, where the training error is about 1%.
Keywords: neural network; GPU; parallel computing; quasi-Newton method; OpenCL; accelerating algorithm
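For context, the inverse-Hessian update at the heart of the BFGS quasi-Newton method described above can be sketched as follows. This is a minimal serial NumPy sketch for illustration only: the function name, line-search constants, and the quadratic test problem are assumptions introduced here, and the paper's GPU/OpenCL modular parallel implementation is not reproduced.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, iters=100, tol=1e-8):
    """Plain serial BFGS: maintain an inverse-Hessian approximation H
    and refine it from gradient differences at each step."""
    n = x0.size
    H = np.eye(n)                      # inverse-Hessian approximation
    x = x0.astype(float)
    g = grad(x)
    for _ in range(iters):
        p = -H @ g                     # quasi-Newton search direction
        # Simple backtracking line search (Armijo condition)
        alpha = 1.0
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                  # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                  # gradient change
        sy = s @ y
        if sy > 1e-12:                 # curvature condition: keep H positive definite
            rho = 1.0 / sy
            I = np.eye(n)
            # BFGS inverse-Hessian update:
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x

# Usage: minimize the convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose exact minimizer is the solution of A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
xmin = bfgs_minimize(lambda x: 0.5 * x @ A @ x - b @ x,
                     lambda x: A @ x - b,
                     np.zeros(2))
```

In the GPU setting described by the paper, the dense matrix-vector and outer-product operations in this update are the natural targets for data parallelism, which is why partitioning the algorithm into per-module parallel kernels pays off.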
This document is indexed in CNKI and other databases.