Parallelization and Performance Optimization on Face Detection Algorithm with OpenCL: A Case Study
Authors: Weiyan Wang, Yunquan Zhang, Shengen Yan, Ying Zhang, Haipeng Jia
Institution: Laboratory of Parallel Software and Computational Science; Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100190, China; Ocean University of China, Qingdao 26610, China
Abstract: Face detection is inherently a real-time application. Although the Viola-Jones algorithm handles it elegantly, today's ever larger high-quality images and videos pose a new challenge to meeting real-time requirements. Parallelizing the Viola-Jones algorithm with OpenCL is a good way to achieve high performance on both AMD and NVIDIA GPU platforms without introducing new algorithms. This paper identifies the bottleneck of the application and discusses how to optimize face detection step by step, starting from a very naive implementation. Techniques such as hiding CPU execution time, subtle use of local memory as a high-speed scratchpad and manual cache, and variable granularity were used to improve performance. These techniques yield a 4-13 times speedup, varying with image size. Furthermore, these ideas may shed some light on how to parallelize applications efficiently with OpenCL. Taking face detection as an example, this paper also summarizes some general advice on optimizing OpenCL programs, to help other applications perform better on GPUs.
Keywords: Viola-Jones; OpenCL; time cost hidden; local memory usage; parallel granularity
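
The abstract does not describe the kernels themselves. As a rough illustration of why Viola-Jones lends itself to data-parallel execution, the sketch below shows the integral image (summed-area table) that the algorithm is known to rely on, together with a constant-time rectangle sum and a two-rectangle Haar-like feature. It is plain Python rather than OpenCL, and the function names `integral_image`, `rect_sum`, and `two_rect_feature` are hypothetical; nothing here is taken from the paper's implementation.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a window; w is assumed to be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Each feature needs only a handful of table lookups and no data from neighboring windows, so every detection window can be evaluated independently. How many windows or features to assign to one work-item is the kind of granularity choice the abstract refers to.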
This article is indexed in CNKI and other databases.