Data-Parallel Kernels: Speeding CPU–GPU Performance

Goyat Suman1,* and Soni A.K.2,**

1Research Scholar, Department of Computer Science & Engg., Sharda University, Greater Noida, U.P., India
2Professor, Department of Computer Science & Engg., Sharda University, Greater Noida, U.P., India
*Corresponding author: goyat1606026@gmail.com
**ak.soni@sharda.ac.in
Abstract

It is a well-known fact by now that the central processing unit (CPU) handles non-data-parallel work, whereas the graphics processing unit (GPU) handles data-parallel work because of its very large number of cores. A single CPU–GPU combination is easily handled by a single kernel. In this paper, we move to single-kernel multiple devices (SKMD), which takes care of collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. The programmer develops a single data-parallel kernel in OpenCL, while the system automatically partitions the workload across an arbitrary set of devices, generates kernels to execute the partial workloads, and efficiently merges the partial outputs together. The goal of SKMD is to improve performance by maximally utilising all available resources to execute the kernel. On real hardware, SKMD achieves an average speedup of 35% over a fastest-device execution strategy on a system with one multi-core CPU and two asymmetric GPUs, for a set of popular OpenCL kernels.

Keywords: SKMD (single-kernel multiple devices), OpenCL, data-parallel kernel, GPGPU, historical database, scheduling decisions, Service-Oriented Architecture (SOA), Heterogeneous System Architecture (HSA)
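The partition-and-merge flow the abstract describes can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the device names, the relative-throughput figures, and the `partition`/`run_kernel` helpers are assumptions introduced here. It shows the core idea of splitting one data-parallel work range across asymmetric devices in proportion to their speed and concatenating the disjoint partial outputs.

```python
# Illustrative sketch (not SKMD's actual code): split a 1-D data-parallel
# workload across asymmetric devices proportionally to their relative
# throughput, execute each chunk, and merge the partial results.

def partition(total_items, throughputs):
    """Return {device: (offset, count)}, sized proportionally to throughput."""
    total_tp = sum(throughputs.values())
    parts, offset = {}, 0
    devices = list(throughputs)
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:          # last device takes the remainder
            count = total_items - offset
        else:
            count = int(total_items * throughputs[dev] / total_tp)
        parts[dev] = (offset, count)
        offset += count
    return parts

def run_kernel(data, offset, count):
    """Stand-in for a device executing the kernel on its sub-range."""
    return [x * x for x in data[offset:offset + count]]

data = list(range(10))
# Hypothetical relative throughputs: both GPUs outpace the CPU.
parts = partition(len(data), {"cpu": 1.0, "gpu0": 4.0, "gpu1": 5.0})

# Each device computes its partial output; merging here is a simple
# concatenation because the sub-ranges are disjoint and ordered.
merged = []
for dev, (off, cnt) in parts.items():
    merged.extend(run_kernel(data, off, cnt))

assert merged == [x * x for x in data]
```

In a real OpenCL host program, the per-device offsets would instead be passed as the `global_work_offset` and `global_work_size` arguments of `clEnqueueNDRangeKernel`, one enqueue per device; the merge step is only this simple when output sub-ranges do not overlap.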