Starting from the fundamentals of parallel programming in CUDA C/C++, learn how to maximize your development productivity. We present a design cycle we call APOD: Assess, Parallelize, Optimize, and Deploy, which helps application developers rapidly identify the portions of their code that would most readily benefit from GPU acceleration, quickly realize that benefit, and begin leveraging the resulting speedups in production as early as possible.
When integrating CUDA C++ kernels into existing C++ applications, it is at times desirable to migrate a C++ object instance from the host to the device or vice versa. Because host compilers vary in how they lay out structures, accomplishing this data marshalling reliably, simply, and efficiently is a complex issue. cudaMemcpy is our primary means of transferring data to the GPU, but memcpy-style operations are more readily amenable to C-style structures and arrays than to C++ objects or collections of objects. In this session, we will cover the caveats and best practices for marshalling C++ data.
OpenCL is Khronos' new open standard for parallel programming of heterogeneous systems. This tutorial session will introduce the main concepts behind the standard and illustrate them with simple code walkthroughs. Attendees will also learn how to make efficient use of the API to achieve good performance on the GPU.