This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. Attend Session 1, "An Introduction to GPU Programming," to learn the basics of CUDA programming that are required for Session 2. We'll begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We'll define shared, constant, and global memory, and discuss the best locations to store your application data for optimized performance. We'll deliver a programming demonstration of shared and constant memory. We'll also provide printed copies of the material to all attendees for each session - collect all four!
This tutorial builds on the two previous sessions ("An Introduction to GPU Programming" and "An Introduction to GPU Memory Model") and is intended for those with a basic understanding of CUDA programming. This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We'll demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. In the second part of the session, we'll focus on dynamic parallelism. We'll deliver a programming demo involving asynchronous operations. We'll also provide printed copies of the material to all attendees for each session - collect all four!