SEARCH SESSIONS

Search All
 
Refine Results:
 
Year(s)

SOCIAL MEDIA

EMAIL SUBSCRIPTION

 
 

GTC On-Demand

Presentation
Media
Abstract:
Explore the memory model of the GPU! This session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features such as shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
Explore the memory model of the GPU! This session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features such as shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!  Back
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8980
Streaming:
Download:
Share:
 
Abstract:
Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!  Back
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8979
Streaming:
Download:
Share:
 
Abstract:
This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!  Back
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8981
Download:
Share:
 
Abstract:
Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the key optimization strategies for compute, latency and memory bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute bound algorithms we will discuss how to improve branching efficiency, intrinsic functions and loop unrolling. For memory bound algorithms, optimal access patterns for global and shared memory will be presented. Cooperative groups will also be introduced as an additional optimization technique. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern which is applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session - collect all four!
Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the key optimization strategies for compute, latency and memory bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute bound algorithms we will discuss how to improve branching efficiency, intrinsic functions and loop unrolling. For memory bound algorithms, optimal access patterns for global and shared memory will be presented. Cooperative groups will also be introduced as an additional optimization technique. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern which is applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session - collect all four!  Back
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8982
Download:
Share: