GTC ON-DEMAND

Abstract:
Explore the memory model of the GPU! This session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features such as shared memory configurations and the Read-Only Data Cache are introduced, and optimization techniques are discussed. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8980
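For readers browsing the catalog, here is a minimal sketch of the two memory types this session emphasizes. It is illustrative only, not taken from the session materials; the kernel name, coefficients and sizes are assumptions.

#include <cstdio>

// Constant memory: read-only on the device, cached and broadcast to all threads.
__constant__ float coeffs[3];

__global__ void smooth(const float *in, float *out, int n)
{
    // Shared memory: fast on-chip storage cooperatively used by one block.
    __shared__ float tile[256];

    int i = threadIdx.x;
    tile[i] = in[i];
    __syncthreads();                    // wait until the whole tile is loaded

    float left  = (i > 0)     ? tile[i - 1] : tile[i];
    float right = (i < n - 1) ? tile[i + 1] : tile[i];
    out[i] = coeffs[0] * left + coeffs[1] * tile[i] + coeffs[2] * right;
}

int main()
{
    const int n = 256;
    float h_coeffs[3] = {0.25f, 0.5f, 0.25f};
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    smooth<<<1, n>>>(d_in, d_out, n);   // one block of 256 cooperating threads
    cudaDeviceSynchronize();

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}

The trade-off the abstract's "best location for your data" discussion is about: constant memory suits small read-only values shared by every thread, while shared memory suits data a block reads repeatedly or cooperatively.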
 
Abstract:
Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8979
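As a taste of the material (an illustrative sketch, not the session's own demo; names and sizes are assumptions), the classic vector-add kernel shows the host/device split and the thread hierarchy the abstract mentions:

#include <cstdio>

// Device code: each thread handles one element - the data-parallel model.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Global index built from the block/thread hierarchy.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];      // guard against the partial last block
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host responsibilities: allocation, initialization, kernel launch.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;  // round up to cover all elements
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);        // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}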
 
Abstract:
This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8981
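For orientation, a minimal sketch of the stream-based pipeline idea (illustrative only; the chunk size, stream count and kernel are assumptions): work is split into chunks, and each stream overlaps its host-to-device copy, kernel and device-to-host copy with the other streams.

#include <cstdio>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int nStreams = 4, chunk = 1 << 18;
    size_t bytes = chunk * sizeof(float);

    // Pinned host memory is required for truly asynchronous copies.
    float *h, *d;
    cudaMallocHost(&h, nStreams * bytes);
    cudaMalloc(&d, nStreams * bytes);

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream's copy-in, kernel and copy-out overlap with the others'.
    for (int s = 0; s < nStreams; ++s) {
        float *hp = h + s * chunk, *dp = d + s * chunk;
        cudaMemcpyAsync(dp, hp, bytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, bytes, cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h); cudaFree(d);
    return 0;
}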
 
Abstract:
Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the key optimization strategies for compute-, latency- and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms, we will discuss improving branching efficiency and using intrinsic functions and loop unrolling. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented. Cooperative groups will also be introduced as an additional optimization technique. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session - collect all four!
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8982
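A small illustration of the access-pattern point (our sketch, not the session's demo; kernel names are assumptions): in the first kernel a warp's 32 loads fall in consecutive addresses and coalesce into few memory transactions, while the strided version spans many segments and wastes bandwidth.

// Coalesced: consecutive threads read consecutive addresses.
__global__ void copyCoalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads touch addresses 'stride' elements apart,
// so each warp's loads spread over many memory segments.
__global__ void copyStrided(const float *in, float *out, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}

int main()
{
    const int n = 1 << 22;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    copyCoalesced<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    copyStrided<<<(n / 4 + 255) / 256, 256>>>(d_in, d_out, n, 4);
    cudaDeviceSynchronize();

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}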
 
Abstract:
This tutorial is for those with some background in CUDA, including an understanding of the CUDA memory model and the streaming multiprocessor. Our earlier tutorials will provide the background information necessary for this session. This informative tutorial will provide an overview of the performance analysis tools and key optimization strategies for compute-, latency- and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. This session will include code examples and a programming demonstration highlighting the optimal global memory access pattern applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session - collect all four!
 
Topics:
Programming Languages, Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5664
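One concrete way to pick a block size, for readers who want to experiment before the session (a sketch using the runtime's occupancy API; this is our choice of illustration, not necessarily the session's method):

#include <cstdio>

__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Ask the runtime for a block size that maximizes occupancy for this kernel.
    int minGridSize = 0, blockSize = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, square, 0, 0);

    int gridSize = (n + blockSize - 1) / blockSize;
    square<<<gridSize, blockSize>>>(d, n);
    cudaDeviceSynchronize();

    printf("suggested block size: %d\n", blockSize);
    cudaFree(d);
    return 0;
}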
 
Abstract:

This tutorial builds on the two previous sessions (An Introduction to GPU Programming and the Introduction to GPU Memory Model) and is intended for those with a basic understanding of CUDA programming. This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations will be delivered. Printed copies of the material will be provided to all attendees for each session – collect all four!
 
Topics:
Programming Languages, Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5663
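To make the dynamic-parallelism part concrete (an illustrative sketch, not the session demo; kernel names are assumptions), a parent kernel can launch child grids directly from the device, without a round trip to the CPU:

#include <cstdio>

// Child kernel, launched from the device.
__global__ void child(int parent)
{
    printf("child of block %d, thread %d\n", parent, threadIdx.x);
}

// Parent kernel: one thread per block launches a child grid.
__global__ void parent()
{
    if (threadIdx.x == 0)
        child<<<1, 4>>>(blockIdx.x);
}

int main()
{
    parent<<<2, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}

// Dynamic parallelism needs compute capability 3.5+ and relocatable device code:
//   nvcc -arch=sm_35 -rdc=true dynpar.cu -lcudadevrt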
 
Abstract:
This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations and dynamic parallelism will be included.
 
Topics:
Programming Languages, Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4701
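A small companion sketch for checking whether an asynchronous design actually pays off (our illustration, not session material): CUDA events timestamp work in a stream without stalling the CPU, which makes them the natural tool for timing the kind of pipeline this tutorial builds.

#include <cstdio>

__global__ void busy(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = sqrtf(d[i]) + 1.0f;
}

int main()
{
    const int n = 1 << 22;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);             // enqueued, does not block the CPU
    busy<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);         // wait only for this event, not the device

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}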
 
Abstract:
Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the improved performance analysis tools available in CUDA 6.0 and key optimization strategies for compute-, latency- and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms, we will discuss improving branching efficiency and using intrinsic functions and loop unrolling. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented, highlighting the differences between the Fermi and Kepler architectures. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures.
 
Topics:
Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4702
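As one concrete instance of the shared-memory access patterns discussed, here is the standard tiled matrix transpose (shown as our own sketch rather than the session's code): both global reads and writes are coalesced, and the padding column avoids shared-memory bank conflicts.

#define TILE 32

__global__ void transpose(const float *in, float *out, int width)
{
    // +1 padding column so threads in a warp hit different banks.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];   // coalesced read
    __syncthreads();

    // Swap block indices so the write side is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];
}

int main()
{
    const int width = 1024;               // assumed divisible by TILE
    float *d_in, *d_out;
    cudaMalloc(&d_in,  width * width * sizeof(float));
    cudaMalloc(&d_out, width * width * sizeof(float));

    dim3 block(TILE, TILE), grid(width / TILE, width / TILE);
    transpose<<<grid, block>>>(d_in, d_out, width);
    cudaDeviceSynchronize();

    cudaFree(d_in); cudaFree(d_out);
    return 0;
}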
 
Abstract:

Get the low-down on debugging and profiling your GPU program from Dan Cyca, Chief Technology Officer, Acceleware. This webinar dives deep into profiling techniques and the tools available to help you optimize your code. We will demonstrate NVIDIA’s Visual Profiler, nvcc flags and cuobjdump, and highlight the various methods available for understanding the performance of your CUDA program. The second part of the webinar will focus on debugging techniques and the tools available to help you identify issues in your kernels. The latest debugging tools provided in CUDA 5.5, including Nsight and cuda-memcheck, will be presented.
 
Topics:
Tools & Libraries
Type:
Webinar
Event:
GTC Webinars
Year:
2013
Session ID:
GTCE068
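A small sketch in the spirit of the webinar (our illustration, not the presenter's material): systematic error checking is the first debugging tool, and cuda-memcheck catches device-side problems that runtime status codes cannot.

#include <cstdio>
#include <cstdlib>

// Wrap every runtime call so failures are reported with file and line,
// instead of silently corrupting later results.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err), __FILE__, __LINE__);          \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

__global__ void fill(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = 1.0f;
}

int main()
{
    float *d;
    CUDA_CHECK(cudaMalloc(&d, 1024 * sizeof(float)));
    fill<<<4, 256>>>(d, 1024);
    CUDA_CHECK(cudaGetLastError());        // catches launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // catches errors raised by the kernel
    CUDA_CHECK(cudaFree(d));
    return 0;
}

// Out-of-bounds and misaligned accesses inside kernels are the province of
// the memory checker:   cuda-memcheck ./app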