GTC ON-DEMAND

Abstract:
Explore the memory model of the GPU! This session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features such as shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
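A minimal sketch of the three memory spaces this session defines (the `stencil` kernel, `kCoeff` array, and sizes are our own illustration, not the session's demo code):

```cuda
#include <cstdio>

// Constant memory: small, read-only on the device, broadcast-cached.
__constant__ float kCoeff[3];

__global__ void stencil(const float *in, float *out, int n) {
    // Shared memory: on-chip, visible to all threads in the same block.
    __shared__ float tile[258];              // blockDim.x (256) + 2 halo cells

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;

    if (gid < n) tile[lid] = in[gid];        // global -> shared
    if (threadIdx.x == 0)
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    __syncthreads();                         // all loads done before any reads

    if (gid < n)
        out[gid] = kCoeff[0] * tile[lid - 1]
                 + kCoeff[1] * tile[lid]
                 + kCoeff[2] * tile[lid + 1];   // result back to global memory
}

int main() {
    const int n = 1024;
    float h_coeff[3] = {0.25f, 0.5f, 0.25f};
    cudaMemcpyToSymbol(kCoeff, h_coeff, sizeof(h_coeff));

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    stencil<<<n / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    cudaFree(d_in); cudaFree(d_out);
}
```

The rule of thumb the session teaches: data reused by a block goes in shared memory, small read-only parameters go in constant memory, and bulk data stays in global memory.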
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8980
 
Abstract:
Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
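The kernel/host split and thread hierarchy described above can be sketched as follows (a hypothetical SAXPY example, not the session's actual demo; `cudaMallocManaged` is just one way to handle host/device data):

```cuda
#include <cstdio>

// Device code: one thread per element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Thread hierarchy: a grid of blocks, each block a group of threads.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Host responsibilities: allocation, initialization, launch, sync.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // round up to cover n
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);  // kernel launch syntax
    cudaDeviceSynchronize();                    // host waits for the device

    printf("y[0] = %f\n", y[0]);                // 2*1 + 2 = 4
    cudaFree(x); cudaFree(y);
}
```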
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8979
 
Abstract:
This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations will be delivered. Printed copies of the material will be provided to all attendees for each session - collect all four!
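A rough sketch of the kind of CPU/GPU pipeline the session describes, assuming pinned host memory and two streams (the `scale` kernel and chunk sizes are illustrative, not the session's demo):

```cuda
#include <cstdio>

__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int chunks = 8, n = 1 << 20;
    float *h, *d;
    // Pinned host memory is required for truly asynchronous copies.
    cudaHostAlloc(&h, chunks * n * sizeof(float), cudaHostAllocDefault);
    cudaMalloc(&d, chunks * n * sizeof(float));

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]); cudaStreamCreate(&s[1]);

    // Alternate chunks between streams so copy-in, compute, and copy-out
    // of different chunks can overlap.
    for (int c = 0; c < chunks; ++c) {
        cudaStream_t st = s[c % 2];
        float *hp = h + c * n, *dp = d + c * n;
        cudaMemcpyAsync(dp, hp, n * sizeof(float), cudaMemcpyHostToDevice, st);
        scale<<<(n + 255) / 256, 256, 0, st>>>(dp, n);
        cudaMemcpyAsync(hp, dp, n * sizeof(float), cudaMemcpyDeviceToHost, st);
    }
    cudaDeviceSynchronize();   // drain both streams

    cudaStreamDestroy(s[0]); cudaStreamDestroy(s[1]);
    cudaFreeHost(h); cudaFree(d);
}
```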
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8981
 
Abstract:
Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the key optimization strategies for compute-, latency-, and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms, we will discuss branching efficiency, intrinsic functions and loop unrolling. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented. Cooperative groups will also be introduced as an additional optimization technique. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session - collect all four!
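The optimal global memory access pattern referred to above is coalescing: adjacent threads in a warp reading adjacent addresses, so one warp's loads combine into a few memory transactions. A sketch (kernel names and the stride pattern are our own, not the session's demo):

```cuda
// Good: thread i touches element i, so a warp's 32 loads are contiguous
// and coalesce into a minimal number of transactions.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Bad: a stride scatters each warp's loads across many cache lines,
// multiplying the memory traffic for the same useful data.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = (long long)i * stride % n;   // illustrative pattern to avoid
    if (i < n) out[i] = in[j];
}
```

Profiling these two kernels side by side is a common way to demonstrate the bandwidth difference on any GPU generation.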
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8982
 
Abstract:
Join us for an informative introductory tutorial intended for those new to CUDA and which serves as the foundation for our following three tutorials. Those with no previous CUDA experience will leave with essential knowledge to start programming in CUDA. For those with previous CUDA experience, this tutorial will refresh key concepts required for subsequent tutorials on CUDA optimization. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We'll explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and thread hierarchy. We'll deliver a programming demonstration of a simple CUDA kernel. We'll also provide printed copies of the material to all attendees for each session - collect all four!
 
Topics:
Programming Languages, Other
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7699
 
Abstract:

This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. Attend Session 1, "An Introduction to GPU Programming," to learn the basics of CUDA programming that are required for Session 2. We'll begin with an essential overview of the GPU architecture and thread cooperation before focusing on different memory types available on the GPU. We'll define shared, constant, and global memory, and discuss the best locations to store your application data for optimized performance. We'll deliver a programming demonstration of shared and constant memory. We'll also provide printed copies of the material to all attendees for each session – collect all four!

 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7700
 
Abstract:

This tutorial builds on the two previous sessions ("An Introduction to GPU Programming" and "An Introduction to GPU Memory Model") and is intended for those with a basic understanding of CUDA programming. This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We'll demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. In the second part of the session, we'll focus on dynamic parallelism. We'll deliver a programming demo involving asynchronous operations. We'll also provide printed copies of the material to all attendees for each session - collect all four!
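Dynamic parallelism, covered in the second part, lets a kernel launch child kernels directly from the device. A minimal sketch (requires compute capability 3.5+ and separate compilation, e.g. `nvcc -rdc=true -lcudadevrt`; the kernel names are our own):

```cuda
#include <cstdio>

__global__ void child(int parentBlock) {
    printf("child of block %d, thread %d\n", parentBlock, threadIdx.x);
}

__global__ void parent() {
    if (threadIdx.x == 0) {
        // One thread per block spawns a child grid from the device.
        child<<<1, 4>>>(blockIdx.x);
        // No explicit wait needed here: a parent grid is not considered
        // complete until all of its child grids have completed.
    }
}

int main() {
    parent<<<2, 32>>>();
    cudaDeviceSynchronize();   // host waits for parents and their children
}
```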

 
Topics:
Programming Languages, Other
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7705
 
Abstract:
This tutorial is for those with some background in CUDA, including an understanding of the CUDA memory model and streaming multiprocessor. Our previous three tutorials provide the background information necessary for this session. This informative tutorial will provide an overview of the performance analysis tools and key optimization strategies for compute-, latency-, and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. It will also include code examples and a programming demonstration highlighting the optimal global memory access pattern applicable to all GPU architectures. We'll provide printed copies of the material to all attendees for each session – collect all four!
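One way to choose a block size for peak utilization, as discussed above, is to ask the CUDA runtime's occupancy API directly (a sketch; the `saxpy` kernel stands in for your own):

```cuda
#include <cstdio>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    int minGridSize = 0, blockSize = 0;
    // Returns the block size that achieves maximum potential occupancy
    // for this kernel on the current device, given its register and
    // shared-memory usage.
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, saxpy, 0, 0);
    printf("suggested block size: %d (min grid size: %d)\n",
           blockSize, minGridSize);
}
```

Occupancy is a heuristic, not a guarantee: the suggested size is a starting point to benchmark against, not a substitute for profiling.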
 
Topics:
Programming Languages, Other
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7706
 
Abstract:
Join Acceleware and SPEAG/Zurich MedTech to learn how GPU-enabled subgridding for the finite difference time domain (FDTD) algorithm can substantially reduce runtimes for electromagnetic simulations of human interface technology. We'll focus on real-life examples, including an RF-powered contact lens, a wireless capsule endoscopy, and a smart watch. We'll also outline the basics of the subgridding algorithm along with the GPU implementation and the development challenges. Performance results will illustrate the significant reduction in computation times when using a localized subgridded mesh running on an NVIDIA Tesla GPU.
 
Topics:
Signal and Audio Processing, Computational Biology & Chemistry, Computer-Aided Engineering
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6552
 
Abstract:

New to CUDA? Join this free foundational webinar on Wednesday, June 8 to gain essential programming knowledge.

Even those with some CUDA experience can benefit by refreshing the key concepts required for future optimization tutorials.

The course begins with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model, fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. 

 
Topics:
Tools & Libraries
Type:
Webinar
Event:
GTC Silicon Valley
Year:
2016
Session ID:
GTCE127
 
Abstract:
This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. To learn the basics of CUDA programming required for this session, attend the session entitled "An Introduction to GPU Programming." This session begins with an essential overview of the GPU architecture and thread cooperation before focusing on different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session – collect all four!
 
Topics:
Programming Languages, Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5662
 
Abstract:

Join us for an informative introductory tutorial intended for those new to CUDA, which serves as the foundation for our following three tutorials. Those with no previous CUDA experience will leave with essential knowledge to start programming in CUDA. For those with previous CUDA experience, this tutorial will refresh key concepts required for subsequent tutorials on CUDA optimization. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be delivered. Printed copies of the material will be provided to all attendees for each session – collect all four!

 
Topics:
Programming Languages, Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5661
 
Abstract:
Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be provided.
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4699
 
Abstract:
Explore the memory model of the GPU! The session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features available in the Kepler architecture such as the shuffle instruction, shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered.
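The Kepler shuffle instruction mentioned above lets threads within a warp exchange register values directly, without shared memory or `__syncthreads()`. A sketch of a warp-level sum (written with the `_sync` variants required since CUDA 9; the session's Kepler-era material would have used plain `__shfl_down`):

```cuda
#include <cstdio>

__global__ void warpSum(const float *in, float *out) {
    float v = in[threadIdx.x];
    // Halve the active distance each step: after log2(32) = 5 steps,
    // lane 0 holds the sum of the whole warp.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffff, v, offset);
    if (threadIdx.x == 0) *out = v;
}

int main() {
    float h[32], *d_in, *d_out, result;
    for (int i = 0; i < 32; ++i) h[i] = 1.0f;
    cudaMalloc(&d_in, sizeof(h));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h, sizeof(h), cudaMemcpyHostToDevice);
    warpSum<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp sum = %f\n", result);   // 32 ones -> 32.0
    cudaFree(d_in); cudaFree(d_out);
}
```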
 
Topics:
Programming Languages, Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4700
 
Abstract:

Join Chris Mason, Product Manager, Acceleware, for an informative introduction to CUDA programming. The webinar will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. Chris will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and thread hierarchy. A programming demonstration of a simple CUDA kernel will be provided.

 
Topics:
Programming Languages
Type:
Webinar
Event:
GTC Webinars
Year:
2014
Session ID:
GTCE088
 
Abstract:

Join Chris Mason, Product Manager at Acceleware, and explore the memory model of the GPU! The webinar will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. Chris will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features available in the Kepler architecture such as shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered.

 
Topics:
Programming Languages
Type:
Webinar
Event:
GTC Webinars
Year:
2014
Session ID:
GTCE091
 
Abstract:

Join Chris Mason, Product Manager at Acceleware, as he leads attendees in a deep dive into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. Chris will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the webinar will focus on dynamic parallelism. 

 
Topics:
Programming Languages
Type:
Webinar
Event:
GTC Webinars
Year:
2014
Session ID:
GTCE095
 
Abstract:

Learn how to optimize your algorithms for NVIDIA GPUs. This informative webinar will provide an overview of the improved performance analysis tools available in CUDA 6.0 and key optimization strategies for compute-, latency-, and memory-bound problems. The webinar will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms, Dan will discuss branching efficiency, intrinsic functions and loop unrolling. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented, including a comparison between the Fermi and Kepler architectures.

 
Topics:
Programming Languages
Type:
Webinar
Event:
GTC Webinars
Year:
2014
Session ID:
GTCE100
 
Abstract:

Join Chris Mason, Product Manager, Acceleware, for an informative introduction to GPU Programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy.

 
Topics:
Programming Languages
Type:
Webinar
Event:
GTC Webinars
Year:
2013
Session ID:
GTCE056
 
Abstract:

Join Chris Mason, Product Manager, Acceleware, to explore the memory model of the GPU, the memory enhancements available in the new Kepler architecture, and how these will affect your performance optimization. The webinar will begin with an essential overview of GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. The shuffle instruction, new shared memory configurations and Read-Only Data Cache of the Kepler architecture are introduced and optimization techniques discussed.
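The Read-Only Data Cache mentioned above can be engaged by qualifying pointers `const __restrict__`, or explicitly with the `__ldg()` intrinsic (a sketch for compute capability 3.5+; the kernel name is our own, not from the webinar):

```cuda
#include <cstdio>

// const __restrict__ tells the compiler the data is read-only and
// unaliased, letting it route loads through the read-only cache;
// __ldg() forces that path explicitly.
__global__ void axpyReadOnly(int n, float a,
                             const float * __restrict__ x,
                             float * __restrict__ y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * __ldg(&x[i]) + y[i];   // load x via the read-only cache
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    axpyReadOnly<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    cudaFree(x); cudaFree(y);
}
```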

 
Topics:
Tools & Libraries
Type:
Webinar
Event:
GTC Webinars
Year:
2013
Session ID:
GTCE066
 
Abstract:

Join us for an informative introduction to GPU Programming. The session will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be provided.

Introduction to GPU Programming
- CUDA overview
- Data-parallelism
- GPU programming model
  - GPU kernels
  - Host vs. device responsibilities
  - CUDA syntax
  - Thread hierarchy
- Programming Demo: Simple CUDA Kernels

 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2614
 
Abstract:

Explore the memory model of the GPU. The first part of the session covers task parallelism and thread cooperation in GPU computing. The second part focuses on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. A programming demonstration of shared memory will be delivered.

Introduction to the GPU Architecture and Memory Model
- Task parallelism
- Thread cooperation in GPU computing
- GPU memory model
  - Shared memory
  - Constant memory
  - Global memory
- Programming Demo: Shared Memory

 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2615
 
Abstract:

Get the low down on debugging your GPU program. This session includes discussion of debugging techniques and tools to help you identify issues in your kernels. The latest debugging tools provided in CUDA 4.1, including Parallel Nsight, cuda-gdb and cuda-memcheck, will be discussed. A programming demonstration of Parallel Nsight will be provided.

Debugging GPU Programs
- Debugging tools and techniques
- cuda-gdb
- Parallel Nsight
- cuda-memcheck
- Programming Demo: Parallel Nsight

 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2616
 
Abstract:

Learn how to optimize and profile your algorithms for the GPU. This session will cover the essentials of code optimization, including arithmetic optimizations, warps, branching efficiency, memory latency/occupancy and memory performance optimizations. Real-life commercial examples will be discussed to highlight the critical aspects of GPU optimization techniques. A programming demonstration using the NVIDIA Visual Profiler will be included.

Introduction to Optimizations and Profiling
- Arithmetic optimizations
- Warps
- Branching efficiency
- Memory latency/Occupancy
- Memory performance optimizations
- Programming Demo: Visual Profiler

 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2617
 
Speakers:
Chris Mason
- Acceleware
Abstract:
Learn about Acceleware's and Dassault Systèmes' integrated solution that performs an LDL^T factorization on GPUs within the Abaqus software package. We will discuss efficient GPU parallelization of the factorization algorithm and enabling the CPU and GPU to overlap their computations and data transfers. Includes an end-user simulation case study and GPU performance measurements, including 300 GFlops in single precision and 145 GFlops in double precision on an NVIDIA Tesla C2050.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102208