GTC ON-DEMAND

 
Abstract:
In this session, we will study a real CUDA application and use NVIDIA® Nsight Eclipse Edition on Linux to optimize its performance. Attendees will learn a method for analyzing their code and how to use the tools to apply it.
 
Topics:
Performance Optimization, Tools & Libraries
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5173
 
Abstract:
In this session, we will study a real CUDA application and use Nsight™ Visual Studio Edition on Windows to optimize its performance. Attendees will learn a method for analyzing their code and how to use the tools to apply it.
 
Topics:
Performance Optimization, Tools & Libraries
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5174
 
Abstract:
In this session, we present a CUDA implementation of a backprojection kernel for Cone-Beam Computed Tomography (CBCT) and study its performance from a GPU architectural point of view. We will explain how we measured the kernel's utilization of the different components of a GPU. Our focus will be on the Kepler and Maxwell architectures.
 
Topics:
Medical Imaging & Radiology, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5534
 
Abstract:
In this session, we will present a CUDA implementation for segmenting many 3D cubes on GPUs. Our implementation relies on an efficient strategy for decomposing the work among blocks of threads. We will also analyze the performance of our code using NVIDIA® Nsight Visual Studio Edition.
 
Topics:
Developer - Algorithms, Seismic & Geosciences, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5555
 
Abstract:
In this session, we will study a real CUDA application and use Nsight™ Visual Studio Edition on Windows to optimize its performance. Attendees will learn a method for analyzing their code and how to use the tools to apply it.
 
Topics:
Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4160
 
Abstract:
In this session, we will study a real CUDA application and use NVIDIA® Nsight™ Eclipse Edition on Linux to optimize its performance. Attendees will learn a method for analyzing their code and how to use the tools to apply it.
 
Topics:
Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4165
 
Abstract:
In this session, we present our work on computing the Greeks of multi-asset American options. We describe our implementation of the Longstaff-Schwartz algorithm and explain the programming techniques used to obtain very efficient code for the Andersen-QE path discretization. The solution was developed in collaboration with IBM and STAC and is used to calculate the Greeks in real time on a single workstation with Tesla GPUs.
 
Topics:
Finance
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4784
 
Abstract:

GROMACS is a state-of-the-art molecular simulation package that employs extensive multi-level heterogeneous parallelization. Our new CUDA-based algorithms provide a 4x speedup over hand-tuned CPU SIMD assembly and unprecedented absolute performance. However, the heterogeneity of the hardware and the inherent bottlenecks involved make efficient resource utilization and strong scaling very challenging. This advanced session describes our recent efforts on multi-level load balancing, kernel execution strategies, CPU-GPU work splitting, and ways to exploit Kepler features such as Hyper-Q. Join us to talk about the current limits of GPU acceleration in MD, and how to take molecular dynamics simulations to 100 microseconds per iteration, equivalent to 10,000 fps, in the near future!
 
Topics:
Quantum Chemistry, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3011
 
Abstract:

The Kepler GPU architecture introduces a new instruction, SHFL ("shuffle"), which allows threads in a warp to exchange values without using shared memory. In some cases, using SHFL can significantly improve performance. In this session, we present code patterns where SHFL helps improve the performance of your applications.
 
Topics:
Programming Languages, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3174
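
As an illustration of the kind of pattern the S3174 abstract refers to, here is a minimal sketch of a warp-level sum that exchanges values through shuffles instead of shared memory. It is not taken from the session itself. It assumes CUDA 9 or later, where the intrinsic is `__shfl_down_sync` (Kepler-era code used the now-deprecated `__shfl_down`). The names `warp_reduce_sum`, `warp_sum_kernel`, and `warp_sum_demo` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Sum one int per lane across a full warp using register-to-register
// shuffles; no shared memory is involved.
__inline__ __device__ int warp_reduce_sum(int val) {
    const unsigned full_mask = 0xffffffffu;  // all 32 lanes participate
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        // Each lane adds the value held by the lane `offset` positions above it.
        val += __shfl_down_sync(full_mask, val, offset);
    }
    return val;  // only lane 0 is guaranteed to hold the complete sum
}

// One warp: each thread loads one element, and lane 0 writes the total.
__global__ void warp_sum_kernel(const int* in, int* out) {
    int total = warp_reduce_sum(in[threadIdx.x]);
    if (threadIdx.x == 0) {
        *out = total;
    }
}

// Hypothetical host helper: sums the integers 0..31 on a single warp.
// Returns -1 if any CUDA call fails.
int warp_sum_demo() {
    int h_in[32];
    for (int i = 0; i < 32; ++i) {
        h_in[i] = i;
    }
    int* d_in = nullptr;
    int* d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) {
        return -1;
    }
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    if (cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice) == cudaSuccess) {
        warp_sum_kernel<<<1, 32>>>(d_in, d_out);
        // This blocking copy also waits for the kernel to finish.
        if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
            h_out = -1;
        }
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `warp_sum_demo()` from a `main` function on a machine with a CUDA-capable GPU should return 496, the sum of 0 through 31. The loop halves the shuffle distance at each step, so the reduction takes five shuffles per lane for a 32-lane warp.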
 
Abstract:

The goal of this session is to present advanced techniques for optimizing CUDA code on the GPU. In particular, we will demonstrate how advanced CUDA instructions (inline PTX, warp instructions, "extended" syncthreads) and load-balancing strategies can improve the performance of sparse matrix-matrix multiplication on the GPU.
 
Topics:
Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2285
 
 