GTC On-Demand

GPU computing is a transformational force in high-performance computing, enabling developers, engineers, programmers, and researchers across many industries and academia to accelerate research and mission-critical applications. See our featured sessions highlighting some of our best talks, or dive into the many other keynotes, technical sessions, presentations, research posters, webinars, and tutorials available at any time on GTC On-Demand.

Development Tools & Libraries
Analysis-Driven Performance Optimization
Paulius Micikevicius (NVIDIA)
The goal of this session is to demystify performance optimization by transforming it into an analysis-driven process. There are three fundamental limiters to kernel performance: instruction throughput, memory throughput, and latency. In this session we will describe:
• how to use profiling tools and source code instrumentation to assess the significance of each limiter;
• what optimizations to apply for each limiter;
• how to determine when hardware limits are reached.
Concepts will be illustrated with examples and are equally applicable to both CUDA and OpenCL development. It is assumed that attendees are already familiar with the fundamental optimization techniques.
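As a rough illustration of the three limiters named in this abstract, the sketch below compares a kernel's achieved instruction and memory throughput against hardware peaks. It is not taken from the session: the function name, the 0.7 threshold, and all numbers are hypothetical, and the rule "neither throughput is near its peak, so suspect latency" is a simplifying assumption.

```python
def classify_limiter(flops, bytes_moved, seconds,
                     peak_flops, peak_bandwidth, threshold=0.7):
    """Guess the dominant limiter of a kernel from coarse measurements.

    All inputs are assumed to come from a profiler or a datasheet.
    The threshold is an arbitrary illustrative cutoff, not a standard value.
    """
    # Fraction of peak arithmetic throughput actually achieved.
    compute_fraction = (flops / seconds) / peak_flops
    # Fraction of peak memory bandwidth actually achieved.
    memory_fraction = (bytes_moved / seconds) / peak_bandwidth

    if max(compute_fraction, memory_fraction) < threshold:
        # Neither unit is close to saturated: latency is the likely suspect.
        label = "latency"
    elif memory_fraction >= compute_fraction:
        label = "memory throughput"
    else:
        label = "instruction throughput"
    return label, compute_fraction, memory_fraction


if __name__ == "__main__":
    # Hypothetical kernel: 1 GFLOP and 8 GB of traffic in 0.1 s
    # on a device with 1 TFLOP/s and 100 GB/s peaks.
    print(classify_limiter(1e9, 8e9, 0.1, 1e12, 1e11))
```

A kernel that already runs near one of the peaks has reached a hardware limit for its current algorithm; further gains would then have to come from doing less work or moving less data.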
 
Keywords:
Development Tools & Libraries, GTC 2010 - ID 2012
Energy Exploration
GPUs in Energy & Exploration: Software Development and Production
Paulius Micikevicius (NVIDIA), Paulo Souza (Petrobras), Alexander Loddoch (Chevron), Dave Nichols (Schlumberger), Mauricio Araya (Repsol)
This session will feature expert panelists who will share their experience adopting GPUs in their respective environments. Since 2009, these production systems have been boosting throughput and shortening cycle times while delivering enhanced images using NVIDIA technologies. Featured panelists will come from Hess, Schlumberger, Petrobras, Chevron, and more.
 
Keywords:
Energy Exploration, GTC 2012 - ID S0628
Programming Languages & Compilers
Fundamental Performance Optimizations for GPUs
Paulius Micikevicius (NVIDIA)
This presentation covers the major CUDA optimizations. Topics will include maximizing memory throughput, kernel launch configuration, using shared memory, and improving GPU/CPU interaction. While C for CUDA is used for illustration, the concepts covered will apply equally to programs written with the OpenCL and DirectCompute APIs.
 
Keywords:
Programming Languages & Compilers, Development Tools & Libraries, GTC 2010 - ID 2011
 
GPU Performance Analysis and Optimization
Paulius Micikevicius (NVIDIA)
This session will present the fundamental performance-optimization concepts and illustrate their practical application in the context of programming for Fermi and Kepler GPUs. The goal is twofold: to make the optimization process a methodical sequence of steps, and to facilitate performance-aware algorithmic decisions before coding even starts. To maximize GPU performance, a code should have sufficient parallelism, access memory in a coalesced pattern, and be amenable to vector execution within warps (groups of 32 threads). We will show how to quantify these requirements for a specific GPU in order to determine performance limiters and their importance for a given code. To address the limiters, we will review hardware operation specifics and related optimization techniques. The optimization process will be illustrated using NVIDIA profiling tools and kernel case studies.
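To make the coalescing requirement in this abstract concrete, here is a small model (not from the session) that counts how many memory segments a single warp of 32 threads touches for a given access stride. The 128-byte segment size, the 4-byte element size, and the function names are assumptions chosen for illustration; actual transaction sizes depend on the GPU architecture and cache configuration.

```python
WARP_SIZE = 32  # threads per warp, as stated in the abstract


def segments_touched(stride_elems, elem_bytes=4, segment_bytes=128, base_addr=0):
    """Count distinct aligned segments touched by one warp-wide load.

    Lane i is assumed to read one element at
    base_addr + i * stride_elems * elem_bytes.
    """
    addrs = [base_addr + lane * stride_elems * elem_bytes
             for lane in range(WARP_SIZE)]
    return len({addr // segment_bytes for addr in addrs})


def load_efficiency(stride_elems, elem_bytes=4, segment_bytes=128, base_addr=0):
    """Bytes requested by the warp divided by bytes moved in whole segments."""
    requested = WARP_SIZE * elem_bytes
    moved = segments_touched(stride_elems, elem_bytes,
                             segment_bytes, base_addr) * segment_bytes
    return requested / moved


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print(stride, segments_touched(stride), load_efficiency(stride))
```

In this model a unit-stride, aligned access needs one segment per warp, while a stride of 32 elements needs one segment per thread, which is the kind of pattern the coalescing guideline is meant to avoid.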
 
Keywords:
Programming Languages & Compilers, GTC 2012 - ID S0514
 
Multi-GPU Programming
Paulius Micikevicius (NVIDIA)
CUDA releases starting with 4.0 include a number of features that facilitate multi-GPU programming and computing. In this session we will review the features useful for programming for multiple GPUs, both within a single node and across a network. We will cover peer-to-peer GPU communication, communication patterns for various GPU topologies, and streams in the context of multiple GPUs. Concepts will be illustrated with a case study of 3D forward wave modeling, common in seismic computing.
 
Keywords:
Programming Languages & Compilers, GTC 2012 - ID S0515
 
Performance Optimization: Programming Guidelines and GPU Architecture Details Behind Them
Paulius Micikevicius (NVIDIA)
The goal of this presentation is to describe GPU operation details underlying various performance optimization suggestions. Topics will include the parallelism required to achieve high utilization of GPUs, instruction issue, warp execution and how it relates to CUDA cores, various memories and how their accesses are processed, concurrent execution, and more. Emphasis will be on the Kepler architecture, but most concepts apply to previous GPU generations as well. Experimental results will be presented where appropriate.
 
Keywords:
Programming Languages & Compilers, GTC 2013 - ID S3466
Video & Image Processing
Implementing 3D Finite Difference Codes on the GPU
Paulius Micikevicius (NVIDIA)
This presentation reviews GPU parallelization of 3D finite difference (3DFD) computation over regular grids. 3DFD is a fundamental computation in many applications, including Reverse Time Migration in seismic computing. A single-GPU implementation is described first, followed by a scalability study on a cluster of up to 8 GPUs. Performance results are compared to the theoretical limits of the hardware.
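For readers unfamiliar with the computation this abstract refers to, the following is a minimal CPU sketch of a 3D finite difference stencil on a regular grid. It uses a second-order, 7-point Laplacian purely as an assumption for illustration; the stencil order, boundary handling, and GPU mapping used in the presentation are not stated in the abstract and are not reproduced here.

```python
def laplacian_3d(u, h=1.0):
    """Apply a 7-point, second-order Laplacian to a nested-list grid u[z][y][x].

    Boundary points are left at 0.0; only interior points are updated.
    """
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    inv_h2 = 1.0 / (h * h)
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # Sum of the six axis neighbours minus six times the centre.
                out[z][y][x] = (
                    u[z - 1][y][x] + u[z + 1][y][x]
                    + u[z][y - 1][x] + u[z][y + 1][x]
                    + u[z][y][x - 1] + u[z][y][x + 1]
                    - 6.0 * u[z][y][x]
                ) * inv_h2
    return out


if __name__ == "__main__":
    n = 5
    # Test field u = x^2 + y^2 + z^2, whose exact Laplacian is 6 everywhere.
    grid = [[[float(x * x + y * y + z * z) for x in range(n)]
             for y in range(n)] for z in range(n)]
    print(laplacian_3d(grid)[2][2][2])
```

Each interior output point depends only on its own neighbourhood of the input grid, so the points can be computed independently, which is why this kind of computation is a natural candidate for GPU parallelization.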
 
Keywords:
Video & Image Processing, Algorithms & Numerical Techniques, Development Tools & Libraries, Energy Exploration, Physics Simulation, GTC 2009 - ID 1006
 
 
Copyright © 2014 NVIDIA Corporation