GTC ON-DEMAND

Abstract:
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs. We'll explain how to use the different available multi-GPU programming models and describe their individual advantages. All programming models will be introduced using the same example, which applies a domain decomposition strategy.
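
A minimal sketch of what this looks like in practice (illustrative only, not the session's material): the simplest of the multi-GPU models is a single host thread looping over all devices, with the domain decomposed into one slab per GPU. The kernel step and the sizes are hypothetical placeholders.

    // Minimal sketch: single-threaded multi-GPU domain decomposition.
    // `step` is a hypothetical stencil kernel; error checks omitted.
    #include <cuda_runtime.h>
    #include <vector>

    __global__ void step(float* out, const float* in, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 0.5f * in[i];   // placeholder for real stencil work
    }

    int main() {
        int ngpus = 0;
        cudaGetDeviceCount(&ngpus);
        const int N = 1 << 24;          // global domain size (assumed divisible)
        const int chunk = N / ngpus;    // one contiguous slab per GPU

        std::vector<float*> in(ngpus), out(ngpus);
        std::vector<cudaStream_t> stream(ngpus);
        for (int d = 0; d < ngpus; ++d) {
            cudaSetDevice(d);           // subsequent calls target GPU d
            cudaMalloc(&in[d], chunk * sizeof(float));
            cudaMalloc(&out[d], chunk * sizeof(float));
            cudaStreamCreate(&stream[d]);
        }
        for (int d = 0; d < ngpus; ++d) {   // launches are async, so GPUs overlap
            cudaSetDevice(d);
            step<<<(chunk + 255) / 256, 256, 0, stream[d]>>>(out[d], in[d], chunk);
        }
        for (int d = 0; d < ngpus; ++d) {   // wait for all GPUs
            cudaSetDevice(d);
            cudaStreamSynchronize(stream[d]);
        }
        return 0;
    }

The other common models (one thread per GPU, one MPI rank per GPU) decompose the domain the same way; they differ mainly in which execution context owns each device and how halo data is exchanged.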
 
Topics:
Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9139
 
Abstract:
Most large companies use online analytical processing (OLAP) to gain insight from available data and guide business decisions. To support time-critical business decisions, companies must answer queries as quickly as possible. For OLAP, the performance bottlenecks are joins of large relations. GPUs can significantly accelerate these joins, but often a single GPU lacks the speed or memory capacity to join the input tables, or cannot do so quickly enough. We'll discuss how we're addressing these problems by proposing join algorithms that scale to multiple GPUs.
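
The session's algorithms are not reproduced here, but a common building block for scaling joins beyond one GPU is hash partitioning: route the rows of both relations by a hash of the join key so that equal keys always land on the same device, and each GPU can then run an ordinary local join. A minimal sketch, with a hypothetical integer mixer:

    // Sketch: hash-partition join keys across GPUs.
    #include <cuda_runtime.h>
    #include <cstdint>

    __device__ uint32_t mix(uint32_t k) {   // simple integer hash (assumption)
        k ^= k >> 16; k *= 0x85ebca6bu;
        k ^= k >> 13; k *= 0xc2b2ae35u;
        return k ^ (k >> 16);
    }

    // One pass over a table: record the target GPU for each row.
    __global__ void partition_ids(const uint32_t* keys, int* part,
                                  int n, int ngpus) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) part[i] = mix(keys[i]) % ngpus;
    }

    // Host side (not shown): scatter the rows of both relations to their
    // target GPUs according to `part`, then hash-join locally per device;
    // equal keys always meet on the same GPU, so no cross-GPU probing occurs.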
 
Topics:
Accelerated Data Science, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9557
 
Abstract:
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs. Learn how to use the different available multi-GPU programming models and their individual advantages. All programming models will be introduced using the example of applying a domain decomposition strategy.
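
As one concrete example of such a programming model (an assumption about this session's content, offered purely as an illustration): NCCL provides collectives across all GPUs of a node from a single process.

    // Sketch: NCCL allreduce across all visible GPUs from one process.
    #include <nccl.h>
    #include <cuda_runtime.h>
    #include <vector>

    int main() {
        int ngpus = 0;
        cudaGetDeviceCount(&ngpus);
        std::vector<ncclComm_t> comms(ngpus);
        ncclCommInitAll(comms.data(), ngpus, nullptr);   // one communicator per GPU

        const int n = 1 << 20;
        std::vector<float*> buf(ngpus);
        std::vector<cudaStream_t> s(ngpus);
        for (int d = 0; d < ngpus; ++d) {
            cudaSetDevice(d);
            cudaMalloc(&buf[d], n * sizeof(float));
            cudaStreamCreate(&s[d]);
        }
        ncclGroupStart();                                // fuse the per-GPU calls
        for (int d = 0; d < ngpus; ++d)
            ncclAllReduce(buf[d], buf[d], n, ncclFloat, ncclSum, comms[d], s[d]);
        ncclGroupEnd();
        for (int d = 0; d < ngpus; ++d) {
            cudaSetDevice(d);
            cudaStreamSynchronize(s[d]);
        }
        for (int d = 0; d < ngpus; ++d) ncclCommDestroy(comms[d]);
        return 0;
    }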
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8121
 
Abstract:
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs. Learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, which applies a domain decomposition strategy.
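
In the single-process model, neighboring slabs of the decomposed domain can exchange halo data directly between GPUs when peer access is available. A hedged sketch (the buffer names are hypothetical):

    // Sketch: direct GPU-to-GPU halo copy between neighboring devices.
    #include <cuda_runtime.h>

    void exchange_halo(float* dst1, int dev1, const float* src0, int dev0,
                       size_t bytes, cudaStream_t stream) {
        int can = 0;
        cudaDeviceCanAccessPeer(&can, dev1, dev0);
        if (can) {
            cudaSetDevice(dev1);
            cudaDeviceEnablePeerAccess(dev0, 0);   // needed once per device pair
        }
        // Works either way: with P2P enabled the copy goes directly over
        // NVLink/PCIe, otherwise the runtime stages it through host memory.
        cudaMemcpyPeerAsync(dst1, dev1, src0, dev0, bytes, stream);
    }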
 
Topics:
Programming Languages, Tools & Libraries, Developer Tools
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8316
 
Abstract:
Memory bandwidths more than an order of magnitude higher than those of conventional processors made GPUs an attractive platform for data-intensive applications early on. While there are many success stories about GPU-accelerated databases built from scratch, GPU-accelerated operations for large-scale, general-purpose databases are the exception rather than the norm. We characterize fundamental database operators like scan, filter, join, and group-by based on their memory access patterns. From these characteristics, we derive their potential for GPU acceleration, such as upper bounds for performance on current and future architectures. Starting from basic GPU implementations, we deep-dive into aspects like optimizing data transfers, access patterns, and data layout.
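
To make the memory-access argument concrete, here is an illustrative sketch (not from the talk) of a selection operator: it streams each column element exactly once, so its best-case runtime is bounded by memory bandwidth rather than compute.

    // Sketch: a filter (selection) is a streaming, memory-bound pass.
    // Upper bound on throughput ~ (bytes read + written) / memory bandwidth.
    #include <cuda_runtime.h>
    #include <cstdint>

    __global__ void filter_lt(const int32_t* col, uint8_t* flags,
                              int n, int32_t bound) {
        // Grid-stride loop: coalesced, purely sequential access pattern.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x)
            flags[i] = col[i] < bound;   // 4 bytes in, 1 byte out per row
    }

    // E.g., at roughly 900 GB/s of HBM2 bandwidth, scanning a 1-billion-row
    // int32 column moves about 5 GB, so the pass cannot take much under ~6 ms.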
 
Topics:
Accelerated Data Science, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8289
 
Abstract:
Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with unified memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools.
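
A minimal sketch of the overlap pattern described here, assuming a CUDA-aware MPI library; the kernels and buffer names are hypothetical:

    // Sketch: CUDA-aware MPI halo exchange overlapped with interior compute.
    // Device pointers are passed directly to MPI (CUDA-aware MPI assumed).
    #include <mpi.h>
    #include <cuda_runtime.h>

    extern __global__ void interior(float*, int);   // hypothetical kernels
    extern __global__ void boundary(float*, int);

    void timestep(float* d_u, float* d_halo_send, float* d_halo_recv,
                  int n, int nhalo, int up, int down,
                  cudaStream_t s_bulk, cudaStream_t s_halo) {
        // 1. Compute the boundary cells first, on their own stream.
        boundary<<<1, nhalo, 0, s_halo>>>(d_u, n);
        // 2. Start the interior on another stream; it overlaps with 3 and 4.
        interior<<<(n + 255) / 256, 256, 0, s_bulk>>>(d_u, n);
        // 3. Wait only for the boundary, then exchange halos.
        cudaStreamSynchronize(s_halo);
        MPI_Request req[2];
        MPI_Irecv(d_halo_recv, nhalo, MPI_FLOAT, up,   0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(d_halo_send, nhalo, MPI_FLOAT, down, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        // 4. Finally wait for the interior; communication time is hidden
        //    whenever the interior kernel runs longer than the exchange.
        cudaStreamSynchronize(s_bulk);
    }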
 
Topics:
HPC and Supercomputing, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8314
 
Abstract:
Do you need to compute larger or faster than a single GPU allows? Then come to this session and learn how to scale your application to multiple GPUs. You will learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, which applies a domain decomposition strategy.
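
Another of the common models is one OpenMP host thread per GPU; a minimal sketch (the kernel is a hypothetical placeholder; compile with OpenMP enabled):

    // Sketch: one OpenMP host thread drives each GPU.
    #include <cuda_runtime.h>
    #include <omp.h>

    extern __global__ void work(float*, int);   // hypothetical kernel

    void run_on_all_gpus(float** d_slab, int chunk) {
        int ngpus = 0;
        cudaGetDeviceCount(&ngpus);
        #pragma omp parallel num_threads(ngpus)
        {
            int d = omp_get_thread_num();
            cudaSetDevice(d);                    // each thread owns one GPU
            work<<<(chunk + 255) / 256, 256>>>(d_slab[d], chunk);
            cudaDeviceSynchronize();             // each thread syncs its own GPU
        }
    }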
 
Topics:
HPC and AI, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23031
 
Abstract:
Do you need to compute larger or faster than a single GPU allows? Learn how to scale your application to multiple GPUs. Learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, which applies a domain decomposition strategy.
 
Topics:
Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7142
 
Abstract:
Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with Unified Memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools.
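
On the OpenACC side, the standard idiom for handing a device buffer to a CUDA-aware MPI library is host_data use_device. A minimal sketch (the compute loop is a placeholder; compile with OpenACC enabled, e.g. nvc++ -acc):

    // Sketch: OpenACC combined with CUDA-aware MPI.
    #include <mpi.h>

    void exchange(float* buf, int n, int peer) {
        #pragma acc data copy(buf[0:n])
        {
            #pragma acc parallel loop
            for (int i = 0; i < n; ++i) buf[i] *= 2.0f;   // placeholder compute

            // Expose the *device* address of buf to the MPI call.
            #pragma acc host_data use_device(buf)
            MPI_Sendrecv_replace(buf, n, MPI_FLOAT, peer, 0, peer, 0,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }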
 
Topics:
HPC and Supercomputing, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7133
 
Abstract:
Learn how to use unified memory to improve your productivity in accelerating applications with OpenACC. Using a Lattice Boltzmann CFD solver as an example, we'll explain how a profile-driven approach allows one to incrementally accelerate an application with OpenACC and unified memory. Besides the productivity gain, a primary advantage of this approach is that it is very accessible, even for developers who are new to a project and therefore not familiar with the whole code base.
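
A rough illustration of the workflow (not the speakers' actual solver): with the compiler's managed-memory mode (e.g. -ta=tesla:managed with PGI then, -gpu=managed with nvc++ now), the hottest loop identified by the profiler can be offloaded without writing any data clauses; here a generic Jacobi-style sweep stands in for the Lattice Boltzmann kernels.

    // Sketch: incremental OpenACC with unified (managed) memory. No data
    // clauses are needed; the runtime migrates pages on demand.
    void relax(const float* u, float* unew, const float* rhs, int nx, int ny) {
        #pragma acc parallel loop collapse(2)
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                unew[j * nx + i] = 0.25f * (u[j * nx + i - 1] + u[j * nx + i + 1]
                                          + u[(j - 1) * nx + i] + u[(j + 1) * nx + i]
                                          - rhs[j * nx + i]);
    }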
 
Topics:
Computational Fluid Dynamics, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6134
 
Abstract:
Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in the NVIDIA performance analysis tools.
 
Topics:
HPC and Supercomputing, Tools & Libraries, OpenACC
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6142
 
Abstract:
In this session you will learn how to program multi-GPU systems or GPU clusters using the Message Passing Interface (MPI) and OpenACC or CUDA. The session starts with a quick introduction to MPI and how it can be combined with OpenACC or CUDA, and also covers advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. The latest improvements in CUDA-aware MPI, the Multi-Process Service (MPS, aka Hyper-Q for MPI), and MPI support in the NVIDIA performance analysis tools are also covered.
 
Topics:
HPC and Supercomputing, Data Center & Cloud Infrastructure
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5117
 
Abstract:
In this session you will learn how to program GPU clusters using the message passing interface (MPI) and OpenACC or CUDA. Part I of this session will explain how to get started, giving a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Part II will explain more advanced topics like GPU-aware MPI and how to overlap communication with computation to hide communication times. Finally, Part III will cover how to use the NVIDIA performance analysis tools in an MPI environment and give an overview of third-party tools specifically designed for GPU clusters.
 
Topics:
HPC and Supercomputing, Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4236
 
Abstract:
To fully utilize a GPU cluster, both the single-GPU code and the inter-GPU communication need to be efficient. In this session, an LBM code applying a D2Q37 model is used as a case study to explain by example how both targets can be met. The compute-intensive collide kernel of the LBM code is optimized for Kepler, specifically considering the large amount of state needed per thread due to the complex D2Q37 model. For efficient inter-GPU communication, CUDA-aware MPI was used. We explain how this was done and present performance results on an InfiniBand cluster with GPUDirect RDMA.
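
One Kepler-era lever for the "large amount of state per thread" problem is bounding the launch configuration so the compiler can budget registers; the sketch below is illustrative (the collision math is omitted), not the presented code.

    // Sketch: a state-heavy D2Q37 collide step. Each thread keeps its 37
    // populations in registers; __launch_bounds__ lets the compiler trade
    // occupancy against register spills.
    #define QDIRS 37

    __global__ void __launch_bounds__(256, 2)   // <=256 threads/block, >=2 blocks/SM
    collide(const double* fin, double* fout, int nsites) {
        int site = blockIdx.x * blockDim.x + threadIdx.x;
        if (site >= nsites) return;
        double f[QDIRS];                         // per-thread state
        for (int q = 0; q < QDIRS; ++q)          // structure-of-arrays layout:
            f[q] = fin[q * nsites + site];       // loads coalesce across the warp
        // ... collision operator on f[] (omitted) ...
        for (int q = 0; q < QDIRS; ++q)
            fout[q * nsites + site] = f[q];
    }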
 
Topics:
HPC and Supercomputing, Computational Fluid Dynamics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4186
 
Abstract:
Always wanted to know what NVIDIA GPUDirect is about and how your MPI+CUDA application can benefit from using it? In this session you will learn how MPI implementations take advantage of GPUDirect technologies to make your applications run faster, including peer-to-peer communication and RDMA. We will introduce several free and commercial CUDA-aware MPI implementations that are available today, and show how easy it is to use them. We will also present performance gains for real-world applications as well as microbenchmarks. If you are working on an MPI+GPU application, don't miss this session to learn how NVIDIA GPUDirect and CUDA-aware MPI can give you improved performance, improved usability, and better maintainability of your code.
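
The practical difference CUDA-aware MPI makes can be shown in a few lines (an illustrative sketch, not tied to any particular implementation):

    // Sketch: staging through the host vs. CUDA-aware MPI.
    #include <mpi.h>
    #include <cuda_runtime.h>

    // Without CUDA-aware MPI: copy to a host buffer, then send.
    void send_staged(const float* d_buf, float* h_tmp, int n, int dst) {
        cudaMemcpy(h_tmp, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
        MPI_Send(h_tmp, n, MPI_FLOAT, dst, 0, MPI_COMM_WORLD);
    }

    // With CUDA-aware MPI: pass the device pointer directly. The library
    // detects it and, where supported, uses GPUDirect P2P/RDMA to avoid
    // the host copy entirely.
    void send_cuda_aware(const float* d_buf, int n, int dst) {
        MPI_Send(d_buf, n, MPI_FLOAT, dst, 0, MPI_COMM_WORLD);
    }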
 
Topics:
Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3047
 
 