GTC ON-DEMAND
Abstract:
We'll share what we learned by running the QUDA open source library for lattice quantum chromodynamics (LQCD) on modern GPU systems, which provide myriad hardware and software techniques to improve strong scaling. We ran QUDA, which provides GPU acceleration for LQCD applications like MILC and Chroma, on six generations of GPUs with various network and node configurations, including IBM POWER9- and x86-based systems and NVLink- and NVSwitch-based systems. Based on those experiences, we'll discuss best practices for scaling to hundreds and even thousands of GPUs. We'll also cover peer-to-peer memory access, GPUDirect RDMA, and NVSHMEM, as well as techniques such as auto-tuning kernel launch configurations.
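Of the transport mechanisms this abstract names, peer-to-peer memory access is the simplest to illustrate. Below is a minimal sketch using the standard CUDA runtime calls; the halo-exchange context is only implied, and the device indices are illustrative.

```cpp
// Sketch: enabling peer-to-peer (P2P) access between two GPUs using the
// CUDA runtime API. Once enabled, kernels on device 0 can dereference
// pointers allocated on device 1 (e.g. a neighbor's lattice halo), and
// copies between the devices can travel directly over NVLink/NVSwitch.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int canAccess = 0;
    // Ask whether device 0 can directly read/write device 1's memory.
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // second argument (flags) must be 0
        printf("P2P access from device 0 to device 1 enabled\n");
    } else {
        printf("P2P access not supported between devices 0 and 1\n");
    }
    return 0;
}
```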
 
Topics:
HPC and Supercomputing, Performance Optimization, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9708
 
Abstract:

Less code, more performance! Runtime compilation with NVRTC offers many potential benefits to new and existing codes, but also presents challenges when it comes to implementation. To help solve this dilemma, we've developed a small C++ library called "Jitify" that hides the complexities of runtime compilation behind a simple, high-level interface. Jitify takes care of issues like kernel caching, template instantiation, type reflection, and compilation of host code for the device. It also provides a convenient parallel_for function and lambda wrapper that enables dynamic runtime selection of host or device execution. Since source code passed to NVRTC does not require CUDA-specific annotations, porting a large C++ code to CUDA using Jitify can be as simple as replacing a for loop with Jitify's parallel_for construct. We'll present some examples of Jitify in action, demonstrating how it enables better code generation, faster compilation times, and rapid code porting.

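As a sketch of the kind of usage the abstract describes, the snippet below follows the pattern of Jitify's published examples: the kernel source is a plain string, NVRTC compiles it at runtime, and Jitify handles caching, template instantiation, and type reflection. The kernel name and body here are illustrative.

```cpp
// Sketch of runtime compilation with Jitify (requires CUDA and jitify.hpp).
#include <cuda_runtime.h>
#include "jitify.hpp"

const char* const program_source =
    "my_program\n"                        // program name tag expected by Jitify
    "template <typename T>\n"
    "__global__ void scale(T* data, T factor) {\n"
    "  data[threadIdx.x] *= factor;\n"
    "}\n";

int main() {
    static jitify::JitCache kernel_cache;  // caches compiled kernels across calls
    jitify::Program program = kernel_cache.program(program_source);

    float h_data[32];
    for (int i = 0; i < 32; ++i) h_data[i] = 1.0f;
    float* d_data;
    cudaMalloc(&d_data, sizeof(h_data));
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    // Instantiate scale<float> via type reflection, then launch it.
    program.kernel("scale")
        .instantiate(jitify::reflection::Type<float>())
        .configure(dim3(1), dim3(32))
        .launch(d_data, 2.0f);

    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return 0;
}
```

Note that the kernel string contains ordinary templated C++ with a `__global__` qualifier but no host-side boilerplate; the template parameter is resolved at runtime through `jitify::reflection::Type<float>()`.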
 
Topics:
Tools & Libraries, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7716
 
Abstract:
Learn how combining GPUs with advanced multi-grid solvers is revolutionizing the study of lattice quantum chromodynamics (LQCD). LQCD is a computational tool for probing nuclear and particle physics; however, it can require thousands of GPUs working in tandem for months due to the computationally prohibitive linear solver. Using the QUDA framework, we describe how the solver can be accelerated using an adaptive multi-grid method. The optimization techniques employed are: fine-grained parallelization, mixed precision, communication-reducing solvers, and reformulation of the algorithm to allow the CPU and GPU to work in parallel. Using this multitude of algorithmic innovations, we demonstrate that a 5X speedup can be realized over present state-of-the-art methods using GPUs.
 
Topics:
Computational Physics, Algorithms & Numerical Techniques, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6667
 