GTC ON-DEMAND

 

Abstract:
Learn how we're bringing Gromacs up to speed with the latest cutting-edge multi-GPU technology. Gromacs, a simulation package for biomolecular systems, is one of the most widely used HPC applications globally. It already benefits from GPU acceleration, allowing fast simulation of large and complex systems. However, as GPUs become more powerful and increasingly sophisticated multi-GPU systems become available, Gromacs must adapt to fully exploit the performance on offer. We will describe work to port all significant remaining computational kernels to the GPU, and to perform the required inter-GPU communication using peer-to-peer memory copies, so that the GPU is exploited throughout and repeated PCIe transfers are avoided. We will present performance results showing the impact of our developments, and also describe the Gromacs performance model we've created to guide our work.
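The peer-to-peer copies mentioned in this abstract can be sketched with the CUDA runtime API. This is a minimal illustration, not Gromacs code; the device IDs and buffer size are hypothetical, and it requires a multi-GPU system to run:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Sketch: copy a buffer directly between two GPUs, bypassing host
// memory and repeated PCIe staging where the hardware supports peer
// access (e.g. over NVLink).
int main(void) {
    const int src = 0, dst = 1;     // hypothetical device IDs
    const size_t bytes = 1 << 20;   // hypothetical 1 MiB exchange buffer

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, dst, src);
    if (!canAccess) {
        printf("peer access not supported between devices %d and %d\n",
               src, dst);
        return 0;
    }

    float *bufSrc, *bufDst;
    cudaSetDevice(src);
    cudaMalloc(&bufSrc, bytes);
    cudaSetDevice(dst);
    cudaDeviceEnablePeerAccess(src, 0);  // let dst map src's memory
    cudaMalloc(&bufDst, bytes);

    // Direct GPU-to-GPU copy: no intermediate host buffer involved.
    cudaMemcpyPeer(bufDst, dst, bufSrc, src, bytes);
    cudaDeviceSynchronize();

    cudaFree(bufDst);
    cudaSetDevice(src);
    cudaFree(bufSrc);
    return 0;
}
```

In a real simulation loop the same pattern would be driven asynchronously (e.g. `cudaMemcpyPeerAsync` on a stream) so copies overlap with kernel execution.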
 
Topics:
Computational Biology & Chemistry, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9270
 
Abstract:
In this talk, attendees will learn how key algorithms for Numerical Weather Prediction were ported to the latest GPU technology, and the substantial benefits gained from doing so. We will showcase the power of individual Volta GPUs and the impressive performance of the cutting-edge DGX-2 server, with multiple GPUs connected by a high-speed interconnect.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8195
 
Abstract:
We'll take you on a journey through enabling applications for GPUs; interoperability of different languages (including Fortran, OpenACC, C, and CUDA); CUDA library interfacing; data management, movement, and layout tuning; kernel optimization; tool usage; multi-GPU data transfer; and performance modeling. We'll show how careful optimizations can have a dramatic effect and push application performance towards the maximum possible on the hardware. We'll describe tuning of multi-GPU communications, including efficient exploitation of high-bandwidth NVLink hardware. The applications used in this study are from the domain of numerical weather prediction, and also feature in the ESCAPE European collaborative project, but we'll present widely relevant techniques in a generic and easily transferable way.
 
Topics:
Climate, Weather & Ocean Modeling, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8190
 
Abstract:
"Developing your application for GPUs destroys portability to other platforms." We'll debunk this and other myths as we describe how we have solved the performance-portability challenge, allowing two separate scientific applications (which simulate complex fluids and fundamental particle physics, respectively) to effectively utilize machines such as the world's largest GPU-accelerated supercomputer, Titan at Oak Ridge, while remaining completely portable to multi-core or many-core CPU-based systems when GPUs are unavailable. The key ingredient is a new simplistic abstraction layer called targetDP, which targets data parallel hardware in a platform-agnostic but performance-portable manner.
"Developing your application for GPUs destroys portability to other platforms." We'll debunk this and other myths as we describe how we have solved the performance-portability challenge, allowing two separate scientific applications (which simulate complex fluids and fundamental particle physics, respectively) to effectively utilize machines such as the world's largest GPU-accelerated supercomputer, Titan at Oak Ridge, while remaining completely portable to multi-core or many-core CPU-based systems when GPUs are unavailable. The key ingredient is a new simplistic abstraction layer called targetDP, which targets data parallel hardware in a platform-agnostic but performance-portable manner.  Back
 
Topics:
HPC and Supercomputing, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6175
 
Abstract:
Discover how to adapt a real, complex application such that it can efficiently utilize thousands of GPUs in parallel. We describe our successes in combining CUDA with MPI to simulate a wide variety of complex fluids of key importance to everyday life. We are careful to present our work in a generalizable way, such that others can learn from our experience, follow our methodology, and even re-use our highly efficient communication library. We detail our efforts to maximize both performance and maintainability, noting that we support both CPU and GPU versions (where the latter is 3.5 to 5 times faster comparing equal numbers of GPUs and fully utilized CPUs). We present our work to carefully schedule and overlap lattice-based operations and halo-exchange communication mechanisms, allowing excellent scaling to at least 8,192 GPUs in parallel on the Titan supercomputer.
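The overlap of lattice operations with halo exchange described here follows a common CUDA+MPI pattern: update interior sites (which need no neighbour data) on one stream while boundary data is packed and exchanged on another. A simplified sketch, with hypothetical kernel names and a 1D decomposition; it assumes a CUDA-aware MPI that accepts device pointers:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for the application's real ones.
__global__ void update_interior(double *field, int n);
__global__ void update_boundary(double *field, const double *halo, int n);
__global__ void pack_halo(const double *field, double *halo, int n);

void timestep(double *field, double *halo_send, double *halo_recv,
              int n_interior, int n_halo, int rank_up, int rank_down,
              cudaStream_t compute, cudaStream_t comm) {
    const int block = 256;

    // 1. Interior update needs no neighbour data, so it can run on
    //    the compute stream while halos are in flight.
    update_interior<<<(n_interior + block - 1) / block, block,
                      0, compute>>>(field, n_interior);

    // 2. Pack boundary sites on the comm stream, then exchange with
    //    neighbours (device pointers: requires CUDA-aware MPI).
    pack_halo<<<(n_halo + block - 1) / block, block,
                0, comm>>>(field, halo_send, n_halo);
    cudaStreamSynchronize(comm);
    MPI_Sendrecv(halo_send, n_halo, MPI_DOUBLE, rank_up,   0,
                 halo_recv, n_halo, MPI_DOUBLE, rank_down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // 3. Update boundary sites once halos arrive; join both streams.
    update_boundary<<<(n_halo + block - 1) / block, block,
                      0, comm>>>(field, halo_recv, n_halo);
    cudaDeviceSynchronize();
}
```

The win comes from step 1 and step 2 proceeding concurrently; communication cost is hidden as long as the interior update takes longer than the exchange.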
 
Topics:
HPC and Supercomputing, Computational Fluid Dynamics, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4285
 
Abstract:

Discover how to scale scientific applications to thousands of GPUs in parallel. We will demonstrate our techniques using two codes representative of a wide spectrum of programming methods. The Ludwig lattice Boltzmann package, capable of simulating extremely complex fluid dynamics models, combines C, MPI, and CUDA. The Himeno three-dimensional Poisson equation solver benchmark combines Fortran (using the new coarray feature for communication) with prototype OpenMP accelerator directives (a promising new high-productivity GPU programming method). We will present performance results using the cutting-edge massively parallel Cray XK6 hybrid supercomputer featuring the latest NVIDIA Tesla M2090 GPUs.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2286
 
 
Topics:
Computational Fluid Dynamics
Type:
Webinar
Event:
GTC Webinars
Year:
2012
Session ID:
GTCE018
 
 