GTC ON-DEMAND

 
Abstract:
Learn about advanced features in MVAPICH2 that accelerate HPC and AI on modern dense GPU systems. We'll talk about how MVAPICH2 supports MPI communication from GPU memory and optimizes it using features of the CUDA toolkit to deliver high performance across different GPU configurations. We'll examine recent advances in MVAPICH2 that support large-message collective operations and heterogeneous clusters with GPU and non-GPU nodes. We'll explain how we use the popular OSU micro-benchmark suite, and we'll provide examples from HPC and AI to demonstrate how developers can take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We'll also provide guidance on issues like processor affinity to GPUs and networks, which can significantly affect the performance of MPI applications using MVAPICH2.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9476
 
Abstract:

Learn about the current wave of advances in AI and HPC technologies to improve the performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in deep learning (DL) frameworks from an architectural and performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These frameworks typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training across multiple GPUs. We'll highlight the challenges Message Passing Interface (MPI) runtimes face in efficiently supporting DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and the MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.

 
Topics:
HPC and AI, Deep Learning & AI Frameworks
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9501
 
Abstract:
Learn about the latest developments in MVAPICH2, the high-performance Message Passing Interface (MPI) library over InfiniBand, iWARP, and RoCE, which simplify the task of porting MPI applications to HPC and supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit, providing optimized performance on different GPU node configurations. These optimizations are integrated transparently under the standard MPI API for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using the GPUDirect RDMA framework, MPI datatype processing using CUDA kernels, support for GPUDirect Async, support for heterogeneous clusters with GPU and non-GPU nodes, and more. We use the popular Ohio State University (OSU) micro-benchmark suite and example applications to demonstrate how developers can effectively take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We provide guidance on issues like processor affinity to GPUs and networks, which can significantly affect the performance of MPI applications that use MVAPICH2.
 
Topics:
HPC and Supercomputing, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8373
 
Abstract:
Learn about the latest developments in the MVAPICH2-GDR library that help MPI developers exploit maximum performance and scalability on HPC clusters with NVIDIA GPUs. We'll highlight multiple designs focusing on GPUDirect RDMA (GDR), GPUDirect Async, non-blocking collectives, unified memory support, and datatype processing that boost the performance of HPC applications. Furthermore, targeting emerging deep learning (DL) frameworks, we'll present novel designs and enhancements to the MVAPICH2-GDR library that accommodate the large-message and dense-GPU computing requirements of DL frameworks. Using a scheme co-designed between MVAPICH2-GDR and the Caffe workflow, we'll present OSU-Caffe, an MPI-based, distributed, and scalable DL framework. We'll also present performance and scalability numbers for OSU-Caffe across various system configurations and datasets.
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7356
 
 