GTC ON-DEMAND

Abstract:
Learn about techniques and solutions that bring GPU computing to the world of partitioned global address space (PGAS) models, especially with the emerging OpenSHMEM paradigm. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop parallel applications with dynamic and irregular communication patterns. However, the existing OpenSHMEM standards do not support direct communication on GPU memory. We'll discuss simple extensions to the OpenSHMEM model to address this issue. We'll also present challenges and solutions in designing NVIDIA CUDA-aware runtimes to support these extensions and optimize data movement using CUDA IPC and GPUDirect RDMA features. And we'll demonstrate the impact of these concepts on application performance.
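
To make the gap concrete, here is a minimal CUDA C sketch (an assumed illustration, not code from the session) of the staging that standard OpenSHMEM forces today: because the symmetric heap is host-resident, device data takes a detour through a host bounce buffer before shmem_putmem. The buffer size and neighbor-exchange pattern are invented for the example.

    #include <shmem.h>
    #include <cuda_runtime.h>

    int main(void) {
        shmem_init();
        int peer = (shmem_my_pe() + 1) % shmem_n_pes();
        size_t nbytes = 1 << 20;                     /* example size */

        float *d_buf;                                /* produced on the GPU */
        cudaMalloc((void **)&d_buf, nbytes);

        /* Standard OpenSHMEM: the symmetric heap is host-resident, so GPU
           data must be staged through a host bounce buffer before the put. */
        float *sym = (float *)shmem_malloc(nbytes);
        cudaMemcpy(sym, d_buf, nbytes, cudaMemcpyDeviceToHost);
        shmem_putmem(sym, sym, nbytes, peer);        /* into peer's heap */
        shmem_barrier_all();

        /* With the extensions discussed in this session, a CUDA-aware runtime
           could accept d_buf directly (CUDA IPC within a node, GPUDirect RDMA
           across nodes), and the cudaMemcpy above disappears. */

        shmem_free(sym);
        cudaFree(d_buf);
        shmem_finalize();
        return 0;
    }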
 
Topics:
HPC and Supercomputing, Tools & Libraries, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7324
 
Abstract:
Learn about recent developments in middleware design to boost the performance of GPU-based streaming applications. Several runtimes already support and optimize GPU communication using various NVIDIA CUDA features. Similarly, some runtimes use InfiniBand hardware multicast to boost broadcast performance for host-based communications. We'll focus on the challenges in combining and fully utilizing GPUDirect RDMA (GDR) and hardware InfiniBand multicast technologies in tandem to support a high-performance heterogeneous broadcast operation for streaming applications. Further, we'll present associated challenges and designs in supporting reliability for clusters with multi-HCA and multi-GPU configurations. Performance evaluations of the proposed designs on various system configurations will be presented and analyzed.
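
As a hedged illustration of the application-level view (an assumed example; the frame size and loop count are invented), the heterogeneous broadcast stays ordinary MPI: a CUDA-aware runtime accepts the device pointer, and the GDR-plus-InfiniBand-multicast machinery described here is chosen inside MPI_Bcast.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        size_t nbytes = 4 << 20;                 /* one streaming frame */
        unsigned char *d_frame;
        cudaMalloc((void **)&d_frame, nbytes);

        for (int step = 0; step < 100; ++step) {
            /* Root produces a frame on the GPU (kernel omitted); every rank
               then receives it. A CUDA-aware MPI accepts the device pointer,
               and the GDR + multicast path is selected inside the library. */
            MPI_Bcast(d_frame, (int)nbytes, MPI_BYTE, 0, MPI_COMM_WORLD);
            /* ... each rank consumes the frame on its GPU ... */
        }

        cudaFree(d_frame);
        MPI_Finalize();
        return 0;
    }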
 
Topics:
HPC and Supercomputing, Data Center & Cloud Infrastructure, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7345
 
Abstract:
Learn about the latest developments in the MVAPICH2-GDR library, which helps MPI developers exploit maximum performance and scalability on HPC clusters with NVIDIA GPUs. Multiple designs focusing on GPUDirect RDMA (GDR) Async, non-blocking collectives, and support for unified memory and datatype processing will be highlighted to boost the performance of HPC applications. Furthermore, targeting emerging deep learning frameworks, we'll present novel designs and enhancements to the MVAPICH2-GDR library that accommodate the large-message and dense GPU computing requirements of DL frameworks. Using a co-designed scheme between MVAPICH2-GDR and the Caffe workflow, we'll present OSU-Caffe, an MPI-based distributed and scalable DL framework. Performance and scalability numbers of OSU-Caffe for various system configurations and datasets will also be presented.
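
A minimal sketch, assuming a CUDA-aware MPI-3 library such as MVAPICH2-GDR, of the pattern distributed DL frameworks lean on: a non-blocking allreduce over GPU-resident gradients, overlapped with the next compute step. The gradient size is an invented placeholder, and this is an illustration of the technique, not OSU-Caffe's actual code.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        const int n = 1 << 22;                   /* gradient elements */
        float *d_grad;
        cudaMalloc((void **)&d_grad, n * sizeof(float));

        /* ... backward pass writes gradients into d_grad ... */

        MPI_Request req;
        MPI_Iallreduce(MPI_IN_PLACE, d_grad, n, MPI_FLOAT, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* ... overlap: launch compute for the next batch here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);       /* gradients now summed */

        cudaFree(d_grad);
        MPI_Finalize();
        return 0;
    }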
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7356
 
Abstract:
Learn how the MVAPICH2-GDR library enables support for different GPUDirect technologies to simplify the task of porting message passing interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2-GDR supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit. These optimizations are integrated transparently under the standard MPI API. Recent advances in MVAPICH2 include support for GDR Async, MPI-3 RMA using GPUDirect RDMA, use of the fast GDRCOPY path, non-blocking collectives using GDR and CORE-Direct, and much more. Performance results with micro-benchmarks and applications will be presented using MPI and CUDA/OpenACC. The performance impact of application co-design using MVAPICH2-GDR will also be presented.
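
A minimal sketch of what "MPI communication directly from GPU device memory" means at the application level (an assumed example, not from the session): standard MPI_Send/MPI_Recv on cudaMalloc'd buffers. With MVAPICH2-GDR this is typically switched on at run time (e.g., MV2_USE_CUDA=1); GDRCOPY, pipelining, and GPUDirect RDMA choices stay inside the library.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;
        double *d_buf;
        cudaMalloc((void **)&d_buf, n * sizeof(double));

        if (rank == 0) {
            /* ... kernel fills d_buf ... */
            MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }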
 
Topics:
HPC and Supercomputing, Tools & Libraries, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6411
 
Abstract:
Learn about techniques and solutions that bring GPU computing to the world of partitioned global address space (PGAS) models, especially with the emerging OpenSHMEM paradigm. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop parallel applications with dynamic and irregular communication patterns. However, the existing OpenSHMEM standards do not support direct communication on GPU memory. This talk discusses simple extensions to the OpenSHMEM model to address this issue. Challenges and solutions in designing CUDA-aware runtimes to support these extensions and to optimize data movement using CUDA IPC and GPUDirect RDMA features are presented. The impact of these concepts on application performance is demonstrated.
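
For the intra-node piece mentioned above, here is a minimal sketch of the CUDA IPC mechanism such a runtime can build on. The function names and the handle-exchange channel are assumptions for illustration; only the cudaIpc* calls are the real CUDA runtime API.

    #include <cuda_runtime.h>

    /* Exporting process: publish a handle for a device allocation
       (export_buffer is a hypothetical helper name). */
    cudaIpcMemHandle_t export_buffer(void *d_buf) {
        cudaIpcMemHandle_t handle;
        cudaIpcGetMemHandle(&handle, d_buf);   /* handle can be sent to peers */
        return handle;
    }

    /* Importing process on the same node: map the peer's buffer and put
       into it GPU-to-GPU, with no host staging. */
    void put_into_peer(cudaIpcMemHandle_t handle, const void *d_src,
                       size_t nbytes) {
        void *d_peer;
        cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);
        cudaMemcpy(d_peer, d_src, nbytes, cudaMemcpyDeviceToDevice);
        cudaIpcCloseMemHandle(d_peer);
    }

Across nodes, the analogous direct path is GPUDirect RDMA, where the network adapter reads and writes GPU memory without a host copy.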
 
Topics:
HPC and Supercomputing, Programming Languages, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6418
 
Abstract:
Learn about recent developments in middleware design to boost the performance of GPU-based streaming applications. Several runtimes already support and optimize GPU communication using various CUDA features. Similarly, some runtimes use InfiniBand hardware multicast to boost broadcast performance for host-based communications. We'll focus on challenges in combining and fully utilizing GPUDirect RDMA and hardware multicast technologies in tandem to support a high-performance broadcast operation for streaming applications. Further, we'll present associated challenges and designs for clusters with multi-HCA and multi-GPU configurations. Performance evaluation of the proposed designs for MPI_Bcast operations will be presented and analyzed.
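
A minimal sketch, under assumed double buffering and frame sizes, of how a streaming application can pipeline frames with non-blocking broadcast so that communication of one frame overlaps production of the next. The MPI-3 MPI_Ibcast call is standard; the buffer layout and loop bounds are invented for the example.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        const size_t nbytes = 2 << 20;
        unsigned char *d_frame[2];               /* double buffering */
        cudaMalloc((void **)&d_frame[0], nbytes);
        cudaMalloc((void **)&d_frame[1], nbytes);

        MPI_Request req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };
        for (int k = 0; k < 100; ++k) {
            int slot = k % 2;
            MPI_Wait(&req[slot], MPI_STATUS_IGNORE); /* slot free again */
            /* Root refills d_frame[slot] here; receivers consume a slot
               once its broadcast completes. */
            MPI_Ibcast(d_frame[slot], (int)nbytes, MPI_BYTE, 0,
                       MPI_COMM_WORLD, &req[slot]);
        }
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

        cudaFree(d_frame[0]);
        cudaFree(d_frame[1]);
        MPI_Finalize();
        return 0;
    }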
 
Topics:
HPC and Supercomputing, Tools & Libraries, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6460
 
 