GTC ON-DEMAND

 
Abstract:
We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects, including PCIe, NVLink, and InfiniBand. NVSHMEM makes it possible to initiate communication from within a CUDA kernel. As a result, an application no longer has to place CUDA kernel boundaries wherever it needs to communicate. Less synchronization with the CPU improves strong-scaling efficiency, and the ability to initiate fine-grained communication from inside a CUDA kernel helps overlap communication with computation. QUDA is a popular GPU-enabled lattice QCD library used by packages such as Chroma and MILC, and NVSHMEM enables better strong scaling in QUDA. Beyond latency-bound applications like QUDA, NVSHMEM can also improve performance and reduce code complexity for bandwidth-bound codes such as FFT and for codes with dynamic communication patterns such as breadth-first search.
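To make the one-sided PGAS model concrete, here is a CPU-only toy sketch in Python. It is not NVSHMEM code: a list of per-PE buffers stands in for the symmetric heap, and a plain list write stands in for a remote put. The names `SymmetricHeap` and `ring_shift` are hypothetical. In NVSHMEM itself, the symmetric buffer would come from `nvshmem_malloc`, the put would be something like `nvshmem_int_p(destination, mype, peer)` issued from inside a CUDA kernel, and a barrier such as `nvshmem_barrier_all()` would separate the puts from the reads.

```python
class SymmetricHeap:
    """Toy model of a PGAS symmetric heap (hypothetical, not NVSHMEM's API):
    every PE owns a buffer of the same size, so one offset names the
    'same' variable on every PE."""

    def __init__(self, npes, size):
        self.buffers = [[0] * size for _ in range(npes)]

    def put(self, offset, value, pe):
        """One-sided put: the caller writes directly into PE `pe`'s buffer;
        the target PE takes no action (there is no matching receive)."""
        self.buffers[pe][offset] = value

    def local(self, offset, pe):
        """Read PE `pe`'s own copy of the symmetric variable."""
        return self.buffers[pe][offset]


def ring_shift(npes):
    """Every PE writes its rank into its right neighbour's copy of `destination`."""
    heap = SymmetricHeap(npes, size=1)
    destination = 0  # offset of the symmetric variable
    for mype in range(npes):  # sequential stand-in for all PEs running concurrently
        peer = (mype + 1) % npes
        heap.put(destination, mype, peer)
    # A real program needs a barrier here so that every put is complete and
    # visible before any PE reads; the sequential loop already ensures that.
    return [heap.local(destination, pe) for pe in range(npes)]


print(ring_shift(4))  # [3, 0, 1, 2]
```

Because only the initiating PE acts, nothing has to be posted on the target side. This one-sided style is what makes it practical to issue such puts from individual GPU threads rather than from host code.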
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9677
 
Abstract:
This talk will introduce two programming models, OpenSHMEM and SharP, that address the programming challenges of HPC systems with multiple GPUs per node, high-performance networks, and large amounts of hierarchical, heterogeneous memory. SharP uses a distributed data-structure approach to abstract the memory and provide uniform interfaces for data abstraction, locality, sharing, and resiliency across these memory systems. OpenSHMEM is a well-established, library-based PGAS programming model for HPC systems. We show how NVSHMEM, an implementation of OpenSHMEM, enables communication from within CUDA kernels and realizes the OpenSHMEM model on GPU-based HPC systems. Together, these two complementary programming models make it possible to program emerging architectures for Big-Compute and Big-Data applications. After the introduction, we will present experimental results for a wide variety of applications, including QMCPack, HPGMG, CoMD, and Memcached, that demonstrate the advantages of these programming models.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8135
 
Abstract:
Reducing the apparent Amdahl's-law serial fraction caused by synchronizing with the CPU for communication is critical for strong scaling of applications on GPU clusters. GPUs are designed to maximize throughput and have enough state and parallelism to hide long latencies to global memory, so it is important to exploit these inherent capabilities of the GPU and the CUDA programming model when handling communication between GPUs. NVSHMEM provides a Partitioned Global Address Space (PGAS) that spans the memory of multiple GPUs, together with an API for fine-grained GPU-to-GPU data movement and synchronization from within a CUDA kernel. NVSHMEM also provides a CPU-side API for GPU-to-GPU data movement, which gives applications an incremental path for moving to NVSHMEM; CPU-side communication can be issued in stream order, like other CUDA operations. NVSHMEM implements the OpenSHMEM programming model, which is of great interest to government agencies and national labs. We'll give an overview of the capabilities, API, and semantics of NVSHMEM, and use examples from a varied set of applications (HPGMG, Multi-GPU Transpose, Graph500, etc.) to demonstrate its use and benefits.
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8595
 
Abstract:

Do you need to compute larger problems, or compute faster, than a single GPU allows? Learn how to scale your application to multiple GPUs, how to use the different multi-GPU programming models available, and what their individual advantages are. All programming models will be introduced using the same example, which applies a domain decomposition strategy.
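The abstract does not say which example the talk uses. As an assumption for illustration only, this CPU-only Python sketch applies domain decomposition to a 1D Jacobi stencil. Each subdomain owns a contiguous slice of the grid plus one halo cell per side, and `exchange_halos` copies edge values between neighbours. In a real multi-GPU code each subdomain would live on its own GPU, and that copy is the step a multi-GPU programming model (for example MPI or NVSHMEM) would perform. All function names are hypothetical.

```python
def jacobi_step(u):
    """One Jacobi sweep on a 1D grid; u[0] and u[-1] are held fixed."""
    return [u[0]] + [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)] + [u[-1]]


def decompose(u, nparts):
    """Split the interior of u into nparts contiguous subdomains.
    Each subdomain is [left halo] + owned cells + [right halo]."""
    interior = u[1:-1]
    n = len(interior)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    # Owned cells of part p have global indices bounds[p]+1 .. bounds[p+1],
    # so its halos start out as the global values just outside that range.
    return [[u[bounds[p]]] + interior[bounds[p]:bounds[p + 1]] + [u[bounds[p + 1] + 1]]
            for p in range(nparts)]


def exchange_halos(parts):
    """Copy edge values into neighbouring halos. The outermost halos hold the
    fixed global boundary values and are never overwritten."""
    for p in range(len(parts) - 1):
        parts[p][-1] = parts[p + 1][1]  # right halo <- neighbour's first owned cell
        parts[p + 1][0] = parts[p][-2]  # left halo  <- neighbour's last owned cell


def run_decomposed(u, nparts, steps):
    """Run `steps` Jacobi sweeps on nparts subdomains and reassemble the grid."""
    parts = decompose(u, nparts)
    for _ in range(steps):
        exchange_halos(parts)                          # communication phase
        parts = [jacobi_step(part) for part in parts]  # independent local compute
    # Reassemble: fixed boundaries plus every subdomain's owned cells.
    return [u[0]] + [x for part in parts for x in part[1:-1]] + [u[-1]]


if __name__ == "__main__":
    grid = [1.0] + [0.0] * 10 + [3.0]
    serial = grid
    for _ in range(25):
        serial = jacobi_step(serial)
    print(run_decomposed(grid, 3, 25) == serial)  # True
```

Each sweep reads only a subdomain's own cells and its freshly exchanged halos, so the decomposed run reproduces the single-domain result exactly. The sketch assumes at least one grid point per subdomain.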

 
Topics:
Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7142
 
Abstract:
We'll present an overview of the NVSHMEM multi-GPU programming model. NVSHMEM is an implementation of the OpenSHMEM standard for GPUs. By providing fine-grained communication primitives between GPU threads, NVSHMEM improves communication latencies and can greatly reduce the complexity usually associated with multi-GPU programming. Two application studies illustrate the utility of NVSHMEM: CoMD, a molecular dynamics mini-application, and HPGMG, a geometric multigrid solver.
 
Topics:
Programming Languages, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6378
 
Abstract:

The next major advancement of GPUDirect™ technology is here. GPUDirect RDMA provides direct GPU-to-GPU communication across the network, completely bypassing the CPU and significantly reducing communication latency between remote GPUs. This webinar will cover the latest schedule for GPUDirect RDMA, scaling and optimization techniques for maximizing application performance with MVAPICH2, and the latest advancements in CUDA.

 
Topics:
Tools & Libraries
Type:
Webinar
Event:
GTC Webinars
Year:
2013
Session ID:
GTCE046
 
 