GTC ON-DEMAND

Abstract:
We'll describe NVIDIA's Automatic Mixed Precision (AMP) for PyTorch, a tool that enables mixed precision training for neural networks in just three lines of Python. Mixed precision training combines the memory savings and Tensor Core-accelerated throughput of FP16 (16-bit) arithmetic for compute-intensive operations with traditional FP32 arithmetic for a few selected operations. In practice, mixed precision delivers end-to-end speedups of 2x to 4x for many bellwether networks. We'll briefly review mixed precision benefits, concepts, and best practices, then walk through implementing AMP in several example models.
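As a rough illustration of why mixed precision keeps a few selected operations in FP32, the sketch below emulates FP16 and FP32 rounding with Python's standard `struct` module and accumulates the same small value many times at each precision. It is not AMP itself and does not use PyTorch; the helper names (`to_fp16`, `to_fp32`, `accumulate`) and the chosen value and count are assumptions made for this example only.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_fn):
    """Sum values, rounding the running total after every addition."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


# Hypothetical workload: 10,000 copies of a small FP16-representable value.
# The exact sum is roughly 10.
values = [to_fp16(0.001)] * 10_000

fp16_sum = accumulate(values, to_fp16)  # accumulator kept in FP16
fp32_sum = accumulate(values, to_fp32)  # accumulator kept in FP32

print("FP16 accumulator:", fp16_sum)
print("FP32 accumulator:", fp32_sum)
```

Under this emulation the FP16 running total stops growing once the increment falls below half the spacing between adjacent FP16 values, while the FP32 total stays close to the exact sum. Reductions of this kind are one class of operation that mixed precision schemes typically keep in FP32.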
 
Topics:
Deep Learning and AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9998
 
Abstract:

We'll demonstrate acceleration of a large, preexisting Fortran fluid dynamics solver using Kokkos, a C++ library that enables a single codebase to achieve high performance on multiple parallel architectures, including NVIDIA GPUs. We'll describe the complete process: identifying performance-critical physics subroutines, porting and optimizing these routines, integrating Kokkos C++ with the main Fortran code in a minimally invasive way, and tuning cluster-level performance. We'll compare the performance achieved when Kokkos uses NVIDIA Tesla K40 GPUs, Knights Corner Xeon Phi coprocessors, and Xeon CPUs. We'll also present some GPU-specific optimizations. For "trivially parallel" physics calculations, assigning one NVIDIA CUDA thread to each grid point may not be ideal. If a small team of threads works cooperatively on each grid point, performance can improve because each team has a larger amount of effective cache available to it.

 
Topics:
Computational Fluid Dynamics, Tools and Libraries, Computer Aided Engineering
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7148
 
Abstract:
Diblock copolymers possess fascinating self-assembly properties that can be leveraged for a variety of industrial applications, most notably nanolithography. However, such efforts are often impeded by the formation of metastable defect structures. We present a GPU-accelerated method to quantify the difficulty of defect removal, guiding experiments toward optimal polymer chemistry. We demonstrate that this problem is ideally suited to NVIDIA GPUs' massively parallel architecture.
 
Topics:
Computational Physics, Life & Material Science
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5308