GTC ON-DEMAND
Abstract:
Tensor Cores, introduced with the Volta GPU architecture, achieve up to 125 TFlops of throughput by mixing half- and single-precision floating-point operations. We'll show how to take advantage of Tensor Cores for applications in deep learning and HPC. We will discuss how to use mixed precision to decrease memory use during deep learning training and deployment, a technique that allows for larger model sizes. We will also demonstrate programming matrix-multiply-and-accumulate on Tensor Cores for HPC applications. Using NVIDIA Nsight, we will profile an application to understand its use of Tensor Cores.
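The memory saving the abstract refers to comes from storing values in half precision (2 bytes) rather than single precision (4 bytes). As a minimal, hypothetical sketch (plain Python, not the session's code), the standard-library struct module can show the 2x difference directly, since its "e" format is IEEE 754 half precision and "f" is single precision:

```python
import struct

# Hypothetical illustration: pack the same four values as fp16 and fp32.
# Half precision uses 2 bytes per value, single precision uses 4, so
# storing weights and activations in fp16 halves memory use -- the
# property mixed-precision training exploits to fit larger models.
values = [0.5, 1.25, -2.0, 3.75]  # all exactly representable in fp16

half = struct.pack("4e", *values)    # "e" = IEEE 754 half precision
single = struct.pack("4f", *values)  # "f" = IEEE 754 single precision

print(len(half), len(single))  # 8 16
```

On Tensor Cores the same idea appears as fp16 inputs multiplied and accumulated into an fp32 result, which keeps the storage saving while limiting rounding error in the accumulation.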
 
Topics:
HPC and AI, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9542
Abstract:
In this session you'll hear about one team's experience preparing an adaptive mesh refinement library -- and a fluid dynamics code based on it -- for Summit, the IBM POWER9 and NVIDIA Volta system at Oak Ridge National Laboratory, where multiple GPUs are connected via NVLink to each other and to the CPUs. It was simple to compile and run on the OpenPOWER architecture, and to offload to the GPUs with CUDA Fortran, with little architecture-specific code. Initial results with POWER8 and P100 have shown excellent CPU and GPU performance and good multi-node scaling for an astrophysics mini-app that was difficult to run effectively on prior GPU architectures. We will also discuss our experiences porting other modules in our multi-physics codes, and preliminary results on the POWER9 and V100 platform.
 
Topics:
Astronomy & Astrophysics, Tools & Libraries, Computational Fluid Dynamics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8397
 