GTC ON-DEMAND

HPC and AI
Tensor Core Programmability and Profiling for AI and HPC Applications
Abstract:
Tensor Cores, introduced with the Volta GPU architecture, deliver up to 125 TFLOPS of throughput by mixing half- and single-precision floating-point arithmetic. We'll show how to take advantage of Tensor Cores in deep learning and HPC applications. We'll discuss how to use mixed precision to reduce memory use during deep learning training and deployment, a technique that makes larger models possible. We'll also demonstrate how to program matrix multiply-accumulate operations on Tensor Cores for HPC applications. Finally, using NVIDIA Nsight, we'll profile an application to see how it uses Tensor Cores.
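The sketch below illustrates the numerics behind mixed-precision matrix multiply-accumulate. It is a CPU emulation in Python (standard library only), not Tensor Core code and not material from the session. It computes D = A × B + C with A and B rounded to FP16 and the accumulator C, D held in FP32. The function names `to_fp16`, `to_fp32`, `mma_fp16_fp32`, and `tensor_bytes` are hypothetical. The sketch assumes each partial sum is rounded to FP32 in sequence, whereas the hardware's internal summation order may differ. On the GPU itself, this operation is exposed through interfaces such as CUDA's WMMA API (`nvcuda::wmma`) and libraries like cuBLAS.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_fp16_fp32(a, b, c):
    """Emulate D = A x B + C with FP16 inputs (A, B) and an FP32 accumulator (C, D).

    a is m x k, b is k x n, c is m x n (lists of lists of floats).
    Assumption: partial sums are rounded to FP32 one at a time, in order.
    """
    m, k, n = len(a), len(a[0]), len(b[0])
    d = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = to_fp32(c[i][j])
            for p in range(k):
                # A product of two FP16 values (11-bit significands) fits
                # exactly in FP32's 24-bit significand, so no rounding here.
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                acc = to_fp32(acc + prod)  # accumulate in FP32
            row.append(acc)
        d.append(row)
    return d

def tensor_bytes(num_elements: int, fmt: str) -> int:
    """Storage needed for num_elements values in a struct format ('<e' = FP16, '<f' = FP32)."""
    return num_elements * struct.calcsize(fmt)

if __name__ == "__main__":
    ones = [[1.0] * 4 for _ in range(4)]
    zeros = [[0.0] * 4 for _ in range(4)]
    print(mma_fp16_fp32(ones, ones, zeros))  # every entry is 4.0

    # 2048 + 1 is exact with an FP32 accumulator...
    print(mma_fp16_fp32([[1.0, 0.0, 0.0, 0.0]], [[1.0], [0.0], [0.0], [0.0]], [[2048.0]]))
    # ...but 2049 is not representable in FP16, so it rounds back to 2048.
    print(to_fp16(2049.0))

    # Storing a tensor in FP16 instead of FP32 halves its memory footprint.
    print(tensor_bytes(1_000_000, "<e"), tensor_bytes(1_000_000, "<f"))
```

The 2048 + 1 case shows why accumulating in single precision matters: an FP16 accumulator would silently drop the increment, while FP16 storage for inputs still halves memory use, as the `tensor_bytes` comparison shows.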
 
Topics:
HPC and AI, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9542