GTC On-Demand

Abstract:
Mixed precision training of deep neural networks provides tremendous benefits: it requires half the storage and data movement of single-precision values and, starting with the Volta GPU's Tensor Cores, delivers up to 120 TFLOPS of math throughput, an 8x speedup over FP32. In this tutorial we'll first present the considerations and techniques for training with reduced precision, including master weights and automatic loss scaling. Afterward, we'll discuss real-world training in mixed precision with a particular focus on the PyTorch and TensorFlow frameworks.
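
For readers who want to try the techniques this abstract names, here is a minimal sketch of a mixed precision training loop in PyTorch. It uses the torch.cuda.amp API, which was added to PyTorch after this session (at the time, the same functionality lived in NVIDIA's Apex library), but it implements the same scheme the talk covers: FP32 master weights held by the optimizer, FP16 compute on Tensor Cores, and automatic (dynamic) loss scaling. The model and data below are toy stand-ins, and a CUDA GPU is assumed.

import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()   # toy model; parameters stay FP32 (the master weights)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# GradScaler implements automatic loss scaling: it multiplies the loss before
# backward() so small FP16 gradients don't flush to zero, unscales gradients
# before the optimizer step, and skips steps whose gradients overflowed.
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):                     # toy loop over random data
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad()
    # autocast runs eligible ops (e.g. matmuls on Tensor Cores) in FP16
    # while keeping precision-sensitive ops in FP32.
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), target)

    scaler.scale(loss).backward()       # backward pass on the scaled loss
    scaler.step(optimizer)              # unscale, check for inf/nan, then step
    scaler.update()                     # grow or shrink the scale factor dynamically

The "automatic" in automatic loss scaling is the last two calls: the scale factor is raised while gradients stay finite and cut back on overflow, so no per-model tuning is needed.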
 
Topics:
Deep Learning and AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9143
 
Abstract:

Mixed precision training of deep neural networks provides tremendous benefits: it requires half the storage and data movement of single-precision values and, starting with the Volta GPU's Tensor Cores, delivers up to 120 TFLOPS of math throughput, an 8x speedup over FP32. In this talk, we first present the considerations and techniques for training with reduced precision, including master weights and automatic loss scaling. Afterward, we discuss real-world training in mixed precision with a particular focus on the PyTorch and TensorFlow frameworks.

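A corresponding sketch on the TensorFlow side, using the tf.keras.mixed_precision API from recent TensorFlow releases; the session itself predates this API, when the same ideas were exposed through graph rewrites and manual loss scaling, so treat the names below as the modern form of the technique rather than what the talk demonstrated. The dtype policy keeps layer variables (the master weights) in float32 while computing in float16, and LossScaleOptimizer supplies the automatic loss scaling.

import tensorflow as tf

# Layers created under this policy compute in float16 but keep their
# variables, i.e. the master weights, in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1024,)),
    tf.keras.layers.Dense(1024, activation="relu"),
    # The output layer is forced to float32 so the softmax is numerically safe.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])

# LossScaleOptimizer wraps the base optimizer with dynamic loss scaling,
# the automatic-loss-scaling technique described in the abstract.
opt = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(learning_rate=1e-3))

model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

# Toy random data, just to show the loop runs end to end.
x = tf.random.normal((256, 1024))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=32, epochs=1)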
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8494