GTC On-Demand

Abstract:
To obtain peak performance and energy efficiency on modern deep learning accelerators such as GPUs and TPUs, it is critical to use half-precision arithmetic. Compared to single precision, half precision reduces memory traffic, making 2x better use of the available DRAM bandwidth. Smaller memory footprints for half-precision layer activations also allow larger batch sizes and deeper network architectures to fit in the accelerator's memory during training. Finally, architectural features such as Volta's Tensor Cores boost the raw math throughput of half-precision operations by up to 8x compared to single precision. We describe two new streamlined implementations of mixed-precision training being built into TensorFlow. The first is provided through extensions to the tf.keras API and will be available in the coming months. The second is based on a Grappler graph optimization pass and works with TF 1.x graph-based models as well as future TensorFlow 2.0 models that use tf.function decorators. Each method is enabled with a one- or two-line tweak to the training script. Empirical results show that accuracy matches that of a model trained in single precision, while the training speedup is similar to what can be achieved with hand-coded mixed-precision strategies.
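As a rough illustration of the "one- or two-line tweak" described in the abstract, the sketch below shows how the two methods might be enabled. The specific calls (tf.keras.mixed_precision.set_global_policy and tf.train.experimental.enable_mixed_precision_graph_rewrite) are taken from later TensorFlow releases and are assumed to resemble, but may not exactly match, the APIs demonstrated in the talk.

import tensorflow as tf

# Method 1 (tf.keras API): set a global mixed-precision policy so layer
# computations run in float16 while variables are kept in float32. This call
# is from later TF 2.x releases and is assumed to correspond to the tf.keras
# extension described in the talk.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(1024, activation="relu"),
    # Keep the final layer in float32 so the softmax stays numerically stable.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Method 2 (Grappler graph rewrite, for TF 1.x graphs and tf.function models):
# wrapping the optimizer lets a graph optimization pass cast eligible ops to
# float16 and insert automatic loss scaling. Use one method or the other,
# not both.
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(
#     tf.keras.optimizers.SGD(learning_rate=0.01))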
 
Topics:
AI and DL Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91029