GTC ON-DEMAND

Abstract:
We'll discuss OpenSeq2Seq, a TensorFlow-based toolkit for training deep learning models optimized for NVIDIA GPUs. The main features of our toolkit are ease of use, modularity, and support for fast distributed and mixed-precision training. OpenSeq2Seq provides a large set of state-of-the-art models and building blocks for neural machine translation (GNMT, Transformer, ConvS2S, etc.), automatic speech recognition (DeepSpeech2, Wave2Letter, etc.), speech synthesis (Tacotron2, etc.), and language modeling. All models have been optimized for mixed-precision training with GPU Tensor Cores, and they achieve a 1.5-3x training speed-up compared to float32.
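As a hedged illustration of how the mixed-precision path described above is typically switched on: OpenSeq2Seq is driven by Python config files, and a fragment enabling mixed precision might look roughly like the sketch below. The key names and values are assumptions about a typical config rather than details from this session, and may differ between toolkit versions.

    # Hypothetical fragment of an OpenSeq2Seq-style Python config file.
    # Key names are assumptions; check the toolkit's documentation for the
    # exact options supported by your version.
    base_params = {
        "num_gpus": 8,              # data-parallel training across GPUs
        "batch_size_per_gpu": 32,
        "dtype": "mixed",           # float16 compute with float32 master weights
        "loss_scaling": "Backoff",  # dynamic loss scaling against under/overflow
        "optimizer": "Adam",
        "logdir": "experiments/transformer_mixed",
    }

Training is then launched through the toolkit's command-line entry point with a config file like this one.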
 
Topics: Speech & Language Processing, AI & Deep Learning Research
Type: Talk
Event: GTC Silicon Valley
Year: 2019
Session ID: S9187
 
Abstract:

OpenSeq2Seq is an open-source, TensorFlow-based toolkit that supports a wide range of off-the-shelf models for neural machine translation (GNMT, Transformer, ConvS2S), speech recognition (Wave2Letter, DeepSpeech2), speech synthesis (Tacotron 2), language modeling, and transfer learning for NLP tasks. OpenSeq2Seq is optimized for the latest GPUs and supports multi-GPU and mixed-precision training. Benchmarks on machine translation and speech recognition tasks show that models built with OpenSeq2Seq deliver state-of-the-art results with 1.5-3x faster training.

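The multi-GPU support mentioned above follows the usual data-parallel pattern: each GPU computes gradients on its own slice of the batch, the gradients are averaged across devices, and every replica applies the same update. The NumPy sketch below illustrates that pattern only; it is not OpenSeq2Seq code, and the linear model and learning rate are arbitrary stand-ins.

    # Pure-NumPy illustration of data-parallel training: one gradient per
    # "GPU" shard, averaged (the role an allreduce plays across devices),
    # then the same update applied everywhere.
    import numpy as np

    def shard_gradient(w, x, y):
        # Mean-squared-error gradient for a linear model on one shard.
        return x.T @ (x @ w - y) / len(y)

    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 1))
    shards = [(rng.standard_normal((8, 4)), rng.standard_normal((8, 1)))
              for _ in range(4)]

    grads = [shard_gradient(w, x, y) for x, y in shards]  # one gradient per "GPU"
    avg_grad = np.mean(grads, axis=0)                     # stand-in for allreduce
    w -= 0.01 * avg_grad                                  # identical update on every replica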
 
Topics: Artificial Intelligence and Deep Learning
Type: Talk
Event: GTC Israel
Year: 2018
Session ID: SIL8152
 
Abstract:

We'll describe training of very deep networks with mixed-precision float ("float16") using Volta Tensor Cores. Float16 has two major potential benefits: higher training speed and a reduced memory footprint. But float16 has a smaller numerical range than regular single-precision float, which can result in overflow or underflow ("vanishing gradient") during training. We'll describe a simple rescaling mechanism that solves these potential issues. With this rescaling algorithm, we successfully used mixed-precision training for networks such as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.

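The rescaling mechanism referred to above is, in essence, loss scaling: multiply the loss by a large factor before backpropagation so small float16 gradients stay representable, then divide the gradients by the same factor before updating a float32 master copy of the weights. The NumPy sketch below shows that idea on a linear least-squares model; the scale factor and the overflow-skip rule are illustrative assumptions, not the exact algorithm presented in the talk.

    # Minimal NumPy sketch of loss scaling with float32 master weights.
    # The value 1024 for loss_scale is an illustrative assumption.
    import numpy as np

    def train_step(master_w, x, y, lr=0.01, loss_scale=1024.0):
        w16 = master_w.astype(np.float16)            # fp16 copy used for compute
        pred = x.astype(np.float16) @ w16            # forward pass in fp16
        err = pred - y.astype(np.float16)
        # Gradient of the scaled loss loss_scale * 0.5 * mean(err**2):
        grad16 = (x.astype(np.float16).T @ (loss_scale * err)) / len(y)
        grad32 = grad16.astype(np.float32) / loss_scale  # unscale in fp32
        if not np.all(np.isfinite(grad32)):              # overflow: skip the step
            return master_w
        return master_w - lr * grad32                    # update fp32 master weights

    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 1)).astype(np.float32)
    x = rng.standard_normal((8, 4))
    y = rng.standard_normal((8, 1))
    w = train_step(w, x, y)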
 
Topics: Artificial Intelligence and Deep Learning
Type: Talk
Event: GTC Israel
Year: 2017
Session ID: SIL7116
 
Abstract:

We'll describe new algorithms used to train very deep networks with half-precision float. Float16 has two major potential benefits: better training speed and a reduced memory footprint. But float16 has a very narrow numerical range (roughly 0.00006 to 65504), which can result in both overflow (the "inf/nan" problem) and underflow ("vanishing gradient") during training of deep networks. We'll describe the new scaling algorithm, implemented in nvcaffe, which prevents these negative effects. With this algorithm, we successfully trained networks such as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.

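To make the quoted range concrete: the smallest normal float16 value is about 6e-5 and the largest is 65504, so tiny gradients flush toward zero and large values overflow to inf. The short NumPy check below demonstrates this and shows how scaling a small value up before the float16 cast preserves it; the factor 1024 is an arbitrary illustrative choice.

    # Quick demonstration of the float16 range limits described above.
    import numpy as np

    print(np.finfo(np.float16).tiny)   # ~6.10e-05: smallest normal float16
    print(np.finfo(np.float16).max)    # 65504.0:   largest float16

    grad = 1e-8                              # a very small gradient value
    print(np.float16(grad))                  # flushes to 0.0: underflow
    scaled = np.float16(grad * 1024.0)       # scale up before casting to fp16
    print(np.float32(scaled) / 1024.0)       # ~1e-08 recovered after unscaling in fp32
    print(np.float16(70000.0))               # past 65504: overflows to inf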
 
Topics: Artificial Intelligence and Deep Learning, Algorithms & Numerical Techniques, Performance Optimization
Type: Talk
Event: GTC Silicon Valley
Year: 2017
Session ID: S7218
 
 