GTC ON-DEMAND

 
Abstract:
We'll share experiences of end-to-end deep learning optimization on Alibaba's Platform of Artificial Intelligence (PAI), covering both offline training and online inference. For offline training, dedicated optimizations are applied in both local and distributed environments. For online inference, optimization is approached from both the algorithm and the system perspective. We'll present the methodology along with benchmark numbers, and we'll walk through several business applications driven by these optimizations to bridge the gap between low-level optimization and real business scenarios.
 
Topics:
Deep Learning & AI Frameworks, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8113
 
Abstract:
End-to-end speech recognition systems, which directly transcribe audio data into text without requiring an intermediate phonetic representation, are based on a recurrent neural network (RNN) combined with connectionist temporal classification (CTC). CTC automatically learns the alignment between speech frames and the label sequence of the transcript. In this work, we focus on optimizing CTC training, especially the forward-backward algorithm, on the GPU. First, opportunities for saving computation and memory accesses in the CTC forward-backward algorithm were quantitatively analyzed and exploited, yielding a speedup of about 1.28x. Second, by reusing data among frames and transferring data between frames through the register file and shared memory, we achieve a speedup of about 1.80x.
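To make the forward half of the recursion concrete, here is a minimal NumPy sketch of the standard CTC forward pass over the blank-extended label sequence. This is a plain CPU reference for clarity, not the GPU kernel from the talk; the function name and the choice of index 0 for the blank symbol are illustrative assumptions. Note how each frame t reads only the previous frame's alpha row, which is exactly the cross-frame data reuse that the register-file and shared-memory optimization targets.

```python
import numpy as np

BLANK = 0  # index of the CTC blank symbol (assumption for this sketch)

def ctc_forward(y, labels, blank=BLANK):
    """Compute P(labels | x) with the CTC forward recursion.

    y      : (T, V) matrix of per-frame softmax probabilities
    labels : target label sequence, without blanks
    """
    T = y.shape[0]
    # Blank-extended sequence: blanks interleaved with labels -> length S = 2L + 1
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = y[0, blank]
    if S > 1:
        alpha[0, 1] = y[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                       # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]              # advance one position
            # Skip over a blank, allowed only between distinct labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * y[t, ext[s]]            # only frame t-1 is read

    # Valid endings: the last label or the trailing blank
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
```

For example, with two frames of uniform probabilities over {blank, a}, the three paths collapsing to "a" (a·a, blank·a, a·blank) each have probability 0.25, so the forward pass returns 0.75. A production implementation would work in log space for numerical stability; this sketch uses raw probabilities to keep the recursion readable.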
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6383
 
 