GTC On-Demand

Abstract:
Learn how to speed up a deep feedforward sequential memory network (FSMN) on Volta. We'll describe how to use Tensor Cores to speed up GEMM operations and explain how to optimize an FSMN kernel by increasing its locality and reducing its math workload. Although RNNs are a powerful tool for sequence-to-sequence problems, their recurrent structure increases computational complexity. As an alternative, an FSMN can effectively model long-term dependencies without any recurrent structure. We'll show how a GPU-friendly FSMN can outperform RNNs in both accuracy and speed. Our work is based on Alibaba's deep FSMN model. (An illustrative sketch of the FSMN idea follows this listing.)
 
Topics: Performance Optimization, AI Application Deployment and Inference, Speech and Language Processing
Type: Talk
Event: GTC Silicon Valley
Year: 2019
Session ID: S9113
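A minimal, illustrative sketch of the idea behind this session (not the speakers' code; all names, shapes, and hyperparameters below are assumptions): an FSMN-style layer replaces recurrence with a memory block, a learned filter over a window of past and future hidden states, so the whole layer reduces to position-wise GEMMs plus a depthwise convolution over time. Running it in FP16 makes the GEMMs eligible for Volta Tensor Cores via cuBLAS.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FSMNLayer(nn.Module):
    # Vectorized FSMN-style layer: projection (GEMM) + memory block (depthwise
    # conv over the time axis) + skip connection, with no recurrent state.
    def __init__(self, hidden=512, look_back=10, look_ahead=2):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)   # GEMM; runs on Tensor Cores in FP16
        self.memory = nn.Conv1d(hidden, hidden,
                                kernel_size=look_back + look_ahead + 1,
                                groups=hidden, bias=False)   # one FIR filter per unit
        self.pad = (look_back, look_ahead)      # pad the time axis (past, future)

    def forward(self, h):                       # h: (batch, time, hidden)
        p = self.proj(h)                        # position-wise projection
        m = F.pad(p.transpose(1, 2), self.pad)  # (batch, hidden, time + window - 1)
        m = self.memory(m).transpose(1, 2)      # memory block output, same length
        return torch.relu(p + m)                # no recurrence anywhere

if __name__ == "__main__":
    # Requires a CUDA GPU; FP16 weights and activations let cuBLAS pick Tensor Core GEMMs.
    layer = FSMNLayer().cuda().half()
    x = torch.randn(8, 200, 512, device="cuda", dtype=torch.half)
    print(layer(x).shape)                       # torch.Size([8, 200, 512])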
 
Abstract:
We'll discuss Alibaba's PAI tensor accelerator and optimizer (PAI-Tao), a carefully engineered and optimized AI engine for deep learning training and inference tasks. PAI-Tao takes a data-driven, compiler-oriented approach: it periodically collects online running statistics to provide insights for optimization and uses those statistics to drive the actual optimization work (a toy sketch of this loop follows this listing). We'll outline how PAI-Tao's compiler-oriented design can better accommodate diversified and fast-changing AI workloads.
 
Topics: AI and DL Research, Accelerated Data Science
Type: Talk
Event: GTC Silicon Valley
Year: 2019
Session ID: S9280
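As a rough illustration of the data-driven loop described above (the file format, function names, and policy here are our own assumptions, not PAI-Tao's API), a profile-guided pass could aggregate periodically collected per-op timings and use them to choose what the compiler optimizes next:

import json
import statistics
from collections import defaultdict

def collect_stats(profile_path):
    # Aggregate per-op wall-clock times from an assumed JSON-lines profile,
    # one record per op execution, e.g. {"op": "MatMul_3", "ms": 0.42}.
    times = defaultdict(list)
    with open(profile_path) as f:
        for line in f:
            rec = json.loads(line)
            times[rec["op"]].append(rec["ms"])
    return {op: statistics.median(ms) for op, ms in times.items()}

def pick_candidates(op_ms, budget=5):
    # Toy policy: the ops that dominate runtime become candidates for
    # fusion or code generation in the next compilation round.
    ranked = sorted(op_ms.items(), key=lambda kv: kv[1], reverse=True)
    return [op for op, _ in ranked[:budget]]

if __name__ == "__main__":
    stats = collect_stats("profile.jsonl")   # statistics collected online, periodically
    for op in pick_candidates(stats):
        print("optimization candidate:", op)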
 
Abstract:
We'll share our experiences with end-to-end deep learning optimization on Alibaba's platform of artificial intelligence (PAI), covering both offline training and online inference. For offline training, dedicated optimizations are made for local and distributed environments; for online inference, optimization is approached from both the algorithm and system perspectives. We'll present the methodology along with benchmark numbers, and discuss several business applications driven by these optimizations to bridge the gap between low-level optimization and real business scenarios.
 
Topics: Deep Learning and AI Frameworks, Performance Optimization
Type: Talk
Event: GTC Silicon Valley
Year: 2018
Session ID: S8113
 
Abstract:
We'll introduce Pluto, a distributed heterogeneous deep learning framework developed by Alibaba, and share our practice of optimizing large-scale deep learning training. We'll present several distributed optimization strategies that have already been deployed into our production training pipeline and serve data scientists inside Alibaba, covering technical details as well as some of our benchmark results.
 
Topics: Deep Learning and AI, Data Center and Cloud Infrastructure
Type: Talk
Event: GTC Silicon Valley
Year: 2017
Session ID: S7650