GTC ON-DEMAND
Abstract:
Learn how to speed up a deep feedforward sequential memory network (FSMN) on Volta. We'll describe how to use Tensor Cores to speed up GEMM operations and explain how to optimize an FSMN kernel by increasing its locality and reducing its math workload. Although RNNs are a powerful tool for sequence-to-sequence problems, their recurrent structure increases computational complexity. As an alternative, FSMN can effectively model long-term dependency without using any recurrent structure. We'll show how a GPU-friendly FSMN can outperform an RNN in both accuracy and speed. Our work is based on Alibaba's deep FSMN model.
 
Topics:
Performance Optimization, AI Application, Deployment & Inference, Speech & Language Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9113
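The abstract above notes that an FSMN models long-term dependency with a feedforward "memory block" rather than a recurrent state, which is what makes it GPU-friendly: every time step can be computed in parallel as dense tensor math (and hence mapped to GEMMs on Tensor Cores). A minimal NumPy sketch of such a lookback memory block is below; this is my own illustration of the general FSMN idea, not the presenters' kernel, and `fsmn_memory` and its shapes are assumptions for the example.

```python
import numpy as np

def fsmn_memory(x, a):
    """Feedforward sequential memory block (illustrative sketch).

    Each output frame is a learned weighted sum of the current and the
    previous N input frames: h[t] = sum_{j=0..N} a[j] * x[t-j].
    Unlike an RNN there is no recurrent state, so all time steps are
    computed at once as one dense contraction (GEMM-shaped work).

    x: (T, D) input sequence; a: (N+1,) lookback filter taps.
    """
    T, D = x.shape
    N = len(a) - 1
    # Pad N zero frames at the front so frame t can see x[t-N .. t].
    xp = np.vstack([np.zeros((N, D)), x])
    # Stack the N+1 time-shifted views; windows[i][t] == xp[i + t].
    windows = np.stack([xp[i:i + T] for i in range(N + 1)])  # (N+1, T, D)
    # Tap a[j] pairs with shift i = N - j, so reverse the taps.
    return np.einsum('n,ntd->td', a[::-1], windows)
```

A loop over `t` computing `sum(a[j] * x[t-j])` gives the same result; the point of the stacked form is that it exposes the whole sequence as one batched matrix operation instead of a serial recurrence.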
 
Abstract:
Accelerating model training and inference is critical for deep learning. Typical acceleration techniques rely on extensive hand-tuning by domain experts, which is time-consuming and tedious, and has motivated strong interest in deep learning compilers that support end-to-end automatic optimization. In this talk, we present Woodpecker-DL, an efficient compiler for accelerating deep learning on heterogeneous architectures. The compiler consists of five main components: computation-graph optimization; an optimized graph of tensor descriptions; a domain-specific language compiler that couples software/hardware description languages with expert-optimized libraries; an execution engine supporting multiple hardware targets; and an auto-tuning framework supporting multiple parameterized search algorithms (such as genetic algorithms and reinforcement learning). We benchmarked Woodpecker-DL against TensorRT on a real deep neural network from Ant Financial's payment business. The results show that on the same GPU, Woodpecker-DL achieves up to a 2x speedup over TensorRT.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC China
Year:
2019
Session ID:
CN9274
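The abstract above mentions an auto-tuning framework driven by parameterized search algorithms such as genetic algorithms. As a rough illustration of how genetic search over a discrete kernel-parameter space works, here is a toy sketch; `autotune`, the parameter names, and the selection/crossover/mutation scheme are my own simplified assumptions, not Woodpecker-DL's actual API.

```python
import random

def autotune(candidates, measure, generations=20, pop=8, seed=0):
    """Toy genetic search over a discrete parameter space.

    candidates: dict mapping parameter name -> list of allowed values.
    measure: function taking a config dict and returning a cost
             (e.g. measured kernel runtime; lower is better).
    """
    rng = random.Random(seed)
    names = list(candidates)

    def rand_cfg():
        return {n: rng.choice(candidates[n]) for n in names}

    popn = [rand_cfg() for _ in range(pop)]
    for _ in range(generations):
        popn.sort(key=measure)
        elite = popn[: pop // 2]             # selection: keep fastest half
        children = []
        while len(elite) + len(children) < pop:
            p, q = rng.sample(elite, 2)      # crossover: mix two parents
            child = {n: rng.choice([p[n], q[n]]) for n in names}
            m = rng.choice(names)            # mutation: re-roll one knob
            child[m] = rng.choice(candidates[m])
            children.append(child)
        popn = elite + children
    return min(popn, key=measure)
```

In a real tuner, `measure` would compile and time a generated kernel on the target hardware; because the best configs are carried over each generation (elitism), the returned config is never worse than the best one sampled initially.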
 
 