GTC ON-DEMAND

Abstract:
Recurrent Neural Networks (RNNs) are a powerful tool for solving sequence-based problems, but their execution time depends on the size of the network. Baidu introduced the persistent RNN to address this by minimizing the bandwidth needed for the network parameters, but the size of on-chip storage imposes a strict upper limit on the network size. Model pruning can significantly reduce the number of RNN parameters, making the network sparse. We design an efficient method for accelerating sparse RNNs that includes several optimizations: Lamport barriers, wide memory loads, and a bank-aware weight layout. With these optimizations, on GP100 we achieve 1) ~4.5 TFLOP/s for a hidden layer of size 1792, a batch size of 4, and a density of 10%; and 2) 18 TFLOP/s (a 36x speedup over the cuDNN RNN) with 45 SMs for a hidden layer of size 5760, a batch size of 2, and a density of 1%.
 
Topics:
Artificial Intelligence and Deep Learning, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7182
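The abstract above describes the technique only at a high level. Below is a minimal, hypothetical CUDA sketch (not the presenters' implementation) of a persistent sparse recurrent step: the recurrent weights are kept in CSR form so that only nonzero entries are ever read, a single kernel launch iterates over all timesteps, and a cooperative-groups grid barrier stands in for the Lamport barriers mentioned in the abstract. The kernel name, the CSR layout, and the tanh nonlinearity are illustrative assumptions; a real persistent implementation would additionally pin the weights in registers or shared memory and use wide vector loads.

#include <cuda_runtime.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Hypothetical persistent sparse-RNN kernel: one launch covers all timesteps.
__global__ void sparse_persistent_rnn(
    const float* __restrict__ w_vals,   // nonzero recurrent weights (CSR values)
    const int*   __restrict__ w_cols,   // CSR column indices
    const int*   __restrict__ w_rowptr, // CSR row pointers, length hidden + 1
    const float* __restrict__ x,        // precomputed input contributions [T][hidden]
    float*       h,                     // double-buffered hidden state [2][hidden]
    int hidden, int timesteps)
{
    cg::grid_group grid = cg::this_grid();
    int row = blockIdx.x * blockDim.x + threadIdx.x;   // one output unit per thread

    for (int t = 0; t < timesteps; ++t) {
        const float* h_prev = h + (t & 1) * hidden;
        float*       h_next = h + ((t + 1) & 1) * hidden;

        if (row < hidden) {
            float acc = x[t * hidden + row];
            // Sparse dot product: only the stored (nonzero) weights are touched.
            for (int k = w_rowptr[row]; k < w_rowptr[row + 1]; ++k)
                acc += w_vals[k] * h_prev[w_cols[k]];
            h_next[row] = tanhf(acc);                   // simple RNN nonlinearity
        }
        // All blocks must finish timestep t before any block starts t + 1.
        grid.sync();
    }
}

A kernel that calls grid.sync() has to be launched with cudaLaunchCooperativeKernel, with no more blocks than can be resident on the GPU at once; that residency requirement is also what makes a persistent formulation possible in the first place.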
 
Abstract:
In this session, we present a CUDA implementation of a backprojection kernel for Cone-Beam Computed Tomography (CBCT) and study the performance of this kernel from a GPU architectural point of view. We will explain how we measured the utilization of different components of a GPU by this kernel. Our focus will be on the Kepler and Maxwell architectures.
 
Topics:
Medical Imaging & Radiology, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5534
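The abstract gives no implementation details, so the following is a hedged CUDA sketch of a generic voxel-driven, FDK-style cone-beam backprojector rather than the presenters' kernel. Each thread walks one (x, y) column of the volume, projects each voxel onto the detector with a 3x4 projection matrix, lets the texture unit perform the bilinear interpolation, and accumulates a distance-weighted value; the host launches it once per view. All parameter names and the half-texel offset convention are assumptions.

#include <cuda_runtime.h>

// Hypothetical CBCT backprojection kernel for a single projection view.
__global__ void cbct_backproject(
    cudaTextureObject_t proj_tex,   // filtered projection for this view (2D float texture)
    float* __restrict__ volume,     // output volume, nx * ny * nz voxels
    const float* __restrict__ P,    // 3x4 projection matrix for this view (row-major)
    int nx, int ny, int nz,
    float vx, float vy, float vz,   // voxel spacing
    float ox, float oy, float oz)   // world coordinates of the volume origin
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx || iy >= ny) return;

    float xw = ox + ix * vx;
    float yw = oy + iy * vy;

    for (int iz = 0; iz < nz; ++iz) {        // march along z so the x/y terms are reused
        float zw = oz + iz * vz;
        // Homogeneous projection: [u*w, v*w, w]^T = P * [x, y, z, 1]^T
        float u = P[0] * xw + P[1] * yw + P[2]  * zw + P[3];
        float v = P[4] * xw + P[5] * yw + P[6]  * zw + P[7];
        float w = P[8] * xw + P[9] * yw + P[10] * zw + P[11];
        float inv_w = 1.0f / w;

        // The texture unit does the bilinear interpolation on the detector plane.
        float val = tex2D<float>(proj_tex, u * inv_w + 0.5f, v * inv_w + 0.5f);

        // FDK-style distance weighting, accumulated over all views.
        volume[(size_t)iz * nx * ny + (size_t)iy * nx + ix] += val * inv_w * inv_w;
    }
}

Which GPU units such a kernel keeps busy (texture, floating point, or memory) is exactly the kind of per-component utilization the session measures on Kepler and Maxwell.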
 
 