GTC ON-DEMAND

Abstract:
By exposing parallelism between operations in a recurrent neural network, it is possible to achieve significant performance improvements when training. In this talk, a case study based on a Long Short-Term Memory (LSTM) recurrent network will be used to demonstrate a 5x speedup over a naive implementation for the forward pass of a single layer. A further 2x speedup (10x in total) will be shown when considering multiple layers. Results will also be presented for the backward pass.
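
A rough illustration of the kind of parallelism the abstract refers to: the four LSTM gates consume the same inputs, so their weight matrices can be concatenated and applied as one large matrix multiply, and the input-to-hidden products for every time step can be precomputed in a single large GEMM before the sequential recurrence begins. Replacing many small matrix multiplies with a few large ones is what utilizes a GPU well. The NumPy sketch below is illustrative only; the names, shapes, and gate ordering are assumptions, not the speaker's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward_fused(x, W, R, b, h0, c0):
    """Single-layer LSTM forward pass with fused gate GEMMs (illustrative).

    x  : (T, N, input_size)   input sequence
    W  : (input_size, 4*H)    input weights for the i, f, g, o gates, concatenated
    R  : (H, 4*H)             recurrent weights, concatenated the same way
    b  : (4*H,)               bias
    h0, c0 : (N, H)           initial hidden and cell state
    """
    T, N, _ = x.shape
    H = h0.shape[1]

    # Fusion 1: one large GEMM computes the input transform for all four
    # gates and all T time steps at once, instead of 4*T small GEMMs.
    x_proj = (x.reshape(T * N, -1) @ W + b).reshape(T, N, 4 * H)

    h, c = h0, c0
    hs = np.empty((T, N, H))
    for t in range(T):
        # Fusion 2: the four per-step recurrent GEMMs become one GEMM.
        gates = x_proj[t] + h @ R                # (N, 4H)
        i = sigmoid(gates[:, 0 * H:1 * H])       # input gate
        f = sigmoid(gates[:, 1 * H:2 * H])       # forget gate
        g = np.tanh(gates[:, 2 * H:3 * H])       # cell candidate
        o = sigmoid(gates[:, 3 * H:4 * H])       # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        hs[t] = h
    return hs, (h, c)

# Tiny smoke test with illustrative sizes.
T, N, D, H = 16, 8, 32, 64
hs, _ = lstm_forward_fused(np.random.randn(T, N, D),
                           np.random.randn(D, 4 * H), np.random.randn(H, 4 * H),
                           np.zeros(4 * H), np.zeros((N, H)), np.zeros((N, H)))
print(hs.shape)  # (16, 8, 64)
```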
 
Topics:
Artificial Intelligence and Deep Learning, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6165
 
Abstract:

Linear solvers on serial machines tend to be highly recursive, but that is not an option on GPUs. In this paper we describe a new preconditioner for GMRES and similar Krylov-subspace linear solvers that is highly parallel, but also provides effective mechanisms to reconcile remote driving forces in a spatially discretized system. We will present results, taken from real-world studies using a commercial oil reservoir simulator, showing how it compares with a state-of-the-art serial solver, and showing how performance scales in a domain decomposition formulation run on a cluster of multiple CPU+GPU nodes.
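
For context on the recursion problem the abstract raises: strong serial preconditioners such as ILU rely on triangular solves, which are inherently sequential, whereas parallel-friendly alternatives perform many independent local solves and accept some loss in convergence rate. The talk's actual preconditioner is not spelled out here, so the SciPy sketch below uses block Jacobi purely as a stand-in for that parallel style, to show how such a preconditioner plugs into GMRES; all names and sizes are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_jacobi_preconditioner(A, block_size):
    """Build an approximate inverse from independent diagonal blocks.

    Each block solve is independent of the others, so all of them can run
    in parallel -- the property that makes this style GPU-friendly, unlike
    the recursive triangular solves of ILU-type preconditioners.
    """
    n = A.shape[0]
    A = A.tocsr()
    starts = list(range(0, n, block_size))
    inv_blocks = [np.linalg.inv(A[s:min(s + block_size, n),
                                  s:min(s + block_size, n)].toarray())
                  for s in starts]

    def apply(v):
        out = np.empty_like(v)
        for inv, s in zip(inv_blocks, starts):
            e = min(s + block_size, n)
            out[s:e] = inv @ v[s:e]          # independent local solve
        return out

    return spla.LinearOperator(A.shape, matvec=apply)

# Toy 2-D Poisson-like system standing in for a discretized reservoir grid.
n = 64
A = sp.diags([-1, -1, 4, -1, -1], [-8, -1, 0, 1, 8], shape=(n, n), format="csr")
b = np.ones(n)
M = block_jacobi_preconditioner(A, block_size=8)
x, info = spla.gmres(A, b, M=M)
print("converged" if info == 0 else f"info={info}")
```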
 
Topics:
Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2432
 
 