GTC ON-DEMAND

 
SEARCH SESSIONS
SEARCH SESSIONS

Search All
 
Refine Results:
 
Year(s)

SOCIAL MEDIA

EMAIL SUBSCRIPTION

 
 

GTC ON-DEMAND

Presentation
Media
Abstract:
We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 GPUs (4,560 nodes) on the OLCF Summit HPC System using the high-productivity TensorFlow framework. We discuss how the neural network was tweaked to achieve good performance on the NVIDIA Volta GPUs with Tensor Cores and what further optimizations were necessary to provide excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers. Scalable deep learning becomes more and more important as datasets and deep learning models grow and become more complicated. This talk is targeted at deep learning practitioners who are interested in learning what optimizations are necessary for training their models efficiently at massive scale.
We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 GPUs (4,560 nodes) on the OLCF Summit HPC System using the high-productivity TensorFlow framework. We discuss how the neural network was tweaked to achieve good performance on the NVIDIA Volta GPUs with Tensor Cores and what further optimizations were necessary to provide excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers. Scalable deep learning becomes more and more important as datasets and deep learning models grow and become more complicated. This talk is targeted at deep learning practitioners who are interested in learning what optimizations are necessary for training their models efficiently at massive scale.  Back
 
Topics:
HPC and Supercomputing, Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9412
Streaming:
Download:
Share:
 
 
Previous
  • Amazon Web Services
  • IBM
  • Cisco
  • Dell EMC
  • Hewlett Packard Enterprise
  • Inspur
  • Lenovo
  • SenseTime
  • Supermicro Computers
  • Synnex
  • Autodesk
  • HP
  • Linear Technology
  • MSI Computer Corp.
  • OPTIS
  • PNY
  • SK Hynix
  • vmware
  • Abaco Systems
  • Acceleware Ltd.
  • ASUSTeK COMPUTER INC
  • Cray Inc.
  • Exxact Corporation
  • Flanders - Belgium
  • Google Cloud
  • HTC VIVE
  • Liqid
  • MapD
  • Penguin Computing
  • SAP
  • Sugon
  • Twitter
Next