GTC ON-DEMAND

 


Abstract:

We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 Tensor Core GPUs (4,560 nodes) on the OLCF Summit HPC system using the high-productivity TensorFlow framework. We'll cover how the neural network was tuned to achieve good performance on NVIDIA Volta GPUs with Tensor Cores, and which further optimizations were necessary for excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers.
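The core pattern behind this kind of scaling is data-parallel SGD: every worker computes a gradient on its own data shard, and an allreduce averages the gradients so all replicas apply the identical update. The sketch below illustrates that pattern in plain Python; it is a hypothetical illustration, not the talk's TensorFlow/Summit implementation (`local_gradient`, `allreduce_mean`, and the toy y = 3x dataset are all invented for this example).

```python
# Hypothetical sketch of data-parallel SGD with gradient averaging.
# Each "worker" computes a local gradient on its shard; allreduce_mean
# stands in for the ring/tree allreduce a real framework performs.

def local_gradient(w, shard):
    # Gradient of mean squared error for the model y = w * x on one shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    # Placeholder for a communication-optimized allreduce across GPUs.
    return sum(grads) / len(grads)

def train_step(w, shards, lr=0.01):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel on GPUs
    return w - lr * allreduce_mean(grads)           # identical update on every replica

# Two workers, each holding a shard of data drawn from y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 2))  # converges toward 3.0
```

Because the averaged gradient equals the gradient over the union of the shards, the replicas stay bit-for-bit in sync; at scale, the cost of that average is what the communication optimizations mentioned above address.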

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1825
 
Abstract:
In this session we will cover how to use GPUs to accelerate back-end data-center infrastructure, specifically image processing. We will present an implementation of a REST API for image processing, concentrating on an approach to managing large numbers of concurrent requests while efficiently scheduling the CPU and GPU resources in the system. We will show that we can provide higher throughput and lower latency than the CPU implementations we are replacing. We will also discuss the practical infrastructure implications, different deployment scenarios, and how to fit software acceleration into those scenarios.
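A common way to serve many concurrent requests on a GPU is to coalesce pending requests into batches so each GPU launch amortizes its overhead across multiple images. The sketch below shows that batching pattern with Python's standard `queue` module; `process_batch` and `scheduler` are hypothetical stand-ins, not the API presented in the talk, and the string transform is a placeholder for real image processing.

```python
import queue

# Hypothetical sketch of request batching for GPU-backed serving:
# drain a request queue in batches so one "launch" handles many requests.

def process_batch(batch):
    # Placeholder for a GPU image-processing kernel applied to a whole batch.
    return [img.upper() for img in batch]

def scheduler(requests, max_batch=4):
    q = queue.Queue()
    for r in requests:
        q.put(r)
    results = []
    while not q.empty():
        batch = []
        while len(batch) < max_batch and not q.empty():
            batch.append(q.get())             # coalesce pending requests
        results.extend(process_batch(batch))  # one launch per batch, not per request
    return results

print(scheduler(["cat", "dog", "fox", "owl", "bee"]))
# → ['CAT', 'DOG', 'FOX', 'OWL', 'BEE']
```

The `max_batch` cap is the latency/throughput knob: larger batches keep the GPU busier, while a smaller cap (or a drain timeout in a real server) bounds how long any single request waits before being scheduled.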
 
Topics:
Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4787
 
 