GTC ON-DEMAND

 
Abstract:
Learn what's needed to achieve optimal performance on NVIDIA Tensor Core GPUs. We'll review the fundamentals of GPU performance, explain how Tensor Core-accelerated operations work, and use this knowledge to infer how to structure and size neural network operations (layers) to achieve ideal performance. We'll also provide a cheat sheet of Tensor Core performance guidelines. The talk aims to provide tools to understand why neural networks perform a certain way on Tensor Core GPUs and to enable changes to network architecture to further improve performance.
 
Topics:
Performance Optimization, Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9926
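The cheat sheet from the talk is not reproduced on this page, but one widely published Tensor Core guideline is to keep the dimensions that feed the underlying matrix multiplies (batch size, input features, output features) multiples of 8 when running in FP16. The sketch below only illustrates that idea; it is not material from the session, and the pad_to_multiple helper and the layer sizes are made up for illustration.

```python
# Illustrative sketch (not from the session): size layers so the FP16 GEMMs
# behind them have dimensions that are multiples of 8, the usual Tensor Core
# alignment guideline for Volta/Turing-class GPUs.
import torch
import torch.nn as nn

def pad_to_multiple(n: int, multiple: int = 8) -> int:
    """Round n up to the nearest multiple of `multiple` (hypothetical helper)."""
    return ((n + multiple - 1) // multiple) * multiple

batch_size   = pad_to_multiple(30)    # 30   -> 32
in_features  = pad_to_multiple(1000)  # 1000 is already a multiple of 8
hidden       = pad_to_multiple(4093)  # 4093 -> 4096
num_classes  = pad_to_multiple(10)    # 10   -> 16; extra logits are simply ignored

model = nn.Sequential(
    nn.Linear(in_features, hidden),
    nn.ReLU(),
    nn.Linear(hidden, num_classes),
)

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    # FP16 weights and activations let the GEMMs run on Tensor Cores.
    model = model.half().to(device)
    x = torch.randn(batch_size, in_features, device=device, dtype=torch.float16)
else:
    x = torch.randn(batch_size, in_features)  # CPU fallback in FP32

logits = model(x)
print(logits.shape)  # (32, 16); only the first 10 logits are meaningful
```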
 
Abstract:

We'll discuss, analyze, and improve the performance of deep neural network inference using GPUs. Unlike neural network training, an offline process in which large batches of images are fed to the GPU to maximize computational throughput, inference focuses on small-batch, low-latency forward propagation through the network. We'll discuss how these different performance requirements affect the way inference is implemented on GPUs and what performance optimizations are possible, and we'll show how GPUs, from the small Tegra X1 to the powerful TITAN X, excel in both performance and energy efficiency when performing deep neural network inference.
 
Topics:
Artificial Intelligence and Deep Learning, Autonomous Vehicles, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6136
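The contrast the abstract draws between throughput-oriented training and latency-oriented inference can be made concrete with a simple timing loop. The sketch below is not from the talk; it assumes PyTorch and a made-up fully connected model, and simply measures forward-pass time at small and large batch sizes.

```python
# Illustrative sketch (not from the session): contrast large-batch throughput
# with small-batch latency for forward-only (inference) passes.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical model, stand-in for a real network.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1000)).to(device)
model.eval()  # inference mode: no dropout or batch-norm updates

def time_forward(batch_size: int, iters: int = 50) -> float:
    """Return average forward-pass time in milliseconds (hypothetical helper)."""
    x = torch.randn(batch_size, 4096, device=device)
    with torch.no_grad():
        for _ in range(5):            # warm-up iterations
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / iters

# Small batches show per-request latency; large batches show throughput.
for bs in (1, 8, 128):
    ms = time_forward(bs)
    print(f"batch {bs:3d}: {ms:6.2f} ms/iter, {bs * 1000 / ms:8.0f} samples/s")
```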
 
 