GTC ON-DEMAND

 
Abstract:

We'll take a deep dive into previously undisclosed architectural details of NVIDIA's Turing T4 Cloud GPU, which we uncovered through micro-benchmarks, and compare the architecture's features with those of previous generations of NVIDIA GPUs. We'll also reveal the geometry and latency of Turing's complex memory hierarchy, the format of its encoded instructions, and the latency of those instructions. Learn how developers can use this knowledge to design workloads tailored to the characteristics of the T4 GPU. We'll also explain how to manually assemble binary code that maximizes dual issue, avoids bank conflicts, and squeezes every bit of bare-metal performance from the hardware.


 
Topics: Finance - Deep Learning, Performance Optimization, HPC and AI
Type: Talk
Event: GTC Silicon Valley
Year: 2019
Session ID: S9839
 
Abstract:
We'll present the architectural details of the Volta GPU that we discovered through micro-benchmarks, and reveal the geometry and latency of Volta's complex memory hierarchy, the format of its encoded instructions, and the latency of commonly used instructions. This knowledge enables developers to write better-optimized code than is currently possible with publicly available information and toolchains.
 
Topics: Finance - Quantitative Risk & Derivative Calculations, Performance Optimization, HPC and AI
Type: Talk
Event: GTC Silicon Valley
Year: 2018
Session ID: S8122
 
Abstract:
Learn advanced techniques for performance optimization on Kepler GPUs. NVIDIA provides many powerful tools for analyzing and improving the efficiency of CUDA kernels. However, in many specific cases, developers need to do more detailed tuning to reach the expected performance. In this session, we'll introduce a native assembler for the Kepler architecture that is used at Alibaba. We'll also share our experience tuning CNN and GEMM implementations with this assembler as examples. If you are interested in assembly-level optimization and want to use such a tool on the Kepler architecture, you shouldn't miss this session!
 
Topics: Artificial Intelligence and Deep Learning, Performance Optimization
Type: Talk
Event: GTC Silicon Valley
Year: 2016
Session ID: S6173
 
 