GTC On-Demand

Algorithms and Numerical Techniques
CUTLASS: Software Primitives for Dense Linear Algebra at All Levels and Scales within CUDA
Andrew Kerr (NVIDIA)
Audience members will learn how to implement efficient deep learning computations using CUDA C++ in the context of CUTLASS. CUTLASS is an open-source collection of C++ template abstractions for implementing high-performance matrix multiplication (GEMM) at all levels of the CUDA thread hierarchy. We will describe many of the algorithmic strategies used by cuBLAS and cuDNN, and how they can be implemented using C++ templates to cover an extensive space of problem sizes, data layouts, and data types. In particular, we will emphasize how to support alternative and mixed-precision math operations such as Pascal's integer DP4A operation and Volta's Tensor Cores. Finally, we will illustrate how CUTLASS primitives can be combined with custom functionality to implement related algorithms such as convolution. Although this talk highlights CUTLASS, the architecture concepts and algorithm details are relevant to any CUDA programmer focused on deep learning.
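
As a rough illustration of two ideas the abstract mentions, the sketch below shows a tiled (blocked) GEMM and an emulation of a DP4A-style operation in plain host-side C++. This is not CUTLASS code and does not use the CUTLASS API: the names `tiled_gemm`, `dp4a_emulated`, and `kTile` are hypothetical, the tile size is arbitrary, and the loops run serially on the CPU rather than across CUDA thread blocks, warps, and threads. It only assumes row-major matrices and shows how a large product can be decomposed into small tile products, and how four 8-bit products can be accumulated into a 32-bit integer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tile edge length; real GPU kernels tune this per
// architecture, data type, and level of the thread hierarchy.
constexpr std::size_t kTile = 2;

// Computes C (M x N) += A (M x K) * B (K x N), all row-major.
// The three outer loops walk over tiles; the three inner loops
// compute one tile-sized partial product, clamped at matrix edges.
void tiled_gemm(std::size_t M, std::size_t N, std::size_t K,
                const std::vector<float>& A,
                const std::vector<float>& B,
                std::vector<float>& C) {
  for (std::size_t i0 = 0; i0 < M; i0 += kTile) {
    for (std::size_t j0 = 0; j0 < N; j0 += kTile) {
      for (std::size_t k0 = 0; k0 < K; k0 += kTile) {
        const std::size_t i1 = std::min(i0 + kTile, M);
        const std::size_t j1 = std::min(j0 + kTile, N);
        const std::size_t k1 = std::min(k0 + kTile, K);
        for (std::size_t i = i0; i < i1; ++i) {
          for (std::size_t k = k0; k < k1; ++k) {
            const float a = A[i * K + k];
            for (std::size_t j = j0; j < j1; ++j) {
              C[i * N + j] += a * B[k * N + j];
            }
          }
        }
      }
    }
  }
}

// Emulates a DP4A-style operation: the dot product of four signed
// 8-bit pairs, added to a 32-bit integer accumulator.
std::int32_t dp4a_emulated(const std::int8_t a[4], const std::int8_t b[4],
                           std::int32_t acc) {
  for (int i = 0; i < 4; ++i) {
    acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  }
  return acc;
}
```

On a GPU, the same decomposition would typically be applied repeatedly, with each level of tiles assigned to a different level of the thread hierarchy; the serial loops here only show the index arithmetic.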
 
Keywords:
Algorithms and Numerical Techniques, Tools and Libraries, GTC 2018 - ID S8854