GTC ON-DEMAND

 
SEARCH SESSIONS
SEARCH SESSIONS

Search All
 
Refine Results:
 
Year(s)

SOCIAL MEDIA

EMAIL SUBSCRIPTION

 
 

GTC ON-DEMAND

Computational Biology & Chemistry
Presentation
Media
cuTENSOR: High-performance Tensor Operations in CUDA
Abstract:
We'll discuss cuTENSOR, a high-performance CUDA library for tensor operations that efficiently handles the ubiquitous presence of high-dimensional arrays (i.e., tensors) in today's HPC and DL workloads. This library supports highly efficient tensor operations such as tensor contractions (a generalization of matrix-matrix multiplications), point-wise tensor operations such as tensor permutations, and tensor decompositions (a generalization of matrix decompositions). While providing high performance, cuTENSOR also allows users to express their mathematical equations for tensors in a straightforward way that hides the complexity of dealing with these high-dimensional objects behind an easy-to-use API. CUDA 10.1 enables CUDA programmers to utilize Tensor Cores directly with the new mma.sync instruction. In this presentation, we describe the functionality of mma.sync and present strategies for implementing efficient matrix multiply computations in CUDA that maximize performance on NVIDIA Volta GPUs. We then describe how CUTLASS 1.3 provides reusable components embodying these strategies. CUTLASS 1.3 demonstrates a median 44% speedup of CUDA kernels executing layers from real-world Deep Learning workloads.
 
Topics:
Computational Biology & Chemistry, Tools & Libraries, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9593
Streaming:
Download:
Share: