GTC ON-DEMAND

Abstract:
Learn how to develop fast and energy-efficient linear solvers using GPUs. Hybrid CPU-GPU techniques achieve high performance at the cost of extra power consumption. Recent advancements in GPU architectures enable full-GPU solutions that are high performance, energy efficient, and CPU-independent. In addition, new technologies such as half-precision arithmetic (FP16) enable the design of new solvers that are significantly faster and even more energy efficient. While FP16 arithmetic has been a powerful tool for deep learning applications, our designs show that it is also very useful for boosting the performance and energy efficiency of linear solvers. The new developments complement the hybrid algorithms in the MAGMA library and provide users with a wide variety of designs that fit different requirements of performance, energy efficiency, and numerical accuracy.
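The core idea behind such FP16-accelerated solvers is mixed-precision iterative refinement: factorize the matrix in low precision, then recover working-precision accuracy through cheap refinement steps. Below is a minimal, host-only sketch of that idea, with float standing in for FP16 and double for the working precision; it is not MAGMA's implementation (which factorizes in FP16 on the GPU, with pivoting and blocking), and the matrix, sizes, and tolerance are placeholder values.

```cpp
// Minimal sketch of mixed-precision iterative refinement.
// float plays the role of FP16; double is the working precision.
#include <cstdio>
#include <cmath>
#include <vector>

// Unblocked LU factorization without pivoting, done in low precision.
// (Illustration only: production solvers pivot for numerical stability.)
static void lu_factor(std::vector<float>& A, int n) {
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < n; ++i) {
            A[i*n + k] /= A[k*n + k];
            for (int j = k + 1; j < n; ++j)
                A[i*n + j] -= A[i*n + k] * A[k*n + j];
        }
}

// Solve L*U*x = b using the packed factors; x holds b on entry.
static void lu_solve(const std::vector<float>& A, std::vector<float>& x, int n) {
    for (int i = 1; i < n; ++i)          // forward substitution (unit-diagonal L)
        for (int j = 0; j < i; ++j) x[i] -= A[i*n + j] * x[j];
    for (int i = n - 1; i >= 0; --i) {   // backward substitution
        for (int j = i + 1; j < n; ++j) x[i] -= A[i*n + j] * x[j];
        x[i] /= A[i*n + i];
    }
}

int main() {
    const int n = 3;
    std::vector<double> A = {4, 1, 0,  1, 4, 1,  0, 1, 4};  // well conditioned
    std::vector<double> b = {1, 2, 3};

    std::vector<float> LU(A.begin(), A.end());  // round A to low precision
    lu_factor(LU, n);                           // expensive step: low precision

    std::vector<double> x(n, 0.0);
    for (int it = 0; it < 10; ++it) {
        std::vector<double> r(b);               // residual r = b - A*x,
        for (int i = 0; i < n; ++i)             // accumulated in high precision
            for (int j = 0; j < n; ++j) r[i] -= A[i*n + j] * x[j];
        std::vector<float> c(r.begin(), r.end());
        lu_solve(LU, c, n);                     // cheap correction: A*c ~= r
        double norm = 0.0;
        for (int i = 0; i < n; ++i) { x[i] += c[i]; norm += (double)c[i] * c[i]; }
        if (std::sqrt(norm) < 1e-12) break;     // converged to double accuracy
    }
    printf("x = %.15f %.15f %.15f\n", x[0], x[1], x[2]);
    return 0;
}
```

Because each refinement step costs only a triangular solve plus a residual, the expensive O(n^3) factorization runs at low-precision speed while the final answer reaches working-precision accuracy.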
 
Topics: Algorithms & Numerical Techniques, Performance Optimization
Type: Talk
Event: GTC Silicon Valley
Year: 2018
Session ID: S8478
 
Abstract:

Learn how to accelerate your machine learning, data mining, and other algorithms through fast matrix and tensor operations on GPUs. There's an increasing demand for accelerated independent computations on tensors and many small matrices. Although common, these workloads cannot be efficiently executed using standard linear algebra libraries. To fill the gap, we developed the MAGMA Batched library, which achieves dramatically better performance by executing the small operations in "batches." We'll describe a methodology for developing high-performance BLAS, SVD, factorizations, and solvers for both large and small batched matrices. We'll also present the current state-of-the-art implementations and community efforts to standardize an API that extends BLAS for batched computations.
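As a concrete illustration of the batched approach, the sketch below performs many small, independent matrix multiplications with a single library call, using cuBLAS's strided-batched GEMM as a stand-in for the analogous MAGMA Batched routines. The matrix size, batch count, and data are arbitrary placeholder values.

```cpp
// Many small GEMMs (C_i = A_i * B_i) issued as ONE batched call, so the GPU
// is saturated with work and launch overhead is paid once, not `batch` times.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 8, batch = 10000;        // 10,000 independent 8x8 multiplies
    const long long stride = (long long)n * n;
    std::vector<double> hA(stride * batch, 1.0), hB(stride * batch, 2.0);

    double *dA, *dB, *dC;
    cudaMalloc(&dA, stride * batch * sizeof(double));
    cudaMalloc(&dB, stride * batch * sizeof(double));
    cudaMalloc(&dC, stride * batch * sizeof(double));
    cudaMemcpy(dA, hA.data(), stride * batch * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), stride * batch * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    // Matrix i lives at offset i*stride; one call covers the whole batch.
    cublasDgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                              n, n, n, &alpha,
                              dA, n, stride,
                              dB, n, stride, &beta,
                              dC, n, stride, batch);
    cudaDeviceSynchronize();

    double c00;
    cudaMemcpy(&c00, dC, sizeof(double), cudaMemcpyDeviceToHost);
    printf("C_0(0,0) = %.1f (expect %.1f)\n", c00, 2.0 * n);  // n terms of 1*2

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```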
 
Topics: Tools & Libraries, Artificial Intelligence and Deep Learning, Performance Optimization
Type: Talk
Event: GTC Silicon Valley
Year: 2017
Session ID: S7728
 
Abstract:
Learn techniques for efficient batched computations on GPUs, where small and independent computations must be grouped and executed together to obtain high performance. These problems occur very frequently in scientific applications like machine learning, data mining, dense and sparse solvers, high-order FEM, astrophysics, and more. We will consider the development of batched computations for these applications, stressing innovative GPU techniques and algorithms for uniform as well as variable-size batches, tensor contractions, batched BLAS, and more. Batched computations fill the GPU with work and remove scheduling overheads and costly CPU-GPU communication, often accelerating the computation by an order of magnitude compared to non-batched approaches.
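A minimal sketch of the kernel design pattern behind such batched routines: map one thread block to each small problem, so a single kernel launch covers the whole batch and there is no per-problem launch or CPU-GPU round trip. The example below (batched small matrix-vector products, with made-up sizes) is illustrative only and is not taken from MAGMA.

```cpp
// One thread block per small problem: a single launch fills the GPU.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

#define N 16  // each small matrix is N x N; one block of N threads per matrix

// blockIdx.x selects the problem; threadIdx.x selects the row within it.
__global__ void batched_gemv(const double* A, const double* x, double* y) {
    const double* Ai = A + (size_t)blockIdx.x * N * N;  // this block's matrix
    const double* xi = x + (size_t)blockIdx.x * N;
    double*       yi = y + (size_t)blockIdx.x * N;

    __shared__ double xs[N];           // stage x_i in shared memory once
    xs[threadIdx.x] = xi[threadIdx.x];
    __syncthreads();

    double sum = 0.0;
    for (int j = 0; j < N; ++j)        // dot product of one row with x_i
        sum += Ai[threadIdx.x * N + j] * xs[j];
    yi[threadIdx.x] = sum;
}

int main() {
    const int batch = 100000;  // 100,000 problems handled by one launch
    std::vector<double> hA((size_t)batch * N * N, 1.0), hx((size_t)batch * N, 1.0);

    double *dA, *dx, *dy;
    cudaMalloc(&dA, hA.size() * sizeof(double));
    cudaMalloc(&dx, hx.size() * sizeof(double));
    cudaMalloc(&dy, hx.size() * sizeof(double));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx.data(), hx.size() * sizeof(double), cudaMemcpyHostToDevice);

    batched_gemv<<<batch, N>>>(dA, dx, dy);  // grid = batch, block = N threads
    cudaDeviceSynchronize();

    double y0;
    cudaMemcpy(&y0, dy, sizeof(double), cudaMemcpyDeviceToHost);
    printf("y_0[0] = %.1f (expect %d)\n", y0, N);

    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return 0;
}
```

Variable-size batches follow the same pattern, with each block reading its own dimensions from a per-problem descriptor instead of a compile-time constant.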
 
Topics: Algorithms & Numerical Techniques, Tools & Libraries, Performance Optimization
Type: Talk
Event: GTC Silicon Valley
Year: 2016
Session ID: S6509
 
Abstract:
Here you will learn techniques for small matrix computations on GPUs and their use for energy-efficient, high-performance solvers. Working on small problems delivers high performance through improved data reuse. Many numerical libraries and applications need this functionality further developed. We describe the main factorizations (LU, QR, and Cholesky) for a set of small dense matrices in parallel. We achieve significant acceleration and reduced energy consumption compared to other solutions. Our techniques are of interest to GPU application developers in general. We will show extensions to large, entirely GPU-based solvers; review and compare against the hybrid CPU-GPU algorithms in MAGMA; and analyze the pros and cons of hybrid versus GPU-only approaches on high-end systems and low-end embedded devices.
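To make the batched-factorization idea concrete, the sketch below LU-factorizes a large set of small matrices in one call, using cuBLAS's getrfBatched as a stand-in for MAGMA's analogous batched LU, QR, and Cholesky routines. Sizes and matrix contents are placeholder values.

```cpp
// Batched LU: factorize `batch` small matrices in place with one call.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 8, batch = 4096;           // 4096 independent 8x8 LUs
    const size_t mat = (size_t)n * n;

    // Diagonally dominant matrices so the factorization is well behaved.
    std::vector<double> hA(mat * batch, 1.0);
    for (int b = 0; b < batch; ++b)
        for (int i = 0; i < n; ++i) hA[b*mat + i*n + i] = n + 1.0;

    double* dA;
    cudaMalloc(&dA, mat * batch * sizeof(double));
    cudaMemcpy(dA, hA.data(), mat * batch * sizeof(double), cudaMemcpyHostToDevice);

    // getrfBatched takes a device array of pointers, one per matrix.
    std::vector<double*> hAptrs(batch);
    for (int b = 0; b < batch; ++b) hAptrs[b] = dA + b * mat;
    double** dAptrs;
    cudaMalloc(&dAptrs, batch * sizeof(double*));
    cudaMemcpy(dAptrs, hAptrs.data(), batch * sizeof(double*), cudaMemcpyHostToDevice);

    int *dPiv, *dInfo;
    cudaMalloc(&dPiv, (size_t)n * batch * sizeof(int));
    cudaMalloc(&dInfo, batch * sizeof(int));

    cublasHandle_t handle;
    cublasCreate(&handle);
    // One call computes P_i * A_i = L_i * U_i for every matrix in the batch.
    cublasDgetrfBatched(handle, n, dAptrs, n, dPiv, dInfo, batch);
    cudaDeviceSynchronize();

    int info0;
    cudaMemcpy(&info0, dInfo, sizeof(int), cudaMemcpyDeviceToHost);
    printf("info[0] = %d (0 means success)\n", info0);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dAptrs); cudaFree(dPiv); cudaFree(dInfo);
    return 0;
}
```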
 
Topics: Developer - Algorithms, Tools & Libraries, HPC and Supercomputing
Type: Talk
Event: GTC Silicon Valley
Year: 2015
Session ID: S5476
 
 