GTC ON-DEMAND
Abstract:
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem Ax = b, where A is a large dense matrix and a double-precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16 and FP64) iterative refinement, and we generalize and extend prior advances into a framework for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4× speedup. This is due both to the performance boost that the FP16-TC provide and to their improved accuracy over classical FP16 arithmetic, which comes from the GEMM accumulation being performed in FP32.
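The refinement loop described in the abstract can be sketched in a few lines. Below is a minimal NumPy/SciPy sketch, not the tuned implementation from the talk: the low-precision factorization is emulated by rounding A to float32 (real FP16 Tensor-Core GEMMs are not modeled), while the residual and the solution update are kept in FP64. The function name mixed_precision_solve and the tolerance are illustrative.

```python
# Minimal sketch of mixed-precision iterative refinement for Ax = b.
# Low precision is emulated with float32; residuals are computed in FP64.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-12, max_iters=50):
    """Factor in low precision, then refine the solution to FP64 accuracy."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Cheap step: factor a reduced-precision copy of A.
    lu, piv = lu_factor(A64.astype(np.float32))

    # Initial solve with the low-precision factors, promoted to FP64.
    x = lu_solve((lu, piv), b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iters):
        # Residual in FP64 -- this is what recovers full accuracy.
        r = b64 - A64 @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # Correction solve reuses the low-precision factors.
        x += lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    b = rng.standard_normal(n)
    x = mixed_precision_solve(A, b)
    print("FP64 residual norm:", np.linalg.norm(b - A @ x))
```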
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1930
 
Abstract:
In this talk, we will survey the current state of high-performance computing and look ahead toward exascale. We will also examine some issues that can help reduce the power consumption of linear algebra computations.
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1733
 
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2016
Session ID:
SC6116
 
Abstract:

This talk will highlight the emerging technologies in high performance computing. We will look at the development of accelerators and some of the accomplishments in the Matrix Algebra on GPU and Multicore Architectures (MAGMA) project. We use a hybridization methodology that is built on representing linear algebra algorithms as collections of tasks and data dependencies, as well as properly scheduling the tasks' execution over the available multicore and GPU hardware components.

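To make the task-and-dependency view concrete, here is a minimal sketch (plain NumPy/SciPy for illustration, not MAGMA code) of a tiled Cholesky factorization written as a list of tile tasks. The sequential generation order is a valid dependency order, and the "CPU"/"GPU" tags only indicate where a hybrid runtime like the one described would typically place each kernel (small panel factorizations on the CPU, GEMM-rich updates on the GPU); all names are illustrative.

```python
# Toy "tasks + data dependencies" view of a tiled Cholesky factorization.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def tiled_cholesky(A, nb):
    """Factor A = L L^T tile by tile; returns the tile grid holding L."""
    nt = A.shape[0] // nb
    T = [[A[i*nb:(i+1)*nb, j*nb:(j+1)*nb].copy() for j in range(nt)]
         for i in range(nt)]

    # Build the task list; generating tasks in this order respects all
    # data dependencies, so executing them in order is a valid schedule.
    tasks = []
    for k in range(nt):
        tasks.append(("POTRF", (k, k), "CPU"))           # factor diagonal tile
        for i in range(k + 1, nt):
            tasks.append(("TRSM", (i, k), "GPU"))        # triangular solve on panel tile
        for i in range(k + 1, nt):
            tasks.append(("SYRK", (i, k), "GPU"))        # symmetric rank-nb update
            for j in range(k + 1, i):
                tasks.append(("GEMM", (i, j, k), "GPU")) # trailing-matrix update

    for kind, idx, device in tasks:
        if kind == "POTRF":
            k, _ = idx
            T[k][k] = cholesky(T[k][k], lower=True)
        elif kind == "TRSM":
            i, k = idx
            # T[i][k] <- T[i][k] * L[k][k]^{-T}
            T[i][k] = solve_triangular(T[k][k], T[i][k].T, lower=True).T
        elif kind == "SYRK":
            i, k = idx
            T[i][i] -= T[i][k] @ T[i][k].T
        else:  # GEMM
            i, j, k = idx
            T[i][j] -= T[i][k] @ T[j][k].T
    return T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, nb = 256, 64
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # symmetric positive definite test matrix
    T = tiled_cholesky(A, nb)
    L = np.tril(np.block(T))           # reassemble the lower-triangular factor
    print("||A - L L^T|| =", np.linalg.norm(A - L @ L.T))
```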
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2013
Session ID:
SC3119
 
Abstract:

This talk will highlight the latest accomplishments in the Matrix Algebra on GPU and Multicore Architectures (MAGMA) project. We use a hybridization methodology that is built on representing linear algebra algorithms as collections of tasks and data dependencies, as well as properly scheduling the tasks' execution over the available multicore and GPU hardware components. This methodology is applied in MAGMA to develop high-performance fundamental linear algebra routines, such as the one-sided dense matrix factorizations (LU, QR, and Cholesky) and linear solvers, two-sided dense matrix factorizations (bidiagonal, tridiagonal, and Hessenberg reductions) for singular and eigenvalue problems, in addition to iterative linear and eigenvalue solvers. MAGMA is designed to be similar to LAPACK in functionality, data storage, and interface, in order to allow scientists to effortlessly port any of their LAPACK-relying software components to take advantage of the new architectures.

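As a small illustration of the LAPACK-style workflow the abstract refers to, the sketch below performs an LU factorization and solve through SciPy's low-level LAPACK wrappers. The point is that code written against this interface keeps the same structure when ported: a MAGMA port would substitute the corresponding MAGMA routines for the getrf/getrs calls. This is a SciPy illustration under that assumption, not MAGMA code.

```python
# A LAPACK-style LU solve (getrf + getrs) via SciPy's low-level wrappers.
import numpy as np
from scipy.linalg.lapack import dgetrf, dgetrs

def lapack_style_solve(A, b):
    """Solve A x = b with the classic getrf/getrs pair."""
    lu, piv, info = dgetrf(A)          # LU factorization with partial pivoting
    if info != 0:
        raise RuntimeError(f"dgetrf failed with info={info}")
    x, info = dgetrs(lu, piv, b)       # triangular solves with the LU factors
    if info != 0:
        raise RuntimeError(f"dgetrs failed with info={info}")
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)
    x = lapack_style_solve(A, b)
    print("residual norm:", np.linalg.norm(b - A @ x))
```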
 
Topics:
Tools & Libraries
Type:
Talk
Event:
Supercomputing
Year:
2012
Session ID:
SC2012
 
Speakers:
Jack Dongarra
 
Topics:
Tools & Libraries
Type:
Talk
Event:
Supercomputing
Year:
2011
Session ID:
SC1002
 
Speakers:
Jack Dongarra
- University of Tennessee
 
Topics:
Tools & Libraries
Type:
Talk
Event:
Supercomputing
Year:
2010
Session ID:
SC1002
 
Speakers:
Jack Dongarra
- University of Tennessee
 
Topics:
Tools & Libraries
Type:
Talk
Event:
Supercomputing
Year:
2009
Session ID:
SC0901
 
 