GTC ON-DEMAND

 
Abstract:

Learn how to accelerate your machine learning, data mining, and other algorithms through fast matrix and tensor operations on GPUs. There's an increasing demand for accelerated independent computations on tensors and many small matrices. Although common, these workloads cannot be executed efficiently using standard linear algebra libraries. To fill the gap, we developed the MAGMA Batched library, which achieves dramatically better performance by executing the small operations in "batches." We'll describe a methodology for developing high-performance BLAS, SVD, factorizations, and solvers for both large and small batched matrices. We'll also present the current state-of-the-art implementations and community efforts to standardize an API that extends BLAS for Batched computations.
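
For a sense of what the batched interface looks like in practice, here is a minimal sketch of a double-precision GEMM over many small matrices in the array-of-pointers-plus-batchCount style the talk discusses. It uses cuBLAS's cublasDgemmBatched purely for illustration (MAGMA Batched exposes analogous *_batched routines); the matrix sizes are arbitrary and data initialization and error checking are omitted.

```c
/* Minimal sketch of the batched-BLAS calling convention: arrays of matrix
 * pointers plus a batchCount, so `batch` small GEMMs become one launch.
 * cublasDgemmBatched is used for illustration; MAGMA Batched exposes
 * analogous *_batched routines. Data initialization and error checks omitted. */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    const int n = 32;                       /* each matrix is n-by-n (hypothetical size) */
    const int batch = 1000;                 /* number of independent small GEMMs         */
    const size_t bytes = (size_t)n * n * sizeof(double);

    /* One contiguous slab per operand, viewed as `batch` matrices. */
    double *dA, *dB, *dC;
    cudaMalloc((void**)&dA, batch * bytes);
    cudaMalloc((void**)&dB, batch * bytes);
    cudaMalloc((void**)&dC, batch * bytes);

    /* Batched BLAS takes a device array of per-matrix pointers. */
    double **hptrA = (double**)malloc(batch * sizeof(double*));
    double **hptrB = (double**)malloc(batch * sizeof(double*));
    double **hptrC = (double**)malloc(batch * sizeof(double*));
    for (int i = 0; i < batch; ++i) {
        hptrA[i] = dA + (size_t)i * n * n;
        hptrB[i] = dB + (size_t)i * n * n;
        hptrC[i] = dC + (size_t)i * n * n;
    }
    double **dptrA, **dptrB, **dptrC;
    cudaMalloc((void**)&dptrA, batch * sizeof(double*));
    cudaMalloc((void**)&dptrB, batch * sizeof(double*));
    cudaMalloc((void**)&dptrC, batch * sizeof(double*));
    cudaMemcpy(dptrA, hptrA, batch * sizeof(double*), cudaMemcpyHostToDevice);
    cudaMemcpy(dptrB, hptrB, batch * sizeof(double*), cudaMemcpyHostToDevice);
    cudaMemcpy(dptrC, hptrC, batch * sizeof(double*), cudaMemcpyHostToDevice);

    /* C_i = 1.0 * A_i * B_i + 0.0 * C_i for i = 0 .. batch-1, issued as one
     * call instead of `batch` tiny kernel launches. */
    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    cublasDgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                       &alpha, (const double**)dptrA, n,
                               (const double**)dptrB, n,
                       &beta,  dptrC, n, batch);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaFree(dptrA); cudaFree(dptrB); cudaFree(dptrC);
    free(hptrA); free(hptrB); free(hptrC);
    return 0;
}
```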

 
Topics:
Tools & Libraries, Artificial Intelligence and Deep Learning, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7728
 
Abstract:
In this session you will learn about the newest developments in high-performance numerical linear algebra for heterogeneous GPU-based systems. We will show a number of novel algorithms for solving linear systems and eigenvalue problems. Besides the algorithmic developments, we will present the methodology for their implementation on multi-GPU platforms. Ease of development is achieved through a programming model that lets algorithms be expressed as sequential code; a run-time system executes that code in parallel, scheduling the work over GPUs and multicore CPUs while seamlessly moving data between GPUs and CPUs when needed. The implementations are open source, available through the MAGMA library, a next generation of Sca/LAPACK for heterogeneous architectures. Besides the Sca/LAPACK functionality for dense linear algebra problems, we will present a new MAGMA component that deals with sparse linear algebra problems as well.
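
As a concrete sketch of the LAPACK-style, GPU-resident interface described above, the following solves a dense linear system with MAGMA's hybrid CPU+GPU driver. The routine names and argument orders (magma_dgesv_gpu, magma_dsetmatrix, and so on) are recalled from MAGMA 2.x and should be checked against magma.h; matrix initialization and error handling are omitted.

```c
/* Sketch: solve A x = b with MAGMA's hybrid CPU+GPU LAPACK-style driver.
 * Routine names and argument orders are recalled from MAGMA 2.x and should
 * be checked against magma.h; data initialization and error checks omitted. */
#include <magma_v2.h>
#include <stdlib.h>

int main(void)
{
    magma_init();

    magma_int_t n = 4096, nrhs = 1, info = 0;
    magma_int_t lda  = n;
    magma_int_t ldda = ((n + 31) / 32) * 32;     /* padded leading dimension on the GPU */

    /* Host-side problem data (fill hA and hb with your actual system). */
    double *hA = (double*)malloc((size_t)lda * n * sizeof(double));
    double *hb = (double*)malloc((size_t)n * nrhs * sizeof(double));
    magma_int_t *ipiv = (magma_int_t*)malloc(n * sizeof(magma_int_t));

    /* Device copies of the matrix and right-hand side. */
    magmaDouble_ptr dA, db;
    magma_dmalloc(&dA, (size_t)ldda * n);
    magma_dmalloc(&db, (size_t)ldda * nrhs);

    magma_queue_t queue;
    magma_device_t device;
    magma_getdevice(&device);
    magma_queue_create(device, &queue);
    magma_dsetmatrix(n, n,    hA, lda, dA, ldda, queue);
    magma_dsetmatrix(n, nrhs, hb, n,   db, ldda, queue);

    /* LU factorization and solve; MAGMA splits the work between the CPU
     * (panel factorization) and the GPU (trailing-matrix updates). */
    magma_dgesv_gpu(n, nrhs, dA, ldda, ipiv, db, ldda, &info);

    /* The solution overwrites the right-hand side. */
    magma_dgetmatrix(n, nrhs, db, ldda, hb, n, queue);

    magma_queue_destroy(queue);
    magma_free(dA);  magma_free(db);
    free(hA); free(hb); free(ipiv);
    magma_finalize();
    return 0;
}
```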
 
Topics:
Numerical Algorithms & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4541
 
Abstract:

Learn about the newest developments in high-performance numerical linear algebra for heterogeneous GPU-based systems. A number of novel algorithms, and the methodology used for their implementation on multi-GPU platforms, will be shown. The implementations are open source, available through the MAGMA library, a next generation of LAPACK for heterogeneous architectures. Included are both linear system and eigenproblem solvers for dense and sparse computations. The developments incorporate advances made through the CUDA Center of Excellence (CCOE) at the University of Tennessee, the CCOE at King Abdullah University of Science and Technology, Saudi Arabia, and at INRIA, France, through the StarPU and MORSE projects.

 
Topics:
Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3281
 
Abstract:

Application profiling allows developers to assess the opportunity for improving application performance using GPUs. Attend this session if you are interested in understanding CUPTI, the CUDA Profiling Tools Interface, and how several popular tools (NVIDIA Nsight, TAU, Vampir, PAPI, and HPCToolkit) make use of this profiling library. This will be run as a panel session with ample opportunity for audience interaction.
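
For orientation, the sketch below shows the core of CUPTI's Activity API that such tools build on: register buffer callbacks, enable an activity kind, and walk the records CUPTI returns. It is a minimal illustration rather than a complete profiler; record decoding is reduced to counting kernel records, since the versioned record structures that carry names and timestamps differ across CUPTI releases.

```c
/* Sketch of the CUPTI Activity API that GPU profiling tools build on:
 * register buffer callbacks, enable an activity kind, then walk the records
 * CUPTI returns. Record decoding is reduced to counting kernel records here;
 * real tools cast each record to the versioned CUpti_ActivityKernel* structs
 * (which differ across CUPTI releases) to read names and timestamps. */
#include <cupti.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (32 * 1024)

static size_t kernel_records = 0;

/* CUPTI calls this when it needs a buffer to fill with activity records. */
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords)
{
    *buffer = (uint8_t*)malloc(BUF_SIZE);
    *size = BUF_SIZE;
    *maxNumRecords = 0;                  /* 0 = as many records as fit */
}

/* CUPTI calls this when a buffer is full (or flushed); walk its records. */
static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size,
                                     size_t validSize)
{
    (void)ctx; (void)streamId; (void)size;
    CUpti_Activity *record = NULL;
    while (cuptiActivityGetNextRecord(buffer, validSize, &record) == CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_KERNEL)
            ++kernel_records;
    }
    free(buffer);
}

int main(void)
{
    cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_KERNEL);

    /* ... run the CUDA workload being profiled here ... */

    cuptiActivityFlushAll(0);            /* deliver any outstanding buffers */
    printf("kernel activity records collected: %zu\n", kernel_records);
    return 0;
}
```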

 
Topics:
Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3584
 
Abstract:

To highlight and reward the excellent research taking place at our CCOEs, we hosted an event during GTC 2012 to showcase four of their top achievements.  Each of our 18 CCOEs was asked to submit an abstract describing what they considered to be their top achievement in GPU Computing over the past 18 months.  An NVIDIA panel selected four exemplars from these submissions to represent their work on GPU Computing research.  Each of our CCOEs has made amazing contributions, but the four CCOEs selected to showcase their work were:

 

  • Barcelona Supercomputing Center, OmpSs: Leveraging CUDA for Productive Programming in Clusters of Multi-GPU Systems
  • Harvard University, Massive Cross-correlation in Radio Astronomy with Graphics Processing Units
  • Tokyo Tech, TSUBAME 2.0
  • University of Tennessee, MAGMA: A breakthrough in Solvers for Eigenvalue Problems

 

Each of the four CCOE finalists was awarded an HP ProLiant SL250 Gen8 GPU system configured with dual NVIDIA Tesla K10 GPU accelerators in recognition of this accomplishment. After the four presentations, the CCOE representatives were asked to vote for their favorite presentation and achievement. Tokyo Tech was voted the audience favorite, and thus won the extra bragging rights of being honored by their peers as the inaugural recipient of the CUDA Achievement Award 2012.

 
Topics:
General Interest
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S4000
Download:
Share:
 
Speakers:
Hatem Ltaief, Stan Tomov
- University of Tennessee
Abstract:
Learn how to develop faster, cheaper, and better linear algebra software for GPUs through a hybridization methodology built on (1) representing linear algebra algorithms as directed acyclic graphs, where nodes correspond to tasks and edges to dependencies among them, and (2) scheduling the execution of those tasks over hybrid architectures of GPUs and multicore CPUs. Examples will be given using MAGMA, a new generation of linear algebra libraries that extends the sequential LAPACK-style algorithms to highly parallel GPU and multicore heterogeneous architectures.
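
The sketch below illustrates points (1) and (2) with a tile Cholesky factorization written as ordinary sequential code. The insert_task_* calls are hypothetical stand-ins for a task runtime (QUARK, StarPU, or similar), which would infer the DAG edges from the tiles each task reads and writes and schedule the tasks over GPUs and multicore CPUs; here the stubs simply print the tasks in submission order.

```c
/* Sketch of the DAG formulation: a right-looking tile Cholesky written as
 * sequential-looking code whose per-tile kernel calls become tasks. The
 * insert_task_* functions are hypothetical stand-ins for a task runtime and
 * here only print the task they would submit; a real runtime infers the
 * dependency edges from the tiles each task reads/writes and schedules the
 * tasks over GPUs and multicore CPUs. */
#include <stdio.h>

static void insert_task_dpotrf(int k)               { printf("POTRF A(%d,%d)\n", k, k); }
static void insert_task_dtrsm (int k, int i)        { printf("TRSM  A(%d,%d) <- A(%d,%d)\n", i, k, k, k); }
static void insert_task_dsyrk (int k, int i)        { printf("SYRK  A(%d,%d) <- A(%d,%d)\n", i, i, i, k); }
static void insert_task_dgemm (int k, int i, int j) { printf("GEMM  A(%d,%d) <- A(%d,%d)*A(%d,%d)\n", i, j, i, k, j, k); }

/* Tile Cholesky on an nt-by-nt grid of tiles (lower triangle). The loop nest
 * is ordinary sequential code; only task submission happens here, so the
 * runtime is free to overlap independent tasks. */
static void cholesky_tiled(int nt)
{
    for (int k = 0; k < nt; ++k) {
        insert_task_dpotrf(k);                      /* factor diagonal tile            */
        for (int i = k + 1; i < nt; ++i)
            insert_task_dtrsm(k, i);                /* waits on POTRF(k)               */
        for (int i = k + 1; i < nt; ++i) {
            insert_task_dsyrk(k, i);                /* waits on TRSM(k,i)              */
            for (int j = k + 1; j < i; ++j)
                insert_task_dgemm(k, i, j);         /* waits on TRSM(k,i) and TRSM(k,j) */
        }
    }
}

int main(void) { cholesky_tiled(4); return 0; }
```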
 
Topics:
HPC and AI, Tools & Libraries, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102138
 
Speakers:
Amitabh Varshney, Stan Tomov, Wei Ge
- University of Tennessee, University of Maryland, Institute of Process Engineering, Chinese Academy of Sciences
Abstract:
Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering. In this session we will hear from Dr. Wei Ge of the Chinese Academy of Sciences, Professor Amitabh Varshney of the University of Maryland, and Adjunct Assistant Professor Stan Tomov of the University of Tennessee, Knoxville.
 
Topics:
General Interest
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102263
 
 