Learn how to accelerate your machine learning, data mining, and other algorithms through fast matrix and tensor operations on GPUs. There is growing demand for accelerated independent computations on tensors and on many small matrices. Although common, these workloads cannot be executed efficiently using standard linear algebra libraries. To fill the gap, we developed the MAGMA Batched library, which achieves dramatically better performance by grouping the small operations into "batches" and executing them together. We'll describe a methodology for developing high-performance BLAS, SVD, factorizations, and solvers for both large and small batched matrices. We'll also present the current state-of-the-art implementations and community efforts to standardize an API that extends BLAS to batched computations.
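The batched idea can be illustrated outside MAGMA with a small NumPy sketch (this is not MAGMA's API, just the CPU-side concept): many independent small matrix products are expressed as one batched call instead of a loop of individual BLAS calls.

```python
import numpy as np

# A "batch" of 1000 independent small (8x8) matrix multiplies.
# Launching each one as a separate BLAS call wastes per-call
# overhead; processing them all in one batched call is the idea
# behind batched GEMM in libraries like MAGMA Batched.
rng = np.random.default_rng(0)
batch, n = 1000, 8
A = rng.standard_normal((batch, n, n))
B = rng.standard_normal((batch, n, n))

# One batched call: C[i] = A[i] @ B[i] for every i.
C = np.matmul(A, B)

# Equivalent loop over the individual small GEMMs.
C_loop = np.stack([A[i] @ B[i] for i in range(batch)])
assert np.allclose(C, C_loop)
```

On a GPU the batched form additionally exposes all the small products to the hardware at once, which is what the library exploits.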
Learn about the newest developments in high-performance numerical linear algebra for heterogeneous GPU-based systems. A number of novel algorithms, and the methodology used for their implementation on multi-GPU platforms, will be shown. The implementations are open source, available through the MAGMA library, a next-generation LAPACK for heterogeneous architectures. Included are both linear system and eigenproblem solvers for dense and sparse computations. The developments incorporate advances made through the CUDA Center of Excellence (CCOE) at the University of Tennessee, the CCOE at King Abdullah University of Science and Technology, Saudi Arabia, and at INRIA, France, through the StarPU and MORSE projects.
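For readers unfamiliar with the problem classes mentioned, this NumPy sketch shows the CPU-side equivalents of the dense LAPACK-style operations such libraries accelerate (a dgesv-style linear solve and a syevd-style symmetric eigensolve); it is illustrative only, not MAGMA's interface.

```python
import numpy as np

# Dense linear system and symmetric eigenproblem, the two problem
# classes named in the abstract, solved with NumPy's LAPACK bindings.
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Linear system solve: find x such that A @ x = b.
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)

# Symmetric eigenproblem: S @ v = lambda * v for symmetric S.
S = A + A.T
w, V = np.linalg.eigh(S)
assert np.allclose(S @ V, V * w)
```

A GPU library like MAGMA provides the same functionality but partitions the work between CPUs and GPU accelerators.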
Application profiling allows developers to assess opportunities for improving application performance using GPUs. Attend this session if you are interested in understanding CUPTI, the CUDA Profiling Tools Interface, and how several popular tools (NVIDIA Nsight, TAU, Vampir, PAPI, and HPCToolkit) make use of this profiling library. This will be run as a panel session with ample opportunity for audience interaction.
To highlight and reward the excellent research taking place at our CCOEs, we hosted an event during GTC 2012 to showcase four of their top achievements. Each of our 18 CCOEs was asked to submit an abstract describing what they considered to be their top achievement in GPU Computing over the past 18 months. An NVIDIA panel selected four exemplars from these submissions to represent their work on GPU Computing research. Each of our CCOEs has made amazing contributions, but the four CCOEs selected to showcase their work were:
Each of the four CCOE finalists was awarded an HP ProLiant SL250 Gen8 GPU system configured with dual NVIDIA Tesla K10 GPU accelerators in recognition of this accomplishment. After the four presentations, the CCOE representatives were asked to vote for their favorite presentation and achievement. Tokyo Tech was voted the audience favorite, and thus won the extra bragging rights of being honored by its peers as the inaugural recipient of the CUDA Achievement Award 2012.