Abstract:
You will learn techniques for small matrix computations on GPUs and how to use them in energy-efficient, high-performance solvers. Working on small problems delivers high performance through improved data reuse, and many numerical libraries and applications need this functionality developed further. We describe how to compute the main factorizations (LU, QR, and Cholesky) for a set of small dense matrices in parallel. Compared with other solutions, we achieve significant acceleration and reduced energy consumption. Our techniques are of interest to GPU application developers in general. We will show extensions to large, GPU-only solvers, review and compare them against the hybrid CPU-GPU algorithms in MAGMA, and analyze the pros and cons of hybrid versus GPU-only approaches on high-end systems and low-end embedded devices.
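To make "factorizations for a set of small dense matrices" concrete, here is a minimal CPU-side sketch in plain Python of a batched Cholesky factorization. It is not the GPU implementation presented in the talk and does not use the MAGMA API. The function names `cholesky_small` and `cholesky_batched` are hypothetical and chosen only for illustration. The point is the structure: every matrix in the batch is independent, so on a GPU each one could be assigned to its own thread or thread block.

```python
import math

def cholesky_small(a):
    """Return lower-triangular L with L * L^T == a for one small SPD matrix.

    Hypothetical helper for illustration; `a` is a list of rows.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares already computed in row j.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l

def cholesky_batched(batch):
    """Factor every matrix in the batch independently.

    Assumption: this sequential loop stands in for the parallel dimension;
    no iteration depends on any other.
    """
    return [cholesky_small(a) for a in batch]

# Two small symmetric positive definite matrices forming one batch.
batch = [
    [[4.0, 2.0], [2.0, 3.0]],
    [[9.0, 3.0], [3.0, 5.0]],
]
factors = cholesky_batched(batch)
```

Because the matrices are small, each one fits in fast on-chip memory, which is where the data reuse mentioned above would come from in a GPU version.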