Come and learn about new fast low-rank matrix computations on GPUs! By exploiting the low-rank structure of off-diagonal blocks, we design and implement fast linear algebra operations on massively parallel hardware architectures. The main idea is to refactor the numerical algorithms and the corresponding implementations by aggregating similar numerical operations into highly optimized batched kernels. Applications in weather prediction, seismic imaging, and material science are used to assess the trade-off between numerical accuracy and parallel performance of these fast matrix computations compared to more traditional approaches.
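To make the batching idea concrete, here is a minimal sketch (not from the talk; the block count, block size, and rank are invented parameters) that applies a batch of low-rank off-diagonal blocks B_i ≈ U_i V_i^T to vectors with two cublasSgemmBatched calls, so many small matrix products run as two kernel launches instead of a long loop of tiny ones:

    // Hedged sketch: apply many low-rank blocks B_i ~= U_i * V_i^T to vectors
    // x_i via two batched GEMMs. Sizes are illustrative; data are left
    // uninitialized since only the call pattern matters here.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <vector>

    int main() {
        const int batch = 64;        // number of off-diagonal blocks (assumed)
        const int m = 256, k = 16;   // block size m x m, numerical rank k (assumed)

        cublasHandle_t h;
        cublasCreate(&h);

        // One contiguous slab per operand, plus per-block pointer arrays.
        float *U, *V, *x, *t, *y;
        cudaMalloc(&U, sizeof(float) * batch * m * k);  // left factors U_i (m x k)
        cudaMalloc(&V, sizeof(float) * batch * m * k);  // right factors V_i (m x k)
        cudaMalloc(&x, sizeof(float) * batch * m);      // input vectors x_i
        cudaMalloc(&t, sizeof(float) * batch * k);      // temporaries t_i = V_i^T x_i
        cudaMalloc(&y, sizeof(float) * batch * m);      // outputs y_i = U_i t_i

        std::vector<float*> hU(batch), hV(batch), hx(batch), ht(batch), hy(batch);
        for (int i = 0; i < batch; ++i) {
            hU[i] = U + i * m * k; hV[i] = V + i * m * k;
            hx[i] = x + i * m;     ht[i] = t + i * k;    hy[i] = y + i * m;
        }
        float **dU, **dV, **dx, **dt, **dy;
        size_t pb = batch * sizeof(float*);
        cudaMalloc(&dU, pb); cudaMemcpy(dU, hU.data(), pb, cudaMemcpyHostToDevice);
        cudaMalloc(&dV, pb); cudaMemcpy(dV, hV.data(), pb, cudaMemcpyHostToDevice);
        cudaMalloc(&dx, pb); cudaMemcpy(dx, hx.data(), pb, cudaMemcpyHostToDevice);
        cudaMalloc(&dt, pb); cudaMemcpy(dt, ht.data(), pb, cudaMemcpyHostToDevice);
        cudaMalloc(&dy, pb); cudaMemcpy(dy, hy.data(), pb, cudaMemcpyHostToDevice);

        const float one = 1.0f, zero = 0.0f;
        // t_i = V_i^T * x_i : a batch of (k x 1) = (k x m) * (m x 1) products
        cublasSgemmBatched(h, CUBLAS_OP_T, CUBLAS_OP_N, k, 1, m,
                           &one, (const float**)dV, m, (const float**)dx, m,
                           &zero, dt, k, batch);
        // y_i = U_i * t_i : a batch of (m x 1) = (m x k) * (k x 1) products
        cublasSgemmBatched(h, CUBLAS_OP_N, CUBLAS_OP_N, m, 1, k,
                           &one, (const float**)dU, m, (const float**)dt, k,
                           &zero, dy, m, batch);
        cudaDeviceSynchronize();

        cublasDestroy(h);
        // (cudaFree calls omitted for brevity)
        return 0;
    }

The same pattern extends to batched factorizations and triangular solves; the point is that grouping thousands of similar small operations into one launch keeps the GPU saturated where a loop of individual calls would not.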
Have you heard about the largest ground-based telescope ever built? Are you interested in the newest NVIDIA DGX-1 hardware accelerator? Come and learn how the DGX-1 architecture gives the computational astronomy community a dramatic leap forward in designing major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, we'll explain how the resulting matrix computations, combined with an efficient task-based programming model, help design the next generation of telescope instruments.
Learn how GPUs help design major, multimillion-dollar optical instruments for the European Extremely Large Telescope. From the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, this talk will explain how the resulting dense linear algebra operations, combined with an efficient task-based programming model, help design the next generation of telescope instruments.
Learn about the newest developments in high-performance numerical linear algebra for heterogeneous GPU-based systems. A number of novel algorithms and the methodology used for their implementation on multiGPU platforms will be shown. The implementations are open source, available through the MAGMA library, a next generation of LAPACK for heterogeneous architectures. Included are both linear system and eigenproblem solvers for both dense and sparse computations. The developments incorporate advances made through the CUDA Center of Excellence (CCOE) at the University of Tennessee, the CCOE at King Abdullah University of Science and Technology, Saudi Arabia, and at INRIA, France, through the StarPU and MORSE projects.
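For a flavor of what calling MAGMA looks like from user code, here is a hedged sketch using its LAPACK-style dgesv interface (the matrix size and contents are invented for illustration); the routine internally splits the LU factorization between the host cores and the GPU:

    // Hedged sketch, assuming MAGMA 2.x: solve A x = b with the LAPACK-style
    // CPU interface magma_dgesv, which hybridizes the work across CPU and GPU.
    #include <magma_v2.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main() {
        magma_init();
        magma_int_t n = 1024, nrhs = 1, info = 0;
        double *A, *B;
        magma_int_t *ipiv;
        magma_dmalloc_pinned(&A, (size_t)n * n);   // pinned host memory for fast transfers
        magma_dmalloc_pinned(&B, (size_t)n * nrhs);
        magma_imalloc_cpu(&ipiv, n);

        // Invented test data: a diagonally dominant random matrix (column-major)
        // and a right-hand side of ones.
        for (magma_int_t j = 0; j < n * n; ++j) A[j] = rand() / (double)RAND_MAX;
        for (magma_int_t j = 0; j < n; ++j) { A[j * n + j] += n; B[j] = 1.0; }

        magma_dgesv(n, nrhs, A, n, ipiv, B, n, &info);  // hybrid LU factor + solve
        printf("info = %lld, x[0] = %f\n", (long long)info, B[0]);

        magma_free_pinned(A); magma_free_pinned(B); magma_free_cpu(ipiv);
        magma_finalize();
        return 0;
    }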
Reservoir simulation involves sparse iterative solvers for linear systems that arise from implicit discretizations of coupled PDEs in high-fidelity reservoir simulators. One of the major bottlenecks in these solvers is the sparse matrix-vector product (SpMV). Sparse matrices are usually compressed in some format (e.g., CSR, ELL) before being processed. In this talk, we focus on the low-level design of SpMV kernels on GPUs. Most of the relevant contributions introduce new formats that suit the GPU architecture, such as the diagonal (DIA) format for matrices whose nonzeros are confined to a few diagonals and the blocked-ELL format for sparse matrices with small dense blocks. We instead target both generic and domain-specific implementations. The generic implementations target the CSR and ELL formats, in order to be part of the KAUST-BLAS library; further optimization opportunities appear when the matrix has specific structure. We will present the major design challenges, kernel outlines, and preliminary results, with a primary focus on the CSR format. The other bottleneck of reservoir simulations is the preconditioner in the sparse solver; we investigate a Fast Multipole Method-based technique on GPUs as a compute-bound preconditioner.
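As a reference point for the design discussion, here is a minimal sketch of the baseline scalar CSR SpMV kernel, with one thread per row (the tiny 3x3 test matrix is invented for illustration):

    // Hedged sketch: scalar CSR SpMV, one thread per row.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void spmv_csr_scalar(int nrows, const int *rowptr, const int *colind,
                                    const double *val, const double *x, double *y) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < nrows) {
            double sum = 0.0;
            for (int j = rowptr[row]; j < rowptr[row + 1]; ++j)
                sum += val[j] * x[colind[j]];   // gather: accesses to x are irregular
            y[row] = sum;
        }
    }

    int main() {
        // CSR storage of the invented matrix [[4,0,1],[0,3,0],[2,0,5]]
        int h_rowptr[] = {0, 2, 3, 5};
        int h_colind[] = {0, 2, 1, 0, 2};
        double h_val[] = {4, 1, 3, 2, 5};
        double h_x[] = {1, 1, 1};

        int *rowptr, *colind; double *val, *x, *y;
        cudaMalloc(&rowptr, sizeof(h_rowptr));
        cudaMalloc(&colind, sizeof(h_colind));
        cudaMalloc(&val, sizeof(h_val));
        cudaMalloc(&x, sizeof(h_x));
        cudaMalloc(&y, 3 * sizeof(double));
        cudaMemcpy(rowptr, h_rowptr, sizeof(h_rowptr), cudaMemcpyHostToDevice);
        cudaMemcpy(colind, h_colind, sizeof(h_colind), cudaMemcpyHostToDevice);
        cudaMemcpy(val, h_val, sizeof(h_val), cudaMemcpyHostToDevice);
        cudaMemcpy(x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

        spmv_csr_scalar<<<1, 32>>>(3, rowptr, colind, val, x, y);

        double h_y[3];
        cudaMemcpy(h_y, y, sizeof(h_y), cudaMemcpyDeviceToHost);
        printf("y = [%g %g %g]\n", h_y[0], h_y[1], h_y[2]);  // expect [5 3 7]
        return 0;
    }

A common next design step, and the kind of trade-off such a talk examines, is the vector variant that assigns a warp per row, so that accesses to the values and column indices are coalesced for rows with many nonzeros.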
See the newest features integrated in MAGMA (Matrix Algebra on GPU and Multicore Architectures) to tackle numerical linear algebra on systems with multiple GPUs. In this talk, we describe how we leveraged MAGMA to solve existing and new challenging numerical problems on multiple hardware accelerators. Built on a hybridization methodology, the new multiGPU-enabled MAGMA is characterized by a representation of linear algebra algorithms as directed acyclic graphs, where nodes correspond to tasks and edges to data dependencies among them, and by the StarPU dynamic runtime system, which schedules the various computational kernels over hybrid architectures of GPUs and homogeneous multicores.
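To illustrate the task-insertion style that such a DAG representation boils down to, here is a hedged StarPU sketch (the codelet and vector size are invented, not taken from MAGMA): two tasks are declared over the same data handle in read-write mode, so the runtime infers the dependency edge between them and schedules accordingly, with no explicit synchronization in user code:

    // Hedged sketch of StarPU's task-insertion API: tasks become DAG nodes,
    // and edges are inferred from declared data accesses (STARPU_R/STARPU_RW).
    #include <starpu.h>
    #include <stdint.h>

    static void scale_cpu(void *buffers[], void *cl_arg) {
        (void)cl_arg;
        float *v = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
        unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
        for (unsigned i = 0; i < n; ++i) v[i] *= 2.0f;
    }

    static struct starpu_codelet cl;  // zero-initialized, filled in main

    int main(void) {
        float v[1024];
        for (int i = 0; i < 1024; ++i) v[i] = (float)i;

        if (starpu_init(NULL) != 0) return 1;
        cl.cpu_funcs[0] = scale_cpu;  // a cuda_funcs entry would add a GPU variant
        cl.nbuffers = 1;
        cl.modes[0] = STARPU_RW;

        starpu_data_handle_t h;
        starpu_vector_data_register(&h, STARPU_MAIN_RAM, (uintptr_t)v,
                                    1024, sizeof(float));

        // Two tasks touching the same handle in RW mode: the runtime serializes
        // them (a DAG edge) without explicit synchronization here.
        starpu_task_insert(&cl, STARPU_RW, h, 0);
        starpu_task_insert(&cl, STARPU_RW, h, 0);

        starpu_data_unregister(h);  // waits for pending tasks on h
        starpu_shutdown();
        return 0;
    }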