Reservoir simulation involve sparse iterative solvers for linear systems that arise from implicit discretizations of coupled PDEs from high-fidelity reservoir simulators. One of the major bottlenecks in these solvers is the sparse matrix-vector product. Sparse matrices are usually compressed in some format (e.g., CSR, ELL) before being processed. In this talk, we focus on the low-level design of a sparse matrix-vector (SpMV) kernel on GPUs. Most of the relevant contributions focus on introducing new formats that suit the GPU architecture such as the diagonal format for diagonal matrices and the blocked-ELL format for sparse matrices with small dense blocks. However, we target both generic and domain-specific implementations. Generic implementations basically target the CSR and ELL formats, in order to be part of the KAUST-BLAS library. More chances for further optimizations appear when the matrix has specific structure. In the talk, we will present the major design challenges and outlines, and preliminary results. The primary focus will be on the CSR format, where some preliminary results will be shown. The other bottleneck of reservoir simulations is the preconditioning in the sparse matrix solver. We investigate the possibility of a Fast Multipole Method based technique on GPUs as a compute-bound preconditioner.
See the newest developments in the area of hierarchical N-body methods for GPU computing. Hierarchical N-body methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. In this session we will cover topics such as hybridization of treecodes and fast multipole methods, auto-tuning kernels for heterogenous systems, fast tree construction based on prefix sums, fast load balancing of global trees, and more. Examples will be given using ExaFMM --an open source hierarchical N-body library for heterogenous systems developed by the speaker. (released at SC11)