Abstract:
Learn how to optimize large complex-number reductions in the materials science code BerkeleyGW on NVIDIA GPUs. Our talk will showcase two BerkeleyGW kernels implemented in four frameworks: CUDA, OpenACC, OpenMP 4.5, and Kokkos. We'll share the optimization techniques we used to achieve decent performance across all four implementations. We'll also report on the status of the OpenACC and OpenMP 4.5 compilers and compare the performance portability of OpenACC, OpenMP 4.5, and Kokkos.