Learn how the ENZO solvers were ported to GPUs. ENZO is a block-structured adaptive mesh refinement (AMR) astrophysical fluid dynamics code used for simulating cosmological structure formation. It is one of the most commonly used community codes in astrophysics. We have ported the PPM hydrodynamics and magnetohydrodynamics solvers to GPU and integrated the GPU solvers fully into the AMR framework. This talk will describe the porting strategy and performance results.
Learn how to port legacy Fortran plasma codes to GPUs. Many legacy plasma codes are written in Fortran and run to many lines of code. We will discuss techniques for porting such legacy codes easily and efficiently to CUDA C/C++, and present a performance analysis of the major algorithmic patterns in plasma codes. The discussion will use the GTC and GeFi plasma codes as realistic examples.
SJTU-NS3D is an in-house CFD code co-developed by SJTU and COMAC for large civil aircraft. It solves the 3D Reynolds-averaged Navier-Stokes (RANS) equations on structured grids using the finite volume method and can be used in wing design. In this talk, we will present the design and further optimization of the CUDA version of SJTU-NS3D, which achieves a 20-fold speedup for the standard M6 wing model and a 37-fold speedup for a candidate wing model from COMAC on a single Fermi C2050.
Adaptive mesh fluid simulations play a crucial role in many areas of astrophysical research, including the formation and explosion of stars and jets from black holes. Enzo, a parallel adaptive mesh multi-physics fluid code, has been widely used in the astrophysics community in recent years. In this talk I will describe a CUDA implementation of the finite volume fluid solver used in Enzo. The GPU version shows a significant speedup compared to the CPU version.
In this session, we will discuss how to optimize OpenCL programs on NVIDIA GPUs, covering three main aspects: memory, execution configuration, and instruction throughput. For memory optimization, we will discuss how to increase bandwidth by coalescing global memory accesses and using local memory. We will then discuss the concept of occupancy and various considerations in specifying the execution configuration of a kernel. Finally, we will discuss techniques for improving instruction throughput.