Learn how a successful implementation of a low memory footprint, multi-GPU iterative method makes it possible to efficiently resolve localization of spontaneous nonlinear flow in deforming porous media. Grasping this physical process is essential to ensure safe underground waste storage and understand natural fluid migration in reservoirs. We'll describe our parallel, matrix-free solver design, which provides a short time to solution and can solve a variety of coupled and nonlinear systems of partial differential equations in 3D. We will unveil the key algorithmic and optimization concepts that enable our stencil-based solvers to converge in few iterations, while tackling the hardware limit on the most recent NVIDIA high-bandwidth GPU accelerators. Also, we will explain how we achieved 98 percent parallel efficiency on 5000 NVIDIA Tesla P100 GPUs on the hybrid Cray XC-50 Piz Daint supercomputer at the Swiss National Supercomputing Centre, CSCS.