Discover how we designed and optimized a highly-scalable dense solver to solve Maxwell equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it with programmability concerns in mind. We show how we solved the challenge of tightly combining tasks with MPI, and illustrate how this scaled up to 50000 CPU cores, reaching 1.38 Petaflops. A focus is then given on the integration of GPUs in this model, along with a few implementation tricks to ensure truly asynchronous programming. Finally, after briefly detailing how we added hierarchical compression techniques into our distributed solver over CPUs, we describe how we plan to unlock the challenges that yet prevented porting it on GPUs.