Learn about the methods and trade-offs in a distributed GPU implementation of a molecular dynamics proxy application; the implementation achieves more than 90% weak-scaling efficiency on 512 GPU nodes. CoMD is a reference implementation of classical molecular dynamics algorithms and workloads. It was created and is maintained by the Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx), and it is part of the R&D 100 Award-winning Mantevo 1.0 software suite. In this talk, we will discuss the main techniques involved in the GPU implementation of CoMD, including (1) cell-based and neighbor-list approaches to searching for neighboring particles and (2) different thread-mapping strategies and memory layouts. We will also cover the efficient distributed implementation in detail: separating interior cells from boundary cells enables asynchronous processing, so kernels, memory copies, and MPI transfers can execute concurrently.
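The cell-based neighbor search and the interior/boundary cell split described above can be illustrated with a minimal, serial Python sketch. This is not CoMD's GPU code. The function names, the non-periodic box, and the rule that boundary cells are the outermost layer of the local cell grid are assumptions made for illustration. Particles are binned into cells at least one cutoff wide, so each particle only needs to check its own cell and the 26 surrounding ones instead of every other particle.

```python
import itertools
import math

def build_cells(positions, box, cutoff):
    """Bin particles into a grid of cells whose side is at least `cutoff` (illustrative helper)."""
    ncell = [max(1, int(math.floor(b / cutoff))) for b in box]
    size = [b / n for b, n in zip(box, ncell)]
    cells = {}
    for i, p in enumerate(positions):
        # Clamp so a particle sitting exactly on the upper box face stays in the last cell.
        idx = tuple(min(int(x // s), n - 1) for x, s, n in zip(p, size, ncell))
        cells.setdefault(idx, []).append(i)
    return cells, ncell

def cell_neighbors(positions, box, cutoff):
    """Return sorted pairs (i, j), i < j, closer than `cutoff`,
    checking only the 27 cells around each cell (assumes a non-periodic box)."""
    cells, ncell = build_cells(positions, box, cutoff)
    pairs = set()
    for idx, members in cells.items():
        for off in itertools.product((-1, 0, 1), repeat=3):
            nb = tuple(c + o for c, o in zip(idx, off))
            if any(c < 0 or c >= n for c, n in zip(nb, ncell)):
                continue  # no periodic wrap-around in this sketch
            for i in members:
                for j in cells.get(nb, ()):
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)

def split_interior_boundary(ncell):
    """Hypothetical split: cells on the outer layer of the local grid are 'boundary'
    (their forces depend on halo data from neighboring ranks); all others are 'interior'."""
    interior, boundary = [], []
    for idx in itertools.product(*(range(n) for n in ncell)):
        if any(c == 0 or c == n - 1 for c, n in zip(idx, ncell)):
            boundary.append(idx)
        else:
            interior.append(idx)
    return interior, boundary
```

For a 4 x 4 x 4 cell grid, `split_interior_boundary` yields 8 interior and 56 boundary cells. In a distributed GPU code, work on the interior cells could be launched first and overlapped with the halo exchange, and the boundary cells processed once the remote data has arrived.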