Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frameworks from an architectural/performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training. We'll highlight challenges for message passing interface runtimes to efficiently support DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.