Abstract:
We'll explore the advantages of combining a model graph that swaps tensors between Volta GPU memory and system memory with NVLink 2.0 connections between the GPUs and the system cores. GPU memory size can limit model size, image resolution, and batch size in neural network training, and we'll show how to get around those limitations. Our method pairs a graph modification library, which adds tensor swap-in/swap-out operations to the graph, with NVLink 2.0 connections to system cores and memory, so that training runs quickly with models, image resolutions, and batch sizes that were previously impossible. We'll also review the graph modification module and the system architecture, and present performance results with standard benchmarks and other models.
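
To make the idea of adding swap-in/swap-out operations to a graph concrete, here is a minimal sketch on a toy graph. It is not the library discussed in the talk: the `Op` class, the `add_swap_ops` function, and the `min_distance` threshold are all hypothetical names chosen for illustration, and a real implementation would work on a framework's own graph and use its own rules for choosing which tensors to swap. The sketch assumes the ops are already in execution order and reroutes any tensor that is consumed far from where it was produced through a swap-out/swap-in pair, so it does not need to stay in GPU memory in between.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A toy graph node; `inputs` holds the names of the ops it reads from."""
    name: str
    inputs: list = field(default_factory=list)


def add_swap_ops(ops, min_distance=2):
    """Return a new op list with swap ops added (illustrative only).

    `ops` must be in execution (topological) order. Any edge whose consumer
    runs at least `min_distance` positions after its producer is rerouted
    through a swap_out op (GPU to host) and a swap_in op (host to GPU).
    """
    position = {op.name: i for i, op in enumerate(ops)}

    def is_long(src, dst):
        return position[dst] - position[src] >= min_distance

    # Pass 1: find producers whose output is needed far downstream.
    long_lived = {src for op in ops for src in op.inputs if is_long(src, op.name)}

    # Pass 2: rebuild the op list with swap ops placed around long edges.
    rewritten = []
    for op in ops:
        new_inputs = []
        for src in op.inputs:
            if is_long(src, op.name):
                # Bring the tensor back just before the op that needs it.
                swap_in = Op(f"swap_in/{src}/{op.name}", [f"swap_out/{src}"])
                rewritten.append(swap_in)
                new_inputs.append(swap_in.name)
            else:
                new_inputs.append(src)
        rewritten.append(Op(op.name, new_inputs))
        if op.name in long_lived:
            # Move the tensor to host memory right after it is produced.
            rewritten.append(Op(f"swap_out/{op.name}", [op.name]))
    return rewritten


# Toy graph: d reads a, which was produced three steps earlier.
graph = [Op("a"), Op("b", ["a"]), Op("c", ["b"]), Op("d", ["c", "a"])]
rewritten = add_swap_ops(graph)
print([op.name for op in rewritten])
```

In this toy graph only the edge from `a` to `d` is long enough to be swapped; `b` still reads `a` directly because it runs immediately afterwards. The faster the link between GPU and system memory, the less time the inserted swap ops add to each training step.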