Abstract:
Learn how we increased GPU resource utilization on an on-premises DGX-2 node and in public clouds. In this talk, we present our operational experience hosting a set of multi-tenant deep learning workloads selected through an open competition. To host them, we use and extend the Backend.AI framework as the resource and computation manager. Tailored for both educational and research-oriented workloads, it offers a topology-aware multi-GPU resource scheduler combined with fractional GPU scaling implemented via API-level CUDA virtualization, achieving higher GPU utilization than vanilla setups.