We'll cover recent features and performance improvements in the NVIDIA Collective Communications Library (NCCL). NCCL is designed to make computing on multiple GPUs easy and is integrated into most deep learning frameworks to accelerate training. NCCL supports communication over shared memory, PCIe, NVLink, sockets, and InfiniBand Verbs, covering both multi-GPU machines and multi-node clusters.
We'll present the functionality of NCCL (pronounced "Nickel"), a standalone library of standard collective communication routines, such as all-gather, reduce, and broadcast, that have been optimized to achieve high bandwidth over a variety of GPU topologies. NCCL can be used in either single-process or multi-process (for example, MPI) applications.
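As a minimal sketch of the single-process mode, the example below initializes one communicator per GPU with ncclCommInitAll and runs a float all-reduce across them. The device count, device IDs, and buffer size are illustrative assumptions, not part of the abstract; error checking is omitted for brevity.

```c
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
  const int nDev = 2;              /* assumed: two visible GPUs */
  int devs[2] = {0, 1};
  const size_t count = 1 << 20;    /* elements per buffer (illustrative) */

  ncclComm_t comms[2];
  float *sendbuff[2], *recvbuff[2];
  cudaStream_t streams[2];

  /* Allocate device buffers and one stream per GPU. */
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc((void**)&sendbuff[i], count * sizeof(float));
    cudaMalloc((void**)&recvbuff[i], count * sizeof(float));
    cudaMemset(sendbuff[i], 1, count * sizeof(float)); /* nonzero byte pattern */
    cudaStreamCreate(&streams[i]);
  }

  /* Single-process mode: one call creates a communicator per device. */
  ncclCommInitAll(comms, nDev, devs);

  /* Group the per-GPU calls so they form one collective operation. */
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclAllReduce(sendbuff[i], recvbuff[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  /* Wait for completion, then clean up. */
  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaStreamSynchronize(streams[i]);
    cudaFree(sendbuff[i]);
    cudaFree(recvbuff[i]);
    ncclCommDestroy(comms[i]);
  }
  return 0;
}
```

In a multi-process (for example, MPI) application, each rank would instead create its own communicator with ncclCommInitRank after broadcasting a ncclUniqueId, but the collective calls themselves look the same.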