Abstract:
We'll discuss the RAPIDS ecosystem, which accelerates the data science workflow by keeping data and computation on GPUs, letting users go from ingestion to insight more quickly and with larger workloads. Within RAPIDS, cuML provides a sklearn-like application programming interface (API) and cuGraph a NetworkX-like API for GPU-accelerated algorithms. While speedups of over 100x are possible on a single GPU, scale is bounded by the device's available memory. By scaling to multiple GPUs spread across multiple nodes, cuML and cuGraph can increase speedups even further while providing avenues to scale both up and out. We'll focus on how we enabled training and inference of machine learning and graph models on multiple nodes within cuML and cuGraph, and provide an architectural overview of our communications API, which enables direct GPU-to-GPU memory transfers. We'll conclude with examples and benchmarks.
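
As a minimal illustrative sketch of the single-GPU APIs mentioned above (not taken from the talk itself; the input files data.csv and edges.csv and their column names are assumptions):

```python
import cudf
import cugraph
from cuml.cluster import KMeans

# Ingest data straight into GPU memory with cuDF (hypothetical file: data.csv)
df = cudf.read_csv("data.csv")

# cuML mirrors the sklearn API: fit and predict run entirely on the GPU
kmeans = KMeans(n_clusters=8)
labels = kmeans.fit_predict(df)

# cuGraph mirrors NetworkX-style workflows on a GPU-resident edge list
# (hypothetical file: edges.csv with 'src' and 'dst' columns)
edges = cudf.read_csv("edges.csv")
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")
pagerank_scores = cugraph.pagerank(G)  # returns a cuDF DataFrame of scores
```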