Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present the state-of-the-art techniques for distributed machine learning and the special requirements they impose on the system. This will be followed by an overview of the interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing, which is used to accelerate large-scale deployments in HPC and artificial intelligence.
Come join us and learn how to build data-centric GPU clusters for artificial intelligence. We will briefly present the state-of-the-art techniques for distributed machine learning and the special requirements they impose on the GPU cluster. Additionally, we will present an overview of the interconnect technologies used to scale and accelerate distributed machine learning. During the session we will cover RDMA, NVIDIA's GPUDirect RDMA and GPUDirect Async, and in-network computing, and show how these technologies enable a new level of scalability and performance in large-scale artificial intelligence and high-performance computing deployments.