NVIDIA's DGX-2 system offers a unique architecture that connects 16 GPUs via the high-speed NVLink interface and NVSwitch, which together enable unprecedented bandwidth between processors. This talk will take an in-depth look at the properties of this system, along with programming techniques that take maximum advantage of its architecture.
Atomic memory operations, including the well-known compare-and-swap and fetch-and-add, provide powerful communication and coordination capabilities for parallel programs. They enable parallel algorithms and data structures that would otherwise be very difficult (or impossible) to express, such as shared parallel data structures, parallel data aggregation, and control primitives like semaphores and mutexes. In this talk we will use examples to describe atomic operations, explain how they work, and discuss performance considerations and pitfalls when using them.
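
As a rough illustration of the two operations named above, here is a minimal sketch in standard C++ using `std::atomic`. It is not taken from the talk: the names `count_with_fetch_add`, `SpinLock`, and `count_with_spinlock` are hypothetical, and host-side C++ threads are an assumption made to keep the example self-contained. On a GPU, the analogous CUDA primitives would be `atomicAdd` and `atomicCAS`. The first function aggregates a count with fetch-and-add; the second builds a simple mutex from compare-and-swap and uses it to protect an ordinary variable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Fetch-and-add: every thread increments one shared counter without a lock.
// (Hypothetical helper, for illustration only.)
long count_with_fetch_add(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomically read, add 1, and write back as one indivisible step.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}

// Compare-and-swap: a minimal spinlock (a simple mutex), assuming short
// critical sections. (Hypothetical class, for illustration only.)
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        bool expected = false;
        // Succeeds only if locked_ is still false, in which case it becomes true.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;  // a failed CAS overwrites expected with the observed value
            std::this_thread::yield();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// The same count, but with a plain variable protected by the CAS-based lock.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    SpinLock lock;
    long counter = 0;  // not atomic; only touched while holding the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Both functions should return `num_threads * increments_per_thread`. The lock-based version serializes every increment, which hints at one of the performance pitfalls: heavy contention on a single memory location limits how much the work can actually run in parallel.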