The next release of CUDA introduces Cooperative Groups, a new, scalable programming model for expressing synchronization and communication among groups of parallel threads, ranging in size from a subset of a warp to an entire CUDA grid launch. Together with new warp-synchronous primitives, Cooperative Groups lets threads and blocks within a CUDA grid synchronize, exchange data, and perform collective operations in a safe, explicit, and reliable manner. Both Cooperative Groups and the lower-level warp-synchronous primitives also provide a high-performance mechanism for intra-warp communication. We'll cover the new programming model features in depth, including best-practice examples.