Learn about the latest developments in middleware design that boost the performance of GPGPU-based streaming applications. Several middleware libraries already support communication directly from GPU device memory and optimize it using features offered by the CUDA toolkit. Some of them also take advantage of novel features, such as the hardware-based multicast offered by high-performance networks like InfiniBand, to boost broadcast performance. This talk will focus on the challenges of combining GPUDirect RDMA and hardware multicast, and of fully utilizing both in tandem, to support high-performance broadcast operations for streaming applications. Performance results will be presented to demonstrate the efficacy of the proposed designs.