In the GPU offloading programming model, the CPU is the initiator, i.e. it prepares and orchestrates work for the GPU. In GPU-accelerated multi-node programs, the CPU has to do the same for the network interface as well. Yet both the GPU and the network interface have sophisticated hardware resources of their own, and these can effectively be short-circuited so as to bypass the CPU altogether. Meet PeerSync, a set of CUDA-InfiniBand Verbs interoperability APIs that opens up a wide range of possibilities. It also provides a scheme for going beyond the GPU-network duo, i.e. for applying the same ideas to other third-party devices.