Demand is growing for easy-to-use, fast tools for manipulating high-resolution images and video. Recently, Criminisi et al. proposed the geodesic distance transform (GDT), which can be used to implement several interesting image and video editing tasks efficiently on high-resolution imagery. In this work, we present an efficient CUDA implementation of the GDT. The key contribution is a score-boarding mechanism for CUDA blocks, which significantly improves the overlap of memory transfers with computation and reduces kernel launch overhead.
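
For orientation, the following is a minimal CPU reference sketch of a raster-scan approximation to a geodesic distance transform. It is not the CUDA implementation described here, and it does not show the score-boarding mechanism. The function name `geodesic_distance_transform`, the weight `gamma`, the 8-neighbour half-masks, and the default number of passes are all illustrative assumptions. The step cost combines the spatial offset with the intensity difference, so distances grow quickly across image edges.

```python
import math


def geodesic_distance_transform(image, seeds, gamma=1.0, passes=2):
    """Approximate geodesic distance from seed pixels (hypothetical helper).

    image: 2D list of intensities; seeds: 2D list of booleans.
    gamma weights the intensity term against the spatial term (assumption).
    """
    h, w = len(image), len(image[0])
    inf = float("inf")
    dist = [[0.0 if seeds[y][x] else inf for x in range(w)] for y in range(h)]

    def relax(y, x, offsets):
        # Try to shorten dist[y][x] via neighbours already visited in this scan.
        best = dist[y][x]
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] < inf:
                diff = image[y][x] - image[ny][nx]
                step = math.sqrt(dy * dy + dx * dx + (gamma * diff) ** 2)
                best = min(best, dist[ny][nx] + step)
        dist[y][x] = best

    forward = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]   # upper-left half-mask
    backward = [(1, 1), (1, 0), (1, -1), (0, 1)]      # lower-right half-mask

    for _ in range(passes):
        for y in range(h):                    # top-left to bottom-right
            for x in range(w):
                relax(y, x, forward)
        for y in range(h - 1, -1, -1):        # bottom-right to top-left
            for x in range(w - 1, -1, -1):
                relax(y, x, backward)
    return dist
```

In this sketch each pixel reads values written earlier in the same scan, so the passes carry a sequential dependency. A dependency of this kind is one reason a GPU version may need to coordinate the order in which blocks process their parts of the image.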