Learn how to make your irregular algorithm perform well on GPUs. We provide insights into our research on a tasking framework for synchronization-critical applications on GPUs. We discuss the requirements that GPU architectures and programming models impose on efficient tasking frameworks. Participants will learn about the pitfalls for tasking that arise from the architectural differences between latency-driven CPUs and throughput-driven GPUs. To overcome these pitfalls, we consider programming concepts such as persistent threads, warp-aware data structures, and CUDA asynchronous task graphs. In addition, we look at recent GPU features, such as forward-progress guarantees and grid synchronization, that facilitate the implementation of tasking approaches. A task-based fast multipole method for the molecular dynamics package GROMACS serves as the use case for our considerations.