OpenACC has quickly become a standard for accelerating large code bases with GPUs. Using directives, the programmer provides hints about data locality, data dependencies, and control flow that allow the compiler to automatically generate efficient GPU code. While the OpenACC model is well suited to a broad range of common software patterns, it is sometimes necessary to fine-tune an application with advanced OpenACC directives, or to interface with external CUDA code to take advantage of the latest hardware features. The goal of this tutorial is to present different strategies for tuning OpenACC code and to introduce mechanisms for interfacing OpenACC with other GPU code. Using worked examples, we will first present strategies to assess and optimize the performance of OpenACC code, and then focus on interfacing OpenACC code with CUDA and graphics libraries.
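To make the directive model concrete, here is a minimal sketch in C. The SAXPY kernel and the function name `saxpy_acc` are illustrative assumptions, not examples taken from the tutorial. The `data` construct expresses data locality (which arrays move to and from the device). The `parallel loop` directive marks the loop for GPU execution. Inside a `parallel` region, loops without a `seq` or `auto` clause are treated as independent, which serves as the data-dependency hint. Compiled without OpenACC support, the pragmas are ignored and the loop runs serially on the CPU.

```c
#include <stddef.h>

/* Hypothetical example: computes y = a*x + y over n elements. */
void saxpy_acc(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Data-locality hint: copy x to the device only; copy y to the
       device on entry and back to the host on exit. */
    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        /* Parallelism hint: present() says the arrays are already on the
           device; in a parallel region the loop is treated as independent. */
        #pragma acc parallel loop present(x[0:n], y[0:n])
        for (size_t i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

With an OpenACC-capable compiler (for example `nvc -acc` from the NVIDIA HPC SDK, or `gcc -fopenacc`), the same source offloads the loop to the GPU without changing the loop body.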