Understanding and characterizing the performance problems of CPU-GPU programs, and providing insightful feedback that guides programmers in tuning their applications, are critical to improving developer productivity. HPCToolkit is a state-of-the-art performance analysis tool that employs statistical sampling of timers and hardware counters and attributes performance metrics to hierarchical calling contexts. We extend HPCToolkit to measure and attribute the performance of hybrid CPU-GPU codes. We present CPU-GPU blame shifting, a technique for identifying code regions that underutilize CPU and/or GPU compute resources. We demonstrate the effectiveness of our tools on diverse scientific codes, including hydrodynamics (LULESH), molecular dynamics (LAMMPS), and epidemiology simulation (GPU-EpiSimdemics).
Application profiling allows developers to assess the opportunity for improving application performance using GPUs. Attend this session if you are interested in understanding CUPTI (the CUDA Profiling Tools Interface) and how several popular tools (NVIDIA Nsight, TAU, Vampir, PAPI, and HPCToolkit) make use of this profiling library. The session will be run as a panel, with ample opportunity for audience interaction.