GTC On-Demand

Numerical Algorithms & Libraries
Raising the Roofline on GPU Applications with Stacked Memory
Lorena Barba (George Washington University)
GPU applications face three potential bottlenecks: instruction throughput, memory throughput, and latency. Sometimes profiling shows how to refactor the algorithm to improve performance. Another approach is to use the roofline model to analyze computational kernels and identify performance limitations on specific hardware. Such analysis characterizes many important scientific algorithms as memory-bound when running on GPUs. But as we look forward to new GPU generations endowed with stacked DRAM, we see the roof magically lifting due to reduced latencies and higher bandwidths, leading to unprecedented speed-up factors in memory-bound algorithms. My co-author Manuel Ujaldon, NVIDIA CUDA Fellow and Professor of Computer Architecture at the University of Malaga (Spain), and I are looking at how scientific algorithms may benefit from the stacked DRAM of future GPU generations. In this talk, I will present how we characterize GPU application performance via the roofline model and analyze the contribution of stacked DRAM to anticipate its impact on raising performance ceilings in future GPUs like Volta.
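As a rough illustration of the roofline idea described in the abstract, the sketch below bounds a kernel's attainable performance by the smaller of the compute roof and the memory roof (bandwidth times arithmetic intensity). It is not taken from the talk: the function names and all hardware figures (`PEAK_GFLOPS`, `BW_BASELINE_GBS`, `BW_STACKED_GBS`) are hypothetical values chosen only to show the shape of the argument, and the basic model captures bandwidth but not the latency effects the abstract also mentions.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound in GFLOP/s.

    intensity is the kernel's arithmetic intensity in FLOPs per byte
    moved to or from DRAM.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOPs/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical hardware figures, for illustration only (not real GPU specs).
PEAK_GFLOPS = 4000.0      # assumed compute roof
BW_BASELINE_GBS = 250.0   # assumed bandwidth of conventional DRAM
BW_STACKED_GBS = 1000.0   # assumed bandwidth of stacked DRAM

# A memory-bound kernel (low intensity) and a compute-bound one (high intensity).
low_intensity = 0.25
high_intensity = 32.0

memory_bound_speedup = (
    attainable_gflops(PEAK_GFLOPS, BW_STACKED_GBS, low_intensity)
    / attainable_gflops(PEAK_GFLOPS, BW_BASELINE_GBS, low_intensity)
)
compute_bound_speedup = (
    attainable_gflops(PEAK_GFLOPS, BW_STACKED_GBS, high_intensity)
    / attainable_gflops(PEAK_GFLOPS, BW_BASELINE_GBS, high_intensity)
)

print(f"ridge point, baseline: {ridge_point(PEAK_GFLOPS, BW_BASELINE_GBS)} FLOPs/byte")
print(f"ridge point, stacked:  {ridge_point(PEAK_GFLOPS, BW_STACKED_GBS)} FLOPs/byte")
print(f"memory-bound kernel speedup:  {memory_bound_speedup}x")
print(f"compute-bound kernel speedup: {compute_bound_speedup}x")
```

Under these assumed numbers, the memory-bound kernel speeds up in proportion to the bandwidth increase, while the compute-bound kernel sees no change because it already sits under the compute roof. Raising the bandwidth also moves the ridge point to a lower intensity, so more kernels can reach the compute roof.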
 
Keywords:
Numerical Algorithms & Libraries, Supercomputing & HPC, GTC 2014 - ID S4486
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2017 NVIDIA Corporation