GTC ON-DEMAND

 
Abstract:
Learn how to use the roofline model to analyze the performance of GPU-accelerated applications. We'll cover the basics of the model and explain how to use it to analyze application performance and track optimization progress. We'll also explain how to use nvprof to automate data collection on GPU-accelerated systems. Demonstrations will use DOE proxy applications to illustrate the effects of arithmetic intensity, memory stride, memory coalescing, and thread divergence/predication, all of which can be captured within the roofline methodology.
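As a rough illustration of the model named in the abstract (not material from the talk itself), the classic roofline bound says attainable performance is the smaller of peak compute throughput and arithmetic intensity times peak memory bandwidth. The sketch below assumes hypothetical peak values and hypothetical kernel measurements; in practice the FLOP count, bytes moved, and run time would come from a profiler such as nvprof.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved


def roofline_bound(ai, peak_gflops, peak_gbs):
    """Attainable GFLOP/s: the lower of the compute and bandwidth ceilings."""
    return min(peak_gflops, ai * peak_gbs)


def ridge_point(peak_gflops, peak_gbs):
    """Arithmetic intensity at which a kernel stops being bandwidth-bound."""
    return peak_gflops / peak_gbs


# Hypothetical machine ceilings (placeholders, not measured values).
PEAK_GFLOPS = 7000.0   # peak compute throughput, GFLOP/s
PEAK_GBS = 900.0       # peak memory bandwidth, GB/s

# Hypothetical kernel measurements (would normally come from a profiler).
kernel_flops = 2.0e9    # floating-point operations executed
kernel_bytes = 8.0e9    # bytes moved to and from memory
kernel_seconds = 0.010  # kernel run time in seconds

ai = arithmetic_intensity(kernel_flops, kernel_bytes)
bound = roofline_bound(ai, PEAK_GFLOPS, PEAK_GBS)
achieved = kernel_flops / kernel_seconds / 1e9  # achieved GFLOP/s
memory_bound = ai < ridge_point(PEAK_GFLOPS, PEAK_GBS)

print(f"arithmetic intensity: {ai:.2f} FLOP/byte")
print(f"roofline bound:       {bound:.1f} GFLOP/s")
print(f"achieved:             {achieved:.1f} GFLOP/s "
      f"({100 * achieved / bound:.0f}% of bound)")
print("bandwidth-bound" if memory_bound else "compute-bound")
```

Recomputing the same numbers after each optimization step is one simple way to track progress against the ceiling, which is the use the abstract describes.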
 
Topics:
Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9624
 
Abstract:
Learn how to optimize large complex-number reductions in the materials science code BerkeleyGW on NVIDIA GPUs. Our talk will showcase two BerkeleyGW kernels implemented with four frameworks: CUDA, OpenACC, OpenMP 4.5, and Kokkos. We'll share the optimization techniques used to achieve decent performance across all four implementations. We'll also report on the status of OpenACC and OpenMP 4.5 compilers and compare the performance portability of OpenACC, OpenMP 4.5, and Kokkos.
 
Topics:
Performance Optimization, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9626