GTC ON-DEMAND

 
Abstract:

Understanding and characterizing the performance problems of CPU-GPU programs, and providing insightful feedback that guides programmers in tuning their applications, are critical to improving developer productivity. HPCToolkit is a state-of-the-art performance analysis tool that uses statistical sampling of timers and hardware counters and attributes performance metrics to the hierarchical calling context. We extend HPCToolkit to measure and attribute the performance of hybrid CPU-GPU codes. We present CPU-GPU blame shifting, a technique for identifying code regions that underutilize CPU and/or GPU compute resources. We demonstrate the effectiveness of our tools on diverse scientific codes, including hydrodynamics (LULESH), molecular dynamics (LAMMPS), and epidemiology simulation (GPU-EpiSimdemics).
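As a rough illustration of the blame-shifting idea, the Python sketch below works on a simplified model and is not HPCToolkit's implementation. It assumes each sample records which CPU calling context was running (or `None` if the CPU was blocked waiting on the GPU) and which GPU kernel was active (or `None` if the GPU was idle). When one side is idle, the sample is charged to the code running on the other side. The sample format and the names `shift_blame`, `cpu_context`, and `gpu_kernel` are hypothetical.

```python
from collections import Counter


def shift_blame(samples):
    """Toy blame-shifting pass over (cpu_context, gpu_kernel) samples.

    cpu_context: name of the running CPU code region, or None if the CPU
                 is blocked waiting for the GPU (assumed sample format).
    gpu_kernel:  name of the active GPU kernel, or None if the GPU is idle.
    """
    cpu_blame = Counter()  # CPU regions blamed for leaving the GPU idle
    gpu_blame = Counter()  # GPU kernels blamed for making the CPU wait
    for cpu_context, gpu_kernel in samples:
        if gpu_kernel is None and cpu_context is not None:
            # GPU idle while the CPU works: blame the CPU region.
            cpu_blame[cpu_context] += 1
        elif cpu_context is None and gpu_kernel is not None:
            # CPU waiting while a kernel runs: blame the kernel.
            gpu_blame[gpu_kernel] += 1
        # Samples where both sides are busy (or both idle) assign no blame.
    return cpu_blame, gpu_blame


# Hypothetical trace: region and kernel names are made up for illustration.
trace = [
    ("setup", None),
    ("setup", None),
    (None, "kernel_a"),
    ("pack", "kernel_a"),
    (None, "kernel_b"),
    (None, "kernel_b"),
    (None, "kernel_b"),
]
cpu_blame, gpu_blame = shift_blame(trace)
```

In this toy trace, regions with high `cpu_blame` are candidates for overlapping with GPU work, and kernels with high `gpu_blame` are candidates for tuning or for asynchronous launch.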


Topics: Tools & Libraries
Type: Talk
Event: GTC Silicon Valley
Year: 2013
Session ID: S3256
Abstract:

This talk presents an overview of a particle pusher implementation for NVIDIA GPUs that extends a novel energy- and charge-conserving 1D electrostatic particle pushing algorithm to a 2D electromagnetic version. Energy is conserved by using a fully implicit time integration, and particles are carefully treated at cell boundaries to maintain charge conservation. The momentum in the system is controlled by an adaptive orbit integrator that compares a first-order and a second-order integration scheme. The implementation is based on the CUDA 4.1 framework and exploits the GPU memory hierarchy by using texture memory to access the electric and magnetic fields and shared memory to accumulate the charge and current density before a global accumulation. We evaluate a red-black scheduling scheme for CUDA blocks to reduce contention during global accumulation, and we use multiple GPUs to perform the computation for different particle species. We showcase the CUDA implementation with a two-species (ion, electron) plasma physics application in which the particles are in equilibrium.
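The idea of steering an integrator by comparing a first-order and a second-order scheme can be sketched generically in Python. This is a minimal illustration under stated assumptions, not the talk's algorithm: it uses explicit Euler and Heun steps on a scalar ODE rather than the fully implicit particle orbit integration described above, and the function names, tolerance, and step-size factors are hypothetical choices.

```python
import math


def adaptive_step(f, t, y, dt, tol=1e-6):
    """Take one accepted step of y' = f(t, y); return (t_new, y_new, dt_next).

    The difference between a first-order (Euler) and a second-order (Heun)
    estimate serves as a local error indicator that adapts the step size.
    """
    while True:
        k1 = f(t, y)
        y_euler = y + dt * k1                  # first-order estimate
        k2 = f(t + dt, y_euler)
        y_heun = y + 0.5 * dt * (k1 + k2)      # second-order estimate
        err = abs(y_heun - y_euler)
        if err <= tol:
            # Accept; suggest a (possibly larger) step for the next call.
            factor = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
            return t + dt, y_heun, dt * min(2.0, factor)
        # Reject; shrink the step and retry.
        dt *= max(0.1, 0.9 * math.sqrt(tol / err))


def integrate(f, t0, y0, t_end, dt0=0.1, tol=1e-6):
    """Integrate from t0 to t_end; return the final value and step count."""
    t, y, dt, steps = t0, y0, dt0, 0
    while t < t_end:
        dt = min(dt, t_end - t)                # do not overshoot t_end
        t, y, dt = adaptive_step(f, t, y, dt, tol)
        steps += 1
    return y, steps


# Example problem (an assumption for illustration): exponential decay y' = -y.
y_final, n_steps = integrate(lambda t, y: -y, 0.0, 1.0, 1.0)
```

The design choice here is that the cheaper first-order result is used only to estimate the error, while the second-order result is the one carried forward.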


Topics: Computational Physics, Programming Languages
Type: Talk
Event: GTC Silicon Valley
Year: 2013
Session ID: S3144