GTC ON-DEMAND

 
Abstract:
Get your hands on the latest versions of Score-P and Vampir to profile the execution behavior of your large-scale GPU-accelerated applications. See how these HPC community tools pick up where other tools (such as NVVP) leave off once your application spans multiple compute nodes. Regardless of whether your application uses CUDA, OpenACC, OpenMP, or OpenCL for acceleration, or whether it is written in C, C++, Fortran, or Python, you will receive a high-resolution timeline view of all program activity alongside the standard profiles to identify hot spots and avenues for optimization. The new Python support also enables performance studies for optimizing the inner workings of deep learning frameworks.
 
Topics:
Tools & Libraries, HPC and Supercomputing
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9347
 
Abstract:
We discuss our experience porting the CUDA-based plasma simulation code PIConGPU to heterogeneous platforms using the abstract kernel interface Alpaka. With the advent of next-generation architectures such as OpenPOWER, making full use of the hardware and mapping CPUs and GPUs to specific simulation tasks have become important. Performance portability is of great interest, but even more important is the ability to develop against a single interface to keep code testable and maintainable. We show how the Alpaka library can be used in real-world applications and how we achieve both portability and performance.
 
Topics:
HPC and Supercomputing, Performance Optimization, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6298
 
Abstract:
This panel will discuss OpenACC as a directives-based programming model and the successes and challenges developers are experiencing. There will be discussion of how the developer communities are organizing to run GPU Hackathons and what it takes to be successful. We will also cover OpenACC 2.5 and the roadmap for the specification as well as for software tools that support this standard. This will be an interactive Q/A session where participants can discuss their experiences with OpenACC experts and developers. Special attention will be paid to parallel programming challenges educators and researchers face.
 
Topics:
OpenACC, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6747
 
Abstract:
Learn how OpenACC runtimes now expose performance-related information and how it can be used to show where your OpenACC applications are wasting clock cycles. The talk will show that profilers can connect to OpenACC applications to record how much time is spent in OpenACC regions and what device activity those regions generate. See how this can be turned into a natural timeline-based visualization that shows in great detail what an OpenACC application is doing at any point in time.
 
Topics:
OpenACC, Tools & Libraries, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5139
 
Abstract:
OpenACC and OpenMP provide programmers with two good options for portable, high-level parallel programming for GPUs. This talk will discuss similarities and differences between the two specifications in terms of programmability, portability, and performance.
 
Topics:
OpenACC, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5196
 
Abstract:
We show that with today's largest supercomputers it is possible to follow the trajectories of billions of particles, computing a unique fingerprint of their dynamics. Using 18,000 GPUs, we computed a 'sky map' of the radiation emitted by individual electrons in a large-scale, turbulent plasma, providing unique insight into the relation between the plasma dynamics and observable radiation spectra.
 
Topics:
Astronomy & Astrophysics, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4139
 
Abstract:
With GPUs, large-scale plasma simulations can run at frames-per-second speeds. We present interactive, in-GPU rendering of large-scale particle-in-cell simulations running on GPU clusters. The user can choose which data is visualized and change the direction of view while the simulation is running. A remote visualization client can connect to the running simulation, allowing for live visualization even when bandwidth is limited.
 
Topics:
Combined Simulation & Real-Time Visualization, Graphics and AI, Large Scale Data Visualization & In-Situ Graphics, Scientific Visualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4140
 
Abstract:
With PIConGPU, new physics phenomena previously not accessible within laser plasma simulations can be studied, which will help us optimize laser-driven radiation sources. We present results on laser wakefield acceleration of electrons simulated on the Oak Ridge Titan system and discuss in detail which techniques help us get the most out of such clusters. Finally, we show how to add fault tolerance and load balancing to a large hybrid CPU-GPU code such as PIConGPU to achieve optimum performance.
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3026
 
Abstract:
Application profiling allows developers to assess the opportunity for improving application performance using GPUs. Attend this session if you are interested in understanding CUPTI and how several popular tools (NVIDIA Nsight, TAU, Vampir, PAPI, and HPCToolkit) make use of this profiling library. This will be run as a panel session with ample opportunity for audience interaction.
 
Topics:
Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3584
 
Abstract:
With powerful lasers breaking the petawatt barrier, applications for laser-accelerated particle beams are gaining more interest than ever. Ion beams accelerated by intense laser pulses foster new ways of treating cancer and make them available to more people than ever before. Laser-generated electron beams can drive new compact x-ray sources to create snapshots of ultrafast processes in materials. With PIConGPU, laser-driven particle acceleration can be computed in hours rather than the weeks required on standard CPU clusters. We present the techniques behind PIConGPU, a detailed performance analysis, and the benefits of PIConGPU for real-world physics cases.
 
Topics:
Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2067
 
Abstract:
Get in contact with performance tuning experts for multi-hybrid applications and see firsthand how VampirTrace/Vampir can significantly speed up application porting and development.
 
Topics:
Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2257
 
Speakers:
Guido Juckeland, Michael Bussmann
- TU Dresden - ZIH, Forschungszentrum Dresden-Rossendorf
Abstract:
Dive deep into a multi-parallel particle-in-cell code that utilizes MPI, pthreads, and CUDA. Around this specific application, a general C++ framework for transparent data transfers between GPUs has been developed and will be presented. Further techniques employed include interleaving of communication and computation, particle tiling, and a study of how well CUDA performance can be transferred to OpenCL.
 
Topics:
Physics Simulation, Astronomy & Astrophysics, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102090
 
Speakers:
Guido Juckeland, Jeremy Meredith
- TU Dresden - ZIH, Oak Ridge National Laboratory
Abstract:
Learn how applications can be executed over multiple GPUs located in multiple hosts, what the challenges are in scaling one application to a 20 PFLOP/s machine, and why tool support is a necessity. Receive an overview of the available performance analysis tools that support CUDA developers in generating applications with outstanding speedups.
 
Topics:
HPC and AI, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102089
 
 