GTC On-Demand

Astronomy and Astrophysics
Potential Field Solutions of the Solar Corona: Converting a PCG Solver from MPI to MPI+OpenACC
We'll describe a real-world example of adding OpenACC to a legacy MPI Fortran Preconditioned Conjugate Gradient code, and show timing results for multi-node, multi-GPU runs. The code's application is obtaining 3D spherical potential field (PF) solutions of the solar corona using observational boundary conditions. PF solutions yield approximations of the coronal magnetic field structure and can be used as initial/boundary conditions for MHD simulations with applications to space weather prediction. We highlight key tips and strategies used when converting the MPI code to MPI+OpenACC, including linking Fortran code to the cuSPARSE library, using CUDA-aware MPI, maintaining performance portability, and dealing with multi-node, multi-GPU run-time environments. We'll show timing results for three problems of increasing size, running the code with MPI only (up to 1,728 CPU cores) and with MPI+GPU (up to 60 GPUs) using NVIDIA K80 and P100 GPUs.
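A minimal sketch of one technique highlighted above: issuing a CUDA-aware MPI halo exchange from inside an OpenACC data region, shown here in illustrative C rather than the presenters' Fortran solver. It assumes an enclosing acc data region holding x[0:n+2] and an MPI library built with CUDA support.

    #include <mpi.h>

    /* Exchange one-element halos with left/right neighbors. host_data
       exposes the device address of x, so a CUDA-aware MPI moves the
       data GPU-to-GPU without host staging buffers. */
    void exchange_halo(double *x, int n, int left, int right)
    {
        #pragma acc host_data use_device(x)
        {
            MPI_Sendrecv(&x[n], 1, MPI_DOUBLE, right, 0,
                         &x[0], 1, MPI_DOUBLE, left,  0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&x[1],   1, MPI_DOUBLE, left,  1,
                         &x[n+1], 1, MPI_DOUBLE, right, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }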
 
Keywords:
Astronomy and Astrophysics, HPC and Supercomputing, GTC Silicon Valley 2017 - ID S7535
 
Comparison of OpenACC and OpenMP4.5 Offloading: Speeding Up Simulations of Stellar Explosions
Learn about a case study comparing OpenACC and OpenMP 4.5 in the context of stellar explosions. Modeling supernovae requires multi-physics simulation codes to capture hydrodynamics, nuclear burning, gravitational forces, etc. As a nuclear detonation burns through the stellar material, it also increases the temperature. An equation of state (EOS) is then required to determine, say, the new pressure associated with this temperature increase. In fact, an EOS is needed after the thermodynamic conditions are changed by any physics routine, so it is called many times throughout a simulation, making a fast EOS implementation essential. Fortunately, these calculations can be performed independently during each time step, so the work can be offloaded to GPUs. Using the IBM/NVIDIA early test system (precursor to the upcoming Summit supercomputer) at Oak Ridge National Laboratory, we use a hybrid MPI+OpenMP (traditional CPU threads) driver program to offload work to GPUs. We'll compare the performance results as well as some of the currently available features of OpenACC and OpenMP 4.5.
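To make the comparison concrete, here is a hedged sketch, not the authors' code, of the same independent per-cell update expressed in both models; the arithmetic stands in for a real EOS evaluation.

    #include <stddef.h>

    /* OpenACC version: one combined directive offloads the loop. */
    void eos_openacc(const double *rho, const double *T, double *p, size_t n)
    {
        #pragma acc parallel loop copyin(rho[0:n], T[0:n]) copyout(p[0:n])
        for (size_t i = 0; i < n; ++i)
            p[i] = rho[i] * T[i];   /* placeholder for the real EOS */
    }

    /* OpenMP 4.5 version: target offload with explicit map clauses. */
    void eos_openmp45(const double *rho, const double *T, double *p, size_t n)
    {
        #pragma omp target teams distribute parallel for \
                map(to: rho[0:n], T[0:n]) map(from: p[0:n])
        for (size_t i = 0; i < n; ++i)
            p[i] = rho[i] * T[i];   /* placeholder for the real EOS */
    }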
 
Keywords:
Astronomy and Astrophysics, HPC and Supercomputing, GTC Silicon Valley 2017 - ID S7635
Computational Biology and Chemistry
Using OpenACC for NGS Techniques to Create a Portable and Easy-to-Use Code Base
Happy with your code but rewriting it every time the hardware platform changes? Know NVIDIA CUDA but want to use a higher-level programming model? OpenACC is a directive-based technique that enables more science and less programming. The model facilitates reusing one code base on more than one platform. This session will help you: (1) learn how to incrementally improve a bioinformatics code base using OpenACC without losing performance, and (2) explore the optimization techniques we applied and the challenges encountered in the process. We'll share our experience using OpenACC for DNA Next-Generation Sequencing techniques.
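As a flavor of the incremental approach described above, consider a toy GC-content counter (our example, not the presenters' NGS code); the first port is often a single directive, with data clauses refined later.

    #include <stddef.h>

    /* Count G/C bases in a sequence; the reduction clause keeps the
       parallel accumulation correct on the GPU. */
    long count_gc(const char *seq, size_t n)
    {
        long count = 0;
        #pragma acc parallel loop reduction(+:count) copyin(seq[0:n])
        for (size_t i = 0; i < n; ++i)
            if (seq[i] == 'G' || seq[i] == 'C')
                ++count;
        return count;
    }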
 
Keywords:
Computational Biology and Chemistry, Programming Languages, Performance Optimization, GTC Silicon Valley 2017 - ID S7341
Computational Fluid Dynamics
Porting and Optimization of Search of Neighbor-Particle by Using OpenACC
The MPS (moving particle semi-implicit) method is a particle method (not a stencil computation) used in computational fluid dynamics. Neighbor-particle search is the main bottleneck of MPS. We'll show our porting effort and three optimizations of the neighbor-particle search using OpenACC. We evaluate our implementations on Tesla K20c, GeForce GTX 1080, and Tesla P100 GPUs, achieving 45.7x, 96.8x, and 126.1x speedups, respectively, compared with a single-threaded Ivy Bridge CPU.
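For orientation, here is a hedged sketch of a cell-list neighbor count under OpenACC; the cell_start/cell_idx layout and the restriction to a single cell are our simplifications, not the presenters' kernel, and all arrays are assumed already on the device.

    /* For each particle, scan the particles of its own cell via a
       prebuilt cell list and count those within cutoff radius r2. */
    void count_neighbors(const float *x, const float *y, const float *z,
                         const int *cell_start, const int *cell_idx,
                         const int *my_cell, int *ncnt, int np, float r2)
    {
        #pragma acc parallel loop default(present)
        for (int i = 0; i < np; ++i) {
            int c = my_cell[i], n = 0;
            #pragma acc loop seq
            for (int k = cell_start[c]; k < cell_start[c + 1]; ++k) {
                int j = cell_idx[k];
                float dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
                if (j != i && dx*dx + dy*dy + dz*dz < r2)
                    ++n;
            }
            ncnt[i] = n;
        }
    }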
 
Keywords:
Computational Fluid Dynamics, Computer Aided Engineering, GTC Silicon Valley 2017 - ID S7558
 
OpenACC Best Practices: Accelerating the C++ NUMECA FINE/Open CFD Solver
We'll demonstrate the maturity and capabilities of OpenACC and the PGI compiler suite in a professional C++ programming environment. We'll explore in detail the adaptation of the general-purpose NUMECA FINE/Open CFD solver for heterogeneous CPU+GPU execution. We'll give extra attention to OpenACC tips and tricks used to efficiently port the existing C++ programming model with minimal code modifications. Sample code blocks will demonstrate the implementation principles in a clear and concise manner. Finally, we'll present simulations completed in partnership with Dresser-Rand on the OLCF Titan supercomputer, showcasing the scientific capabilities of FINE/Open and the improvements in simulation turnaround time made possible through the use of OpenACC.
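One widely used porting trick of the kind the session alludes to, sketched in C for consistency with the other examples (the Field type is hypothetical): pull indirected members into local restrict pointers so the directives and the compiler see plain arrays.

    typedef struct { double *u; double *res; int n; } Field;

    void relax(Field *f, double omega)
    {
        double *restrict u   = f->u;    /* local aliases keep the pragma */
        double *restrict res = f->res;  /* clauses simple and the loop   */
        int n = f->n;                   /* free of pointer chasing       */
        #pragma acc parallel loop present(u[0:n], res[0:n])
        for (int i = 0; i < n; ++i)
            u[i] += omega * res[i];
    }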
 
Keywords:
Computational Fluid Dynamics, Tools and Libraries, Computer Aided Engineering, GTC Silicon Valley 2017 - ID S7672
Computational Physics
Porting C++ Applications to GPUs with OpenACC for Lattice Quantum Chromodynamics
We'll describe our experience using OpenACC to port a C++ library to run on GPUs, focusing in particular on the issue of deep copy. The C++ library, Grid, is developed for numerical lattice quantum chromodynamics (LQCD) simulations and is highly optimized for Intel x86 and many-core architectures. Our goal is to port it to NVIDIA GPUs using OpenACC so that its main code structure can be preserved and minimal code changes are required. We'll describe the challenges encountered and share the lessons learned during the porting process. In particular, the library's heavy use of templated abstractions makes data movement between the CPU and the GPU challenging because of the deep-copy issue. We'll demonstrate that NVIDIA's virtual unified memory provides essential support for our porting effort. We'll also present initial performance results on Kepler and Pascal GPUs.
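The deep-copy issue in miniature, with toy types rather than Grid's templated ones: a struct with a dynamically allocated member cannot be moved with a single copyin. The nested-clause attach behavior below assumes an OpenACC 2.6-capable compiler; driver-managed (unified) memory avoids the manual step entirely.

    typedef struct { int n; double *v; } Vec;

    void scale(Vec f, double a)
    {
        /* copyin(f) moves only the struct; the member must be copied
           and attached explicitly, or f.v dangles on the device. */
        #pragma acc data copyin(f) copy(f.v[0:f.n])
        {
            #pragma acc parallel loop
            for (int i = 0; i < f.n; ++i)
                f.v[i] *= a;
        }
        /* Alternatively, compiling with managed memory (e.g. PGI's
           -ta=tesla:managed) lets the driver migrate f.v on demand. */
    }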
 
Keywords:
Computational Physics, Programming Languages, GTC Silicon Valley 2017 - ID S7640
HPC and Supercomputing
Multi-GPU Programming with MPI
Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with Unified Memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools.
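A typical first step such a tutorial covers, shown as a minimal sketch under our own assumptions: bind one MPI rank to one GPU using a node-local rank derived from an MPI-3 shared-memory communicator.

    #include <mpi.h>
    #include <openacc.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Ranks on the same node share this communicator, so its rank
           is a node-local index suitable for GPU selection. */
        MPI_Comm local;
        int local_rank;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &local);
        MPI_Comm_rank(local, &local_rank);

        int ngpus = acc_get_num_devices(acc_device_nvidia);
        acc_set_device_num(local_rank % ngpus, acc_device_nvidia);

        /* ... computation and CUDA-aware MPI exchanges go here ... */

        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }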
 
Keywords:
HPC and Supercomputing, Programming Languages, GTC Silicon Valley 2017 - ID S7133
 
Achieving Portable Performance for GTC-P with OpenACC on GPU, Multi-Core CPU, and Sunway Many-Core Processor
The Gyrokinetic Toroidal Code developed at Princeton (GTC-P) delivers highly scalable plasma turbulence simulations at extreme scales on world-leading supercomputers such as Tianhe-2 and Titan. The aim of this work is to achieve portable performance with a single source code for GTC-P. We developed the first OpenACC implementation for GPU, CPU, and the Sunway processor. The results show that the OpenACC version achieves nearly 90% of the performance of the NVIDIA CUDA version on the GPU and of the OpenMP version on the CPU, and that the Sunway OpenACC version achieves a 2.5x speedup over the entire code. Our work demonstrates that OpenACC can deliver portable performance to complex real-science codes like GTC-P. In addition, we propose adding thread-ID support to the OpenACC standard to avoid expensive atomic operations for reductions.
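To illustrate the closing point about reductions (with a toy deposition loop of our own, not GTC-P code): OpenACC's reduction clause handles scalar sums efficiently, but an irregular scatter onto a shared grid still falls back to atomic updates.

    void deposit(const int *cell, const double *w, double *grid,
                 int np, int ng, double *total)
    {
        double t = 0.0;
        #pragma acc parallel loop reduction(+:t) \
                copyin(cell[0:np], w[0:np]) copy(grid[0:ng])
        for (int i = 0; i < np; ++i) {
            t += w[i];                 /* scalar sum: reduction clause */
            #pragma acc atomic update
            grid[cell[i]] += w[i];     /* irregular scatter: atomic    */
        }
        *total = t;
    }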
 
Keywords:
HPC and Supercomputing, GTC Silicon Valley 2017 - ID S7193
 
Unstructured Low-Order Finite-Element Earthquake Simulation Using OpenACC on Pascal GPUs
We'll show a method that decreases random memory accesses on GPUs by splitting up calculations appropriately. The target application is unstructured low-order finite element analysis, a core application for manufacturing analyses. To reduce the memory access cost, we apply the element-by-element method for matrix-vector multiplication in the analysis. This method conducts local matrix-vector computation for each element in parallel. Atomic and cache hardware in recent GPUs has improved, so we can exploit the data locality in the element-node connectivity by using atomic functions to add the local results. We port the code to GPUs using OpenACC directives and attain high performance with low development cost. We'll also describe the performance on NVIDIA DGX-1, which contains eight Pascal GPUs.
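The element-by-element pattern reduced to a 1D toy (two nodes per element; not the presenters' code): each element forms its local matrix-vector product independently, and atomic additions resolve collisions where elements share a node. Arrays are assumed already resident on the device.

    void ebe_spmv(const int (*conn)[2], const double (*ke)[2][2],
                  const double *u, double *r, int nelem)
    {
        #pragma acc parallel loop default(present)
        for (int e = 0; e < nelem; ++e) {
            for (int a = 0; a < 2; ++a) {
                double sum = 0.0;
                for (int b = 0; b < 2; ++b)
                    sum += ke[e][a][b] * u[conn[e][b]];
                #pragma acc atomic update
                r[conn[e][a]] += sum;  /* shared nodes collide here */
            }
        }
    }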
 
Keywords:
HPC and Supercomputing, Computational Fluid Dynamics, Computational Physics, Computer Aided Engineering, Manufacturing Industries, GTC Silicon Valley 2017 - ID S7527
 
The Future of GPU Data Management
Optimizing data movement between host and device memories is an important step when porting applications to GPUs. This is true for any programming model (CUDA, OpenACC, OpenMP 4+, etc.), and becomes even more challenging with complex aggregate data structures (arrays of structs with dynamically allocated array members). The CUDA and OpenACC APIs expose the separate host and device memories, requiring the programmer or compiler to explicitly manage the data allocation and coherence. The OpenACC committee is designing directives to extend this explicit data management for aggregate data structures. CUDA C++ has managed memory allocation routines and CUDA Fortran has the managed attribute for allocatable arrays, allowing the CUDA driver to manage data movement and coherence. Future NVIDIA GPUs will support true unified memory, with operating system and driver support for sharing the entire address space between the host and the GPU. We'll compare and contrast the current and future explicit memory movement with driver- and system-managed memory, and discuss how future developments will affect application development and performance.
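A toy version of the aggregate case described above, using CUDA's managed-memory allocator (the Column type is our invention): because the array of structs and each member allocation are both managed, a kernel can traverse the whole structure with no explicit deep copy.

    #include <cuda_runtime.h>

    typedef struct { int n; double *data; } Column;

    Column *alloc_table(int ncols, int n)
    {
        Column *t;
        cudaMallocManaged((void **)&t, ncols * sizeof(Column),
                          cudaMemAttachGlobal);
        for (int c = 0; c < ncols; ++c) {
            t[c].n = n;
            /* Member allocations are managed too, so device code can
               chase t[c].data directly. */
            cudaMallocManaged((void **)&t[c].data, n * sizeof(double),
                              cudaMemAttachGlobal);
        }
        return t;
    }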
 
Keywords:
HPC and Supercomputing, Programming Languages, GTC Silicon Valley 2017 - ID S7628
 
GPU Acceleration of the HiGrad Computational Fluid Dynamics Code with Mixed OpenACC and CUDA Fortran
We'll present the strategy and results for porting an atmospheric fluids code, HiGrad, to the GPU. HiGrad is a cross-compiled, mixed-language code that includes C, C++, and Fortran, and is used for atmospheric modeling. Deep subroutine calls necessitate detailed control of the GPU data layout with CUDA Fortran. We'll present initial kernel accelerations with OpenACC, then discuss tuning with OpenACC and a comparison with specially curated CUDA kernels. We'll demonstrate the performance improvements and the different techniques used in porting this code to GPUs, using a mixed CUDA Fortran and OpenACC implementation for single-node performance, plus scaling studies conducted with MPI on local supercomputers and on Oak Ridge National Laboratory's Titan supercomputer, across architectures including the Tesla K40 and Tesla P100.
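The core interop idiom for mixing directives with hand-written kernels, sketched in C rather than the talk's Fortran: an OpenACC data region owns the arrays, and host_data hands the raw device pointer to the curated CUDA kernel. tuned_kernel_launch is a hypothetical wrapper, not HiGrad's API.

    /* Hypothetical launcher for a hand-tuned CUDA kernel. */
    void tuned_kernel_launch(double *u_dev, int n);

    void step(double *u, int n)
    {
        #pragma acc data copy(u[0:n])
        {
            #pragma acc parallel loop   /* OpenACC covers simple loops */
            for (int i = 0; i < n; ++i)
                u[i] *= 0.5;

            /* Pass the device pointer to the specialized kernel. */
            #pragma acc host_data use_device(u)
            tuned_kernel_launch(u, n);
        }
    }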
 
Keywords:
HPC and Supercomputing, Computational Fluid Dynamics, GTC Silicon Valley 2017 - ID S7735
 
GPUs Unleashed: Analysis of Petascale Molecular Simulations with VMD
We'll showcase recent successes in the use of GPUs to accelerate challenging molecular simulation analysis tasks on the latest NVIDIA Tesla P100 GPUs on both Intel and IBM/OpenPOWER hardware platforms, and in large-scale runs on petascale computers such as Titan and Blue Waters. We'll highlight the performance benefits obtained from die-stacked memory on the Tesla P100, the NVIDIA NVLink interconnect on the IBM "Minsky" platform, and the use of NVIDIA CUDA just-in-time compilation to increase the performance of data-driven algorithms. We'll present results obtained with OpenACC parallel programming directives, current challenges, and future opportunities. Finally, we'll describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations.
 
Keywords:
HPC and Supercomputing, Accelerated Analytics, Computational Biology and Chemistry, GTC Silicon Valley 2017 - ID S7382
 
Accelerator Programming Ecosystems
Emerging heterogeneous systems are opening up a wealth of programming opportunities. This panel will discuss the latest developments in accelerator programming, where programmers have a choice among OpenMP, OpenACC, CUDA, and Kokkos for GPU programming. The panelists will shed light on the primary objectives behind the choice of a model: availability across multiple platforms, richness of feature set, applicability to a certain type of scientific code, compiler stability, or other factors. This will be an interactive Q&A session where participants can discuss their experiences with programming-model experts and developers.
 
Keywords:
HPC and Supercomputing, Programming Languages, GTC Silicon Valley 2017 - ID S7564
Performance Optimization
A Simple Guideline for Code Optimizations on Modern Architectures with OpenACC and CUDA
Learn a simple strategy for optimizing application runtime. The strategy is based on four steps and is illustrated on a two-dimensional Discontinuous Galerkin solver for computational fluid dynamics on structured meshes. Starting from a sequential CPU code, we guide the audience through the steps that increased performance on a GPU to around 149 times the original runtime of the code (evaluated on a K20Xm). The same optimization strategy applied to the CPU code increases performance to around 35 times the original runtime (evaluated on an E5-1650 v3 processor). Finally, different hardware architectures (Xeon CPUs, GPUs, KNL) are benchmarked with the native CUDA implementation and one based on OpenACC.
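A generic taste of the kind of step such a guideline walks through (our toy loop nest, not the presenters' solver): begin by letting the compiler try, then collapse the nest and shape the parallelism explicitly.

    void rhs(double *q, const double *f, int ne, int np)
    {
        /* Step 1 might be just: #pragma acc kernels on the nest.
           A later step collapses the loops and states the mapping. */
        #pragma acc parallel loop collapse(2) \
                present(q[0:ne*np], f[0:ne*np])
        for (int e = 0; e < ne; ++e)
            for (int j = 0; j < np; ++j)
                q[e * np + j] += f[e * np + j];
    }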
 
Keywords:
Performance Optimization, Algorithms and Numerical Techniques, Computational Fluid Dynamics, GTC Silicon Valley 2017 - ID S7626
 
Cache Directive Optimization in OpenACC Programming Model
OpenACC is a directive-based programming model that provides a simple interface for exploiting GPU computing. Because the GPU employs a deep memory hierarchy, appropriate management of memory resources becomes crucial for performance. The OpenACC programming model offers the cache directive to use on-chip hardware-managed (read-only data) or software-managed (shared memory) caches to improve memory access efficiency. We have implemented several strategies to promote shared memory utilization in our PGI compiler suite. We'll briefly discuss our investigation of cases that can potentially be optimized by the cache directive and then dive into the underlying implementation. Our compiler is evaluated with self-written micro-benchmarks as well as some real-world applications.
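For reference, the directive in question on a toy three-point stencil (a micro-benchmark-style sketch of our own, not the PGI test cases): cache names the per-iteration window the compiler should stage in fast on-chip memory.

    void smooth(const double *restrict a, double *restrict b, int n)
    {
        #pragma acc parallel loop present(a[0:n], b[0:n])
        for (int i = 1; i < n - 1; ++i) {
            /* Ask the compiler to keep this reused window on chip. */
            #pragma acc cache(a[i-1:3])
            b[i] = (a[i-1] + a[i] + a[i+1]) / 3.0;
        }
    }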
 
Keywords:
Performance Optimization, Programming Languages, HPC and Supercomputing, GTC Silicon Valley 2017 - ID S7636
Programming Languages
OmpSs+OpenACC: Multi-Target Task-Based Programming Model Exploiting OpenACC GPU Kernels
Discover how the OmpSs programming model lets you combine programming models such as OpenACC, multithreaded programming, CUDA, and OpenCL while providing a single address space and directionality compiler directives. OmpSs is a flagship project of the Barcelona Supercomputing Center, as well as a forerunner of OpenMP. We'll present the advantages in terms of coding productivity and performance brought by our recent work integrating OpenACC kernels within the OmpSs programming model, as a step forward from our previous OmpSs+CUDA support. We'll also show how our runtime system can use hybrid GPU and CPU execution together without any code modification.
 
Keywords:
Programming Languages, HPC and Supercomputing, GTC Silicon Valley 2017 - ID S7192
 
Using OpenACC to Parallelize Irregular Algorithms on GPUs
We'll dive deeper into using OpenACC and explore potential solutions to the challenges faced while parallelizing an irregular algorithm, the sparse Fast Fourier Transform (sFFT). We'll analyze code characteristics using profilers and discuss the optimizations we applied, the things we did right and wrong, and the roadblocks we faced and the steps taken to overcome them. We'll highlight how to compare data reproducibility between accelerators in heterogeneous platforms, and report on the algorithmic changes needed to move from sequential to parallel code, especially for an irregular algorithm, while using OpenACC. The results will demonstrate how to create a portable, productive, and maintainable code base with OpenACC without compromising on performance.
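On the reproducibility point, a minimal sketch of our own: parallel reordering of floating-point sums changes low-order bits, so accelerator output is compared against a host reference with a relative tolerance rather than bitwise equality.

    #include <math.h>

    /* Return 1 if every element of dev matches host within rtol. */
    int results_match(const double *host, const double *dev, int n,
                      double rtol)
    {
        for (int i = 0; i < n; ++i) {
            double denom = fmax(fabs(host[i]), 1e-300);
            if (fabs(host[i] - dev[i]) / denom > rtol)
                return 0;
        }
        return 1;
    }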
 
Keywords:
Programming Languages, Algorithms and Numerical Techniques, GTC Silicon Valley 2017 - ID S7478
 
Multi-GPU Programming with OpenACC
We'll discuss techniques for using more than one GPU in an OpenACC program. We'll demonstrate how to address multiple devices, how to mix OpenACC and OpenMP to manage multiple devices, and how to utilize multiple devices with OpenACC and MPI.
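A minimal sketch of the OpenACC-plus-OpenMP combination mentioned above (our own toy, assuming at least one visible NVIDIA GPU): each CPU thread binds its own device and processes its slice of the array.

    #include <omp.h>
    #include <openacc.h>

    void scale_multi_gpu(double *x, long n, double a)
    {
        int ngpus = acc_get_num_devices(acc_device_nvidia);
        #pragma omp parallel num_threads(ngpus)
        {
            int g = omp_get_thread_num();
            acc_set_device_num(g, acc_device_nvidia);  /* one GPU/thread */
            long lo = n * g / ngpus, hi = n * (g + 1) / ngpus;
            #pragma acc parallel loop copy(x[lo:hi-lo])
            for (long i = lo; i < hi; ++i)
                x[i] *= a;
        }
    }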
 
Keywords:
Programming Languages, HPC and Supercomputing, GTC Silicon Valley 2017 - ID S7546