GTC On-Demand

Algorithms and Numerical Techniques
Presentation
Media
GPU Parallelization of a Distance Field Solver
Anup Shrestha (Boise State University)
Propagating interfaces occur in a wide variety of fields, including fluid mechanics and computer graphics. The distance field from an interface can be calculated by solving the Eikonal equation at each node using the Fast Sweeping Method (FSM) [Zhao, 2004]. However, parallelization of FSM is not straightforward. We propose a parallel algorithm using Cuthill-McKee ordering that is suitable for massively threaded architectures. Here, we implement and compare different parallel algorithms for FSM using CUDA, OpenACC, and MPI. The maximum performance is achieved using CUDA and the parallel algorithm of Detrixhe et al., whereas a comparable speedup is achieved using OpenACC with only a few directives, substantially shortening the development cycle.
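As a point of reference for the kind of kernel involved, the sketch below shows a single Jacobi-style Eikonal update pass on a uniform 2D grid, offloaded with an OpenACC directive. It is illustrative only, not the poster's Cuthill-McKee-ordered sweep; the function and array names are hypothetical, and the arrays are assumed to be resident in an enclosing acc data region.

    #include <math.h>

    /* Godunov update for |grad d| = 1 given the smaller x- and y-neighbor values. */
    #pragma acc routine seq
    static double godunov_update(double a, double b, double h)
    {
        if (fabs(a - b) >= h)
            return (a < b ? a : b) + h;
        return 0.5 * (a + b + sqrt(2.0 * h * h - (a - b) * (a - b)));
    }

    /* One relaxation pass over the interior nodes; writes to a separate array
       so every node reads the previous iterate (race-free). */
    void eikonal_pass(const double *restrict d, double *restrict dout,
                      int nx, int ny, double h)
    {
        #pragma acc parallel loop collapse(2) present(d[0:nx*ny], dout[0:nx*ny])
        for (int j = 1; j < ny - 1; ++j) {
            for (int i = 1; i < nx - 1; ++i) {
                double a = fmin(d[j*nx + i - 1], d[j*nx + i + 1]);
                double b = fmin(d[(j-1)*nx + i], d[(j+1)*nx + i]);
                dout[j*nx + i] = fmin(d[j*nx + i], godunov_update(a, b, h));
            }
        }
    }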
 
Keywords:
Algorithms and Numerical Techniques, Other, GTC Silicon Valley 2016 - ID P6257
Download:
Astronomy and Astrophysics
Presentation
Media
Non-Uniform Diffusion of the Solar Surface Magnetic Field: Code Acceleration Using OpenACC for both GPUs and x86
Ronald Caplan (Predictive Science Inc.)
We show the results of implementing OpenACC in a non-uniform diffusion time-integration Fortran code. The code's application is to smooth observation-based radial magnetic field maps of the solar surface for use as inner boundary conditions of global magnetohydrodynamic simulations of the corona and heliosphere. The code uses an RKL2 super-time-stepping algorithm to allow time steps that far exceed the standard explicit stability limit. The algorithm remains explicit, making the code a prime target for OpenACC acceleration. The OpenACC implementation is discussed and speedup results are shown. The newly released OpenACC x86 feature in the PGI compiler is also tested and shown to produce multicore CPU code from the OpenACC directives that can outperform our OpenMP implementation.
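The production code is Fortran, but the core pattern (an explicit, directive-friendly update) can be sketched in C, assuming a surrounding acc data region and hypothetical names; the RKL2 super-time-stepping stages that wrap such an update are omitted.

    /* One explicit update of du/dt = d/dx( nu(x) du/dx ) on a non-uniform 1D mesh,
       offloaded with OpenACC. Illustrative sketch only, not the production code. */
    void diffuse_step(const double *restrict u, double *restrict unew,
                      const double *restrict nu, const double *restrict dx,
                      int n, double dt)
    {
        #pragma acc parallel loop present(u[0:n], unew[0:n], nu[0:n], dx[0:n])
        for (int i = 1; i < n - 1; ++i) {
            double flux_r = 0.5 * (nu[i] + nu[i+1]) * (u[i+1] - u[i]) / dx[i];
            double flux_l = 0.5 * (nu[i-1] + nu[i]) * (u[i] - u[i-1]) / dx[i-1];
            unew[i] = u[i] + dt * (flux_r - flux_l) / (0.5 * (dx[i-1] + dx[i]));
        }
    }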
 
Keywords:
Astronomy and Astrophysics, Computational Physics, GTC Silicon Valley 2016 - ID P6259
Download:
Computational Biology and Chemistry
Presentation
Media
Enabling the Electronic Structure Program Gaussian on GPGPUs Using OpenACC
Roberto Gomperts (NVIDIA)
In 2011, Gaussian, Inc., PGI, and NVIDIA embarked on a long-term project to enable Gaussian on GPGPUs using a directives-based approach. OpenACC has emerged as the de facto standard to port complex programs to GPU accelerators. We'll discuss how we attacked some of the challenges involved in working with a large-scale, feature-rich application like Gaussian. This includes a number of PGI extensions to the OpenACC 2.0 standard that we believe will have a positive impact on other programs. To conclude, we'll present a sample of GPU-based performance improvements on a variety of theories and methods.
 
Keywords:
Computational Biology and Chemistry, Tools and Libraries, HPC and Supercomputing, GTC Silicon Valley 2016 - ID S6524
Streaming:
Download:
 
Need for Speed: Accelerating High-Accuracy Quantum Chemistry Using OpenACC Directives
Janus Eriksen (Aarhus University)
Quantum chemistry (QC), the application of quantum mechanics to molecular systems, has become an integral tool in most, if not all, of the chemical, biological, and general material sciences. In this session, we describe how we have achieved speedups of more than 10x by accelerating existing CPU-based implementations of two of the most prominent models of modern wave-function-based QC, the RI-MP2 and CCSD(T) models, as well as their local-correlation Divide-Expand-Consolidate (DEC) formulations, DEC-RI-MP2 and DEC-CCSD(T). The codes in question have been accelerated in the massively parallel and linear-scaling LSDalton program using the compiler directives of the OpenACC 2.0 standard. Examples illustrating the efficiency of the resulting (portable) OpenACC GPU port will be provided.
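The hot spots in methods such as RI-MP2 and CCSD(T) are dominated by dense tensor contractions, which map naturally onto OpenACC loop nests. The following generic sketch (not LSDalton source; names hypothetical, arrays assumed present on the device) shows the porting pattern; in practice such contractions are often handed to a tuned BLAS instead.

    /* C(i,j) += sum_p A(i,p) * B(p,j), all row-major. Illustrative only. */
    void contract(const double *restrict A, const double *restrict B,
                  double *restrict C, int m, int n, int k)
    {
        #pragma acc parallel loop collapse(2) present(A[0:m*k], B[0:k*n], C[0:m*n])
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                for (int p = 0; p < k; ++p)
                    sum += A[i*k + p] * B[p*n + j];
                C[i*n + j] += sum;
            }
        }
    }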
 
Keywords:
Computational Biology and Chemistry, OpenACC, HPC and Supercomputing, GTC Silicon Valley 2016 - ID S6540
Streaming:
Download:
Computational Fluid Dynamics
Presentation
Media
High Performance and Productivity with Unified Memory and OpenACC: A LBM Case Study
Jiri Kraus (NVIDIA)
Learn how to use unified memory to improve your productivity in accelerating applications with OpenACC. Using a Lattice Boltzmann CFD solver as an example, we'll explain how a profile-driven approach allows one to incrementally accelerate an application with OpenACC and unified memory. Besides the productivity gain, a primary advantage of this approach is that it is also very accessible to developers who are new to a project and therefore not familiar with the whole code base.
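A minimal sketch of the idea, assuming the PGI compiler with managed (unified) memory enabled, e.g. pgcc -acc -ta=tesla:managed: the hot loop gets a single directive and no explicit data clauses, because dynamically allocated arrays become CUDA managed memory. The names below are hypothetical placeholders, not the solver's.

    #include <stdlib.h>

    /* Placeholder for a per-cell lattice Boltzmann relaxation step. */
    void stream_collide(const double *restrict f_src, double *restrict f_dst,
                        int ncells, double omega)
    {
        #pragma acc parallel loop   /* no data clauses: managed memory handles movement */
        for (int c = 0; c < ncells; ++c)
            f_dst[c] = (1.0 - omega) * f_src[c] + omega * 1.0;  /* toy equilibrium */
    }

    int main(void)
    {
        int n = 1 << 20;
        double *a = malloc(n * sizeof *a);   /* becomes managed memory under -ta=tesla:managed */
        double *b = malloc(n * sizeof *b);
        for (int i = 0; i < n; ++i) a[i] = 1.0;
        stream_collide(a, b, n, 1.7);
        free(a); free(b);
        return 0;
    }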
 
Keywords:
Computational Fluid Dynamics, Tools and Libraries, HPC and Supercomputing, GTC Silicon Valley 2016 - ID S6134
Streaming:
Download:
 
An Off-Load Model for Computing on GPU for a Parallel CFD Solver HiFUN
Munikrishna Nagaram (S & I Engineering Solutions Pvt. Ltd.), Balakrishnan Narayanarao (Indian Institute of Science, Bangalore), Thejaswi Rao (NVIDIA), Nikhil Shende (S & I Engineering Solutions Pvt. Ltd.)
The present study deals with porting the computational fluid dynamics flow solver HiFUN, proprietary software from S & I Engineering Solutions Pvt. Ltd., to a GPU-based accelerator platform using OpenACC directives. HiFUN is already parallelized for distributed-memory HPC platforms using MPI and exhibits excellent scalability; in a recent study, scaling over 15,000 processor cores on a Cray XC40 was demonstrated. The challenge at hand is to port the HiFUN solver to accelerator-based HPC clusters without compromising its scalability. The presentation includes details on the use of OpenACC directives, wherein the compute-intensive tasks are transferred to the GPU. The success of this strategy in realizing the objectives with minimal code change is also highlighted.
 
Keywords:
Computational Fluid Dynamics, HPC and Supercomputing, GTC Silicon Valley 2016 - ID P6298
Download:
Earth Systems Modeling
Presentation
Media
Parallelization and Performance of the NIM Weather Model on CPU, GPU and MIC Architectures
Mark Govett (NOAA Earth System Research Laboratory)
In an era defined by increasing diversity in computing architectures, performance portability is a key requirement for weather and climate applications that require massive computing resources. In this talk, you will learn how we develop for and achieve performance on CPU, GPU, and MIC architectures using industry-standard OpenACC and OpenMP directives. Performance results from the NIM weather model will be shown for a number of device, node, multi-node, and system configurations. Further, communication optimizations that yield a more than 40% improvement in runtime when scaling to thousands of GPUs will be highlighted.
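The communication optimizations referred to above typically rest on overlapping halo exchange with computation. A generic sketch of that pattern with OpenACC async queues (not NIM code; the updates are placeholders and arrays are assumed present on the device):

    /* Boundary cells are updated first, the interior update is launched
       asynchronously in queue 1, the halo exchange runs concurrently, and
       wait(1) joins the two before the next step. */
    void overlapped_step(const double *restrict q, double *restrict qnew,
                         int n, int halo)
    {
        #pragma acc parallel loop present(q[0:n], qnew[0:n])
        for (int i = 0; i < halo; ++i)          /* boundary cells, done first */
            qnew[i] = q[i];                     /* placeholder update */

        #pragma acc parallel loop async(1) present(q[0:n], qnew[0:n])
        for (int i = halo; i < n - halo; ++i)   /* interior, overlapped */
            qnew[i] = 0.5 * (q[i-1] + q[i+1]);  /* placeholder update */

        /* ... MPI halo exchange of the boundary cells happens here,
               concurrently with the async interior update ... */

        #pragma acc wait(1)
    }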
 
Keywords:
Earth Systems Modeling, Programming Languages, HPC and Supercomputing, GTC Silicon Valley 2016 - ID S6117
Streaming:
Download:
Energy Exploration
Presentation
Media
Using OpenACC to Parallelize Seismic One-Way-Based Migration
Maxime Hugues (Total E&P Research & Technology USA, LLC)
We'll describe our experience in using OpenACC to parallelize One-Way-Based Migration, a seismic application that uses Fourier finite differencing. We describe our approach to optimizing application kernels that involve FFT operations and solving systems of tridiagonal sparse matrices. We talk about expectations and challenges of using OpenACC, along with potential pitfalls for application users. We highlight the advantages of using OpenACC for high-performance scientific applications and list shortcomings that affect performance.
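For the FFT-heavy kernels, a common OpenACC pattern (shown below as an illustration, not the migration code) is to keep the data in an OpenACC data region and hand the device pointer to cuFFT through host_data use_device. This assumes linking against cuFFT; error checking and stream coordination (e.g., cufftSetStream) are omitted for brevity.

    #include <cufft.h>

    void forward_fft_inplace(cufftComplex *signal, int nx, int batch)
    {
        cufftHandle plan;
        cufftPlan1d(&plan, nx, CUFFT_C2C, batch);

        #pragma acc data copy(signal[0:nx*batch])
        {
            #pragma acc host_data use_device(signal)
            {
                /* cuFFT receives the device address managed by OpenACC */
                cufftExecC2C(plan, signal, signal, CUFFT_FORWARD);
            }
        }
        cufftDestroy(plan);
    }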
 
Keywords:
Energy Exploration, Performance Optimization, HPC and Supercomputing, GTC Silicon Valley 2016 - ID S6421
Streaming:
Download:
HPC and Supercomputing
Presentation
Media
Multi GPU Programming with MPI
Jiri Kraus (NVIDIA)
Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements in CUDA-aware MPI, the Multi-Process Service (MPS, a.k.a. Hyper-Q for MPI), and MPI support in the NVIDIA performance analysis tools.
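A minimal sketch of one pattern from this topic, assuming a CUDA-aware MPI build: OpenACC keeps the array resident on the GPU, and host_data exposes the device address so halo buffers can be passed straight to MPI. The layout and names are illustrative.

    #include <mpi.h>

    /* u has n entries: [left halo | interior | right halo], each halo cells wide. */
    void exchange_halo(double *restrict u, int n, int halo,
                       int left, int right, MPI_Comm comm)
    {
        #pragma acc data present(u[0:n])
        {
            #pragma acc host_data use_device(u)
            {
                /* send leftmost interior cells left, receive right halo */
                MPI_Sendrecv(u + halo,         halo, MPI_DOUBLE, left,  0,
                             u + n - halo,     halo, MPI_DOUBLE, right, 0,
                             comm, MPI_STATUS_IGNORE);
                /* send rightmost interior cells right, receive left halo */
                MPI_Sendrecv(u + n - 2 * halo, halo, MPI_DOUBLE, right, 1,
                             u,                halo, MPI_DOUBLE, left,  1,
                             comm, MPI_STATUS_IGNORE);
            }
        }
    }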
 
Keywords:
HPC and Supercomputing, Tools and Libraries, OpenACC, GTC Silicon Valley 2016 - ID S6142
Streaming:
Download:
 
MVAPICH2-GDR: Pushing the Frontier of Designing MPI Libraries Enabling GPUDirect Technologies
Dhabaleswar K. (DK) Panda (The Ohio State University)
Learn how the MVAPICH2-GDR library enables support for different GPUDirect technologies to simplify the task of porting message passing interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2-GDR supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit. These optimizations are integrated transparently under the standard MPI API. Recent advances in MVAPICH2 include support for GDR_Async, MPI-3 RMA using GPUDirect RDMA, use of fast GDRCOPY, non-blocking collectives using GDR and Core-Direct, and much more. Performance results with micro-benchmarks and applications will be presented using MPI and CUDA/OpenACC. The performance impact of application co-design using MVAPICH2-GDR will also be presented.
 
Keywords:
HPC and Supercomputing, Tools and Libraries, Performance Optimization, GTC Silicon Valley 2016 - ID S6411
Streaming:
Download:
 
OpenACC Enabled Benchmark Suite on Intel Ivy Bridge
Joel Bricker (University of Delaware)
We explore the new OpenACC multicore implementation to parallelize code for a multi-core processor. We use an OpenMP implementation on the same code base and compare the performance results obtained when running the code instrumented with both standards. The results are notable because the OpenMP standard has supported multi-core parallelism for some time, whereas the OpenACC standard only recently started supporting multi-core targets. As such, it is important for the OpenACC implementation's performance to match, or exceed, the performance of the existing OpenMP standard.
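The kind of comparison described above can be sketched with a single kernel annotated under each standard; with the PGI toolchain, building with -acc -ta=multicore produces the OpenACC multicore version and -mp produces the OpenMP version, so the same loop can be timed under both models. The kernel below is a generic benchmark-style example, not the suite's code.

    void daxpy_acc(int n, double a, const double *restrict x, double *restrict y)
    {
        #pragma acc parallel loop          /* OpenACC, targeting multicore CPU */
        for (int i = 0; i < n; ++i)
            y[i] += a * x[i];
    }

    void daxpy_omp(int n, double a, const double *restrict x, double *restrict y)
    {
        #pragma omp parallel for           /* OpenMP CPU baseline */
        for (int i = 0; i < n; ++i)
            y[i] += a * x[i];
    }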
 
Keywords:
HPC and Supercomputing, Tools and Libraries, GTC Silicon Valley 2016 - ID P6307
Download:
OpenACC
Presentation
Media
OpenACC Status and Developer Feedback
Michael Wolfe (NVIDIA), Sunita Chandrasekaran (University of Delaware), Fernanda Foertter (Oak Ridge National Labs), Guido Juckeland (Helmholtz-Zentrum Dresden-Rossendorf (HZDR)), Jeff Larkin (NVIDIA)
This panel will discuss OpenACC as a directives-based programming model and the successes and challenges developers are experiencing. There will be discussion of how the developer communities are organizing to run GPU Hackathons and what it takes to be successful. We will also cover OpenACC 2.5 and the roadmap for the specification as well as for software tools that support this standard. This will be an interactive Q/A session where participants can discuss their experiences with OpenACC experts and developers. Special attention will be paid to parallel programming challenges educators and researchers face.
 
Keywords:
OpenACC, Tools and Libraries, GTC Silicon Valley 2016 - ID S6747
Streaming:
 
Writing Performance Portable Code, and the Challenges for Upcoming Systems
Fernanda Foertter (Oak Ridge National Labs)
This session is about writing performance-portable code, with best practices and recommendations from Oak Ridge National Laboratory DOE staff. It will cover the CAAR program, what the labs are doing to help codes migrate to machines like the upcoming CORAL systems, and the advantages that modern GPU architectures bring in terms of code simplification. The resources available to domain scientists to ensure a smooth transition to this exciting architecture will be summarized, along with suggested follow-on activities. We are here to help!
 
Keywords:
OpenACC, Programming Languages, Tools and Libraries, GTC Silicon Valley 2016 - ID S6748
Streaming:
 
Maximize OpenACC Performance with the PGPROF Profiler
Scott Biersdorff (NVIDIA)
OpenACC directives are a quick way to find out if your code can benefit from GPU acceleration, and the PGI OpenACC compilers provide a wealth of information to help you along the way. We'll discuss two key technologies that contribute performance data about OpenACC regions and how they can help you transition to the GPU and improve the performance of OpenACC-accelerated code: (1) the OpenACC 2.5 tools interface, which provides a set of events that record what operations are performed for each OpenACC region and how long each one takes; and (2) a profiler that correlates the compiler output with the performance data to convey specific information about each OpenACC region, as well as presenting a timeline that shows when each region executed.
 
Keywords:
OpenACC, Performance Optimization, Tools and Libraries, GTC Silicon Valley 2016 - ID S6784
Streaming:
Download:
Performance Optimization
Presentation
Media
Algorithms for Auto-Tuning OpenACC Accelerated Kernels
Saber Feki (King Abdullah University of Science and Technology)
We'll present optimization techniques using different machine learning and derivative-free search algorithms, individually and in hybrid combinations, for auto-tuning parameters in OpenACC clauses for a stencil evaluation kernel executed on GPUs. We compare execution time performance of several auto-tuning techniques. These optimization algorithms will be evaluated over a large two-dimensional parameter space not satisfactorily addressed to date by OpenACC compilers, consisting of gang size and vector length. A hybrid of historic learning and Nelder-Mead delivers the best balance of high performance and low tuning effort.
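The two parameters being searched appear as clauses on the OpenACC compute construct, as in the sketch below (the tuning surface only; the auto-tuner itself and the actual stencil are not shown, and the names are illustrative).

    void stencil(const double *restrict in, double *restrict out,
                 int nx, int ny, int num_gangs, int vec_len)
    {
        /* gang count and vector length are the quantities the tuner varies */
        #pragma acc parallel loop collapse(2) num_gangs(num_gangs) \
                    vector_length(vec_len) present(in[0:nx*ny], out[0:nx*ny])
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                out[j*nx + i] = 0.25 * (in[j*nx + i - 1] + in[j*nx + i + 1]
                                      + in[(j-1)*nx + i] + in[(j+1)*nx + i]);
    }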
 
Keywords:
Performance Optimization, Programming Languages, Algorithms and Numerical Techniques, GTC Silicon Valley 2016 - ID S6363
Streaming:
Download:
Programming Languages
Presentation
Media
Comparing OpenACC 2.5 and OpenMP 4.5
Jeff Larkin (NVIDIA), James Beyer (NVIDIA)
We'll compare the current state of two competing accelerator directive sets: OpenACC 2.5 and OpenMP 4.5. As members of both the OpenACC technical committee and the OpenMP language committee, we'll provide an inside take on the current state of the directives and insight into how to transition between the directive sets.
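A small example of the kind of construct mapping such a comparison covers: the same offloaded loop expressed in each directive set (illustrative only).

    void scale_acc(int n, double a, double *restrict x)
    {
        #pragma acc parallel loop copy(x[0:n])                 /* OpenACC 2.5 */
        for (int i = 0; i < n; ++i)
            x[i] *= a;
    }

    void scale_omp(int n, double a, double *restrict x)
    {
        #pragma omp target teams distribute parallel for map(tofrom: x[0:n])  /* OpenMP 4.5 */
        for (int i = 0; i < n; ++i)
            x[i] *= a;
    }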
 
Keywords:
Programming Languages, Performance Optimization, OpenACC, GTC Silicon Valley 2016 - ID S6410
Streaming:
Download:
 
Write Once, Parallel Everywhere: OpenACC for GPUs, x86, OpenPOWER, and Beyond
Michael Wolfe (NVIDIA)
Performance portability means the ability to write a single program that runs with high performance across a wide range of target systems, including multicore systems, GPU-accelerated systems, and manycore systems, independent of the instruction set. It's not a "myth" or a "dream," as has been claimed recently. It should be demanded by developers and expected from any modern high level parallel programming language. OpenACC was designed five years ago with broad cross-platform performance portability in mind. The current PGI compiler suite delivers on this promise. Come hear about the current capabilities and performance of PGI OpenACC on GPUs, x86 and OpenPOWER, and learn about our plans for new features and even wider platform support.
 
Keywords:
Programming Languages, OpenACC, HPC and Supercomputing, GTC Silicon Valley 2016 - ID S6709
Streaming:
Download:
 
Locality-Aware Memory Association for Pipelining and Multi-Device Worksharing
Thomas Scogland (Lawrence Livermore National Laboratory)
Advances in directive-based programming models have made GPU programming more accessible than ever. Even so, models like OpenMP 4.0 and OpenACC lack worksharing and memory management facilities for multi-GPU environments. We present a memory-association interface for directive-based models that enables multi-device worksharing, automated pipelining for better support of out-of-core workloads, and NUMA management, all as a single extension. Our implementation, AffinityTSAR, scales well to multiple GPUs, to GPUs and CPUs together, and even shows improvement in CPU-only performance.
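The proposed interface itself is not reproduced here; for contrast, the sketch below shows what manual multi-GPU worksharing looks like with only the standard OpenACC runtime API (acc_set_device_num), which is the kind of boilerplate such an extension aims to eliminate. Names and the chunking scheme are illustrative.

    #include <openacc.h>

    void scale_multi_gpu(double *x, int n, double a)
    {
        int ndev = acc_get_num_devices(acc_device_nvidia);
        if (ndev < 1) ndev = 1;

        /* launch one chunk per device, each on its own async queue */
        for (int d = 0; d < ndev; ++d) {
            acc_set_device_num(d, acc_device_nvidia);
            int lo = (int)((long long)n * d / ndev);
            int hi = (int)((long long)n * (d + 1) / ndev);

            #pragma acc parallel loop copy(x[lo:hi-lo]) async(1)
            for (int i = lo; i < hi; ++i)
                x[i] *= a;
        }
        /* join all devices */
        for (int d = 0; d < ndev; ++d) {
            acc_set_device_num(d, acc_device_nvidia);
            #pragma acc wait(1)
        }
    }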
 
Keywords:
Programming Languages, GTC Silicon Valley 2016 - ID P6236
Download:
 
Dynamical Analysis of Connected Neuronal Motifs with OpenACC and OpenMPI
Krishna Pusuluri (Georgia State University)
Large-scale analysis of the dynamical behavior of central pattern generators (CPGs) formed by neuronal networks of even small sizes is computationally intensive and grows exponentially with network size. We have developed a suite of tools to exhaustively study the behavior of such networks on modern GPGPU accelerator clusters using OpenACC and OpenMPI. Directive-based approaches simplify the task of porting serial code onto GPUs without expertise in CUDA or OpenCL. Three-cell neuronal CPGs have been explored previously using various GPGPU tools. As motifs form the building blocks of larger networks, we have employed our framework to study four-cell CPGs and two connected three-cell motifs. We discuss the performance improvements achieved using this framework and present some of our results.
 
Keywords:
Programming Languages, Deep Learning and AI, GTC Silicon Valley 2016 - ID P6269
Download:
Tools and Libraries
Presentation
Media
Utilization and Expansion of ppOpen-AT for OpenACC
Satoshi Ohshima (The University of Tokyo)
OpenACC attracts attention as an easy and useful GPU programming environment. While OpenACC is not difficult to use, users have to spend time and energy optimizing OpenACC programs. To address this, we are developing an auto-tuning (AT) language named ppOpen-AT. We have shown that this language is useful for multi- and many-core parallel programming. We investigate the usability of ppOpen-AT for OpenACC programs and propose extensions to ppOpen-AT for further optimization of OpenACC. While ppOpen-AT for OpenACC is still in development, its effectiveness is demonstrated, and we believe that our next-generation ppOpen-AT will help with a variety of OpenACC optimization tasks.
 
Keywords:
Tools and Libraries, Programming Languages, GTC Silicon Valley 2016 - ID P6163
Download: