GTC ON-DEMAND

 
Abstract:
The chemical shift of a protein structure offers considerable information about the physical properties of the protein, and being able to predict this shift accurately is essential in drug discovery and other areas of molecular dynamics research. Because chemical shift prediction algorithms are so computationally intensive, however, no application has been able to predict the chemical shift of large protein structures in a realistic amount of time. We addressed this problem by porting an algorithm called PPM_One to NVIDIA V100 GPUs using the directive-based programming model OpenACC. Testing several protein-structure datasets ranging from 1M to 11M atoms, we observed an average speedup of ~45X across the datasets and a maximum speedup of 61X. We'll discuss techniques for overcoming the programmatic challenges and highlight the scientific advances enabled by OpenACC.
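As an illustration of the directive-based approach this talk describes, here is a minimal OpenACC sketch of offloading an atom-pair loop nest to a GPU. It is only a sketch under assumed names (the function and arrays are hypothetical, not taken from PPM_One):

    // Minimal OpenACC sketch; hypothetical names, not actual PPM_One code.
    // The outer loop over atoms is offloaded with a single directive, and the
    // data clauses move the coordinate arrays to the device once per call.
    #include <math.h>

    void accumulate_contributions(int n_atoms,
                                  const double *restrict x,
                                  const double *restrict y,
                                  const double *restrict z,
                                  double *restrict shift)
    {
        #pragma acc parallel loop copyin(x[0:n_atoms], y[0:n_atoms], z[0:n_atoms]) \
                                  copyout(shift[0:n_atoms])
        for (int i = 0; i < n_atoms; ++i) {
            double s = 0.0;
            #pragma acc loop seq
            for (int j = 0; j < n_atoms; ++j) {
                if (j == i) continue;
                double dx = x[i] - x[j];
                double dy = y[i] - y[j];
                double dz = z[i] - z[j];
                s += 1.0 / sqrt(dx*dx + dy*dy + dz*dz);  // placeholder distance term
            }
            shift[i] = s;
        }
    }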
 
Topics:
Computational Biology & Chemistry
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9277
 
Abstract:
We'll discuss the Max Planck/University of Chicago Radiative MHD code (MURaM), the primary model for simulating the sun's upper convection zone, its surface, and the corona. Accelerating MURaM helps physicists interpret high-resolution solar observations. We'll describe the programmatic challenges and optimization techniques we employed while using the OpenACC programming model to accelerate MURaM on GPU and multicore architectures. We'll also examine what we learned and how it could be applied more broadly to atmospheric applications that use radiation-transport methods.
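A note on the single-source GPU/multicore point above: with OpenACC, the same annotated loops can be built for either target by changing only compiler flags. Below is a hedged, generic sketch (an illustrative stencil loop, not MURaM code; the compiler invocations are the usual NVIDIA HPC compiler flags, stated as an assumption rather than taken from the talk):

    /*
     * Illustrative single-source OpenACC loop (not MURaM code).
     * Typical builds with the NVIDIA HPC compilers (assumed invocations):
     *   nvc -acc=gpu       relax.c   # offload the loop nest to a GPU
     *   nvc -acc=multicore relax.c   # run the same loops across CPU cores
     */
    void relax(int nx, int ny, const double *restrict in, double *restrict out)
    {
        #pragma acc parallel loop collapse(2) copyin(in[0:nx*ny]) copy(out[0:nx*ny])
        for (int j = 1; j < ny - 1; ++j) {
            for (int i = 1; i < nx - 1; ++i) {
                out[j*nx + i] = 0.25 * (in[j*nx + i - 1] + in[j*nx + i + 1] +
                                        in[(j-1)*nx + i] + in[(j+1)*nx + i]);
            }
        }
    }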
 
Topics:
Climate, Weather & Ocean Modeling, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9288
 
Abstract:
Learn how OpenACC, a widely used high-level, directive-based programming model, can help port radiation-transport scientific codes to large-scale heterogeneous systems built around state-of-the-art accelerators such as GPUs. Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency; we need to rethink algorithms, languages, and programming models, among other components, in order to expose enough parallelism to migrate large-scale applications to these massively powerful platforms. This talk discusses the programming challenges, and our corresponding solutions, in using OpenACC to port a wavefront-based mini-application for Denovo, a production code for nuclear reactor modeling. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU achieves an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation.
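For readers unfamiliar with the wavefront pattern mentioned above: cells on the same anti-diagonal of a sweep have no mutual dependence, so wavefronts are processed sequentially while each wavefront is parallelized. The following is a hedged, generic 2-D sketch of that structure in OpenACC, not the actual Denovo mini-application code:

    /*
     * Generic wavefront sweep sketch (illustrative only; not Denovo code).
     * phi[j][i] depends on phi[j][i-1] and phi[j-1][i], which lie on the
     * previous wavefront, so all cells with i + j == w can run in parallel.
     */
    void sweep(int nx, int ny, double *restrict phi)
    {
        #pragma acc data copy(phi[0:nx*ny])
        for (int w = 2; w < nx + ny - 1; ++w) {   /* sequential over wavefronts */
            #pragma acc parallel loop             /* parallel within a wavefront */
            for (int i = 1; i < nx; ++i) {
                int j = w - i;
                if (j >= 1 && j < ny) {
                    phi[j*nx + i] = 0.5 * (phi[j*nx + (i-1)] + phi[(j-1)*nx + i]);
                }
            }
        }
    }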
 
Topics:
Performance Optimization, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8848
 
Abstract:
Happy with your code, but rewriting it every time the hardware platform changes? Know NVIDIA CUDA but want to use a higher-level programming model? OpenACC is a directive-based technique that enables more science and less programming, and it facilitates reusing the same code base on more than one platform. This session will help you: (1) learn how to incrementally improve a bioinformatics code base using OpenACC without losing performance, and (2) explore how to apply optimization techniques and handle the challenges encountered in the process. We'll share our experience using OpenACC for DNA next-generation sequencing techniques.
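As a rough picture of the incremental workflow described above, the sketch below shows the two typical steps: first wrap a hot loop in a kernels region, then hoist transfers into an enclosing data region so arrays stay resident on the device across offloaded loops. The loop body and names are hypothetical placeholders, not taken from the bioinformatics code:

    /*
     * Incremental OpenACC sketch (hypothetical names; placeholder work).
     * Step 1: "kernels" lets the compiler analyze and offload the loop.
     * Step 2: the surrounding "data" region keeps arrays on the device so
     *         later offloaded loops avoid repeated host<->device transfers.
     */
    void score_reads(int n, const int *restrict reads, float *restrict scores)
    {
        #pragma acc data copyin(reads[0:n]) copyout(scores[0:n])
        {
            #pragma acc kernels
            for (int i = 0; i < n; ++i) {
                scores[i] = (float)(reads[i] % 64) / 64.0f;  /* placeholder scoring */
            }
            /* further offloaded loops would reuse the same device data here */
        }
    }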
 
Topics:
Computational Biology & Chemistry, Programming Languages, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7341
 
Abstract:
We'll dive deeper into using OpenACC and explore potential solutions to the challenges faced while parallelizing an irregular algorithm, the sparse Fast Fourier Transform (sFFT). We'll analyze code characteristics using profilers and discuss the optimizations we applied, the things we did right and wrong, the roadblocks we faced, and the steps we took to overcome them. We'll highlight how to compare data reproducibility between accelerators on heterogeneous platforms, and report on the algorithmic changes needed to go from sequential to parallel, especially for an irregular code, while using OpenACC. The results demonstrate how OpenACC can be used to create a portable, productive, and maintainable codebase without compromising performance.
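One irregular pattern that typically shows up in sFFT-style codes is scattered, histogram-like accumulation, where many iterations may update the same bin. Below is a hedged OpenACC sketch of handling such updates with the atomic directive (a generic binning loop, not the actual sFFT code; the index array is a hypothetical input):

    /*
     * Scattered-update sketch (illustrative only; not sFFT code).
     * "atomic update" keeps concurrent additions to the same bin correct
     * when the parallel loop maps many samples to one bucket.
     */
    void bin_samples(int n, const int *restrict bucket_of,
                     const double *restrict sample,
                     int n_bins, double *restrict bins)
    {
        #pragma acc parallel loop copyin(bucket_of[0:n], sample[0:n]) copy(bins[0:n_bins])
        for (int i = 0; i < n; ++i) {
            int b = bucket_of[i];
            #pragma acc atomic update
            bins[b] += sample[i];
        }
    }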
 
Topics:
Programming Languages, Algorithms & Numerical Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7478
 
Abstract:
This panel will discuss OpenACC as a directive-based programming model and the successes and challenges developers are experiencing. There will be discussion of how the developer communities are organizing to run GPU Hackathons and what it takes to make them successful. We will also cover OpenACC 2.5 and the roadmap for the specification, as well as for the software tools that support this standard. This will be an interactive Q&A session in which participants can discuss their experiences with OpenACC experts and developers. Special attention will be paid to the parallel programming challenges educators and researchers face.
 
Topics:
OpenACC, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6747
 
Abstract:

The Multicore Association (MCA) is an industry association that defines and promotes open specifications to enable multicore product development. The main goal of MCA is to abstract hardware details and offer a portable software solution stack for embedded systems. One of the MCA APIs is the Multicore Task Management API (MTAPI), which leverages task parallelism on embedded multicore systems comprising symmetric and asymmetric processors. We have developed a runtime library (RTL) based on MTAPI that allows scheduling and mapping of tasks to the heterogeneous cores of a given platform. Our RTL uses the Multicore Communications API (MCAPI) to communicate between cores and is evaluated on the NVIDIA Jetson TK1 embedded processor, which comprises ARM and GPU cores.
 
Topics:
Intelligent Machines, IoT & Robotics, Programming Languages
Type:
Poster
Event:
GTC Silicon Valley
Year:
2016
Session ID:
P6287
 
Abstract:
The OpenACC API provides a portable programming model for taking advantage of systems equipped with heterogeneous CPUs and accelerators. These systems offer high computational performance while remaining energy-efficient. OpenACC provides a directive-based approach with which programmers identify compute-intensive areas of the code and instruct compilers to offload those computations to accelerators. The model provides different levels of abstraction and requires correspondingly different levels of programming effort to port and optimize applications. In this poster we discuss the evaluation of the OpenACC API using several applications from different scientific domains.
 
Topics:
Programming Languages, Graphics Performance Optimization
Type:
Poster
Event:
GTC Silicon Valley
Year:
2013
Session ID:
P3263
 
Abstract:
GPUs are currently receiving great attention in the HPC community, as they are known to provide a better performance-to-power ratio than CPUs for certain applications. It is not always possible to measure power/energy accurately; an alternative is to estimate power/energy statistically. In this study we employ non-linear regression to estimate the power and energy consumption of some common optimized high-performance kernels (DGEMM, FFT, PRNG, and FD stencils) on a multi-GPU platform. Using only three variables, we found the average error between measured and predicted values of power and energy to be ~5%.
 
Topics:
Seismic & Geosciences
Type:
Poster
Event:
GTC Silicon Valley
Year:
2013
Session ID:
P3264
 
 