GTC ON-DEMAND

 
Abstract:
Come hear the latest PGI news and learn about what we'll develop in the year ahead. We'll talk about the latest PGI OpenACC Fortran/C++ and CUDA Fortran compilers and tools, which are supported on x64 and OpenPOWER systems with NVIDIA GPUs. We'll discuss new CUDA Fortran features, including Tensor Core support and cooperative groups, and we'll cover our current work on half-precision. We'll explain new OpenACC 2.7 features, along with beta true deep-copy directives and support for OpenACC programs on unified memory systems. The PGI compiler-assisted software testing feature helps determine where differences arise between CPU and GPU versions of a program or when porting to a new system. Learn about upcoming projects, which include a high-performance PGI subset of OpenMP for NVIDIA GPUs, support for GPU programming with standard C++17 parallel STL and Fortran, and the incorporation of GPU-accelerated math libraries to support porting and optimization of HPC applications on NVIDIA GPUs.
 
Topics:
HPC and Supercomputing, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9289
 
Abstract:

Come learn why the authors of VASP, Fluent, Gaussian, Synopsys and numerous other science and engineering applications are using OpenACC. OpenACC supports and promotes scalable parallel programming on both multicore CPUs and GPU-accelerated systems, enabling large production applications to port effectively to the newest generation of supercomputers. It has very well-supported interoperability with CUDA C++, CUDA Fortran, MPI and OpenMP, allowing you to optimize each aspect of your application with the appropriate tools. OpenACC has proven to be the ideal on-ramp to parallel and GPU computing, even for those who need to tune their most important kernels using libraries or CUDA. Come see how you can try OpenACC with the free PGI Community Edition compiler suite. 

 
Topics:
Programming Languages
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1804
 
Abstract:
We'll briefly review how programming for GPU computing has progressed over the past ten years, and where it is going over the next ten years, specifically for data management and parallel compute management. CUDA languages expose all aspects of data and compute management, allowing and sometimes requiring programmers to take control of both. Libraries typically internalize all compute management, and some internalize all data management as well. Directives virtualize both data and compute management, but don't completely hide either. Future hardware and software capabilities will allow programs to enjoy automatic data movement between DDR memory and GPU device memory, and enhanced caching hardware reduces the need for explicit scratchpad memory programming. As parallel constructs are added to standard programming languages, writing parallel programs for GPU computing will become no more or less difficult than multicore programming.
 
Topics:
Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8273
 
Abstract:
Optimizing data movement between host and device memories is an important step when porting applications to GPUs. This is true for any programming model (CUDA, OpenACC, OpenMP 4+, etc.), and becomes even more challenging with complex aggregate data structures (arrays of structs with dynamically allocated array members). The CUDA and OpenACC APIs expose the separate host and device memories, requiring the programmer or compiler to explicitly manage the data allocation and coherence. The OpenACC committee is designing directives to extend this explicit data management for aggregate data structures. CUDA C++ has managed memory allocation routines and CUDA Fortran has the managed attribute for allocatable arrays, allowing the CUDA driver to manage data movement and coherence. Future NVIDIA GPUs will support true unified memory, with operating system and driver support for sharing the entire address space between the host and the GPU. We'll compare and contrast the current and future explicit memory movement with driver- and system-managed memory, and discuss how future developments will affect application development and performance.
 
Topics:
HPC and Supercomputing, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7628
 
Abstract:

Emerging heterogeneous systems are opening up many programming opportunities. This panel will discuss the latest developments in accelerator programming, where programmers can choose among OpenMP, OpenACC, CUDA, and Kokkos for GPU programming. The panel will shed light on the primary criteria for choosing a model: availability across multiple platforms, richness of the feature set, applicability to a certain type of scientific code, compiler stability, or other factors. This will be an interactive Q&A session where participants can discuss their experiences with programming-model experts and developers.

 
Topics:
HPC and Supercomputing, Programming Languages
Type:
Panel
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7564
 
Abstract:
Performance portability means the ability to write a single program that runs with high performance across a wide range of target systems, including multicore systems, GPU-accelerated systems, and manycore systems, independent of the instruction set. It's not a "myth" or a "dream," as has been claimed recently. It should be demanded by developers and expected from any modern high level parallel programming language. OpenACC was designed five years ago with broad cross-platform performance portability in mind. The current PGI compiler suite delivers on this promise. Come hear about the current capabilities and performance of PGI OpenACC on GPUs, x86 and OpenPOWER, and learn about our plans for new features and even wider platform support.
 
Topics:
Programming Languages, OpenACC, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6709
 
Abstract:
This panel will discuss OpenACC as a directives-based programming model and the successes and challenges developers are experiencing. There will be discussion of how the developer communities are organizing to run GPU Hackathons and what it takes to be successful. We will also cover OpenACC 2.5 and the roadmap for the specification as well as for software tools that support this standard. This will be an interactive Q/A session where participants can discuss their experiences with OpenACC experts and developers. Special attention will be paid to parallel programming challenges educators and researchers face.
 
Topics:
OpenACC, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6747
 
Abstract:

This panel will discuss the current state of GPU programming using compiler directives, such as OpenACC and OpenMP. This session is a forum for discussing both the successes and shortcomings of using compiler directives to program GPUs. The panel will include users, speakers from compiler and tools vendors, and representatives of open source efforts to support directives. Session participants are encouraged to participate in the discussions of this panel.

 
Topics:
OpenACC, Tools & Libraries, Programming Languages, HPC and Supercomputing
Type:
Panel
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5198
 
Abstract:
Learn about the new features being added to OpenACC in the upcoming 2.5 version, and the new data management features being designed for the subsequent version. OpenACC is the popular directive-based API for GPU and accelerator programming, first released in 2011, supported by the Cray and PGI commercial products, and being implemented by numerous open-source compilers. The latest OpenACC release includes several simplifications and exposes some new behavior that programmers should be aware of. This presentation will also discuss the continuing work on deep data structure management features being designed for the subsequent release.
 
Topics:
OpenACC, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5382
 
Abstract:
Learn how to program NVIDIA GPUs using Fortran with OpenACC directives. The first half of this presentation will introduce OpenACC to new GPU and OpenACC programmers, providing the basic material necessary to start successfully using GPUs for your Fortran programs. The second half will be intermediate material, with more advanced hints and tips for Fortran programmers with larger applications that they want to accelerate with a GPU. Among the topics to be covered will be dynamic device data lifetimes, global data, procedure calls, derived type support, and much more.
 
Topics:
OpenACC, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5388
 
Abstract:
OpenACC is designed to support performance portable parallel programming across a wide variety of heterogeneous and parallel node configurations. Learn what that means and how it affects the programs you write today and in the future. Examples will include NVIDIA Kepler and AMD Radeon targets.
 
Topics:
Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4468
 
Abstract:
Learn how to use performance analysis tools to find the bottlenecks in your OpenACC applications. With the proper performance information, and the feedback from the compiler, you can tune your application and improve overall performance. Live demonstrations will use PGI's pgprof, NVIDIA's Visual Profiler and command-line nvprof, and additional tools available to the parallel computing community.
 
Topics:
Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4472
 
Abstract:
Learn how to scale your OpenACC application across multiple GPUs. This example-based presentation will cover three methods of using multiple GPUs. First, you can use MPI with OpenACC to program a different GPU from each MPI process. You can even share data on the GPU across the MPI processes when you have multiple MPI processes on a single node. Second, you can use OpenMP with OpenACC, assigning a different GPU to each OpenMP thread. If you have more CPU threads than GPUs, you can share some GPUs across multiple threads. Third, even a single thread or process can distribute data and computation across multiple GPUs. By dynamically selecting the device, you can easily split or replicate data across multiple devices.
 
Topics:
Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4474
 
Abstract:

The OpenACC API provides a high-level, performance portable programming mechanism for parallel programming accelerated nodes. Learn about the latest additions to the OpenACC specification, and see the PGI Accelerator compilers in action targeting the fastest NVIDIA GPUs.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2013
Session ID:
SC3106
 
Abstract:

This talk has three distinct parts. First, it presents the new features accepted into OpenACC version 2.0 by the time of the conference; among the most important are support for separate compilation, procedure calls, and nested parallelism. For each feature, there is a motivating example and a discussion of usage guidelines. Second, the implementation of these features in the PGI Accelerator compilers will be presented. Finally, new features in the PGI compilers that are not yet part of the OpenACC specification, including support for multiple devices and multiple device types, will be discussed.

 
Topics:
HPC and Supercomputing, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3447
 
Speakers:
Michael Wolfe
- The Portland Group
 
Topics:
Tools & Libraries
Type:
Talk
Event:
Supercomputing
Year:
2010
Session ID:
SC1021
 
Speakers:
Michael Wolfe
Abstract:

This talk provides an introduction to programming NVIDIA GPUs using CUDA Fortran. It is suitable for expert Fortran or CUDA C programmers who need to extract maximum performance from GPUs using an explicit GPU Fortran programming model. This talk introduces the CUDA Fortran language, and through examples, illustrates how to explicitly program NVIDIA GPUs in native Fortran 95/03 through creation of GPU kernel subroutines, management of host and GPU device memory, definition of CUDA grids and thread blocks, launching kernels on an NVIDIA GPU device, and use of the CUDA Fortran runtime API. This talk includes a live component with a Linux workstation containing a Tesla card, and the PGI CUDA Fortran compiler.

 
Topics:
HPC and AI, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2009
Session ID:
S09083
 
Speakers:
Michael Wolfe
Abstract:
This talk provides an introduction to programming NVIDIA GPUs using the PGI Accelerator Programming Model in C and Fortran. It is suitable for application programmers, in particular those who are not expert GPU programmers. This talk introduces the compute-specific details of the NVIDIA GPU, and through examples, illustrates how to program common computational algorithms on NVIDIA GPUs using portable directive-based C and Fortran 95/03. The material covers programming language features, interpreting compiler feedback, performance analysis, and performance tuning. This talk includes a live component with a Linux workstation containing a Tesla card, and the latest PGI Accelerator compilers and tools.
 
Topics:
Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2009
Session ID:
S09113
 
 