GTC On-Demand

Astronomy and Astrophysics
Abstract:
The evolution of the universe is an extraordinarily fascinating and, of course, complex problem. Scientists use the most advanced simulation codes to try to describe and understand the origin and behavior of the incredible variety of objects that populate it: stars, galaxies, black holes. The most powerful computing systems are required to pursue such goals, and GPUs represent an outstanding opportunity. In this talk, we present one of these codes, Ramses, and the ongoing work to enable it to exploit GPUs efficiently through the adoption of the OpenACC programming model. The most recent achievements will be shown, together with some of the scientific challenges GPUs can help address.
Topics:
Astronomy and Astrophysics, OpenACC, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5531

Clusters & GPU Management
Abstract:
CUDA-ready clusters enable developers to:
- Focus on coding, not maintaining infrastructure (drivers, configs) and toolchains (compilers, libraries)
- Routinely keep pace with innovation, from the latest in GPU hardware to the CUDA toolkit itself
- Cross-develop with confidence and ease: maintain, and shift between, highly customized CUDA development environments
- Exercise their preference in programming GPUs: choose CUDA, OpenCL, or OpenACC and combine appropriately (with, for example, the Message Passing Interface, MPI)
- Exploit the convergence of HPC and Big Data Analytics: make simultaneous use of HPC and Hadoop services in GPU applications
- Make use of private and public clouds: create a CUDA-ready cluster in a cloud or extend an on-site CUDA infrastructure into a cloud
In this webinar, participants will learn how Bright Cluster Manager provisions, monitors and manages CUDA-ready clusters for developer advantage. Case studies will be used to illustrate all six advantages for Bright developers. Specific attention will be given to:
- Cross-developing under CUDA 6.0 and CUDA 6.5 with Kepler-architecture GPUs (e.g., the NVIDIA Tesla K80 GPU accelerator)
- The challenges and opportunities for making use of private (using OpenStack) and public (using Amazon Web Services) clouds in GPU applications
Topics:
Clusters & GPU Management
Type:
Webinar
Event:
GTC Webinars
Year:
2015
Session ID:
GTCE107

Developer - Performance Optimization
Abstract:
In this webinar, we will bring CUDA into a compute-intensive application using Allinea tools. First, we will explore Allinea Performance Reports, a tool for analyzing an existing application and determining whether it is appropriate for GPUs. If it is, profiling the application is critical to identify the most compute-intensive code regions that need to be replaced with CUDA (or OpenACC) implementations. But as the code is reworked, errors can be introduced. To resolve these profiling and debugging challenges, professional tools such as Allinea Forge are necessary to produce correct, working, high-performance GPU-accelerated code with minimal effort. During this technical session, an Allinea expert will illustrate how Allinea Performance Reports and Allinea Forge can help modernize applications easily.
Topics:
Developer - Performance Optimization, Developer - Tools & Libraries
Type:
Webinar
Event:
GTC Webinars
Year:
2015
Session ID:
GTCE112

Developer - Programming Languages
Abstract:
Advances in directive-based programming models have made GPU programming more accessible than ever. Even so, models like OpenMP 4.0 and OpenACC offer no worksharing or automated memory management facilities for multi-GPU environments. We present a memory-association interface for OpenMP-like models that enables multi-device worksharing, data reshaping, and NUMA management as a single extension. Our implementation, AffinityTSAR, scales well to multiple GPUs and to GPUs and CPUs together, and even shows improvement in CPU-only performance.
Topics:
Developer - Programming Languages, Developer - Tools & Libraries
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5128

Abstract:
We focus particularly on data layout: Structure of Arrays, a data structure advantageous for GPUs, as opposed to Array of Structures, which performs well on CPUs. We propose a directive extension to OpenACC that allows users to flexibly specify optimal layouts, even if the data structures are nested. Performance results show gains of as much as 96% for CPUs and 165% for GPUs compared to programs without such directives, essentially attaining both functional and performance portability in OpenACC.
Topics:
Developer - Programming Languages
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5222

Abstract:
R is a free software environment that provides a programming language and built-in libraries of mathematical operations for statistics, data analysis, machine learning and much more. In this talk, I will give an overview of using GPUs in R, focusing on three topics. First, I will introduce accelerating R computations with CUDA libraries, including applying a drop-in library (nvblas) with zero coding effort and a step-by-step guide to calling CUDA-accelerated libraries such as cuFFT. Second, I will show how to accelerate legacy code with directives (OpenACC) and how to write your own CUDA algorithms in R. Third, I will illustrate how to use CUDA tools as diverse as nvprof, cuda-memcheck and cuda-debug with R. Finally, I will present CUDA-accelerated results for several R benchmarks.
Topics:
Developer - Programming Languages, Big Data Analytics, OpenACC
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5145

Abstract:
This webinar will serve as an introductory tutorial for anyone interested in accelerated computing using compiler directives. Participants will learn about OpenACC and a proven process for accelerating applications using compiler directives. No prior GPU or parallel programming experience is required to attend this webinar, but the ability to read and understand C, C++, and/or Fortran code is needed.
Topics:
Developer - Programming Languages, OpenACC
Type:
Webinar
Event:
GTC Webinars
Year:
2015
Session ID:
GTCE114

Developer - Tools & Libraries
Abstract:
The Kokkos library enables development of HPC scientific applications that are performance portable across disparate manycore devices such as NVIDIA Kepler and Intel Xeon Phi. Portable programming models such as OpenMP, OpenACC, OpenCL, and Thrust focus on parallel execution but fail to address memory access patterns critical for best performance. In contrast, Kokkos integrates compile-time polymorphic data layout with parallel execution policies to portably manage memory access patterns. This year we present recently added Kokkos concepts and capabilities, with an emphasis on improvements to usability, such as C++11 lambda support now available in CUDA 7.0 for portable nested parallelism. We will also present a new application of Kokkos to Sandia's multithreaded graph library (MTGL).
Topics:
Developer - Tools & Libraries, Developer - Algorithms, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5166

General Interest
Abstract:
If you are an academic researcher you won't want to miss this session! In this session, we highlight and reward excellent research taking place at institutions at the forefront of GPU computing teaching and research: NVIDIA Centers of Excellence (COEs). We asked each of our COEs to submit a proposal with their best achievement over the past year. An NVIDIA panel of GPU computing luminaries selected four exemplars from our twenty-two COEs to represent the amazing GPU computing research being done. Each finalist will give a 15-minute presentation. After the presentations we will award an NVIDIA Achievement Award to one of the four COEs. The COE finalists are:
1. Harvard University, Extended excitonic systems, vibrational-excitonic effects & GPUs
2. Technische Universität Dresden, The OpenACC Profiling Interface
3. Tokyo Tech, Big Data Processing on GPU-based Supercomputers
4. Universidade Federal Fluminense, Education, Outreach & GRID
Topics:
General Interest
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5115

HPC and Supercomputing
Abstract:
In this session you will learn how to program multi-GPU systems or GPU clusters using the Message Passing Interface (MPI) and OpenACC or CUDA. The session starts with a quick introduction to MPI and how it can be combined with OpenACC or CUDA, and also covers advanced topics such as CUDA-aware MPI and how to overlap communication with computation to hide communication times. The latest improvements with CUDA-aware MPI, the Multi-Process Service (MPS, aka Hyper-Q for MPI) and MPI support in the NVIDIA performance analysis tools are covered.
Topics:
HPC and Supercomputing, Data Center, Cloud Computing & HPC
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5117

Abstract:
OpenACC was applied to a global high-resolution atmosphere model, the Nonhydrostatic ICosahedral Atmospheric Model (NICAM). We successfully executed the dynamical core test without rewriting any specific kernel subroutines for GPU execution. Only 5% of the lines of source code were modified, demonstrating good portability. Performance and scalability were evaluated on the TSUBAME2.5 supercomputer. The results showed that the kernels generated by OpenACC achieved good performance, in line with the memory performance of the GPU, as well as good weak scalability. A large-scale simulation was carried out using 2560 GPUs, which achieved 60 TFLOPS.
Topics:
HPC and Supercomputing, OpenACC, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5297

Abstract:
Learn about the latest developments in the MVAPICH2 library, which simplifies the task of porting Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit. Various optimizations are integrated transparently under the standard MPI API, for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using GPUDirect RDMA, use of the fast GDRCOPY library, a framework for MPI Datatype processing using CUDA kernels, and more. Performance results with micro-benchmarks and applications will be presented using MPI and CUDA/OpenACC. The impact on performance of processor affinity to the GPU and network will also be presented.
Topics:
HPC and Supercomputing, Data Center, Cloud Computing & HPC, Developer - Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5461

Machine Learning & Deep Learning
Abstract:
The Support Vector Machine (SVM) is a fundamental machine learning algorithm, effective for many classification problems, but with a high computational cost. Moreover, to obtain the best results for a given problem, the SVM meta-parameters need to be tuned, leading to numerous SVM executions and to a huge execution time. We have developed a semi-automatic solution based on OpenACC that allows the use of multiple GPUs for fast and efficient SVM meta-parameter tuning. We present our results on several handwritten digit classification problems.
Topics:
Machine Learning & Deep Learning, Developer - Tools & Libraries
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5299

OpenACC
Abstract:
Learn how OpenACC runtimes now also expose performance-related information and how this can be used to show where your OpenACC applications are wasting clock cycles. The talk will show that profilers can connect with OpenACC applications to record how much time is spent in OpenACC regions and what device activity it turns into. See how this can be turned into a natural timeline-based visualization that shows in great detail what an OpenACC application is doing at any point in time.
Topics:
OpenACC, Developer - Performance Optimization, Developer - Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5139

Abstract:
Learn how to effectively use the directive-based OpenACC programming model to accelerate scientific applications and easily harness the computational power of GPUs. In this session we share our experiences in porting and tuning three applications to GPUs using OpenACC: (i) an explicit seismic imaging kernel used in the Reverse Time Migration and Full Waveform Inversion applications, widely used in oil and gas exploration, where we show that fine-tuning some of the OpenACC clauses results in better performance; (ii) an implicit solver used in CFD for simulating the fluid-structure interaction of flow over an airfoil; and (iii) a CEM code based on the time-domain volume-integral-equation for simulating transient electromagnetics using both CAPS and PGI compilers.
Topics:
OpenACC, Developer - Performance Optimization, Computational Physics, Energy Exploration, Developer - Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5160

Abstract:
Compiler directives, such as OpenACC and OpenMP, simplify parallel programming by exposing concepts at a high level and insulating developers from low-level architectural details. In this session participants will learn the fundamentals of using compiler directives to program GPUs. This session will be taught using OpenACC, but the skills will be directly transferable to OpenMP. At the end of this tutorial participants will be able to use compiler directives to accelerate an application on a GPU.
Topics:
OpenACC, Developer - Programming Languages, HPC and Supercomputing
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5192

Abstract:
This tutorial will teach advanced topics in using OpenACC to accelerate applications on GPUs. Some experience in OpenACC and/or OpenMP will be beneficial for attending this session. Participants will learn how to further improve the performance of an OpenACC application using advanced topics such as asynchronicity and interoperability with accelerated libraries. After attending this session participants will be able to optimize an OpenACC application for additional GPU performance.
Topics:
OpenACC, Developer - Performance Optimization, Developer - Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5195

Abstract:
OpenACC and OpenMP provide programmers with two good options for portable, high-level parallel programming for GPUs. This talk will discuss similarities and differences between the two specifications in terms of programmability, portability, and performance.
Topics:
OpenACC, Developer - Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5196

Abstract:
This panel will discuss the current state of GPU programming using compiler directives, such as OpenACC and OpenMP. This session is a forum for discussing both the successes and shortcomings of using compiler directives to program GPUs. The panel will include users, speakers from compiler and tools vendors, and representatives of open source efforts to support directives. Attendees are encouraged to take part in the panel discussion.
Topics:
OpenACC, Developer - Tools & Libraries, Developer - Programming Languages, HPC and Supercomputing
Type:
Panel
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5198

Abstract:
This session presents valuable "lessons learned" during the process of porting computational physics applications to the Titan supercomputer with hybrid OpenACC and OpenMP. Specifically, three real-world HPC codes are enhanced with OpenACC directives to take advantage of the Kepler GPUs and OpenMP directives to target the CPUs of the Titan supercomputer. The first application is TACOMA, a computational fluid dynamics code that solves finite-volume, block-structured, compressible flows. The second application is Delta5D, a Monte Carlo fusion code that follows particle orbits in Boozer space using Hamiltonian guiding center equations solved with an adaptive time step integrator. The third application is NekCEM, a high-fidelity electromagnetics solver based on spectral element methods. While the science behind these applications may differ significantly, the same porting process and lessons learned apply to each.
Topics:
OpenACC, Developer - Performance Optimization, Computational Physics, Developer - Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5202

Abstract:
This tutorial provides strategies for using OpenACC to accelerate C++ classes. Examples illustrate topics such as member functions, inheritance, templates, containers, the implicit 'this' pointer, private data and deep copies. OpenACC 2.0 features such as unstructured data regions and the "routine" directive are highlighted. We also discuss current limitations and the future directions of OpenACC. Familiarity with OpenACC is recommended but not required.
Topics:
OpenACC, Developer - Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5233

Abstract:
We present an extended OpenACC programming model that fully exploits GPU-specific features while remaining at a high level. Directive-based accelerator programming models such as OpenACC have arisen as an alternative solution for GPU programming. However, too much abstraction in the directive models makes it difficult for users to control architecture-specific features, incurring a large performance gap between the directive models and low-level CUDA/OpenCL. We propose and implement new OpenACC extensions to support 1) hybrid programming of unified memory and separate memory and 2) exploitation of GPU-specific memories and synchronizations in an abstract manner. Experimental results show that extended OpenACC programs can perform similarly to low-level CUDA programs while remaining at a high level.
Topics:
OpenACC, Developer - Performance Optimization, Developer - Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5366

Abstract:
Learn about the new features being added to OpenACC in the upcoming 2.5 version, and the new data management features being designed for the subsequent version. OpenACC is the popular directive-based API for GPU and accelerator programming, first released in 2011, supported by the Cray and PGI commercial products, and being implemented by numerous open-source compilers. The latest OpenACC release includes several simplifications and exposes some new behavior that programmers should be aware of. This presentation will also discuss the continuing work on deep data structure management features being designed for the subsequent release.
Topics:
OpenACC, Developer - Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5382

Abstract:
Learn how to program NVIDIA GPUs using Fortran with OpenACC directives. The first half of this presentation will introduce OpenACC to new GPU and OpenACC programmers, providing the basic material necessary to start successfully using GPUs for your Fortran programs. The second half will be intermediate material, with more advanced hints and tips for Fortran programmers with larger applications that they want to accelerate with a GPU. Among the topics to be covered will be dynamic device data lifetimes, global data, procedure calls, derived type support, and much more.
Topics:
OpenACC, Developer - Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5388

Abstract:
This session will showcase the results of the inaugural GPU Hackathon held at the Oak Ridge Leadership Computing Facility. The event hosted six teams paired with mentors over a week in which applications were ported to GPUs using OpenACC directives. The talk will describe the progress of each team from beginning to end, as well as details about their implementations. Best practices, lessons learned, and anecdotes from mentors who participated in this training event will be shared.
Topics:
OpenACC, Developer - Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5515