GTC ON-DEMAND

Abstract:
In this session, we will study a real CUDA application and use NVIDIA® Nsight™ Eclipse Edition on Linux to optimize the performance of the code. Attendees will learn a method for analyzing their code and how to use the tools to apply those ideas.
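The session content itself is not reproduced here; as a rough sketch of the kind of measurement such an analysis typically starts from before turning to a profiler like Nsight, the code below times a hypothetical vector-add kernel with CUDA events and estimates its effective bandwidth. The kernel, sizes, and names are illustrative assumptions, not material from the talk.

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical memory-bound kernel used only to illustrate timing;
// it is not code from the session.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 24;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMalloc(&a, bytes); cudaMalloc(&b, bytes); cudaMalloc(&c, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a single kernel launch with CUDA events.
    cudaEventRecord(start);
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Effective bandwidth: two reads plus one write per element.
    double gbps = 3.0 * bytes / (ms * 1e-3) / 1e9;
    printf("vecAdd: %.3f ms, ~%.1f GB/s effective bandwidth\n", ms, gbps);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}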
 
Topics:
Performance Optimization
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4165
 
Abstract:
Traversal of unstructured meshes presents an interesting challenge for massively parallel processors such as GPUs. The problem offers abundant but irregular parallelism. Fortunately, this irregular parallelism can still be harnessed to provide a speedup on GPUs. This talk presents our work on accelerating UMT2013, a benchmark that performs distributed 3D unstructured-mesh photon transport. UMT leverages both OpenMP and SIMD parallelism on CPUs, but neither by itself is sufficient to allow UMT to scale onto a GPU. Using the CPU and GPU together to detect and resolve sequential dependencies across the mesh, we can maximize parallelism.
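The talk's actual implementation is not reproduced here. Purely as a loose sketch of the general idea of resolving sequential dependencies on the CPU while exposing the remaining parallelism to the GPU, the code below groups the cells of a hypothetical dependency graph into levels on the host (a Kahn-style topological pass) and launches one kernel per level, so all cells within a level can be processed independently. The CSR layout, names, and placeholder kernel are assumptions for illustration only.

#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-cell work; stands in for the real sweep computation.
__global__ void processLevel(const int* cells, int count, float* state) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) state[cells[i]] += 1.0f;   // placeholder update
}

// CPU side: group cells into dependency levels so that every cell in a
// level depends only on cells in earlier levels (Kahn-style pass over a
// CSR successor list with per-cell in-degrees).
std::vector<std::vector<int>> buildLevels(int nCells,
                                          const std::vector<int>& succPtr,
                                          const std::vector<int>& succIdx,
                                          std::vector<int> indeg) {
    std::vector<std::vector<int>> levels;
    std::vector<int> frontier;
    for (int c = 0; c < nCells; ++c)
        if (indeg[c] == 0) frontier.push_back(c);
    while (!frontier.empty()) {
        levels.push_back(frontier);
        std::vector<int> next;
        for (int c : frontier)
            for (int e = succPtr[c]; e < succPtr[c + 1]; ++e)
                if (--indeg[succIdx[e]] == 0) next.push_back(succIdx[e]);
        frontier.swap(next);
    }
    return levels;
}

void sweep(int nCells, const std::vector<std::vector<int>>& levels, float* dState) {
    int* dCells;
    cudaMalloc(&dCells, nCells * sizeof(int));
    for (const auto& level : levels) {             // sequential across levels
        cudaMemcpy(dCells, level.data(), level.size() * sizeof(int),
                   cudaMemcpyHostToDevice);
        int count = (int)level.size();
        processLevel<<<(count + 255) / 256, 256>>>(dCells, count, dState);
    }
    cudaDeviceSynchronize();
    cudaFree(dCells);
}

The per-level launch keeps the sequential dependency on the host, while each level's cells run fully in parallel on the device.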
 
Topics:
HPC and Supercomputing, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4489
 
Abstract:

Starting from the fundamentals of parallel programming in CUDA C/C++, learn how to maximize your development productivity. We present a design cycle we call APOD: Assess, Parallelize, Optimize, and Deploy, which helps application developers to rapidly identify the portions of their code that would most readily benefit from GPU acceleration, rapidly realize that benefit, and begin leveraging the resulting speedups in production as early as possible.

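APOD is documented in NVIDIA's CUDA C++ Best Practices Guide; purely as an illustration of the Parallelize step, the sketch below takes a hypothetical serial hotspot loop with independent iterations and maps each iteration to a CUDA thread. The loop and function names are assumptions, not examples from the tutorial.

#include <cuda_runtime.h>

// Assess: a hypothetical hotspot — an element-wise scale-and-shift loop
// whose iterations are independent of one another.
void scaleShiftSerial(float* y, const float* x, float a, float b, int n) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + b;
}

// Parallelize: map one loop iteration to one CUDA thread.
__global__ void scaleShiftKernel(float* y, const float* x, float a, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + b;
}

// Deploy: keep the call site unchanged so the rest of the application is
// unaffected; Optimize would then tune launch configuration, data
// movement, and memory access based on profiling.
void scaleShift(float* dY, const float* dX, float a, float b, int n) {
    int block = 256;
    int grid = (n + block - 1) / block;
    scaleShiftKernel<<<grid, block>>>(dY, dX, a, b, n);
}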
 
Topics:
Programming Languages
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3008
 
Abstract:

When integrating CUDA C++ kernels into existing C++ applications, it is at times desirable to migrate a C++ object instance from the host to the device or vice versa. Given variations among host compilers regarding structure layout, accomplishing this data marshalling in a manner that is reliable, simple, and efficient is a complex issue. cudaMemcpy is our primary means to transfer data to the GPU, but memcpy-style operations are more readily amenable to C-style structures and arrays than to C++ objects or collections of objects. In this session, we will cover the caveats and best practices for marshalling C++ data.

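As a rough illustration of why memcpy-style transfers favor C-style data, the sketch below marshals a hypothetical trivially copyable struct with cudaMemcpy; a type with virtual functions, host-only pointers, or STL members could not be bit-copied this way and would need explicit conversion. The Particle type and kernel are assumptions, not code from the session.

#include <vector>
#include <type_traits>
#include <cuda_runtime.h>

// A hypothetical C-style value type: standard-layout and trivially
// copyable, so its bytes mean the same thing on host and device.
struct Particle {
    float3 pos;
    float3 vel;
    float  mass;
};
static_assert(std::is_trivially_copyable<Particle>::value,
              "memcpy-style transfer requires a trivially copyable type");

__global__ void advance(Particle* p, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        p[i].pos.x += p[i].vel.x * dt;
        p[i].pos.y += p[i].vel.y * dt;
        p[i].pos.z += p[i].vel.z * dt;
    }
}

void advanceOnDevice(std::vector<Particle>& particles, float dt) {
    int n = (int)particles.size();
    Particle* dP;
    cudaMalloc(&dP, n * sizeof(Particle));
    // Bitwise copy is valid here precisely because Particle has no virtual
    // functions, no host-only pointers, and no STL members; a class with
    // any of those would need explicit marshalling instead.
    cudaMemcpy(dP, particles.data(), n * sizeof(Particle), cudaMemcpyHostToDevice);
    advance<<<(n + 255) / 256, 256>>>(dP, n, dt);
    cudaMemcpy(particles.data(), dP, n * sizeof(Particle), cudaMemcpyDeviceToHost);
    cudaFree(dP);
}

The key point is that the device sees the same byte layout the host wrote, which is only guaranteed for trivially copyable, standard-layout types.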
 
Topics:
Finance
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2377
 
 
Topics:
Programming Languages
Type:
Talk
Event:
Supercomputing
Year:
2011
Session ID:
S2624
 
Abstract:

OpenCL is Khronos' open standard for parallel programming of heterogeneous systems. This tutorial session will introduce the main concepts behind the standard and illustrate them with a simple code walkthrough. Attendees will also learn how to make efficient use of the API to achieve good performance on the GPU.

 
Topics:
General Interest, Tools & Libraries
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S09409
 
 