GTC ON-DEMAND

 
Abstract:

We'll introduce cuTT, a tensor transpose library for GPUs that on average achieves over 70% of the attainable memory bandwidth, independent of tensor rank. Tensor transposition is important in many applications, such as multi-dimensional Fast Fourier Transforms, deep learning, and quantum chemistry calculations. Until now, no runtime library has fully utilized the memory bandwidth of GPUs while performing well independent of tensor rank. We'll describe two transpose algorithms, "Tiled" and "Packed," which achieve high memory bandwidth in most use cases, as well as variations of them that handle many important corner cases. We'll also discuss a heuristic method based on GPU performance modeling that helps cuTT choose the optimal algorithm for a particular use case. Finally, we'll present benchmarks for tensor ranks 2 to 12 and show that cuTT, a fully runtime library, performs as well as an approach based on code generation.

 
Topics:
Algorithms & Numerical Techniques, Tools & Libraries, Performance Optimization, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7255
 
Abstract:

Learn about recent performance improvements in the GPU acceleration of the NAMD biomolecular modeling application. These improvements include performance gains in the non-bonded CUDA kernels and a new GPU-only implementation of the Particle Mesh Ewald (PME) reciprocal computation. We will describe in detail the changes made in the non-bonded CUDA kernels that give 1.4 to 1.7 times better performance compared to the previous version. We will also describe the new PME reciprocal code, which enables computation on multiple GPUs and is 1.4 to 1.8 times faster than the previous code.

 
Topics:
Computational Biology & Chemistry, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6623
 
Abstract:

Running the latest versions of GPU-accelerated applications maximizes performance and improves user productivity. The latest version, NAMD 2.11, provides up to 7x* speedup on GPUs over CPU-only systems and up to 2x the performance of NAMD 2.10. Watch this on-demand webinar to hear experts from NVIDIA and NAMD answer your NAMD and GPU-related questions, ranging from installation to job optimization.  *Dual CPU server, Intel E5-2698 v3 @ 2.3 GHz, NVIDIA Tesla K80 with ECC off, Autoboost on; STMV dataset.

 
Topics:
Computational Biology & Chemistry
Type:
Webinar
Event:
GTC Webinars
Year:
2016
Session ID:
GTCE126
 
Abstract:
This presentation provides a first glimpse of a heterogeneous CPU+GPU Molecular Dynamics (MD) engine in CHARMM. In the MD engine, the GPU calculates the direct part of the non-bonded force, while the CPU takes care of the rest of the work (reciprocal force calculation, bonded force calculation, integration, etc.). The MD engine is built around the CHARMM domain decomposition code, enabling massively parallel MD simulations on multiple CPU+GPU nodes. The new MD engine outperforms the CPU code by a factor of 8 or more.
 
Topics:
Molecular Dynamics, Numerical Algorithms & Libraries, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4163
 
Abstract:

This is a first snapshot of the heterogeneous CPU+GPU Molecular Dynamics (MD) engine in CHARMM, including its performance and accuracy. The GPU is used only for the direct part of the forces; the CPU computes all other contributions (reciprocal, bonded, SHAKE, etc.). The GPU code was implemented natively in CHARMM using CUDA C. The MD engine is built around the DOMDEC domain decomposition code and therefore naturally enables MD simulations on multiple CPU+GPU nodes. We will present discoveries that used features implemented in DOMDEC_GPU, showing the current usefulness of the code and GPUs for biomolecular simulation, advanced sampling techniques, and for enabling DOE/NREL efforts toward affordable consumer biofuels.

 
Topics:
Molecular Dynamics
Type:
Webinar
Event:
GTC Webinars
Year:
2014
Session ID:
GTCE103