GTC ON-DEMAND

 
Abstract:
With every generation of GPUs it becomes increasingly difficult to keep the data pipeline full so that the GPU can be fully utilized. We'll propose a method for offloading the CPU and using the GPU to process image data to increase throughput.
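One standard ingredient of this kind of offload is overlapping host-to-device transfers with GPU-side preprocessing. Below is a minimal sketch, not the presenters' code: a hypothetical normalize kernel stands in for image preprocessing, and pinned memory plus two CUDA streams double-buffer the batches so copies and compute overlap. Buffer sizes and names are illustrative.

#include <cuda_runtime.h>

// Hypothetical preprocessing kernel: convert 8-bit pixels to
// normalized floats, the kind of work otherwise done on the CPU.
__global__ void normalize(const unsigned char *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] / 255.0f;
}

int main() {
    const int batch = 1 << 20;          // pixels per batch (illustrative)
    const int nBatches = 8;

    unsigned char *hIn;                 // pinned memory enables async copies
    cudaMallocHost(&hIn, (size_t)nBatches * batch);
    unsigned char *dIn[2];
    float *dOut[2];
    cudaStream_t s[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&dIn[b], batch);
        cudaMalloc(&dOut[b], batch * sizeof(float));
        cudaStreamCreate(&s[b]);
    }

    // Double buffering: while one stream preprocesses batch k,
    // the other is already copying batch k+1 to the device.
    for (int k = 0; k < nBatches; ++k) {
        int b = k % 2;
        cudaMemcpyAsync(dIn[b], hIn + (size_t)k * batch, batch,
                        cudaMemcpyHostToDevice, s[b]);
        normalize<<<(batch + 255) / 256, 256, 0, s[b]>>>(dIn[b], dOut[b], batch);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) {
        cudaFree(dIn[b]); cudaFree(dOut[b]); cudaStreamDestroy(s[b]);
    }
    cudaFreeHost(hIn);
    return 0;
}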
 
Topics:
Deep Learning & AI Frameworks, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8906
 
Abstract:
Classical molecular dynamics is a very important method in computational physics, chemistry, and biology. It is also very computationally demanding, which is why it was among the first scientific methods to be ported to GPUs. However, only some types of potentials used in MD, namely pair potentials, were ported. Other types, like the REBO many-body potential, which is very important for simulating systems of C and H, are still computed on the CPU. The reason lies in the huge complexity of many-body potentials, as well as in the lack of an efficient communication scheme between threads that would resolve race conditions without atomic operations. This work shows a method of overcoming these difficulties in a CUDA implementation of the 2nd-generation REBO potential, and the speedup achieved.
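The core race-condition problem the abstract mentions can be illustrated with a much simpler pair potential. This kernel-only sketch (a generic Lennard-Jones example, not the REBO implementation) uses a full neighbor list so that each pair interaction is computed twice: the redundant work buys a "pull" scheme in which every thread writes only its own atom's force, so no atomicAdd is needed.

#include <cuda_runtime.h>

__global__ void ljForces(const float3 *pos, float3 *force,
                         const int *nbr, const int *nNbr,
                         int maxNbr, int nAtoms) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nAtoms) return;

    float3 pi = pos[i];
    float fx = 0.f, fy = 0.f, fz = 0.f;

    // Full neighbor list: atom i sees every neighbor j, and j sees i,
    // so each pair is evaluated twice. In exchange, thread i never
    // writes to another atom's force, eliminating the race condition.
    for (int k = 0; k < nNbr[i]; ++k) {
        int j = nbr[i * maxNbr + k];
        float dx = pi.x - pos[j].x;
        float dy = pi.y - pos[j].y;
        float dz = pi.z - pos[j].z;
        float r2 = dx*dx + dy*dy + dz*dz;
        float inv2 = 1.0f / r2;
        float inv6 = inv2 * inv2 * inv2;
        // Lennard-Jones force magnitude divided by r (reduced units)
        float f = 24.0f * inv2 * inv6 * (2.0f * inv6 - 1.0f);
        fx += f * dx; fy += f * dy; fz += f * dz;
    }
    force[i] = make_float3(fx, fy, fz);   // single uncontended write
}

Many-body potentials such as REBO are harder precisely because a term's value depends on several atoms at once, so this simple pull scheme no longer suffices on its own; the talk addresses that gap.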
 
Topics:
Life & Material Science, Developer - Algorithms, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5358
 
Abstract:
In this session we will detail how we accelerated the VASP software package, used for atomic-scale materials modeling, on GPUs. Presenters in past years have shown that a straightforward implementation of VASP on GPUs with the help of the GPU-accelerated cuFFT and cuBLAS libraries can yield reasonable speedups, but we will show in this session that by targeting the implementation more towards the GPU's strengths and porting additional work, we can achieve more than a 3x speedup over this. We will present the methodology we followed for improving both single-GPU performance and multi-GPU, multi-node scaling. This work was implemented in collaboration by NVIDIA interns and engineers (Jeroen Bedorf, Przemyslaw Tredak, Dusan Stosic, Arash Ashari, Paul Springer, Darko Stosic, and Sarah Tariq) and researchers from ENS Lyon, IFPEN (Paul Fleurat-Lessard and Anciaux Sedrakian), CMU (Michael Widom), and the University of Chicago (Maxwell Hutchinson).
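Plane-wave codes like VASP spend much of their runtime in 3D FFTs, which is why cuFFT is a natural first offload target. As a point of reference, here is a minimal sketch of a double-complex 3D transform with cuFFT (the grid size is illustrative, and this is not VASP code):

#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int nx = 64, ny = 64, nz = 64;   // illustrative FFT grid
    size_t n = (size_t)nx * ny * nz;

    cufftDoubleComplex *d;
    cudaMalloc(&d, n * sizeof(cufftDoubleComplex));
    cudaMemset(d, 0, n * sizeof(cufftDoubleComplex));

    // One plan can be reused for the many transforms per SCF step.
    cufftHandle plan;
    if (cufftPlan3d(&plan, nx, ny, nz, CUFFT_Z2Z) != CUFFT_SUCCESS) {
        fprintf(stderr, "plan creation failed\n");
        return 1;
    }
    cufftExecZ2Z(plan, d, d, CUFFT_FORWARD);   // in-place forward FFT
    cufftExecZ2Z(plan, d, d, CUFFT_INVERSE);   // note: cuFFT is unnormalized
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(d);
    return 0;
}

Library drop-ins like this are the "straightforward implementation" the abstract refers to; the talk's additional 3x comes from reshaping the surrounding computation around the GPU's strengths.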
 
Topics:
Quantum Chemistry
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4692
 
Abstract:
We describe a novel approach to implementing the Mersenne Twister MT19937 random number generator on GPUs. MT19937 is one of the most widely used PRNGs for Monte Carlo simulations, due to its speed and very good statistical properties. It was believed, however, that it does not parallelize well, and parallel implementations of it are rare (for example, the Intel MKL library provides only a single-threaded implementation of MT19937). Our implementation proves the opposite, being much faster than both cuRAND XORWOW and MTGP (Mersenne Twister for Graphics Processors) on a Tesla K20X. The presented algorithm is being incorporated into an upcoming release of cuRAND.
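Later cuRAND releases do expose an MT19937 generator through the host API as CURAND_RNG_PSEUDO_MT19937. A minimal usage sketch (buffer size and seed are illustrative; 5489 is MT19937's conventional reference seed):

#include <curand.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t n = 1 << 24;           // 16M samples (illustrative)
    double *d;
    cudaMalloc(&d, n * sizeof(double));

    curandGenerator_t gen;
    // MT19937 is exposed through the cuRAND host API.
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MT19937);
    curandSetPseudoRandomGeneratorSeed(gen, 5489ULL);
    curandGenerateUniformDouble(gen, d, n);   // uniform doubles in (0, 1]

    // Spot-check a few values on the host.
    double h[4];
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    printf("%f %f %f %f\n", h[0], h[1], h[2], h[3]);

    curandDestroyGenerator(gen);
    cudaFree(d);
    return 0;
}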
 
Topics:
Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2013
Session ID:
P3132
 
 