GTC On-Demand

Bioinformatics & Genomics
Unveiling Cellular Mechanisms Using GPU-based Sparse Linear Algebra
Marco Maggioni (University of Illinois at Chicago)
In this session we present an innovative systems biology application of GPU computing, as an alternative to molecular dynamics simulation for studying biochemical mechanisms inside cells. For the first time we are able to apply the Chemical Master Equation (CME) stochastic framework at large scale, determining both the probabilistic steady state and the transient dynamics of biochemical reaction networks. Our GPU implementation leverages the structure of the problem to optimize the sparse linear algebra routines needed by the stochastic model. As a result, we achieve an average 15.57x speedup over the optimized Intel MKL library running on a 64-core architecture.
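
As a rough illustration of why the CME leads to sparse linear algebra, the sketch below builds the sparse generator matrix for a toy birth-death reaction network and finds its steady state with repeated sparse matrix-vector products. The reaction network, the function names `build_generator` and `steady_state`, and the use of power iteration are all assumptions made for illustration; none of them is taken from the session, and this is serial CPU code rather than the GPU implementation described above.

```python
import math

def build_generator(k_prod, k_deg, n_max):
    """Sparse CME generator for a birth-death process, stored as {(row, col): rate}.

    State n is the molecule count; the column is the source state and the row
    is the destination state. The state space is truncated at n_max.
    """
    q = {}
    for n in range(n_max + 1):
        out = 0.0
        if n < n_max:                      # production: n -> n + 1
            q[(n + 1, n)] = k_prod
            out += k_prod
        if n > 0:                          # degradation: n -> n - 1
            q[(n - 1, n)] = k_deg * n
            out += k_deg * n
        q[(n, n)] = -out                   # each column sums to zero
    return q

def steady_state(q, size, tol=1e-12, max_iter=200000):
    """Power iteration on the uniformized matrix P = I + Q / rate."""
    rate = 1.05 * max(-v for (i, j), v in q.items() if i == j)
    p = [1.0 / size] * size
    for _ in range(max_iter):
        new = p[:]                         # identity part of P
        for (i, j), v in q.items():        # sparse matrix-vector product
            new[i] += v * p[j] / rate
        diff = max(abs(a - b) for a, b in zip(new, p))
        p = new
        if diff < tol:
            break
    total = sum(p)
    return [x / total for x in p]

# Hypothetical parameters: production rate 5, degradation rate 1 per molecule.
generator = build_generator(5.0, 1.0, 30)
distribution = steady_state(generator, 31)
```

For this toy model the steady state is a Poisson distribution truncated at `n_max`, which gives a simple check on the result. Only the nonzero transition rates are stored and touched, which is the property a sparse GPU kernel would exploit.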

Keywords:
Bioinformatics & Genomics, Quantum Chemistry, GTC 2013 - ID S3245
 
Computing Protein Size Distributions Using Centrifugation Techniques and the Tesla K20 GPU
Robert Zigon (Beckman Coulter)
Analytical Ultracentrifugation is a technique used to compute attributes of a protein like gross shape, sample heterogeneity or size. By applying a centrifugal force to the sample and simultaneously measuring the distribution, we can use first principles to derive the relative molecule sizes. Learn how the solution to the resulting regularized least squares problem can be computed in real time with the Tesla K20.
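
To make the "regularized least squares problem" concrete, here is a minimal sketch of one common form, Tikhonov (ridge) regularization, solved through the normal equations (A^T A + lambda I) x = A^T b. The abstract does not say which regularizer or solver the session uses, so the choice of Tikhonov regularization, the function names `solve` and `ridge`, and the tiny example data are assumptions for illustration only.

```python
def solve(m, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(m)
    a = [row[:] + [r] for row, r in zip(m, rhs)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                   # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x

def ridge(a, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the normal equations."""
    rows, cols = len(a), len(a[0])
    ata = [[sum(a[k][i] * a[k][j] for k in range(rows)) + (lam if i == j else 0.0)
            for j in range(cols)] for i in range(cols)]
    atb = [sum(a[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    return solve(ata, atb)

# Hypothetical 3x2 system; lam = 0 reduces to ordinary least squares.
design = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
observed = [1.0, 2.0, 3.0]
unregularized = ridge(design, observed, 0.0)
regularized = ridge(design, observed, 0.5)
```

Increasing `lam` shrinks the solution toward zero and stabilizes ill-conditioned fits. A real-time GPU solver would replace the dense elimination here with a parallel factorization, but the system being solved has the same shape.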

Keywords:
Bioinformatics & Genomics, Quantum Chemistry, GTC 2013 - ID S3330
Computational Physics
Synchrotron Light-source Data Analysis through Massively-parallel GPU Computing
Abhinav Sarje (Lawrence Berkeley National Laboratory)
In this session, we will report efforts and experiences in developing high-performance parallel algorithms and codes on large-scale GPU clusters for analysis of the large amounts of data generated by present high-throughput synchrotron light sources. Such analyses are used in the characterization of macromolecules and particle systems at micro/nano-scales. Codes include multi-GPU accelerated implementations for X-ray scattering pattern simulation using Distorted Wave Born Approximation theory, and structural fitting of such patterns through inverse modeling using the Reverse Monte Carlo simulation algorithm. These codes are designed to be architecture-aware, and deliver high performance through dynamic selection of the best-performing computational parameter values, such as computation decomposition parameters and block sizes, for the GPU architecture being used. We will also discuss detailed performance analyses and optimizations of the codes.

Keywords:
Computational Physics, Quantum Chemistry, HPC and Supercomputing, GTC 2013 - ID S3282
 
Green Supercomputing for Greener Silicon Production
Wei Ge (Institute of Process Engineering, Chinese Academy of Sciences)
Crystalline silicon is a fundamental material for the IT and green energy industries. Pyrolysis of silane to silicon, which deposits onto seeds in circulating fluidized bed reactors, may bring a revolution in its greener production. Unfortunately, its commercial application is limited by our poor understanding of its complicated hydrodynamics and reaction kinetics. A multiscale simulation of this process, from reactors to reactions, is carried out using petascale CPU-GPU hybrid computing. The molecular dynamics simulations using a Tersoff-family potential are carried out for gaseous silane molecules and interfacial silicon atoms with multiple threads on multi-core CPUs, while other silicon atoms are computed on GPUs with a fixed neighbor list, reaching sustained petaflops performance. Direct numerical simulation of the gas flow around suspended silicon powders is then carried out on GPUs, coupling the lattice Boltzmann method with an immersed moving boundary, while the collisions among the powders are processed on CPUs with the discrete element method (DEM). One million solid particles in 2D and 100 thousand particles in 3D with about 1 billion lattices are computed using up to 672 GPUs, which is by far the largest scale for gas-solid systems to date, and enters, for the first time, the scale-independent range where intrinsic constitutive correlations can be obtained. The whole reactor is finally simulated on GPUs in coarse-grained DEM, while the Navier-Stokes equations are solved for the silane flow with coarse grids on CPUs. The simulation has revealed unprecedented details of the silicon production process, which are most valuable for its scale-up and optimization.

Keywords:
Computational Physics, Quantum Chemistry, Computational Fluid Dynamics, GTC 2013 - ID S3363
 
Moving Biophysics to the GPU Cloud for Studying Energy-Transfer in Photosynthesis
Tobias Kramer (Department of Physics, Humboldt-University Berlin, Germany)
We discuss the CUDA and OpenCL implementation of the hierarchical equations of motion (GPU-HEOM) method for tracking quantum-mechanical effects in photosynthesis. The hierarchy of coupled equations yields the time evolution of the density matrix of a photosynthetic network and is efficiently mapped to the GPU architecture by assigning one thread to each hierarchy member, while storing time-independent information in constant memory. This makes the GPU architecture the optimal choice compared to conventional pthread-based parallelization schemes, which suffer from higher thread latency, and allows one to connect theoretical simulations directly with experimental images of the energy flow in photosynthesis. It answers the outstanding questions in the field: why is transport in photosynthesis so efficient, and how can artificial devices be designed? The ready-to-run GPU-HEOM tool is installed on the publicly accessible nanoHUB platform, where users share data and sessions while performing computations on the connected NVIDIA M2090 GPU cluster.
 
Keywords:
Computational Physics, Quantum Chemistry, Desktop & Application Virtualization, GTC 2014 - ID S4490
 
GPU Acceleration of a Variational Monte Carlo Method
Niladri Sengupta (Louisiana State University, Baton Rouge, USA)
The session will describe the CUDA implementation of a variational Monte Carlo method for the study of strongly correlated quantum systems including high-temperature superconductors, magnetic semiconductors and metal-oxide heterostructures. The presentation will cover different tuning and optimization strategies implemented in the GPU code. To eliminate bandwidth-limited performance we have used caching and a novel restructuring of the computation and data access patterns. We also perform two specific optimizations for Kepler. The code uses dynamic compilation to improve performance, especially in parts with limited parallelism. Using Kepler, our code achieves 22 times and 176 times speedup compared to 8-core and single-core CPU implementations, respectively. The GPU code allows us to obtain accurate results for large lattices, which are crucial for developing predictive capabilities for materials properties. Our techniques for matrix inverse and determinant updates can be reused in other quantum Monte Carlo methods.
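
The abstract does not spell out its matrix inverse and determinant updates. A standard technique of this kind in quantum Monte Carlo is the rank-one (Sherman-Morrison) update, sketched below under that assumption: when one row of a matrix changes, the determinant ratio and the new inverse follow from the old inverse in O(n^2) work instead of a fresh O(n^3) factorization. The function names `det_ratio` and `update_inverse` and the 2x2 example are hypothetical.

```python
def det_ratio(d_inv, new_row, k):
    """Return det(D') / det(D), where D' is D with row k replaced by new_row."""
    return sum(new_row[j] * d_inv[j][k] for j in range(len(new_row)))

def update_inverse(d_inv, new_row, k):
    """Sherman-Morrison update of the inverse after replacing row k of D."""
    n = len(new_row)
    ratio = det_ratio(d_inv, new_row, k)
    # w[l] is the dot product of the new row with column l of the old inverse.
    w = [sum(new_row[m] * d_inv[m][l] for m in range(n)) for l in range(n)]
    col_k = [d_inv[j][k] for j in range(n)]
    updated = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for l in range(n):
            if l == k:
                updated[j][l] = col_k[j] / ratio
            else:
                updated[j][l] = d_inv[j][l] - col_k[j] * w[l] / ratio
    return updated

# Hypothetical example: D = [[2, 0], [0, 4]], row 0 replaced by [1, 1].
old_inverse = [[0.5, 0.0], [0.0, 0.25]]
ratio = det_ratio(old_inverse, [1.0, 1.0], 0)
new_inverse = update_inverse(old_inverse, [1.0, 1.0], 0)
```

In a Monte Carlo move that changes a single particle, the ratio alone decides acceptance, so the inverse only needs updating for accepted moves.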
 
Keywords:
Computational Physics, Quantum Chemistry, GTC 2014 - ID S4554
HPC and Supercomputing
Targeting Extreme Scale Computational Challenges with Heterogeneous Systems
Antonino Tumeo (Pacific Northwest National Laboratory), Oreste Villa (Pacific Northwest National Laboratory)
Learn the techniques that Pacific Northwest National Laboratory (PNNL) computer scientists are applying to enhance the performance of scientific applications such as NWChem (Quantum Chemistry), STOMP (subsurface flow transport) and Paraflow (multiflow simulation) on large-scale GPU-accelerated clusters (e.g. ORNL Titan). This talk will discuss approaches such as Domain Specific Languages and auto-tuners for tensor contractions, library-based approaches, dynamic heterogeneous task-based runtimes, and compiler and run-time transformations for GPU code, which we are currently exploring to allow scaling these scientific applications to tens of thousands of GPU-accelerated nodes. We will provide initial results on the various approaches, comparing the performance obtained with code restructuring to pragma-based (e.g., OpenACC) and library-based approaches, which maintain most of the legacy code intact while still providing considerable speedups.

Keywords:
HPC and Supercomputing, Quantum Chemistry, GTC 2013 - ID S3289
Quantum Chemistry
Enabling Faster Material Science Modeling Using the Accelerated Quantum ESPRESSO
Filippo Spiga (Irish Centre for High-End Computing)
The goal of this session is to present the advantages of mixing CUDA libraries and CUDA kernels to deliver a robust community package for material science modeling that fully exploits multi-core systems equipped with GPUs. The Plane-Wave Self-Consistent Field (PWscf) code of the Quantum ESPRESSO suite is the focus of this work. During the session the main computation-dependent components, which also represent fundamental building blocks for many other quantum chemistry codes, will be discussed and analyzed. Subsequently an in-depth performance assessment of several realistic scientific cases will be presented, ranging from single workstations to large clusters equipped with hundreds of GPUs.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2220
 
A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters
Antonino Tumeo (Pacific Northwest National Laboratory), Oreste Villa (Pacific Northwest National Laboratory)
This talk discusses the development of a Domain-Specific Language (DSL), the tools and the related runtime for efficiently generating Tensor Contractions (generalized matrix multiplications), an important part of many quantum chemistry methods (e.g. Coupled Cluster Theory). Starting from a high-level description of the computation, the tool analyzes it and generates optimized C, OpenCL or CUDA implementations. The runtime, supporting a task-based computation model, is then able to execute the generated code on GPU-accelerated heterogeneous large-scale clusters, maximizing the utilization of the processing elements and minimizing communication costs.
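
To show in what sense a tensor contraction is a generalized matrix multiplication, the sketch below contracts a rank-3 tensor A[p][i][j] with a rank-3 tensor B[i][j][q] over the shared indices i and j, first directly and then by flattening (i, j) into one index and calling an ordinary matrix product. The index layout and the function names are illustrative assumptions; this is not the DSL or the generated code from the talk.

```python
def contract_direct(a, b):
    """C[p][q] = sum over i, j of A[p][i][j] * B[i][j][q]."""
    n_p, n_i, n_j, n_q = len(a), len(a[0]), len(a[0][0]), len(b[0][0])
    return [[sum(a[p][i][j] * b[i][j][q] for i in range(n_i) for j in range(n_j))
             for q in range(n_q)] for p in range(n_p)]

def matmul(x, y):
    """Plain matrix product of an (m x k) and a (k x n) list-of-lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

def contract_via_matmul(a, b):
    """Same contraction, with the index pair (i, j) flattened into one index."""
    n_i, n_j = len(a[0]), len(a[0][0])
    a_mat = [[a[p][i][j] for i in range(n_i) for j in range(n_j)]
             for p in range(len(a))]                              # p x (i*j)
    b_mat = [b[i][j][:] for i in range(n_i) for j in range(n_j)]  # (i*j) x q
    return matmul(a_mat, b_mat)

# Hypothetical 2x2x2 tensors.
tensor_a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
tensor_b = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]
result = contract_via_matmul(tensor_a, tensor_b)
```

Both routes give the same result. The flattened form is what lets a code generator map many contractions onto tuned matrix-multiplication kernels, with index permutations handled around the call.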

Keywords:
Quantum Chemistry, GTC 2012 - ID S2343
 
VASP Accelerated with GPUs
Maxwell Hutchinson (University of Chicago)
This session will detail the performance and capabilities of GPU-accelerated VASP, explain design decisions made in porting VASP to CUDA, and present a roadmap for GPU accelerated VASP development. We've achieved performance improvements up to around 20x on systems of around 100 ions and have implemented exact-exchange. We are working on ports of more conventional functionality.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2378
 
Large-Scale First Principle Pseudopotential DFT Calculations on GPU Clusters
WeiLe Jia (Supercomputing Center of CNIC, Chinese Academy of Sciences)
In this session, we will present a series of work on density functional theory (DFT) plane wave pseudopotential (PWP) calculations on GPU clusters. The GPU version is developed based on a CPU DFT-PWP code, PEtot, which can calculate ~1000 atoms on thousands of processors. Our tests indicate that the GPU version can achieve a ~20 times speedup over the CPU code. A detailed analysis of the speedup and the scaling with the number of CPUs/GPUs (up to 256) will be presented. As far as we know, this is the first GPU DFT-PWP code scalable to a large number of CPUs/GPUs.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2392
 
Quantum Chemistry: Automated Code Generation and Optimization for GPU Kernels
Alexey Titov (Stanford), Ivan Ufimtsev (Stanford)
In this session we discuss the challenges encountered in developing quantum chemistry software for GPUs from scratch and optimizing the kernels for the best performance. We attempt to create a unified framework for automatic generation of efficient quantum chemistry codes tailored individually for various GPU (NVIDIA, ATI) and CPU architectures and programming languages (CUDA, OpenCL, C/C++), using a meta-programming approach based on a computer algebra system. We demonstrate its utility by generating highly optimized GPU and CPU kernels dealing with various integrals over Gaussian basis functions implemented in the TeraChem quantum chemistry package.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2429
 
Accelerating Coupled-Cluster Methods with GPUs
Kirill Khistyaev (University of Southern California)
Coupled-cluster techniques (CCSD, CCSD(T), etc.) are arguably the most accurate routinely applicable electronic structure methods. The most computationally expensive parts of the CC methods are tensor contractions, which can be expressed as combinations of matrix operations. Our poster explains the development of a tensor library that effectively utilizes the resources of GPUs while providing a simple interface for users.
 
Keywords:
Quantum Chemistry, GTC 2013 - ID P3228
 
Case Studies and Optimization Using Nsight Visual Studio Edition
Julien Demouth (NVIDIA)
GROMACS is a state-of-the-art molecular simulation package that employs extensive multi-level heterogeneous parallelization. Our new CUDA-based algorithms provide 4x speedup over hand-tuned CPU SIMD assembly, and unprecedented absolute performance. However, the heterogeneity of hardware and the inherent bottlenecks involved make efficient resource utilization and strong scaling very challenging. This advanced session describes our recent efforts on multi-level load-balancing, kernel execution strategies, CPU-GPU work splitting, and ways to exploit Kepler features such as Hyper-Q. Join us to talk about current limits of GPU acceleration in MD, and how to take molecular dynamics simulations to 100 microseconds per iteration, equivalent to 10,000 fps, in the near future!

Keywords:
Quantum Chemistry, HPC and Supercomputing, GTC 2013 - ID S3011
 
Implementing Many-body Potentials for Molecular Dynamics Simulations
Christian Trott (Sandia National Laboratories)
The session will present implementation and optimization strategies for molecular dynamics many-body potentials. It will concentrate on the new SNAP potential for LAMMPS, which is based on the GAP bispectrum analysis of Bartok et al. [PRL 104, 136403 (2010)]. SNAP is fit to large amounts of quantum-based DFT data and is capable of reproducing the accuracy of DFT while still exhibiting linear scaling with the system size. By exploiting multiple parallelisation layers it is possible to mitigate its high cost of 500,000 flops per interaction through excellent strong scaling behaviour down to 16 atoms per GPU. Thus the achievable time to solution on GPU clusters using SNAP is comparable to running simple Lennard-Jones simulations.

Keywords:
Quantum Chemistry, GTC 2013 - ID S3080
 
VMD: GPU-accelerated Visualization and Analysis of Petascale Molecular Simulations
John Stone (University of Illinois at Urbana-Champaign)
This talk will present recent successes in the use of GPUs to accelerate interactive molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray XK7 supercomputers. The talk will focus on recent algorithm developments and the applicability and efficient use of new CUDA features on state-of-the-art Kepler GPUs. We will present the latest performance results for GPU-accelerated trajectory analysis runs on the Blue Waters Cray XK7 and other GPU-accelerated HPC platforms, and conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.

Keywords:
Quantum Chemistry, Large Scale Data Visualization & In-Situ Graphics, GTC 2013 - ID S3097
 
Folding@home: Petascale Scientific Computing on a Heterogeneous GPU Cluster
Vijay Pande (Stanford University)
This session will present recent results from Folding@home molecular dynamics simulations, discussing both schemes for parallelization on thousands to millions of GPUs as well as how these simulations have had an impact in basic biophysics and biomedical science, with an emphasis on protein folding and Alzheimer's Disease.

Keywords:
Quantum Chemistry, HPC and Supercomputing, GTC 2013 - ID S3140
 
Fast Quantum Molecular Dynamics on Multi-GPU Architectures in LATTE
Susan Mniszewski (Los Alamos National Laboratory)
This session will demonstrate how GPUs were used to accelerate the primary computational bottleneck in explicitly quantum mechanical reactive molecular dynamics simulations in the open-source code LATTE. We focus on single- and multi-GPU implementations of a remarkably simple algorithm for the computation of the density matrix in electronic structure theory that is based on a recursive series of generalized matrix-matrix multiplications. Utilizing CUDA and CUBLAS resulted not only in significantly faster code, but also in density matrices with numerical errors smaller than those obtained from traditional CPU-based algorithms. Real-world applications and timings computed using GPU-accelerated LATTE will be presented.
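
The abstract does not name the algorithm. One well-known method that fits the description is second-order spectral projection (SP2) purification, and the sketch below assumes that choice. Starting from a rescaled Hamiltonian whose eigenvalues lie in [0, 1], each step replaces X by either X^2 or 2X - X^2, depending on whether the trace is above or below the number of occupied states, until X becomes idempotent. The function names, the Gershgorin eigenvalue bounds, the convergence test, and the 3x3 example Hamiltonian are illustrative assumptions, and this is serial CPU code.

```python
def matmul(a, b):
    """Dense product of two square lists-of-lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def sp2_density(h, n_occ, tol=1e-12, max_iter=100):
    """Density matrix (projector onto the n_occ lowest states) via SP2."""
    n = len(h)
    # Gershgorin circles give cheap bounds on the spectrum of h.
    radius = [sum(abs(h[i][j]) for j in range(n) if j != i) for i in range(n)]
    e_min = min(h[i][i] - radius[i] for i in range(n))
    e_max = max(h[i][i] + radius[i] for i in range(n))
    # Map the spectrum into [0, 1], with low energies near 1.
    x = [[((e_max if i == j else 0.0) - h[i][j]) / (e_max - e_min)
          for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        x2 = matmul(x, x)
        if abs(trace(x) - trace(x2)) < tol:    # idempotent: X^2 == X
            break
        if trace(x) > n_occ:
            x = x2                              # lowers the trace
        else:
            x = [[2.0 * x[i][j] - x2[i][j] for j in range(n)]
                 for i in range(n)]             # raises the trace
    return x

# Hypothetical Hamiltonian with eigenvalues -1, 1 and 3.
hamiltonian = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
density = sp2_density(hamiltonian, 1)
```

Every iteration is dominated by a single matrix-matrix multiplication, which is why a recursion of this kind maps naturally onto a GPU GEMM routine.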

Keywords:
Quantum Chemistry, GTC 2013 - ID S3195
 
GPU-enabled Studies of Carbon Nanomaterials (CNT) and Aqueous Micellar Systems
Michela Taufer (University of Delaware), Sandeep Patel (University of Delaware)
With the plethora of future applications of carbon nanotube materials rapidly being realized and exploited, we are pursuing fundamental studies of structural, dynamic, and energetic properties of model single-walled carbon nanotubes in pure water and in aqueous solutions of simple inorganic salts, sodium chloride (NaCl) and sodium iodide (NaI). Our transformative research is made possible by a hybrid combination of resources at Oak Ridge National Lab, such as the GPU cluster Keeneland for FEN ZI GPU molecular dynamics simulations for potential of mean force calculations and the data-intensive cluster Nautilus for the data analysis of the GPU-computed potentials of mean force. In this talk we dive deep into the various key aspects of CNT simulations on hybrid resources. Come and learn some of the underlying challenges and get the latest solutions devised to tackle both algorithmic and scientific challenges of CNT simulations and their heterogeneous workflows with GPUs.

Keywords:
Quantum Chemistry, Developer - Algorithms, GTC 2013 - ID S3199
 
Multi-GPU Accelerated Large Scale Electronic Structure Calculations
Samuli Hakala (Aalto University School of Science and Technology)
The goal of this session is to present the design and capabilities of GPU-accelerated GPAW, a density-functional theory (DFT) code based on the grid-based projector-augmented wave method. It's suitable for large-scale electronic structure calculations and capable of scaling to thousands of cores. We'll discuss how we have accelerated the most computationally intensive components of the program with CUDA. We'll provide detailed performance and scaling analysis of our multi-GPU-accelerated code, starting from small systems up to systems with a few thousand atoms running on large GPU clusters with over 200 GPUs. We've achieved up to 15 times speed-ups on large systems.

Keywords:
Quantum Chemistry, GTC 2013 - ID S3206
 
From Folding@Home to AMBER: Five Years of Molecular Dynamics with CUDA
Scott LeGrand (Amazon Web Services)
In 2008, NVIDIA demonstrated that CUDA-enabled GPUs accelerated molecular dynamics calculations by nearly 3 orders of magnitude compared to traditional CPUs. This allowed a single GPU to achieve the performance of a supercomputer at this task. Additionally, performance has improved by 1.5x to 2x per GPU generation. Despite these obvious benefits, there is still entrenched resistance to porting many existing codes to GPUs because of the work involved in doing so. However, with 5 years of performance data now in the rear-view mirror, it is clear that not only is it of huge benefit to port to GPUs now, but also that failing to do so will only result in having to do so later when many-core architectures become the standard. Finally, given that you have already ported your code to GPUs, the next logical step is to make your code cloud-accessible, freeing your users from having to purchase any hardware whatsoever and allowing them to take advantage of exponentially improving performance.

Keywords:
Quantum Chemistry, Cloud Visualization, GTC 2013 - ID S3228
 
Efficient Techniques for Massively Parallel Many-particle Simulations on GPUs
Joshua Anderson (University of Michigan)
Monte Carlo and Molecular Dynamics simulations are standard tools for analyzing the thermodynamic and statistical behavior of many-particle systems. The first computer experiment performed for the Manhattan project was a simulation of 12 hard spheres using a Monte Carlo algorithm. Now, massive parallelism enables routine simulations of millions of particles. In this talk, we describe our novel GPU Monte Carlo algorithm and compare it with HOOMD-blue, our open-source Molecular Dynamics code. Recent improvements to HOOMD-blue make possible parallel multiple GPU simulations on workstations and clusters. Applications include polymer dynamics, granular materials, non-equilibrium systems, and hard particle self-assembly.
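
For readers unfamiliar with hard-particle Monte Carlo, the sketch below shows the basic serial idea in two dimensions: propose a small random displacement for one particle and accept it only if it creates no overlap. This is the textbook single-particle scheme, not the parallel GPU algorithm described in the talk; the 12 disks, the box size, the step size, and the function names are all assumptions chosen for illustration.

```python
import random

def overlaps(p, q, box, diameter):
    """True if two disks overlap under periodic (minimum-image) boundaries."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, box - dx)
    dy = min(dy, box - dy)
    return dx * dx + dy * dy < diameter * diameter

def mc_sweep(pos, box, diameter, step, rng):
    """One sweep of single-particle trial moves; returns the number accepted."""
    accepted = 0
    n = len(pos)
    for _ in range(n):
        i = rng.randrange(n)
        trial = ((pos[i][0] + rng.uniform(-step, step)) % box,
                 (pos[i][1] + rng.uniform(-step, step)) % box)
        # Hard particles: reject any move that produces an overlap.
        if not any(overlaps(trial, pos[j], box, diameter)
                   for j in range(n) if j != i):
            pos[i] = trial
            accepted += 1
    return accepted

# Hypothetical setup: 12 unit-diameter disks on a grid in an 8 x 8 periodic box.
rng = random.Random(0)
box, diameter = 8.0, 1.0
positions = [(1.0 + 2.0 * i, 1.0 + 2.0 * j) for i in range(4) for j in range(3)]
accepted_moves = sum(mc_sweep(positions, box, diameter, 0.3, rng) for _ in range(50))
```

The overlap check against every other particle is the expensive part. Parallel schemes typically restrict it to nearby particles and update spatially separated regions concurrently.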

Keywords:
Quantum Chemistry, Computational Physics, GTC 2013 - ID S3251
 
Petascale Molecular Dynamics Simulations on Titan and Blue Waters
James Phillips (University of Illinois)
The highly parallel molecular dynamics code NAMD was chosen in 2006 as a target application for the NSF petascale supercomputer now known as Blue Waters. NAMD was also one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007. How do the GPU-accelerated Cray XK6 Blue Waters and ORNL Titan machines compare to CPU-based platforms for a hundred-million-atom Blue Waters acceptance test? Come learn the opportunities and pitfalls of taking GPU computing to the petascale and the importance of CUDA 5 and Kepler features in combining multicore host processors and GPUs in a legacy message-driven application.

Keywords:
Quantum Chemistry, HPC and Supercomputing, GTC 2013 - ID S3272
 
A Microsecond-a-day Keeps the Doctor Away: Efficient GPU Molecular Dynamics with GROMACS
Erik Lindahl (KTH Royal Institute of Technology at Stockholm University)
Learn how to perform molecular dynamics simulations reaching microsecond-per-day performance on GPUs, how to achieve impressive GPU acceleration of a code that was already extremely hand-tuned for x86 CPUs, and how we hope to take it even further in the future. GROMACS is one of the most widespread programs in the world for simulating biomolecular dynamics, and has long been accelerated for CPUs with hand-tuned assembly code. This session will cover our challenges and successes in achieving significantly higher absolute performance with CUDA in GROMACS compared to extremely tuned CPU code both on low-end systems and massively parallel supercomputers. Join us to learn about the overall architectural decisions and features of this heterogeneous multi-level parallelization, see examples of application performance, and participate in a discussion about how future molecular simulation needs to focus on efficient throughput and sampling to achieve scaling.

Keywords:
Quantum Chemistry, HPC and Supercomputing, GTC 2013 - ID S3283
 
Challenges and Solutions for Heterogeneous Parallelization of Molecular Dynamics at 10,000 fps
Szilard Pall (KTH Royal Institute of Technology)
GROMACS is a state-of-the-art molecular simulation package that employs extensive multi-level heterogeneous parallelization. Our new CUDA-based algorithms provide 4x speedup over hand-tuned CPU SIMD assembly, and unprecedented absolute performance. However, the heterogeneity of hardware and the inherent bottlenecks involved make efficient resource utilization and strong scaling very challenging. This advanced session describes our recent efforts on multi-level load-balancing, kernel execution strategies, CPU-GPU work splitting, and ways to exploit Kepler features such as Hyper-Q. Join us to talk about current limits of GPU acceleration in MD, and how to take molecular dynamics simulations to 100 microseconds per iteration, equivalent to 10,000 fps, in the near future!

Keywords:
Quantum Chemistry, HPC and Supercomputing, GTC 2013 - ID S3288
 
FastROCS: What Does It Mean To Be Fast?
Brian Cole (OpenEye Scientific Software)
ROCS (Rapid Overlay of Chemical Structure) is a proprietary algorithm that helped build OpenEye as a pillar of molecular modeling software. This was due to ROCS being very fast on the CPU and its robustness as a scientific model. Porting the algorithm to OpenCL achieved over a 100x speed improvement. What has been the effect after 3 years of experience on the market? And why was it ported to CUDA? What is the true value of speed? And are there other ways to achieve it?

 
Keywords:
Quantum Chemistry, Databases, Data Mining, Business Intelligence, HPC and Supercomputing, GTC 2013 - ID S3328
 
Interactive Drug Design Using GPGPUs
Thanasis Anthopoulos (Cardiff University)
This session presents a haptic protein-ligand docking (HPLD) application developed in the Molecular Modelling Lab of the Cardiff School of Pharmacy. The talk describes in detail how GPUs enable the application to run with a fully flexible ligand and protein target. The first part describes the algorithm used to perform the MMFF94s force-field energy and force calculations, with performance benchmarks showing the speed-up gained from the presented CUDA algorithms. The second part covers how asynchronous stream processing helped to provide smooth visual rendering as well as force feedback on the haptic device at a rate of 1000 Hz. The session closes by showing how flexible HPLD improves docking results during simulations.

 
Keywords:
Quantum Chemistry, GTC 2013 - ID S3333
 
The Impact of Kepler: Molecular Dynamics and the GPU Revolution
Ross Walker (University of California, San Diego)
This talk will focus on the impact that GPUs have had on Molecular Dynamics (MD) simulations. In particular, it will highlight the massive performance improvements that GPUs have brought to MD simulations with AMBER. Kepler-based solutions can routinely provide simulation rates exceeding 100 ns/day on a single GPU in a single desktop, while replica exchange approaches to accelerating convergence enable hundreds of GPUs to be employed in parallel. The GPU revolution has transformed the MD landscape. Access to supercomputer resources is no longer required to routinely reach microsecond timescales and beyond. The world of MD research is now flat, with all researchers, young and old, rich and poor, being able to run simulations that previously were restricted to those privileged enough to have routine access to supercomputers. This has made it an exciting time for research involving Molecular Dynamics.

 
Keywords:
Quantum Chemistry, GTC 2013 - ID S3380
 
Experiences Using OpenACC to port the CCSD(T) Computational Chemistry Method in GAMESS on Blue Waters
Ryan Olson (Cray)
The distributed shared-memory implementation of the coupled-cluster singles and doubles with perturbative triples algorithm, CCSD(T), in the GAMESS chemistry package was ported to the GPU using the directive-based OpenACC standard. The focus of this port was to achieve maximum strong-scaling performance for small molecular systems (

 
Keywords:
Quantum Chemistry, Developer - Tools & Libraries, Developer - Programming Languages, GTC 2013 - ID S3506
 
Challenges and Advances in Large-scale DFT Calculations on GPUs
Heather Kulik (Massachusetts Institute of Technology)
Recent advances in reformulating electronic structure algorithms for stream processors such as graphical processing units have made DFT calculations on systems comprising up to O(10^3) atoms feasible. Simulations on such systems that previously required half a week on traditional processors can now be completed in only half an hour. Join Professor Heather Kulik, Massachusetts Institute of Technology, as she discusses how she leverages these GPU-accelerated quantum chemistry methods in the code TeraChem to investigate large-scale quantum mechanical features in applications ranging from protein structure to mechanochemical depolymerization. In each case, large-scale and rapid evaluation of electronic structure properties is critical for unearthing previously poorly understood properties and mechanistic features of these systems. Professor Kulik will also discuss outstanding challenges in the use of Gaussian localized-basis-set codes on GPUs pertaining to limitations in basis set size and how she circumvents such challenges to computational efficiency with systematic, physics-based error corrections to basis set incompleteness.

 
Keywords:
Quantum Chemistry, GTC Webinars 2014 - ID GTCE080
 
Acceleration of Electron Repulsion Integral Evaluation on Graphics Processing Units via Use of Recurrence Relations
Yipu Miao (University of Florida)
A fast and efficient implementation of ab initio quantum chemistry calculation on GPU with novel accuracy level. Our software supports Hartree-Fock and DFT calculations with a 10-100x speedup relative to traditional CPU nodes.
 
Keywords:
Quantum Chemistry, GTC 2014 - ID S4211
 
Speeding-up NWChem on Heterogeneous Clusters
Antonino Tumeo (Pacific Northwest National Laboratory)
Learn the approaches that we implemented to accelerate NWChem, one of the flagship high performance computational chemistry tools, on heterogeneous supercomputers. In this talk we will discuss the new domain specific code generator, the auto-tuners for the tensor contractions, and the related optimizations that enable acceleration of the Coupled-Cluster methods module for single- and multi-reference formulations of NWChem.
 
Keywords:
Quantum Chemistry, Clusters & GPU Management, Computational Fluid Dynamics, HPC and Supercomputing, GTC 2014 - ID S4329
 
GRID-Based Methods for the Analysis of the Wave Function in Quantum Chemistry Accelerated by GPUs
Jorge Garza (Universidad Autonoma Metropolitana-Iztapalapa)
Learn how to distribute scalar and vector fields defined in quantum chemistry on GPUs. In this talk we analyze the wave function obtained by Hartree-Fock, density functional theory or many-body perturbation theory to second order by using the atoms in molecules approach. The gradient and Laplacian of the electron density are used as examples of fields that can be evaluated easily on GPUs. The performance of our algorithms is contrasted with that of algorithms not accelerated by GPUs.
 
Keywords:
Quantum Chemistry, Numerical Algorithms & Libraries, HPC and Supercomputing, GTC 2014 - ID S4389
 
Great Performance for Tiny Problems: Batched Products of Small Matrices
Nikolay Markovskiy (NVIDIA)
Learn how to get great performance on Kepler GPUs for small dense matrix products. Dense linear algebra operations are generally best performed in cuBLAS, but for batches of very small matrices, it may be possible to exploit some extra knowledge of your particular application to improve the performance. After an analysis of an initial implementation, we will look into different algorithmic improvements (tiling, prefetching), use special features of the Kepler architecture and finally investigate autotuning to select the best implementation for a given problem size.
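
For readers unfamiliar with the term, the Python sketch below spells out what a batched product of small matrices computes: many independent small products, one per batch entry. It is a plain CPU reference written for illustration, not the GPU implementation discussed in the session, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of two small matrices stored as nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must agree"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def batched_matmul(batch_a: List[Matrix], batch_b: List[Matrix]) -> List[Matrix]:
    """Multiply the i-th matrix of batch_a by the i-th matrix of batch_b."""
    assert len(batch_a) == len(batch_b), "batches must have the same length"
    # Each product is independent of the others, which is what makes the
    # batch a natural unit of parallel work.
    return [matmul(a, b) for a, b in zip(batch_a, batch_b)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    m = [[1.0, 2.0], [3.0, 4.0]]
    print(batched_matmul([identity, m], [m, m]))
```

Because no product depends on another, a GPU can process many batch entries concurrently; how the work is mapped to threads and tuned is the subject of the talk and is not modeled here.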
 
Keywords:
Quantum Chemistry, Numerical Algorithms & Libraries, GTC 2014 - ID S4391
 
Achievements and Challenges Running GPU-Accelerated Quantum ESPRESSO on Heterogeneous Clusters
Filippo Spiga (Quantum ESPRESSO Foundation)
Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. Within the Quantum ESPRESSO suite, the Plane-Wave Self-Consistent Field (PWscf) code represents a powerful computational tool for scientists in both academia and industry. Due to the wide adoption of GPU computing, it is now mandatory to push further the capability of the code by adding new functionalities and explicit optimizations. The aim of this talk is to present challenges and achievements of running a CPU-GPU code on heterogeneous clusters of various sizes. Benchmarks are performed on Darwin (University of Cambridge GPU cluster) and Titan (Oak Ridge National Laboratory). Input cases are provided by researchers in both academia and private companies.
 
Keywords:
Quantum Chemistry, Clusters & GPU Management, Computational Physics, GTC 2014 - ID S4397
 
Virtual Molecular Modelling Kits: Playing Games with Quantum Chemistry
Nathan Luehr (Stanford University)
We discuss the impact of GPU-based quantum chemistry calculations for small molecules. Based on a specially optimized version of TeraChem, we demonstrate real-time molecular dynamics for systems up to a few dozen atoms. Harnessing this performance, we describe the development of interactive interfaces to virtual quantum chemistry models. Such interfaces make possible a new paradigm for chemical education and research.
 
Keywords:
Quantum Chemistry, Combined Simulation & Real-Time Visualization, Molecular Dynamics, GTC 2014 - ID S4427
 
Enabling Gaussian 09 on GPGPUs
Roberto Gomperts (NVIDIA)
In 2011 Gaussian, Inc., NVIDIA Corp. and PGI started a long-term project to enable all the performance-critical paths of Gaussian on GPGPUs. While the ultimate goal is to show significant performance improvement by using accelerators in conjunction with CPUs, the initial efforts are directed towards creating an infrastructure that will leverage the current CPU code base and at the same time minimize the additional maintenance effort associated with running on GPUs. Here we present the current status of this work, which mostly uses the directive-based OpenACC framework, for Direct Hartree-Fock and for triples-correction calculations as applied in, for example, Coupled Cluster calculations.
 
Keywords:
Quantum Chemistry, Developer - Programming Languages, HPC and Supercomputing, GTC 2014 - ID S4613
 
GPUs and Real-Space Grids: A Powerful Alternative for the Simulation of Electrons
Xavier Andrade (Harvard University)
Learn why modeling electrons is important and what we can learn from these simulations, followed by a very brief introduction to density functional theory (DFT) as an approximation of quantum mechanics to model electrons in molecular systems. This presentation also introduces the traditional method used in quantum chemistry to solve the DFT equations, namely the expansion of the molecular orbitals in a basis of Gaussian functions, and discusses its limitations for parallelization on GPUs. It then presents an alternative: simulating electrons with real-space grids and finite differences, and applying GPUs to accelerate real-space calculations. The presentation will explain the scheme we developed to expose the data parallelism available in the DFT approach. Finally, we present results for current-generation GPUs showing that our scheme, implemented in the free code Octopus, can reach a sustained performance of up to 90 GFlops on a single GPU, a significant speed-up compared to the CPU version of the code.
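
To illustrate what a finite-difference operator on a real-space grid looks like, here is a minimal Python sketch of a one-dimensional second derivative using the standard three-point central stencil. It is a generic textbook stencil under simplifying assumptions (uniform spacing, one dimension, boundary points left at zero); it is not the scheme implemented in Octopus, and the function name is hypothetical.

```python
from typing import List


def second_derivative_1d(values: List[float], spacing: float) -> List[float]:
    """Three-point central finite-difference second derivative.

    Boundary points are left at 0.0 because the stencil needs a
    neighbor on each side.
    """
    n = len(values)
    result = [0.0] * n
    for i in range(1, n - 1):
        # Each grid point only reads its two neighbors, so every
        # iteration of this loop is independent of the others.
        result[i] = (values[i - 1] - 2.0 * values[i] + values[i + 1]) / spacing**2
    return result


if __name__ == "__main__":
    h = 0.1
    grid = [i * h for i in range(11)]
    f = [x * x for x in grid]  # f(x) = x^2, whose second derivative is 2
    print(second_derivative_1d(f, h)[1:-1])
```

The point of the example is that every grid point is updated from a small, fixed neighborhood, which is the kind of regular, independent work that maps naturally onto data-parallel hardware.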
 
Keywords:
Quantum Chemistry, Developer - Performance Optimization, Computational Physics, GTC 2014 - ID S4625
 
VASP: A Case Study for Accelerating Plane Wave DFT Codes
Sarah Tariq (NVIDIA), Przemyslaw Tredak (University of Warsaw)
In this session we will detail how we accelerated the VASP software package, used for atomic scale material modeling, on GPUs. Presenters in past years have shown that a straightforward implementation of VASP on GPUs with the help of the GPU-accelerated cuFFT and cuBLAS libraries can yield reasonable speedups, but we will show in this session that by targeting the implementation more towards the GPU's strengths and porting additional work, we can achieve more than a 3x speedup over this. We will present the methodology we followed for improving both single-GPU performance and multi-GPU, multi-node scaling. This work has been implemented in collaboration by NVIDIA interns and engineers (Jeroen Bedorf, Przemyslaw Tredak, Dusan Stosic, Arash Ashari, Paul Springer, Darko Stosic and Sarah Tariq), and researchers from Ens-lyon, IFPEN (Paul Fleurat-Lessard and Anciaux Sedrakian), CMU (Michael Widom) and University of Chicago (Maxwell Hutchinson).
 
Keywords:
Quantum Chemistry, GTC 2014 - ID S4692
Signal and Audio Processing
Presentation
Media
GPU-based Real-time Synthetic Aperture Sonar Processing On-board Autonomous Underwater Vehicles
Jesus Ortiz (Advanced Robotics Department, Istituto Italiano di Tecnologia, Italy), Francesco Baralli (NATO STO Centre for Maritime Research and Exploration, Italy)
In this session we'll speak about the implementation of a SAS (Synthetic Aperture Sonar) processing software on the GPU, running in real-time on-board of an Autonomous Underwater Vehicle (AUV). Current AUVs run in pre-planned survey routes and record all the data for off-line processing. They don't have flexibility to adapt to environmental conditions and sonar performance. With this new software we can increase the level of autonomy, allowing adaptive behaviors. We'll show the process of design and implementation of the software, as well as the first results of the tests carried out with a real AUV.

 
Keywords:
Signal and Audio Processing, Quantum Chemistry, GTC 2013 - ID S3133