Multi-scale molecular dynamics of systems of nanomagnets is investigated by numerical simulation using parallel algorithms. The Fortran code Magnetodynamics-F supports the following types of research: study of the possibility of controlling the switching time of the magnetic moment of a nanostructure; estimation of the role of nanocrystal geometry in the super-radiance of 1-, 2-, and 3-dimensional objects; study of the magnetodynamics of nanodots inductively coupled to a passive resonator; and study of the dependence of the solution on the initial orientation of the magnetic moment, in order to find configurations for which super-radiance and radiative damping are maximal. The parallel programs were created using the OpenMP and OpenACC application programming interfaces. Estimates of the speedup and efficiency of the implemented algorithms relative to their sequential counterparts have been obtained. It is shown that the use of NVIDIA Tesla GPUs accelerates simulations of magnetic dynamics for systems containing thousands of magnetic nanoparticles.
Learn how Folding@home has used petascale computing with GPUs to make fundamental breakthroughs in computational biology and how this technology can make an impact in your work.
Computer simulations are indispensable tools for deciphering how biomolecular structure and folding correspond to function. These simulations benefit greatly from advances in parallel computation (e.g., GPUs) because the force calculations are inherently independent. However, a major limitation of GPUs is that data transfer between the CPU and GPU is expensive and must be minimized. We introduce a new algorithm for calculating neighbor lists and transferring them to the GPU with minimal memory transfer. The algorithm is readily implemented with the CUDPP and CURAND libraries. Using simulations of the ribosome, we observe a significant, system-size-dependent improvement in performance.
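To illustrate the general idea of keeping neighbor lists on the device (so that only particle positions, not the lists themselves, ever cross the PCIe bus), here is a minimal CUDA sketch. It uses a brute-force O(N^2) search for clarity and is not the authors' algorithm, which additionally relies on spatial sorting via CUDPP; all names (buildNeighborList, maxNeigh, etc.) are hypothetical.

    // Each thread builds the neighbor list of one atom entirely on the GPU,
    // so the list never needs to be transferred from the host.
    __global__ void buildNeighborList(const float4 *pos, int *neighList,
                                      int *neighCount, int maxNeigh,
                                      float cutoff2, int nAtoms)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nAtoms) return;
        float4 pi = pos[i];
        int count = 0;
        for (int j = 0; j < nAtoms; ++j) {
            if (j == i) continue;
            float dx = pi.x - pos[j].x;
            float dy = pi.y - pos[j].y;
            float dz = pi.z - pos[j].z;
            if (dx*dx + dy*dy + dz*dz < cutoff2 && count < maxNeigh)
                neighList[i * maxNeigh + count++] = j;   // row-major: one row per atom
        }
        neighCount[i] = count;                            // neighbors beyond maxNeigh are dropped
    }
    // Typical launch: buildNeighborList<<<(nAtoms + 127) / 128, 128>>>(...);
    // a production code replaces the O(N^2) loop with a cell/spatial-sort search.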
In this poster, we present our implementation of the density functional theory (DFT) plane-wave pseudopotential (PWP) calculation on GPU clusters. The GPU version is based on the CPU DFT-PWP code PEtot. Our tests indicate that the GPU version achieves a ~10x speed-up over the CPU version and is about 5 times faster than the legendary VASP code. An analysis of the speed-up and of the scaling with the number of CPU/GPU computing units (up to 256) is presented. The success of our speed-up relies on a hybrid reciprocal-space and band-index parallelization scheme.
The original AMBER 11 provided performance on one GPU equivalent to an 8-node cluster, and almost 60 ns/day on 8 GPUs running the JAC production benchmark without additional approximations, outstripping the performance of all conventional supercomputers. Here we describe further optimization of the code, coupled with hardware and software advances on the part of NVIDIA, that provides performance of >50 ns/day on a single GPU, with multiple GPUs providing simulation rates approaching a microsecond per day on systems the size of DHFR. This brings performance on desktops and commodity hybrid clusters to levels previously considered possible only with custom silicon.
An efficient and highly scalable algorithm for molecular dynamics (MD) simulation of solid covalent crystals, using sophisticated many-body potentials, is presented. Its effective memory throughput on a single C2050 GPU board reached 102 GB/s (81% of peak), its instruction throughput reached 412 Ginstr/s (80% of peak), and 27% of the peak flops of a single GPU was obtained. Parallel efficiency of the algorithm can be as high as 95% on all 7168 GPUs of Tianhe-1A, reaching 1.87 Pflops in single precision, possibly a record for the performance of MD simulations.
Learn how rigid body dynamics are implemented in HOOMD-blue. Previous releases were capable of executing classical molecular dynamics -- where free particles interact via smooth potentials and their motion through time is computed using Newton's laws. The latest version allows particles to be grouped into bodies that move as rigid units. Users can now simulate materials made of cubes, rods, bent rods, jacks, plates, patchy particles, bucky balls, or any other arbitrary shapes. This talk covers how these algorithms are implemented on the GPU, tuned to perform well for bodies of any size, and discusses several use-cases relevant to research.
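For readers unfamiliar with rigid-body integration, the sketch below shows the basic per-body state update: the center of mass advances via Newton's second law and the orientation via a quaternion. It uses a plain explicit-Euler step and a diagonal, space-frame inertia purely for brevity; HOOMD-blue's actual integrators are symplectic, work with the body-frame inertia tensor, and regenerate constituent-particle positions from the body frame afterwards (not shown). All names here are illustrative, not HOOMD-blue code.

    // Quaternion layout: (x, y, z) = vector part, w = scalar part.
    __device__ float4 quatMul(float4 a, float4 b)            // Hamilton product
    {
        return make_float4(a.w*b.x + b.w*a.x + a.y*b.z - a.z*b.y,
                           a.w*b.y + b.w*a.y + a.z*b.x - a.x*b.z,
                           a.w*b.z + b.w*a.z + a.x*b.y - a.y*b.x,
                           a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z);
    }

    __global__ void rigidBodyStep(float3 *com, float3 *vel, float4 *quat, float3 *angvel,
                                  const float3 *force, const float3 *torque,
                                  const float *mass, const float3 *invInertia,
                                  float dt, int nBodies)
    {
        int b = blockIdx.x * blockDim.x + threadIdx.x;        // one thread per body
        if (b >= nBodies) return;

        // Translation: v += dt*F/M, x += dt*v
        float invM = 1.f / mass[b];
        vel[b].x += dt * force[b].x * invM;  vel[b].y += dt * force[b].y * invM;  vel[b].z += dt * force[b].z * invM;
        com[b].x += dt * vel[b].x;           com[b].y += dt * vel[b].y;           com[b].z += dt * vel[b].z;

        // Rotation: w += dt*I^-1*tau (diagonal inertia assumed), then dq/dt = 0.5*(0,w)*q
        angvel[b].x += dt * torque[b].x * invInertia[b].x;
        angvel[b].y += dt * torque[b].y * invInertia[b].y;
        angvel[b].z += dt * torque[b].z * invInertia[b].z;
        float4 q  = quat[b];
        float4 dq = quatMul(make_float4(angvel[b].x, angvel[b].y, angvel[b].z, 0.f), q);
        q.x += 0.5f * dt * dq.x;  q.y += 0.5f * dt * dq.y;
        q.z += 0.5f * dt * dq.z;  q.w += 0.5f * dt * dq.w;
        float s = rsqrtf(q.x*q.x + q.y*q.y + q.z*q.z + q.w*q.w); // renormalize
        quat[b] = make_float4(q.x*s, q.y*s, q.z*s, q.w*s);
    }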
In this paper, we present how we improved the speed of the electronic structure calculator VASP by more than an order of magnitude. Recent research work done at IFP Energies Nouvelles has shown that by coupling traditional clusters or High Performance Computing (HPC) machines with accelerators based on graphics processing units (GPUs), recoding the most time-consuming parts of the codes (with programming languages like CUDA or OpenCL), and offloading them onto the graphics chips, it is possible to reduce the computing time and achieve speedups of a factor of 5 to 15.
Discover how GPUs are used to identify optimal framework structures for carbon dioxide separation, with the goal of reducing carbon emissions. We describe the algorithm behind our GPU software tool, which iterates through a database of hypothetical zeolites and computes the selectivity of each structure. The code can be easily extended to simulate other adsorbent structures such as ZIFs (zeolitic imidazolate frameworks) and provides valuable insights to both theorists and experimentalists interested in carbon capture research.
The highly parallel molecular dynamics code NAMD was chosen in 2006 as a target application for the NSF petascale supercomputer now known as Blue Waters. NAMD was also one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007. How do the Cray XK6 and modern GPU clusters compare to 300,000 CPU cores for a hundred-million-atom Blue Waters acceptance test? Come learn the opportunities and pitfalls of taking GPU computing to the petascale and the importance of CUDA 4.0 features in combining multicore host processors and GPUs in a legacy message-driven application.
Protein and RNA folding and assembly problems have important applications because misfolding is associated with diseases like Alzheimer's and Parkinson's. However, simulating complex biomolecules on the same timescales as experiments is an extraordinary challenge due to a bottleneck in the force calculations. To overcome these hurdles, we perform coarse-grained molecular dynamics simulations in which biomolecules are reduced to simpler components. Furthermore, our GPU-based simulations offer a significant performance improvement over CPU-based simulations, which are limited to systems of 50-150 residues/nucleotides. The GPU-based code can simulate protein/RNA systems of 400-10,000+ residues/nucleotides, and we present ribosome assembly simulations.
This talk will present recent successes in the use of GPUs to accelerate interactive molecular visualization and analysis tasks on desktop computers, and batch-mode simulation and analysis jobs on GPU-accelerated HPC clusters. We'll present Fermi-specific algorithms and optimizations and compare with those for other devices. We'll also present performance and performance/watt results for VMD analysis calculations on GPU clusters, and conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.
We demonstrate the usefulness of a new style of GPU programming called Persistent Threads (PT), which is well suited to irregular workloads. First, we will formally define the PT model. We will then categorize uses of PT into four "use cases" and present micro-benchmark analyses of when this model outperforms traditional kernel formulations. Third, we will show a full speech recognition application that exercises all four PT use cases. Finally, we will conclude by suggesting appropriate modifications to GPU hardware, software, and APIs that would make PT kernels both easier to implement and more efficient.
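To give a flavor of the PT style, the following hypothetical CUDA kernel keeps a fixed number of resident thread blocks alive and lets them pull work items from a global queue through an atomic counter, instead of launching one short-lived block per item. The per-item computation and all names are illustrative only, not taken from the talk.

    // A fixed set of resident blocks repeatedly grabs work from a global queue.
    __global__ void persistentWorker(const float *input, float *output,
                                     int nItems, int itemSize, int *queueHead)
    {
        __shared__ int item;
        while (true) {
            if (threadIdx.x == 0)
                item = atomicAdd(queueHead, 1);   // block claims the next work item
            __syncthreads();
            if (item >= nItems) break;            // queue drained: whole block exits together
            // threads of the block cooperate on one item (here: scale a chunk of the array)
            for (int k = threadIdx.x; k < itemSize; k += blockDim.x)
                output[item * itemSize + k] = 2.0f * input[item * itemSize + k];
            __syncthreads();                      // protect 'item' before the next iteration
        }
    }
    // Typical launch: persistentWorker<<<numSMs * blocksPerSM, 128>>>(...), with
    // *queueHead initialized to 0 on the device, so blocks persist for the whole workload.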
Learn how to create a scalable volume visualization system for interactive rendering of terascale EM data. We will describe the major design principles, how we can avoid the standard approach of pre-computing a 3D multi-resolution hierarchy such as an octree, and how to handle continuous streaming of newly acquired data. For rendering we build upon a visibility-driven approach and 3D virtual texturing, and perform interactive volume rendering of a "virtual" volume, where the corresponding physical storage is only represented and populated in a sparse manner with 2D instead of 3D image data on the fly during rendering.
GPU-enabled fully atomistic macromolecular simulation is rapidly gaining momentum, enabled by massive parallelism and by the parallelizability of various components of the underlying algorithms and methodologies. This massive parallelism, on the order of several hundred to a few thousand cores, presents opportunities as well as implementation challenges. In this talk, dive deep into the key aspects of simulation methodologies for macromolecular systems specifically adapted to GPUs. Learn about some of the underlying challenges and get the latest solutions devised to tackle them in the FEN ZI code for fully atomistic macromolecular simulations.
Using the latest algorithmic developments in molecular dynamics on multiple GPUs over MPI, along with technologies like GPUDirect, it is now possible to address problems of interaction at the bio-nano interface via large-scale atomistic simulations. This talk will discuss aspects of DNA-nanotube interactions and SWCNT-induced conformational changes in the DNA nucleosome structure. We will also address technical challenges in porting and tuning the AMBER 11 code on the Condor GPU cluster at AFRL.
In this session we will talk about how to improve strong scaling for molecular dynamics applications. Using the NAMD molecular dynamics code as our primary case study, we will discuss the types of issues that can impede scaling, how to use already available and custom tools to discover such issues, and how to build a model to help analyze and predict scaling performance. Although this session is primarily focused on molecular dynamics applications, most of the lessons can be applied equally well to many other areas and applications.
Molecular dynamics is an important application for GPU acceleration, but many algorithmic optimizations and features still rely on code that favors traditional CPUs. Only with the latest hardware and software have we been able to realize a heterogeneous GPU/CPU implementation and reach performance significantly beyond the state of the art of hand-tuned CPU code in our GROMACS program. The sub-millisecond iteration time poses challenges on all levels of parallelization. Come and learn about our new atom-cluster pair interaction approach for non-bonded force evaluation, which achieves 60% work efficiency, and about other innovative solutions for heterogeneous GPU systems.
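The sketch below conveys the basic flavor of a cluster-pair non-bonded kernel under strong simplifying assumptions: atoms are stored contiguously by cluster, the cluster-pair list is pre-built, a single Lennard-Jones parameter set is used, and exclusions and Newton's third law are ignored. It is not GROMACS code; all names and the cluster size are hypothetical.

    #define CLUSTER 8   // atoms per cluster (illustrative)
    // One block of CLUSTER threads handles one cluster pair; each thread owns one i-atom
    // and loops over all j-atoms of the partner cluster.
    __global__ void clusterPairForces(const float4 *pos, const int2 *pairList, int nPairs,
                                      float cutoff2, float c6, float c12, float3 *force)
    {
        int p = blockIdx.x;
        if (p >= nPairs) return;
        int i     = pairList[p].x * CLUSTER + threadIdx.x;
        int jBase = pairList[p].y * CLUSTER;
        float4 pi = pos[i];
        float fx = 0.f, fy = 0.f, fz = 0.f;
        for (int k = 0; k < CLUSTER; ++k) {
            float4 pj = pos[jBase + k];
            float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
            float r2 = dx*dx + dy*dy + dz*dz;
            if (r2 < cutoff2 && r2 > 0.f) {
                float inv2 = 1.f / r2;
                float inv6 = inv2 * inv2 * inv2;
                float fscal = (12.f * c12 * inv6 * inv6 - 6.f * c6 * inv6) * inv2;  // LJ force / r^2
                fx += fscal * dx;  fy += fscal * dy;  fz += fscal * dz;
            }
        }
        atomicAdd(&force[i].x, fx);   // the same i-cluster appears in many pairs
        atomicAdd(&force[i].y, fy);
        atomicAdd(&force[i].z, fz);
    }
    // Launched as clusterPairForces<<<nPairs, CLUSTER>>>(...); the real GROMACS kernels
    // additionally apply exclusions, third-law symmetry and hardware-specific tuning.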
GPUs have made molecular dynamics simulations faster, better, and cheaper, achieving supercomputer performance from a single GPU without sacrificing stability or accuracy. In this talk we demonstrate how the GPU refactoring of AMBER 12 Molecular Dynamics has led to an implementation that produces results that are indistinguishable from the original CPU code. In addition, we describe the GPU compute instances available on the Amazon EC2 platform to show how anyone can run any number of AMBER 12 simulations, anytime from anywhere.
Markov chain Monte Carlo (MCMC) simulation of chemical systems allows examination of nanoscopic thermodynamics and associated behavior at small time scales. These simulations tend to be computationally expensive, requiring days or more of CPU time to collect data. Optimization work is essential to remedy the inherent time complexity of these simulations. To date, there is no multi-ensemble molecular MCMC engine for the simulation of chemical systems that leverages GPUs. Speed-ups of 6.3x and 14.4x were achieved for a problem size of 131,072 particles for the canonical and Gibbs ensemble implementations, respectively.
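For readers new to Metropolis Monte Carlo on GPUs, here is a minimal, hypothetical sketch of one canonical-ensemble displacement move: a kernel accumulates the Lennard-Jones energy change of moving particle k, and the host applies the acceptance test exp(-dU/kBT). The per-move host transfer is only for illustration; a production engine like the one described above keeps the whole chain on the device and adds particle-swap and volume moves for the Gibbs ensemble. All names are illustrative.

    // Pair energy of the 12-6 Lennard-Jones potential, U = c12/r^12 - c6/r^6.
    __device__ float ljEnergy(float r2, float cutoff2, float c6, float c12)
    {
        if (r2 >= cutoff2) return 0.f;
        float inv6 = 1.f / (r2 * r2 * r2);
        return c12 * inv6 * inv6 - c6 * inv6;
    }

    // Each thread adds the energy difference contributed by one other particle.
    __global__ void moveDeltaEnergy(const float3 *pos, int nPart, int k,
                                    float3 oldPos, float3 newPos,
                                    float cutoff2, float c6, float c12, float *dU)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j >= nPart || j == k) return;
        float3 pj = pos[j];
        float dxn = newPos.x - pj.x, dyn = newPos.y - pj.y, dzn = newPos.z - pj.z;
        float dxo = oldPos.x - pj.x, dyo = oldPos.y - pj.y, dzo = oldPos.z - pj.z;
        float diff = ljEnergy(dxn*dxn + dyn*dyn + dzn*dzn, cutoff2, c6, c12)
                   - ljEnergy(dxo*dxo + dyo*dyo + dzo*dzo, cutoff2, c6, c12);
        atomicAdd(dU, diff);                   // total energy change of the trial move
    }
    // Host side (sketch): zero *dU, launch the kernel, copy dU back, then accept the
    // move with probability min(1, exp(-dU / (kB*T))).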
GPU-enabled fully atomistic macromolecular simulation is rapidly gaining momentum, enabled by massive parallelism and by the parallelizability of various components of the underlying algorithms and methodologies. This massive parallelism, on the order of several hundred to a few thousand cores, presents opportunities as well as implementation challenges. In this webinar, Michela Taufer, Assistant Professor, Department of Computer and Information Sciences, University of Delaware, discusses various key aspects of simulation methodologies for macromolecular systems specifically adapted to GPUs. She will also visit some of the underlying challenges and the solutions devised to tackle them.
We present software development efforts in LAMMPS that allow for acceleration with GPUs on supercomputers. We present benchmark results for solid-state, biological and mesoscopic systems along with results from simulation of liposomes, polyelectrolyte brushes, and copper nanostructures on graphite. We present methods for efficient simulation with GPUs at larger node counts.
This webinar showcases the latest GPU-acceleration technologies available to AMBER users and discusses features, recent updates and future plans. Join us to learn how to obtain the latest accelerated versions of AMBER, which features are supported, the simplicity of its installation and use, and how it performs with Kepler GPUs.
Learn about the first multi-node, multi-GPU-enabled release, 4.6, of GROMACS from Dr. Erik Lindahl, the project leader for this popular molecular dynamics package. GROMACS 4.6 allows you to run your models up to 3X faster compared to the latest state-of-the-art parallel AVX-accelerated CPU code in GROMACS. Dr. Lindahl will talk about the new features of the latest GROMACS 4.6 release as well as future plans. You will learn how to download the latest accelerated version of GROMACS and which features are GPU-supported. Dr. Lindahl will cover GROMACS performance on the very latest NVIDIA Kepler hardware and explain how to run GPU-accelerated MD simulations. You will also be invited to try GROMACS on K20 with a free test drive and experience all the new features and enhanced performance for yourself: http://www.nvidia.com/gputestdrive
Shape is a fundamental three-dimensional molecular property and a powerful descriptor for molecular comparison and similarity assessment; similarity in shape has proven to be a very effective method for predicting similarity in biology. As such, shape-based virtual screening (searching a database of molecules for compounds that are similar in shape to a molecule with desirable biological activity) has become an integral part of computational drug discovery, due to both its speed and its efficacy. OpenEye's recent port of their shape similarity application, ROCS, to the GPU has resulted in a virtual screening tool of unprecedented power: FastROCS. FastROCS' speed allows it to perform large-scale calculations of a kind inaccessible in the past (shape comparisons of millions of molecules to one another) and has accelerated more routine shape searching to the point that it has become competitive with more traditional, but less effective, two-dimensional methods. Join Paul Hawkins, Applications Science Group Leader at OpenEye, as he presents some recent performance data on FastROCS on NVIDIA hardware and discusses some of the new applications that this speed has enabled. You will also be invited to take the Tesla K20 for a free test drive and experience all the new features and enhanced performance for yourself: www.nvidia.com/gputestdrive.
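As background, ROCS-style shape comparison is based on Gaussian volume overlap (the Grant-Pickup model), in which each heavy atom is represented by a spherical Gaussian and similarity is a Tanimoto over overlap volumes. The hypothetical CUDA sketch below computes the first-order (pairwise) overlap volume for two rigidly overlaid molecules; FastROCS itself additionally optimizes the overlay, and names and constants here are illustrative, not OpenEye code.

    #define PFIX 2.7f   // Gaussian amplitude p (~2.7 in the Grant-Pickup shape model)
    __global__ void shapeOverlap(const float4 *molA, int nA,
                                 const float4 *molB, int nB, float *Vab)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nA) return;
        float4 a = molA[i];                  // .xyz = atom center, .w = Gaussian width alpha
        float sum = 0.f;
        for (int j = 0; j < nB; ++j) {
            float4 b = molB[j];
            float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            float d2 = dx*dx + dy*dy + dz*dz;
            float s  = a.w + b.w;
            // overlap integral of two spherical Gaussians (Gaussian product theorem)
            sum += PFIX * PFIX * expf(-(a.w * b.w / s) * d2) * powf(3.14159265f / s, 1.5f);
        }
        atomicAdd(Vab, sum);                 // accumulate V_AB over all atom pairs
    }
    // Shape Tanimoto = V_AB / (V_AA + V_BB - V_AB), evaluated for the best overlay.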
Join Dr. Juan R. Perilla and learn how, in a tour de force effort, experimental and computational scientists at the University of Illinois at Urbana–Champaign and the University of Pittsburgh have now resolved the HIV capsid's chemical structure. As reported recently on the cover of Nature, the researchers combined NMR structure analysis, electron microscopy, and data-guided molecular dynamics simulations, using VMD to prepare and analyze simulations performed with NAMD on NVIDIA GPUs in one of the most powerful computers worldwide, Blue Waters, to obtain and characterize the HIV-1 capsid. The discovery can now guide the design of novel drugs for enhanced antiviral therapy. Also learn how NAMD performs with the latest Kepler GPUs, as well as details about the GPU Test Drive (www.nvidia.com/GPUTestDrive) and how to try NAMD on Kepler GPUs for free.
Join Acellera Founder Gianni De Fabritiis and CTO Matt Harvey to learn about the latest developments in high-throughput molecular dynamics, in terms of both applications and methodological advances. Examples will be given in the context of ACEMD, a highly efficient, best-in-class GPU-centric code for running MD simulations, and its protocols. In particular, attendees will learn how the high arithmetic performance and intrinsic parallelism of the latest NVIDIA Kepler GPUs can offer a technological edge for molecular dynamics simulations. Microsecond-to-millisecond molecular dynamics on accelerator hardware, which will have important methodological and scientific implications, will be highlighted. This webinar presents a great opportunity for industrial scientists to get an overview of the current achievements in molecular simulations for medicinal chemistry.
VMD is a tool for preparing, analyzing, and visualizing molecular dynamics simulations, with particular emphasis on large biomolecular systems, including drug targets such as the bacterial ribosome, and large viruses such as HIV. The computational challenges posed by large simulations present a significant hurdle for simulation and analysis tools. GPUs provide unprecedented computational capabilities at a very low cost, making it possible for applications like VMD to accelerate tasks that would otherwise be beyond our grasp. The ubiquitous nature of powerful GPUs on hardware ranging from tablets to supercomputers has allowed us to make a significant investment in developing GPU algorithms for a broad range of uses covering everything from ion placement during simulation preparation to photorealistic ray tracing of movies on hundreds of supercomputer nodes. Join us for this webinar as John Stone, Senior Research Programmer, University of Illinois provides an overview of the GPU-accelerated features of VMD and how they can be used to speed up a wide range of simulation preparation, analysis, and visualization tasks today, along with a roadmap of things to come in the future.
This webinar will provide an overview of the AMBER molecular dynamics software package, with a focus on what is new with regard to GPU acceleration in the recently released version 14. This includes details of peer-to-peer support and optimizations, which have made version 14 the fastest MD software package on commodity hardware. Benchmarks will be provided, along with recommended hardware choices. In addition, an overview of the new GPU-centric features in AMBER 14 will be covered, including support for multi-dimensional replica exchange MD, hydrogen mass repartitioning, accelerated MD, scaled MD, and support-as-a-service on Amazon Web Services. This is a joint webinar by Ross C. Walker, University of California San Diego, and Adrian Roitberg, University of Florida.
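As background on one of the features listed above, hydrogen mass repartitioning shifts mass from each heavy atom onto its bonded hydrogens while leaving the total mass unchanged, slowing the fastest X-H vibrations so that a ~4 fs timestep becomes stable. The host-side sketch below is a generic illustration of that idea, not AMBER's implementation; the 3x factor and all names are illustrative.

    // Shift mass from heavy atoms onto bonded hydrogens; total molecular mass is preserved.
    void repartitionHydrogenMasses(float *mass, const int2 *bonds, int nBonds,
                                   const bool *isHydrogen, float scale /* e.g. 3.0f */)
    {
        for (int b = 0; b < nBonds; ++b) {
            int i = bonds[b].x, j = bonds[b].y;
            int h = isHydrogen[i] ? i : (isHydrogen[j] ? j : -1);
            if (h < 0) continue;                       // not an X-H bond
            int heavy = (h == i) ? j : i;
            float delta = (scale - 1.0f) * mass[h];    // extra mass given to the hydrogen
            mass[h]     += delta;
            mass[heavy] -= delta;
        }
    }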
Explore the concepts behind large-scale modeling of faceted anisotropic particles. Dynamical methods are the most direct way to study the full set of properties of systems of colloidal and nanoscale particles. Classical and event-driven molecular dynamics simulations of the past have focused on behavior of isotropic particles and limited classes of anisotropic particles such as ellipsoids. In this talk, we discuss the algorithms and data structures behind a GPU-accelerated implementation of the discrete element method for polyhedral particles in HOOMD-Blue. This formulation allows us to efficiently simulate conservative and non-conservative dynamics of faceted shapes within a classical molecular dynamics framework. Research applications include studies of nucleation and growth, granular materials, glassy dynamics and active matter.
Learn about the various methods and trade-offs in the distributed GPU implementation of a molecular dynamics proxy application that achieves more than 90% weak-scaling efficiency on 512 GPU nodes. CoMD is a reference implementation of classical molecular dynamics algorithms and workloads. It is created and maintained by the Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx) and is part of the R&D 100 Award-winning Mantevo 1.0 software suite. In this talk we will discuss the main techniques and methods involved in the GPU implementation of CoMD, including (1) cell-list and neighbor-list approaches for neighbor-particle search, and (2) different thread-mapping strategies and memory layouts. An efficient distributed implementation will be covered in detail: interior/boundary cell separation is used to allow efficient asynchronous processing and concurrent execution of kernels, memory copies, and MPI transfers.
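A hypothetical host-side sketch of the interior/boundary split: forces for interior cells are computed on one CUDA stream while boundary data is packed, exchanged over MPI, and then processed on a second stream, so communication is hidden behind the interior kernel. Kernel names, buffers, and the single-neighbor exchange are illustrative, not CoMD's actual code; host buffers are assumed pinned and halo unpacking is omitted.

    #include <mpi.h>

    // Assumed kernels (declarations only, for illustration):
    __global__ void cellForces(float4 *atoms, const int *cells, int nCells);
    __global__ void packHalo(const float4 *atoms, const int *cells, float4 *sendBuf);

    void forceStepWithOverlap(float4 *d_atoms, const int *d_interiorCells, int nInterior,
                              const int *d_boundaryCells, int nBoundary,
                              float4 *d_sendBuf, float4 *h_sendBuf,
                              float4 *d_recvBuf, float4 *h_recvBuf,
                              size_t haloBytes, int leftRank, int rightRank)
    {
        cudaStream_t sInterior, sBoundary;
        cudaStreamCreate(&sInterior);
        cudaStreamCreate(&sBoundary);

        // 1. Interior forces need no remote data, so they start immediately.
        cellForces<<<nInterior, 128, 0, sInterior>>>(d_atoms, d_interiorCells, nInterior);

        // 2. Meanwhile, pack and exchange boundary (halo) atoms on the other stream.
        packHalo<<<nBoundary, 128, 0, sBoundary>>>(d_atoms, d_boundaryCells, d_sendBuf);
        cudaMemcpyAsync(h_sendBuf, d_sendBuf, haloBytes, cudaMemcpyDeviceToHost, sBoundary);
        cudaStreamSynchronize(sBoundary);
        MPI_Sendrecv(h_sendBuf, (int)haloBytes, MPI_BYTE, rightRank, 0,
                     h_recvBuf, (int)haloBytes, MPI_BYTE, leftRank, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpyAsync(d_recvBuf, h_recvBuf, haloBytes, cudaMemcpyHostToDevice, sBoundary);

        // 3. Boundary forces run once the halo has arrived (unpacking of d_recvBuf omitted);
        //    the interior kernel may still be in flight on the other stream.
        cellForces<<<nBoundary, 128, 0, sBoundary>>>(d_atoms, d_boundaryCells, nBoundary);
        cudaDeviceSynchronize();
        cudaStreamDestroy(sInterior);
        cudaStreamDestroy(sBoundary);
    }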
This is a first snapshot of the heterogeneous CPU+GPU molecular dynamics (MD) engine in CHARMM, and of its performance and accuracy. The GPU is used only for the direct-space part of the forces; the CPU computes all other contributions (reciprocal, bonded, SHAKE, etc.). The GPU code was implemented natively in CHARMM using CUDA C. The MD engine is built around the DOMDEC domain decomposition code and therefore naturally enables MD simulations on multiple CPU+GPU nodes. We will present discoveries that used features implemented in DOMDEC_GPU, showing the current usefulness of the code and of GPUs for biomolecular simulation, for advanced sampling techniques, and for enabling DOE/NREL efforts toward affordable consumer biofuels.