In this talk, we report algorithmic and instruction-level optimizations used in uDeviceX, a CUDA particle simulator for biomedical microfluidic devices. First, we propose an FMA-intense random number generator (RNG) that exploits the chaotic logistic map. This RNG takes advantage of the higher FP-to-integer instruction throughput ratio of CUDA GPUs to generate a large number of high-quality random streams in situ. Second, we use warp votes and shared memory to consolidate workload from diverging warps. Last, we use inline PTX to emulate 24-bit integer arithmetic with its floating-point counterparts in order to increase throughput. An implementation using C++ templates ensures that no type-casting overhead is triggered and also guards the technique against unintentional use.
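As a rough illustration of the idea (a minimal sketch only, not the uDeviceX implementation; the seeding scheme and launch parameters below are assumptions), a per-thread logistic-map RNG in CUDA might express the map step x <- 4x(1-x) with FMAs like this:

```cuda
// Minimal sketch of a logistic-map RNG on the GPU (illustrative only; not the
// uDeviceX implementation -- seeding and output handling here are hypothetical).
#include <cuda_runtime.h>

// One chaotic logistic-map step x <- 4*x*(1-x), expressed with FMAs so the
// generator leans on floating-point rather than integer throughput.
__device__ __forceinline__ float logistic_step(float x)
{
    return fmaf(-4.0f * x, x, 4.0f * x);   // 4x - 4x^2
}

__global__ void fill_random(float* out, int n, int steps_per_sample)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Hypothetical per-thread seed strictly inside (0, 1); each thread owns
    // an independent stream determined by its index.
    float x = (tid + 0.5f) / (float)(n + 1);

    // Iterate the map a few times to decorrelate neighboring seeds.
    for (int i = 0; i < steps_per_sample; ++i)
        x = logistic_step(x);

    out[tid] = x;   // chaotic sample in (0, 1); real use would map it to the target distribution
}

int main()
{
    const int n = 1 << 20;
    float* d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    fill_random<<<(n + 255) / 256, 256>>>(d_out, n, 16);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```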
For Sierra, a pre-exascale CORAL supercomputer arriving at Lawrence Livermore National Laboratory in 2017, neutral-particle transport codes will be a primary application, and ensuring peak performance of these applications on this system (multiple IBM POWER9 CPUs + multiple Volta GPUs per node) is important. In preparation, transport mini-apps, like Kripke, are being optimized on today's hybrid CPU-GPU clusters using different programming models. This talk discusses performance issues encountered by Kripke on these systems and their solutions. Specifically, we will focus on: a) a novel implementation of the sweep algorithm; b) techniques useful for modeling physical problems whose memory footprint exceeds the aggregate GPU memory; and c) porting Kripke using OpenMP 4.
We'll discuss, analyze, and improve the performance of deep neural network inference using GPUs. Unlike neural net training, which is an offline process where large batches of images are fed to the GPU to maximize computational throughput, inference focuses on small-batch, low-latency forward propagation through the network. We'll discuss how the different performance requirements for inference impact the way we implement it on GPUs and what performance optimizations are possible, and we'll show how GPUs, all the way from the small Tegra X1 to the powerful TITAN X, excel at performance and energy efficiency when performing inference for deep neural networks.
At CSIRO Data61, we're building the next generation of science platforms that exploit GPU computing to dramatically accelerate the time to discovery and the pace of innovation in science and industry. Scientific applications routinely generate huge amounts of data. In response to these trends, we've developed and deployed a new breed of GPU-accelerated big data technologies, earth system modeling tools, and machine learning capabilities. We'll present examples of our work in big data analytics, earth system modeling, and deep learning that clearly demonstrate the value that GPU computing can deliver to research organisations and industry. CSIRO has been at the forefront of GPU computing since 2009 and was one of the first NVIDIA CUDA Research Centers.
Learn how deep learning can address some of the most critical problems of computational drug discovery. Historically, the field has been strongly focused on the development of drugs intended to act against one specific target with high potency and selectivity. It is now recognized that these concepts are too simplistic. At the same time, there has been unprecedented growth of chemical databases incorporating hundreds of billions of useful chemical records. Deep learning is well suited to address both of these challenges. GPU computing is the central hardware technology that allows deep learning to scale.
Running deep learning inference tasks on embedded platforms often requires deployment of pretrained models. Finding the best hyperparameters and training are usually performed on a workstation or large-scale system to obtain the best model. In this talk, we'll show through framework-based examples how to train models on a workstation and deploy them on embedded platforms such as the NVIDIA® Jetson™ TX1 or NVIDIA Drive™ PX. We'll also show dedicated tools for monitoring performance and debugging issues on embedded platforms, making demo setup easy. This talk will include a live demo session.
Deep learning has enabled significant advances in supervised learning problems such as speech recognition and visual recognition. Reinforcement learning provides only a weaker supervisory signal, posing additional challenges in the form of temporal credit assignment and exploration. Nevertheless, deep reinforcement learning has already enabled learning to play Atari games from raw pixels (without access to the underlying game state) and learning certain types of visuomotor manipulation primitives. I will discuss major challenges for, as well as some preliminary promising results towards, making deep reinforcement learning applicable to real robotic problems.
This talk provides an overview of how Microsoft uses its open-source, distributed deep learning toolkit, CNTK, to make our products and services better. We'll show how you can use CNTK to train deep learning models of almost any topology and scale out to many GPUs. We'll review some of the challenges arising in scaling out deep learning workloads and CNTK's way of solving them.
Theano is an extremely popular framework for machine learning, providing a generalized toolset for machine learning tasks. In this session, we'll discuss the Why and the What of Theano, leaving the How for the Theano Hands-on Lab. We'll cover the high-level motivations and general philosophy behind Theano, including future direction and goals. Please join the Theano developers to learn where this tool fits into your machine learning efforts.
Deep Learning is delivering the future today, enabling computers to perform tasks once thought possible only in science fiction. Innovations such as autonomous vehicles, speech recognition and advances in medical imaging will transform the world as we know it. GPUs are at the core of this transformation, providing the engines that power Deep Learning. In this session, we'll discuss the software tools NVIDIA provides to unlock the power of Deep Learning on GPUs. We'll provide an overview of NVIDIA's Deep Learning Software, including cuDNN and DIGITS, and pointers to maximize your experience with Deep Learning at GTC.
Cray Cluster Systems have long been used to support supercomputing and scientific applications. In this talk, we'll demonstrate how these same systems can be easily configured to support Docker and, subsequently, various machine learning software packages, including NVIDIA's DIGITS software. Additionally, these systems can be configured so that their Docker containers pull data from Cray's Sonexion scale-out Lustre storage system. With this configuration, our systems offer maximum application flexibility through Docker while simultaneously supporting the high-performance storage requirements of many types of machine learning workloads through a connection with our Lustre ecosystem.
This talk will describe how to develop and deploy deep learning applications efficiently and easily using MXNet. MXNet is a new deep learning framework developed by collaborators from over 10 institutes. It is designed for both flexibility and optimized performance, with easy-to-use interfaces currently available in seven programming languages, including Python, Scala, and R. We will discuss the technologies used to scale out the framework to distributed clouds ranging from EC2, Azure, and GCE to Spark clusters, as well as memory optimizations to fit into embedded systems like mobile phones. Finally, we'll demonstrate deep learning applications in computer vision, natural language processing, and speech recognition.
You will learn how neural networks with memory and attention mechanisms allow for state-of-the-art question answering. Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. We describe the dynamic memory network (DMN), which uses both of these mechanisms to achieve state-of-the-art performance on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset. We demonstrate how attention mechanisms allow for improved inspection of deep learning models, helping to understand the evidence behind specific decisions. The techniques discussed are applicable to a wide range of tasks, helping to improve both the accuracy and interpretability of the resulting models.
Caffe is an open framework for deep learning that equips researchers and engineers with state-of-the-art tools and models. Caffe and its community provide an open source library, reference models, and do-it-yourself examples. We'll highlight scientific and industrial usage of Caffe, talk about recent changes in the latest roast, and discuss future directions. At present the framework has 150+ contributors, 1,000+ citations, and 5,000+ forks.
Yahoo's Hadoop clusters consist of 35,000 servers and store hundreds of petabytes of structured/non-structured data. Recently, we introduced a new capability -- deep learning -- into Hadoop clusters and developed software solutions, like CaffeOnSpark, to conduct distributed deep learning easily. Also, we expanded those clusters with GPU nodes and Infiniband connectivity, which brought 10X faster connectivity. Yahoo's Flickr teams, for example, have since made significant improvements to image recognition accuracy by training the framework with millions of photos. CaffeOnSpark was recently open sourced at github.com/yahoo/CaffeOnSpark under the Apache 2.0 License. Built upon the deep learning framework Caffe, and big-data framework Apache Spark, CaffeOnSpark supports neural network model training, testing, and feature extraction on a cluster of GPU and CPU servers. Caffe users can use their existing LMDB data files and network configurations. CaffeOnSpark's data-frame style API enables deep learning to be invoked along with non-deep learning and SQL analysis in a single program. In this talk, we will share Yahoo's experience on distributed deep learning, and provide a technical overview of CaffeOnSpark (including a demo on AWS EC2).
Highlighting the key role GPUs will play in creating systems that understand data in human-like ways, Rob High, IBM Watson's Chief Technology Officer, will discuss how cognitive computing helps doctors, lawyers, marketers and others glean key insights by analyzing large volumes of data.
We'll present an innovative approach to efficiently mapping a popular pedestrian detection algorithm (HOG) onto an NVIDIA Tegra GPU. Attendees will learn new techniques to optimize a real computer vision application on the Tegra X1, as well as several new architecture features of the Tegra X1 GPU.
Most of today's IVI solutions try to replicate the smartphone interaction model in the car. Adopting an approach that is similar to smartphones will not result in differentiated solutions with a sustainable competitive advantage. More importantly, the immersive experiences that are typical of smartphone interaction are not suitable in a driving environment. CloudCar is proposing a new approach to delivering connected services to the car, which brings about a new interaction model suited to the car.
Modern vehicle functions like advanced driver assistance systems (ADAS) or even fully autonomous driving functions have a rapidly growing demand for high-performance computing power. To fulfill the fail-operational requirements of autonomous driving functions, the next generation of a vehicle infrastructure platform has to ensure the execution of safety-critical functions with high reliability. In addition, the "always connected" feature needed for autonomous driving should be protected by powerful security mechanisms. We'll show how the requirements of ADAS can be fulfilled efficiently, on both the system and software architecture levels, using the example of automated valet parking from Elektrobit.
Learn how to use NVIDIA performance tools to optimize your scene graph and rendering pipeline for use in automotive software. We'll demonstrate the capabilities of these tools using some simple Qt-based examples and will look at some of the more common mistakes in writing efficient software and how to avoid them.
A solution for vehicle integration targeting the NVIDIA Tegra Jetson Pro and DriveCX platforms will be presented. Communication with the vehicle via the automotive CAN bus is managed by a system that runs separately from other functions in its own execution environment, backed by its own real-time operating system -- all based on the industry-standard Automotive Open System Architecture (AUTOSAR). Learn about the various benefits this design has over handling CAN directly in systems like Linux, Android, or QNX.
Get an overview of the techniques used for Audi's Tegra 3-powered virtual cockpit, focusing on (1) reduction of start-up time, (2) instrument display at 60 fps, and (3) synchronization with the infotainment main unit. Additionally, get to know the overall software structure and see how graphical effects were implemented. The virtual cockpit is available in single-display and dual-display configurations. The single-display configuration is used for sport models, like the TT and R8, where the output of the infotainment main unit is integrated into the instrument cluster. In contrast, the dual-display configuration additionally features a "standard" main unit display.
Learn how realistic virtual worlds can be used to train vision-based classifiers that operate in the real world, i.e., avoiding the cumbersome task of collecting ground truth by manual annotation. Many vision-based applications rely on classifiers trained with annotated data. We avoid manual annotation by using realistic computer graphics (e.g. video games). However, the accuracy of the classifiers drops because virtual (training) and real (operation) worlds are different. We overcome the problem using domain adaptation (DA) techniques. In the context of vision-based driver assistance and autonomous driving, we present our DA experiences using classifiers based on both handcrafted features and CNNs. We show how GPUs are used in all the stages of our training and operation paradigm.
This talk will describe how a single forward propagation of a neural network can give us the locations of objects of interest in an image frame. There are no proposal-generation steps before running the neural network and no post-processing steps after. The speaker will describe a fully neural detection system, implemented by Panasonic's deep learning research teams, that achieves real-time speed and state-of-the-art performance. The talk also includes a live demonstration of the system on a laptop PC with an NVIDIA GTX 970M and a tablet with an NVIDIA Tegra K1 GPU.
D(r)ive deep into crash prediction in future automotive systems that allow the tracking of dozens of objects in real time by utilizing the processing power of embedded GPUs. We'll describe (1) the new possibilities for crash prediction systems in embedded systems that are only possible by taking advantage of recent developments of embedded GPUs, and (2) the implementation and optimization of such a system on the Tegra K1 utilizing AnyDSL, a framework for rapid prototyping of domain-specific libraries that targets NVVM and CUDA.
The car presents a particular challenge for creators of learning systems -- it is incredibly rich in data and context, its hardware and software environments are heterogeneous and fragmented, and drivers expect incredible precision from its interactions. CloudMade has pioneered an approach to machine learning in the automotive context that leverages the richness of car data, the emerging computational power of the car, and the existing computational power of the cloud to deliver an automotive-grade machine learning toolset. With CloudMade's solutions, automotive OEMs can deliver personalized experiences to customers that together create a self-learning car that anticipates the needs and desires of the user.
A universal, real-time-capable implementation of a trajectory generator for highly automated vehicles, based on NMPC (nonlinear model predictive control), is presented. Its main target is to serve as the central instance for all high-level ADAS or automated vehicle functions, thereby abstracting vehicle-dependent kinematics and dynamics. The trajectory planner is capable of the combined optimization of lateral and longitudinal dynamics in urban, rural, and highway scenarios. One of the major challenges, besides a stable system layout, is the fast solution of the embedded optimal control problem. For this, a bespoke GPU-optimized implementation was developed; apart from the planner itself, details about this implementation will be presented.
This tutorial covers for the first time the technology, operation and application of Quanergy's solid state LiDAR that is making 3D sensing ubiquitous, with its low price point, no moving parts, small form factor, light weight, low power consumption, long range, high resolution, high accuracy, long lifetime, and ability to operate in various environmental conditions. GPUs are used for performing in real time (1) LiDAR/Video data fusion for modeling and recognizing the environment around a vehicle, (2) object detection, classification, identification, and tracking, (3) scenario analysis and path planning based on deep learning, and (4) actuation of vehicle controls.
At CES 2016, NVIDIA launched DRIVE PX 2 as the world's first AI supercomputer designed for autonomous vehicles. DRIVE PX 2 is a lot more than that. It is an incredible development platform for developers writing autonomous car applications, and a reference design for Tier 1s and OEMs to reuse for safety-critical ECUs meant for Level 3/4/5 autonomy (as defined by SAE International). This talk will present the under-the-hood details of what makes it an AI supercomputer, a development platform, and a reference platform for autonomous cars.
Attendees will walk away with an appreciation for how modern computing power and GPUs are enabling a whole new world of map design potential for the car. Vector-based maps can render data on the fly at 60 fps, taking in-car map design to a more video game-like state. The driving experience can be seamless across devices and tailored to exactly what a user needs for any specific use case.
We'll present how Nauto uses deep learning in its distributed, vehicle-based compute and sensor network, and our learnings to date. Topics will include the performance of deep learning algorithms for computer vision in embedded systems, strategies for distributing compute across networks of embedded systems and in the cloud, and collecting and labeling data to maximize the performance of the system. Nauto's system is a dual-camera, windshield-mounted dashcam with GPS, IMU, wireless/cellular connection, and a SoC capable of running small CNNs in real time.
Robots. Supercomputers. Cars. They're all coming together. Come hear Gill Pratt, one of the world's leading figures in artificial intelligence and CEO of the Toyota Research Institute, deliver what is sure to be an enlightening presentation.
In this presentation, we discuss Ford's autonomous vehicle technology, including an overview of the tasks of sensing, sensor fusion, localization and mapping, object detection, and object classification. We examine the impact of GPU hardware in achieving significant improvements in the computational efficiency of our parallelized algorithms for vehicle localization, based on a combination of a synthetic aperture camera (derived from lidar data) and a Gaussian mixture 3D map approach. We provide an overview of some preliminary results of our deep learning research in the novel area of lidar-based methods for vehicle localization and object classification.
Hear the latest thinking on the maps that autonomous cars will use for highly accurate positioning. Autonomous cars need maps to function. The most critical use of maps is centimeter-level positioning. TomTom solves this with highly accurate lane information and lateral depth maps, which we call RoadDNA. Autonomous driving and map creation have incredible synergy. Mobile mapping cars go through the exact same process as autonomous cars: sensor perception, sensor data processing and comparing it with a stored version of reality. We process the sensor data with GPUs for fast creation of deep neural networks (DNNs) that can recognize traffic signs and other road attributes, both in-car as well as in the cloud. These DNNs, RoadDNA and sensors in the car together enable autonomous cars.
To fulfill the Euro NCAP requirements, an autonomous braking system has to be developed. The emergency braking system is designed to brake for pedestrians as well as in car-to-car scenarios. We'll explain how the functional logic is developed and what has to be done to reach a zero-false-positive goal with excellent field performance. Audi was the first OEM to fulfill this goal with a single 3D mono-vision camera, developing the first ASIL B camera with our supplier Kostal; the architecture of the 3D camera is explained as well.
The Electronics Research Laboratory (ERL) is part of the global research and development network that supports the Volkswagen Group brands, which include Audi, Bentley, Bugatti, Lamborghini, Porsche, and Volkswagen. Located in Silicon Valley, we draw upon the region's innovative spirit to build new concepts and technologies for our future vehicles. Deep learning is at the center of our work in the fast evolution of piloted driving. As part of our research into this technology, our mission is to research deep neural network architectures and bridge the gap between concept and series-development application. In this talk, we'll present our current developments in a variety of deep learning projects as well as insights into how this technology could affect the future of piloted driving.
We'll review current connected and automated driving initiatives with the goal of identifying progress and impediments. We'll look at market and thought leaders, tests, implementations, partnerships, and the latest developments, some of which will be reflected in presentations and announcements taking place at GTC. We'll share some forecast specifics and perspectives on the timing of partial and full autonomy and the expansion of vehicle connectivity.
ROBORACE is a global race series for full-size driverless electric cars. The championship will provide a showcase platform for the autonomous driving solutions now being developed by many large industrial automotive and technology players as well as top tech universities. As a competition of intelligence and technology, ROBORACE is fusing AI with automotive engineering in extreme conditions. Bringing together motorsports and gaming in this battle of algorithms, the teams will compete on racing tracks in major cities across the world. During the talk we will share the technical vision of our competition and explain the selection criteria for the racing teams. Join us to discuss and be the first to hear some exciting news about ROBORACE!
The WePod is the first self-driving vehicle on the public road without a steering wheel or pedals. To achieve driving in such a complex environment and guarantee safety, multiple sensors covering 360 degrees around the vehicle have been used. Sensor fusion, road-user detection, classification, and tracking have been implemented on NVIDIA's DRIVE PX platform. This session will give an overview of the system's architecture and implementation, and preliminary test results from driving on the public road will be presented.
Pedestrian detection for autonomous driving has gained a lot of prominence during the last few years. Besides the fact that this is one of the hardest tasks within computer vision, it involves huge computational costs. Obtaining acceptable real-time performance, measured in frames per second (fps), for the most advanced algorithms is a difficult challenge. We propose a CUDA implementation of a well-known pedestrian detection system (i.e., Random Forest of Local Experts). It includes LBP and HOG as feature descriptors and SVM and Random Forest as classifiers. We introduce significant algorithmic adjustments and optimizations to adapt the problem to the NVIDIA GPU architecture. The aim is to deploy a real-time system providing reliable results.
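To illustrate the kind of per-pixel feature work that maps naturally onto a GPU, here is a minimal 8-neighbor LBP kernel sketch (an illustrative assumption only, not the implementation described in the talk; image layout and border handling are simplified):

```cuda
// Minimal 8-neighbor LBP kernel (illustrative sketch only; image layout and
// border handling are assumptions, not the detector described in the talk).
#include <cuda_runtime.h>
#include <cstdint>

__global__ void lbp8(const uint8_t* img, uint8_t* codes, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    uint8_t c = img[y * width + x];
    // Offsets of the 8 neighbors, enumerated clockwise from the top-left.
    const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};

    uint8_t code = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t nb = img[(y + dy[i]) * width + (x + dx[i])];
        code |= (nb >= c) << i;   // one bit per neighbor comparison
    }
    codes[y * width + x] = code;
}
```

A real detector would then build block histograms of these codes and feed them, together with HOG features, to the SVM or Random Forest classifier.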
We show how our interactive, integrated analytics solution allows a new class of users to perform machine-assisted visual sensemaking. Until now, machine learning techniques such as predictive analytics and deep learning have mostly been used as part of a complex tool chain that serves as an endpoint in the decision-making process. We combine the strengths of human decision making and GPU-driven machine learning in a multi-coordinated visual analytics solution. This enables the discovery of actionable insights by bridging the gap between data scientist and business user.
The growth of the OpenPOWER Foundation has been phenomenal. Why, you might ask? In less than two years, OpenPOWER has grown from five members to over 180, with membership across all tiers of hardware, software, and end users themselves. The Foundation provides a compelling and rapidly growing open approach to infrastructure and software for rapidly changing workloads and evolving IT consumption models. This is a revolution that is making a profound difference in the price/performance criteria of end users, as well as accelerating compelling development for performance to drive business advantage. OpenPOWER members are co-creating their approach to technology, as innovators, producers, and consumers utilizing IBM's Power Architecture.
In an era defined by increasing diversity in computing architectures, performance portability is a key requirement for weather and climate applications that require massive computing resources. In this talk, you will learn how we develop for and achieve performance on CPU, GPU, and MIC architectures using industry-standard OpenACC and OpenMP directives. Performance results from the NIM weather model will be shown for a number of device, node, multi-node, and system configurations. Further, communications optimizations will be highlighted that deliver a more than 40% improvement in runtime when scaling to thousands of GPUs.
Explore a GPU-based efficient algorithm for chemical ODEs, which are the core and most costly part of the atmospheric chemistry model in the CAS-ESM project. Chemical ODEs are numerically difficult because of their stiffness, nonlinearity, and nonnegativity constraints. Traditional solvers, such as LSODE, are hard to parallelize because of their complicated control flow and coupling. In our experiments, we obtained a 3-5X speedup on the GPU when the same input is set on each node, which eliminates divergence in the kernel, while the performance with real input is even worse than the serial code. We therefore developed a new solver, Modified Backward Euler (MBE). In our numerical experiments, MBE is shown to be faster and more precise than LSODE, and it's easy to parallelize, so we can expect a significant speedup on GPUs.
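To illustrate the general idea of an implicit, per-cell chemistry step (a sketch under simplifying assumptions; the actual MBE solver handles coupled multi-species systems and is not shown here), consider a backward Euler step with Newton iteration for a toy production-loss ODE, one thread per grid cell:

```cuda
// Illustrative per-cell backward-Euler step with Newton iteration for a toy
// production-loss ODE  dy/dt = s - k*y  (one thread per grid cell). This only
// sketches the idea of an implicit chemistry step with a nonnegativity clamp;
// it is not the MBE solver described in the talk.
#include <cuda_runtime.h>

__global__ void backward_euler_step(float* y, const float* k, const float* s,
                                    float h, int ncells)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ncells) return;

    float yn = y[i];          // state at the start of the step
    float yk = yn;            // Newton iterate for the implicit solution

    // Solve  g(y) = y - yn - h*(s - k*y) = 0  with a few Newton iterations.
    for (int it = 0; it < 5; ++it) {
        float g  = yk - yn - h * (s[i] - k[i] * yk);
        float dg = 1.0f + h * k[i];      // dg/dy, always positive here
        yk -= g / dg;
    }
    y[i] = fmaxf(yk, 0.0f);   // enforce nonnegativity of concentrations
}
```

Because every cell runs the same fixed iteration count, the control flow is identical across threads, which is exactly the property that LSODE's adaptive logic lacks.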
As with many complex scientific computing applications, NASA's GEOS-5 climate modeling tool is computationally intense and can benefit from modern accelerated co-processing hardware. However, the burden of utilizing these new devices and achieving optimal results is placed on the same scientists responsible for developing the core algorithms and applying them to applications of interest. We'll present a task-based programming approach coupled with a dynamic scheduler. This allows the science of the software to be divorced from its implementation, both reducing the burden on the programmer and allowing it to adapt to changes in hardware architecture. In collaboration with NASA's Goddard Space Flight Center, we show our results in applying this technique to GEOS-5.
We'll demonstrate our efforts on developing highly efficient solvers for atmospheric dynamics on the GPU platforms. Besides general optimizations for GPU-based scientific computing applications, we apply optimization strategies that are specifically customized for atmospheric dynamic solvers. We'll show that by combining both algorithmic and architectural considerations, our optimization improves the computation efficiency from the original 2.24% to around 16% at the peak, with a sustained double-precision performance of 1.04 Tflops within one CPU-GPU node. We think this work demonstrates a huge potential for performing more efficient climate modeling work on GPU platforms.
Porting applications to GPUs still requires compromises between time-to-solution, GPU performance, and CPU performance. This often leads to major challenges for large, Fortran-based applications like weather and climate models. We'll focus on two of these challenges, whose significance is shown using real-world code examples and performance results: The differing requirements on parallel task granularity as well as storage order between the two architectures. A proposed solution is a flexible preprocessor framework called "Hybrid Fortran," which has been used to port both dynamics and physics of ASUCA, one of the Japan Meteorological Agency's current operational weather models. Finally, an even more hands-off solution to GPU portability is proposed in the shape of a black box solution.
In partnership with scientists from the Space Science and Engineering Center (SSEC), Tempo Quest Inc. is embarking on a quest to complete AceCAST, a proprietary version of the Weather Research and Forecasting Model (WRF) -- a mesoscale and global model designed for both operational forecasters and atmospheric researchers and widely used by commercial, government, and institutional users. The state-of-the-art acceleration of low-throughput, low-energy-consumption, error-resilient satellite remote sensing data compression suitable for data, image, and video transmission and archiving will also be discussed.
The European Centre for Medium-Range Weather Forecasts has been at the cutting edge of Numerical Weather Prediction for the past 40 years, and is making sure it will remain so as HPC heads for the exascale. To this end, ECMWF is leading the EU H2020 ESCAPE project, which promises to address the many requirements necessary for achieving exascale NWP. After talking about the general strategy that ECMWF currently envisages for accelerator usage, we'll look at GPGPU work being carried out for the ESCAPE project, focusing on two important components of the ECMWF weather model, the cloud physics routine and spectral transforms.
We'll discuss the hardware-software co-design project behind the most cost- and energy-efficient system for numerical weather prediction -- an appliance based on the Cray CS-Storm system architecture that is loaded with NVIDIA K80 GPUs and has been operated on behalf of MeteoSwiss by CSCS since October 2015.
We describe the implementation of a simple numerical scheme for solving the shallow water equations on a GPU, which will be used in the further development of a massive ensemble prediction system running on GPUs. The numerical scheme has previously been used in operational forecasting, and benchmarks comparing the FORTRAN CPU version with the new GPU version have been performed. The results show that the GPU implementation gives a speedup over the CPU of slightly more than 200X. This is highly promising regarding the possibilities of running a large number of ensembles cost effectively on a computer and thereby increasing the usefulness of short-term ocean current forecasts and drift trajectory predictions.
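As a minimal sketch of what a GPU shallow-water update can look like (assuming a simple 1D Lax-Friedrichs scheme purely for illustration; the operational scheme referenced in the talk is 2D and more sophisticated), one thread per grid cell:

```cuda
// Minimal 1D shallow-water update using a Lax-Friedrichs scheme, one thread
// per grid cell (an illustrative sketch only, not the operational scheme).
#include <cuda_runtime.h>

#define GRAV 9.81f

__global__ void lax_friedrichs_step(const float* h, const float* hu,
                                    float* h_new, float* hu_new,
                                    float dt, float dx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 1 || i >= n - 1) return;   // skip boundary cells

    // Fluxes of the two neighbors: F = (hu, hu^2/h + g*h^2/2).
    float f0_l = hu[i - 1];
    float f0_r = hu[i + 1];
    float f1_l = hu[i - 1] * hu[i - 1] / h[i - 1] + 0.5f * GRAV * h[i - 1] * h[i - 1];
    float f1_r = hu[i + 1] * hu[i + 1] / h[i + 1] + 0.5f * GRAV * h[i + 1] * h[i + 1];

    float c = 0.5f * dt / dx;
    h_new[i]  = 0.5f * (h[i - 1]  + h[i + 1])  - c * (f0_r - f0_l);
    hu_new[i] = 0.5f * (hu[i - 1] + hu[i + 1]) - c * (f1_r - f1_l);
}
```

Each ensemble member would run such updates independently, which is what makes a massive GPU ensemble prediction system attractive.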
Generation of huge amounts of spatial data has increased demand for applications that are capable of handling large-scale and high-resolution terrain data. A novel example of this is the Iowa Flood Information System, a web-based, one-stop platform for accessing flood-related data. One of the most challenging tasks in terrain analysis is the delineation of watersheds. Although traditional methods for watershed analysis give high-accuracy results, they become more burdensome as the data resolution increases, and there is no client-side analysis tool for watershed delineation. In this project, we developed a client-side GPGPU algorithm to analyze high-resolution terrain data for watershed delineation, which allows parallelization using GPUs.
AceCAST is a proprietary version of WRF, a mesoscale and global weather research and forecasting model designed for both operational forecasters and atmospheric researchers that is widely used by commercial, government, and institutional users around the world, in more than 150 countries. WRF is suitable for a broad spectrum of applications across domain scales ranging from meters to hundreds of kilometers. AceCAST's increased computational power enables time-critical, weather-sensitive industry and commerce to achieve (1) high-resolution accuracy and cost performance, (2) the strong scaling they need, and (3) greatly improved profits. AceCAST is already one-third complete, and the time to first commercial product is only ~12 months away.
Come and see how to use HOOMD-blue, a flexible particle simulation tool. HOOMD-blue runs hard particle Monte Carlo, molecular dynamics, DPD, and other types of particle simulations, all on GPUs. It runs on everything from single-GPU workstations up to thousands of GPUs on supercomputers. Use Python scripts to configure jobs with custom initialization, complex flow control, and in-situ analysis of data. This talk introduces HOOMD-blue features and describes how to use them, focusing on the newest capabilities. It demonstrates job scripts for common usage patterns and shows examples of efficient workflows.
The AMBER molecular dynamics (MD) package is one of the fastest MD packages on commodity hardware and was one of the first widely used packages to exploit GPUs. We'll discuss the history of AMBER on NVIDIA GPUs and then highlight some of the newest advances in MD simulation that feature in the latest version 16 of AMBER. This includes extremely high-throughput thermodynamic integration free energy methods, explicit solvent constant pH simulations, advanced umbrella sampling restraints, multi-dimensional replica exchange methods, and asymmetric boundary conditions. We'll also discuss the development and validation of our latest precision model, SPXP, which is focused on maximizing the performance achievable from Maxwell-generation hardware without sacrificing accuracy.
This talk presents the first GPU-enabled code for ReaxFF MD, GMD-Reax, and its applications in simulating large-scale reactive molecular systems for challenging problems in energy applications, including reaction mechanism investigation of coal and biomass pyrolysis and combustion of jet fuels. GMD-Reax allows for efficient simulations of large models of ~10,000 atoms. Combined with VARxMD, the first code we created for ReaxFF MD reaction analysis in our methodology development, the coal pyrolysis simulations can predict the overall spectrum evolution trend of products and uncover important reaction pathways and radical behaviour. What we obtained in simulations of coal and biomass pyrolysis and fuel combustion is hardly accessible experimentally or by other computational approaches.