Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frameworks from an architectural/performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training. We'll highlight challenges for message passing interface runtimes to efficiently support DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.
We'll describe how Lawrence Livermore National Laboratory (LLNL) prepared a large existing application base for our recently deployed Sierra supercomputer, which is designed to harness over 17,000 NVIDIA V100 GPUs to tackle the nation's most challenging science and national security problems. We will discuss how this multi-year effort paid off with exciting possibilities for new science. We'll also outline how using GPUs in our traditional HPC platforms and workflows is adding an exciting new dimension to simulation-based science, prompting LLNL to rethink how we perform future simulations in terms of intelligent simulation. We'll give an overview of the application preparation process that led up to Sierra's deployment, as well as look at current and planned research aimed at riding the AI and machine learning waves in pursuit of game-changing science.
Modern-day enablement of AI has been achieved through the acceleration of deep learning by GPUs; we are now entering the realm of ever-more complex deep learning tasks involving complicated algorithms, deeper and more sophisticated network layers, and rapidly growing datasets, for which a handful of GPUs is proving insufficient. By designing and building large-scale HPC machines with extensive GPU-based vector/tensor processing capabilities, such as Tsubame3, ABCI, and Post-K, as well as designing new scalable learning algorithms, we are overcoming such challenges. In particular, the ABCI grand challenge has enabled three research groups, including ours at Tokyo Tech, to scale ImageNet training to over 4,000 GPUs and reduce training times to minutes. This paves the way for a new era in which AI is as scalable as traditional HPC has been.
For job allocation decisions, current batch schedulers have access to and use only information on the number of nodes and the runtime, because these are readily available at submission time from user job scripts. User-provided runtimes are typically inaccurate, because users overestimate or lack understanding of job resource requirements. Beyond the number of nodes and runtime, information on other system resources, including IO and network, is not available, even though these play a key role in system performance. In this talk we tackle the need for automatic, general, and scalable tools that provide accurate resource usage information to schedulers with our tool for Predicting Runtime and IO using Neural Networks and GPUs (PRIONN). PRIONN automates prediction of per-job runtime and IO resource usage, enabling IO-aware scheduling on HPC systems. The novelty of our tool is the input of whole job scripts into deep learning models, which allows complete automation of runtime and IO resource predictions.
The talk will focus on the latest developments in the MVAPICH2-GDR MPI library, which helps HPC and Deep Learning applications exploit maximum performance and scalability on GPU clusters. Multiple designs focusing on GPUDirect RDMA (GDR), Managed and Unified Memory support, datatype processing, and support for OpenPOWER and NVLink will be highlighted for HPC applications. We will also present novel designs and enhancements to the MPI library to boost performance and scalability of Deep Learning frameworks on GPU clusters. Container-based solutions for GPU-based cloud environments will also be highlighted.
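To make the CUDA-aware side of this concrete, here is a minimal sketch, under illustrative assumptions (two ranks, a single float buffer), of what a GDR-enabled MPI library such as MVAPICH2-GDR allows: device pointers are handed directly to MPI calls, with no explicit host staging.

```cuda
// Minimal CUDA-aware MPI sketch: device pointers go straight into MPI calls,
// letting a GDR-enabled runtime move data GPU-to-GPU over the interconnect.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));

    if (rank == 0) {
        // With a CUDA-aware MPI, no cudaMemcpy staging is needed here.
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```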
AI methods and tools are starting to be applied to HPC applications by a growing number of brave researchers in diverse scientific fields. This talk will describe an emergent workflow that uses traditional HPC numeric simulations to generate the labeled data sets required to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. We'll share some compelling success stories and consider the implications of this new HPC + AI workflow for HPC applications and system architecture in a post-Moore's Law world.
The Perlmutter machine will be delivered to NERSC/LBNL in 2020 and contain a mixture of CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk we will describe the analysis we performed in order to optimize this design to meet the needs of the broad NERSC workload. We will also discuss our application readiness program, the NERSC Exascale Science Applications Program (NESAP), through which we will work with our users to optimize their applications to maximize their performance on the GPUs in Perlmutter.
Rapid progress in atmospheric science has been fueled in part over the years by faster computers. However, progress has slowed over the last decade due to three factors: the plateauing of core speeds, the increasing complexity of atmospheric models, and the mushrooming of data volumes. Our team at the National Center for Atmospheric Research is pursuing a hybrid approach to surmounting these barriers that combines machine learning techniques and GPU-acceleration to produce, we hope, a new generation of ultra-fast models of enhanced fidelity with nature and increased value to society.
The recent success of deep learning has been driven by the ability to combine significant GPU resources with extremely large labeled datasets. However, many labels are extremely expensive to obtain, or can be observed only once, as with a specific astronomical event or scientific experiment. By combining vast amounts of labeled surrogate data with advanced few-shot learning, we have demonstrated success in leveraging small data in deep learning. In this talk, we will discuss these exciting results and explore the scientific innovations that made this possible.
Pacific Northwest National Laboratory's scientific mission spans energy, molecular science, and national security. Under the Deep Learning for Scientific Discovery Initiative, PNNL has invested in integrating advanced machine learning with traditional scientific methods to push the state of the art in many disciplines. We will provide an overview of some of the thirty projects we have stewarded, demonstrating how we have leveraged computing and analytics in fields as diverse as ultrasensitive detection, metabolomics, and atmospheric science.
We have developed an HPC ML training algorithm that can reduce training time on PBs of data from days and weeks to minutes. Using the same research, we can now conduct inferencing on completely encrypted data. We have built a distributed ML framework on commodity Azure VMs that scales to tens of terabytes and thousands of cores, while achieving better accuracy than the state of the art.
PSC's "Bridges" was the first system to successfully converge HPC, AI, and Big Data. Designed for the U.S. national research community and supported by NSF, it now serves approximately 1,600 projects and 7,500 users at over 350 institutions. Bridges emphasizes "nontraditional" uses that span the life, physical, and social sciences, engineering, and business, many of which are based on AI or AI-enabled simulation. We describe the characteristics of Bridges that have made it a success, and we highlight several inspirational results and how they benefited from the system architecture. We then introduce "Bridges AI", a powerful new addition for balanced AI capability and capacity that includes NVIDIA's DGX-2 and HPE NVLink-connected 8-way Volta servers.
Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs, in combination with third-party devices. We'll introduce the fundamental concepts behind GPUDirect and present the latest developments, such as changes to pre-existing APIs and newly introduced APIs. We'll also discuss the expected performance in combination with the new computing platforms that emerged last year.
This session presents an overview of the hardware and software architecture of the DGX-2 platform. This talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA developers will learn ways to utilize the full GPU connectivity to quickly build complex applications and utilize the high-bandwidth NVLink connections to scale up performance.
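As orientation for that connectivity, here is a minimal, illustrative sketch (not from the talk) of how a CUDA program typically unlocks direct GPU-to-GPU traffic, which on a DGX-2 rides the NVSwitch fabric:

```cuda
// Sketch: enable peer-to-peer access among all visible GPUs so that
// cudaMemcpyPeer (and direct loads/stores) can use NVLink/NVSwitch paths.
#include <cuda_runtime.h>

void enableAllPeerAccess() {
    int n;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaSetDevice(i);
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            if (canAccess)
                cudaDeviceEnablePeerAccess(j, 0);  // flags must be 0
        }
    }
}
```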
Do you need to compute on larger problems, or faster, than a single GPU allows? Then come to this session and learn how to scale your application to multiple GPUs. You will learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, applying a domain decomposition strategy.
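For a taste of the simplest such model, one process driving multiple devices, here is a minimal sketch under illustrative assumptions (a 1-D slab decomposition and a placeholder stencil kernel): each GPU computes its slab, then halo rows are exchanged directly between neighboring devices.

```cuda
// Sketch: one process drives several GPUs; the 1-D domain is split into
// slabs and halo rows are exchanged between neighboring devices.
#include <cuda_runtime.h>

__global__ void stencilStep(float *u, int rowLen) { /* illustrative body */ }

void step(float **d_u, int nGpus, int rowsPerGpu, int rowLen) {
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        stencilStep<<<rowsPerGpu, 256>>>(d_u[g], rowLen);
    }
    // Halo exchange: last interior row of GPU g -> ghost row of GPU g+1.
    for (int g = 0; g + 1 < nGpus; ++g) {
        cudaMemcpyPeerAsync(d_u[g + 1], g + 1,                        // dst ghost row
                            d_u[g] + (size_t)rowsPerGpu * rowLen, g,  // src last row
                            rowLen * sizeof(float), 0);
    }
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        cudaDeviceSynchronize();
    }
}
```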
Murex has been an early adopter of GPUs for pricing and risk management of complex financial options. GPU adoption has boosted the performance of its software while reducing its usage cost. Each new generation of GPUs has also shown the importance of reshaping the architecture of the software around its GPU-accelerated analytics. Minsky, featuring far better GPU memory bandwidth and GPU-CPU interconnect, raises the bar even further. Murex will show how it has handled this new challenge for its business.
HPE Deep Learning solutions empower innovation at any scale, building on our purpose-built HPC systems and technologies, solutions, applications, and support services. Deep Learning demands massive amounts of computational power, which usually involves heterogeneous computing resources, e.g., GPUs and InfiniBand, as installed on HPE Apollo systems. NovuMind's NovuForce system leverages state-of-the-art technologies to make the deployment and configuration procedure fast and smooth. The NovuForce deep learning software within the Docker image has been optimized for the latest technologies, such as NVIDIA Pascal GPUs and InfiniBand GPUDirect RDMA. This flexibility of the software, combined with the broad range of GPU servers in the HPE portfolio, makes for one of the most efficient and scalable solutions.
Discover how we designed and optimized a highly scalable dense solver for the Maxwell equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it with programmability concerns in mind. We show how we solved the challenge of tightly combining tasks with MPI, and illustrate how this scaled up to 50,000 CPU cores, reaching 1.38 petaflops. We then focus on the integration of GPUs in this model, along with a few implementation tricks to ensure truly asynchronous execution. Finally, after briefly detailing how we added hierarchical compression techniques to our distributed solver on CPUs, we describe how we plan to overcome the challenges that have so far prevented porting it to GPUs.
We leverage NVIDIA GPUs for connected-components labeling and image classification applied to Digital Rock Physics (DRP), to help characterize reservoir rocks and study their pore distributions. We show in this talk how NVIDIA GPUs helped us satisfy strict real-time restrictions dictated by the imaging hardware used to scan the rock samples. We present a detailed description of the workflow from a DRP perspective, our algorithm and optimization techniques, and performance results on the latest NVIDIA GPU generations.
In order to prepare the scientific communities, GENCI and its partners have set up a technology watch group and lead collaborations with vendors, relying on HPC experts and early-adopted HPC solutions. The two main objectives are providing guidance and preparing the scientific communities for the challenges of exascale architectures. The talk will present the OpenPOWER platform bought by GENCI and provided to the scientific community. Then, it will present the first results obtained on the platform for a set of about 15 applications using all the solutions provided to the users (CUDA, OpenACC, OpenMP, ...). Finally, we will present the porting effort for one specific application and the techniques used for GPUs with both OpenACC and OpenMP.
Wireless VR is widely regarded as the key to maximum immersion. But why? Is it only the obvious benefit of removing the heavy, inflexible cable? There is more behind it. Learn how the development of tracking technology goes hand in hand with the increasing demand for wireless VR hardware, what hardware is on the market now and what is coming, and how wireless solutions - whether standalone devices or add-ons - can create higher value for your VR application. We'll also look at how large-scale location-based VR and hardware manufacturers are expanding the boundaries of the VR industry, both for entertainment and B2B.
With over 5,000 GPU-accelerated nodes, Piz Daint has been Europe's leading supercomputing system since 2013, and is currently one of the most performant and energy-efficient supercomputers on the planet. It has been designed to optimize throughput of multiple applications, covering all aspects of the workflow, including data analysis and visualisation. We will discuss ongoing efforts to further integrate these extreme-scale compute and data services with infrastructure services of the cloud. As a Tier-0 system of PRACE, Piz Daint is accessible to scientists in Europe and worldwide, and provides a baseline for the future development of exascale computing. We will present a strategy for developing exascale computing technologies in domains such as weather and climate or materials science.
The presentation will give an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhanced features like NVLink 2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like independent thread scheduling and the Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest CUDA 9 release.
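One of the CUDA 9 programming-model enhancements is cooperative groups. As an illustrative sketch (the kernel is an assumption, not the talk's example), a grid-wide barrier now lets a two-phase algorithm live in a single kernel, provided it is launched with cudaLaunchCooperativeKernel:

```cuda
// Sketch of a CUDA 9 cooperative-groups grid barrier. The kernel must be
// launched with cudaLaunchCooperativeKernel for grid.sync() to be valid.
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void twoPhase(float *data, int n) {
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;          // phase 1, spread over all blocks
    grid.sync();                         // whole grid finishes phase 1 first
    float first = data[0];               // safe: phase-1 value is final
    if (i > 0 && i < n) data[i] += first;  // phase 2 depends on phase 1
}
```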
Learn about using Tensor Cores to perform very fast matrix multiply-accumulate steps like those required in AI training. The key to Tensor Core performance is the use of 16-bit floating point arithmetic, but that causes significant rounding errors. Although algorithms like binomial correction or Karatsuba can reduce rounding errors considerably, they require additional calculations. We'll detail the performance of these algorithms based on the Warp Matrix Multiply Accumulate API.
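For readers new to that API, here is a minimal single-tile sketch (dimensions and layouts are illustrative assumptions): WMMA takes FP16 inputs but can accumulate in FP32, the baseline on top of which correction algorithms like those above operate.

```cuda
// Sketch of the Warp Matrix Multiply Accumulate (WMMA) API: one warp
// computes a 16x16x16 tile, with FP16 inputs and FP32 accumulation.
#include <mma.h>
using namespace nvcuda;

__global__ void wmma16x16x16(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> cFrag;

    wmma::fill_fragment(cFrag, 0.0f);      // accumulate in FP32
    wmma::load_matrix_sync(aFrag, a, 16);  // leading dimension 16
    wmma::load_matrix_sync(bFrag, b, 16);
    wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);
    wmma::store_matrix_sync(c, cFrag, 16, wmma::mem_row_major);
}
```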
Learn how GPUs are pushing the limits of the largest astronomical telescopes on Earth and how they'll be used to image life-bearing planets outside our solar system. Thanks to hardware features such as Tensor Cores and mixed-precision support, plus optimized AI frameworks, GPU technology is changing how large data streams from optical sensors are digested in real time. We'll discuss how real-time AI made possible by GPUs opens up new means to optimally control the system and calibrate images, which will help scientists get the most out of the largest optical telescopes. GPUs will also benefit future extreme-size facilities like the European Extremely Large Telescope, because the complexity of maintaining exquisite image quality increases with the square of the telescope's diameter. We'll present on-sky results obtained on the 8.2-meter Subaru Telescope and explain why these techniques will be essential to future giant telescopes.
We'll do a deep dive into previously undisclosed architectural details of NVIDIA's Turing T4 Cloud GPU, which we unearthed via micro-benchmarks, and compare the architecture's features with previous generations of NVIDIA GPUs. We'll also reveal the geometry and latency of Turing's complex memory hierarchy, the format of its encoded instructions, and the latency of instructions. Learn how developers can use this knowledge to design workloads that adapt exactly to the characteristics of the T4 GPU. We'll also explain how to manually assemble binary code that squeezes every bit of bare-metal performance from the hardware, which maximizes dual issues and avoids bank conflicts.
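The measurement style behind such numbers is worth sketching: to expose latency rather than throughput, one times a chain of dependent instructions with the on-chip cycle counter. A minimal, illustrative example (not the authors' actual harness), launched with a single thread:

```cuda
// Sketch of instruction-latency microbenchmarking: time a chain of
// dependent FMAs with clock64() so latency, not throughput, is measured.
__global__ void fmaLatency(float *out, long long *cycles, float x) {
    float v = x;
    long long t0 = clock64();
    #pragma unroll
    for (int i = 0; i < 256; ++i)
        v = fmaf(v, 1.000001f, 0.5f);  // each FMA depends on the previous one
    long long t1 = clock64();
    *out = v;                          // keep the chain live past the timer
    *cycles = (t1 - t0) / 256;         // approx. cycles per dependent FMA
}
```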
We'll discuss the Clara Platform, which is designed to bring NVIDIA technology and expertise in high performance computing, artificial intelligence, and photorealistic rendering to the medical-imaging industry. Our talk will focus on how developers from industry and institutions are leveraging the platform to integrate artificial intelligence into hospitals to bend the cost curve and improve patient outcomes.
Learn how to make your irregular algorithm perform on GPUs. We provide insights into our research on a tasking framework for synchronization-critical applications on GPUs. We discuss the requirements of GPU architectures and programming models for implementing efficient tasking frameworks. Participants will learn about the pitfalls for tasking arising from the architectural differences between latency-driven CPUs and throughput-driven GPUs. To overcome these pitfalls, we consider programming concepts such as persistent threads, warp-aware data structures, and CUDA asynchronous task graphs. In addition, we look at the latest GPU features, such as forward progress guarantees and grid synchronization, that facilitate the implementation of tasking approaches. A task-based fast multipole method for the molecular dynamics package GROMACS serves as a use case for our considerations.
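To ground the persistent-threads concept mentioned above, here is a minimal, illustrative sketch (the task body and queue layout are assumptions): thread blocks stay resident and repeatedly pull task indices from a global counter until the queue is drained.

```cuda
// Sketch of the persistent-threads pattern: blocks stay resident and pull
// task indices from a global counter until all tasks are consumed.
// Reset nextTask to 0 (e.g., via cudaMemcpyToSymbol) before each launch.
__device__ int nextTask;

__global__ void persistentWorker(const float *in, float *out, int numTasks) {
    __shared__ int task;
    while (true) {
        if (threadIdx.x == 0)
            task = atomicAdd(&nextTask, 1);  // one task per block iteration
        __syncthreads();
        if (task >= numTasks) return;        // queue drained: block retires
        out[task] = in[task] * 2.0f;         // placeholder for real task work
        __syncthreads();                     // finish before task is reused
    }
}
```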
As multi-GPU deep learning performance improves, the performance of the storage system hosting a dataset becomes critical in keeping these GPUs fully utilized. We survey the different methods for providing training data to a TensorFlow application on a GPU, and benchmark data throughput for a variety of popular neural network architectures. We look at performance and potential bottlenecks for local storage technologies (SCSI SSD and NVMe), high performance network-attached file systems, TensorFlow native connectors (HDFS and S3), and FUSE-connected object storage.
We'll present a multi-node distributed deep learning framework called ChainerMN. Even though GPUs are continuously gaining more computational throughput, training state-of-the-art deep neural network models is still very time-consuming. For better scalability and productivity, it is paramount to accelerate the training process by using multiple GPUs. To enable high-performance and flexible distributed training, ChainerMN was developed and built on top of Chainer. We'll first introduce the basic approaches to distributed deep learning and then explain the design choices, basic usage, and implementation details of Chainer and ChainerMN. To demonstrate the scalability and efficiency of ChainerMN, we'll discuss the remarkable results from training the ResNet-50 classification model on the ImageNet dataset using 1,024 Tesla P100 GPUs on our in-house cluster, MN-1.
Across the Mediterranean basins, the Messinian salinity crisis resulted in the deposition of an up to 2-km-thick, multi-layered evaporitic succession consisting of alternating layers of halite and clastics. Such geological objects obscure seismic imaging and may even be over-pressurized, posing potential drilling hazards that are often hard to predict. We demonstrate the TPDOT&TWSM approach developed at IPGG SB RAS, using the example of evaluating the interference wavefields penetrating into the shadow zone for a real geological case from the Levant Basin, offshore Israel. Using GPUs allowed us to accelerate the TWSM algorithm, which is based on many large matrix-vector operations, by a factor of hundreds or more.
Learn how GPU-based Computational Fluid Dynamics (CFD) paves the way for affordable high-fidelity simulations of automotive aerodynamics. Highly resolved, transient CFD simulations based on pure CPU systems are computationally expensive and constrained by available computational resources. For many years, this posed a big challenge for automotive OEMs in their aerodynamic design process. To overcome this problem, we present ultraFluidX, a novel CFD solver that was specifically designed to leverage the massively parallel architecture of GPUs. With its multi-GPU implementation based on CUDA-aware MPI, the tool can achieve turnaround times of just a few hours for simulations of fully detailed production-level passenger and heavy-duty vehicles, a breakthrough for simulation-based design.
Come and learn about new fast low-rank matrix computations on GPUs! By exploiting the low-rank off-diagonal block structure, we design and implement fast linear algebra operations on massively parallel hardware architectures. The main idea is to refactor the numerical algorithms and the corresponding implementations by aggregating similar numerical operations into highly optimized batched kernels. Applications in weather prediction, seismic imaging, and materials science are employed to assess the trade-off between numerical accuracy and parallel performance of these fast matrix computations compared to more traditional approaches.
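As a concrete flavor of such aggregation, here is a minimal sketch (sizes and layout are illustrative assumptions) using cuBLAS's strided batched GEMM, the kind of batched kernel that many small, similar block operations can be funneled into:

```cuda
// Sketch: many small GEMMs aggregated into one batched cuBLAS call,
// the kind of "batched kernel" that low-rank block algebra maps onto.
#include <cublas_v2.h>

void batchedMultiply(cublasHandle_t h, const float *A, const float *B,
                     float *C, int m, int n, int k, int batch) {
    const float alpha = 1.0f, beta = 0.0f;
    // Column-major tiles laid out contiguously: fixed stride between tiles.
    cublasSgemmStridedBatched(h, CUBLAS_OP_N, CUBLAS_OP_N,
                              m, n, k, &alpha,
                              A, m, (long long)m * k,
                              B, k, (long long)k * n, &beta,
                              C, m, (long long)m * n, batch);
}
```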
Attendees will learn how the behavior of the human brain is simulated using current computers, and about the different challenges the implementation has to deal with. We cover the main steps of the simulation and the methodologies behind it. In particular, we highlight and focus on the transformations and optimizations carried out to achieve good performance on NVIDIA GPUs.
As embedded software in intelligent vehicles becomes more and more complex, it becomes critical for automakers and suppliers to use advanced and efficient software solutions. Learn how to dramatically reduce development cycles and how to simplify the deployment of critical real-time applications on embedded targets. In this presentation we will show how RTMaps Embedded facilitates porting designs from early prototyping stages on PCs down to the most recent ECUs designed for production. RTMaps is component-based software that facilitates the design and execution of ADAS and HAD applications. It offers an easy-to-use drag-and-drop approach for GPU-based computer vision and AI systems, including an integration of the NVIDIA DriveWorks software modules as independent building blocks.
The goal of the session is to take a deep dive into the key technical building blocks of interactive Computer Aided Engineering (CAE) and to understand, through specific prototypes, how GPU computing will impact it. Considering the example of interactive design assistants, we will explain the ingredients of future GPU-based simulation codes: (i) multi-level voxel geometry representation, from integration to finite elements; (ii) indirect (weak) realization of boundary conditions; (iii) (non-linear) geometric multigrid methods. By streamlining all algorithms with respect to the GPU, state-of-the-art industrial solutions are outperformed by orders of magnitude in computational efficiency while conserving accuracy. We show this through a few prototypes working toward the vision of a virtual maker space.
Polymatica is an OLAP and Data Mining server with a hybrid CPU+GPU architecture that turns analytical work on billions of records into a proactive process with no waiting. The Polymatica architecture uses NVIDIA multi-GPU configurations (e.g., in the DGX-1) for critical operations on billions of raw business data records. This eliminates pauses and accelerates analytical operations by up to a hundred times. You'll see the performance difference on a real analytical process in retail on different hardware: 1) CPU-only calculations on 2x Intel Xeon, no GPU; 2) 2x Intel Xeon + a single Tesla P100; 3) DGX-1: 2x Intel Xeon + 8x Tesla P100. Polymatica on DGX-1 becomes the fastest OLAP and Data Mining engine, allowing advanced analytics on datasets of billions of records.
Come and learn how the grand challenge of controlling adaptive optics systems on future Extremely Large Telescopes is being solved using GPUs. As part of Green Flash, an international EU-funded joint industrial and academic project, our team is developing GPU-based solutions for the real-time control of large optical systems operating in tough environments. This includes the hard real-time data pipeline, the soft real-time supervisor module, as well as a real-time-capable numerical simulation to test and verify the proposed solutions. We will discuss how the unprecedented memory bandwidth provided by HBM2 on the new Pascal architecture is changing the game in dimensioning these complex real-time computers crunching up to 200 Gb/s of noisy data.
Learn a simple strategy for optimizing application runtime. The strategy is based on four steps and illustrated on a two-dimensional Discontinuous Galerkin solver for computational fluid dynamics on structured meshes. Starting from a sequential CPU code, we guide the audience through the different steps that allowed us to achieve a GPU speedup of around 149x over the original runtime of the code (evaluated on a K20Xm). Applying the same optimization strategy to the CPU code yields a speedup of around 35x over the original runtime (evaluated on an E5-1650v3 processor). Based on this methodology, we finally end up with an optimized, unified version of the code that can run simultaneously on both GPU and CPU architectures.
Learn how large requests on big datasets, like production or finance data, can benefit from hybrid engine approaches for calculations on in-memory databases. While hybrid architectures are state-of-the-art in specialized calculation scenarios (e.g., linear algebra), multi-GPU or even multicore usage in database servers is still far from everyday use. In general, the approach to handling requests on large datasets is to scale the database resources by adding new hardware nodes to the compute cluster. Instead, we use intelligent request planning and load balancing to distribute the calculations to multi-GPU and multicore engines in one node. These calculation engines are specifically designed to handle hundreds of millions of cells in parallel with minimal merging overhead.
This session will show how we combine high-performance GPU processing with Deep Learning (DL). We use automated tomographic imaging microscopes for various studies in physics and biology. These systems have a raw data flow of up to 2 GB/s, making real-time (RT) data processing mandatory. To make the system more intelligent, an advanced processing pipeline must be incorporated. So far, DL inference speed doesn't allow us to apply it to all the data. To address the problem, we are designing a hybrid system that allows DL to be used for high-throughput microscopy in RT. The concepts and approaches we use to design the system will be illustrated with examples from high-energy physics and biology.
Online shopping is nothing if not efficient. Walmart, together with New Jersey startup Jet, takes things a step further, using AI and Deep Learning to optimize their entire e-commerce business. The first AI application we discuss is Jet's unique smart merchant selection: the platform finds the best merchant and warehouse combination in real time so that the total order cost is as low as possible. Then we show how to efficiently pack fresh and frozen orders with Deep Reinforcement Learning. The value of this approach is not just in finding the best boxes and the tightest packing, but also the least amount of coolant and its placement so that the temperature of all items stays within the required limits during shipment.
Come join us and learn how to build data-centric GPU clusters for artificial intelligence. We will briefly present the state-of-the-art techniques for distributed Machine Learning and the special requirements they impose on the GPU cluster. Additionally, we will present an overview of the interconnect technologies used to scale and accelerate distributed Machine Learning. During the session we will cover RDMA, NVIDIA's GPUDirect RDMA and GPUDirect Async, as well as in-network computing, and how the use of these technologies enables a new level of scalability and performance in large-scale artificial intelligence and high performance computing deployments.
Calculation of surface normals can be crucial to the process of extracting useful information from point clouds. Surface normals give an estimate of the objects in the scene, which can be important for more complex algorithms like feature extraction using machine learning techniques. In this poster, we present our implementation of normal estimation on a GPU and a CPU and show results for both platforms. Through our implementation we show that the GPU implementation on a Quadro M4000 graphics card can be up to an order of magnitude faster, or more, than the implementation on a rather modest desktop Xeon workstation. To substantiate our findings, we also share profiling information and plots of the distribution of errors in our approach.
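As background, here is a minimal sketch of the kind of kernel involved, assuming an organized point cloud and a simple cross-product approximation of the normal (rather than a full covariance/PCA fit, which the poster's method may use); all names are illustrative:

```cuda
// Sketch: per-point normal on an organized point cloud via the cross
// product of two local tangent vectors (full covariance fit omitted).
struct P3 { float x, y, z; };

__global__ void normals(const P3 *pts, P3 *nrm, int w, int h) {
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    int v = blockIdx.y * blockDim.y + threadIdx.y;
    if (u <= 0 || v <= 0 || u >= w - 1 || v >= h - 1) return;
    P3 c = pts[v * w + u], r = pts[v * w + u + 1], d = pts[(v + 1) * w + u];
    float ax = r.x - c.x, ay = r.y - c.y, az = r.z - c.z;  // tangent along u
    float bx = d.x - c.x, by = d.y - c.y, bz = d.z - c.z;  // tangent along v
    P3 n = { ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z) + 1e-12f;
    P3 outN = { n.x / len, n.y / len, n.z / len };          // unit normal
    nrm[v * w + u] = outN;
}
```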
Deep learning optimization in real world applications is often limited by the lack of valuable data, either due to missing labels or the sparseness of relevant events (e.g. failures, anomalies) in the dataset. We face this problem when we optimize dispatching and rerouting decisions in the Swiss railway network, where the recorded data is variable over time and only contains a few valuable events. To overcome this deficiency we use the high computational power of modern GPUs to simulate millions of physically plausible scenarios. We use this artificial data to train our deep reinforcement learning algorithms to find and evaluate novel and optimal dispatching and rerouting strategies.
A key driver for pushing high-performance computing is the enablement of new research. One of the biggest and most exciting scientific challenges requiring high-performance computing is decoding the human brain. Many of the research topics in this field require scalable compute resources or the use of advanced data analytics methods (including deep learning) for processing extreme-scale data volumes. GPUs are a key enabling technology, and we will thus focus on the opportunities for using them for computing, data analytics, and visualisation. GPU-accelerated servers based on POWER processors are of particular interest here due to the tight integration of CPU and GPU using NVLink and the enhanced data transport capabilities.
We present our experience of running computationally intensive camera-based perception algorithms on NVIDIA GPUs. Geometric (depth) and semantic (classification) information is fused in the form of semantic stixels, which provide a rich and compact representation of the traffic scene. We present some strategies to reduce the computational complexity of the algorithms. Using synthetic data generated by the SYNTHIA tool, including slanted roads from a simulation of the city of San Francisco, we evaluate latencies and frame rates on a DRIVE PX 2-based platform.
Learn how one of the leading institutes for global weather predictions, the European Centre for Medium-Range Weather Forecasts (ECMWF), is preparing for exascale supercomputing and the efficient use of future HPC computing hardware. I will name the main reasons why it is difficult to design efficient weather and climate models and provide an overview on the ongoing community effort to achieve the best possible model performance on existing and future HPC architectures. I will present the EU H2020 projects ESCAPE and ESiWACE and discuss recent approaches to increase computing performance in weather and climate modelling such as the use of reduced numerical precision and deep learning.
NVIDIA DGX Systems powered by Volta deliver breakthrough performance for today's most popular deep learning frameworks. Attend this session to hear from DGX product experts and gain insights that will help researchers, developers, and data science practitioners accelerate training and iterate faster than ever. Learn (1) best practices for deploying an end-to-end deep learning practice, (2) how the newest DGX systems, including DGX Station, address the bottlenecks impacting your data science, and (3) how DGX software, including optimized deep learning frameworks, gives your environment a performance advantage over GPU hardware alone.
The WCHG and BDI at the University of Oxford have an established research computing platform for genomics, statistical genetics, and structural biology research. I will outline how we are developing this platform to include a significant GPU infrastructure to support our researchers' great wave of enthusiasm for exploring the potential of deep learning and AI for life sciences research. We are deploying a mixture of GPU architectures and deep learning AI frameworks, and I will report on our current plans and the initial areas of research in the life sciences that show promise for AI.