Unlike typical network adapters, Mellanox SmartNICs maximize the performance and agility of modern data centers without sacrificing efficiency. Mellanox ConnectX and BlueField SmartNICs offer state-of-the-art intelligent hardware offloads that accelerate a variety of Cloud workloads including AI/ML, HPC, Big Data, 5G core and edge services, and Cloud computing. In this session, learn how Mellanox SmartNICs together with NVIDIA GPUs push the envelope of Cloud Datacenter innovation to achieve ultimate performance, agility and efficiency.
We'll demonstrate how to build a scalable, high-performance, data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, low-latency network interconnects for both InfiniBand and Ethernet. We'll present state-of-the-art techniques for distributed machine learning and explain the special requirements they impose on the system. There will be an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and the in-network computing platform used to accelerate large-scale deployments in HPC and artificial intelligence.
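As a concrete illustration of the data-parallel pattern these interconnect technologies accelerate, here is a minimal sketch (not taken from the session) of multi-GPU training with PyTorch DistributedDataParallel over the NCCL backend, which can use RDMA and GPUDirect transports when the fabric supports them; the model, sizes, and launch details are illustrative.

    # Minimal sketch: data-parallel training with PyTorch DDP over NCCL.
    # Launch with e.g.: torchrun --nproc_per_node=8 train.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="nccl")          # rank/world size come from the launcher
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(1024, 10).cuda(local_rank)
        model = DDP(model, device_ids=[local_rank])      # gradients allreduced via NCCL
        opt = torch.optim.SGD(model.parameters(), lr=0.01)

        for _ in range(10):
            x = torch.randn(64, 1024, device=local_rank)
            y = torch.randint(0, 10, (64,), device=local_rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()                              # allreduce overlaps with the backward pass
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

The allreduce traffic generated during the backward pass is precisely the communication that RDMA, GPUDirect, and in-network reduction are designed to offload.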
Exploring the Best Server for AI
Speakers: Samuel D. Matzek, Sr. Software Engineer; Maria Ward, IBM Accelerated Server Offering Manager
Explore the server at the heart of the Summit and Sierra supercomputers, and the best server for AI. We will discuss the technical details that set this server apart and why they matter for your machine learning and deep learning workloads.

IBM Cloud for AI at Scale
Speaker: Alex Hudak, IBM Cloud Offering Manager
AI is rapidly changing the modern enterprise with new applications that are resource-demanding but provide new capabilities to drive insight from customer data. IBM Cloud is partnering with NVIDIA to provide a world-class, customized cloud environment to meet the needs of these new applications. Learn about the wide range of NVIDIA GPU solutions in the IBM Cloud virtual and bare metal server portfolio, and how customers are using them across deep learning, analytics, HPC workloads, and more.

IBM Spectrum LSF Family Overview & GPU Support
Speaker: Larry Adams, Global Architect - Cross Sector, Developer, Consultant, IBM Systems

How to Fuel the Data Pipeline
Speaker: Kent Koeninger, IBM

IBM Storage Reference Architecture for AI with Autonomous Driving
Speaker: Kent Koeninger, IBM
For more than a decade, GE has partnered with NVIDIA in healthcare to power our most advanced modality equipment, from CT to ultrasound. Part 1 of this session will offer an introduction to the deep learning efforts at GEHC, the platform we're building on top of NGC to accelerate new algorithm development, and then a deep dive into a case study of the evolution of our cardiovascular ultrasound scanner and the underlying extensible software stack. It will contain three main parts: (a) Cardiovascular ultrasound imaging from a user perspective: which problems we need to solve for our customers, and the global impact of cardiovascular disease. (b) An introduction to the Vivid E95 and the cSound platform, GPU-based real-time image reconstruction and visualization: how GPU performance can be translated into customer value and outcomes, and how this has evolved the platform during the last two and a half years. (c) The role of deep learning in cardiovascular ultrasound imaging: how we are integrating deep learning inference into our imaging system, and preliminary results from automatic cardiac view detection.
NVSwitch on the DGX-2 is a super crossbar switch that greatly increases application performance in several ways. First, it increases the problem-size capacity, traditionally limited by a single GPU's memory, to the aggregate DGX-2 GPU memory of 512 GB. Second, the NUMA effects of traditional multi-GPU servers are greatly reduced, so memory bandwidth grows with the number of GPUs. Finally, porting is simplified: apps written for a smaller number of GPUs can now be moved more easily thanks to the large memory space.
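For illustration, here is a minimal sketch (assuming PyTorch on any multi-GPU node, not DGX-2-specific code) of spreading data across all visible GPUs and copying a shard directly between devices; on NVSwitch systems such device-to-device copies traverse the NVLink fabric rather than host memory.

    # Illustrative sketch: shard a large dataset across all visible GPUs and
    # perform a direct device-to-device copy of one shard.
    import torch

    num_gpus = torch.cuda.device_count()
    chunk_elems = 64 * 1024 * 1024   # ~256 MB of float32 per GPU; scale to available memory

    # Each GPU holds one shard of the "big" dataset.
    shards = [torch.randn(chunk_elems, device=f"cuda:{i}") for i in range(num_gpus)]

    # Move a shard from GPU 1 to GPU 0; PyTorch can use a direct peer copy
    # when peer access is available, without staging through the host.
    if num_gpus > 1:
        shard_on_gpu0 = shards[1].to("cuda:0", non_blocking=True)
        torch.cuda.synchronize()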
OpenMP has a 20-year history in HPC and has been used by NERSC developers for node-level parallelization on several generations of NERSC flagship systems. Recent versions of the OpenMP specification include features that enable accelerator programming generally, and GPU programming in particular. Given the extensive use of OpenMP on previous NERSC systems, and the GPU-based node architecture of NERSC-9, we expect OpenMP to be important in helping users migrate applications to NERSC-9. In this talk we'll give an overview of the current usage of OpenMP at NERSC, describe some of the new features we think will be important to NERSC-9 users, and give a high-level overview of a collaboration between NERSC and NVIDIA to enable OpenMP for GPUs in the PGI Fortran, C, and C++ compilers.
The next big step in data science combines the ease of use of common Python APIs with the power and scalability of GPU compute. The RAPIDS project is the first step in giving data scientists the ability to use familiar APIs and abstractions for data science while taking advantage of GPU-accelerated hardware commonly found in HPC centers. This session discusses RAPIDS, how to get started, and our roadmap for accelerating more of the data science ecosystem.
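As a flavor of the familiar, pandas-like API, here is a small hedged example using RAPIDS cuDF; the CSV file and column names are hypothetical.

    # Hedged example of the pandas-like cuDF API from RAPIDS.
    import cudf

    df = cudf.read_csv("taxi_rides.csv")              # load directly into GPU memory (hypothetical file)
    df["tip_pct"] = 100.0 * df["tip"] / df["fare"]    # elementwise arithmetic runs on the GPU
    by_hour = df.groupby("pickup_hour")["tip_pct"].mean()
    print(by_hour.sort_index().to_pandas())           # bring the small result back to the host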
Exciting advances in technology have propelled AI computing to the forefront of mainstream applications. The desire to drive advanced visualization with photorealistic real-time rendering, and efficient exascale-class high-performance computing fed with huge-scale data collection, have driven development of the key elements needed to build the most advanced AI computational engines. While these engines, connected with advanced high-speed buses like NVLink, now provide truly scalable AI computation within single systems, the challenge of breaking out of the box with large-scale AI is upon us. In this talk we will discuss insights gained from creating NVIDIA's SATURNV AI supercomputer, enabling efficient use of this new class of dense AI computational engines, and keys to optimizing data centers for GPU multi-node computing targeted at today's neural network and HPC workloads.
It is common knowledge that GPUs can dramatically accelerate HPC and machine learning/AI workloads, but can they do the same for general-purpose analytics? In this talk, Todd Mostak, CEO of MapD, will provide real-world examples of how a new generation of GPU-powered analytics platforms can enable enterprises from a range of verticals to dramatically accelerate insight generation at scale. In particular, he will focus on how the key technical differentiators of GPUs (massive computational bandwidth, fast memory, and a native rendering pipeline) make them uniquely suited to allowing analysts and data scientists to query, visualize, and power machine learning over large, often high-velocity datasets. Using the open-source MapD analytics platform as an example, Todd will detail the technical approaches his team took to leverage the full parallelism of GPUs and demo how the platform allows analysts to interactively explore datasets containing tens of billions of records.
A key driver for pushing high-performance computing is the enablement of new research. One of the biggest and most exciting scientific challenges requiring high-performance computing is decoding the human brain. Many of the research topics in this field require scalable compute resources or the use of advanced data analytics methods (including deep learning) for processing extreme-scale data volumes. GPUs are a key enabling technology, and we will thus focus on the opportunities for using them for computing, data analytics and visualisation. GPU-accelerated servers based on POWER processors are of particular interest here due to the tight integration of CPU and GPU using NVLink and the enhanced data transport capabilities.
Attendees will learn how the behavior of the human brain is simulated using current computers, and the different challenges the implementation has to deal with. We cover the main steps of the simulation and the methodologies behind it. In particular, we highlight and focus on the transformations and optimizations carried out to achieve good performance on NVIDIA GPUs.
NAMD and VMD provide state-of-the-art molecular simulation, analysis, and visualization tools that leverage a panoply of GPU acceleration technologies to achieve performance levels that enable scientists to routinely apply research methods that were formerly too computationally demanding to be practical. To make state-of-the-art MD simulation and computational microscopy workflows available to a broader range of molecular scientists including non-traditional users of HPC systems, our center has begun producing pre-configured container images and Amazon EC2 AMIs that streamline deployment, particularly for specialized occasional-use workflows, e.g., for refinement of atomic structures obtained through cryo-electron microscopy. This talk will describe the latest technological advances in NAMD and VMD, using CUDA, OpenACC, and OptiX, including early results on ORNL Summit, state-of-the-art RTX hardware ray tracing on Turing GPUs, and easy deployment using containers and cloud computing infrastructure.
NVIDIA offers several containerized applications in HPC, visualization, and deep learning. We have also enabled a broad array of container-related technologies for GPUs, with upstreamed improvements to community projects and with tools that are seeing broad interest and adoption. Furthermore, NVIDIA is acting as a catalyst for the broader community in enumerating key technical challenges for developers, admins, and end users, and is helping to identify gaps and drive them to closure. This talk describes NVIDIA's new developments and upcoming efforts. It outlines progress in the most important technical areas, including multi-node containers, security, and scheduling frameworks. It highlights the breadth and depth of interactions across the HPC community that are making the latest, high-quality HPC applications available on platforms that include GPUs.
Containers simplify application deployments in data centers by wrapping applications in an isolated virtual environment. By including all application dependencies, such as binaries and libraries, application containers run seamlessly in any data center environment. The HPC application containers available on NVIDIA GPU Cloud (NGC) dramatically improve ease of application deployment while delivering optimized performance. However, if the desired application is not available in the NGC registry, building HPC containers from scratch trades one set of challenges for another. Parts of the software environment typically provided by the HPC data center must be redeployed inside the container. For those used to just loading the relevant environment modules, installing a compiler, MPI library, CUDA, and other core HPC components from scratch may be daunting. HPC Container Maker (HPCCM) is an open-source project that addresses the challenges of creating HPC application containers. Scott McMillan will present how HPCCM makes it easier to create HPC application containers by separating the choice of what should go into a container image from the low-level details of the container specification, and will cover best practices to minimize container development effort, minimize image size, and take advantage of image layering.
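A minimal sketch of what an HPCCM recipe can look like, assuming the building blocks shown are available in the installed version; the base image, versions, and paths are illustrative. The recipe is processed with the hpccm command-line tool (e.g., hpccm --recipe recipe.py --format docker) to generate a Dockerfile or Singularity definition file.

    # Sketch of an HPCCM recipe (Python); Stage0 and the building blocks are
    # provided by the hpccm tool when it processes the recipe.
    Stage0 += baseimage(image='nvcr.io/nvidia/cuda:10.1-devel-ubuntu18.04')
    Stage0 += gnu()                                     # GNU compiler toolchain
    Stage0 += mlnx_ofed()                               # Mellanox OFED user-space libraries
    Stage0 += openmpi(cuda=True, infiniband=True, version='3.1.4')
    Stage0 += copy(src='app/', dest='/opt/app')         # add the (hypothetical) application source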
Neural networks (NNs) have capitalized on recent advances in HPC, GPUs, GPGPU computing, and the rising amount of publicly available labeled data. In doing so, NNs have revolutionized, and will continue to revolutionize, virtually every current application domain, as well as enable novel ones such as recognition-based, autonomous, predictive, resilient, self-managed, adaptive, and evolving applications.
Nevertheless, NN training is resource-intensive in data, time, and energy, which makes the resulting trained models valuable assets and intellectual property that is imperative to protect.
Furthermore, in the wake of edge computing, NNs are progressively deployed across decentralized landscapes; as a consequence, IP owners are very protective of their NN-based software products.
In this session, we propose to leverage Fully Homomorphic Encryption (FHE) to simultaneously protect the IP of trained NN-based software as well as the input and output data.
Within the context of a smart city scenario, we outline our NN model-agnostic approach, which approximates and decomposes the NN operations into linearized transformations while employing SIMD-style packing for vectorization.
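To illustrate the kind of linearization involved (this is a generic sketch, not the speakers' implementation), a non-linear activation can be replaced by a low-degree polynomial so that a layer uses only additions and multiplications, the operations an FHE scheme evaluates natively; the coefficients below are an illustrative least-squares fit.

    # Illustrative sketch: an FHE-friendly neuron built from additions and
    # multiplications only, using a polynomial approximation of the sigmoid.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_poly(x):
        # Degree-3 approximation of sigmoid on roughly [-8, 8]; coefficients are illustrative.
        return 0.5 + 0.197 * x - 0.004 * x**3

    x = np.linspace(-4, 4, 9)          # toy input vector
    w = np.random.randn(9)             # toy weights
    z = np.dot(w, x)                   # linear part: only multiply-adds
    print(sigmoid(z), sigmoid_poly(z)) # compare exact vs. FHE-friendly activation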
For your business projects, you want to rely on solid partners to master development and deployment. How do you avoid the nightmare of cost overruns or missed deadlines? How do you benefit from industrialized solutions rather than demos fresh out of the lab?
In this session, you will learn how Atos, with a proven set of products and services, helps you accelerate your projects in HPC, enterprise and Internet of Things domains, from cloud to on-premises, from central to edge while leveraging the most powerful NVIDIA technologies.
Because AI applications and models rely on secure, reliable and up-to-date data, this session will also introduce how Atos manages, updates and secures data, and will conclude with a presentation of operational applications in the domains of image recognition, video intelligence, prescriptive maintenance and cybersecurity.
The University of Queensland needed to solve problems at a scale that had never been contemplated before. Enormous challenges in scientific research imaging, modeling and analysis on the path to curing diseases such as Alzheimer's, and increasingly demanding cases in machine vision for digital skin cancer pathology, were all mounting up against traditional HPC infrastructure. UQ took a considered leap towards GPUs. This is UQ's architectural journey: how it built one of the most successful supercomputing facilities the state had ever created, the ways in which key components and architectural choices play a pivotal role in artificial intelligence and inference-solving performance, and a "whole of system" balance approach to getting HPC "right" in the era of the GPU. A presentation for C-level, AI practitioner and HPC professional attendees alike, this talk will provide something useful and refreshing for all who attend.
In the time it takes to read this abstract, someone could solve a detective puzzle if only they had enough quantitative evidence with which to prove their suspicions; equally, one could use visualisation, AI and computational tools to seek a new cure for cancer or predict how to prevent hospitalisations. This presentation will demonstrate visual analytics techniques that use various mixed reality approaches to link simulations and AI with collaborative, complex and interactive data exploration, placing the human in the loop. In recent years, thanks to advances in graphics hardware and compute power (especially GPGPU and modern Big Data / HPC infrastructures), the opportunities are immense, especially in improving our understanding of complex models that represent real or hybrid worlds. Use cases will be drawn from ongoing research at CSIRO Data61 and the Expanded Perception and Interaction Centre (EPICentre) at UNSW using world-class GPU clusters and high-end visualisation capabilities. Highlights will include Defence projects, massive graph visualisation and medicine.
Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and the special requirements they impose on the system, followed by an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing used to accelerate large-scale deployments in HPC and artificial intelligence.
Recently, machine learning leaped into the computing mainstream, and ML is now advancing across all enterprise applications. GPU usage models are penetrating new industries, and advanced servers with GPUs will take deep learning to new performance levels that augment artificial intelligence. New server architecture innovations will drive higher levels of performance in ML applications. As GPUs become more powerful, GPU networks will need to become more efficient as well. Supermicro has advanced the state of the art in GPU-optimized server architectures, perfect for emerging deep learning applications. Hear the latest in GPU server architectures and deep learning customer case studies showing how customers achieved incredible deep learning results with Supermicro solutions.
AI is revolutionizing the $10T transportation industry. Every vehicle will be autonomous: cars, trucks, taxis, buses and shuttles. AI is core to enabling autonomous driving, but AI is also being applied to mobility, logistics, connected vehicles, the connected factory, customer experience and a myriad of other use cases in automotive. Come learn from experts at Audi, BMW and VW about how they are applying data ingestion, labeling, discovery and exploration to develop trained AI models, with significant reductions in development time thanks to GPU-accelerated computing infrastructures.
Come learn about Google Cloud solutions with NVIDIA GPUs. We will show why Google Cloud is the best choice to run your NVIDIA instances. You will learn how Google's fundamental principles around infrastructure, data intelligence, and openness help provide the best services for your HPC and ML deployments. In addition, we'll announce exciting details on our new NVIDIA GPU offerings. It is a talk that technical leaders, developers, data scientists, or anyone with a Cloud and GPU interest will not want to miss!
Zenotech Ltd is a UK-based company developing the latest in Computational Fluid Dynamics solvers and cloud-based HPC systems. Its Computational Fluid Dynamics solver (zCFD) has been engineered to take full advantage of the latest developments in GPU technology. This talk will present the performance advantages that Zenotech sees when using GPUs and the impact this has on its customers. It will showcase industrial problems that have been solved in a fast, cost-effective manner using a combination of zCFD and the P100 and V100 GPUs available on AWS. Traditionally these cases are run on in-house parallel computing clusters, but the larger number of GPUs per node on AWS has enabled the solving of large CFD problems on a single instance. Benchmarking with zCFD demonstrates that a single P3 node provides the equivalent performance of over 1,100 CPU cores. As well as performance benefits, the spot market and on-demand nature of AWS provide real cost savings for Zenotech's customers and open up a scale of simulation that was previously not affordable. The session will present real-world examples from Zenotech's customers in the aerospace, renewable energy and automotive sectors. The session will also show how Zenotech's EPIC platform makes the combination of zCFD and AWS GPUs a simple and cost-effective solution for engineers.
Learn how one of the leading institutes for global weather prediction, the European Centre for Medium-Range Weather Forecasts (ECMWF), is preparing for exascale supercomputing and the efficient use of future HPC hardware. I will outline the main reasons why it is difficult to design efficient weather and climate models and provide an overview of the ongoing community effort to achieve the best possible model performance on existing and future HPC architectures. I will present the EU H2020 projects ESCAPE and ESiWACE and discuss recent approaches to increase computing performance in weather and climate modelling, such as the use of reduced numerical precision and deep learning.
Migrating and building solutions in the cloud can be challenging, expensive and less performant than on-premises deployments. Oracle Cloud Infrastructure (OCI) has been working with NVIDIA on giving you the on-premises performance you need with the cloud benefits and flexibility you expect. In this session we'll discuss how you can take big data and analytics workloads, database workloads, or traditional enterprise HPC workloads that require multiple components along with a portfolio of accelerated hardware, and not only migrate them to the cloud but run them successfully. We'll discuss solution architectures, showcase demos and benchmarks, and take you through the cloud migration journey. We'll detail the latest instances that OCI provides, along with cloud-scale services.
Do you have a GPU cluster or air-gapped environment that you are responsible for but don't have an HPC background? NVIDIA DGX POD is a new way of thinking about AI infrastructure, combining DGX servers with networking and storage to accelerate AI workflow deployment and time to insight. We'll discuss lessons learned about building, deploying, and managing AI infrastructure at scale, from design to deployment to management and monitoring. We will show how the DGX POD management software (DeepOps), along with our storage partners' reference architectures, can be used for the deployment and management of multi-node GPU clusters for deep learning and HPC environments in an on-premises, optionally air-gapped datacenter. The modular nature of the software also allows experienced administrators to pick and choose items that may be useful, making the process compatible with their existing software or infrastructure.
Whether it's for AI, data science and analytics, or HPC, GPU-accelerated software can make possible the previously impossible. But it's well known that these cutting-edge software tools are often complex to use, hard to manage, and difficult to deploy. We'll explain how NGC solves these problems and gives users a head start on their projects by simplifying the use of GPU-optimized software. NVIDIA product management and engineering experts will walk through the latest enhancements to NGC and give examples of how software from NGC can improve GPU-accelerated workflows.
Learn how you can scale your Deep Learning & traditional HPC-based workloads in Azure using powerful NVIDIA Tesla-based GPUs and scale out using Azure's low-latency networking backed by InfiniBand infrastructure. This is a great session to learn about Azure's accelerated offerings and roadmap in the future. This session will cover specific announcements on what's to come in both hardware and software. This is a session you don't want to miss!
Learn how you can integrate GPU-enabled HPC and cloud computing by building on recent container technologies and integration. This presentation will highlight the efforts we are making as part of the EU Horizon 2020 project CloudLightning, where we look at how to integrate heterogeneous computing with cloud technologies.
Learn why Scyld Cloud Workstation, a browser-based, high-quality, low-bandwidth, 3D-accelerated desktop, can greatly streamline cloud HPC workflows. Penguin's Scyld Cloud Workstation provides significant time savings by eliminating the need for downloading large data files when using the cloud for HPC workflows. For example, a typical manufacturing engineer using a Scyld Cloud Workstation can run real-time, interactive GUI tools and 3D visualizations in the cloud, removing the traditional barrier of moving large files down to on-premises workstations. Additionally, since there is no browser plug-in or application installation needed, the difficulty of security changes is eliminated, allowing for easy integration with industry security policies.
The Pascal generation of GPUs is bringing increased compute density to data centers, and NVLink on IBM POWER8 CPUs makes this compute density even more accessible to HPC applications. However, reduced memory-to-compute ratios present some unique challenges for the cost of throughput-oriented compute. We'll present a case study of moving production Monte Carlo GPU codes to IBM's "Minsky" S822LC servers with NVIDIA Tesla P100 GPUs.
VDI users across multiple industries can now harness the power of the world's most advanced virtual workstation to enable increasingly demanding workflows. This session brings together graphics virtualization thought leaders and experts from across the globe who have deep knowledge of NVIDIA virtual GPU architecture and years of experience implementing VDI across multiple hypervisors. Panelists will discuss how they transformed organizations, including how they leveraged multi-GPU support to boost GPU horsepower for photorealistic rendering and data-intensive simulation, and how they stood up GPU-accelerated deep learning or HPC VDI environments with ease using NGC containers.
Discussion and demonstration of the potential of running HPC and VDI workloads on common clusters in a modern datacenter: a Dr. Jekyll and Mr. Hyde scenario. Explore the coexistence of CUDA-based HPC or deep learning job engines alongside both Linux and Windows machines used for virtual desktop infrastructure. The demonstration will focus on a minimal VMware vSphere cluster deployment using VSAN storage, or a Red Hat RHVM cluster deployment, hosting both a Linux multi-node HPC cluster for CUDA workloads and a VMware Horizon View or Citrix XenDesktop deployment for Linux and Windows virtual desktops performing DirectX, OpenGL, OpenCL, and CUDA-based visualization workloads as used by engineering and analysis power users.
What if you could combine VDI, HPC, deep learning and AI all together on one platform with VMware vSphere 6.7 and NVIDIA virtual GPU (vGPU) technology? In this session, we'll guide you through how to set up a uniform, well-performing platform. We will cover the virtualisation of HPC, the sharing of compute resources with VDI, the implementation of mixed workloads leveraging NVIDIA vGPU technology, and automation of the platform. If you want to have fun at work while preparing for the future, don't miss this N3RD session!
With the latest release of NVIDIA vGPU software, the world's most powerful virtual workstation gets even more powerful. Learn how our latest enhancements enable your data center to be more agile and to scale to meet the needs of thousands, tens of thousands, and even hundreds of thousands of users. The newest release of NVIDIA virtual GPU software adds support for more powerful VMs, which can be managed from the cloud, from the on-premises data center, or from a private cloud. With support for live migration of GPU-enabled VMs, IT can truly deliver high availability and a quality user experience. IT can further ensure they get the most out of their investments with the ability to re-purpose the same infrastructure that runs VDI during the day to run HPC and other compute workloads at night. In this session, we will unveil the new features of NVIDIA vGPU solutions and demonstrate how GPU virtualization enables you to easily support the most demanding users and scale virtualized, digital workspaces on an agile and flexible infrastructure, from the cloud as well as the on-premises data center.
Enabling access to NVIDIA compute acceleration is a key component of VMware's approach to enabling HPC and ML workloads on vSphere. In this talk we will discuss the various available options, provide performance results, and share hardware and software tips and guidance to help you meet the needs of your organization's data scientists and researchers.
Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frameworks from an architectural/performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training. We'll highlight challenges for message passing interface runtimes to efficiently support DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.
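The core communication step in such data-parallel training is a gradient allreduce across ranks. A hedged sketch of the pattern using mpi4py and NumPy follows; MVAPICH2 and OSU-Caffe implement heavily optimized, GPU-aware variants of this same operation.

    # Hedged sketch of the gradient-averaging step in data-parallel DNN training.
    # Run with e.g.: mpirun -np 4 python allreduce_grads.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()

    local_grad = np.random.randn(1_000_000).astype(np.float32)  # this rank's gradients (illustrative)
    global_grad = np.empty_like(local_grad)

    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)  # sum gradients across all ranks
    global_grad /= size                                  # average before the weight update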
We'll describe how Lawrence Livermore National Laboratory (LLNL) prepared a large existing application base for our recently deployed Sierra supercomputer, which is designed to harness over 17,000 NVIDIA V100 GPUs to tackle the nation's most challenging science and national security problems. We will discuss how this multi-year effort paid off with exciting possibilities for new science. We'll also outline how using GPUs in our traditional HPC platforms and workflows is adding an exciting new dimension to simulation-based science, prompting LLNL to rethink how we perform future simulations as intelligent simulation. We'll give an overview of the application preparation process that led up to Sierra's deployment, as well as a look at current and planned research aimed at riding the AI and machine learning waves in pursuit of game-changing science.
Modern-day enablement of AI has been achieved through the acceleration of deep learning by GPUs; now we are entering the realm of ever more complex deep learning tasks involving complicated algorithms, deeper and more sophisticated network layers, and rapidly growing data sets, for which a handful of GPUs are proving insufficient. By designing and building large-scale HPC machines with extensive vector/tensor processing capabilities based on GPUs, such as Tsubame3, ABCI, and Post-K, as well as designing new scalable learning algorithms, we are overcoming such challenges. In particular, the ABCI grand challenge has enabled three research groups, including ours at Tokyo Tech, to scale ImageNet training to over 4,000 GPUs with training times measured in minutes. This paves the way for a new era of "scalable AI", just as traditional HPC has been scalable.
For job allocation decisions, current batch schedulers have access to, and use, only information on the number of nodes and the runtime, because it is readily available at submission time from user job scripts. User-provided runtimes are typically inaccurate because users overestimate or lack understanding of job resource requirements. Beyond the number of nodes and runtime, other system resources, including IO and network, are not available to the scheduler but play a key role in system performance. In this talk we tackle the need for automatic, general, and scalable tools that provide accurate resource usage information to schedulers with our tool for Predicting Runtime and IO using Neural Networks and GPUs (PRIONN). PRIONN automates prediction of per-job runtime and IO resource usage, enabling IO-aware scheduling on HPC systems. The novelty of our tool is that whole job scripts are fed into deep learning models, allowing complete automation of runtime and IO resource predictions.
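A purely illustrative sketch of the general idea of regressing resource usage from raw job scripts (not the PRIONN architecture): encode the script text into a fixed-size feature vector and train a small PyTorch model to predict runtime. The feature encoding and model are hypothetical.

    # Illustrative sketch: predict job runtime from the job-script text.
    import torch
    import torch.nn as nn

    def script_features(text: str, dim: int = 128) -> torch.Tensor:
        vec = torch.zeros(dim)
        for ch in text:
            vec[ord(ch) % dim] += 1.0          # crude bag-of-characters encoding
        return vec / max(len(text), 1)

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train_step(scripts, runtimes):
        # scripts: list of job-script strings; runtimes: measured runtimes in seconds.
        x = torch.stack([script_features(s) for s in scripts])
        y = torch.tensor(runtimes, dtype=torch.float32).unsqueeze(1)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()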
The talk will focus on the latest developments in MVAPICH2-GDR MPI library that helps HPC and Deep Learning applications to exploit maximum performance and scalability on GPU clusters. Multiple designs focusing on GPUDirect RDMA (GDR), Managed and Unified memory support, datatype processing, and support for OpenPOWER and NVLink will be highlighted for HPC applications. We will also present novel designs and enhancements to the MPI library to boost performance and scalability of Deep Learning frameworks on GPU clusters. Container-based solutions for GPU-based cloud environment will also be highlighted.
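A hedged sketch of the CUDA-aware MPI usage such a library enables: GPU-resident CuPy arrays are passed directly to MPI calls, so a GDR-capable MPI (such as MVAPICH2-GDR) can move the data over GPUDirect RDMA without staging through host memory. This assumes a CUDA-aware MPI build and a recent mpi4py that recognizes GPU buffers.

    # Hedged sketch: allreduce directly on GPU buffers with a CUDA-aware MPI.
    import cupy as cp
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    sendbuf = cp.full(1_000_000, rank, dtype=cp.float32)   # lives in GPU memory
    recvbuf = cp.empty_like(sendbuf)

    comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)           # MPI operates on the device buffers
    cp.cuda.Stream.null.synchronize()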
AI methods and tools are starting to be applied to HPC applications by a growing number of brave researchers in diverse scientific fields. This talk will describe an emergent workflow that uses traditional HPC numeric simulations to generate the labeled data sets required to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow on HPC applications and system architecture in a post-Moore's Law world considered.
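A toy, hedged sketch of that workflow: simulation outputs serve as labeled data, a surrogate model is trained on them, and the surrogate then predicts results that would otherwise require a full simulation run. The simulation stand-in and model choice are illustrative.

    # Illustrative sketch: train an ML surrogate on labeled data generated by a simulation.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    def run_simulation(params: np.ndarray) -> float:
        # Stand-in for an expensive numeric simulation mapping parameters to a result.
        return float(np.sin(params[0]) * np.exp(-params[1]) + 0.1 * params[2])

    X = np.random.rand(2000, 3)                      # sampled input parameters
    y = np.array([run_simulation(p) for p in X])     # "ground truth" labels from simulation

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    surrogate = GradientBoostingRegressor().fit(X_tr, y_tr)
    print("surrogate R^2 on held-out simulations:", surrogate.score(X_te, y_te))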
The Perlmutter machine will be delivered to NERSC/LBNL in 2020 and will contain a mixture of CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk we will describe the analysis we performed in order to optimize this design to meet the needs of the broad NERSC workload. We will also discuss our application readiness program, the NERSC Exascale Science Applications Program (NESAP), in which we will work with our users to optimize their applications in order to maximize their performance on the GPUs in Perlmutter.
Rapid progress in atmospheric science has been fueled in part over the years by faster computers. However, progress has slowed over the last decade due to three factors: the plateauing of core speeds, the increasing complexity of atmospheric models, and the mushrooming of data volumes. Our team at the National Center for Atmospheric Research is pursuing a hybrid approach to surmounting these barriers that combines machine learning techniques and GPU-acceleration to produce, we hope, a new generation of ultra-fast models of enhanced fidelity with nature and increased value to society.
The recent success of deep learning has been driven by the ability to combine significant GPU resources with extremely large labeled datasets. However, many labels are extremely expensive to obtain, and for some, such as a specific astronomical event or scientific experiment, it is impossible to obtain more than one example. By combining vast amounts of labeled surrogate data with advanced few-shot learning, we have demonstrated success in leveraging small data in deep learning. In this talk, we will discuss these exciting results and explore the scientific innovations that made this possible.
Pacific Northwest National Laboratory's scientific mission spans energy and molecular science to national security. Under the Deep Learning for Scientific Discovery Initiative, PNNL has invested in integrating advanced machine learning with traditional scientific methods to push the state of the art in many disciplines. We will provide an overview of some of the thirty projects we have stewarded, demonstrating how we have leveraged computing and analytics in fields as diverse as ultrasensitive detection, metabolomics, and atmospheric science.
We have developed an HPC ML training algorithm that can reduce training time on petabytes of data from days and weeks to minutes. Using the same research, we can now conduct inferencing on completely encrypted data. We have built a distributed ML framework on commodity Azure VMs that scales to tens of terabytes and thousands of cores, while achieving better accuracy than the state of the art.
PSC's "Bridges" was the first system to successfully converge HPC, AI, and Big Data. Designed for the U.S. national research community and supported by NSF, it now serves approximately 1600 projects and 7500 users at over 350 institutions. Bridges emphasizes "nontraditional" uses that span the life, physical, and social sciences, engineering, and business, many of which are based on AI or AI-enabled simulation. We describe the characteristics of Bridges that have made it a success, and we highlight several inspirational results and how they benefited from the system architecture. We then introduce "Bridges AI", a powerful new addition for balanced AI capability and capacity that includes NVIDIA's DGX-2 and HPE NVLink-connected 8-way Volta servers.
Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs, in combination with third-party devices. We'll introduce the fundamental concepts behind GPUDirect and present the latest developments, such as changes to pre-existing APIs and the newly introduced APIs. We'll also discuss the expected performance in combination with the new computing platforms that emerged last year.
This session presents an overview of the hardware and software architecture of the DGX-2 platform. This talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA developers will learn ways to utilize the full GPU connectivity to quickly build complex applications and utilize the high-bandwidth NVLink connections to scale up performance.
Do you need to compute larger or faster than a single GPU allows? Then come to this session and learn how to scale your application to multiple GPUs. You will learn how to use the different available multi-GPU programming models and their individual advantages. All programming models will be introduced using the same example, applying a domain decomposition strategy.
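As a flavor of the pattern, here is a minimal single-process sketch of a domain decomposition across GPUs using CuPy; the grid size, stencil, and boundary handling are illustrative. The grid is split into one slab per GPU, and halo rows are exchanged between devices each iteration.

    # Illustrative sketch: 1-D domain decomposition across GPUs with halo exchange.
    import cupy as cp

    ngpu = cp.cuda.runtime.getDeviceCount()
    rows_per_gpu = 1024

    # Allocate one slab (plus one halo row on each side) on each GPU.
    slabs = []
    for d in range(ngpu):
        with cp.cuda.Device(d):
            slabs.append(cp.zeros((rows_per_gpu + 2, 1024), dtype=cp.float64))

    for _ in range(100):
        # Exchange halo rows between neighbouring GPUs (device-to-device copies).
        for d in range(ngpu - 1):
            with cp.cuda.Device(d + 1):
                slabs[d + 1][0] = cp.asarray(slabs[d][-2])    # lower neighbour's top interior row
            with cp.cuda.Device(d):
                slabs[d][-1] = cp.asarray(slabs[d + 1][1])    # upper neighbour's bottom interior row
        # Jacobi-style update of each slab's interior using the fresh halos.
        for d in range(ngpu):
            with cp.cuda.Device(d):
                s = slabs[d]
                s[1:-1] = 0.5 * (s[:-2] + s[2:])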
Murex has been an early adopter of GPUs for the pricing and risk management of complex financial options. GPU adoption has boosted the performance of its software while reducing its usage cost. Each new generation of GPU has also shown the importance of reshaping the architecture of the software that uses its GPU-accelerated analytics. Minsky, featuring far better GPU memory bandwidth and GPU-CPU interconnect, raises the bar even further. Murex will show how it has handled this new challenge for its business.
HPE Deep Learning solutions empower innovation at any scale, building on our purpose-built HPC systems and technologies, solutions, applications and support services. Deep learning demands massive amounts of computational power, usually involving heterogeneous computation resources, e.g., GPUs and InfiniBand, as installed on HPE Apollo systems. NovuMind's NovuForce system, leveraging state-of-the-art technologies, makes the deployment and configuration procedure fast and smooth. The NovuForce deep learning software within the Docker image has been optimized for the latest technology, such as NVIDIA Pascal GPUs and InfiniBand GPUDirect RDMA. This flexibility of the software, combined with the broad range of GPU servers in the HPE portfolio, makes for one of the most efficient and scalable solutions.
Discover how we designed and optimized a highly scalable dense solver for Maxwell's equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it with programmability concerns in mind. We show how we solved the challenge of tightly combining tasks with MPI, and illustrate how this scaled up to 50,000 CPU cores, reaching 1.38 petaflops. We then focus on the integration of GPUs in this model, along with a few implementation tricks to ensure truly asynchronous execution. Finally, after briefly detailing how we added hierarchical compression techniques to our distributed solver on CPUs, we describe how we plan to address the challenges that have so far prevented porting it to GPUs.
We leverage NVIDIA GPUs for connected-components labeling and image classification applied to Digital Rock Physics (DRP), to help characterize reservoir rocks and study their pore distributions. We show in this talk how NVIDIA GPUs helped us satisfy strict real-time restrictions dictated by the imaging hardware used to scan the rock samples. We present a detailed description of the workflow from a DRP perspective, our algorithm and optimization techniques, and performance results on the latest NVIDIA GPU generations.
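A hedged sketch of GPU connected-components labeling with CuPy's cupyx.scipy.ndimage.label follows; the synthetic volume and threshold are illustrative, and this is not the production pipeline described in the talk.

    # Illustrative sketch: connected-components labeling of pores on the GPU.
    import cupy as cp
    from cupyx.scipy import ndimage as ndi

    volume = cp.random.random((256, 256, 256))   # stand-in for a scanned rock volume
    pores = volume > 0.7                          # binary mask of candidate pore voxels (illustrative threshold)
    labels, num_pores = ndi.label(pores)          # connected-components labeling on the GPU
    sizes = cp.bincount(labels.ravel())[1:]       # voxel count per pore (skip background label 0)
    print(int(num_pores), cp.asnumpy(sizes[:10]))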
In order to prepare the scientific communities, GENCI and its partners have set up a technology watch group and are leading collaborations with vendors, relying on HPC experts and early-adopted HPC solutions. The two main objectives are to provide guidance and to prepare the scientific communities for the challenges of exascale architectures. The talk will present the OpenPOWER platform purchased by GENCI and provided to the scientific community. It will then present the first results obtained on the platform for a set of about 15 applications using all the solutions provided to users (CUDA, OpenACC, OpenMP, ...). Finally, one specific application will be presented in detail, covering its porting effort and the techniques used for GPUs with both OpenACC and OpenMP.
Wireless VR is widely regarded as the key to maximum immersion. But why? Is it only the obvious reason of omitting the heavy and inflexible cable? There is more behind it. Learn how the development of tracking technology goes hand in hand with the increasing demand for wireless VR hardware solutions, what hardware is on the market now, what is coming, and how wireless solutions, whether standalone devices or add-ons, can create higher value for your VR application. Learn how large-scale location-based VR and hardware manufacturers are expanding the boundaries of the VR industry, both for entertainment and B2B.
With over 5,000 GPU-accelerated nodes, Piz Daint has been Europe's leading supercomputing system since 2013 and is currently one of the most performant and energy-efficient supercomputers on the planet. It has been designed to optimize the throughput of multiple applications, covering all aspects of the workflow, including data analysis and visualisation. We will discuss ongoing efforts to further integrate these extreme-scale compute and data services with infrastructure services of the cloud. As a Tier-0 system of PRACE, Piz Daint is accessible to all scientists in Europe and worldwide. It provides a baseline for the future development of exascale computing. We will present a strategy for developing exascale computing technologies in domains such as weather and climate or materials science.
The presentation will give an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhanced features like NVLink 2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like Independent Thread Scheduling and the Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest CUDA 9 release.