GTC ON-DEMAND

Topic(s) Filter: HPC and AI, HPC and Supercomputing, Life & Material Science
Abstract:
Researchers at Purdue University use GPUs to accelerate modeling and simulation and to support the university's data science initiative. In this presentation, Executive Director for Research Computing Preston Smith will discuss how GPU technology supports Purdue researchers taking the next Giant Leap in HPC and data science.
 
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1902
 
Abstract:
The NVIDIA Deep Learning Institute (DLI) offers hands-on training in AI and accelerated computing to solve real-world problems. Developers, data scientists, researchers, and students can get practical experience powered by GPUs in the cloud and earn a certificate of competency to support professional growth. DLI offers self-paced, online training for individuals, instructor-led workshops for teams, and downloadable course materials for university educators. The DLI University Ambassador Program enables qualified educators to teach DLI workshops at university campuses and academic conferences to faculty, students, and researchers at no cost, complementing the traditional theoretical approaches to university education in machine learning, data science, AI, and parallel computing.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1903
 
Abstract:
The largest supercomputers in the world are now designed with AI in mind, and enterprise and AI research systems are being designed more like supercomputers. Scaling is a core part of modern AI frameworks and methodologies. Designing and running a compute infrastructure that provides maximum performance with constantly changing software stacks, at a limited cost for scale-out, is a challenge. While single-GPU and single-machine performance is now easy to attain, larger systems on the order of 1,500+ GPUs require attention to a large number of potential bottlenecks, both at the design stage and in everyday operations. These issues are usually invisible to users and software developers and therefore have to be handled as transparently and efficiently as possible: interconnect design, filesystem performance, power balancing, thermal control, job scheduling, management software, and software versatility all have to be addressed in a stable, reliable, yet high-performance environment. In this talk, we will showcase the DGX SuperPOD as an example of rapid time-to-floor for an AI performance infrastructure, and show how modern AI frameworks and models are built to leverage these systems.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1906
 
Abstract:
In modeling chemical spaces for drug design, catalysis, etc., one is always struggling with the fact that very accurate calculations, based on quantum chemistry, have a very high computational cost. Cheaper approximations to true quantum chemistry are less accurate, so they are less useful. We show that a deep learning algorithm, running on NVIDIA GPUs, can be trained on a large database of quantum energies for small molecules, and that the resulting networks are at once highly accurate and extremely fast, with speedups versus quantum chemistry of up to 10^8. This breakthrough opens the possibility of modeling processes such as drug binding and reaction modeling at a cost and accuracy previously thought impossible to achieve.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1905
 
Abstract:
On the road to exascale supercomputers, new challenges need to be solved. This requires new system architectures, moving from homogeneous CPU-centered systems to heterogeneous systems. New processing engines such as GPUs enable greater processing power, and at the same time the network becomes a more important part of the system. With better co-design, the network is required to perform smarter and more complex operations beyond traditional data movement. In this talk we will present how BlueField smart networking devices can change the boundaries between CPU and network and between software and hardware to enable greater scalability and performance for your supercomputer.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1904
 
Abstract:
The IceCube Neutrino Observatory is the National Science Foundation's (NSF's) premier facility to detect neutrinos with energies above approximately 10 GeV and a pillar of NSF's Multi-Messenger Astrophysics (MMA) program, one of NSF's 10 Big Ideas. The detector is located at the geographic South Pole and is designed to detect interactions of neutrinos of astrophysical origin by instrumenting over a gigaton of polar ice with 5,160 optical sensors. The sensors are buried between 1,450 and 2,450 meters below the surface of the South Pole ice sheet. To understand the impact of ice properties on the detection of incoming neutrinos and their origin, photon-propagation simulations are run on GPUs. We report on a few-hour GPU burst across Amazon Web Services, Microsoft Azure, and Google Cloud Platform that harvested all GPUs available for sale across the three cloud providers the weekend before SC19. The GPU types span the full range of generations, from the NVIDIA GRID K520 to the most modern NVIDIA T4 and V100. We report the scale and science performance achieved across all the various GPU types, as well as the science motivation for doing so.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1907
 
Abstract:
Merlin is a workflow framework that enables orchestrating multi-machine, multi-batch, large-scale science simulation ensembles with in-situ postprocessing, deep learning at scale using LBANN, surrogate-model-driven sampling, and data exploration. This session describes how Merlin was used to combine hydrodynamics simulations for inertial confinement fusion (ICF) with LBANN for training and iterative feedback.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1908
 
Abstract:
Learn the best way to introduce Tensor Core acceleration in HPC applications, starting with a quick introduction to Tensor Core architecture and functionality. This session will also present case studies of HPC applications using Tensor Cores.
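The key numerical property behind Tensor Cores, FP16 inputs with FP32 accumulation, can be illustrated without a GPU. A minimal NumPy sketch (a CPU-emulated illustration only; real Tensor Core code would go through cuBLAS or WMMA intrinsics) compares a dot product of FP16 data accumulated in FP32 against one accumulated entirely in FP16:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4096).astype(np.float16)  # positive values, so rounding loss accumulates
y = rng.random(4096).astype(np.float16)

# Exact reference: FP16 values are represented exactly in FP64.
ref = np.dot(x.astype(np.float64), y.astype(np.float64))

# Tensor-Core style: FP16 products are exact in FP32; accumulation is FP32.
tc = np.dot(x.astype(np.float32), y.astype(np.float32))

# Classical FP16: every product and every partial sum rounded to half precision.
acc = np.float16(0.0)
for a, b in zip(x, y):
    acc = np.float16(acc + a * b)

err_tc = abs(tc - ref) / ref
err_fp16 = abs(float(acc) - ref) / ref
assert err_tc < err_fp16  # FP32 accumulation is far more accurate
```

Because the running FP16 sum grows past the point where its spacing exceeds the individual terms, the FP16-accumulated result loses accuracy badly, while the FP32-accumulated one stays close to the reference.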
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1909
 
Abstract:
Supercomputing is playing a key role in our efforts to understand complex biological systems. To date we have performed calculations on the Summit supercomputer at OLCF with two different algorithms, achieving 2.41 exaflops and 2.32 exaflops of mixed-precision performance. The larger of these calculations required 22 zetta floating-point operations. The cost of generating biological data is dropping exponentially, resulting in a growth in data that has far outstripped the predicted growth in computational power from Moore's Law. This flood of data has opened a new era of systems biology in which there are unprecedented opportunities to gain insights into complex biological systems. Integrated biological models need to capture the higher-order complexity of the interactions among cellular components. Solving such complex combinatorial problems will give us extraordinary levels of understanding of biological systems. Paradoxically, understanding higher-order sets of relationships among biological objects leads to a combinatorial explosion in the search space of biological data. These exponentially increasing volumes of data, combined with the desire to model more and more sophisticated sets of relationships within a cell, across an organism, and up to ecosystems and, in fact, climatological scales, have led to a need for computational resources and sophisticated algorithms that can make use of such datasets. The result is a comprehensive systems biology model of an organism and how it has adapted and responds to its abiotic and biotic environment, with applications in bioenergy, precision agriculture, and ecosystem studies, among other disciplines.
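As a quick plausibility check on these figures (assuming the 22 zetta-operation count corresponds to the 2.41-exaflops run), the implied wall time is about two and a half hours:

```python
ops = 22e21        # 22 zetta floating-point operations
rate = 2.41e18     # 2.41 exaflops, i.e. operations per second
hours = ops / rate / 3600
assert 2.5 < hours < 2.6   # roughly 2.5 hours of sustained mixed-precision compute
```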
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1910
 
Abstract:
A new and expanded GPU port of the Vienna Ab initio Simulation Package (VASP) for atomic-scale materials modeling is now available. VASP is one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. The blocked Davidson algorithm (including exact exchange) and RMM-DIIS for the real-space projection scheme were previously ported to CUDA C with good speedups on GPUs, but also with an increase in the maintenance workload, because VASP is otherwise written entirely in Fortran. The new approach, using OpenACC directives combined with calls to NVIDIA libraries, allowed us to extend GPU acceleration to the reciprocal projection scheme and to direct minimizers with higher productivity and increased maintainability, because we can focus on a unified code base for the CPU and GPU versions of VASP. It also allowed us to offer first-day GPU acceleration for features newly introduced into VASP, including adaptively compressed exchange (ACE) and a double-buffering implementation for hybrid DFT calculations. The performance relative to the CUDA port and the CPU version will be discussed, as will strong scaling with multiple GPUs. The performance we are able to deliver and the vastly decreased maintenance effort have led the VASP group to adopt OpenACC as the programming model for all future GPU porting of VASP.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1911
 
Abstract:
We'll share early experiences using the PGI Fortran and C++ compilers on Arm and Rome processor-based systems with V100 GPUs. The PGI compilers support GPU acceleration using OpenACC and CUDA. Support for mapping standard Fortran and C++ parallelism to GPUs and CPUs is coming soon. We'll give an update on these PGI compiler features and more that simplify your on-ramp to GPU computing and maximize portability of your GPU-accelerated applications across all major CPU families.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1912
 
Abstract:
The announcement of CUDA support in the Arm ecosystem opens a new path toward energy-efficient, AI-enabled supercomputing. NVIDIA and Arm are working together to provide the complete software stack for AI and HPC, and Arm's compilers and tools for HPC developers will be available on this platform. We will demonstrate how to migrate applications to Arm + CUDA, including tuning and optimizing the results with Arm Allinea Studio.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1913
 
Abstract:
Research is evolving to be even more data-centric. AI is driving this change and is increasingly enabling breakthroughs. For analyzing large data, AI is helping to spot correlations and identify anomalies. For simulation and modeling, AI is reducing time to solution by orders of magnitude by replacing expensive computation with fast inferencing. This talk describes two unique platforms at the Pittsburgh Supercomputing Center that combine AI and HPC, at no cost for research and education. Bridges-AI, available today as an AI-focused extension to the Bridges supercomputer, features an NVIDIA DGX-2 and HPE Apollo 6500 servers, with 88 Volta GPUs in total. Bridges-2 will build on Bridges and Bridges-AI to serve the AI and AI-enabled simulation of tomorrow. To illustrate the systems' impact, we will detail use cases in genomics and medical imaging, weather forecasting, agricultural sustainability, and other fields. Learn what's possible, how to get access, and about opportunities for collaboration.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1914
 
Abstract:
We are working with NVIDIA to lower the barrier to scientific understanding by improving the communication tools that scientists have access to. NVIDIA has been working with Kitware not only to bring NVIDIA RTX support to ParaView, but also to allow ParaView users to access Omniverse. Come see how advancements in ParaView will unlock the next generation of visualization communication and collaboration techniques for your science.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1915
 
Abstract:
Come and learn about the wide range of developer tools that enable you to harness even more power from NVIDIA GPUs, and about the latest new capabilities.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1916
 
Abstract:
Discover how Open OnDemand (OOD) can help lower the barrier to entry and ease access to computing resources for both new and existing users of HPC, big data, and analytics. OOD is an NSF-funded open-source HPC portal whose goal is to provide an easy way for system administrators to offer web access to their HPC resources; it is in use at dozens of HPC centers. This presentation will touch upon the capabilities and architecture of OOD, installation experiences, priorities for upcoming features such as customized workflows, training users, integration with other science gateways, and growing the community. Special emphasis will be placed on the joint efforts between NVIDIA engineers and the OOD project team to provide GPU-specific metrics, accessibility, and workflows that facilitate utilization of GPUs in HPC environments.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1917
 
Abstract:
Connectomics is the study of how neurons are connected to each other in the brain. The neural connectivity of a brain can be thought of as a circuit; connectomics aims to discover and understand this circuit. Brain tissue must be imaged at very high levels of resolution to capture the relevant cellular structures. At a characteristic resolution of 4x4x40nm, a cubic millimeter of tissue consists of 1.56 x 10^15 voxels (3D pixels). Therefore, connectomics requires the gathering and processing of vast amounts of data. A volume of brain tissue is typically processed by slicing it and imaging the slices with an electron microscope. After this, its connectivity can be reconstructed by determining which voxels belong to the same neuron and detecting which neurons make synapses onto each other. At the Seung Lab, we use convolutional neural networks, amongst other algorithms, to perform these tasks. Our largest reconstruction thus far is a petabyte-scale dataset, which we processed by distributing work across thousands of nodes and GPUs in the cloud. This process took approximately 1.5 months and resulted in a reconstruction with on the order of 10^5 neurons and 10^9 synapses. Manuel will present an overview of the computational reconstruction pipeline, with an emphasis on the use of distributed computing and storage.
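The voxel count follows directly from the stated resolution, and it is consistent with the petabyte-scale claim (assuming, for illustration, one byte of raw image data per voxel):

```python
nm_per_mm = 1_000_000

# Voxels per cubic millimeter at 4 x 4 x 40 nm resolution.
voxels = (nm_per_mm // 4) * (nm_per_mm // 4) * (nm_per_mm // 40)
assert voxels == 1_562_500_000_000_000   # 1.56 x 10^15, as stated above

# At one byte per voxel this is ~1.6 PB, matching "petabyte-scale".
petabytes = voxels / 1e15
assert 1.5 < petabytes < 1.6
```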

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1918
 
Abstract:
The Perlmutter machine will be delivered to NERSC/LBNL in 2020 and will contain a mixture of CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk we will describe the analysis we performed in order to optimize this design to meet the needs of the broad NERSC workload. We will also discuss early results from our application readiness program, the NERSC Exascale Science Applications Program (NESAP), where we are working with our users to optimize their applications in order to maximize their performance on the GPUs in Perlmutter.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1919
 
Abstract:
Many have speculated that combining GPU computational power with machine learning algorithms could radically improve weather and climate modeling. This talk will discuss an experimental project centered on the Model for Prediction Across Scales-Atmosphere (MPAS-A) to evaluate this program's prospects of success. Initially, the project set out to determine whether CPU-GPU performance portability could be attained in a single MPAS-A source code by applying OpenACC directives. The initial porting project is nearing completion and is showing scalability and throughput performance competitive with other state-of-the-art models. At the same time, machine learning scientists at NCAR and elsewhere began looking at the piecemeal replacement of atmospheric parameterizations with machine-learning emulators. This talk will present results from efforts at NCAR to apply machine learning to emulate the atmospheric surface layer and cloud microphysics parameterizations. The talk will also discuss related efforts to tackle radiative transport and other physics components, and will conclude with our own future plans to emulate the complex chemistry of aerosol formation.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1920
 
Abstract:
Reservoir simulation is an important component in the recovery of oil and gas from subsurface reservoirs. Reservoir engineers use simulators to understand geology, quantify uncertainty, and then optimize production strategy. Typically, simulations run for hours to days, and many simulations are needed to generate reliable forecasts of oil and gas production; as such, performance is key. INTERSECT is a highly advanced reservoir simulator and a mature product that has been widely deployed by a large number of clients worldwide. In this presentation we discuss how we accelerated INTERSECT with GPUs based on NVIDIA's AmgX library. We discuss the challenges of integrating GPUs into an existing source code and the various design choices and trade-offs that were made along the way. Furthermore, we compare the performance of INTERSECT with and without GPU acceleration.
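The runtime of such simulators is dominated by large sparse linear solves, which AmgX accelerates on GPUs with multigrid-preconditioned Krylov methods. As a rough stand-in (illustrative only, not INTERSECT's solver), here is an unpreconditioned conjugate gradient sketch in NumPy on a small symmetric positive-definite test system:

```python
import numpy as np

def cg(A, b, tol=1e-10, maxiter=500):
    # Plain conjugate gradients; AmgX adds multigrid preconditioning on GPUs.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * bnorm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 64
# 1D Laplacian: an SPD stand-in for a discretized pressure system.
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = cg(A, b)
assert np.linalg.norm(A @ x - b) < 1e-8
```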
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1921
 
Abstract:
GPUDirect Storage is a new technology that enables a direct data path between storage devices and the GPU. Eliminating unnecessary memory copies through the CPU boosts bandwidth, lowers latency, and reduces CPU and GPU overhead. It is the easiest way to scale performance when IO to the GPU is a bottleneck. In this talk we'll explain the technology and its benefits, and walk through end-to-end use cases. We will also introduce distributed filesystem partners supporting GPUDirect Storage.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1922
 
Abstract:
At the brink of exascale, it's clear that massive parallelism at the node level is the path forward. Scientists and engineers need highly productive programming environments to speed their time to discovery on today's HPC systems. In addition to the requirements this puts on compilers and software development tools, researchers must shore up their skills in parallel and accelerated computing in order to be ready for the exascale era. Join Jack Wells, Director of Science at the ORNL Leadership Computing Facility and Vice President of the OpenACC organization, as he discusses plans to help the HPC developer community take advantage of today's fastest supercomputers and prepare for exascale through hands-on training and education in state-of-the-art programming techniques in 2020 and beyond. Jack will give an overview of how the OpenACC organization's mission is expanding to meet these needs, building on its philosophy of a user-driven OpenACC specification to create a bridge to heterogeneous programming using parallel features in standard C++ and Fortran.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1923
 
Abstract:
NGC is a container registry of GPU-optimized software for AI frameworks, HPC applications, and scientific visualization tools that eliminates complex application installations and provides easy access to the latest versions of the software. Simply pull and run the applications with Docker or Singularity. We will discuss the expansion of NGC with new containers, support for Arm, pre-trained AI models, and deployment tools that simplify the use of NGC offerings in HPC environments.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1924
 
Abstract:
Julia, through its abstractions, makes it possible to reuse code at a high level across CPUs and GPUs. We will demonstrate, through simple examples, how Julia enables general-purpose programming of GPUs.
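A loose analogy in Python (not Julia's actual mechanism, which is multiple dispatch over array types): one generic function body can serve CPU arrays and, unchanged, GPU arrays such as CuPy's, because it only relies on the array operators:

```python
import numpy as np

def saxpy(a, x, y):
    # Generic: works for any array type implementing * and +,
    # e.g. numpy.ndarray on CPU or cupy.ndarray on GPU.
    return a * x + y

x = np.arange(3.0)       # [0., 1., 2.]
y = np.ones(3)
out = saxpy(2.0, x, y)   # [1., 3., 5.]
assert out.tolist() == [1.0, 3.0, 5.0]
```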
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1925
 
Abstract:
We will present a campaign to investigate the use of supersonic retropropulsion as a means to land payloads on Mars large enough to enable human exploration. Simulations are performed on the world's largest supercomputer, Summit, located at Oak Ridge National Laboratory. The engineering and computational challenges associated with retropropulsion aerodynamics, and the need for large-scale resources like Summit, are reviewed. For these simulations, a GPU implementation of NASA Langley Research Center's FUN3D flow solver is used. The development history, performance, and scalability are compared with those on contemporary HPC architectures. The use of an optimized GPU-accelerated CFD solver on Summit has enabled simulations well beyond conventional computing paradigms.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1926
 
Abstract:
The Weather Company (TWC), an IBM Business, runs the world's first weather prediction system of its kind, IBM GRAF (Global High-Resolution Atmospheric Forecast System), driven by MPAS and providing global, rapidly updating weather forecasts at a very high resolution. This presentation will describe how a cluster of IBM AC922 servers with NVIDIA Volta V100 GPUs delivers weather forecasts that run globally, with a unique ability to predict hyper-local events such as thunderstorms, updated in minutes rather than hours. In order to exploit the capabilities of the high-performance cluster, MPAS was ported to run very efficiently on hundreds of interconnected CPUs and GPUs. Further, the presentation will show how output derived from the weather prediction system is used to aid business decision-making in areas such as commercial aviation, energy, and agriculture.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1927
 
Abstract:
The National Energy Technology Laboratory (NETL) has been exploring the use of TensorFlow (TF) for general scientific and engineering computations within high-performance computing (HPC) environments, which might include machine learning (ML). For instance, NETL recently developed a novel stiff-chemistry solver implemented in TF and achieved a ~300x speedup over serial LSODA and a ~35x speedup over parallel LSODA. Further, NETL recently developed a TF-based single-phase fluid solver and achieved a ~3.1x improvement over 40 ranks of MPI on CPU (benchmarking results will be presented at DOE's theater in a related talk). Researchers at NETL have found that TF is an easy-to-use tensor algebra package that supports the highest-performing hardware in the world, runs efficiently, and is easy to interface with existing HPC software. This talk will reveal the recently developed methodology NETL is using to accelerate computational fluid dynamics (CFD) and add machine learning. NETL will discuss how to set up a computation workflow, several gotcha issues and how to deal with them, how to integrate ML into the workflow, how to run a TF graph from an existing application, and how to call an existing application from within a TF graph.
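The flavor of this approach, expressing a solver update as whole-array tensor algebra rather than per-cell loops, can be sketched with NumPy on a toy decay equation (illustrative only, not NETL's actual solver; TF executes the same style of expression as a graph on GPUs):

```python
import numpy as np

# Toy "chemistry": y' = -k*y in every cell, advanced with explicit Euler.
k = np.linspace(0.1, 1.0, 100_000)   # one rate constant per cell
y = np.ones_like(k)
dt = 0.01
steps = 100
for _ in range(steps):
    y = y + dt * (-k * y)            # one vectorized update for all cells at once

# Repeated multiplication by (1 - dt*k) has this closed form.
assert np.allclose(y, (1.0 - dt * k) ** steps)
```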
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1928
 
Abstract:
AI techniques have been around for more than five decades, but only in the 2000s have we seen neural networks find commercial use and machine learning techniques start surpassing traditional methods in image recognition, natural language processing, and other tasks. Probably the most important piece enabling AI was the use of GPUs to train models, enabling great speedups compared to CPUs. Running distributed machine learning on a large number of GPUs requires movement of a large amount of data between the GPUs, or between GPUs and the parameter server, imposing a heavy load on the interconnect, which now becomes the new bottleneck. Creating an efficient system for distributed machine learning requires not only the best processing engines and the latest GPU model, but also an efficient high-performance interconnect technology to enable efficient utilization of the GPUs and near-linear scaling. Mellanox focuses on CPU offload technologies designed to process data as it moves through the network, either by the host channel adapter or by the switch. This frees up CPU and GPU cycles for computation, reduces the amount of data transferred over the network, allows for efficient pipelining of network and computation, and provides very low communication latencies and overheads. We will present the special requirements imposed on the interconnect by distributed machine learning applications, and describe the latest interconnect technologies allowing efficient data transfer and processing.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1929
Streaming:
Download:
Share:
 
Abstract:
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem Ax=b, where A is a large dense matrix and a double-precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16 and FP64) iterative refinement, and we generalize and extend prior advances into a framework for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4x speedup. This is due to the performance boost that the FP16-TC provide, as well as to the improved accuracy over classical FP16 arithmetic that is obtained because the GEMM accumulation occurs in FP32 arithmetic.
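The mixed-precision iterative-refinement idea can be sketched in a few lines: solve the system in a lower precision, then correct the solution using residuals computed in FP64. The NumPy sketch below uses FP32 as the low precision (NumPy's LAPACK solvers do not run in FP16, and there are no Tensor Cores involved), so it is a CPU analogue of the method, not the authors' implementation; the matrix size and conditioning are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned FP64 system
b = rng.standard_normal(n)

# Low-precision "factorization": solve with FP32 copies of A and b.
A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

# Iterative refinement: residuals in FP64 drive x toward FP64 accuracy
# even though every solve happens in FP32.
for _ in range(3):
    r = b - A @ x                                   # residual in FP64
    d = np.linalg.solve(A32, r.astype(np.float32))  # correction in FP32
    x += d.astype(np.float64)

print(np.linalg.norm(b - A @ x) / np.linalg.norm(b) < 1e-10)  # FP64-level residual
```

On a Tensor Core system, the low-precision solve is where FP16-TC arithmetic would be applied, while the refinement loop restores double-precision accuracy.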
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1930
Streaming:
Share:
 
Abstract:
Over the past year, numerous updates to the CUDA platform have been released for libraries, language, and system software. These target a range of diverse features, from mixed-precision solvers to scalable programming models to memory management to applications of ray tracing in numerical methods. This talk will present a tour of all that's new and how to take advantage of it.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1931
Streaming:
Download:
Share:
 
Abstract:
The U.S. Department of Energy's Exascale Computing Project (ECP) established two co-design centers focused on data sciences: ExaGraph and ExaLearn. This talk will provide a brief overview of recent accomplishments by PNNL teams supported by these two co-design centers. The ExaGraph team designed and developed a scalable hybrid CPU-GPU influence maximization algorithm called CuRipples, and has collected performance measurements on three different state-of-the-art multi-GPU systems. This talk will also present recent ExaLearn team results from the application of convolutional neural networks to study clusters of water molecules as a graph structure.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1932
Streaming:
Share:
 
Abstract:
We introduce cuTENSOR, a high-performance CUDA library for tensor operations that efficiently handles the ubiquitous presence of high-dimensional arrays (i.e., tensors) in today's HPC and DL workloads. This library supports highly efficient tensor operations such as tensor contractions (a generalization of matrix-matrix multiplications), element-wise tensor operations such as tensor permutations, and tensor reductions. While providing high performance, cuTENSOR also allows users to express their mathematical equations for tensors in a straightforward way that hides the complexity of dealing with these high-dimensional objects behind an easy-to-use API.
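A tensor contraction of the kind cuTENSOR accelerates generalizes matrix multiplication by summing over shared indices of higher-dimensional arrays. The snippet below expresses one such contraction with NumPy's einsum as a CPU analogue; the index names and shapes are illustrative, and cuTENSOR's actual C API looks quite different from this.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 5, 6))    # indices m, k, n
B = rng.standard_normal((5, 7))       # indices k, p

# Contract over the shared index k: C[m, n, p] = sum_k A[m, k, n] * B[k, p].
C = np.einsum("mkn,kp->mnp", A, B)
print(C.shape)                        # (4, 6, 7)

# The same contraction spelled out as a reshaped matrix multiplication,
# which is how contractions reduce to GEMM under the hood.
C_ref = (A.transpose(0, 2, 1).reshape(24, 5) @ B).reshape(4, 6, 7)
print(np.allclose(C, C_ref))          # True
```

The einsum subscript string plays the role of cuTENSOR's mode labels: it names which axes are contracted and which survive into the output.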
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1933
Download:
Share:
 
Abstract:
The neural network architecture (e.g., number of layers, types of layers, connections between layers) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever-increasing accuracy on benchmark datasets at the cost of increased computational demands. In this talk, we demonstrate that neural network architectures can be automatically generated and tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL, an HPC-enabled software stack for neural architecture search, we generate a neural network with accuracy comparable to state-of-the-art networks on a cancer pathology dataset that is also 16x faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.
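The dual-objective search described here can be caricatured in a few lines: sample candidate architectures, score each on an accuracy proxy and a latency proxy, and keep the Pareto-optimal ones. Everything in the sketch below (the depth/width encoding and both scoring functions) is a toy stand-in for illustration; MENNDL's actual evolutionary search over real trained networks is far more sophisticated.

```python
import random

random.seed(42)

def accuracy_proxy(depth, width):
    # Toy model: deeper/wider helps accuracy with diminishing returns.
    return 1.0 - 1.0 / (1.0 + 0.3 * depth + 0.05 * width)

def latency_proxy(depth, width):
    # Toy model: inference cost grows with depth and width.
    return depth * width

candidates = [(random.randint(1, 20), random.choice([16, 32, 64, 128]))
              for _ in range(50)]
scored = [(accuracy_proxy(d, w), latency_proxy(d, w), (d, w))
          for d, w in candidates]

# Keep candidates not dominated by any other (something strictly better
# on one objective and at least as good on the other dominates).
pareto = [s for s in scored
          if not any((o[0] >= s[0] and o[1] < s[1]) or
                     (o[0] > s[0] and o[1] <= s[1]) for o in scored)]

for acc, cost, arch in sorted(pareto, key=lambda s: s[1]):
    print(f"depth={arch[0]:2d} width={arch[1]:3d} acc~{acc:.3f} cost={cost}")
```

The Pareto front is the output a dual-objective search hands back to the user: each surviving architecture trades some accuracy for some inference speed.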
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1934
Streaming:
Share:
 
Abstract:
Providing an overview of the operational status of the GPUs and including it in the decisions of the job scheduler is useful for giving users optimal job placement. It is also beneficial for administrators to grasp the usage of GPU resources when planning resource allocation. The Information Technology Center (ITC) at The University of Tokyo, Hewlett Packard Enterprise (HPE), and Altair Engineering, Inc. set up GPU monitoring using NVIDIA Data Center GPU Manager (DCGM) and have deployed it on the Reedbush system.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1935
Streaming:
Download:
Share:
 
Abstract:
The parallel algorithms that were introduced in C++17 were designed to support GPU parallel programming. We have implemented these parallel algorithms in the PGI C++ compiler for NVIDIA GPUs, making it possible in some cases to run standard C++ on GPUs with no directives, pragmas, or annotations, and with performance similar to other GPU programming models. We will share our experiences and performance results, and explain the capabilities of the PGI implementation.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1936
Streaming:
Download:
Share:
 
Abstract:
Big data pipelines are emerging as a common approach to scientific data analysis, in which large amounts of data are put through multiple stages of processing. In many cases, these pipelines can take advantage of hardware accelerators such as GPUs, and they can be run on a variety of systems, ranging from local workstations to HPC clusters and cloud platforms. Here we present a big data pipeline for KINC, a network construction tool from the field of bioinformatics; we demonstrate how it can be run seamlessly on any of the aforementioned systems, and how it can take advantage of a GPU cluster to achieve massive speedup.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1937
Streaming:
Download:
Share:
 
Abstract:
Squeeze the most performance from your machine learning & GPU pods with Ethernet switches that are built to accelerate these workloads. An AI-optimized switch delivers more than just low latency and high packet rates. An AI-optimized switch will have advanced congestion handling for RDMA, easy RoCE configuration, and great workload-specific telemetry functionality, like Mellanox's What Just Happened visibility innovation.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1938
Streaming:
Download:
Share:
 
Abstract:
Reliance Jio has 350 million subscribers; Oracle Cloud and NVIDIA GPUs enable Jio to provide new and innovative ways to interact with their customers. Across multiple products, Reliance uses bare metal GPUs on Oracle Cloud to answer over 300 million utterances and train with over 100 million parameters. With speech-to-text, text-to-speech, and natural language processing, Reliance has reduced their training time by 2.4x. Reliance's Jio Interact uses SparkML for an ensemble model to create the world's first AI-based video call platform. In this session, Oracle will present the framework they have used to enable their customers and demonstrate how Oracle Cloud Infrastructure has decreased time to market and reduced overall ML costs.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1940
Streaming:
Download:
Share:
 
Abstract:
Microsoft and NVIDIA are continuously partnering to provide HPC customers with tools to push the boundaries of innovation while saving time and money. Come learn how recently announced state-of-the-art infrastructure on Azure, powered by large clusters of NVIDIA V100 Tensor Core GPUs connected by InfiniBand, empowers HPC practitioners to scale their HPC runs to hundreds of GPUs, solving problems previously unattainable.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1941
Streaming:
Share:
 
Abstract:
CUDA C++ is an extension of the ISO C++ language which allows you to use familiar C++ tools to write parallel programs that run on GPUs. The depth and breadth of C++ language support has been a major element of CUDA's roadmap, with a number of significant features enabled in the most recent release. In this example-oriented talk, we'll give an in-depth review of the newest and upcoming C++ capabilities, and explain how they can be used to build complex concurrent data structures and enable new classes of applications on modern NVIDIA GPUs.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1942
Streaming:
Download:
Share:
 
Abstract:
Simulation on NVIDIA GPUs on AWS is a game-changer for aerodynamic development in the automotive industry. The team at Altair believes this evolution in development will help to further optimize fuel efficiency and improve the range of electric vehicles, while allowing for flexibility in the choices and changes made by stylists. The resulting computational cost savings are also significant: in this session, Altair and AWS talk about how, by using ultraFluidX on GPUs instead of a CPU-based CFD solver, Volkswagen could save up to 70 percent of its existing hardware cost.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1943
Streaming:
Share:
 
Abstract:
We'll discuss the plan for the human exploration of Mars, which will require the retropropulsive deceleration of payloads. Conventional computational capabilities don't allow for the simulation of interactions between the atmosphere and retropropulsion exhaust plumes at sufficient spatial resolution to resolve governing phenomena with a high level of confidence. Researchers from NASA Langley and Ames Research Centers, NVIDIA Corporation, and Old Dominion University have developed a GPU-accelerated version of Langley's FUN3D flow solver. An ongoing campaign on the Summit supercomputer at Oak Ridge National Lab is using this capability to apply detached eddy simulation methods to retropropulsion in atmospheric environments for nominal operation of a human-scale Mars lander concept. We'll give an overview of the Mars lander fluid dynamics project, the history and details of FUN3D GPU development, and the optimization and performance of the code on emerging HPC architectures.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Washington D.C.
Year:
2019
Session ID:
DC91220
Download:
Share:
 
Abstract:
Demand for GPUs in High Performance Computing is only growing, and it is costly and difficult to keep pace in an entirely on-premise environment. We will hear from Schlumberger on why and how they are utilizing cloud-based GPU-enabled computing resources from Google Cloud to supply their users with the computing power they need, from exploration and modeling to visualization.
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91040
Streaming:
Download:
Share:
 
Abstract:
We will characterize the performance of multi-GPU systems in an effort to determine their viability for running physics-based applications using Fast Fourier Transforms (FFTs). Additionally, we'll discuss how multi-GPU FFTs allow available memory to exceed the limits of a single GPU and how they can reduce computational time for larger problem sizes.
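Multi-GPU FFTs typically rely on the fact that a multidimensional FFT factors into independent 1D FFTs along each axis, so a large transform can be split across devices (slab or pencil decomposition) with a transpose between stages. The NumPy snippet below verifies that factorization on the CPU; it illustrates the decomposition principle only and is not a multi-GPU code.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))

# Full 2D FFT in one call.
full = np.fft.fft2(x)

# Same result as 1D FFTs over rows, then 1D FFTs over columns.
# In a slab decomposition, each device owns a block of rows for stage 1,
# then the data is transposed (an all-to-all exchange) before stage 2.
stage1 = np.fft.fft(x, axis=1)       # row transforms (parallel over rows)
stage2 = np.fft.fft(stage1, axis=0)  # column transforms (parallel over columns)

print(np.allclose(full, stage2))     # True
```

Because each stage's 1D transforms are independent, the only inter-device communication a distributed FFT needs is the transpose between stages, which is also why it stresses interconnect bandwidth.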
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9158
Streaming:
Download:
Share:
 
Abstract:
Learn about the science of magnetically confined plasmas to develop the predictive capability needed for a sustainable fusion energy source. Gyrokinetic simulations are one of the most useful tools for understanding fusion science. We'll explain the CGYRO code, built by researchers at General Atomics to effectively and efficiently simulate plasma evolution over multiple scales that range from electrons to heavy ions. Fusion plasma simulations are compute- and memory-intensive and usually run on leadership-class, GPU-accelerated HPC systems like Oak Ridge National Laboratory's Titan and Summit. We'll explain how we designed and implemented CGYRO to make good use of the tens of thousands of GPUs on such systems, which provide simulations that bring us closer to fusion as an abundant clean energy source. We'll also share benchmarking results of both CPU- and GPU-based systems.
 
Topics:
HPC and AI, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9202
Streaming:
Download:
Share:
 
Abstract:
Learn about advanced features in MVAPICH2 that accelerate HPC and AI on modern dense GPU systems. We'll talk about how MVAPICH2 supports MPI communication from GPU memory and improves it using the CUDA toolkit for optimized performance on different GPU configurations. We'll examine recent advances in MVAPICH2 that support large message collective operations and heterogeneous clusters with GPU and non-GPU nodes. We'll explain how we use the popular OSU micro-benchmark suite, and we'll provide examples from HPC and AI to demonstrate how developers can take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We'll also provide guidance on issues like processor affinity to GPUs and networks that can significantly affect the performance of MPI applications using MVAPICH2.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9476
Streaming:
Download:
Share:
 
Abstract:

Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frameworks from an architectural/performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training. We'll highlight challenges for message passing interface runtimes to efficiently support DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.
 
Topics:
HPC and AI, Deep Learning & AI Frameworks
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9501
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how code teams at Lawrence Livermore National Laboratory (LLNL) are porting our production applications to Sierra, LLNL's flagship NVIDIA GPU-based supercomputer. In general, our codes rely on a three-stage process of investment, porting, and performance tuning to achieve performance on NVIDIA Tesla V100 GPUs, while maintaining portability to our other supported platforms. We'll explain why this process poses many challenges and how LLNL code teams have worked with the Sierra Center of Excellence to build experience and expertise in porting complex multi-physics simulation tools to NVIDIA GPU-based HPC systems. We'll also provide an overview of this porting process, the abstraction technologies employed, lessons learned, and current challenges.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9512
Streaming:
Download:
Share:
 
Abstract:
Tensor Cores, introduced with the Volta GPU architecture, achieve up to 125 TFlops throughput by mixing half- and single-precision floating point operations. We'll show how to take advantage of Tensor Cores for applications in deep learning and HPC. We will discuss how to use mixed precision to decrease memory use during deep learning training and deployment, a technique that allows for larger model sizes. We will also demonstrate programming matrix-multiply-and-accumulate on Tensor Cores for HPC applications. Using NVIDIA Nsight, we will profile an application to understand its use of Tensor Cores.
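The accuracy benefit behind mixed precision (FP16 products accumulated in higher precision, as Tensor Cores do for matrix-multiply-and-accumulate) can be demonstrated numerically: accumulate the same FP16 products once in FP16 and once in FP32, then compare both against an FP64 reference. The NumPy sketch below does this on the CPU; the vector length is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4096
a = rng.standard_normal(n).astype(np.float16)
b = rng.standard_normal(n).astype(np.float16)

# FP16 * FP16 products are exactly representable in FP32,
# so only the accumulation precision differs below.
prods = a.astype(np.float32) * b.astype(np.float32)

acc16 = np.float16(0.0)
for p in prods:                      # running sum rounded to FP16 at every step
    acc16 = np.float16(acc16 + np.float16(p))
acc32 = np.float32(prods.sum(dtype=np.float32))   # running sum kept in FP32

ref = float(np.dot(a.astype(np.float64), b.astype(np.float64)))
err16 = abs(float(acc16) - ref)
err32 = abs(float(acc32) - ref)
print(err32 < err16)                 # FP32 accumulation is more accurate
```

This is the same effect the dense-solver community exploits: FP16 inputs keep the arithmetic fast, while the FP32 accumulator keeps rounding error from compounding across thousands of additions.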
 
Topics:
HPC and AI, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9542
Streaming:
Download:
Share:
 
Abstract:

We'll describe how Lawrence Livermore National Laboratory (LLNL) prepared a large existing application base for our recently deployed Sierra supercomputer, which is designed to harness over 17,000 NVIDIA V100 GPUs to tackle the nation's most challenging science and national security problems. We will discuss how this multi-year effort paid off with exciting possibilities for new science. We'll also outline how using GPUs in our traditional HPC platforms and workflows is adding an exciting new dimension to simulation-based science, prompting LLNL to rethink how we perform future simulations as intelligent simulation. We'll give an overview of the application preparation process that led up to Sierra's deployment, as well as look at current and planned research aimed at riding the AI and machine learning waves in pursuit of game-changing science.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9560
Streaming:
Download:
Share:
 
Abstract:
We'll introduce the fundamental concepts behind NVIDIA GPUDirect and explain how GPUDirect technologies are leveraged to scale out performance. GPUDirect technologies can provide even faster results for compute-intensive workloads, including those running on a new breed of dense, GPU-accelerated servers such as the Summit and Sierra supercomputers and the NVIDIA DGX line of servers.
 
Topics:
HPC and AI, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9653
Streaming:
Download:
Share:
 
Abstract:
We'll present the latest developments in the NCCL library, which provides optimized inter-GPU communication primitives to make distributed computing easy and universal. Since 2015, NCCL has enabled deep learning and HPC applications to scale to thousands of GPUs. We'll also discuss the state of integration of NCCL in deep learning frameworks.
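NCCL's central collective, all-reduce, is commonly implemented as a ring: in each step every rank passes one chunk to its neighbor, first accumulating partial sums (reduce-scatter) and then circulating the finished chunks (all-gather). The pure-Python simulation below walks through that schedule for a handful of "ranks" held in one process; it illustrates the communication pattern only, with no real GPUs or transport involved, and the rank count and vector length are illustrative.

```python
import numpy as np

def ring_allreduce(rank_data):
    """Simulate ring all-reduce over a list of equal-length NumPy vectors."""
    n = len(rank_data)
    # Split each rank's vector into n chunks, one per ring position.
    chunks = [np.array_split(d.astype(np.float64), n) for d in rank_data]

    # Reduce-scatter: pass partial sums around the ring n-1 times, so each
    # rank ends up owning one fully reduced chunk.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n].copy())
                 for r in range(n)]
        for r, c, payload in sends:
            chunks[(r + 1) % n][c] += payload

    # All-gather: circulate each finished chunk around the ring n-1 times.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n].copy())
                 for r in range(n)]
        for r, c, payload in sends:
            chunks[(r + 1) % n][c] = payload

    return [np.concatenate(c) for c in chunks]

rng = np.random.default_rng(4)
data = [rng.standard_normal(8) for _ in range(4)]   # 4 simulated ranks
result = ring_allreduce(data)
expected = np.sum(data, axis=0)
print(all(np.allclose(r, expected) for r in result))  # True
```

The appeal of the ring schedule is that each rank sends roughly 2x the data size in total regardless of rank count, which is why it scales to thousands of GPUs.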
 
Topics:
HPC and AI, Deep Learning & AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9656
Streaming:
Share:
 
Abstract:
We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects, including PCIe, NVLink, and InfiniBand. NVSHMEM makes it possible to initiate communication from within a CUDA kernel. As a result, CUDA kernel boundaries are not forced on an application by its communication requirements. Less synchronization on the CPU helps strong-scaling efficiency. The ability to initiate fine-grained communication from inside a CUDA kernel helps achieve better overlap of communication with computation. QUDA is a popular GPU-enabled QCD library used by several popular packages such as Chroma and MILC. NVSHMEM enables better strong scaling in QUDA. NVSHMEM not only benefits latency-bound applications like QUDA, but can also help improve performance and reduce the complexity of codes like FFT that are bandwidth-bound, and codes like breadth-first search that have a dynamic communication pattern.
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9677
Streaming:
Download:
Share:
 
Abstract:
We'll discuss our work using neural networks to fit the interatomic potential function and describe how we tested the network's potential function in atomic simulation software. This method has lower computational cost than traditional density functional theory methods. We'll show how our work is applicable to different atom types and architectures and how it avoids relying on the physical model. Instead, it uses a purely mathematical representation, which reduces the need for human intervention.
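The fitting approach described here (a neural network regressing an interatomic potential as a pure function of geometry) can be miniaturized to one dimension: train a small MLP to reproduce a Lennard-Jones-style pair energy as a function of interatomic distance. The NumPy sketch below is a toy setup for illustration only; the network size, learning rate, sampling range, and the Lennard-Jones form are all assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(5)

# Target: Lennard-Jones pair energy E(r) = 4 * (r^-12 - r^-6), sampled
# away from the hard-core wall so the values stay well scaled.
r = np.linspace(1.0, 2.5, 200)
E = 4.0 * (r**-12 - r**-6)

# Normalize inputs and outputs for stable training.
x = ((r - r.mean()) / r.std()).reshape(-1, 1)
y = ((E - E.mean()) / E.std()).reshape(-1, 1)

# One-hidden-layer MLP trained by plain full-batch gradient descent on MSE.
W1 = rng.standard_normal((1, 16)) * 0.5
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5
b2 = np.zeros(1)
lr = 0.05

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(x)
loss0 = float(np.mean((pred0 - y) ** 2))

for _ in range(2000):
    h, pred = forward(x)
    g = 2.0 * (pred - y) / len(x)      # dL/dpred
    gW2 = h.T @ g
    gb2 = g.sum(axis=0)
    gh = g @ W2.T * (1.0 - h**2)       # backprop through tanh
    gW1 = x.T @ gh
    gb1 = gh.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(x)
loss = float(np.mean((pred - y) ** 2))
print(f"final MSE on normalized data: {loss:.4f}")
```

A real interatomic-potential network works the same way in principle, but regresses energies (and forces) from many-body descriptors of the local atomic environment rather than a single distance.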
 
Topics:
HPC and AI, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9843
Streaming:
Download:
Share:
 
Abstract:
The conventional trial-and-error development approach to materials science is time-consuming and expensive. More efficient in silico techniques based on simulations or machine learning have emerged during the past two decades. We'll talk about recent trends and solutions for accelerating materials discovery and discuss future prospects.
 
Topics:
HPC and AI, Industrial Inspection
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9967
Streaming:
Download:
Share:
 
Abstract:
SK hynix began developing HBM (High Bandwidth Memory) technology in 2011, when it became evident that memory density and bandwidth scaling are critical for next-generation architectures. HBM is currently widely adopted in various applications, and it will lead future memory trends owing to the growth of AI, ML, and HPC applications. We will give a technical overview of HBM technology and discuss future trends of HBM.
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9978
Streaming:
Download:
Share:
 
Abstract:
We'll talk about how the huge computing advances made in AI by the deep learning revolution of the last five years have pushed legacy hardware to its limits, with the CPU now running workloads it was not tailored for. This comes at a time when Moore's Law is tapering off and single-threaded performance growth is slowing, requiring a new compute paradigm: accelerated computing, powered by massively parallel GPUs.
 
Topics:
HPC and AI, AI & Deep Learning Research
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9981
Streaming:
Download:
Share:
 
Abstract:
We'll discuss the challenges uncovered in AI and deep learning workloads, discuss the most efficient approaches to handling data, and examine use cases in autonomous vehicles, retail, health care, finance, and other markets. Our talk will cover the complete requirements of the data life cycle, including initial acquisition, processing, inference, long-term storage, and driving data back into the field to sustain ever-growing processes of improvement. As the data landscape evolves with emerging requirements, the relationship between compute and data is undergoing a fundamental transition. We will provide examples of data life cycles in production, triggering diverse architectures from turnkey reference systems with DGX and DDN A3I to tailor-made solutions.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9983
Streaming:
Share:
 
Abstract:

Understanding the emergence of nuclear physics from the underlying fundamental theory of strong interactions, quantum chromodynamics (QCD), requires the fastest supercomputers. We will describe the role of QCD in the evolution of our universe and discuss how we use the latest supercomputers, such as Summit at Oak Ridge National Laboratory and Sierra at Lawrence Livermore National Laboratory, to address basic questions such as why the universe contains more matter than antimatter. Looking towards the exascale era, we can dream of tackling more complex questions related to the rate at which protons fuse to helium in the sun and the state of matter in extreme conditions such as neutron stars. We'll explain why making the most of these new computers will require clever software to take advantage of their heterogeneous architectures. We'll also describe some advances in optimized use of GPUs, as well as in management of the complex set of tasks required.

 
Topics:
HPC and Supercomputing, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91010
Streaming:
Download:
Share:
 
Abstract:
Come hear the latest PGI news and learn about what we'll develop in the year ahead. We'll talk about the latest PGI OpenACC Fortran/C++ and CUDA Fortran compilers and tools, which are supported on x64 and OpenPOWER systems with NVIDIA GPUs. We'll discuss new CUDA Fortran features, including Tensor Core support and cooperative groups, and we'll cover our current work on half-precision. We'll explain new OpenACC 2.7 features, along with beta true deep-copy directives and support for OpenACC programs on unified memory systems. The PGI compiler-assisted software testing feature helps determine where differences arise between CPU and GPU versions of a program, or when porting to a new system. Learn about upcoming projects, including a high-performance PGI subset of OpenMP for NVIDIA GPUs, support for GPU programming with standard C++17 parallel STL and Fortran, and incorporation of GPU-accelerated math libraries to support porting and optimization of HPC applications on NVIDIA GPUs.
 
Topics:
HPC and Supercomputing, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9289
Streaming:
Download:
Share:
 
Abstract:

Learn about the opportunities and pitfalls of running billion-atom science at scale on ORNL's Summit, the world's fastest GPU-accelerated supercomputer. We'll talk about the latest performance improvements and scaling results for NAMD, a highly parallel molecular dynamics code and one of the first codes to run on Summit. NAMD performs petascale biomolecular simulations, including a 64-million-atom model of the HIV virus capsid, and previously ran on the GPU-accelerated Cray XK7 Blue Waters and ORNL Titan machines. Summit features IBM POWER9 CPUs, NVIDIA Volta GPUs, and the NVLink CPU-GPU interconnect.

 
Topics:
HPC and Supercomputing, Computational Biology & Chemistry
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9302
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how FPGAs are changing as a result of new technology such as the OpenCL high-level programming language, hard floating-point units, and tight integration with CPU cores. Traditionally, energy-efficient FPGAs were considered notoriously difficult to program and unsuitable for complex HPC applications. We'll compare the latest FPGAs to GPUs, examining their architecture, programming models, programming effort, performance, and energy efficiency by considering some real applications.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9338
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 GPUs (4,560 nodes) on the OLCF Summit HPC system using the high-productivity TensorFlow framework. We discuss how the neural network was tweaked to achieve good performance on the NVIDIA Volta GPUs with Tensor Cores and what further optimizations were necessary to provide excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers. Scalable deep learning is becoming increasingly important as datasets and deep learning models grow and become more complicated. This talk is targeted at deep learning practitioners who are interested in learning what optimizations are necessary for training their models efficiently at massive scale.
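The synchronous data-parallel pattern behind this kind of scaling can be illustrated with a toy example: each worker computes a gradient on its shard of the global batch, the gradients are averaged (the all-reduce step that the communication optimizations target), and every worker applies the same update. A hypothetical NumPy sketch, not the actual Summit/TensorFlow code:

```python
import numpy as np

# Toy synchronous data-parallel SGD step for linear regression.
# Illustrative sketch only; the talk's setup uses TensorFlow at scale.

def local_gradient(w, X, y):
    # Mean-squared-error gradient on one worker's shard of the batch.
    return 2 * X.T @ (X @ w - y) / len(y)

def data_parallel_step(w, X, y, n_workers=4, lr=0.1):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
    g = np.mean(grads, axis=0)   # the "all-reduce": average worker gradients
    return w - lr * g

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
y = X @ np.array([1.0, -2.0, 0.5])      # known true weights, no noise
w = np.zeros(3)
for _ in range(200):
    w = data_parallel_step(w, X, y)
print(w)  # converges toward [1.0, -2.0, 0.5]
```

With equal-sized shards, the averaged gradient equals the full-batch gradient exactly, which is why synchronous data parallelism preserves the single-worker optimization trajectory.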
 
Topics:
HPC and Supercomputing, Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9412
Streaming:
Download:
Share:
 
Abstract:

Learn how to make your irregular algorithm perform on GPUs. We provide insights into our research on a tasking framework for synchronization-critical applications on GPUs. We discuss the requirements that GPU architectures and programming models impose on efficient tasking frameworks. Participants will learn about the pitfalls for tasking arising from the architectural differences between latency-driven CPUs and throughput-driven GPUs. To overcome these pitfalls, we consider programming concepts such as persistent threads, warp-aware data structures, and CUDA asynchronous task graphs. In addition, we look at the latest GPU features, such as forward progress guarantees and grid synchronization, that facilitate the implementation of tasking approaches. A task-based fast multipole method for the molecular dynamics package GROMACS serves as a use case for our considerations.

 
Topics:
HPC and Supercomputing, Computational Biology & Chemistry, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9548
Streaming:
Download:
Share:
 
Abstract:
Large-scale scientific endeavors often focus on improving predictive capabilities by challenging theory-driven simulations with experimental data. We'll describe our work at LLNL using advances in deep learning, computational workflows, and computer architectures to develop an improved, learned predictive model. We'll discuss necessary advances in machine learning architectures and methods to handle the challenges of ICF science, including rich, multimodal data (images, scalars, time series) and strong nonlinearities. These include advances in the scalability of our deep learning toolkit LBANN, an optimized asynchronous, GPU-aware communication library, and state-of-the-art scientific workflows. We'll also show how the combination of high-performance NVLink and the rich GPU architecture of Sierra enables us to train neural networks efficiently and begin to develop learned predictive models based on a massive data set.
 
Topics:
HPC and Supercomputing, Accelerated Data Science, Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9565
Streaming:
Share:
 
Abstract:
We'll showcase the latest successes with GPU acceleration of challenging molecular simulation analysis tasks on the latest Volta and Turing GPUs paired with both Intel and IBM/OpenPOWER CPUs on petascale computers such as ORNL Summit. This presentation will highlight the performance benefits obtained from die-stacked memory, NVLink interconnects, and the use of advanced features of CUDA, such as just-in-time compilation, to increase the performance of key analysis algorithms. We will present results obtained with OpenACC parallel programming directives, as well as discuss current challenges and future opportunities. We'll also describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations. To make our tools easy to deploy for non-traditional users of HPC, we publish GPU-accelerated container images in NGC, as well as Amazon EC2 AMIs for GPU instance types.
 
Topics:
HPC and Supercomputing, In-Situ & Scientific Visualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9594
Streaming:
Download:
Share:
 
Abstract:
We'll share what we learned by running the QUDA open source library for lattice quantum chromodynamics (LQCD) on modern GPU systems, which provide myriad hardware and software techniques to improve strong scaling. We ran QUDA, which provides GPU acceleration for LQCD applications like MILC and Chroma, on six generations of GPUs with various network and node configurations, including IBM POWER9- and x86-based systems and NVLink- and NVSwitch-based systems. Based on those experiences, we'll discuss best practices for scaling to hundreds and even thousands of GPUs. We'll also cover peer-to-peer memory access, GPUDirect RDMA, and NVSHMEM, as well as techniques such as auto-tuning of kernel launch configurations.
 
Topics:
HPC and Supercomputing, Performance Optimization, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9708
Streaming:
Download:
Share:
 
Abstract:
Confused about how Unified Memory works on modern GPU architectures? Did you try Unified Memory some time ago and never wanted to return to it? We'll explain how the last few generations of GPU architectures and software improvements have opened up new ways to manage CPU and GPU memories. We will dive into the advantages and disadvantages of various OS and CUDA memory allocators, explore how memory is managed by the driver, and examine user controls to tune it. Learn about software enhancements for Unified Memory developed over the past year, how HMM is different from ATS, and how to use Unified Memory with multiple processes.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9727
Streaming:
Download:
Share:
 
Abstract:
We'll present an overview of the upcoming NERSC9 system architecture, throughput model, and application readiness efforts.
 
Topics:
HPC and Supercomputing, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9809
Streaming:
Download:
Share:
 
Abstract:
Learn about algorithm design, implementation, and optimization techniques to accelerate large-scale phase-field and molecular dynamics simulations on GPU platforms. Numerical simulations of phase-field equations are conventionally performed by stencil computation with a very small time-step size and low efficiency. We'll describe how we designed an efficient, GPU-friendly algorithm that combines a large-step-size exponential time integrator with domain decomposition and localization of matrix exponentials. By using this algorithm with optimization techniques on single- and multiple-GPU platforms, we achieved a 50X increase in simulation speed over the conventional stencil computing approach. We'll also discuss GPU-accelerated molecular dynamics simulation, with a focus on parallel strategies of atomic partition and spatial partition. We demonstrated efficiency on dissipative particle dynamics and free-energy calculations on GPU devices for hundreds of millions of particles.
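To see why a large-step exponential integrator can beat conventional stencil stepping, consider a toy stiff linear system u' = Au: the exact propagator exp(dt·A) stays stable at step sizes where explicit stepping diverges. A NumPy sketch under our own simplified assumptions (no domain decomposition or localized matrix exponentials, unlike the speakers' method):

```python
import numpy as np

# A is the 1D second-difference Laplacian underlying diffusion-type stencils.
n = 64
dx = 1.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n)) +
     np.diag(np.ones(n - 1), 1) +
     np.diag(np.ones(n - 1), -1)) / dx**2

# Exact propagator exp(dt * A), applied via the eigendecomposition of symmetric A.
lam, V = np.linalg.eigh(A)

def exp_step(u, dt):
    return V @ (np.exp(dt * lam) * (V.T @ u))

def euler_step(u, dt):
    return u + dt * (A @ u)   # explicit stencil update

rng = np.random.default_rng(0)
u0 = rng.standard_normal(n)
dt = 1e-3                     # far above the explicit stability limit dx^2/2 ~ 1.2e-4
u_exp, u_eu = u0.copy(), u0.copy()
for _ in range(100):
    u_exp = exp_step(u_exp, dt)
    u_eu = euler_step(u_eu, dt)

print(np.linalg.norm(u_exp))  # decays: every mode is damped exactly
print(np.linalg.norm(u_eu))   # explodes: explicit scheme unstable at this dt
```

The exponential step pays for an eigendecomposition (or, at scale, localized matrix exponentials) but removes the stencil scheme's time-step restriction entirely, which is the source of the speedup the abstract describes.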
 
Topics:
HPC and Supercomputing, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9851
Streaming:
Download:
Share:
 
Abstract:
Under conditions of extreme temperature and density, quarks and gluons confined within hadrons become deconfined and form a new state of matter: the quark-gluon plasma. The early universe is generally believed to have been in the quark-gluon plasma phase within the first few milliseconds after the Big Bang. Lattice quantum chromodynamics (lattice QCD) is the only first-principles theoretical method for studying the transition from the hadronic phase to the quark-gluon plasma phase. Lattice QCD studies provide an important theoretical basis for understanding the results of relativistic heavy-ion collision experiments at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory in the U.S. and the Large Hadron Collider (LHC) at CERN in Switzerland. In this talk, I will first introduce relativistic heavy-ion collision physics and lattice QCD, then review progress in using lattice QCD to study the properties of hot, dense matter, and explain the role GPUs have played in our research.
 
Topics:
HPC and Supercomputing, Computational Physics, Physics Simulation
Type:
Talk
Event:
GTC China
Year:
2018
Session ID:
CH8401
Download:
Share:
 
Abstract:
(1) GPU-based high-performance parallel computing breaks with the traditional reservoir-simulation workflow, in which models must first be upscaled: high-resolution reservoir models can now be simulated directly, without coarsening, greatly improving the quality of detailed reservoir studies, with far-reaching implications for increasing reserves and production and for improving recovery in oil and gas field development.
(2) GPU-based high-performance digital rock technology uses high-resolution electron microscopy and MicroCT to acquire nanometer-scale (10^-9 m) information about the mineral types, content, structure, and energy spectra of subsurface rocks. The resulting data volumes are enormous, and conventional workstations cannot handle the 3D processing and visualization of such large image files. With advanced digital core processing algorithms and the massively parallel computing power of GPUs, this microscopic core information can be extracted quickly, substantially advancing reservoir understanding in the later stages of field development.
(3) Building on the GPU's computing and rendering power, we provide an object-oriented API, an extensible architecture, and a large suite of advanced components for professional fields such as AI, oil, gas, and mining, giving software developers a complete high-level development platform.
 
Topics:
HPC and Supercomputing, Climate, Weather & Ocean Modeling
Type:
Talk
Event:
GTC China
Year:
2018
Session ID:
CH8402
Share:
 
Abstract:

This presentation will communicate selected early results from application readiness activities at the Oak Ridge Leadership Computing Facility (OLCF) in preparation for Summit, the Department of Energy Office of Science's new supercomputer, operated by Oak Ridge National Laboratory. With over 9,000 POWER9 CPUs and 27,000 V100 GPUs, high-bandwidth data movement, and large node-local memory, Summit's architecture is proving to be effective in advancing performance across diverse applications in traditional modeling and simulation, high-performance data analytics, and artificial intelligence. These advances in application performance are being achieved with only small increases in electricity consumption compared with previous supercomputers operated at OLCF.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1806
Download:
Share:
 
Abstract:

Modern-day enablement of AI has been achieved by the acceleration of deep learning on GPUs. We are now entering the realm of ever-more complex deep learning tasks, involving complicated algorithms, deeper and more sophisticated network layers, and rapidly growing data sets, for which a handful of GPUs is proving insufficient. By designing and building large-scale HPC machines with extensive GPU-based vector/tensor processing capabilities, such as Tsubame3, ABCI, and Post-K, and by designing new scalable learning algorithms, we are overcoming these challenges. In particular, the ABCI grand challenge enabled three research groups, including ours at Tokyo Tech, to scale ImageNet training to over 4,000 GPUs with training times measured in minutes. This paves the way for a new era in which AI is as scalable as traditional HPC has been.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1807
Download:
Share:
 
Abstract:

For job allocation decisions, current batch schedulers use only the number of nodes and the runtime, because this information is readily available at submission time from user job scripts. User-provided runtimes are typically inaccurate because users overestimate or misunderstand their jobs' resource requirements. Other system resources, including IO and network, play a key role in system performance but are not available to the scheduler. In this talk we address the need for automatic, general, and scalable tools that provide accurate resource usage information to schedulers with PRIONN, our tool for Predicting Runtime and IO using Neural Networks and GPUs. PRIONN automates prediction of per-job runtime and IO resource usage, enabling IO-aware scheduling on HPC systems. The novelty of our tool is that whole job scripts are fed into deep learning models, allowing complete automation of runtime and IO resource predictions.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1810
Download:
Share:
 
Abstract:

The talk will focus on the latest developments in the MVAPICH2-GDR MPI library, which helps HPC and deep learning applications exploit maximum performance and scalability on GPU clusters. For HPC applications, we'll highlight multiple designs focusing on GPUDirect RDMA (GDR), managed and unified memory support, datatype processing, and support for OpenPOWER and NVLink. We will also present novel designs and enhancements to the MPI library that boost the performance and scalability of deep learning frameworks on GPU clusters. Container-based solutions for GPU-based cloud environments will also be highlighted.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1812
Download:
Share:
 
Abstract:

AI methods and tools are starting to be applied to HPC applications by a growing number of brave researchers in diverse scientific fields. This talk will describe an emergent workflow that uses traditional HPC numeric simulations to generate the labeled data sets required to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow for HPC applications and system architecture in a post-Moore's Law world will be considered.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1814
Download:
Share:
 
Abstract:

The Perlmutter machine will be delivered to NERSC/LBNL in 2020 and will contain a mixture of CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk we will describe the analysis we performed in order to optimize this design to meet the needs of the broad NERSC workload. We will also discuss our application readiness program, the NERSC Exascale Science Applications Program (NESAP), in which we will work with our users to optimize their applications to maximize their performance on Perlmutter's GPUs.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1816
Download:
Share:
 
Abstract:

Rapid progress in atmospheric science has been fueled in part over the years by faster computers. However, progress has slowed over the last decade due to three factors: the plateauing of core speeds, the increasing complexity of atmospheric models, and the mushrooming of data volumes. Our team at the National Center for Atmospheric Research is pursuing a hybrid approach to surmounting these barriers that combines machine learning techniques and GPU-acceleration to produce, we hope, a new generation of ultra-fast models of enhanced fidelity with nature and increased value to society.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1818
Download:
Share:
 
Abstract:

The recent success of deep learning has been driven by the ability to combine significant GPU resources with extremely large labeled datasets. However, many labels are extremely expensive to obtain, or can be observed only once, as with a specific astronomical event or scientific experiment. By combining vast amounts of labeled surrogate data with advanced few-shot learning, we have demonstrated success in leveraging small data in deep learning. In this talk, we will discuss these exciting results and explore the scientific innovations that made this possible.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1822
Download:
Share:
 
Abstract:

Pacific Northwest National Laboratory's scientific mission spans energy and molecular science to national security. Under the Deep Learning for Scientific Discovery Initiative, PNNL has invested in integrating advanced machine learning with traditional scientific methods to push the state of the art in many disciplines. We will provide an overview of some of the thirty projects we have stewarded, demonstrating how we have leveraged computing and analytics in fields as diverse as ultrasensitive detection, metabolomics, and atmospheric science.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1823
Download:
Share:
 
Abstract:

We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 Tensor Core GPUs (4,560 nodes) on the OLCF Summit HPC System using the high-productivity TensorFlow framework. We discuss how the neural network was tweaked to achieve good performance on NVIDIA Volta GPUs with Tensor Cores and what further optimizations were necessary to provide excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1825
Download:
Share:
 
Abstract:

The use of low-precision arithmetic has been a powerful tool for accelerating numerous scientific computing applications, including artificial intelligence. We present an investigation showing that other HPC applications can harness this power too, in particular the general HPC problem of solving Ax = b, where A is a large dense matrix and the solution is needed in FP64 accuracy. Our approach is based on the mixed-precision (FP16->FP64) iterative refinement technique: we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. We show how the use of FP16-TC (Tensor Core) arithmetic can provide up to a 4X speedup and improve energy consumption by a factor of 5, achieving 74 Gflops/Watt. This is due to the performance boost that the FP16 Tensor Cores provide, and to their better accuracy, which outperforms classical FP16.
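The low-precision-factorize, high-precision-refine loop the abstract describes can be sketched in a few lines. This is a minimal NumPy illustration of the idea only, not the authors' Tensor Core implementation; float32 stands in for FP16 because NumPy has no half-precision solver, but the structure of the algorithm is the same:

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    # "Cheap" solve in low precision (float32 here, FP16-TC in the talk).
    A_lo = A.astype(np.float32)
    x = np.linalg.solve(A_lo, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                       # residual in FP64
        dx = np.linalg.solve(A_lo, r.astype(np.float32))    # correction in low precision
        x += dx.astype(np.float64)                          # accumulate in FP64
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # small: FP64-level residual
```

The expensive O(n^3) factorization work runs entirely in low precision; only the cheap O(n^2) residual and update run in FP64, which is where the speedup comes from for well-conditioned systems.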

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1826
Download:
Share:
 
Abstract:

Detailed knowledge of application workload characteristics can optimize performance of current and future systems. This may sound daunting, with many HPC data centers hosting over 2,000 users running thousands of applications and millions of jobs per month. XALT is an open source tool developed at the Texas Advanced Computing Center (TACC) that collects system usage information to quantitatively report how users are using your system. This session will explore the benefits of detailed application workload profiling and how the XALT tool has helped leading supercomputing sites unlock the power of their application usage data.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1830
Download:
Share:
 
Abstract:
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1833
Download:
Share:
 
Abstract:

Summit, the world's fastest supercomputer, located in the Oak Ridge Leadership Computing Facility at DOE's Oak Ridge National Laboratory, provides unprecedented computational resources for open science supported by the DOE user programs. This presentation reviews the unique aspects of its GPU-accelerated architecture, highlights the collaborative efforts to prepare scientific modeling and simulation as well as data-intensive computing applications to take advantage of Summit's architectural features, and presents early scientific results enabled by the porting and development work.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1839
Download:
Share:
 
Abstract:

We have developed an HPC ML training algorithm that can reduce training time on petabytes of data from days and weeks to minutes. Using the same research, we can now conduct inferencing on completely encrypted data. We have built a distributed ML framework on commodity Azure VMs that scales to tens of terabytes and thousands of cores while achieving better accuracy than the state of the art.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1842
Download:
Share:
 
Abstract:
Integrated biological models need to capture the higher-order complexity of the interactions that occur among cellular components. A full model of all of the higher-order interactions of cellular and organismal components is one of the ultimate grand challenges of systems biology, and the ability to build such comprehensive models will usher in a new era in biology. Success in the construction and application of the necessary computational algorithms will enable new insights into the molecular mechanisms responsible for complex biological systems and their emergent properties, using technologies not previously available and at a scale not feasible before, leading to breakthroughs with profound effects on the field.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1809
 
Abstract:
PSC's "Bridges" was the first system to successfully converge HPC, AI, and Big Data. Designed for the U.S. national research community and supported by NSF, it now serves approximately 1600 projects and 7500 users at over 350 institutions. Bridges emphasizes "nontraditional" uses that span the life, physical, and social sciences, engineering, and business, many of which are based on AI or AI-enabled simulation. We describe the characteristics of Bridges that have made it a success, and we highlight several inspirational results and how they benefited from the system architecture. We then introduce "Bridges AI", a powerful new addition for balanced AI capability and capacity that includes NVIDIA's DGX-2 and HPE NVLink-connected 8-way Volta servers. 

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1832
 
Abstract:
We'll discuss parallel implementations of resampling techniques commonly used in particle filtering, and their performance on NVIDIA GPUs, including the embedded TX2. A novel parallel approach to implementing systematic and stratified schemes is the highlight, but we'll also feature an optimized version of the Metropolis resampling technique. Two main challenges have been addressed: first, traditional systematic and stratified techniques are serial by nature, but our approach restructures the algorithm for GPU implementation while producing results identical to the serial method; second, while the Metropolis method is well suited to a GPU, its naive implementation does not use coalesced accesses to global memory.
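For context, the serial systematic scheme that the talk parallelizes can be sketched in a few lines. This pure-Python version is illustrative only and is not the authors' GPU implementation; the function name and interface are ours.

```python
import random

def systematic_resample(weights, u=None):
    """Serial systematic resampling: draw one uniform offset u in [0, 1/N)
    and sweep N evenly spaced positions u + j/N through the cumulative
    distribution of normalized weights, returning the chosen particle
    indices. This is the inherently sequential form that a parallel
    scheme must reproduce exactly."""
    n = len(weights)
    total = sum(weights)
    if u is None:
        u = random.uniform(0.0, 1.0 / n)
    indices = []
    cumulative = 0.0
    i = 0
    for j in range(n):
        target = u + j / n  # j-th evenly spaced position in [0, 1)
        # Advance through the cumulative distribution until it covers target.
        while i < n - 1 and cumulative + weights[i] / total < target:
            cumulative += weights[i] / total
            i += 1
        indices.append(i)
    return indices
```

Particles with large weights are duplicated and low-weight particles are dropped; the sequential sweep over the cumulative sum is exactly what makes a direct GPU port nontrivial.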

 
Topics:
HPC and Supercomputing, Accelerated Data Science
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8105
 
Abstract:
The vision of the Exascale Computing Project (ECP), initiated in 2016 as a formal U.S. Department of Energy project executing through 2022, is to accelerate innovation with exascale simulation and data science solutions. After a brief overview of the project, we will give illustrative examples of how ECP teams are leveraging, exploiting, and advancing accelerated-node software technologies and applications on hardware such as the powerful GPUs provided by NVIDIA. We will summarize best practices and lessons learned from these accelerated-node experiences, along with ECP's plans moving into the exascale era, which is now on the near-term horizon.

These solutions will enhance U.S. economic competitiveness, change our quality of life, and strengthen our national security. ECP's mission is to deliver exascale-ready applications and solutions that address currently intractable problems of strategic importance and national interest; create and deploy an expanded and vertically integrated software stack on DOE HPC exascale and pre-exascale systems, defining the enduring U.S. exascale ecosystem; and leverage U.S. HPC vendor research activities and products in DOE HPC exascale systems. The project is a joint effort of two DOE programs: the Office of Science Advanced Scientific Computing Research Program and the National Nuclear Security Administration Advanced Simulation and Computing Program. ECP's RD&D activities, which encompass the development of applications, software technologies, and hardware technologies and architectures, are carried out by over 100 small teams of scientists and engineers from DOE national laboratories, universities, and industry.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8152
 
Abstract:
Most AI researchers and industry pioneers agree that the wide availability and low cost of highly efficient and powerful GPUs and accelerated computing parallel programming tools (originally developed to benefit HPC applications) catalyzed the modern revolution in AI and deep learning. Now, AI methods and tools are starting to be applied to HPC applications to great effect. This talk will describe an emergent workflow that uses traditional HPC numeric simulations to generate the labeled data sets required to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow for HPC applications and system architecture in a post-Moore's-Law world will be considered.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8175
 
Abstract:
For the first time in human history, we're using artificial intelligence technology to automate tasks and decision-making, and in most cases we don't understand how the technology works. This lack of understanding creates distrust and can disenfranchise the users the technology is intended to benefit the most. This is compounded in highly regulated spaces, such as the U.S. government. In this session, we'll cover the shortcomings of how machine learning and AI technologies are being applied in the USG today and how you can establish a trusted environment for successful human-machine collaboration.

 
Topics:
HPC and Supercomputing, Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8214
 
Abstract:
The Hartree Centre, a department of the UK National Labs, focuses on industry-led challenges in HPC, high-performance data analytics, and AI. Its mission is to make UK industry more competitive through the uptake of novel technologies. Historically the focus has been on HPC (simulation and modelling), and more recently on data-centric computing. This session focuses on how AI can best be applied to add value for industry partners.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8105
 
Abstract:
In pushing the limits of throughput of floating-point operations, GPUs have become a unique technology. During this session, we'll explore the current state of affairs from an application perspective. For this, we'll consider different computational science areas including fundamental research on matter, materials science, and brain research. Focusing on key application performance characteristics, we review current architectural and technology trends to derive an outlook towards future GPU-accelerated architectures.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8108
 
Abstract:
Do you need to compute larger problems, or compute faster, than a single GPU allows? Learn how to scale your application to multiple GPUs, how to use the different available multi-GPU programming models, and what their individual advantages are. All programming models will be introduced using the example of applying a domain decomposition strategy.
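The domain-decomposition running example can be illustrated without any GPU code: split a 1D grid into per-device subdomains with one-cell halos, and refresh the halos from neighboring subdomains at each step. A hypothetical pure-Python sketch of that bookkeeping (names are ours; no particular multi-GPU API is implied):

```python
def decompose(grid, ndev):
    """Split a 1D grid into ndev contiguous subdomains, each padded with a
    one-cell halo on either side (0.0 at the physical boundaries)."""
    n = len(grid)
    chunk = n // ndev
    subs = []
    for d in range(ndev):
        lo = d * chunk
        hi = (d + 1) * chunk if d < ndev - 1 else n
        left = grid[lo - 1] if lo > 0 else 0.0
        right = grid[hi] if hi < n else 0.0
        subs.append([left] + grid[lo:hi] + [right])
    return subs

def exchange_halos(subs):
    """Refresh each subdomain's halo cells from its neighbors' outermost
    interior cells, mimicking the per-step boundary copies a multi-GPU
    stencil code performs (via peer-to-peer copies, MPI, NCCL, etc.)."""
    for d, sub in enumerate(subs):
        if d > 0:
            sub[0] = subs[d - 1][-2]   # left halo <- left neighbor's edge
        if d < len(subs) - 1:
            sub[-1] = subs[d + 1][1]   # right halo <- right neighbor's edge
    return subs
```

Whatever programming model is chosen, the same two pieces recur: a partitioning of the data across devices and a per-step halo exchange between neighbors.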
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8121
 
Abstract:
In this talk, attendees will learn how key algorithms for Numerical Weather Prediction were ported to the latest GPU technology, and about the substantial benefits gained from doing so. We will showcase the power of individual Voltas and the impressive performance of the cutting-edge DGX-2 server, with multiple GPUs connected by a high-speed interconnect.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8195
 
Abstract:
In this session, we will describe how we successfully extended a large legacy Fortran code to GPUs using OpenACC. Working with AVBP (http://www.cerfacs.fr/avbp7x/), a state-of-the-art combustion simulation code, our objective was to keep the code as simple as possible for the AVBP community while taking advantage of high-end computing resources such as GPUs; OpenACC offered the flexibility to conduct the extension within these constraints. This session will present the various strategies we tried during the refactoring of the application, including the limitations of the directive-only approach, which can severely impair performance on particular parts of the code. The lessons learned are applicable to a wide range of codes in the research community.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8217
 
Abstract:
This session will describe strategies for achieving an efficient implementation of a parallel high-fidelity CFD solver that runs on GPUs. The solver is based on a nodal Discontinuous Galerkin Flux Reconstruction spatial discretisation. The strong data locality of the resulting scheme makes it very attractive for GPU implementation. Details of the implementation of the most time-consuming kernels are provided, with emphasis on the extensive use of GPU shared memory to minimize the memory-access-time penalty. Communication between GPUs also plays a big role in the solver's parallel performance, and the benefits of overlapping communication and computation will be quantified. The resulting solver is able to perform LES and DNS simulations of low-pressure turbine blades within engine design timescales.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8223
 
Abstract:
Come and learn how GPUs help identify biological activity on nearby exoplanets. Deployed on the Japanese Subaru telescope at 4,200 m elevation atop Maunakea, Hawaii, GPU hardware constitutes the backbone of the adaptive optics system, which drives the real-time correction of the optical aberrations introduced by Earth's atmosphere. Using machine learning techniques and advanced linear algebra algorithms accelerated by GPUs, a predictive control problem can now be solved at the multi-kHz frame rate required to keep up with changes in atmospheric turbulence. This represents the first successful on-sky result of this approach for exoplanet imaging.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8251
 
Abstract:
Learn how GPU-based Computational Fluid Dynamics (CFD) paves the way for affordable high-fidelity simulations in simulation-based design. This talk gives insights into the key ingredients of academic and commercial GPU-accelerated CFD solvers and discusses the technical and physical challenges of (near) real-time simulations of complex flows. We then present ultraFluidX, a recently released commercial GPU-based CFD solver designed specifically to leverage the massively parallel architecture of GPUs. With its multi-GPU implementation based on CUDA-aware MPI, the tool can achieve turnaround times of just a few hours for simulations of fully detailed production-level passenger and heavy-duty vehicles. Basics of the solver and several selected application examples are presented.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8253
 
Abstract:
In this talk, we present a portable matrix assembly strategy for solving PDEs, suited for co-execution on both CPUs and accelerators. In addition, a dynamic load-balancing strategy is considered to balance the workload among the different CPUs and GPUs available on the cluster. Numerical methods for solving partial differential equations (PDEs) involve two main steps: the assembly of an algebraic system of the form Ax=b, and its solution with direct or iterative solvers. The assembly step consists of a loop over elements, faces, or nodes in the case of the finite element, finite volume, and finite difference methods, respectively. It is computationally intensive and does not involve communication, and is therefore well suited to accelerators.
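The assembly loop described above can be sketched for a 1D linear finite-element Poisson problem: each element contributes a small local matrix that is scattered into the global system, and the per-element work is independent, which is what makes the step amenable to accelerators. An illustrative pure-Python sketch (dense storage for brevity; names are ours, not the talk's code):

```python
def assemble_1d_poisson(n_elems, h=1.0):
    """Assemble the global stiffness matrix for 1D linear finite elements.
    Element e couples nodes e and e+1 through the local matrix
    (1/h) * [[1, -1], [-1, 1]]; contributions are scatter-added into the
    global (n_elems+1) x (n_elems+1) matrix."""
    n_nodes = n_elems + 1
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    local = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    for e in range(n_elems):      # independent per-element work
        for a in range(2):        # scatter-add: on a GPU this step needs
            for b in range(2):    # atomics or an element-coloring scheme
                A[e + a][e + b] += local[a][b]
    return A
```

Only the scatter-add couples elements, so a GPU version must handle concurrent updates to shared entries (typically with atomics or coloring), which is the crux of a portable co-execution strategy.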

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8292
 
Abstract:
This talk provides an overview of the key strategies used to design and implement OpenStaPLE, an application for Lattice QCD (LQCD) Monte Carlo simulations. LQCD simulations are an example of HPC grand-challenge applications, where the accuracy of results strongly depends on the available computing resources. OpenStaPLE has been developed on top of the MPI and OpenACC frameworks: MPI manages the parallelism across multiple computing nodes and devices, while OpenACC exploits the high-level parallelism available on modern processors and accelerators, enabling a good level of portability across different architectures. After an initial overview, we present performance and portability results on different architectures, highlighting key hardware and software improvements that may lead this class of applications to better performance.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8317
 
Abstract:
We present our experiences implementing GPU acceleration in the massively parallel, real-space FHI-aims electronic structure code for computational materials science. For fourteen years, FHI-aims has focused on high numerical accuracy for current methods, such as Kohn-Sham density-functional theory and beyond, and on outstanding scaling on distributed-parallel high-performance computers. We show how to exploit vectorized implementations in FHI-aims to achieve an overall 3x-4x GPU acceleration with minimal code rewrite for complete simulations. Furthermore, FHI-aims' domain decomposition scheme on non-uniform grids enables compute- and memory-parallel computing across thousands of GPU-containing nodes for real-space operations.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8321
 
Abstract:
This talk will present the roadmap, the strategy, and the currently ongoing efforts to port the fundamental building blocks of the QuantumESPRESSO suite of codes to accelerated architectures. QuantumESPRESSO is an integrated suite of codes providing computational methods to estimate a vast number of physical properties at the nanoscale. It features high modularity and a user-oriented design, and it can efficiently exploit standalone workstations as well as state-of-the-art HPC systems. The differences between this new work and the original GPU port done in CUDA C back in 2012 will be used to discuss aspects of code evolution and maintainability. Special attention will also be devoted to the performance-critical kernels shared by most components of the suite.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8340
 
Abstract:
VASP is a software package for atomic-scale materials modeling. It's one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. We'll give an overview on the status of porting VASP to GPUs with OpenACC. Parts of VASP were previously ported to CUDA C with good speed-ups on GPUs, but also with an increase in the maintenance workload, because VASP is otherwise written wholly in Fortran. We'll discuss OpenACC performance relative to CUDA, the impact of OpenACC on VASP code maintenance, and challenges encountered in the port related to management of aggregate data structures. Finally, we'll discuss possible future solutions for data management that would simplify both new development and the maintenance of VASP and similar large production applications on GPUs.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8367
 
Abstract:
Today we are investigating different technologies and architectures, and we will present the first hardware and software prototype that will evolve into a system able to overcome an unprecedented challenge.

To probe the predictions of the Standard Model of Particle Physics, the Large Hadron Collider at CERN will be upgraded by 2026 to produce 6 billion proton collisions every second at the centre of the Compact Muon Solenoid (CMS) detector. These collisions produce events in which new particles, which did not exist before the collision, are generated.

The CMS experiment will be able to observe and record the most energetic and rare of these events.

Observing the details of all these events requires reading and analyzing almost 100 TB of data every second, and CMS is working on a hybrid approach to tackle this challenge: ASICs and FPGAs will be used for the first level of data reduction, while a hybrid cluster of computer servers and GPUs will be used for the full event reconstruction and final online selection.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8382
 
Abstract:
This talk will cover the background and distinguishing features of the European Interactive Computing E-Infrastructure (ICEI) project, which will offer a set of federated services to realize the Fenix infrastructure (https://fenix-ri.eu). For decades, high-performance computing, networking, and storage technologies have been among the driving forces behind numerous scientific discoveries and breakthroughs. Recently, the X-as-a-service model offered by several cloud technologies has enabled researchers, particularly in the fields of data science, to access resources and services in an on-demand and elastic manner. Complex workflows in different domains, such as the European Human Brain Project (HBP), however, require a converged, consolidated, and flexible set of infrastructure services to support their performance and accessibility requirements.

 
Topics:
HPC and Supercomputing, GPU Virtualization
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8106
 
Abstract:
Classical molecular dynamics (MD) simulations will be able to reach sampling on the second timescale within five years thanks to GPUs, producing petabytes of simulation data at current force-field accuracy. Notwithstanding this, MD will still be in the regime of low-throughput, high-latency predictions with average accuracy. We envisage that machine learning (ML) will be able to solve both the accuracy and the time-to-prediction problem by learning predictive models from expensive simulation data. The synergies between classical simulations, quantum simulations, and ML methods such as artificial neural networks have the potential to drastically reshape the way we make predictions in computational structural biology and drug discovery.
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8116
 
Abstract:
Nuclear fusion is the process that powers the sun, and it is one of the best hopes of achieving a virtually unlimited energy source for the future of humanity. However, reproducing sustainable nuclear fusion reactions here on Earth is a tremendous scientific and technical challenge. Special devices - called tokamaks - have been built around the world, with JET (Joint European Torus, in the UK) being the largest tokamak currently in operation. Such devices confine matter and heat it up to extremely high temperatures, creating a plasma where fusion reactions begin to occur. JET has over one hundred diagnostic systems to monitor what happens inside the plasma, and each 30-second experiment generates about 50 GB of data to be analyzed. In this talk, we will show how Convolutional Neural Networks (CNNs) can be used to reconstruct the 2D plasma profile inside the device based on data coming from those diagnostics. We will also discuss how Recurrent Neural Networks (RNNs) can be used to predict plasma disruptions, which are one of the major problems affecting fusion devices today. Training of such networks is done on NVIDIA GPUs.
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8144
 
Abstract:
In this talk, we will give an overview of the benefits GPU computing can provide to the structural bioinformatics field. We will explain how most biomolecular simulation methods can be efficiently accelerated using massively parallel computational architectures, and will show several fundamental research and technology transfer success cases.

 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning, Genomics & Bioinformatics
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8287
 
Abstract:
In 2014, GENCI set up a French technology watch group that targets the provisioning of test systems, selected as part of a prospective approach among GENCI's partners. This was done to prepare scientific communities and users of GENCI's computing resources for the arrival of the next exascale technologies. The talk will present results obtained on the OpenPOWER platform bought by GENCI and open to the scientific community: the first results obtained for a set of scientific applications using the available environments (CUDA, OpenACC, OpenMP, etc.), along with results obtained for AI applications using IBM's PowerAI software distribution.

 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8288
 
Abstract:
Legacy, performance-hungry, and cutting-edge deep learning workloads require best-of-breed cloud services and hardware, while enterprises require low cost and financial flexibility. Learn how Oracle and NVIDIA have partnered to solve these challenges with a bare-metal NVIDIA Tesla GPU offering that squeezes every ounce of performance at a fraction of the cost. We'll also detail the ability to use NVIDIA GPU Cloud to streamline the experience for customers launching and running clusters of GPU virtual machines or bare-metal instances for AI or HPC workloads. Come see live demos and learn what Oracle Cloud Infrastructure is doing in this space!

 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8528
 
Abstract:
CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest release of CUDA, preview upcoming GPU programming technology, gain insight into the philosophy driving the development of CUDA, and see how it will take advantage of current and future GPUs. You'll also learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8128
 
Abstract:
Microsoft Azure's N-Series VMs, powered by the latest NVIDIA GPUs, enable a range of new accelerated scenarios. Learn how you can take advantage of GPUs in Azure - from workstation graphics and visualization, to HPC simulation, to training models for artificial intelligence. This session will delve deep into today's exciting offerings with live examples and offer a view of what's to come in the future.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8500
 
Abstract:
Containers simplify application deployments in data centers by wrapping applications in an isolated virtual environment. By including all application dependencies, such as binaries and libraries, application containers run seamlessly in any data center environment. The HPC application containers available on NVIDIA GPU Cloud (NGC) dramatically improve ease of application deployment while delivering optimized performance. However, if the desired application is not available on NGC, building HPC containers from scratch trades one set of challenges for another. Parts of the software environment typically provided by the HPC data center must be redeployed inside the container. For those used to just loading the relevant environment modules, installing a compiler, MPI library, CUDA, and other core HPC components from scratch may be daunting. HPC Container Maker (HPCCM) [1] is an open-source project that addresses the challenges of creating HPC application containers. HPCCM encapsulates the best practices of deploying core HPC components into modular building blocks, following container best practices, to reduce container development effort, minimize image size, and take advantage of image layering. HPCCM makes it easier to create HPC application containers by separating the choice of what should go into a container image from the specification details of how to configure, build, and install a component. This separation also enables the best practices of HPC component deployment to evolve transparently over time.
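HPCCM recipes are Python scripts that select modular building blocks. Without reproducing HPCCM's actual API, the separation it provides can be illustrated with a hypothetical generator; the function name, base-image tag, and install command below are all ours, not HPCCM's:

```python
def render_dockerfile(base_image, components):
    """Build a minimal Dockerfile from a base image plus (name, install
    command) pairs. The point is the separation of concerns: the caller
    states *what* goes into the image, while each component entry knows
    *how* it is installed."""
    lines = ["FROM " + base_image]
    for name, command in components:
        lines.append("# component: " + name)
        lines.append("RUN " + command)
    return "\n".join(lines)

# Hypothetical usage: one building block for compilers.
recipe = render_dockerfile(
    "nvidia/cuda:10.0-devel",  # illustrative base-image tag
    [("compilers", "apt-get update && apt-get install -y gcc gfortran")],
)
```

HPCCM applies the same idea at scale: each building block carries the accumulated best practice for installing its component, so recipes stay short while the generated container specification stays current.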
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80022
Download:
Share:
 
Abstract:
For more than 20 years, Amazon has invested heavily in artificial intelligence. Our mission now is to share our experience and machine learning capabilities as fully managed services and deliver them to every developer and data scientist. AWS offers a family of AI services that address different use cases and requirements through cloud-native machine learning and deep learning technologies. These services give every developer access to natural language understanding (NLU), automatic speech recognition (ASR), visual search and image recognition, text-to-speech (TTS), and the latest machine learning (ML) technologies. Whether you are just getting started with AI or are a deep learning expert, this session will show you how to innovate with AI in the AWS cloud, scale up your AI applications, and make them more efficient.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8020
Streaming:
Download:
Share:
 
Abstract:
The Tensor Cores introduced in the NVIDIA Volta GPU deliver up to 125 teraFLOPS from IEEE half-precision inputs, allowing mixed-precision training to run far faster than single precision. We will explain three fundamental techniques of mixed-precision training: loss scaling, master weights, and choosing the appropriate precision for specific operations. These techniques achieve the same model accuracy as single-precision networks without changing hyperparameters or the training schedule. Finally, we will explain how to enable Tensor Cores for your network and verify that they are being used, illustrating all of the above with a simple yet complete PyTorch example.
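The loss-scaling technique can be illustrated with NumPy (a conceptual sketch of the arithmetic, not the PyTorch workflow shown in the talk): small gradients underflow to zero in half precision, so the loss is multiplied by a scale factor before backpropagation and the resulting gradients are divided by the same factor, in single precision, before the weight update.

```python
import numpy as np

# A tiny gradient that float32 can represent underflows to zero in
# float16; scaling it first keeps it representable, and unscaling in
# float32 (where the master weights live) recovers the true value.

true_grad = np.float32(1e-8)          # gradient as computed in full precision
loss_scale = np.float32(1024.0)

unscaled_fp16 = np.float16(true_grad)             # underflows to 0.0
scaled_fp16 = np.float16(true_grad * loss_scale)  # survives in half precision

# Master weights and updates stay in float32: unscale before applying.
recovered = np.float32(scaled_fp16) / loss_scale

print(unscaled_fp16)   # 0.0 -- the gradient information was lost
print(recovered)       # close to 1e-8
```

The scale factor cancels out of the update, so training behaves as if everything had been computed in single precision, which is why no hyperparameter changes are needed.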
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8021
Streaming:
Download:
Share:
 
Abstract:
NVIDIA Tesla HGX is a platform architecture that delivers the highest-performance end-to-end solution for artificial intelligence, deep learning, and high-performance computing. In this session we will cover the HGX data center product roadmap, how the architecture standardizes data center designs for accelerated AI, the latest technologies behind next-generation performance, and how we work with OEM and ODM partners to bring the HGX platform to the cloud.
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8006
Streaming:
Download:
Share:
 
Abstract:
To help enterprises and government agencies build AI environments effectively and explore the many ways AI can raise industrial competitiveness, HPE has launched the AI CookBook. With it, AI practitioners can quickly deploy AI environments, while AI and data science experts at the HPE AI Center provide professional consulting for AI applications across different industries.
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8007
Streaming:
Download:
Share:
 
Abstract:
Deep Learning Demystified is being shared with the public at GTC Taiwan for the first time. What is deep learning? Why has it suddenly become so prominent? Why do GPUs play such an important role in deep learning? How do you get started? Is my company a good fit for adopting deep learning? Let NVIDIA demystify it all.
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8008
Streaming:
Download:
Share:
 
Abstract:
In the new era of AI, enterprises have an outstanding opportunity to innovate and lead. In practice, however, AI initiatives often stall because of the complexity of scaling infrastructure. In this session, we will show how the new scale-out capabilities of NVIDIA DGX systems, paired with Pure Storage FlashBlade flash storage, deliver insights within hours and enable AI at enterprise scale.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8029
Download:
Share:
 
Abstract:
An overview of the many GPU hardware platforms designed for today's demanding AI/machine learning and HPC workloads, including custom solutions targeted at deep learning inference and deep learning training. The talk will also cover systems based on PCIe GPUs as well as GPU systems with the NVLink interface.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8031
Download:
Share:
 
Abstract:
The simulation of the behavior of the human brain is one of the most important challenges in the recent history of computing, with a large number of practical applications. The main constraint is simulating a huge number of neurons efficiently using current computer technology. One of the most efficient approaches the scientific community uses to simulate the behavior of the human brain consists of three major steps: computing 1) the voltage on the neuron morphology, 2) the synaptic elements in each of the neurons, and 3) the connectivity between neurons. In this work, we focus on the first step, which is one of the most time-consuming steps of the simulation and is strongly linked to the others. All these steps must be carried out for each of the neurons (between 50 and 100 billion neurons in the human brain), each of which differs completely in size and shape.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8328
Streaming:
Download:
Share:
 
Abstract:
We'll highlight IBM POWER9 system and NVIDIA Volta GPU characteristics such as compute, memory, and NVLink capabilities. We'll also take the audience through HPC application performance observations and tuning. IBM POWER9 with NVIDIA Volta is the state-of-the-art system designed for HPC and cognitive computing. This system also introduces NVLink 2.0 high-speed connectivity between CPU and GPU, along with coherent device memory. System characteristics such as CPU and GPU compute and memory throughput, NVLink latency, and bandwidth play key roles in application performance. We'll demonstrate how each of these influences application performance through a case study.
 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8309
Streaming:
Download:
Share:
 
Abstract:
We will present the latest developments in the Gunrock library, mainly in programmability and scalability. Gunrock (http://gunrock.github.io/) is a high-performance GPU graph processing library for large graphs. We revise the APIs of the library, aiming to make programming with Gunrock easier and to support more graph formats (including the GOAI open format for data analytics) and operations. We also develop new techniques to scale graph traversal to more GPUs, and can process graphs with several hundred billion edges in around half a second on more than 100 Tesla P100 GPUs.
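Gunrock expresses graph computations as operations on frontiers of active vertices. A minimal CPU sketch of that frontier-based traversal pattern (illustrative only, not Gunrock's actual C++/CUDA API):

```python
# Frontier-based BFS sketch: each iteration "advances" the current
# frontier to the set of unvisited neighbors -- the core pattern that
# a GPU graph library parallelizes across threads.

def bfs_levels(adj, source):
    """adj: dict mapping vertex -> list of neighbor vertices."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for v in frontier:            # on a GPU: one thread per vertex/edge
            for w in adj[v]:
                if w not in level:    # filter out already-visited vertices
                    level[w] = depth
                    next_frontier.append(w)
        frontier = next_frontier
    return level

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2}
```

The advance (expand neighbors) and filter (drop visited vertices) steps here correspond to the operators such a library exposes, which is what lets one API cover many traversal-style algorithms.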
 
Topics:
HPC and Supercomputing, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8594
Streaming:
Download:
Share:
 
Abstract:
The Center for Accelerated Application Readiness within the Oak Ridge Leadership Computing Facility is a program to prepare scientific applications for next-generation supercomputer architectures. Currently the program consists of thirteen domain science application development projects focused on preparing codes for efficient use on Summit. Over the last three years, these teams have developed and executed a development plan based on detailed information about Summit's architecture and system software stack. This presentation will highlight the progress made by the teams, which have used Titan, the 27 PF Cray XK7 with NVIDIA K20X GPUs; SummitDev, an early-access IBM Power8+ system with NVIDIA P100 GPUs; and, most recently, Summit, OLCF's new IBM Power9 system with NVIDIA V100 GPUs. The program covers a wide range of domain sciences, with applications including ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, and XGC.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8908
Streaming:
Download:
Share:
 
Abstract:
Python is a programming language with growing adoption in the development community thanks to its fast learning curve, flexibility, and ease of use and integration with other technologies. Due to its level of abstraction, it is possible to use the same Python code on different platforms like x86, RISC, and ARM. The Python development community is growing fast, and many community members are interested in moving to GPU-accelerated programming but don't know how to start or what is needed. We'll go through the steps and adoption path to start developing Python solutions that take advantage of GPU acceleration, including details, advantages, and challenges of the strongest and most popular Python 3 modules for use with GPUs: scikit-cuda, PyCUDA, Numba, cudamat, and CuPy. Code samples and program execution statistics will be shown as a performance analysis exercise as well.
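One reason the adoption path is gentle is that CuPy deliberately mirrors the NumPy API, so the same function can often run on either the CPU or the GPU. A small sketch, with CuPy as an optional import that falls back to NumPy when no CUDA stack is present:

```python
# The same array code targets CPU (NumPy) or GPU (CuPy) because CuPy
# mirrors NumPy's interface; only the import line differs.
try:
    import cupy as xp          # GPU arrays, if CUDA and CuPy are available
except ImportError:
    import numpy as xp         # CPU fallback with an identical API

def saxpy(a, x, y):
    # Elementwise a*x + y; runs on whichever device the arrays live on.
    return a * x + y

x = xp.arange(5, dtype=xp.float32)
y = xp.ones(5, dtype=xp.float32)
print(saxpy(2.0, x, y))        # [1. 3. 5. 7. 9.]
```

This duck-typing pattern is a common first step before moving to the more explicit kernel-level tools (PyCUDA, Numba) the talk covers.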
 
Topics:
HPC and AI, Tools & Libraries, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8214
Streaming:
Download:
Share:
 
Abstract:
An overview of numerous GPU hardware platforms designed for today's taxing AI/machine learning and HPC workloads, including custom solutions targeted at deep learning inference and deep learning training. The talk will cover systems based on PCIe GPUs as well as GPU systems with the NVLink interface.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8999
Streaming:
Share:
 
Abstract:
QMCPACK is an open-source, massively parallel quantum Monte Carlo code enabling the accurate calculation of quantum many-body problems such as systems of atoms, molecules, and even solids. Here, we demonstrate the implementation of a rank-k matrix update scheme leading to increased compute density and performance improvements of up to 1.5x compared to the current rank-1 update at every step. We compare performance results on Oak Ridge's next supercomputer, Summit, as well as its development precursor, SummitDev, to the current machine, Titan. Based on detailed runtime traces, we illustrate how speedups were achieved and give an outlook on which future library features could be most beneficial to our application performance.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8937
Streaming:
Download:
Share:
 
Abstract:
Multiphysics and multiscale simulations are found in a variety of computational science subfields, but their disparate computational characteristics can make GPU implementations complex and often difficult. Simulations of supernovae are ideal examples of this complexity. We use the scalable FLASH code to model these astrophysical cataclysms, incorporating hydrodynamics, thermonuclear kinetics, and self-gravity across considerable spans in space and time. Using OpenACC and GPU-enabled libraries coupled to new NVIDIA GPU hardware capabilities, we have improved the physical fidelity of these simulations by increasing the number of evolved nuclear species by more than an order of magnitude. I will discuss these and other performance improvements to the FLASH code on the Summit supercomputer at Oak Ridge National Laboratory.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8926
Streaming:
Share:
 
Abstract:
Top supercomputers in the TOP500 list have transitioned from homogeneous node architectures toward heterogeneous many-core nodes with accelerators and CPUs. These new architectures present significant challenges to developers of large-scale multiphysics applications, especially at Department of Energy laboratories that have invested heavily in scalable message passing interface codes over decades. Preserving developer productivity requires single-source, high-performance code bases while porting to new architectures. We'll introduce RAJA, a C++-based programming model abstraction developed at Lawrence Livermore National Laboratory (LLNL) and used to abstract fine-grained on-node parallelization in multiple production applications. Then, we'll describe how RAJA is used in ARES, a large multiphysics application at LLNL.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8470
Streaming:
Share:
 
Abstract:
This talk will introduce two programming models, OpenSHMEM and SharP, to address the programming challenges of HPC systems with multiple GPUs per node, high-performing networks, and huge amounts of hierarchical heterogeneous memory. SharP uses a distributed data-structure approach to abstract the memory and provide uniform interfaces for data abstraction, locality, sharing, and resiliency across these memory systems. OpenSHMEM is a well-established library-based PGAS programming model for programming HPC systems. We show how NVSHMEM, an implementation of OpenSHMEM, can enable communication in CUDA kernels and realize the OpenSHMEM model for GPU-based HPC systems. These two complementary programming models provide the ability to program emerging architectures for Big-Compute and Big-Data applications. After the introduction, we will present experimental results for a wide variety of applications, including QMCPACK, HPGMG, CoMD, and Memcached, demonstrating the programming model advantages.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8135
Streaming:
Share:
 
Abstract:
OpenMP has a long history on shared memory, CPU-based machines, but has recently begun to support offloading to GPUs and other parallel accelerators. This talk will discuss the current state of compilers for OpenMP on NVIDIA GPUs, showing results and best practices from real applications. Developers interested in writing OpenMP codes for GPUs will learn how best to achieve good performance and portability.
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8344
Streaming:
Share:
 
Abstract:
Computer simulations offer great insight into complex, dynamical systems but can be difficult to navigate through a large set of control/design parameters. Deep learning methods, applied on fast GPUs, can provide an ideal way to improve scientific and engineering workflows. In this talk, Vic will discuss an application of machine learning to develop a fast-running surrogate model that captures the dynamics of an industrial multiphase fluid flow. He will also discuss an improved population search method that can help the analyst explore a high-dimensional parameter space to optimize production while reducing the model uncertainty.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8828
Streaming:
Download:
Share:
 
Abstract:
We present Unicorn, a novel parallel programming model for GPU clusters. It shows that distributed shared memory systems can be efficient with the help of transactional semantics and deferred data synchronizations, and thus the simplicity of distributed shared memory systems can be carried over to CPUs and GPUs in a cluster. Unicorn is designed for easy programmability and provides a deterministic execution environment. Device, node, and cluster management are completely handled by the runtime, and no related API is exposed to the application programmer. Load balancing, scheduling, and scalability are also fully transparent to the application code. Programs written on one cluster can be run verbatim on a different cluster. Application code is agnostic to data placement within the cluster as well as to changes in network interfaces and data availability patterns. Unicorn's programming model, being deterministic, by design eliminates several data races and deadlocks. Unicorn's runtime employs several data optimizations, including prefetching and subtask streaming, in order to overlap communication and computation. Unicorn employs pipelining at two levels: first to hide data transfer costs among cluster nodes, and second to hide transfer latency between CPUs and GPUs on all nodes. Among other optimizations, Unicorn's work-stealing-based scheduler employs a two-level victim selection technique to reduce the overhead of steal operations. Further, it employs a proactive, aggressive stealing mechanism to prevent the said pipelines from stalling during a steal operation. We will showcase the scalability and performance of Unicorn on several scientific workloads, and demonstrate the load balancing achieved in some of these experiments and the amount of time the runtime spends in communication.
We find that parallelizing coarse-grained applications like matrix multiplication or 2D FFT using our system requires only about 30 lines of C code to set up the runtime. The rest of the application code is a regular single-CPU/GPU implementation. This indicates the ease of extending sequential code to a parallel environment. We will show the efficiency of our abstraction, with minimal performance loss, on the latest GPU architectures like Pascal and Volta. We will also compare our approach to other similar implementations like StarPU-MPI and G-Charm.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8565
Streaming:
Download:
Share:
 
Abstract:
In this session we present MPI collective algorithms optimized for distributed deep learning frameworks. The performance of large-message MPI collectives such as broadcast, allreduce, and reduce is critical to the performance of these workloads. There is a need for a novel approach to the design of large-scale collective communication algorithms for CUDA-aware MPI runtimes. The session will deep-dive into our implementation of these collectives and its performance advantages on IBM POWER9 systems with NVIDIA V100 GPUs for the OSU benchmark and distributed TensorFlow.
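A common design for large-message allreduce on dense GPU systems is the ring algorithm, which keeps every link busy and moves each element only twice per rank. A serial sketch of the idea (illustrative; a real CUDA-aware MPI runtime executes the per-rank steps concurrently):

```python
# Ring allreduce sketch: with n ranks, the data is split into n chunks.
# Chunks travel around the ring twice: first accumulating partial sums
# (reduce-scatter), then circulating the finished sums (allgather).

def ring_allreduce(vectors):
    """vectors: one input list per rank, all of length n * chunk."""
    n = len(vectors)
    chunk = len(vectors[0]) // n
    data = [list(v) for v in vectors]        # per-rank working buffer

    # Reduce-scatter: in step s, rank r sends chunk (r - s) mod n to
    # rank r+1, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        for r in range(n):
            c, dst = (r - s) % n, (r + 1) % n
            for i in range(c * chunk, (c + 1) * chunk):
                data[dst][i] += data[r][i]

    # Allgather: the completed chunks circulate once more, overwriting.
    for s in range(n - 1):
        for r in range(n):
            c, dst = (r + 1 - s) % n, (r + 1) % n
            for i in range(c * chunk, (c + 1) * chunk):
                data[dst][i] = data[r][i]
    return data

print(ring_allreduce([[1, 2], [3, 4]]))   # [[4, 6], [4, 6]]
```

Each rank sends and receives 2(n-1)/n of the data regardless of n, which is why ring-style algorithms scale well for the large gradient buffers that deep learning produces.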
 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8306
Streaming:
Download:
Share:
 
Abstract:
Simulation and analysis of flow and combustion processes in propulsion and power systems presents many new and interesting challenges. A multitude of fluid dynamic, thermodynamic, transport, chemical, multiphase, and heat transfer processes are intrinsically coupled and must be considered simultaneously in complex domains associated with devices such as gas-turbine and rocket engines. The problem is compounded by the effects of turbulence and high-pressure phenomena, which require treatment of nonideal fluid mixtures at supercritical conditions. The combination of complex multicomponent property evaluations and computational grid resolution requirements makes these simulations expensive and cumbersome. Recent advances in high performance computing (HPC) systems, such as graphics processing unit (GPU) based architectures, provide an opportunity for significant advances in dealing with these complexities while reducing the time to solution.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8910
Streaming:
Share:
 
Abstract:
We propose a new framework for parallelizing deep neural network training that maximizes the amount of data ingested by the training algorithm. Our proposed framework, called Livermore Tournament Fast Batch Learning (LTFB), targets large-scale data problems. The LTFB approach creates a set of deep neural network (DNN) models and trains each instance of these models independently and in parallel. Periodically, each model selects another model to pair with, exchanges models, and then runs a local tournament against held-out tournament datasets. The winning model continues training on the local training datasets. This new approach maximizes computation and minimizes the amount of synchronization required to train deep neural networks, a major bottleneck in existing synchronous deep learning algorithms. We evaluate our proposed algorithm on two HPC machines at Lawrence Livermore National Laboratory, including an early-access IBM Power8+ machine with NVIDIA Tesla P100 GPUs. Experimental evaluations of the LTFB framework on two popular image classification benchmarks, CIFAR10 and ImageNet, show significant speedups compared to the sequential baseline.
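The train-pair-tournament loop can be sketched abstractly with toy "models" (here just numbers nudged toward a target, scored on a stand-in held-out metric). This is purely illustrative of the tournament structure, not the LTFB implementation:

```python
import random

# Toy sketch of a tournament round: replicas train independently, then
# pairs exchange models and both keep whichever scores better on a
# held-out set.  No gradient synchronization happens between replicas.

TARGET = 5.0                               # stand-in for "the ideal model"

def train_step(model, rng):
    # "Training": move toward the target with some noise.
    return model + 0.3 * (TARGET - model) + rng.uniform(-0.2, 0.2)

def heldout_score(model):
    return -abs(model - TARGET)            # higher is better

def ltfb_round(models, rng):
    models = [train_step(m, rng) for m in models]   # independent training
    order = list(range(len(models)))
    rng.shuffle(order)                              # random pairing
    for a, b in zip(order[::2], order[1::2]):
        winner = max(models[a], models[b], key=heldout_score)
        models[a] = models[b] = winner              # loser adopts the winner
    return models

rng = random.Random(0)
models = [0.0, 1.0, 8.0, 12.0]
for _ in range(50):
    models = ltfb_round(models, rng)
print(models)   # all replicas end up near TARGET
```

The only inter-replica communication is the periodic pairwise model exchange, which is the property that removes the synchronization bottleneck of synchronous data-parallel training.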
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8829
Streaming:
Share:
 
Abstract:
We will discuss challenges and lessons learned from deploying multiple large-scale HPC and AI clusters in different industries. Lessons learned will focus on end-to-end aspects of designing and deploying large-scale GPU clusters, including datacenter and environmental challenges, network performance and optimization, data pipeline and storage challenges, and workload orchestration and optimization. You will learn more about open architectures for HPC, AI, and deep learning, combining flexible compute architectures, rack-scale platforms, and software-defined networking and storage to provide a scalable software-defined deep learning environment. We will discuss strategies, providing insight into everything from specialty compute for training vs. inference, to high-performance storage for data workflows, to orchestration and workflow management tools. We will also discuss deploying deep learning environments from development to production at scale, from private cloud to public cloud.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8972
Streaming:
Download:
Share:
 
Abstract:
Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs in combination with third-party devices. We'll introduce the fundamental concepts behind GPUDirect and present the latest developments, such as changes to the pre-existing APIs and the new APIs recently introduced. We'll also discuss the expected performance in combination with the new computing platforms that emerged last year.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8474
Download:
Share:
 
Abstract:
Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with unified memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools.
 
Topics:
HPC and Supercomputing, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8314
Streaming:
Download:
Share:
 
Abstract:
Learn how the Julia programming language can be used for GPU programming, both for (1) low-level kernel programming and (2) high-level array and AI libraries. This full-stack support drastically simplifies code bases, and GPU programmers can take advantage of all of Julia's most powerful features: generic programming, n-dimensional kernels, higher-order functions, and custom numeric types. We'll give an overview of the compiler's implementation and performance characteristics via the Rodinia benchmark suite. We'll show how these techniques enable highly flexible AI libraries with state-of-the-art performance, and allow a major government user to run highly computational threat modelling on terabytes of data in real time.
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8269
Download:
Share:
 
Abstract:
Participants will take part in in-depth discussions of the revolutionary HBM (High Bandwidth Memory) product, its distinguishing technical features, and the role it plays in expanding the boundaries of the AI revolution. The session will also cover current technical and business challenges, along with considerations for the next-generation HBM lineup and more.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8949
Streaming:
Download:
Share:
 
Abstract:
Scientific simulations typically store just a small fraction of their computed timesteps, as few as one in 500, due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles that simulation scientists are loath to sacrifice. We propose the use of the special-purpose video processing unit (VPU), currently unutilized in the HPC context, for inexpensive lossy encoding. Our work demonstrates that video encoding quality is suitable for volumes and recommends encoder settings. We'll show that data can be encoded in parallel with a hybrid CPU/GPU computation with minimal impact on run time. We'll also demonstrate that decoding is fast enough for on-the-fly decompression during analysis.
 
Topics:
HPC and AI, 5G & Edge, In-Situ & Scientific Visualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8561
Streaming:
Download:
Share:
 
Abstract:
There is a huge opportunity for businesses to use advanced AI methods to extract insights from their data faster. Imagine training your models in minutes or hours rather than days or weeks. Think how much more money you can make by getting algorithms to market faster and getting the most productivity out of your researchers. At this session, Greg Schmidt introduces the new HPE Apollo 6500 Gen10 System with NVLink for the enterprise. This innovative system design allows a high degree of flexibility, with a range of configuration and topology options to match your workloads. Learn how the Apollo 6500 unlocks business value from your data for AI.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8969
Streaming:
Share:
 
Abstract:
Learn about the latest developments in the high-performance message passing interface (MPI) over InfiniBand, iWARP, and RoCE (MVAPICH2) library that simplify the task of porting MPI applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit, providing optimized performance on different GPU node configurations. These optimizations are integrated transparently under the standard MPI API for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using the GPUDirect RDMA framework, MPI datatype processing using CUDA kernels, support for GPUDirect Async, support for heterogeneous clusters with GPU and non-GPU nodes, and more. We use the popular Ohio State University micro-benchmark suite and example applications to demonstrate how developers can effectively take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We provide guidance on issues like processor affinity to GPU and network that can significantly affect the performance of MPI applications that use MVAPICH2.  Back
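The affinity guidance the abstract closes with can be made concrete with a small sketch. `select_devices` below is a hypothetical helper, not part of MVAPICH2 (the library itself controls affinity through settings such as `MV2_ENABLE_AFFINITY`); it only shows one common policy a launcher might apply:

```python
def select_devices(local_rank, local_size, gpus_per_node, cores_per_node):
    """Return (gpu_id, first_core) for one MPI rank on a node.

    Packs ranks round-robin over the node's GPUs and gives each rank a
    contiguous slice of cores -- the kind of rank-to-GPU/core mapping that
    can dominate MPI+CUDA performance if chosen badly.
    """
    gpu_id = local_rank % gpus_per_node
    cores_per_rank = cores_per_node // local_size
    first_core = local_rank * cores_per_rank
    return gpu_id, first_core

# Four ranks on a node with 2 GPUs and 16 cores:
layout = [select_devices(r, 4, 2, 16) for r in range(4)]
print(layout)  # [(0, 0), (1, 4), (0, 8), (1, 12)]
```

A production mapping would additionally keep each rank on the NUMA domain closest to its GPU and network adapter, which is exactly the pitfall the session discusses.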
 
Topics:
HPC and Supercomputing, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8373
Streaming:
Download:
Share:
 
Abstract:
HPC centers have traditionally been configured for simulation workloads, but deep learning is increasingly applied alongside simulation on scientific datasets. These frameworks do not always fit well with job schedulers, large parallel file systems, and MPI backends. We'll discuss examples of how deep learning workflows are being deployed on next-generation systems at the Oak Ridge Leadership Computing Facility. We'll share benchmarks comparing natively compiled frameworks versus containers on POWER systems like Summit, as well as best practices for deploying deep learning models on HPC resources in scientific workflows.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8551
Streaming:
Download:
Share:
 
Abstract:
Addressing the apparent Amdahl's fraction of synchronizing with the CPU for communication is critical for strong scaling of applications on GPU clusters. GPUs are designed to maximize throughput and have enough state and parallelism to hide long latencies to global memory. It's important to take advantage of these inherent capabilities of the GPU and the CUDA programming model when tackling communications between GPUs. NVSHMEM provides a Partitioned Global Address Space (PGAS) that spans memory across GPUs and provides an API for fine-grained GPU-GPU data movement and synchronization from within a CUDA kernel. NVSHMEM also provides CPU-side API for GPU-GPU data movement that provides a progression for applications to move to NVSHMEM. CPU-side communication can be issued in stream order, similar to CUDA operations. It implements the OpenSHMEM programming model that is of great interest to government agencies and national labs. We'll give an overview of capabilities, API, and semantics of NVSHMEM. We'll use examples from a varied set of applications (HPGMG, Multi-GPU Transpose, Graph500, etc.) to demonstrate the use and benefits of NVSHMEM.  Back
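The PGAS model NVSHMEM implements can be pictured with a toy, single-process simulation. The class below is illustrative only (real NVSHMEM issues `put`/`get` from inside CUDA kernels over NVLink or InfiniBand); it just mimics the API shape of a symmetric heap addressed by (PE, offset):

```python
# A toy model of the OpenSHMEM-style Partitioned Global Address Space:
# every PE owns a symmetric region, and any PE can write into or read
# from any other PE's region with one-sided put/get -- no matching recv.
class SymmetricHeap:
    def __init__(self, n_pes, words_per_pe):
        self.mem = [[0] * words_per_pe for _ in range(n_pes)]

    def put(self, target_pe, offset, value):
        self.mem[target_pe][offset] = value   # one-sided write

    def get(self, source_pe, offset):
        return self.mem[source_pe][offset]    # one-sided read

heap = SymmetricHeap(n_pes=4, words_per_pe=8)
# PE 0 deposits a halo value directly into PE 1's memory:
heap.put(target_pe=1, offset=0, value=42)
print(heap.get(source_pe=1, offset=0))  # 42
```

The one-sided style is what lets fine-grained GPU-GPU communication be issued from within a kernel instead of synchronizing with the CPU.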
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8595
Streaming:
Download:
Share:
 
Abstract:
When does the Multi-Process Service (MPS) improve performance of HPC codes? We aim to answer this question by exploring the effectiveness of MPS in a number of HPC applications combining distributed and shared memory parallel models. A single complex application typically includes stages with a limited degree of parallelism, where CPU cores are more effective than GPUs, and highly parallelizable stages that can be accelerated by offloading to the GPUs. MPS allows offloading computation from a number of processes to the same GPU, and, as a result, more CPU cores per node can tackle tasks characterized by limited shared memory parallelism. We demonstrate the effectiveness of MPS in large-scale simulations on IBM Minsky and Witherspoon nodes, which combine two multi-core POWER CPUs with 4-6 NVIDIA GPUs.  Back
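A back-of-the-envelope model (my own illustration, not from the talk) shows why sharing a GPU among several ranks can pay off: with k ranks per GPU, the CPU-bound stages shrink by roughly k while the offloaded stages keep their GPU time, as long as the GPU is not saturated:

```python
def time_per_step(cpu_fraction, ranks_per_gpu, gpu_speedup):
    """Relative runtime of one step versus a 1-rank, CPU-only baseline.

    cpu_fraction  - share of work in limited-parallelism CPU stages
    ranks_per_gpu - MPI ranks sharing one GPU via MPS
    gpu_speedup   - speedup of the offloaded stages on the GPU
    """
    cpu_part = cpu_fraction / ranks_per_gpu        # CPU stages run in parallel
    gpu_part = (1 - cpu_fraction) / gpu_speedup    # offloaded stages
    return cpu_part + gpu_part

base = time_per_step(cpu_fraction=0.4, ranks_per_gpu=1, gpu_speedup=20)
mps = time_per_step(cpu_fraction=0.4, ranks_per_gpu=6, gpu_speedup=20)
print(f"1 rank/GPU: {base:.3f}, 6 ranks/GPU with MPS: {mps:.3f}")
```

The model is the Amdahl-style argument in miniature: once the GPU stages are fast, the residual CPU fraction dominates, and MPS attacks exactly that fraction.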
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8434
Streaming:
Download:
Share:
 
Abstract:

This session presents an overview of the hardware and software architecture of the DGX-2 platform. This talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA developers will learn ways to utilize the full GPU connectivity to quickly build complex applications and utilize the high bandwidth NVLINK connections to scale up performance.

  Back
 
Topics:
HPC and AI, Data Center & Cloud Infrastructure, AI & Deep Learning Business Track (High Level), HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8688
Streaming:
Download:
Share:
 
Abstract:
XGC is a kinetic whole-volume modeling code with unique capabilities to study tokamak edge plasmas in real geometry and answer important questions about the design of ITER and other future fusion reactors. The main technique is the Particle-in-Cell method, which models the plasma as billions of quasiparticles representing ions and electrons. Ostensibly, the process of advancing each particle in time is embarrassingly parallel. However, the electric and magnetic fields must be known in order to push the particle, which requires an implicit gather operation from XGC's sophisticated unstructured mesh. In this session, we'll show how careful mapping of field and particle data structures to GPU memory allowed us to decouple the performance of the critical electron push routine from the size of the simulation mesh and allowed the true particle parallelism to dominate. This improvement enables performant, high-resolution, ITER-scale simulations on Summit.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8909
Streaming:
Download:
Share:
 
Abstract:

Do you need to compute larger problems, or compute faster, than a single GPU allows? Then come to this session and learn how to scale your application to multiple GPUs. You will learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, applying a domain decomposition strategy.
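The domain decomposition strategy common to all those programming models can be sketched in miniature. Here plain Python lists stand in for per-GPU buffers; the same split-exchange-update pattern maps onto CUDA peer-to-peer copies, MPI, or NCCL. Sizes and values are illustrative only:

```python
def jacobi_step(domains):
    """One iteration of a 1-D stencil on decomposed subdomains.

    Each subdomain carries one halo (ghost) cell at each end; the outermost
    cells of the first and last subdomain are fixed physical boundaries.
    """
    # 1) halo exchange between neighbouring subdomains
    for left, right in zip(domains, domains[1:]):
        left[-1] = right[1]   # my right halo <- neighbour's first interior
        right[0] = left[-2]   # neighbour's left halo <- my last interior
    # 2) local interior update (3-point average), halos/boundaries kept
    return [
        [d[0]] + [(d[i - 1] + d[i + 1]) / 2 for i in range(1, len(d) - 1)] + [d[-1]]
        for d in domains
    ]

# Two subdomains; a hot fixed boundary (1.0) on the far left.
domains = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
for _ in range(3):
    domains = jacobi_step(domains)
print(domains[0])  # heat diffusing in from the fixed boundary
```

The per-iteration structure, communicate halos then compute interiors, is also what enables overlapping communication with computation on real multi-GPU systems.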

  Back
 
Topics:
HPC and AI, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23031
Download:
Share:
 
Abstract:

Murex has been an early adopter of GPUs for pricing and risk management of complex financial options. GPU adoption has boosted the performance of its software while reducing its cost of use. Each new generation of GPU has also shown how important it is to reshape the architecture of the software around its GPU-accelerated analytics. Minsky, featuring far better GPU memory bandwidth and GPU-CPU interconnect, raises the bar even further. Murex will show how it has handled this new challenge for its business.

  Back
 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23209
Download:
Share:
 
Abstract:

HPE Deep Learning solutions empower innovation at any scale, building on our purpose-built HPC systems and technologies, solutions, applications, and support services. Deep learning demands massive amounts of computational power, usually involving heterogeneous computing resources, e.g., GPUs and InfiniBand as installed on HPE Apollo systems. NovuMind's NovuForce system leverages state-of-the-art technologies to make the deployment and configuration procedure fast and smooth. The NovuForce deep learning software, delivered as a Docker image, has been optimized for the latest technologies such as NVIDIA Pascal GPUs and InfiniBand GPUDirect RDMA. This flexibility of the software, combined with the broad range of GPU servers in the HPE portfolio, makes for one of the most efficient and scalable solutions.

  Back
 
Topics:
HPC and AI, Performance Optimization, Tools & Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23250
Download:
Share:
 
Abstract:

Discover how we designed and optimized a highly scalable dense solver for the Maxwell equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it with programmability concerns in mind. We show how we solved the challenge of tightly combining tasks with MPI, and illustrate how this scaled up to 50,000 CPU cores, reaching 1.38 petaflops. We then focus on the integration of GPUs in this model, along with a few implementation tricks to ensure truly asynchronous execution. Finally, after briefly detailing how we added hierarchical compression techniques to our distributed solver on CPUs, we describe how we plan to unlock the challenges that have so far prevented porting it to GPUs.

  Back
 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23277
Download:
Share:
 
Abstract:

We leverage NVIDIA GPUs for connected components labeling and image classification applied to Digital Rock Physics (DRP), to help characterize reservoir rocks and study their pore distributions. We show in this talk how NVIDIA GPUs helped us satisfy strict real-time constraints dictated by the imaging hardware used to scan the rock samples. We present a detailed description of the workflow from a DRP perspective, our algorithm and optimization techniques, and performance results on the latest NVIDIA GPU generations.

  Back
 
Topics:
HPC and AI, HPC and Supercomputing, Video & Image Processing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23303
Download:
Share:
 
Abstract:

In order to prepare the scientific communities, GENCI and its partners have set up a technology watch group and led collaborations with vendors, relying on HPC experts and early-adopted HPC solutions. The two main objectives are to provide guidance and to prepare the scientific communities for the challenges of exascale architectures. The talk will present the OpenPOWER platform bought by GENCI and provided to the scientific community. It will then present the first results obtained on the platform for a set of about 15 applications, using all the solutions provided to users (CUDA, OpenACC, OpenMP, ...). Finally, one specific application will be presented in detail, covering its porting effort and the techniques used for GPUs with both OpenACC and OpenMP.

  Back
 
Topics:
HPC and AI, Performance Optimization, Programming Languages
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23183
Download:
Share:
 
Abstract:

Wireless VR is widely regarded as the key to maximum immersion. But why? Is it only the obvious benefit of shedding the heavy, inflexible cable? There is more behind it. Learn how the development of tracking technology goes hand in hand with the increasing demand for wireless VR hardware, what hardware is on the market now and what is coming, and how wireless solutions, whether standalone devices or add-ons, can create higher value for your VR application. We'll also look at how large-scale, location-based VR and hardware manufacturers are expanding the boundaries of the VR industry, both for entertainment and B2B.

  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23388
Download:
Share:
 
Abstract:

With over 5,000 GPU-accelerated nodes, Piz Daint has been Europe's leading supercomputing system since 2013, and is currently one of the most performant and energy-efficient supercomputers on the planet. It has been designed to optimize the throughput of multiple applications, covering all aspects of the workflow, including data analysis and visualisation. We will discuss ongoing efforts to further integrate these extreme-scale compute and data services with infrastructure services of the cloud. As a PRACE Tier-0 system, Piz Daint is accessible to all scientists in Europe and worldwide, and it provides a baseline for future development of exascale computing. We will present a strategy for developing exascale computing technologies in domains such as weather and climate or materials science.

  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23429
Download:
Share:
 
Abstract:

The presentation will give an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhanced features like NVLink 2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like Independent Thread Scheduling and the Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest CUDA 9 release.

  Back
 
Topics:
HPC and AI, Programming Languages, Tools & Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23434
Download:
Share:
 
Abstract:

We'll guide you step by step to port and optimize an oil-and-gas mini-application to efficiently leverage the amazing computing power of NVIDIA GPUs. While OpenACC focuses on coding productivity and portability, CUDA enables extracting the maximum performance from NVIDIA GPUs. OmpSs, on the other hand, is a GPU-aware task-based programming model that may be combined with CUDA, and recently with OpenACC as well. Using OpenACC, we'll start benefiting from GPU computing, obtaining great coding productivity and a nice performance improvement. We can next fine-tune the critical application parts by developing CUDA kernels to hand-optimize the problem. OmpSs combined with either OpenACC or CUDA will enable seamless task parallelism leveraging all system devices.

  Back
 
Topics:
HPC and Supercomputing
Type:
Instructor-Led Lab
Event:
GTC Europe
Year:
2017
Session ID:
53020
Download:
Share:
 
Abstract:
The development of cognitive computing applications is at a critical juncture with tough challenges but ample opportunities for great breakthroughs. Many of the cognitive solutions involved are complex and methods required to develop them remain poorly understood. Any major breakthrough in improving such understanding would require large-scale experimentation and extensive data-driven development. In short, we are witnessing the formation of a new modality of programming and even a new modality of application execution. The IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) is developing scalable cognitive solutions that embody both advanced cognitive computing workloads and optimized heterogeneous computing systems for these cognitive workloads. The two streams of research not only complement, but also empower each other, and thus should be carried out in a tightly integrated fashion.  Back
 
Topics:
HPC and Supercomputing, Accelerated Data Science
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7140
Download:
Share:
 
Abstract:
AI and machine learning techniques are finding increasing application across all areas of science and engineering, with high performance computing, data, and networking playing a critical role in enabling research advances in these areas. In this talk, we overview current and planned investments by the National Science Foundation in such advanced research cyberinfrastructure. We illustrate the use and promise of these technologies, drawing on NSF-funded research at the intersection of a domain science, AI, and advanced research cyberinfrastructure. We'll also discuss how these investments are well aligned with NSF's forward-looking "Big Ideas."  Back
 
Topics:
HPC and Supercomputing, Leadership and Policy in AI
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7153
Download:
Share:
 
Abstract:
There has been a surge of success in using deep learning as it has provided a new state of the art for a variety of domains. While these models learn their parameters through data-driven methods, model selection through hyper-parameter choices remains a tedious and highly intuition-driven task. We've developed two approaches to address this problem. Multi-node evolutionary neural networks for deep learning (MENNDL) is an evolutionary approach to performing this search. MENNDL is capable of evolving not only the numeric hyper-parameters, but is also capable of evolving the arrangement of layers within the network. The second approach is implemented using Apache Spark at scale on Titan. The technique we present is an improvement over hyper-parameter sweeps because we don't require assumptions about independence of parameters and is more computationally feasible than grid-search.  Back
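The evolutionary search MENNDL performs can be illustrated with a toy version. The code below is my own sketch, not MENNDL: a population of (learning rate, layer count) candidates is scored by a synthetic fitness function standing in for actual network training, and the best half breeds the next generation with mutation:

```python
import random

random.seed(0)

def fitness(lr, layers):
    # Synthetic stand-in for validation accuracy; pretend the ideal
    # setting is lr=0.01 with 4 layers. MENNDL would train a real
    # network on GPUs to score each candidate.
    return -((lr - 0.01) ** 2 * 1e4 + (layers - 4) ** 2)

def evolve(generations=30, pop_size=20):
    pop = [(random.uniform(0.0001, 0.5), random.randint(1, 12))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(*c), reverse=True)
        parents = pop[:pop_size // 2]          # selection: keep best half
        children = []
        for _ in range(pop_size - len(parents)):
            lr, layers = random.choice(parents)
            children.append((
                max(1e-5, lr * random.uniform(0.5, 2.0)),    # mutate lr
                max(1, layers + random.choice((-1, 0, 1))),  # mutate depth
            ))
        pop = parents + children
    return max(pop, key=lambda c: fitness(*c))

best_lr, best_layers = evolve()
print(best_lr, best_layers)
```

Unlike a grid search, nothing here assumes the hyper-parameters are independent; the population simply follows whatever joint structure the fitness surface has, which is the advantage the abstract claims over parameter sweeps.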
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7200
Download:
Share:
 
Abstract:
The 2015 Nobel Prize in Physics was awarded for the discovery of neutrino oscillations, which indicates that neutrinos have mass. This phenomenon was unexpected and is one of the clearest signs of new physics beyond the Standard Model. The NOvA experiment aims to deepen our understanding of neutrino oscillations by measuring the properties of a muon neutrino beam produced at Fermi National Accelerator Laboratory at a Near Detector close to the beam source, and measuring the rate that muon neutrinos oscillate into electron neutrinos over an 810 km trip to a 14,000 ton Far Detector in Ash River, MN. Understanding this process may explain why the universe is made of matter instead of antimatter. Performing this measurement requires a high-precision method for classifying neutrino interactions. To this end, we developed a convolutional neural network that gave a 30 percent improvement in electron neutrino selection over previous methods, equivalent to increasing the Far Detector mass by 4,000 tons.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7230
Download:
Share:
 
Abstract:
Recent advances in the deployment of deep learning recurrent nets have been demonstrated in scaling studies of Princeton's new deep learning code -- FRNN (Fusion Recurrent Neural Net) -- on modern GPU systems. This is a "big-data" project in that it has access to the huge EUROFUSION/JET disruption database of over a half-petabyte to drive these studies. FRNN implements a distributed data-parallel synchronous stochastic gradient approach with TensorFlow and Theano libraries at the backend and MPI for communication. This deep learning software has recently demonstrated excellent scaling up to 6,000 GPUs on Titan at Oak Ridge National Lab. The associated accomplishments exhibit clear progress toward the goal of establishing the practical feasibility of using leadership-class supercomputers to greatly enhance training of neural nets for transformational impact on key discovery science application domains such as fusion energy science. Powerful systems expected to be engaged for near-future deployment of this deep learning software include: (1) NVIDIA's SATURN V, featuring nearly 1,000 Pascal P100 GPUs; (2) Switzerland's Piz Daint Cray XC50 system with 4,500 P100 GPUs; (3) Japan's Tsubame 3 system with 3,000 P100 GPUs; and (4) OLCF's Summit-Dev system. In summary, deep learning software trained on large scientific datasets holds exciting promise for delivering much-needed predictive tools capable of accelerating knowledge discovery. The associated creative methods being developed -- including a new half-precision capability -- also have significant potential for cross-cutting benefit to a number of important application areas in science and industry.  Back
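The distributed data-parallel synchronous scheme FRNN uses can be reduced to its essence in a few lines. This is an illustrative single-process simulation, not FRNN's code: each "worker" computes a gradient on its data shard, the gradients are averaged (an `MPI_Allreduce` in a real run), and every worker applies the identical update. The model here is a single parameter fit by least squares:

```python
def local_gradient(w, shard):
    # d/dw of mean((w*x - y)^2) over this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    # stand-in for MPI_Allreduce(sum) followed by division by world size
    return sum(values) / len(values)

# Four shards of a dataset generated from y = 3x, dealt round-robin.
shards = [[(x, 3.0 * x) for x in range(i, 20, 4)] for i in range(4)]

w, lr = 0.0, 0.001
for _ in range(200):
    grads = [local_gradient(w, s) for s in shards]  # parallel in real FRNN
    w -= lr * allreduce_mean(grads)                 # identical update everywhere
print(round(w, 3))  # converges to ~3.0
```

Because every worker applies the same averaged gradient, all replicas stay bit-identical without ever exchanging model weights, which is what lets the approach scale to thousands of GPUs with only one collective per step.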
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7243
Download:
Share:
 
Abstract:

Learn how Microsoft Azure Government enables compliant deep learning and traditional HPC-based workloads using powerful NVIDIA Tesla GPU accelerators and scale out using Azure's low-latency networking backed by InfiniBand infrastructure. Find out about the roadmap to support visualization scenarios and hear about customer stories and partner solutions that are leveraging these platform capabilities.

  Back
 
Topics:
HPC and Supercomputing, Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7177
Download:
Share:
 
Abstract:
On October 26, 2017 AWS launched P3, the first instances to include NVIDIA Tesla V100 GPUs and the most powerful GPU instances available in the cloud. These instances are designed for compute-intensive applications that require massive parallel floating point performance, including machine learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, genomics, and autonomous vehicle systems. We'll talk about how these instances help AWS customers innovate and accelerate their workloads, and how NVIDIA and AWS are making exciting new use cases possible in the cloud, both for high performance computing and advanced 3D visualization.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1703
Download:
Share:
 
Abstract:
There has been a surge of success in using deep learning as it has provided a new state of the art for a variety of domains. While these models learn their parameters through data-driven methods, model selection through hyper-parameter choices remain ...Read More
Abstract:
There has been a surge of success in using deep learning as it has provided a new state of the art for a variety of domains. While these models learn their parameters through data-driven methods, model selection through hyper-parameter choices remains a tedious and highly intuition-driven task. We've developed two approaches to address this problem. Multi-node evolutionary neural networks for deep learning (MENNDL) is an evolutionary approach to performing this search. MENNDL is capable of evolving not only the numeric hyper-parameters, but is also capable of evolving the arrangement of layers within the network. The second approach is implemented using Apache Spark at scale on Titan. The technique we present is an improvement over hyper-parameter sweeps because we don't require assumptions about independence of parameters and is more computationally feasible than grid-search.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1704
Download:
Share:
 
Abstract:
Learn how to adopt a MATLAB-centric workflow to design, develop, scale and deploy deep learning applications on to GPUs whether on your desktop, a cluster, or on embedded Tegra platforms, including Jetson TK1/TX1 and DRIVE PX boards. The workflow starts with algorithm design in MATLAB, which enjoys universal appeal among engineers and scientists because of its expressive power and ease of use. The algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB. Next, those networks are trained using MATLAB's GPU and parallel computing support either on the desktop, a local compute cluster, or in the cloud. Finally, a compiler auto-generates portable and optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. We'll use examples of common computer vision algorithms and deep learning networks to describe this workflow, and we'll present their performance benchmarks, including training with multiple GPUs on an Amazon P2 cloud instance.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1706
Download:
Share:
 
Abstract:
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1708
Download:
Share:
 
Abstract:

Lawrence Livermore National Laboratory (LLNL) has a long history of leadership in large-scale computing. Our next platform, Sierra, a heterogeneous system that will be sited as part of a Collaboration between Oak Ridge, Argonne and Lawrence Livermore National Laboratories (CORAL) and delivered through a partnership with IBM, NVIDIA and Mellanox, will continue that tradition. That partnership has reached a key milestone that has begun the siting of Sierra as well as the Summit System at ORNL. This talk will provide a detailed look at the design of Sierra. It will compare and contrast Sierra to Summit, explaining the motivation for the design choices of each system. It will also preview some early uses of Sierra that target its technical opportunities and the challenges that accompany them.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2017
Session ID:
SC1709
 
Abstract:
The Cancer Moonshot was established in 2016 with the goal of doubling the rate of progress in cancer research -- to do in five years what would normally take ten. A major area for accelerating progress is the strategy of using modeling, simulation, and machine learning to advance our understanding of cancer biology and to integrate what is known into predictive models that can inform research and guide therapeutic development. In 2015, the U.S. Department of Energy formed a collaboration with the National Cancer Institute for the joint development of advanced computing solutions for cancer.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2017
Session ID:
SC1711
 
Abstract:
TBA
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1713
 
Abstract:
Big science has often been accompanied by big data, but scientists have often been stymied by how best to leverage their data-rich observations. By combining advanced scientific computing with cutting-edge deep learning, we have been able to apply deep learning broadly throughout our scientific mission. From high-energy physics to computational chemistry to cybersecurity, we are enhancing the pace and impact of diverse scientific disciplines by bringing together domain scientists and deep learning researchers across our laboratory. In field after field, we are seeing deep learning drive transformational innovation, opening the door to a future of data-driven scientific discovery.
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2017
Session ID:
SC1714