GTC On-Demand

AI Application Deployment and Inference
Abstract:
Exploring the Best Server for AI (Speakers: Samuel D. Matzek, Sr. Software Engineer; Maria Ward, IBM Accelerated Server Offering Manager): Explore the server at the heart of the Summit and Sierra supercomputers, and the best server for AI. We will discuss the technical details that set this server apart and why it matters for your machine learning and deep learning workloads.
IBM Cloud for AI at Scale (Speaker: Alex Hudak, IBM Cloud Offering Manager): AI is fast changing the modern enterprise with new applications that are resource demanding but provide new capabilities to drive insight from customer data. IBM Cloud is partnering with NVIDIA to provide a world-class and customized cloud environment to meet the needs of these new applications. Learn about the wide range of NVIDIA GPU solutions inside the IBM Cloud virtual and bare metal server portfolio, and how customers are using them across deep learning, analytics, HPC workloads, and more.
IBM Spectrum LSF Family Overview & GPU Support (Speaker: Larry Adams, Global Architect - Cross Sector, Developer, Consultant, IBM Systems)
How to Fuel the Data Pipeline (Speaker: Kent Koeninger, IBM)
IBM Storage Reference Architecture for AI with Autonomous Driving (Speaker: Kent Koeninger, IBM)
 
Topics:
AI Application Deployment and Inference
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91053
 
Abstract:
Now that deep learning has moved out of the lab and into production, how do you provide training environments to all your internal customers working across business units with different requirements, and avoid provisioning separate clusters? IBM has applied decades of HPC experience to build a production-ready deep learning stack, including servers accelerated with NVIDIA GPUs, workload and resource management software, and ready-to-use open source frameworks, all covered by IBM support. The solution provides a secure multi-tenant environment so multiple data scientists can share a common set of resources, eliminating silos, while running multiple instances of the same or different applications. The deep learning effort is enhanced with end-to-end pipeline support, from data ingestion and preparation, through model training and tuning, to inference. In this session, we will explore what an enterprise deep learning environment looks like and provide insights into the unique IBM value for accelerating the use of deep learning across a wide variety of industries.
 
Topics:
AI Application Deployment and Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81049
 
Abstract:
How do meteorologists predict weather or weather events such as hurricanes, typhoons, and heavy rain? Predicting weather events has traditionally been done with supercomputer (HPC) simulations using numerical models such as WRF, UM, and MPAS. Recently, however, much deep learning-based research has shown outstanding results. We'll introduce several case studies related to meteorological research. We'll also describe how meteorological tasks differ from general deep learning tasks, their detailed approaches, and their input data, such as weather radar images and satellite images. We'll also cover typhoon detection and tracking, rainfall amount prediction, forecasting future cloud imagery, and more.
 
Topics:
AI Application Deployment and Inference, Climate, Weather, Ocean Modeling, Computer Vision, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8816
 
Abstract:
We'll talk about how to use Singularity to containerize deep learning applications. We'll provide compelling reasons to choose Singularity over Docker. We'll cover deep learning frameworks, including TensorFlow, NV-Caffe, MXNet, and others. We'll present the current challenges and workarounds when using Singularity in an HPC cluster. We'll compare the performance of Singularity to bare-metal systems.
 
Topics:
AI Application Deployment and Inference, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8368
AI and DL Research
Abstract:
Join a special presentation from our 2018-2019 Graduate Fellowship recipients to learn what's next from the world of research and academia. Sponsored projects involve a variety of technical challenges, including topics such as 3D scene understanding, new programming models for tensor computations, HPC physics simulations for astrophysics, deep learning algorithms for AI natural language learning, and cancer diagnosis. We believe that these students will lead the future of our industry, and we're proud to support the 2018-2019 NVIDIA Graduate Fellows. For more information on the NVIDIA Graduate Fellowship program, visit www.nvidia.com/en-us/research/graduate-fellowships.
 
Topics:
AI and DL Research, Virtual Reality and Augmented Reality, Graphics and AI, Computational Biology and Chemistry, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9976
 
Abstract:
We'll review our study of the use of artificial intelligence to augment various domains of computational science in order to improve time to solution for various HPC problems. We'll discuss the current state-of-the-art approaches and performance gains where applicable. We'll also investigate current barriers to adoption and consider possible solutions.
 
Topics:
AI and DL Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8242
 
Abstract:
The Department of Energy (DOE) entered into a partnership with the National Cancer Institute (NCI) of the National Institutes of Health (NIH) to accelerate cancer research. This "Cancer Moonshot" aims to tackle three main objectives: better understand the mechanisms of cancer, use large amounts of diverse medical data for predictive models, and enable precision medicine by providing guidance for treatment to individual patients. Leveraging the compute expertise of DOE in high performance computing (HPC) and new methods for deep learning in artificial intelligence, this HPC+AI approach aims to create a single scalable deep neural network code called CANDLE (CANcer Distributed Learning Environment) that will be used to address all three challenges. This talk aims to give an overview of the project and highlight how GPU accelerated systems in the DOE ecosystem, Summit and Sierra, have contributed to the project.
 
Topics:
AI and DL Research, HPC and AI, Medical Imaging and Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81033
AI in Healthcare
Abstract:
For more than a decade, GE has partnered with NVIDIA in healthcare to power our most advanced modality equipment, from CT to ultrasound. Part 1 of this session will offer an introduction to the deep learning efforts at GEHC and the platform we're building on top of NGC to accelerate new algorithm development, and then a deep dive into a case study of the evolution of our cardiovascular ultrasound scanner and the underlying extensible software stack. It will contain three main parts: (a) cardiovascular ultrasound imaging from a user perspective, which problems we need to solve for our customers, and the impact of cardiovascular disease in a global perspective; (b) an introduction to the Vivid E95 and the cSound platform, GPU-based real-time image reconstruction and visualization, how GPU performance can be translated to customer value and outcomes, and how this has evolved the platform during the last 2½ years; (c) the role of deep learning in cardiovascular ultrasound imaging, how we are integrating deep learning inference into our imaging system, and preliminary results from automatic cardiac view detection.
 
Topics:
AI in Healthcare, Medical Imaging and Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8849
Accelerated Analytics
Abstract:
Learn how the breakthrough HPE Superdome Flex platform equips scientists, engineers, and business lines with in-memory computing at unparalleled scale to solve complex, data-intensive problems holistically, accelerate analytics, and, coupled with NVIDIA GPU technology, leverage large-scale data visualization to speed time to discovery and innovation.
 
Topics:
Accelerated Analytics, Computational Fluid Dynamics, Computer Aided Engineering, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8973
 
Abstract:
It is common knowledge that GPUs can dramatically accelerate HPC and machine learning/AI workloads, but can they do the same for general-purpose analytics? In this talk, Todd Mostak, CEO of MapD, will provide real-world examples of how a new generation of GPU-powered analytics platforms can enable enterprises from a range of verticals to dramatically accelerate the process of insight generation at scale. In particular, he will focus on how the key technical differentiators of GPUs (massive computational bandwidth, fast memory, and a native rendering pipeline) make them uniquely suited to allow analysts and data scientists to query, visualize, and power machine learning over large, often high-velocity datasets. Using the open source MapD analytics platform as an example, Todd will detail the technical approaches his team took to leverage the full parallelism of GPUs and demo how the platform allows analysts to interactively explore datasets containing tens of billions of records.
 
Topics:
Accelerated Analytics, NVIDIA Inception Program, GIS
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81008
 
Abstract:
A key driver for pushing high-performance computing is the enablement of new research. One of the biggest and most exciting scientific challenges requiring high-performance computing is to decode the human brain. Many of the research topics in this field require scalable compute resources or the use of advanced data analytics methods (including deep learning) for processing extreme-scale data volumes. GPUs are a key enabling technology, and we will thus focus on the opportunities for using them for computing, data analytics, and visualisation. GPU-accelerated servers based on POWER processors are of particular interest here due to the tight integration of CPU and GPU using NVLink and the enhanced data transport capabilities.
 
Topics:
Accelerated Analytics, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23189
 
Abstract:
In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructured-grid approach to accommodate geometric complexity. Furthermore, turbulent flows encountered in aerospace applications generally require highly anisotropic meshes, driving the need for implicit solution methodologies to efficiently solve the discrete equations. To prepare NASA Langley Research Center's FUN3D CFD solver for the future HPC landscape, we port two representative kernels to NVIDIA Pascal and Volta GPUs and present performance comparisons with a common multi-core CPU benchmark.
 
Topics:
Accelerated Analytics
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1710
 
Abstract:
The TSUBAME3 supercomputer at Tokyo Institute of Technology came online in August 2017 and became the greenest supercomputer in the world on the Green 500 list at 14.11 GFlops/W. The other aspect of TSUBAME3 is that it embodies various BYTES-oriented features to allow for HPC to BD/AI convergence at scale, including significant scalable horizontal bandwidth, support for deep memory hierarchy and capacity, and high flops in low-precision arithmetic for deep learning.
 
Topics:
Accelerated Analytics
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1720
 
Abstract:
Artificial intelligence, specifically deep learning, is rapidly becoming an important workload within the high-performance computing space. This talk will present a couple of successful systems design approaches HPE has provided customers to help them enable AI and deep learning within their HPC ecosystem.
 
Topics:
Accelerated Analytics
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1731
 
Abstract:
In this talk, we will look at the current state of high-performance computing and look ahead toward exascale. In addition, we will examine some issues that can help reduce power consumption for linear algebra computations.
 
Topics:
Accelerated Analytics
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1733
 
Abstract:
The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Features like independent thread scheduling and game-changing Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance of any comparable processor. Join two lead hardware and software architects for Volta on a tour of the features that will make Volta the platform for your next innovation in AI and HPC supercomputing.
 
Topics:
Accelerated Analytics
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1739
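A minimal sketch of one Volta feature mentioned in SC1739 above, independent thread scheduling, assuming a standard CUDA toolkit (9.0 or later): warp-level code must pass an explicit participation mask to the *_sync primitives rather than relying on implicit warp-synchronous execution. This is illustrative only, not material from the session.

    // Warp-level sum on Volta: explicit masks instead of implicit warp convergence.
    // Assumes blockDim.x is a multiple of 32 so every lane in the warp is active.
    __device__ float warpReduceSum(float val)
    {
        unsigned mask = __activemask();                // lanes currently converged
        for (int offset = 16; offset > 0; offset >>= 1)
            val += __shfl_down_sync(mask, val, offset);
        return val;                                    // lane 0 holds the warp sum
    }

    __global__ void blockSum(const float *in, float *out, int n)
    {
        float v = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            v += in[i];
        v = warpReduceSum(v);
        if ((threadIdx.x & 31) == 0)
            atomicAdd(out, v);                         // one atomic add per warp
    }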
Algorithms and Numerical Techniques
Abstract:
The present study deals with porting the scalable parallel CFD application HiFUN to NVIDIA GPUs using an off-load strategy. The strategy focuses on improving single-node performance of the HiFUN solver with the help of GPUs. This work clearly brings out the efficacy of the off-load strategy using OpenACC directives on GPUs, and may be considered an attractive model for porting legacy CFD codes to GPU-based HPC and supercomputing platforms.
 
Topics:
Algorithms and Numerical Techniques, Computational Fluid Dynamics, Computer Aided Engineering
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8799
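An illustrative sketch of the OpenACC off-load pattern described in S8799 above; this is not HiFUN source code. Data is kept resident on the GPU across iterations with a data region, and the compute-heavy loop is off-loaded with a parallel loop directive (compiled with an OpenACC compiler such as nvc or pgcc).

    /* Illustrative only: a Jacobi-style sweep off-loaded with OpenACC. */
    void jacobi(float *restrict unew, float *restrict uold,
                const float *restrict f, int n, int iters)
    {
        #pragma acc data copy(uold[0:n]) copyin(f[0:n]) create(unew[0:n])
        {
            for (int it = 0; it < iters; ++it) {
                #pragma acc parallel loop        /* off-loaded kernel */
                for (int i = 1; i < n - 1; ++i)
                    unew[i] = 0.5f * (uold[i - 1] + uold[i + 1]) + f[i];
                #pragma acc parallel loop        /* copy back for the next sweep */
                for (int i = 1; i < n - 1; ++i)
                    uold[i] = unew[i];
            }
        }                                        /* uold is copied back to the host here */
    }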
 
Abstract:
Attendees will learn how the behavior of the human brain is simulated using current computers, and the different challenges the implementation has to deal with. We cover the main steps of the simulation and the methodologies behind it. In particular, we highlight and focus on the transformations and optimizations carried out to achieve good performance on NVIDIA GPUs.
 
Topics:
Algorithms and Numerical Techniques, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23076
 
Abstract:
With the Tegra X1 and Pascal-architecture Tesla P100 GPUs, NVIDIA introduced hardware-based computation on FP16 numbers, also called half-precision arithmetic. We'll introduce the steps required to build a viable benchmark for this new arithmetic format. This will include the connections to established IEEE floating point standards and existing HPC benchmarks. The discussion will focus on performance and numerical stability issues that are important for this kind of benchmarking and how they relate to NVIDIA platforms.
 
Topics:
Algorithms and Numerical Techniques, Deep Learning and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7676
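As a rough illustration of the half-precision arithmetic discussed in S7676 above, the sketch below shows the kind of kernel such a benchmark might time: a packed FP16 axpy using half2 fused multiply-adds. It assumes a GPU with native FP16 arithmetic (e.g., Tegra X1 or Tesla P100) and compilation for a matching architecture; it is not code from the session.

    #include <cuda_fp16.h>

    // y = alpha * x + y on packed half2 values; two FP16 FMAs per thread.
    // n2 is the number of half2 pairs. Host code would build alpha with
    // __float2half2_rn(2.0f) and launch enough threads to cover n2.
    __global__ void haxpy(int n2, __half2 alpha, const __half2 *x, __half2 *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n2)
            y[i] = __hfma2(alpha, x[i], y[i]);
    }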
Artificial Intelligence and Deep Learning
Abstract:
Neural networks have capitalized on recent advances in HPC, GPUs, and GPGPU, and on the rising amount of publicly available labeled data. In doing so, NNs have revolutionized, and will continue to revolutionize, virtually every current application domain, as well as enable novel ones such as recognition-based, autonomous, predictive, resilient, self-managed, adaptive, and evolving applications.
Nevertheless, NN training is rather resource-intensive in data, time, and energy; the resulting trained models are therefore valuable assets and IP imperatively worth protecting.
Furthermore, in the wake of edge computing, NNs are progressively deployed across decentralized landscapes; as a consequence, IP owners are very protective of their NN-based software products.
In this session, we propose to leverage Fully Homomorphic Encryption (FHE) to simultaneously protect the IP of trained NN-based software and the input and output data.
Within the context of a smart city scenario, we outline our NN model-agnostic approach, approximating and decomposing the NN operations into linearized transformations while employing SIMD for vectorization.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8147
 
Abstract:
For your business projects, you want to rely on solid partners to master their development and deployment. How do you avoid the nightmare of cost overruns or missed deadlines? How do you benefit from industrialized solutions rather than demos freshly issued from the lab?
In this session, you will learn how Atos, with a proven set of products and services, helps you accelerate your projects in HPC, enterprise, and Internet of Things domains, from cloud to on-premises and from central to edge, while leveraging the most powerful NVIDIA technologies.
Because AI applications and models rely on secure, reliable, and up-to-date data, this session will also introduce how Atos is managing, updating, and securing data, and will end with a presentation of operational applications in the domains of image recognition, video intelligence, prescriptive maintenance, and cybersecurity.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8492
Astronomy and Astrophysics
Abstract:
Previously, FPGAs were known to be highly energy efficient, but notoriously difficult to program and unsuitable for complex HPC applications. This is changing due to new technology developments: a high-level programming language (OpenCL), hard floating-point units, and tight integration with CPU cores. We'll compare FPGAs and GPUs with respect to architecture, programming model, programming effort, performance, and energy efficiency, using some radio-astronomical signal-processing and imaging algorithms as examples. Can they compete with GPUs?
 
Topics:
Astronomy and Astrophysics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8310
Autonomous Driving
Abstract:
AI is revolutionizing the $10T transportation industry. Every vehicle will be autonomous: cars, trucks, taxis, buses, and shuttles. AI is core to enabling autonomous driving, but AI is also being applied to mobility, logistics, connected vehicles, the connected factory, customer experience, and a myriad of other use cases in automotive. Come learn from experts at Audi, BMW, and VW about how they are applying data ingestion, labeling, discovery, and exploration to develop trained AI models, with significant reductions in the time it takes due to GPU-accelerated computing infrastructures.
 
Topics:
Autonomous Driving
Type:
Panel
Event:
GTC Europe
Year:
2018
Session ID:
E8468
Climate, Weather, Ocean Modeling
Abstract:
We'll discuss the revolution in computing, modeling, data handling, and software development that's needed to advance U.S. weather-prediction capabilities in the exascale computing era. Taking prediction models to cloud-resolving 1-km resolution scales will require an estimated 1,000-10,000 times more computing power, but existing models can't exploit exascale systems with millions of processors. We'll examine how weather-prediction models must be rewritten to incorporate new scientific algorithms, improved software design, and new technologies such as deep learning to speed model execution, data processing, and information processing. We'll also offer a critical and visionary assessment of key technologies and developments needed to advance U.S. operational weather prediction in the next decade.
 
Topics:
Climate, Weather, Ocean Modeling, AI and DL Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9750
Computational Biology and Chemistry
Abstract:
Learn how we're bringing Gromacs up to speed with the latest cutting-edge multi-GPU technology. Gromacs, a simulation package for biomolecular systems, is one of the most highly used HPC applications globally. It already benefits from GPU acceleration to allow fast simulation of large and complex systems. However, as GPUs become more powerful and increasingly sophisticated multi-GPU systems become available, Gromacs must adapt to optimally benefit from the massive performance on offer. We will describe work to port all significant remaining computational kernels to the GPU, and to perform the required inter-GPU communications using peer-to-peer memory copies, such that the GPU is exploited throughout and repeated PCIe transfers are avoided. We will present performance results to show the impact of our developments, and also describe the Gromacs performance model we've created to guide our work.
 
Topics:
Computational Biology and Chemistry, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9270
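A minimal sketch (not Gromacs code) of the peer-to-peer copies mentioned in S9270 above: once peer access is enabled between two GPUs, buffers can be copied device-to-device without staging through host memory. Error checking is omitted for brevity.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);   // is GPU 0 <-> GPU 1 P2P-capable?
        if (!canAccess) { printf("no P2P path between GPU 0 and GPU 1\n"); return 1; }

        size_t bytes = 1 << 20;
        float *buf0, *buf1;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);            // GPU 0 may access GPU 1
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);            // GPU 1 may access GPU 0
        cudaMalloc(&buf1, bytes);

        // Direct GPU-to-GPU copy over NVLink or PCIe, depending on topology.
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }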
 
Abstract:
We'll discuss cuTENSOR, a high-performance CUDA library for tensor operations that efficiently handles the ubiquitous presence of high-dimensional arrays (i.e., tensors) in today's HPC and DL workloads. This library supports highly efficient tensor operations such as tensor contractions (a generalization of matrix-matrix multiplications), point-wise tensor operations such as tensor permutations, and tensor decompositions (a generalization of matrix decompositions). While providing high performance, cuTENSOR also allows users to express their mathematical equations for tensors in a straightforward way that hides the complexity of dealing with these high-dimensional objects behind an easy-to-use API. CUDA 10.1 enables CUDA programmers to utilize Tensor Cores directly with the new mma.sync instruction. In this presentation, we describe the functionality of mma.sync and present strategies for implementing efficient matrix multiply computations in CUDA that maximize performance on NVIDIA Volta GPUs. We then describe how CUTLASS 1.3 provides reusable components embodying these strategies. CUTLASS 1.3 demonstrates a median 44% speedup of CUDA kernels executing layers from real-world Deep Learning workloads.
 
Topics:
Computational Biology and Chemistry, Tools and Libraries, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9593
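For readers unfamiliar with mma.sync, the CUDA C++ WMMA API (which compiles down to such Tensor Core instructions) gives a feel for the programming model. The sketch below is a minimal illustration, not CUTLASS or cuTENSOR code: one warp computes a single 16x16 tile of D = A*B + C with FP16 inputs and an FP32 accumulator. It requires sm_70 or later and should be launched with one warp.

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    __global__ void tile_mma(const half *A, const half *B, float *D)
    {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

        wmma::fill_fragment(acc, 0.0f);              // C = 0
        wmma::load_matrix_sync(a, A, 16);            // leading dimension 16
        wmma::load_matrix_sync(b, B, 16);
        wmma::mma_sync(acc, a, b, acc);              // Tensor Core multiply-accumulate
        wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
    }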
 
Abstract:
Learn how high-resolution imaging is revolutionizing science and dramatically changing how we process, analyze, and visualize at this new scale. We will show the journey a researcher can take to produce images capable of winning a Nobel prize. We'll review the last two years of development in single-particle cryo-electron microscopy processing, with a focus on accelerated software, and discuss benchmarks and best practices for common software packages in this domain. Our talk will include videos and images of atomic resolution molecules and viruses that demonstrate our success in high-resolution imaging.
 
Topics:
Computational Biology and Chemistry, In-Situ and Scientific Visualization, Data Center and Cloud Infrastructure, HPC and AI, Medical Imaging and Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9664
 
Abstract:
The existing drug discovery process is costly, slow, and in need of innovation. At ATOM, a public-private consortium consisting of LLNL, GSK, UCSF, and FNL, we built an HPC-driven drug discovery pipeline that is supported by GPU-enabled supercomputers and containerized infrastructure. We'll describe the pipeline's infrastructure, including our data lake and model zoo, and share lessons learned along the way. We'll discuss the data-driven modeling pipeline we're using to create thousands of optimized models and the critical role of GPUs in this work. We'll also share model performance results and touch on how these models are integral to ATOM's new drug discovery paradigm. By building GPU-accelerated tools, we aim to transform drug discovery from a time-consuming and sequential process to a highly parallelized and integrated approach.
 
Topics:
Computational Biology and Chemistry, AI and DL Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9950
 
Abstract:
We will demonstrate the features and capabilities of OpenACC for porting and optimizing the ParDOCK docking module of the Sanjeevini suite for computer-aided drug discovery, developed at the HPC and Supercomputing Facility for Bioinformatics and Computational Biology at the Indian Institute of Technology Delhi. We have used OpenACC to efficiently port the existing C++ programming model of the ParDOCK software with minimal code modifications to run on the latest NVIDIA P100 GPU. These code modifications and tuning resulted in an average six-times speedup in turnaround time. With OpenACC, the code is now able to sample ten times more ligand conformations, leading to an increase in accuracy. The OpenACC-ported ParDOCK code now predicts a correct pose of a protein-ligand interaction 96.8 percent of the time, compared to 94.3 percent earlier (for poses under 1 A), and 89.9 percent of the time compared to 86.7 percent earlier (for poses under 0.5 A).
 
Topics:
Computational Biology and Chemistry, Performance Optimization, Bioinformatics & Genomics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8188
 
Abstract:
We'll describe recent work to map comparative genomics algorithms to GPU-accelerated leadership-class systems. The explosion in availability of genomic data holds promise for enabling determination of the genetic causes of phenotypic characteristics, with applications to problems such as the discovery of the genetic roots of diseases. The growing sizes of these datasets and the quadratic and cubic scaling properties of the algorithms necessitate use of leadership-scale accelerated computing. We'll discuss the mapping of two-way and three-way algorithms for comparative genomics calculations to large-scale GPU-accelerated systems. Focusing primarily on the Proportional Similarity metric and the Custom Correlation Coefficient, we'll discuss issues of optimal mapping of the algorithms to GPUs, eliminating redundant calculations due to symmetries, and efficient mapping to many-node parallel systems. We'll also present results scaled to thousands of GPUs on the ORNL Titan system.
 
Topics:
Computational Biology and Chemistry, Algorithms and Numerical Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7156
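As a rough, hedged illustration of the metrics named in S7156 above (this is not the ORNL implementation): the Proportional Similarity, or Czekanowski, metric between two non-negative vectors that each sum to one is commonly written as the sum of element-wise minima. A naive CUDA kernel for the two-way, all-pairs case might look like the following; the production code additionally exploits symmetry and distributes the work across many nodes.

    // vecs: n x d row-major matrix of normalized, non-negative vectors.
    // sim:  n x n output; sim[i*n + j] = sum_k min(vecs[i][k], vecs[j][k]).
    __global__ void proportional_similarity(const float *vecs, float *sim,
                                            int n, int d)
    {
        int i = blockIdx.y * blockDim.y + threadIdx.y;
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n || j >= n) return;

        const float *p = vecs + (size_t)i * d;
        const float *q = vecs + (size_t)j * d;
        float s = 0.0f;
        for (int k = 0; k < d; ++k)
            s += fminf(p[k], q[k]);
        sim[(size_t)i * n + j] = s;
    }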
 
Abstract:
Imaging datasets are becoming larger and larger as new-generation equipment provides higher-definition imaging and scanning modalities. Part of analysing these datasets involves choosing the optimal hardware and software. We'll look at the design choices made and the workflow for processing cryo-electron microscopy data, with results from an NVIDIA DGX-1 and cloud-provisioned HPC.
 
Topics:
Computational Biology and Chemistry, Healthcare and Life Sciences, Video and Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7232
Computational Fluid Dynamics
Abstract:
Learn how we explored the feasibility of porting YALES2 to GPUs. YALES2 is an HPC application for turbulent combustion modeling, from primary atomization to pollutant prediction, on massive complex meshes. It runs over thousands of CPU cores, solving meshes of several billion elements through MPI+OpenMP programming. The work presented here focuses on a preliminary feasibility study of GPU porting. In this session we will describe a methodology for porting a large code to GPUs, the choices that were made regarding the different constraints, and the performance results. We will also present the final benchmarks run across several platforms, from a classic Intel+Kepler cluster at the ROMEO HPC Center (University of Reims, France) to IBM POWER8+Pascal prototypes at IDRIS (CNRS, France).
 
Topics:
Computational Fluid Dynamics, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23254
 
Abstract:
Learn how one of the leading institutes for global weather prediction, the European Centre for Medium-Range Weather Forecasts (ECMWF), is preparing for exascale supercomputing and the efficient use of future HPC hardware. I will name the main reasons why it is difficult to design efficient weather and climate models and provide an overview of the ongoing community effort to achieve the best possible model performance on existing and future HPC architectures. I will present the EU H2020 projects ESCAPE and ESiWACE and discuss recent approaches to increase computing performance in weather and climate modelling, such as the use of reduced numerical precision and deep learning.
 
Topics:
Computational Fluid Dynamics, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23348
Computational Physics
Abstract:
AWS offers the most powerful GPU-accelerated cloud infrastructure, delivering unparalleled computational efficiency for advanced engineering simulations and analysis and enabling high-performance computing (HPC) workloads to run in the cloud at scale. This session features a real-world use case from the advanced product engineering team at Western Digital, which is using HPC solutions to model new technologies and capabilities prior to production. Western Digital's computational tools incorporate a description of the physics occurring during the HDD recording process and ultimately produce input to a recording subsystem channel model, which produces an error rate. The length scales involved in the recording model range from a few nanometers in the description of the recording media to microns in the description of the recording head. The power of the current generation of NVIDIA GPUs allows Western Digital to generate enough simulation data that the same recording subsystem channel model used in experiments can be employed in studies that include fabrication process variances.
 
Topics:
Computational Physics, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81041
 
Abstract:
We explore using OpenACC to migrate applications required for modeling solar storms from CPU HPC clusters to an "in-house" multi-GPU system. We describe the software pipeline and the utilization of OpenACC in the computationally heavy codes. A major step forward is the initial implementation of OpenACC in our magnetohydrodynamics code MAS. Strategies for overcoming some of the difficulties encountered are discussed, including handling Fortran derived types, array reductions, and performance tuning. Production-level time-to-solution results will be shown for multi-CPU and multi-GPU systems of various sizes. The timings show that it is possible to achieve acceptable times-to-solution on a single multi-GPU server or workstation for problems that previously required multiple HPC CPU nodes.
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8847
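The array reductions mentioned in S8847 above are a common stumbling block in OpenACC ports. The MAS code itself is Fortran; the sketch below shows the equivalent pattern in C, purely for illustration: without the reduction clause the parallel loop would race on the scalar result.

    #include <math.h>

    /* Max-norm of a residual array, computed on the GPU. */
    float max_residual(const float *restrict r, int n)
    {
        float res = 0.0f;
        #pragma acc parallel loop reduction(max:res) copyin(r[0:n])
        for (int i = 0; i < n; ++i) {
            float a = fabsf(r[i]);
            if (a > res) res = a;
        }
        return res;
    }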
Computer Vision
Abstract:
The functional mapping of man-made facilities from high-resolution remote sensing images provides timely, high-fidelity land-use information and population distribution estimates, which helps federal agencies, non-governmental organizations, and industry operate more efficiently. We'll share our journey to deliver functional maps of the world that include building extraction, human settlement maps, mobile home parks, and facility mapping using a variety of remote sensing imagery. Our research addresses three frontier challenges: 1) distinct characteristics of remote sensing data for deep learning (including the model distribution shifts encountered with remote sensing images), multisensor sources, and data multi-modalities; 2) training very large deep learning models using multi-GPU and multi-node HPC platforms; 3) large-scale inference using ORNL's Titan and Summit with NVIDIA TensorRT. We'll also talk about developing workflows to minimize I/O inefficiency, doing parallel gradient-descent learning, and managing remote sensing data in an HPC environment.
 
Topics:
Computer Vision, GIS, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8420
Data Center and Cloud Infrastructure
Abstract:
We'll discuss how Microsoft and NVIDIA have partnered to bring a broad portfolio of GPU products to Azure to support the demands of the most bleeding-edge customers. Our talk will cover how Azure's industry-leading accelerator technology, delivered in multiple formats, puts demanding applications in an environment in which needed resources are available on demand. From high-performance networking and storage to AI-aware cluster management and job orchestration tools, Azure takes the work out of running high-performance workloads.
 
Topics:
Data Center and Cloud Infrastructure, GPU Virtualization
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91017
 
Abstract:
Migrating and building solutions in the cloud can be challenging, expensive, and not nearly as performant as running on-premises. Oracle Cloud Infrastructure (OCI) has been working with NVIDIA on giving you the on-premises performance you need with the cloud benefits and flexibility you expect. In this session we'll discuss how you can take big data and analytics workloads, database workloads, or traditional enterprise HPC workloads that require multiple components along with a portfolio of accelerated hardware, and not only migrate them to the cloud but run them successfully. We'll discuss solution architectures, showcase demos and benchmarks, and take you through the cloud migration journey. We'll detail the latest instances that OCI provides, along with cloud-scale services.
 
Topics:
Data Center and Cloud Infrastructure, Performance Optimization
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91026
 
Abstract:
Learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and examine what special requirements these techniques impose on the system. We'll also give an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing that accelerates large-scale deployments in HPC and artificial intelligence.
 
Topics:
Data Center and Cloud Infrastructure, Deep Learning and AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9268
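One common software layer that exploits the RDMA and GPUDirect capabilities described in S9268 above is NCCL, whose collectives underpin gradient averaging in distributed training. The sketch below is a minimal single-process, multi-GPU all-reduce, included purely as an illustration of the interface (error checking omitted); over InfiniBand, NCCL selects RDMA and GPUDirect transports automatically.

    #include <nccl.h>
    #include <cuda_runtime.h>

    int main()
    {
        const int nDev = 2;
        const size_t count = 1 << 20;                // elements per GPU
        int devs[nDev] = {0, 1};
        ncclComm_t comms[nDev];
        cudaStream_t streams[nDev];
        float *buf[nDev];

        ncclCommInitAll(comms, nDev, devs);          // one communicator per GPU
        for (int i = 0; i < nDev; ++i) {
            cudaSetDevice(i);
            cudaMalloc(&buf[i], count * sizeof(float));
            cudaStreamCreate(&streams[i]);
        }

        ncclGroupStart();                            // issue all ranks together
        for (int i = 0; i < nDev; ++i)
            ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                          comms[i], streams[i]);
        ncclGroupEnd();

        for (int i = 0; i < nDev; ++i) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            ncclCommDestroy(comms[i]);
            cudaFree(buf[i]);
        }
        return 0;
    }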
 
Abstract:
Do you have a GPU cluster or air-gapped environment that you are responsible for but don't have an HPC background? NVIDIA DGX POD is a new way of thinking about AI infrastructure, combining DGX servers with networking and storage to accelerate AI workflow deployment and time to insight. We'll discuss lessons learned about building, deploying, and managing AI infrastructure at scale, from design to deployment to management and monitoring. We will show how the DGX POD management software (DeepOps), along with our storage partner reference architectures, can be used for the deployment and management of multi-node GPU clusters for deep learning and HPC environments in an on-premises, optionally air-gapped datacenter. The modular nature of the software also allows experienced administrators to pick and choose items that may be useful, making the process compatible with their existing software or infrastructure.
 
Topics:
Data Center and Cloud Infrastructure, AI Application Deployment and Inference
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9334
 
Abstract:
Whether it's for AI, data science and analytics, or HPC, GPU-accelerated software can make possible the previously impossible. But it's well known that these cutting-edge software tools are often complex to use, hard to manage, and difficult to deploy. We'll explain how NGC solves these problems and gives users a head start on their projects by simplifying the use of GPU-optimized software. NVIDIA product management and engineering experts will walk through the latest enhancements to NGC and give examples of how software from NGC can improve GPU-accelerated workflows.
 
Topics:
Data Center and Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9504
 
Abstract:
NVIDIA offers several containerized applications in HPC, visualization, and deep learning. We have also enabled a broad array of container-related technologies for GPUs with upstreamed improvements to community projects and with tools that are seeing broad interest and adoption. In addition, NVIDIA is a catalyst for the broader community in enumerating key technical challenges for developers, admins, and end users, and is helping to identify gaps and drive them to closure. Our talk describes NVIDIA's new developments and upcoming efforts. We'll detail progress in the most important technical areas, including multi-node containers, security, and scheduling frameworks. We'll also offer highlights of the breadth and depth of interactions across the HPC community that are making the latest, high-quality HPC applications available to platforms that include GPUs.
 
Topics:
Data Center and Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9525
 
Abstract:
Users of HPC systems have diverse needs and requirements for their applications and ML/DL environments. Containers help streamline and simplify environment creation, but security concerns generally prohibit popular container environments such as Docker from running in shared computing environments. Alternate container systems for HPC address these security concerns but have less documentation and fewer resources available for users. We'll describe how our pipeline and resources at MITRE enable users to quickly build custom environments and run their code on the HPC system while minimizing startup time. Our process implements LXD containers, Docker, and Singularity on a combination of development and production HPC systems using a traditional scheduler.
 
Topics:
Data Center and Cloud Infrastructure, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9958
 
Abstract:
The impact of the recent Spectre and Meltdown security vulnerabilities has reached every corner of the compute ecosystem. Red Hat's Performance Engineering team has a keen interest in quantifying a wide variety of workloads in order to provide feedback to upstream developers working on these problems. This presentation will detail our team's involvement over the last several months, share selected performance impacts from a variety of common enterprise and HPC workloads, explain how to potentially mitigate overheads, and inform the audience about what's being done to reduce impacts going forward.
 
Topics:
Data Center and Cloud Infrastructure, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81017
 
Abstract:
The Krylov Project is the key component in eBay's AI Platform initiative, providing an easy-to-use, open, and fast AI orchestration engine deployed as managed services in eBay's cloud. The main goals of the project are that every AI and machine learning algorithm should be shareable and easily implementable with a choice of frameworks; that machine learning engineers can build end-to-end training pipelines that distribute and parallelize over many machines; that model training should be automated and allow easy access to vast eBay datasets; and that engineers should be able to search past job submissions, view results, and share them with others. We have built Krylov from the ground up, leveraging the JVM, Python, and Go as the main technologies for the Krylov components, while standing on the shoulders of technology giants such as Docker, Kubernetes, and Apache Hadoop. Using Krylov, AI scientists can access eBay's massive datasets; build and train AI models; spin up powerful compute (high-memory or GPU instances) on the Krylov HPC cluster; and set up machine learning pipelines, such as using declarative constructs that stitch together the pipeline lifecycle.
 
Topics:
Data Center and Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8277
 
Abstract:
A discussion and demonstration of the potential of running HPC and VDI workloads on common clusters, a modern-datacenter Dr. Jekyll and Mr. Hyde scenario. Explore the coexistence of CUDA-based HPC job engines with both Linux and Windows machines used for virtual desktop infrastructure. The demonstration will focus on a minimal VMware cluster deployment using VSAN storage to host both a Linux multi-node HPC cluster for CUDA workloads and a VMware Horizon View deployment for Linux and Windows virtual desktops performing DirectX, OpenGL, and CUDA-based visualization workloads as used by engineering and analysis power users.
 
Topics:
Data Center and Cloud Infrastructure, GPU Virtualization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8209
 
Abstract:
Are you wondering whether the cloud is relevant to HPC and how it works? Increasingly, applications in high-performance computing are using containers to ease deployment. In this talk, you'll learn what containers are, how they are orchestrated to run together in the cloud, and how communication among containers works. You'll get a snapshot of current support from the ecosystem, and gain insight into why NVIDIA is leading the charge to provide best performance and usability.
 
Topics:
Data Center and Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8642
 
Abstract:
Attendees will learn how NVIDIA's Jetson TX-series processors can be scaled out to create an adaptive HPC and supercomputing platform for bespoke deployments and edge computing environments. Advancements in composable infrastructure technology now make it possible to pool and orchestrate Jetson processors for deployments with specialized parallel computing requirements. Use cases include Jetson deployments in non-embedded environments for edge computing where traditional HPC architectures are not hospitable. Clusters of NVIDIA Jetson TX-series devices can be deployed in edge compute environments connected to arrays of sensors for neural net training, pattern recognition, and deep learning. Applications for autonomous transportation can also benefit from clustering massive numbers of Jetson TX-series devices to simulate fleets of vehicles to train machine learning algorithms in parallel. Jetson use cases can be expanded well beyond embedded applications when deployed with PCIe-based fabric composable infrastructure technology, permitting a 16x networking performance improvement over the embedded 1Gb Ethernet interface.
 
Topics:
Data Center and Cloud Infrastructure, Graphics and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8539
Streaming:
Download:
Share:
 
Abstract:
Enterprise Digital Workspaces support diverse workloads including virtual desktops, deep learning, and big data. NVIDIA GPUs bring high performance computing (HPC) to graphics, GPGPU, and especially machine learning workloads. They also provide hardware encode an ...Read More
Abstract:
Enterprise Digital Workspaces support diverse workloads including virtual desktops, deep learning, and big data. NVIDIA GPUs bring high performance computing (HPC) to graphics, GPGPU, and especially machine learning workloads. They also provide hardware encode and decode to accelerate the processing of video content. In this session, we will explore the performance and resource utilization of various workloads that leverage different capabilities of the GPU, such as graphics, compute, and H.264 hardware encode/decode. NVIDIA virtualized GPUs and VMware vSphere bring tremendous combined benefits for both GPU-based workloads and data center management via virtualization. We will present results of our research on running diverse workloads on the vSphere platform using NVIDIA GRID GPUs. We explore the vSphere features of suspend/resume and vMotion of vGPU-based virtual machines. We will quantify the benefits of vGPU for data center management using VMware vSphere and describe techniques for efficient management of workloads and datacenter resources.  Back
 
Topics:
Data Center and Cloud Infrastructure, GPU Virtualization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8250
Streaming:
Download:
Share:
 
Abstract:
NVIDIA DCGM is a monitoring and management daemon, GPU Diagnostic, and SDK geared towards managing GPUs in a cluster environment. DCGM is widely deployed both internally at NVIDIA and externally at large HPC labs and Cloud Service Providers. We will ...Read More
Abstract:
NVIDIA DCGM is a monitoring and management daemon, GPU Diagnostic, and SDK geared towards managing GPUs in a cluster environment. DCGM is widely deployed both internally at NVIDIA and externally at large HPC labs and Cloud Service Providers. We will go over the core features of DCGM and features that have been added in the last year. We will also demonstrate how DCGM can be used to monitor GPU health and alert on GPU errors using both the dcgmi command-line tools and the DCGM SDK.  Back
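The sketch below is a hypothetical helper, not material from the session: it shells out to two dcgmi subcommands (dcgmi discovery -l and dcgmi diag -r 1) to list GPUs and run the quick diagnostic, which is the kind of health gating the abstract describes; exact flags can vary between DCGM versions.

```python
# Hypothetical helper (not from the session): wraps the dcgmi command-line
# tool with subprocess to list GPUs and run the quick (level 1) diagnostic.
import subprocess

def run_dcgmi(args):
    """Run a dcgmi subcommand and return its stdout, raising on a non-zero exit."""
    result = subprocess.run(["dcgmi"] + args, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(run_dcgmi(["discovery", "-l"]))   # list the GPUs DCGM can see on this node
    print(run_dcgmi(["diag", "-r", "1"]))   # quick health gate before scheduling jobs
```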
 
Topics:
Data Center and Cloud Infrastructure, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8505
Streaming:
Download:
Share:
 
Abstract:
Pre and post process CAE data near your cloud compute to save time, money, and IT headaches. Whether you're building the next supercar or visualizing a medical dataset, you can now eliminate the need for data transfer to and from on-premises by runn ...Read More
Abstract:
Pre and post process CAE data near your cloud compute to save time, money, and IT headaches. Whether you're building the next supercar or visualizing a medical dataset, you can now eliminate the need for data transfer to and from on-premises by running professional design and engineering applications in the cloud. See new Oracle Cloud Infrastructure GPUs in live demonstrations of data transfer, CAD pre-processing, and CAE post processing.  Back
 
Topics:
Data Center and Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8988
Streaming:
Share:
 
Abstract:
New algorithms leverage the algebraic strengths of GPUs far beyond rendering visuals. They unlock opportunities for data analysis leveraging computer vision and artificial neural networks. Earlier this year we set out to investigate the deployment of ...Read More
Abstract:
New algorithms leverage the algebraic strengths of GPUs far beyond rendering visuals. They unlock opportunities for data analysis leveraging computer vision and artificial neural networks. Earlier this year we set out to investigate the deployment of power-efficient GPUs in commodity hardware. We did not focus on supercomputers, but instead exercised GPUs within a homogeneous set of compute nodes like those used to scale Apache Hadoop or Apache Spark clusters. Our work focused on inference (deploying models and GPU acceleration for analysis tasks such as feature extraction, identification, and classification), not on training or building models, tasks likely better suited to HPC-class machines. Our experiments investigated applications that aren't feasible at scale on existing CPUs, such as malware detection and object detection in images. We'll cover inference on Tesla P4 GPUs in scale-out architectures, leveraging nvidia-docker, Caffe, Torch, and TensorRT.  Back
 
Topics:
Data Center and Cloud Infrastructure, Deep Learning and AI, AI for Accelerated Analytics
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7190
Download:
Share:
 
Abstract:
Why do HPC in a cloud? How to do HPC (aaS), with GPU passthrough, in OpenStack? How to create a full GPU HPC cluster, from scratch, on demand, in under five minutes, fully equipped with NVIDIA's DCGM, the CUDA environment, and deep learning libraries/framew ...Read More
Abstract:
Why do HPC in a cloud? How to do HPC (aaS), with GPU passthrough, in OpenStack? How to create a full GPU HPC cluster, from scratch, on demand, in under five minutes, fully equipped with NVIDIA's DCGM, the CUDA environment, and deep learning libraries/frameworks? Hybrid clouds with GPUs spanning OpenStack and AWS? How to easily and automatically move HPC user data and workloads between the private and public cloud? How to dynamically scale a virtualized HPC cluster, both horizontally (within the private cloud) and vertically (to the public cloud)? We'll answer these questions during a deep dive into the world of HPC on top of OpenStack and AWS. We'll discuss many ways OpenStack private clouds can be used for bursting HPC workloads, HPC-as-a-service, XaaS (anything-as-a-service), and creating hybrid clouds composed of an on-prem private/community OpenStack cloud deployment that dynamically scales to public clouds like AWS. The session includes a demo.  Back
 
Topics:
Data Center and Cloud Infrastructure, Federal, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7161
Download:
Share:
 
Abstract:
Learn how you can scale your Deep Learning & traditional HPC-based workloads in Azure using powerful NVIDIA Tesla-based GPUs and scale out using Azure's low-latency networking backed by InfiniBand infrastructure. This is a great session ...Read More
Abstract:

Learn how you can scale your Deep Learning & traditional HPC-based workloads in Azure using powerful NVIDIA Tesla-based GPUs and scale out using Azure's low-latency networking backed by InfiniBand infrastructure. This is a great session to learn about Azure's accelerated offerings and roadmap in the future. This session will cover specific announcements on what's to come in both hardware and software. This is a session you don't want to miss! 

  Back
 
Topics:
Data Center and Cloud Infrastructure, Deep Learning and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7204
Download:
Share:
 
Abstract:
Learn how you can integrate GPU-enabled HPC and cloud computing by building on recent container technologies. This presentation will highlight the efforts we are making as part of the EU Horizon 2020 project CloudLighting, where ...Read More
Abstract:

Learn how you can integrate GPU-enabled HPC and cloud computing by building on recent container technologies. This presentation will highlight the efforts we are making as part of the EU Horizon 2020 project CloudLighting, where we look at how to integrate heterogeneous computing with cloud technologies.

  Back
 
Topics:
Data Center and Cloud Infrastructure, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7296
Download:
Share:
 
Abstract:
Conducting deep learning research and development requires a combination of cutting-edge hardware, elastic software frameworks, and a collaborative research community. We'll provide the scaffolding for participants to construct an enterprise-scale, ...Read More
Abstract:
Conducting deep learning research and development requires a combination of cutting-edge hardware, elastic software frameworks, and a collaborative research community. We'll provide the scaffolding for participants to construct an enterprise-scale, GPU-enabled high performance computing solution for machine learning and data science by drawing on the experiences gained while IBM Research built its Cognitive Computing Cluster. We'll start by discussing how to build a secure, shared-resource computing cluster optimized for deep learning. Next, we'll cover how to provide deep learning frameworks supporting speech, vision, language, and text processing and their underlying primitives. Finally, we'll discuss how to build a best practice knowledge base to improve research quality and accelerate discovery.  Back
 
Topics:
Data Center and Cloud Infrastructure, Deep Learning and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7350
Download:
Share:
 
Abstract:
M3 is the latest generation system of the MASSIVE project, an HPC facility specializing in characterization science (imaging and visualization). Using OpenStack as the compute provisioning layer, M3 is a hybrid HPC/cloud system, custom-integrated by ...Read More
Abstract:
M3 is the latest generation system of the MASSIVE project, an HPC facility specializing in characterization science (imaging and visualization). Using OpenStack as the compute provisioning layer, M3 is a hybrid HPC/cloud system, custom-integrated by Monash's R@CMon Research Cloud team. Built to support Monash University's high-throughput instrument processing requirements, M3 is roughly half GPU-accelerated and half CPU-only. We'll discuss the design and tech used to build this innovative platform, as well as detail approaches and challenges to building GPU-enabled and HPC clouds.  Back
 
Topics:
Data Center and Cloud Infrastructure, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7366
Download:
Share:
 
Abstract:
We'll explain the strategy on how to design large-scale deep learning platforms using HPC and Docker technology to realize high-performance training and scoring on GPU clusters. Topics will include how to analyze the deep learning GPU application's ...Read More
Abstract:
We'll explain the strategy on how to design large-scale deep learning platforms using HPC and Docker technology to realize high-performance training and scoring on GPU clusters. Topics will include how to analyze the deep learning GPU application's characteristics, such as GPU memory bandwidth, memory capacity, and GPU utilization when run on a GPU cluster with Teye tool; how to handle big data and improve the data reading performance with Lustre; how to optimize the network communication with IB technology; and how to ease deployment and scheduling different deep learning frameworks on a large GPU cluster with Docker.  Back
 
Topics:
Data Center and Cloud Infrastructure, Deep Learning and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7678
Download:
Share:
 
Abstract:
Learn why Scyld Cloud Workstation, a browser-based, high-quality, low-bandwidth, 3D-accelerated desktop can greatly streamline cloud HPC workflows. Penguin's Scyld Cloud Workstation provides significant time savings by eliminating the need f ...Read More
Abstract:

Learn why Scyld Cloud Workstation, a browser-based, high-quality, low-bandwidth, 3D-accelerated desktop can greatly streamline cloud HPC workflows. Penguin's Scyld Cloud Workstation provides significant time savings by eliminating the need for downloading large data files when using the cloud for HPC workflows. For example, a typical manufacturing engineer using a Scyld Cloud Workstation can run real-time, interactive GUI tools and 3D visualizations in the cloud, removing the traditional barrier of moving large files down to on-premises workstations. Additionally, since there is no browser plug-in or application installation needed, the difficulty of security changes is eliminated -- allowing for easy integration with industry security policies.

  Back
 
Topics:
Data Center and Cloud Infrastructure, Other, Deep Learning and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7818
Download:
Share:
Data Center, Cloud Computing & HPC
Presentation
Media
Abstract:
NVIDIA GPU Cloud (NGC) is a GPU-accelerated platform that runs everywhere. NGC manages a catalog of fully integrated and optimized deep learning framework containers that are composed, tested, and tuned by NVIDIA to take full advantage of NVIDIA Volt ...Read More
Abstract:
NVIDIA GPU Cloud (NGC) is a GPU-accelerated platform that runs everywhere. NGC manages a catalog of fully integrated and optimized deep learning framework containers that are composed, tested, and tuned by NVIDIA to take full advantage of NVIDIA Volta powered infrastructure in the cloud or on-premises, providing massive deep learning performance and flexibility for analytics, research, and data science. In this session, you will learn how NGC can make it easier for you to get up and running with AI and deep learning.  Back
 
Topics:
Data Center, Cloud Computing & HPC
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1732
Download:
Share:
 
 
Topics:
Data Center, Cloud Computing & HPC
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1719
Share:
 
Abstract:
Azure N-series VMs powered by NVIDIA GPUs enable a range of new accelerated scenarios. Learn how you can take advantage of different GPU offerings in Microsoft Azure to accelerate your scenarios like ray-traced rendering, machine learning, remote vi ...Read More
Abstract:
Azure N-series VMs powered by NVIDIA GPUs enable a range of new accelerated scenarios. Learn how you can take advantage of different GPU offerings in Microsoft Azure to accelerate your scenarios like ray-traced rendering, machine learning, remote visualization, etc.  Back
 
Topics:
Data Center, Cloud Computing & HPC
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1724
Share:
 
Abstract:
TBA
 
Topics:
Data Center, Cloud Computing & HPC
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1730
Share:
Deep Learning and AI
Presentation
Media
Abstract:
Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present the state of t ...Read More
Abstract:
Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present the state of the art techniques for distributed machine learning and what special requirements they impose on the system, followed by an overview of interconnect technologies used to scale and accelerate distributed machine learning including RDMA, NVIDIA's GPUDirect technology, and in-network computing used to accelerate large scale deployments in HPC and artificial intelligence.  Back
 
Topics:
Deep Learning and AI
Type:
Special Event
Event:
GTC Israel
Year:
2018
Session ID:
SIL8145
Streaming:
Share:
 
Abstract:
The University of Queensland needed to solve problems at a scale that had never been contemplated before. Enormous challenges in the field of scientific research imaging, modeling and analysis on the path to cure diseases such as Alzheimer's ...Read More
Abstract:

The University of Queensland needed to solve problems at a scale that had never been contemplated before. Enormous challenges in the field of scientific research imaging, modeling and analysis on the path to cure diseases such as Alzheimer’s and increasingly demanding cases in machine vision for digital skin cancer pathology were all mounting up against traditional HPC infrastructure. UQ took a considered leap towards GPU. This is UQ's architectural journey – how it built one of the most successful supercomputing facilities the state had ever created, the ways in which key components and architectural choices play a pivotal role in artificial intelligence and inference solving performance and a “whole of system” balance approach to getting HPC “right” in the era of GPU. A presentation for C-level, AI practitioners and HPC professional attendees alike, this talk will provide something useful and refreshing for all that have the chance to attend.

  Back
 
Topics:
Deep Learning and AI
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS8015
Streaming:
Download:
Share:
 
Abstract:
In the time that someone takes to read this abstract, another could solve a detective puzzle if only they had enough quantitative evidence on which to prove their suspicions. But also, one could use visualisation, AI and computational tools to s ...Read More
Abstract:

In the time that someone takes to read this abstract, another could solve a detective puzzle if only they had enough quantitative evidence on which to prove their suspicions. But also, one could use visualisation, AI and computational tools to seek a new cure for cancer or predict hospitalisation prevention. This presentation will demonstrate visual analytics techniques that use various mixed reality approaches that link simulations and AI with collaborative, complex and interactive data exploration, placing the human-in-the-loop. In recent times, thanks to advances in graphics hardware and compute power (especially GPGPU and modern Big Data / HPC infrastructures), the opportunities are immense, especially in improving our understanding of complex models that represent the real or hybrid worlds. Use cases will be drawn from ongoing research at CSIRO Data61, and the Expanded Perception and Interaction Centre (EPICentre) UNSW, using world class GPU clusters and high-end visualisation capabilities. Highlights will be on Defence projects, Massive Graph Visualisation and Medicine.

  Back
 
Topics:
Deep Learning and AI
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80017
Download:
Share:
 
Abstract:
Since the concept of the Turing machine was first proposed in 1936, the capability of machines to perform intelligent tasks has grown exponentially. Artificial Intelligence (AI), as an essential accelerator, pursues the target of making machin ...Read More
Abstract:
Since the concept of the Turing machine was first proposed in 1936, the capability of machines to perform intelligent tasks has grown exponentially. Artificial Intelligence (AI), as an essential accelerator, pursues the target of making machines as intelligent as human beings. It has already reformed how we live, work, learn, discover and communicate. In this talk, I will review our recent progress on AI by introducing some representative advancements from algorithms to applications, and illustrate the steps to its realization, from perceiving to learning, reasoning and behaving. To push AI from the narrow to the general, many challenges lie ahead. I will bring some examples out into the open, and shed light on our future targets. Today, we teach machines how to be as intelligent as ourselves. Tomorrow, they will be our partners stepping into our daily life. HPC services are rapidly evolving to meet the demands of an AI-intensive research landscape. At the University of Sydney we have embraced the rapid change in technology to build a dynamic, hybrid HPC system called Artemis, focusing on smart resourcing and a heterogeneous architecture to support our academics and students with their groundbreaking research. Artemis 3 was the first university supercomputer to deploy the NVIDIA V100 at scale and now represents a flagship capability for the University and its partners.  Back
 
Topics:
Deep Learning and AI
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80031
Streaming:
Download:
Share:
 
Abstract:
Come join us, and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present the stat ...Read More
Abstract:

Come join us, and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present the state of the art techniques for distributed machine learning, and what special requirements they impose on the system, followed by an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing used to accelerate large-scale deployments in HPC and artificial intelligence.

  Back
 
Topics:
Deep Learning and AI
Type:
Talk
Event:
GTC Israel
Year:
2017
Session ID:
SIL7120
Download:
Share:
 
Abstract:
We'll introduce PowerAI and the S822LC for HPC. PowerAI is an optimized software stack for AI designed to take advantage of Power processor performance features, and in particular of the new NVLink interface between Power and the NVIDIA Tesla P100 G ...Read More
Abstract:
We'll introduce PowerAI and the S822LC for HPC. PowerAI is an optimized software stack for AI designed to take advantage of Power processor performance features, and in particular of the new NVLink interface between Power and the NVIDIA Tesla P100 GPU accelerator, first introduced with the S822LC for HPC. We'll introduce performance enhancements in PowerAI, including IBM Caffe with performance optimization centered on enhanced communications, and other enhancements to frameworks, libraries, and the deep learning ecosystem for Power. With its high-performance NVLink connection, the new generation S822LC for HPC server is the first that offers a sweet spot of scalability, performance, and efficiency for deep learning applications. Together, these hardware and software enhancements enabled the first release of PowerAI to achieve best-in-industry training performance for AlexNet and VGGNet.  Back
 
Topics:
Deep Learning and AI, Tools and Libraries, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7368
Download:
Share:
 
Abstract:
A variety of deep learning frameworks now make it simple to train deep neural networks of many types. However, scaling deep learning frameworks to large models with data parallel training on many GPUs remains a challenge, as the default utilities for ...Read More
Abstract:
A variety of deep learning frameworks now make it simple to train deep neural networks of many types. However, scaling deep learning frameworks to large models with data parallel training on many GPUs remains a challenge, as the default utilities for inter-device and inter-node communication provided by these frameworks are often not optimal. Using examples from several frameworks, we demonstrate that linear strong scaling to many nodes and many devices can be achieved by augmenting deep learning frameworks with CUDA-aware MPI allreduce and allgather operations, which allow them to be used in an HPC setting where multi-GPU nodes are augmented with high-speed InfiniBand interconnects. We'll show that these operations allow us to quickly train very large speech recognition models.  Back
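As a concrete illustration of the pattern described above (our sketch, not the presenters' code), the snippet averages per-rank gradients with an MPI Allreduce via mpi4py and NumPy; with a CUDA-aware MPI build, the same call can operate directly on GPU buffers.

```python
# Minimal data-parallel gradient averaging with MPI Allreduce (illustrative).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def average_gradients(local_grad):
    """Sum a gradient array across all ranks, then divide by the world size."""
    global_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
    return global_grad / comm.Get_size()

if __name__ == "__main__":
    # Stand-in for the gradient produced by one data-parallel training step.
    grad = np.full(4, comm.Get_rank(), dtype=np.float32)
    print(comm.Get_rank(), average_gradients(grad))
```

Run with, for example, mpirun -np 4 python average_gradients.py (the file name is hypothetical).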
 
Topics:
Deep Learning and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7543
Download:
Share:
 
Abstract:
Recently, Machine Learning leaped into the computing mainstream and now ML is advancing across all enterprise applications. GPU usage models are penetrating new industries and advanced servers with GPUs will take deep learning to new performance ...Read More
Abstract:

Recently, Machine Learning leaped into the computing mainstream and now ML is advancing across all enterprise applications. GPU usage models are penetrating new industries and advanced servers with GPUs will take deep learning to new performance levels that augment Artificial Intelligence. New server architecture innovations will drive higher levels of performance in ML applications. As GPUs become more powerful, GPU networks will need to be more efficient as well. Supermicro has advanced the state-of-the-art in GPU-optimized server architectures, perfect for emerging deep learning applications. Hear the latest in GPU server architectures and customer case studies of how customers achieved incredible deep learning results with Supermicro solutions.

  Back
 
Topics:
Deep Learning and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7834
Download:
Share:
Deep Learning and AI Frameworks
Presentation
Media
Abstract:
Learn how to implement and analyze a simple deep learning input pipeline pattern that prevents slowdowns from input queue exhaustion on accelerated HPC systems with limited impact to model performance. Queue exhaustion occurs because the throughput-d ...Read More
Abstract:
Learn how to implement and analyze a simple deep learning input pipeline pattern that prevents slowdowns from input queue exhaustion on accelerated HPC systems with limited impact to model performance. Queue exhaustion occurs because the throughput-driven dequeue rate is greater than the enqueue rate, which is bound by storage access bandwidth. In this session we will describe a technique that prevents queue exhaustion by artificially slowing the effective dequeue rate, without sacrificing substantial validation set performance. An example using TensorFlow is presented, and the resultant optimization step speedup and model performance are analyzed across several HPC resource configurations.  Back
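The sketch below is a framework-agnostic assumption on our part, not the session's TensorFlow example: it caps the effective dequeue rate so that a prefetch queue filled at storage bandwidth is never fully drained, which is the pattern the abstract describes.

```python
# Throttle an input iterator so batches are consumed no faster than a chosen rate.
import time

def throttled_batches(batch_iter, min_interval_s):
    """Yield batches no faster than one every min_interval_s seconds."""
    last = float("-inf")
    for batch in batch_iter:
        wait = min_interval_s - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)          # artificially slow the effective dequeue rate
        last = time.monotonic()
        yield batch

if __name__ == "__main__":
    fake_queue = range(5)             # stand-in for an input queue or tf.data iterator
    for b in throttled_batches(fake_queue, min_interval_s=0.1):
        print("optimization step on batch", b)
```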
 
Topics:
Deep Learning and AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8674
Streaming:
Download:
Share:
 
Abstract:
This talk will introduce two programming models, OpenSHMEM and SharP, to address the programming challenges of HPC systems with multiple GPUs per node, a high-performing network, and huge amounts of hierarchical heterogeneous memory. SharP uses distribute ...Read More
Abstract:
This talk will introduce two programming models, OpenSHMEM and SharP, to address the programming challenges of HPC systems with multiple GPUs per node, a high-performing network, and huge amounts of hierarchical heterogeneous memory. SharP uses a distributed data-structure approach to abstract the memory and provide uniform interfaces for data abstraction, locality, sharing, and resiliency across these memory systems. OpenSHMEM is a well-established library-based PGAS programming model for programming HPC systems. We show how NVSHMEM, an implementation of OpenSHMEM, can enable communication in CUDA kernels and realize the OpenSHMEM model for GPU-based HPC systems. These two complementary programming models provide the ability to program emerging architectures for Big-Compute and Big-Data applications. After the introduction, we will present experimental results for a wide variety of applications and benchmarks.  Back
 
Topics:
Deep Learning and AI Frameworks
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1716
Share:
 
Abstract:
This talk is a summary of the ongoing HPC visualization activities, as well as a description of the technologies behind the developer zone shown in the booth.
 
Topics:
Deep Learning and AI Frameworks
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1735
Download:
Share:
 
 
Topics:
Deep Learning and AI Frameworks
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1736
Download:
Share:
Federal
Presentation
Media
Abstract:
We'll highlight Sentinel, a system for real-time in-situ intelligent video analytics on mobile computing platforms. Sentinel combines state-of-the-art techniques in HPC with dynamic mode decomposition (DMD), a proven method for data reduction and an ...Read More
Abstract:
We'll highlight Sentinel, a system for real-time in-situ intelligent video analytics on mobile computing platforms. Sentinel combines state-of-the-art techniques in HPC with dynamic mode decomposition (DMD), a proven method for data reduction and analysis. By leveraging CUDA, our early system prototype achieves significantly better-than-real-time performance for DMD-based background/foreground separation on high-definition video streams, thereby establishing the efficacy of DMD as the foundation on which to build higher level real-time computer vision techniques. We'll present an overview of the Sentinel system, including the application of DMD to background/foreground separation in video streams, and outline our ongoing efforts to enhance and extend the prototype system.  Back
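Sentinel's implementation is CUDA-accelerated; the NumPy sketch below only illustrates the exact dynamic mode decomposition (DMD) algorithm the abstract builds on, with function names and toy data of our own choosing.

```python
# Illustrative exact DMD: snapshot matrix in, DMD eigenvalues and modes out.
import numpy as np

def dmd(X, rank):
    """Return DMD eigenvalues and modes for a snapshot matrix X (features x frames)."""
    X1, X2 = X[:, :-1], X[:, 1:]                     # consecutive snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh.conj().T[:, :rank]
    A_tilde = U.conj().T @ X2 @ V / s                # low-rank linear propagator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (X2 @ V / s) @ W                         # exact DMD modes
    return eigvals, modes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((1000, 60))                  # stand-in for flattened video frames
    eigvals, modes = dmd(frames, rank=10)
    # For video, modes with |eigenvalue| close to 1 capture the quasi-static
    # background; removing their reconstruction leaves the moving foreground.
    print(np.abs(eigvals))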
 
Topics:
Federal, Intelligent Video Analytics and Smart Cities, In-Situ and Scientific Visualization, Deep Learning and AI, Computer Vision and Machine Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7685
Download:
Share:
Finance
Presentation
Media
Abstract:
The Pascal generation of GPUs is bringing an increased compute density to data centers and NVLink on IBM Power 8 CPUs makes this compute density ever more accessible to HPC applications. However, reduced memory-to-compute ratios present some uni ...Read More
Abstract:

The Pascal generation of GPUs is bringing an increased compute density to data centers and NVLink on IBM Power 8 CPUs makes this compute density ever more accessible to HPC applications. However, reduced memory-to-compute ratios present some unique challenges for the cost of throughput-oriented compute. We'll present a case study of moving production Monte Carlo GPU codes up to IBM's "Minsky" S822L servers with NVIDIA Tesla P100 GPUs.

  Back
 
Topics:
Finance, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7668
Download:
Share:
GPU Virtualization
Presentation
Media
Abstract:
VDI users across multiple industries can now harness the power of the world's most advanced virtual workstation to enable increasingly demanding workflows. This session brings together graphics virtualization thought leaders and experts from ...Read More
Abstract:

VDI users across multiple industries can now harness the power of the world's most advanced virtual workstation to enable increasingly demanding workflows. This session brings together graphics virtualization thought leaders and experts from across the globe who have deep knowledge of NVIDIA virtual GPU architecture and years of experience implementing VDI across multiple hypervisors. Panelists will discuss how they transformed organizations, including how they leveraged multi-GPU support to boost GPU horsepower for photorealistic rendering and data-intensive simulation, and how they stood up GPU-accelerated deep learning or HPC VDI environments with ease using NGC containers.

  Back
 
Topics:
GPU Virtualization
Type:
Panel
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9870
Streaming:
Download:
Share:
 
Abstract:
Universities have increasing demand for Deep Learning/AI classrooms or labs but are constrained by cost and availability of physical classroom labs. Students require access to a lab 24x7 to work on projects and assignments and find that they have to ...Read More
Abstract:
Universities have increasing demand for Deep Learning/AI classrooms or labs but are constrained by cost and availability of physical classroom labs. Students require access to a lab 24x7 to work on projects and assignments and find that they have to wait for HPC clusters to be free when submitting their jobs for training. In the past, students and researchers were tethered to expensive data scientist workstations. Virtual GPUs provide a highly secure, flexible, accessible solution to power AI and deep learning coursework and research. Learn how Nanjing University is using vGPUs with NGC for teaching AI and deep learning courses, empowering researchers with the GPU power they need, and providing students with the mobility to do coursework anywhere. Similarly, discover how other universities are maximizing their data center resources by running VDI, HPC and AI workloads on common infrastructure, and even how companies like Esri are using virtualized deep learning classes to educate their user base. Discover the benefits of vGPUs for AI and how you can set up your environment to achieve optimum performance, as well as the tools you can use to manage and monitor your environment as you scale.  Back
 
Topics:
GPU Virtualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9888
Streaming:
Download:
Share:
Graphics Virtualisation
Presentation
Media
Abstract:
What if you could combine VDI, HPC, Deep Learning and AI all together on one platform with VMware vSphere 6.7 and NVIDIA virtual GPU (vGPU) technology? In this session, we'll guide you through how to set up a uniform, well-performing platform ...Read More
Abstract:
What if you could combine VDI, HPC, Deep Learning and AI all together on one platform with VMware vSphere 6.7 and NVIDIA virtual GPU (vGPU) technology? In this session, we'll guide you through how to set up a uniform, well-performing platform. We will cover the virtualisation of HPC, the sharing of compute resources with VDI, the implementation of mixed workloads leveraging NVIDIA vGPU technology, and automation of the platform. If you want to have fun at work while preparing for the future, don't miss this N3RD session!  Back
 
Topics:
Graphics Virtualisation
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8156
Streaming:
Download:
Share:
 
Abstract:
With the latest release of NVIDIA vGPU software the world's most powerful virtual workstation gets even more powerful. Learn more about how our latest enhancements enable your data center to be more agile and scale your data center to meet t ...Read More
Abstract:

With the latest release of NVIDIA vGPU software the world's most powerful virtual workstation gets even more powerful. Learn more about how our latest enhancements enable your data center to be more agile and scale your data center to meet the needs of thousands, tens of thousands, and even hundreds of thousands of users. The newest release of NVIDIA virtual GPU software adds support for more powerful VMs, which can be managed from the cloud, from the on-premises data center, or from a private cloud. With support for live migration of GPU-enabled VMs, IT can truly deliver high availability and a quality user experience. IT can further ensure they get the most out of their investments with the ability to re-purpose the same infrastructure that runs VDI during the day to run HPC and other compute workloads at night. In this session, we will unveil the new features of NVIDIA vGPU solutions and demonstrate how GPU virtualization enables you to easily support the most demanding users and scale virtualized, digital workspaces on an agile and flexible infrastructure, from the cloud as well as the on-premises data center.

  Back
 
Topics:
Graphics Virtualisation
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8513
Streaming:
Download:
Share:
Graphics and AI
Presentation
Media
Abstract:
We'll discuss the challenges of GPU/DRAM bandwidth in high-performance systems. Graphics memory is a key differentiator for addressing these challenges in AI in areas that range from the data center to the smart edge. We'll compare discrete GDDR memo ...Read More
Abstract:
We'll discuss the challenges of GPU/DRAM bandwidth in high-performance systems. Graphics memory is a key differentiator for addressing these challenges in AI in areas that range from the data center to the smart edge. We'll compare discrete GDDR memory and high-bandwidth memory to identify the solution space for these options. We'll also discuss how applications in graphics, HPC, and AI benefit from more bandwidth during presentations at the Micron booth on the exhibit floor.  Back
 
Topics:
Graphics and AI, HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9968
Streaming:
Download:
Share:
 
Abstract:
In the time that someone takes to read this abstract, another could solve a detective puzzle if only they had enough quantitative evidence on which to prove their suspicions. But also, one could use visualisation and computational tools like a micros ...Read More
Abstract:
In the time that someone takes to read this abstract, another could solve a detective puzzle if only they had enough quantitative evidence on which to prove their suspicions. But also, one could use visualisation and computational tools like a microscope, to seek a new cure for cancer or predict hospitalisation prevention. In this presentation, we will demonstrate new visual analytics techniques that use various mixed reality approaches that link simulations with collaborative, complex and interactive data exploration, placing the human-in-the-loop. In recent times, thanks to advances in graphics hardware and compute power (especially GPGPU and modern Big Data / HPC infrastructures), the opportunities are immense, especially in improving our understanding of complex models that represent the real or hybrid worlds. Use cases presented will be drawn from ongoing research at CSIRO, and the Expanded Perception and Interaction Centre (EPICentre), using world class GPU clusters and visualisation capabilities.  Back
 
Topics:
Graphics and AI, In-Situ and Scientific Visualization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8317
Streaming:
Download:
Share:
HPC and AI
Presentation
Media
Abstract:
Demand for GPUs in High Performance Computing is only growing, and it is costly and difficult to keep pace in an entirely on-premise environment. We will hear from Schlumberger on why and how they are utilizing cloud-based GPU-enabled computing resou ...Read More
Abstract:
Demand for GPUs in High Performance Computing is only growing, and it is costly and difficult to keep pace in an entirely on-premise environment. We will hear from Schlumberger on why and how they are utilizing cloud-based GPU-enabled computing resources from Google Cloud to supply their users with the computing power they need, from exploration and modeling to visualization.  Back
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91040
Streaming:
Download:
Share:
 
Abstract:
We will characterize the performance of multi-GPU systems in an effort to determine their viability for running physics-based applications using Fast Fourier Transforms (FFTs). Additionally, we'll discuss how multi-GPU FFTs allow available memory to ...Read More
Abstract:
We will characterize the performance of multi-GPU systems in an effort to determine their viability for running physics-based applications using Fast Fourier Transforms (FFTs). Additionally, we'll discuss how multi-GPU FFTs allow available memory to exceed the limits of a single GPU and how they can reduce computational time for larger problem sizes.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9158
Streaming:
Download:
Share:
 
Abstract:
Learn about the science of magnetically confined plasmas to develop the predictive capability needed for a sustainable fusion energy source. Gyrokinetic simulations are one of the most useful tools for understanding fusion science. We'll explain the ...Read More
Abstract:
Learn about the science of magnetically confined plasmas to develop the predictive capability needed for a sustainable fusion energy source. Gyrokinetic simulations are one of the most useful tools for understanding fusion science. We'll explain the CGYRO code, built by researchers at General Atomics to effectively and efficiently simulate plasma evolution over multiple scales that range from electrons to heavy ions. Fusion plasma simulations are compute- and memory-intensive and usually run on leadership-class, GPU-Accelerated HPC systems like Oak Ridge National Laboratory's Titan and Summit. We'll explain how we designed and implemented CGYRO to make good use of the tens of thousands of GPUs on such systems, which provide simulations that bring us closer to fusion as an abundant clean energy source. We'll also share benchmarking results of both CPU- and GPU-Based systems.  Back
 
Topics:
HPC and AI, AI and DL Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9202
Streaming:
Download:
Share:
 
Abstract:
Learn about advanced features in MVAPICH2 that accelerate HPC and AI on modern dense GPU systems. We'll talk about how MVAPICH2 supports MPI communication from GPU memory and improves it using the CUDA toolkit for optimized performance on different ...Read More
Abstract:
Learn about advanced features in MVAPICH2 that accelerate HPC and AI on modern dense GPU systems. We'll talk about how MVAPICH2 supports MPI communication from GPU memory and improves it using the CUDA toolkit for optimized performance on different GPU configurations. We'll examine recent advances in MVAPICH2 that support large message collective operations and heterogeneous clusters with GPU and non-GPU nodes. We'll explain how we use the popular OSU micro-benchmark suite, and we'll provide examples from HPC and AI to demonstrate how developers can take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We'll also provide guidance on issues like processor affinity to GPUs and networks that can significantly affect the performance of MPI applications using MVAPICH2.  Back
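The snippet below is a hedged illustration (not MVAPICH2-specific code) of what MPI communication from GPU memory can look like from Python: with a CUDA-aware MPI library underneath, mpi4py (3.1 or later) can pass CuPy arrays, that is, GPU-resident buffers, directly to collectives. The one-GPU-per-rank mapping is our assumption about the job layout.

```python
# Allreduce on GPU-resident buffers via CUDA-aware MPI (illustrative sketch).
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()   # one GPU per rank

send = cp.full(8, rank, dtype=cp.float32)    # data lives in GPU memory
recv = cp.empty_like(send)
comm.Allreduce(send, recv, op=MPI.SUM)       # CUDA-aware MPI reads/writes the GPU buffers
print(rank, recv)
```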
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9476
Streaming:
Download:
Share:
 
Abstract:
Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frame ...Read More
Abstract:

Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frameworks from an architectural/performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training. We'll highlight challenges for message passing interface runtimes to efficiently support DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.

  Back
 
Topics:
HPC and AI, Deep Learning and AI Frameworks
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9501
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how code teams at Lawrence Livermore National Laboratory (LLNL) are porting our production applications to Sierra, LLNL's flagship NVIDIA GPU-Based supercomputer. In general, our codes rely on a three-stage process of investment, port ...Read More
Abstract:
We'll discuss how code teams at Lawrence Livermore National Laboratory (LLNL) are porting our production applications to Sierra, LLNL's flagship NVIDIA GPU-Based supercomputer. In general, our codes rely on a three-stage process of investment, porting, and performance tuning to achieve performance on NVIDIA Tesla V100 GPUs, while maintaining portability to our other supported platforms. We'll explain why this process poses many challenges and how LLNL code teams have worked with the Sierra Center of Excellence to build experience and expertise in porting complex multi-physics simulation tools to NVIDIA GPU-Based HPC systems. We'll also provide an overview of this porting process, the abstraction technologies employed, lessons learned, and current challenges.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9512
Streaming:
Download:
Share:
 
Abstract:
Tensor Cores, introduced with Volta GPU architecture, achieve up to 125 TFlops throughput by mixing half- and single-precision floating point operations. We'll show how to take advantage of Tensor Cores for applications in deep learning and HPC. We ...Read More
Abstract:
Tensor Cores, introduced with Volta GPU architecture, achieve up to 125 TFlops throughput by mixing half- and single-precision floating point operations. We'll show how to take advantage of Tensor Cores for applications in deep learning and HPC. We will discuss how to use mixed precision to decrease memory use during deep learning training and deployment, a technique that allows for larger model sizes. We will also demonstrate programming matrix-multiply-and-accumulate on Tensor Cores for HPC applications. Using NVIDIA Nsight, we will profile an application to understand use of Tensor Cores.  Back
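As an illustration of the mixed-precision idea, the sketch below uses PyTorch's automatic mixed precision API (torch.cuda.amp) as a stand-in for whatever the session demonstrated: matrix multiplies run in FP16 on Tensor Cores inside autocast, while GradScaler guards FP16 gradients against underflow.

```python
# Illustrative mixed-precision training loop (not the session's demo code).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(64, 1024, device=device)
target = torch.randn(64, 1024, device=device)

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()   # scale the loss so FP16 gradients stay representable
    scaler.step(optimizer)          # unscale gradients and apply the optimizer step
    scaler.update()
```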
 
Topics:
HPC and AI, AI Application Deployment and Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9542
Streaming:
Download:
Share:
 
Abstract:
We'll describe how Lawrence Livermore National Laboratory (LLNL) prepared a large existing application base for our recently deployed Sierra supercomputer, which is designed to harness over 17,000 NVIDIA V100 GPUs to tackle the nation's ...Read More
Abstract:

We'll describe how Lawrence Livermore National Laboratory (LLNL) prepared a large existing application base for our recently deployed Sierra supercomputer, which is designed to harness over 17,000 NVIDIA V100 GPUs to tackle the nation's most challenging science and national security problems. We will discuss how this multi-year effort paid off with exciting possibilities for new science. We'll also outline how using GPUs in our traditional HPC platforms and workflows is adding an exciting new dimension to simulation-based science, prompting LLNL to rethink how we perform future simulations as intelligent simulation. We'll give an overview of the application preparation process that led up to Sierra's deployment, as well as look at current and planned research aimed at riding the AI and machine learning waves in pursuit of game-changing science.

  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9560
Streaming:
Download:
Share:
 
Abstract:
We'll introduce the fundamental concepts behind NVIDIA GPUDirect and explain how GPUDirect technologies are leveraged to scale out performance. GPUDirect technologies can provide even faster results for compute-intensive workloads, including those r ...Read More
Abstract:
We'll introduce the fundamental concepts behind NVIDIA GPUDirect and explain how GPUDirect technologies are leveraged to scale out performance. GPUDirect technologies can provide even faster results for compute-intensive workloads, including those running on a new breed of dense, GPU-Accelerated servers such as the Summit and Sierra supercomputers and the NVIDIA DGX line of servers.  Back
 
Topics:
HPC and AI, Tools and Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9653
Streaming:
Download:
Share:
 
Abstract:
We'll present the latest developments in the NCCL library, which provides optimized inter-GPU communication primitives to make distributed computing easy and universal. Since 2015, NCCL has enabled deep learning and HPC applications to scale to thous ...Read More
Abstract:
We'll present the latest developments in the NCCL library, which provides optimized inter-GPU communication primitives to make distributed computing easy and universal. Since 2015, NCCL has enabled deep learning and HPC applications to scale to thousands of GPUs. We'll also discuss the state of integration of NCCL in deep learning frameworks.  Back
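A hedged sketch of one common way a framework exercises NCCL from Python: PyTorch's distributed package with the "nccl" backend issues NCCL all-reduce calls under the hood. The script, tensor shapes, and launch command are our own example.

```python
# All-reduce across GPUs using the NCCL backend of torch.distributed.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")          # NCCL provides the inter-GPU transport
    local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun, one process per GPU
    torch.cuda.set_device(local_rank)
    t = torch.ones(4, device="cuda") * dist.get_rank()
    dist.all_reduce(t, op=dist.ReduceOp.SUM)         # NCCL all-reduce across all ranks
    print(dist.get_rank(), t)
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, torchrun --nproc_per_node=8 allreduce_nccl.py (a hypothetical file name), NCCL selects NVLink or InfiniBand transports on its own.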
 
Topics:
HPC and AI, Deep Learning and AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9656
Streaming:
Share:
 
Abstract:
We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects that include PCI-E, NVLink and Infiniband. NVSHMEM makes it possible to initiate com ...Read More
Abstract:
We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects that include PCI-E, NVLink and Infiniband. NVSHMEM makes it possible to initiate communication from within a CUDA kernel. As a result, CUDA kernel boundaries are not forced on an application due to its communication requirements. Less synchronization on the CPU helps strong scaling efficiency. The ability to initiate fine-grained communication from inside the CUDA kernel helps achieve better overlap of communication with computation. QUDA is a popular GPU-enabled QCD library used by packages like Chroma and MILC. NVSHMEM enables better strong scaling in QUDA. NVSHMEM not only benefits latency-bound applications like QUDA, but can also help improve performance and reduce the complexity of codes like FFT that are bandwidth bound and codes like breadth-first search that have a dynamic communication pattern.  Back
 
Topics:
HPC and AI, Tools and Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9677
Streaming:
Download:
Share:
 
Abstract:
We'll discuss our work using neural networks to fit the interatomic potential function and describe how we tested the network's potential function in atomic simulation software. This method has lower computational cost than traditional density func ...Read More
Abstract:
We'll discuss our work using neural networks to fit the interatomic potential function and describe how we tested the network's potential function in atomic simulation software. This method has lower computational cost than traditional density functional theory methods. We'll show how our work is applicable to different atom types and architectures and how it avoids relying on the physical model. Instead, it uses a purely mathematical representation, which reduces the need for human intervention.  Back
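The toy sketch below reflects our assumption of the general approach, not the authors' network or descriptors: a small neural network maps a fixed-length descriptor of an atomic environment to an energy, trained by regression.

```python
# Toy neural-network potential fit: descriptor in, energy out (illustrative).
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(8, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic stand-in data: random descriptors and a smooth target "energy".
x = torch.randn(1024, 8)
energy = (x ** 2).sum(dim=1, keepdim=True)

for step in range(200):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), energy)
    loss.backward()
    optimizer.step()
print("final fit loss:", float(loss))
```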
 
Topics:
HPC and AI, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9843
Streaming:
Download:
Share:
 
Abstract:
The conventional trial-and-error development approach to materials science is time-consuming and expensive. More efficient in silico techniques based on simulations or machine learning have emerged during the past two decades. We'll talk about the r ...Read More
Abstract:
The conventional trial-and-error development approach to materials science is time-consuming and expensive. More efficient in silico techniques based on simulations or machine learning have emerged during the past two decades. We'll talk about the recent trend and solutions for accelerating materials discovery and discuss future prospects.  Back
 
Topics:
HPC and AI, Industrial Inspection
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9967
Streaming:
Download:
Share:
 
Abstract:
SK hynix began developing HBM (High Bandwidth Memory) technology in 2011 when it became evident that memory density and bandwidth scaling is critical for next generation architectures. HBM is currently widely adopted in various applications and it wil ...Read More
Abstract:
SK hynix began developing HBM (High Bandwidth Memory) technology in 2011 when it became evident that memory density and bandwidth scaling is critical for next generation architectures. HBM is currently widely adopted in various applications and it will lead the future memory trend owing to the growth in AI, ML, and HPC applications. We will present a technical overview of HBM technology and the future trends of HBM.  Back
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9978
Streaming:
Download:
Share:
 
Abstract:
We'll talk about how the huge computing advances made in AI by the deep learning revolution of the last five years have pushed legacy hardware to its limits with the CPU, which must now run workloads it was not tailored for. This comes at a time whe ...Read More
Abstract:
We'll talk about how the huge computing advances made in AI by the deep learning revolution of the last five years have pushed legacy hardware to its limits with the CPU, which must now run workloads it was not tailored for. This comes at a time when Moore's Law is tapering off and single-threaded performance is increasing ever more slowly, requiring a new compute paradigm: accelerated computing, powered by massively parallel GPUs.  Back
 
Topics:
HPC and AI, AI and DL Research
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9981
Streaming:
Download:
Share:
 
Abstract:
We'll discuss the challenges uncovered in AI and deep learning workloads, discuss the most efficient approaches to handling data, and examine use cases in autonomous vehicles, retail, health care, finance, and other markets. Our talk will cover the ...Read More
Abstract:
We'll discuss the challenges uncovered in AI and deep learning workloads, discuss the most efficient approaches to handling data, and examine use cases in autonomous vehicles, retail, health care, finance, and other markets. Our talk will cover the complete requirements of the data life cycle including initial acquisition, processing, inference, long-term storage, and driving data back into the field to sustain ever-growing processes of improvement. As the data landscape evolves with emerging requirements, the relationship between compute and data is undergoing a fundamental transition. We will provide examples of data life cycles in production triggering diverse architectures from turnkey reference systems with DGX and DDN A3I to tailor-made solutions.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9983
Streaming:
Share:
 
Abstract:
Containers simplify application deployments in the data centers by wrapping applications into an isolated virtual environment. By including all application dependencies like binaries and libraries, application containers run seamlessly in any data ce ...Read More
Abstract:
Containers simplify application deployments in the data centers by wrapping applications into an isolated virtual environment. By including all application dependencies like binaries and libraries, application containers run seamlessly in any data center environment. The HPC application containers available on NVIDIA GPU Cloud (NGC) dramatically improve ease of application deployment while delivering optimized performance. However, if the desired application is not available on NGC, building HPC containers from scratch trades one set of challenges for another. Parts of the software environment typically provided by the HPC data center must be redeployed inside the container. For those used to just loading the relevant environment modules, installing a compiler, MPI library, CUDA, and other core HPC components from scratch may be daunting. HPC Container Maker (HPCCM) [1] is an open-source project that addresses the challenges of creating HPC application containers. HPCCM encapsulates into modular building blocks the best practices of deploying core HPC components with container best practices, to reduce container development effort, minimize image size, and take advantage of image layering. HPCCM makes it easier to create HPC application containers by separating the choice of what should go into a container image from the specification details of how to configure, build, and install a component. This separation also enables the best practices of HPC component deployment to transparently evolve over time.   Back
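The sketch below shows what an HPCCM recipe can look like; the building-block names follow the project's documented recipe style, while the base image, versions, and paths are illustrative assumptions rather than a recommended configuration.

```python
# Sketch of an HPCCM recipe. Stage0 and the building blocks are provided by
# hpccm when it evaluates the recipe; generate a Dockerfile with:
#   hpccm --recipe recipe.py --format docker
Stage0 += baseimage(image='nvidia/cuda:10.1-devel-ubuntu18.04')   # CUDA development base
Stage0 += gnu()                                                    # GNU compiler toolchain
Stage0 += mlnx_ofed()                                              # Mellanox OFED for InfiniBand
Stage0 += openmpi(version='3.1.4', cuda=True, infiniband=True)     # CUDA-aware MPI
Stage0 += copy(src='app/', dest='/workspace/app')                  # bring in the application source
```

The recipe separates what goes into the image (compiler, OFED, MPI, the application) from how each component is configured and installed, which is the separation the abstract describes.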
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80022
Download:
Share:
 
Abstract:
Facing the new era of AI, enterprises have an excellent opportunity to deliver innovation and leadership. In real-world environments, however, the realization of artificial intelligence often stalls because of the complexity of scaling infrastructure. In this session, we will share how the new scale-out capabilities of NVIDIA DGX systems, paired with Pure Storage FlashBlade ...Read More
Abstract:
Facing the new era of AI, enterprises have an excellent opportunity to deliver innovation and leadership. In real-world environments, however, the realization of artificial intelligence often stalls because of the complexity of scaling infrastructure. In this session, we will share how the new scale-out capabilities of NVIDIA DGX systems, paired with Pure Storage FlashBlade flash storage, can deliver insight within hours and enable AI at enterprise scale.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8029
Download:
Share:
 
Abstract:
We'll highlight IBM POWER9 system and NVIDIA Volta GPU characteristics such as compute, memory, and NVLink capabilities. We'll also take the audience through HPC application performance observations and tuning. IBM POWER9 with NVIDIA Volta is the s ...Read More
Abstract:
We'll highlight IBM POWER9 system and NVIDIA Volta GPU characteristics such as compute, memory, and NVLink capabilities. We'll also take the audience through HPC application performance observations and tuning. IBM POWER9 with NVIDIA Volta is the state-of-the-art system designed for HPC and cognitive computing. This system also introduces NVLink 2.0 high-speed connectivity between CPU and GPU, along with coherent device memory. System characteristics such as CPU and GPU compute and memory throughput, NVLink latency, and bandwidth play key roles in application performance. We'll demonstrate how each of these influences application performance through a case study.  Back
 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8309
Streaming:
Download:
Share:
 
Abstract:
The Center for Accelerated Application Readiness within the Oak Ridge Leadership Computing Facility is a program to prepare scientific applications for next generation supercomputer architectures. Currently the program consists of thirteen domain sci ...Read More
Abstract:
The Center for Accelerated Application Readiness within the Oak Ridge Leadership Computing Facility is a program to prepare scientific applications for next-generation supercomputer architectures. The program currently consists of thirteen domain science application development projects focused on preparing codes for efficient use on Summit. Over the last three years, these teams have developed and executed a development plan based on detailed information about Summit's architecture and system software stack. This presentation will highlight the progress made by the teams using Titan, the 27 PF Cray XK7 with NVIDIA K20X GPUs; SummitDev, an early-access IBM Power8+ system with NVIDIA P100 GPUs; and, most recently, Summit, OLCF's new IBM Power9 system with NVIDIA V100 GPUs. The program covers a wide range of domain sciences, with applications including ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, and XGC.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8908
Streaming:
Download:
Share:
 
Abstract:
Python language is a programming language with an increasing adoption by development community due to its fast learning curve, flexibility and ease to use and integrate with other technologies. Due to its level of abstraction, it is possible to use t ...Read More
Abstract:
Python is a programming language with increasing adoption in the development community thanks to its fast learning curve, flexibility, and ease of use and integration with other technologies. Due to its level of abstraction, it is possible to use the same Python code on different platforms like x86, RISC, and ARM. The Python development community is growing fast, and many community members are interested in moving to GPU-accelerated programming but don't know how to start or what is needed. We'll go through the steps and adoption path to start developing Python solutions that take advantage of GPU acceleration, including details, advantages, and challenges for the strongest and most popular Python 3 modules to be used with GPUs: scikit-cuda, PyCUDA, Numba, cudamat, and CuPy. Code samples and program execution statistics will be shown as a performance analysis exercise as well.  Back
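To make the adoption path concrete, here is a small sketch using two of the modules named above, CuPy and Numba; the array sizes and operations are illustrative only and are not code from the session.

```python
import numpy as np
import cupy as cp                      # NumPy-like arrays that live on the GPU
from numba import vectorize

# CuPy: familiar array math, executed as GPU kernels.
x_gpu = cp.random.random(1_000_000).astype(cp.float32)
y_gpu = cp.sqrt(x_gpu) * 2.0           # runs on the GPU
print(float(y_gpu.sum()))

# Numba: compile a custom elementwise GPU kernel directly from Python.
@vectorize(['float32(float32, float32)'], target='cuda')
def axpy(a, x):
    return 2.0 * a + x

a = np.random.random(1_000_000).astype(np.float32)
b = np.random.random(1_000_000).astype(np.float32)
print(axpy(a, b)[:5])
```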
 
Topics:
HPC and AI, Tools and Libraries, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8214
Streaming:
Download:
Share:
 
Abstract:
Overview of numerous GPU hardware platforms designed for today's taxing AI/machine learning and HPC workloads, including custom solutions targeted for Deep Learning Inferencing and Deep Learning Training. Talk will cover systems based on PCIe based ...Read More
Abstract:
Overview of numerous GPU hardware platforms designed for today's taxing AI/machine learning and HPC workloads, including custom solutions targeted at deep learning inference and deep learning training. The talk will cover systems based on PCIe GPUs as well as GPU systems with the NVLink interface.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8999
Streaming:
Share:
 
Abstract:
Multiphysics and multiscale simulations are found in a variety of computational science subfields, but their disparate computational characteristics can make GPU implementations complex and often difficult. Simulations of supernovae are ideal example ...Read More
Abstract:
Multiphysics and multiscale simulations are found in a variety of computational science subfields, but their disparate computational characteristics can make GPU implementations complex and often difficult. Simulations of supernovae are ideal examples of this complexity. We use the scalable FLASH code to model these astrophysical cataclysms, incorporating hydrodynamics, thermonuclear kinetics, and self-gravity across considerable spans in space and time. Using OpenACC and GPU-enabled libraries coupled to new NVIDIA GPU hardware capabilities, we have improved the physical fidelity of these simulations by increasing the number of evolved nuclear species by more than an order of magnitude. I will discuss these and other performance improvements to the FLASH code on the Summit supercomputer at Oak Ridge National Laboratory.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8926
Streaming:
Share:
 
Abstract:
OpenMP has a long history on shared memory, CPU-based machines, but has recently begun to support offloading to GPUs and other parallel accelerators. This talk will discuss the current state of compilers for OpenMP on NVIDIA GPUs, showing results and ...Read More
Abstract:
OpenMP has a long history on shared memory, CPU-based machines, but has recently begun to support offloading to GPUs and other parallel accelerators. This talk will discuss the current state of compilers for OpenMP on NVIDIA GPUs, showing results and best practices from real applications. Developers interested in writing OpenMP codes for GPUs will learn how best to achieve good performance and portability.  Back
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8344
Streaming:
Share:
 
Abstract:
Computer simulations offer great insight into complex, dynamical systems but can be difficult to navigate through a large set of control/design parameters. Deep learning methods, applied on fast GPUs, can provide an ideal way to improve scientific an ...Read More
Abstract:
Computer simulations offer great insight into complex, dynamical systems but can be difficult to navigate through a large set of control/design parameters. Deep learning methods, applied on fast GPUs, can provide an ideal way to improve scientific and engineering workflows. In this talk, Vic will discuss an application of machine learning to develop a fast-running surrogate model that captures the dynamics of an industrial multiphase fluid flow. He will also discuss an improved population search method that can help the analyst explore a high-dimensional parameter space to optimize production while reducing the model uncertainty.  Back
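As a toy illustration of the surrogate-model idea (not the speaker's actual workflow, data, or model), the sketch below fits a cheap regressor to a handful of stand-in "simulation" runs and then uses it to screen a large candidate population; the simulator, parameter ranges, and model choice are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):
    # Stand-in for an expensive multiphase flow solver evaluated at design points.
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1])

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))          # 50 sampled design points
y = expensive_simulation(X)                  # "simulation" outputs

surrogate = GaussianProcessRegressor().fit(X, y)

# Use the cheap surrogate inside a search loop over a large candidate population.
candidates = rng.uniform(0, 1, size=(100_000, 2))
pred, std = surrogate.predict(candidates, return_std=True)
best = candidates[np.argmax(pred)]
print("promising design point:", best, "predicted output:", pred.max())
```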
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8828
Streaming:
Download:
Share:
 
Abstract:
We present Unicorn, a novel parallel programming model for GPU clusters. It shows that distributed shared memory systems can be efficient with the help of transactional semantics and deferred data synchronizations, and thus the simplicity of distribu ...Read More
Abstract:
We present Unicorn, a novel parallel programming model for GPU clusters. It shows that distributed shared memory systems can be efficient with the help of transactional semantics and deferred data synchronizations, and thus the simplicity of distributed shared memory systems can be carried over to CPUs and GPUs in a cluster. Unicorn is designed for easy programmability and provides a deterministic execution environment. Device, node, and cluster management are completely handled by the runtime, and no related API is exposed to the application programmer. Load balancing, scheduling, and scalability are also fully transparent to the application code. Programs written on one cluster can be run verbatim on a different cluster. Application code is agnostic to data placement within the cluster as well as to changes in network interfaces and data availability patterns. Unicorn's programming model, being deterministic, by design eliminates several data races and deadlocks. Unicorn's runtime employs several data optimizations, including prefetching and subtask streaming, in order to overlap communication and computation. Unicorn employs pipelining at two levels: first to hide data transfer costs among cluster nodes, and second to hide transfer latency between CPUs and GPUs on all nodes. Among other optimizations, Unicorn's work-stealing based scheduler employs a two-level victim selection technique to reduce the overhead of steal operations. Further, it employs a proactive and aggressive stealing mechanism to prevent the said pipelines from stalling during a steal operation. We will showcase the scalability and performance of Unicorn on several scientific workloads. We will also demonstrate the load balancing achieved in some of these experiments and the amount of time the runtime spends in communication. We find that parallelization of coarse-grained applications like matrix multiplication or 2D FFT using our system requires only about 30 lines of C code to set up the runtime; the rest of the application code is a regular single CPU/GPU implementation. This indicates the ease of extending sequential code to a parallel environment. We will show the efficiency of our abstraction, with minimal loss of performance, on the latest GPU architectures like Pascal and Volta. We will also compare our approach to other similar implementations like StarPU-MPI and G-Charm.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8565
Streaming:
Download:
Share:
 
Abstract:
In this session we present MPI collective algorithms optimized for distributed deep learning frameworks. The performance of large message MPI collectives such as broadcast, allreduce, reduce etc. are critical to the performance of these workloads. Th ...Read More
Abstract:
In this session, we present MPI collective algorithms optimized for distributed deep learning frameworks. The performance of large-message MPI collectives such as broadcast, allreduce, and reduce is critical to these workloads, and a novel approach is needed for the design of large-scale collective communication algorithms in CUDA-aware MPI runtimes. The session will dive deep into our implementation of these collectives and its performance advantages on IBM POWER9 systems with NVIDIA V100 GPUs for the OSU benchmarks and distributed TensorFlow.  Back
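For readers unfamiliar with the communication pattern in question, the sketch below shows the kind of large-message allreduce that data-parallel training relies on, expressed with mpi4py. The message size and the one-process-per-GPU launch are illustrative assumptions; this is not the session's optimized implementation.

```python
# Averaging gradients across ranks with an MPI allreduce, the pattern whose
# large-message performance the session discusses. Assumes one process per GPU,
# launched with e.g.  mpirun -np 4 python allreduce_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Stand-in for a flattened gradient tensor produced by a local backward pass.
local_grad = np.full(25_000_000, rank, dtype=np.float32)   # ~100 MB message
global_grad = np.empty_like(local_grad)

comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size                                         # data-parallel average
if rank == 0:
    print("averaged gradient value:", global_grad[0])
```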
 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8306
Streaming:
Download:
Share:
 
Abstract:
Simulation and analysis of flow and combustion processes in propulsion and power systems presents many new and interesting challenges. A multitude of strongly coupled fluid dynamic, thermodynamic, transport, chemical, multiphase, and heat transfer pr ...Read More
Abstract:
Simulation and analysis of flow and combustion processes in propulsion and power systems presents many new and interesting challenges. A multitude of strongly coupled fluid dynamic, thermodynamic, transport, chemical, multiphase, and heat transfer processes are intrinsically coupled and must be considered simultaneously in complex domains associated with devices such as gas-turbine and rocket engines. The problem is compounded by the effects of turbulence and high-pressure phenomena, which require treatment of nonideal fluid mixtures at supercritical conditions. The combination of complex multicomponent property evaluations along with the computational grid resolution requirements makes these simulations expensive and cumbersome. Recent advances in high performance computing (HPC) systems, such as the graphics processing unit (GPU) based architectures, provides an opportunity for significant advances in dealing with these complexities while reducing the time to solution.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8910
Streaming:
Share:
 
Abstract:
We propose a new framework for parallelizing deep neural network training that maximizes the amount of data that is ingested by the training algorithm. Our proposed framework called Livermore Tournament Fast Batch Learning (LTFB) targets large-scale ...Read More
Abstract:
We propose a new framework for parallelizing deep neural network training that maximizes the amount of data ingested by the training algorithm. Our proposed framework, called Livermore Tournament Fast Batch Learning (LTFB), targets large-scale data problems. The LTFB approach creates a set of deep neural network (DNN) models and trains each instance of these models independently and in parallel. Periodically, each model selects another model to pair with, exchanges models, and then runs a local tournament against held-out tournament datasets. The winning model continues training on the local training dataset. This new approach maximizes computation and minimizes the amount of synchronization required in training deep neural networks, a major bottleneck in existing synchronous deep learning algorithms. We evaluate our proposed algorithm on two HPC machines at Lawrence Livermore National Laboratory, including an early-access IBM Power8+ machine with NVIDIA Tesla P100 GPUs. Experimental evaluations of the LTFB framework on two popular image classification benchmarks, CIFAR10 and ImageNet, show significant speedup compared to the sequential baseline.  Back
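The tournament exchange described above can be sketched in a few lines. The following is a conceptual toy, not the authors' LBANN implementation: the training step, tournament evaluation, and pairing scheme are all placeholders.

```python
# Conceptual sketch of the LTFB idea: each rank trains its own model copy,
# periodically pairs with a partner, exchanges models, and keeps the winner
# of a tournament on held-out data.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
rng = np.random.default_rng(rank)

weights = rng.normal(size=128)                 # stand-in for model parameters

def train_step(w):                             # placeholder local training update
    return w - 0.01 * rng.normal(size=w.shape)

def tournament_loss(w):                        # placeholder held-out evaluation
    return float(np.sum(w * w))

for step in range(100):
    weights = train_step(weights)
    if step % 10 == 0 and size > 1:
        partner = rank ^ 1                     # simple pairing scheme
        if partner < size:
            incoming = np.empty_like(weights)
            comm.Sendrecv(weights, dest=partner, recvbuf=incoming, source=partner)
            # Keep whichever model wins the local tournament.
            if tournament_loss(incoming) < tournament_loss(weights):
                weights = incoming
```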
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8829
Streaming:
Share:
 
Abstract:
We will discuss challenges and lessons learned from deploying multiple large scale HPC and AI clusters in different industries. Lessons learned will focus on end-to-end aspects of designing and deploying large scale gpu clusters including datacenter ...Read More
Abstract:
We will discuss challenges and lessons learned from deploying multiple large-scale HPC and AI clusters in different industries. Lessons learned will focus on end-to-end aspects of designing and deploying large-scale GPU clusters, including data center and environmental challenges, network performance and optimization, data pipeline and storage challenges, as well as workload orchestration and optimization. You will learn more about open architectures for HPC, AI, and deep learning that combine flexible compute architectures, rack-scale platforms, and software-defined networking and storage to provide a scalable, software-defined deep learning environment. We will discuss strategies, providing insight into everything from specialty compute for training versus inference, to high-performance storage for data workflows, to orchestration and workflow management tools. We will also discuss deploying deep learning environments from development to production at scale, from private cloud to public cloud.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8972
Streaming:
Download:
Share:
 
Abstract:
Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs, in combination with third-party devices. We''ll introduce the fundamen ...Read More
Abstract:

Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs in combination with third-party devices. We'll introduce the fundamental concepts behind GPUDirect and present the latest developments, such as changes to pre-existing APIs and the new APIs recently introduced. We'll also discuss the expected performance in combination with the new computing platforms that emerged last year.

  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8474
Download:
Share:
 
Abstract:
Learn how the Julia programming language can be used for GPU programming, both for (1) low-level kernel programming, and (2) high-level array and AI libraries. This full-stack support drastically simplifies code bases, and GPU programmers can take ad ...Read More
Abstract:
Learn how the Julia programming language can be used for GPU programming, both for (1) low-level kernel programming and (2) high-level array and AI libraries. This full-stack support drastically simplifies code bases, and GPU programmers can take advantage of all of Julia's most powerful features: generic programming, n-dimensional kernels, higher-order functions, and custom numeric types. We'll give an overview of the compiler's implementation and performance characteristics via the Rodinia benchmark suite. We'll show how these techniques enable highly flexible AI libraries with state-of-the-art performance, and allow a major government user to run highly computational threat modelling on terabytes of data in real time.  Back
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8269
Download:
Share:
 
Abstract:
Participants will take part in in-depth discussions revolving around the revolutionary HBM (High Bandwidth Memory) product, the distinguishing technical features and the role it plays in expanding the boundaries of the AI revolution. The session will ...Read More
Abstract:
Participants will take part in in-depth discussions revolving around the revolutionary HBM (High Bandwidth Memory) product, its distinguishing technical features, and the role it plays in expanding the boundaries of the AI revolution. The session will also cover current technical and business challenges and future considerations for the next-generation HBM lineup.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8949
Streaming:
Download:
Share:
 
Abstract:
Scientific simulations typically store just a small fraction of their computed timesteps--as few as one in 500--due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles t ...Read More
Abstract:
Scientific simulations typically store just a small fraction of their computed timesteps--as few as one in 500--due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles that simulation scientists are loath to sacrifice. We propose the use of the special-purpose video processing unit (VPU), currently unutilized in the HPC context, for inexpensive lossy encoding. Our work demonstrates that video encoding quality is suitable for volumes and recommends encoder settings. We'll show that data can be encoded in parallel with a hybrid CPU/GPU computation with minimal impact on run time. We'll also demonstrate that decoding is fast enough for on-the-fly decompression during analysis.  Back
 
Topics:
HPC and AI, Telecommunications, In-Situ and Scientific Visualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8561
Streaming:
Download:
Share:
 
Abstract:
There is a huge opportunity for businesses to use advanced AI methods to extract insights from their data faster. Imagine training your models in minutes or hours rather than days or weeks. Think how much more money can you make by getting algorithms ...Read More
Abstract:
There is a huge opportunity for businesses to use advanced AI methods to extract insights from their data faster. Imagine training your models in minutes or hours rather than days or weeks. Think how much more money you can make by getting algorithms to market faster and getting the most productivity out of your researchers. In this session, Greg Schmidt introduces the new HPE Apollo 6500 Gen10 System with NVLink for the enterprise. This innovative system design allows for a high degree of flexibility, with a range of configuration and topology options to match your workloads. Learn how the Apollo 6500 unlocks business value from your data for AI.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8969
Streaming:
Share:
 
Abstract:
HPC centers have been traditionally configured for simulation workloads, but deep learning has been increasingly applied alongside simulation on scientific datasets. These frameworks do not always fit well with job schedulers, large parallel file sys ...Read More
Abstract:
HPC centers have traditionally been configured for simulation workloads, but deep learning is increasingly applied alongside simulation on scientific datasets. These frameworks do not always fit well with job schedulers, large parallel file systems, and MPI backends. We'll discuss examples of how deep learning workflows are being deployed on next-generation systems at the Oak Ridge Leadership Computing Facility. We'll share benchmarks of natively compiled versus containerized frameworks on Power systems like Summit, as well as best practices for deploying deep learning frameworks and models on HPC resources for scientific workflows.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8551
Streaming:
Download:
Share:
 
Abstract:
Addressing the apparent Amdahl's fraction of synchronizing with the CPU for communication is critical for strong scaling of applications on GPU clusters. GPUs are designed to maximize throughput and have enough state and parallelism to hide long lat ...Read More
Abstract:
Addressing the apparent Amdahl's fraction of synchronizing with the CPU for communication is critical for strong scaling of applications on GPU clusters. GPUs are designed to maximize throughput and have enough state and parallelism to hide long latencies to global memory. It's important to take advantage of these inherent capabilities of the GPU and the CUDA programming model when tackling communication between GPUs. NVSHMEM provides a Partitioned Global Address Space (PGAS) that spans memory across GPUs and provides an API for fine-grained GPU-GPU data movement and synchronization from within a CUDA kernel. NVSHMEM also provides a CPU-side API for GPU-GPU data movement, which offers a migration path for applications moving to NVSHMEM. CPU-side communication can be issued in stream order, similar to CUDA operations. NVSHMEM implements the OpenSHMEM programming model, which is of great interest to government agencies and national labs. We'll give an overview of the capabilities, API, and semantics of NVSHMEM. We'll use examples from a varied set of applications (HPGMG, Multi-GPU Transpose, Graph500, etc.) to demonstrate the use and benefits of NVSHMEM.  Back
 
Topics:
HPC and AI, Tools and Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8595
Streaming:
Download:
Share:
 
Abstract:
This session presents an overview of the hardware and software architecture of the DGX-2 platform. This talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA develo ...Read More
Abstract:

This session presents an overview of the hardware and software architecture of the DGX-2 platform. This talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA developers will learn ways to utilize the full GPU connectivity to quickly build complex applications and utilize the high bandwidth NVLINK connections to scale up performance.

  Back
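As a minimal illustration of moving data directly between GPUs from a high-level language, the sketch below copies an array from one device to another with CuPy; on an NVSwitch-connected system such a device-to-device copy can travel over the NVLink fabric when peer access is available. The devices and array size are illustrative, and this is not code from the session.

```python
# Copy an array from GPU 0 to GPU 1 using CuPy's current-device context.
import cupy as cp

with cp.cuda.Device(0):
    a = cp.arange(10_000_000, dtype=cp.float32)   # allocated on GPU 0

with cp.cuda.Device(1):
    b = cp.asarray(a)                             # device-to-device copy onto GPU 1
    print(b.device, float(b.sum()))
```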
 
Topics:
HPC and AI, Data Center and Cloud Infrastructure, AI for Business, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8688
Streaming:
Download:
Share:
 
Abstract:
XGC is a kinetic whole-volume modeling code with unique capabilities to study tokamak edge plasmas in real geometry and answer important questions about the design of ITER and other future fusion reactors. The main technique is the Particle-in-Cell ...Read More
Abstract:
XGC is a kinetic whole-volume modeling code with unique capabilities to study tokamak edge plasmas in real geometry and answer important questions about the design of ITER and other future fusion reactors. The main technique is the Particle-in-Cell method, which models the plasma as billions of quasiparticles representing ions and electrons. Ostensibly, the process of advancing each particle in time is embarrassingly parallel. However, the electric and magnetic fields must be known in order to push the particle, which requires an implicit gather operation from XGC's sophisticated unstructured mesh. In this session, we'll show how careful mapping of field and particle data structures to GPU memory allowed us to decouple the performance of the critical electron push routine from the size of the simulation mesh and allowed the true particle parallelism to dominate. This improvement enables performant, high-resolution, ITER-scale simulations on Summit.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8909
Streaming:
Download:
Share:
 
Abstract:
Do you need to compute larger or faster than a single GPU allows you to? Then come to this session and learn how to scale your application to multiple GPUs. In this session, you will learn how to use the different available multi GPU programming ...Read More
Abstract:

Do you need to compute larger or faster than a single GPU allows? Then come to this session and learn how to scale your application to multiple GPUs. You will learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, applying a domain decomposition strategy.

  Back
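As a hedged sketch of the domain decomposition strategy the session uses as its running example, the code below assigns one GPU per MPI rank, splits a 2D grid into slabs, and exchanges one-row halos each iteration (staged through the host; a CUDA-aware MPI could send device buffers directly). The names, sizes, and the MPI + CuPy combination are illustrative assumptions, not the session's own code.

```python
# 1D domain decomposition with halo exchange: one MPI rank per GPU.
import numpy as np
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()   # pick this rank's GPU

n = 1024
u = cp.zeros((n // size + 2, n), dtype=cp.float32)              # local slab + 2 halo rows
u[1:-1, :] = rank

up, down = (rank - 1) % size, (rank + 1) % size                 # periodic neighbours
for _ in range(10):
    # Stage halo rows through the host for portability.
    send_top, send_bot = cp.asnumpy(u[1]), cp.asnumpy(u[-2])
    recv_top, recv_bot = np.empty_like(send_top), np.empty_like(send_bot)
    comm.Sendrecv(send_top, dest=up, recvbuf=recv_bot, source=down)
    comm.Sendrecv(send_bot, dest=down, recvbuf=recv_top, source=up)
    u[0], u[-1] = cp.asarray(recv_top), cp.asarray(recv_bot)
    # Interior Jacobi-style update runs on this rank's GPU.
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
```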
 
Topics:
HPC and AI, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23031
Download:
Share:
 
Abstract:
Murex has been an early adopters of GPU for pricing and risk management of complex financial options. GPU adoption has generated performance boost of its software while reducing its usage cost. Each new generation of GPU has also shown the impor ...Read More
Abstract:

Murex has been an early adopter of GPUs for pricing and risk management of complex financial options. GPU adoption has boosted the performance of its software while reducing its usage cost. Each new generation of GPU has also shown how important it is to reshape the architecture of the software that uses the GPU-accelerated analytics. Minsky, featuring far better GPU memory bandwidth and GPU-CPU interconnect, raises the bar even further. Murex will show how it has handled this new challenge for its business.

  Back
 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23209
Download:
Share:
 
Abstract:
HPE Deep Learning solutions empower innovation at any scale, building on our purpose-built HPC systems and technologies solutions, applications and support services. Deep Learning demands massive amounts of computational power. Those computation ...Read More
Abstract:

HPE Deep Learning solutions empower innovation at any scale, building on our purpose-built HPC systems and technologies, solutions, applications, and support services. Deep learning demands massive amounts of computational power, usually involving heterogeneous compute resources such as GPUs and InfiniBand, as installed on HPE Apollo. NovuMind's NovuForce system leverages state-of-the-art technologies to make the deployment and configuration procedure fast and smooth. The NovuForce deep learning software within the Docker image has been optimized for the latest technology, like NVIDIA Pascal GPUs and InfiniBand GPUDirect RDMA. This flexibility of the software, combined with the broad GPU server portfolio from HPE, makes it one of the most efficient and scalable solutions.

  Back
 
Topics:
HPC and AI, Performance Optimization, Tools and Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23250
Download:
Share:
 
Abstract:
Discover how we designed and optimized a highly-scalable dense solver to solve Maxwell equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it ...Read More
Abstract:

Discover how we designed and optimized a highly-scalable dense solver to solve Maxwell equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it with programmability concerns in mind. We show how we solved the challenge of tightly combining tasks with MPI, and illustrate how this scaled up to 50000 CPU cores, reaching 1.38 Petaflops. A focus is then given on the integration of GPUs in this model, along with a few implementation tricks to ensure truly asynchronous programming. Finally, after briefly detailing how we added hierarchical compression techniques into our distributed solver over CPUs, we describe how we plan to unlock the challenges that yet prevented porting it on GPUs.

  Back
 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23277
Download:
Share:
 
Abstract:
We leverage NVIDIA GPUs for connected components labeling and image classification applied to Digital Rock Physics (DRP), to help characterize reservoir rocks and study their pore distributions. We show on this talk how NVIDIA GPUs helped us sat ...Read More
Abstract:

We leverage NVIDIA GPUs for connected components labeling and image classification applied to Digital Rock Physics (DRP) to help characterize reservoir rocks and study their pore distributions. We show in this talk how NVIDIA GPUs helped us satisfy strict real-time constraints dictated by the imaging hardware used to scan the rock samples. We present a detailed description of the workflow from a DRP perspective, our algorithm and optimization techniques, and performance results on the latest NVIDIA GPU generations.

  Back
 
Topics:
HPC and AI, HPC and Supercomputing, Video and Image Processing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23303
Download:
Share:
 
Abstract:
In order to prepare the scientific communities, GENCI and its partners have set up a technology watch group and lead collaborations with vendors, relying on HPC experts and early adopted HPC solutions. The two main objectives are providing guida ...Read More
Abstract:

In order to prepare the scientific communities, GENCI and its partners have set up a technology watch group and lead collaborations with vendors, relying on HPC experts and early-adopted HPC solutions. The two main objectives are to provide guidance and to prepare the scientific communities for the challenges of exascale architectures. The talk will present the OpenPOWER platform bought by GENCI and provided to the scientific community. It will then present the first results obtained on the platform for a set of about 15 applications using all the solutions provided to the users (CUDA, OpenACC, OpenMP, ...). Finally, one specific application will be presented in detail, covering its porting effort and the techniques used for GPUs with both OpenACC and OpenMP.

  Back
 
Topics:
HPC and AI, Performance Optimization, Programming Languages
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23183
Download:
Share:
 
Abstract:
Wireless-VR is widely defined as the key solution for maximum immersion. But why? Is it only the obvious reason of the omission of the heavy and inflexible cable? There is more behind it. Learn how the development of tracking technology goes han ...Read More
Abstract:

Wireless VR is widely regarded as the key to maximum immersion. But why? Is it only the obvious benefit of removing the heavy and inflexible cable? There is more behind it. Learn how the development of tracking technology goes hand in hand with the increasing demand for wireless VR hardware, what hardware is on the market now and what is coming, and how wireless solutions, whether standalone devices or add-ons, can create higher value for your VR application. Learn how large-scale, location-based VR and hardware manufacturers are expanding the boundaries of the VR industry, both for entertainment and B2B.

  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23388
Download:
Share:
 
Abstract:
With over 5000 GPU-accelerated nodes, Piz Daint has been Europes leading supercomputing systems since 2013, and is currently one of the most performant and energy efficient supercomputers on the planet. It has been designed to optimize throughpu ...Read More
Abstract:

With over 5000 GPU-accelerated nodes, Piz Daint has been Europe's leading supercomputing system since 2013 and is currently one of the most performant and energy-efficient supercomputers on the planet. It has been designed to optimize throughput of multiple applications, covering all aspects of the workflow, including data analysis and visualisation. We will discuss ongoing efforts to further integrate these extreme-scale compute and data services with infrastructure services of the cloud. As a Tier-0 system of PRACE, Piz Daint is accessible to all scientists in Europe and worldwide. It provides a baseline for future development of exascale computing. We will present a strategy for developing exascale computing technologies in domains such as weather and climate or materials science.

  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23429
Download:
Share:
 
Abstract:
The presentation will give an overview about the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the worlds most advanced data center GPU for AI, HPC, and Graphics. Volta features a new Strea ...Read More
Abstract:

The presentation will give an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhanced features like NVLink 2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like independent thread scheduling and the Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest CUDA 9 release.

  Back
 
Topics:
HPC and AI, Programming Languages, Tools and Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23434
Download:
Share:
 
Abstract:
We introduce how to optimize deep learning inference with TensorRT. By walking through the process of actually applying TensorRT, this session explains the performance and inference-environment consideratio ...Read More
Abstract:
We introduce how to optimize deep learning inference with TensorRT. By walking through the process of actually applying TensorRT, this session explains the performance and inference-environment considerations that arise during optimization. In particular, we will share tips on the topics that come up during adoption, such as TensorRT's development languages (C++/Python), support for low precision such as FP16/INT8, and handling of RNNs.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8118
Streaming:
Download:
Share:
HPC and Supercomputing
Presentation
Media
Abstract:
Understanding the emergence of nuclear physics from the underlying fundamental theory of strong interactions with Quantum chromodynamics (QCD) requires the fastest supercomputers. We will describe the role of QCD in the evolution of our universe ...Read More
Abstract:

Understanding the emergence of nuclear physics from the underlying fundamental theory of strong interactions with Quantum chromodynamics (QCD) requires the fastest supercomputers. We will describe the role of QCD in the evolution of our universe and discuss how we use the latest supercomputers, such as Summit at Oak Ridge National Laboratory and Sierra at Lawrence Livermore National Laboratory, to address basic questions such as why does the universe contain more matter than antimatter? Looking towards the exascale era, we can dream of tackling more complex questions related to the rate of protons fusing to helium in the sun and the state of matter in extreme conditions such as neutron stars. We'll explain why making the most of these new computers will require clever software to take advantage of the heterogeneous architectures. We'll also describe some advances in optimized use of GPUs, as well as management of the complex set of tasks required.

  Back
 
Topics:
HPC and Supercomputing, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91010
Streaming:
Download:
Share:
 
Abstract:
Come hear the latest PGI news and learn about what we'll develop in the year ahead. We'll talk about the latest PGI OpenACC Fortran/C++ and CUDA Fortran compilers and tools, which are supported on x64 and OpenPOWER systems with NVIDIA GPUs. We'll ...Read More
Abstract:
Come hear the latest PGI news and learn about what we'll develop in the year ahead. We'll talk about the latest PGI OpenACC Fortran/C++ and CUDA Fortran compilers and tools, which are supported on x64 and OpenPOWER systems with NVIDIA GPUs. We'll discuss new CUDA Fortran features, including Tensor Core support and cooperative groups, and we'll cover our current work on half precision. We'll explain new OpenACC 2.7 features, along with beta true deep-copy directives and support for OpenACC programs on unified memory systems. The PGI compiler-assisted software testing feature helps determine where differences arise between CPU and GPU versions of a program or when porting to a new system. Learn about upcoming projects, which include a high-performance PGI subset of OpenMP for NVIDIA GPUs, support for GPU programming with standard C++17 parallel STL and Fortran, and incorporating GPU-accelerated math libraries to support porting and optimization of HPC applications on NVIDIA GPUs.  Back
 
Topics:
HPC and Supercomputing, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9289
Streaming:
Download:
Share:
 
Abstract:
Learn about the opportunities and pitfalls of running billion-atom science at scale on ORNL's Summit, the world's fastest GPU-Accelerated supercomputer. We'll talk about the latest performance improvements and scaling results for NAM ...Read More
Abstract:

Learn about the opportunities and pitfalls of running billion-atom science at scale on ORNL's Summit, the world's fastest GPU-Accelerated supercomputer. We'll talk about the latest performance improvements and scaling results for NAMD, a highly parallel molecular dynamics code and one of the first codes to run on Summit. NAMD performs petascale biomolecular simulations — these have included 64 million-atom model of the HIV virus capsid — and previously ran on the GPU-Accelerated Cray XK7 Blue Waters and ORNL Titan machines. Summit features IBM POWER9 CPUs, NVIDIA Volta GPUs, and the NVLink CPU-GPU interconnect.

  Back
 
Topics:
HPC and Supercomputing, Computational Biology and Chemistry
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9302
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how FPGAs are changing as a result of new technology such as the Open CL high-level programming language, hard floating-point units, and tight integration with CPU cores. Traditionally energy-efficient FPGAs were considered notoriously ...Read More
Abstract:
We'll discuss how FPGAs are changing as a result of new technology such as the OpenCL high-level programming language, hard floating-point units, and tight integration with CPU cores. Traditionally, energy-efficient FPGAs were considered notoriously difficult to program and unsuitable for complex HPC applications. We'll compare the latest FPGAs to GPUs, examining the architecture, programming models, programming effort, performance, and energy efficiency by considering some real applications.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9338
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 GPUs (4,560 nodes) on the OLCF Summit HPC System using the high-productivity TensorFlow framework. We discuss how the neural network was tweaked to achieve good ...Read More
Abstract:
We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 GPUs (4,560 nodes) on the OLCF Summit HPC System using the high-productivity TensorFlow framework. We discuss how the neural network was tweaked to achieve good performance on the NVIDIA Volta GPUs with Tensor Cores and what further optimizations were necessary to provide excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers. Scalable deep learning becomes more and more important as datasets and deep learning models grow and become more complicated. This talk is targeted at deep learning practitioners who are interested in learning what optimizations are necessary for training their models efficiently at massive scale.  Back
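As a generic illustration of the data-parallel pattern behind this kind of scaling, and not the exact stack or optimizations used on Summit, the sketch below uses Horovod with tf.keras: one process per GPU, a learning rate scaled by the number of workers, and an allreduce of gradients each step. The model and dataset are placeholders.

```python
# Data-parallel training sketch with Horovod + tf.keras (illustrative only).
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:  # pin one GPU per process
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = (x / 255.0).astype('float32')

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
# Scale the learning rate with worker count and wrap the optimizer so gradients
# are averaged with an allreduce on every step.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy')

model.fit(x, y, batch_size=128, epochs=1,
          verbose=1 if hvd.rank() == 0 else 0,
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])
```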
 
Topics:
HPC and Supercomputing, Deep Learning and AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9412
Streaming:
Download:
Share:
 
Abstract:
Learn how to make your irregular algorithm perform on GPUs. We provide insights into our research on a tasking framework for synchronization-critical applications on GPUs. We discuss the requirements of GPU architectures and programming models f ...Read More
Abstract:

Learn how to make your irregular algorithm perform on GPUs. We provide insights into our research on a tasking framework for synchronization-critical applications on GPUs. We discuss the requirements of GPU architectures and programming models for implementing efficient tasking frameworks. Participants will learn about the pitfalls for tasking arising from the architectural differences between latency-driven CPUs and throughput-driven GPUs. To overcome these pitfalls, we consider programming concepts such as persistent threads, warp-aware data structures, and CUDA asynchronous task graphs. In addition, we look at the latest GPU features, such as forward progress guarantees and grid synchronization, that facilitate the implementation of tasking approaches. A task-based fast multipole method for the molecular dynamics package GROMACS serves as a use case for our considerations.

  Back
 
Topics:
HPC and Supercomputing, Computational Biology and Chemistry, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9548
Streaming:
Download:
Share:
 
Abstract:
Large-scale scientific endeavors often focus on improving predictive capabilities by challenging theory-driven simulations with experimental data. We'll describe our work at LLNL using advances in deep learning, computational workflows, and computer ...Read More
Abstract:
Large-scale scientific endeavors often focus on improving predictive capabilities by challenging theory-driven simulations with experimental data. We'll describe our work at LLNL using advances in deep learning, computational workflows, and computer architectures to develop an improved, learned predictive model. We'll discuss necessary advances in machine learning architectures and methods to handle the challenges of ICF science, including rich, multimodal data (images, scalars, time series) and strong nonlinearities. These include advances in the scalability of our deep learning toolkit LBANN, an optimized asynchronous, GPU-aware communication library, and state-of-the-art scientific workflows. We'll also show how the combination of high-performance NVLink and the rich GPU architecture of Sierra enables us to train neural networks efficiently and begin to develop learned predictive models based on a massive data set.  Back
 
Topics:
HPC and Supercomputing, Accelerated Data Science, Deep Learning and AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9565
Streaming:
Share:
 
Abstract:
We'll showcase the latest successes with GPU acceleration of challenging molecular simulation analysis tasks on the latest Volta and Turing GPUs paired with both Intel and IBM/OpenPOWER CPUs on petascale computers such as ORNL Summit. This presentat ...Read More
Abstract:
We'll showcase the latest successes with GPU acceleration of challenging molecular simulation analysis tasks on the latest Volta and Turing GPUs paired with both Intel and IBM/OpenPOWER CPUs on petascale computers such as ORNL Summit. This presentation will highlight the performance benefits obtained from die-stacked memory, NVLink interconnects, and the use of advanced features of CUDA, such as just-in-time compilation, to increase the performance of key analysis algorithms. We will present results obtained with OpenACC parallel programming directives, as well as discuss current challenges and future opportunities. We'll also describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations. To make our tools easy to deploy for non-traditional users of HPC, we publish GPU-accelerated container images on NGC, and Amazon EC2 AMIs for GPU instance types.  Back
 
Topics:
HPC and Supercomputing, In-Situ and Scientific Visualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9594
Streaming:
Download:
Share:
 
Abstract:
We'll share what we learned by running the QUDA open source library for lattice quantum chromodynamics (LQCD) on modern GPU systems, which provide myriad hardware and software techniques to improve strong scaling. We ran QUDA which provides GPU acce ...Read More
Abstract:
We'll share what we learned by running the QUDA open source library for lattice quantum chromodynamics (LQCD) on modern GPU systems, which provide myriad hardware and software techniques to improve strong scaling. We ran QUDA, which provides GPU acceleration for LQCD applications like MILC and Chroma, on six generations of GPUs with various network and node configurations, including IBM POWER9- and x86-based systems and NVLink- and NVSwitch-based systems. Based on those experiences, we'll discuss best practices for scaling to hundreds and even thousands of GPUs. We'll also cover peer-to-peer memory access, GPUDirect RDMA, and NVSHMEM, as well as techniques such as auto-tuning kernel launch configurations.  Back
 
Topics:
HPC and Supercomputing, Performance Optimization, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9708
Streaming:
Download:
Share:
 
Abstract:
Confused about how Unified Memory works on modern GPU architectures? Did you try Unified Memory some time ago and never wanted to return to it? We'll explain how the last few generations of GPU architectures and software improvements have opened up ...Read More
Abstract:
Confused about how Unified Memory works on modern GPU architectures? Did you try Unified Memory some time ago and never wanted to return to it? We'll explain how the last few generations of GPU architectures and software improvements have opened up new ways to manage CPU and GPU memories. We will dive into the advantages and disadvantages of various OS and CUDA memory allocators, explore how memory is managed by the driver, and examine user controls to tune it. Learn about software enhancements for Unified Memory developed over the past year, how HMM is different from ATS, and how to use Unified Memory with multiple processes.  Back
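For a quick feel of Unified Memory from a high-level language, the sketch below switches CuPy to a managed-memory allocator so that arrays are backed by cudaMallocManaged and migrated on demand; the allocation size is illustrative, and the session discusses the trade-offs between allocators like this one in much more depth.

```python
# Back CuPy allocations with CUDA Unified (managed) memory.
import cupy as cp

cp.cuda.set_allocator(cp.cuda.MemoryPool(cp.cuda.malloc_managed).malloc)

a = cp.arange(10_000_000, dtype=cp.float32)   # backed by cudaMallocManaged
a *= 2.0                                      # GPU kernels operate on it directly
print(float(a[:10].sum()))
```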
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9727
Streaming:
Download:
Share:
 
Abstract:
We'll present an overview of the upcoming NERSC9 system architecture, throughput model, and application readiness efforts.
 
Topics:
HPC and Supercomputing, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9809
Streaming:
Download:
Share:
 
Abstract:
Learn about algorithm design, implementation, and optimization techniques to accelerate large-scale, phase-field molecular dynamics simulations on a GPU platform. Numerical simulations of phase-field equations are conventionally performed by stencil ...Read More
Abstract:
Learn about algorithm design, implementation, and optimization techniques to accelerate large-scale, phase-field molecular dynamics simulations on a GPU platform. Numerical simulations of phase-field equations are conventionally performed by stencil computation with a very small time-step size and low efficiency. We'll describe how we designed an efficient, GPU friendly algorithm that combines a large step-size exponential time integrator with domain decomposition and localization of matrix exponentials. By using this algorithm with optimization techniques on a single GPU and multiple GPU platforms, we achieved a 50X increase in simulation speed over the conventional stencil computing approach. We'll also discuss GPU-Accelerated molecular dynamics simulation, with a focus on parallel strategies of atomic partition and spatial partition. We demonstrated efficiency on dissipative particle dynamics and free-energy calculations on GPU devices for hundreds of millions of particles.  Back
 
Topics:
HPC and Supercomputing, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9851
Streaming:
Download:
Share:
 
Abstract:
Under extreme temperature and density, the quarks and gluons confined inside hadrons become deconfined and form a new state of matter, the quark-gluon plasma. It is generally believed that the early universe was in the quark-gluon plasma phase within the first few milliseconds afte ...Read More
Abstract:
Under extreme temperature and density, the quarks and gluons confined inside hadrons become deconfined and form a new state of matter, the quark-gluon plasma. It is generally believed that the early universe was in the quark-gluon plasma phase within the first few milliseconds after the Big Bang. Lattice quantum chromodynamics (lattice QCD) is the only first-principles theoretical method that can be used to study the transition from the hadronic phase to the quark-gluon plasma. Lattice QCD studies provide an important theoretical basis for understanding the results of relativistic heavy-ion collision experiments at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory in the U.S. and the Large Hadron Collider (LHC) at CERN in Switzerland. In this talk, I will first introduce relativistic heavy-ion collision physics and lattice QCD, then review the progress made in using lattice QCD to study the properties of hot and dense matter, and explain the role GPUs play in our research.  Back
 
Topics:
HPC and Supercomputing, Computational Physics, Physics Simulation
Type:
Talk
Event:
GTC China
Year:
2018
Session ID:
CH8401
Download:
Share:
 
Abstract:
(1) GPU-based high-performance parallel computing breaks the traditional workflow in which reservoir simulation models must be upscaled: high-resolution, fine-scale reservoir models can now be simulated directly without upscaling, greatly improving the quality of detailed reservoir stu ...Read More
Abstract:
(1) GPU-based high-performance parallel computing breaks the traditional workflow in which reservoir simulation models must be upscaled: high-resolution, fine-scale reservoir models can now be simulated directly without upscaling, greatly improving the quality of detailed reservoir studies, with far-reaching impact on increasing reserves and production and improving recovery in oil and gas field development.
(2) GPU-based high-performance digital rock technology uses high-resolution electron microscopy and MicroCT to acquire nanometer-scale (10^-9 m) information about subsurface rock, including mineral types, content, structure, and energy spectra. The data volumes are enormous, and conventional workstations cannot meet the needs of 3D processing and visualization of such large image files. With advanced digital core processing algorithms and the massively parallel computing power of GPUs, the various properties of micro-scale cores can be extracted quickly, greatly advancing reservoir understanding in the late stages of field development.
(3) Building on the GPU's powerful compute and rendering capabilities, we provide an object-oriented API, an extensible architecture, and a comprehensive set of advanced components widely used in professional domains such as AI, oil, natural gas, and mining, giving software developers a complete high-level development platform.  Back
 
Topics:
HPC and Supercomputing, Earth Systems Modeling
Type:
Talk
Event:
GTC China
Year:
2018
Session ID:
CH8402
Share:
 
Abstract:
We'll discuss parallel implementations to resampling techniques, commonly used in particle filtering, and their performance on NVIDIA GPUs, including the embedded TX2. A novel parallel approach to implementing systematic and stratified schem ...Read More
Abstract:

We'll discuss parallel implementations of resampling techniques commonly used in particle filtering and their performance on NVIDIA GPUs, including the embedded TX2. The highlight is a novel parallel approach to implementing systematic and stratified schemes, but we'll also feature an optimized version of the Metropolis resampling technique. Two main challenges have been addressed. First, traditional systematic and stratified techniques are serial by nature; our approach breaks the algorithm up in a way that allows implementation on a GPU while producing results identical to the serial method. Second, while the Metropolis method is well suited for a GPU, its naive implementation does not utilize coalesced accesses to global memory.

  Back
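For reference, here is the serial baseline for systematic resampling that a parallel scheme must reproduce exactly; this NumPy sketch is illustrative and is not the GPU implementation discussed in the session.

```python
# Serial systematic resampling for a particle filter.
import numpy as np

def systematic_resample(weights, rng=np.random.default_rng()):
    n = len(weights)
    # One evenly spaced position per output slot, with a single random offset.
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights / weights.sum())
    # Index of the particle whose cumulative weight interval contains each position.
    return np.searchsorted(cumulative, positions)

weights = np.array([0.05, 0.20, 0.50, 0.20, 0.05])
print(systematic_resample(weights))     # indices of surviving particles
```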
 
Topics:
HPC and Supercomputing, Accelerated Data Science
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8105
Streaming:
Share:
 
Abstract:
The vision of the Exascale Computing Project, initiated in 2016 as a formal U.S. Department of Energy project executing through 2022, is to accelerate innovation with exascale simulation and data science solutions. After a brief overview of this ...Read More
Abstract:

The vision of the Exascale Computing Project, initiated in 2016 as a formal U.S. Department of Energy project executing through 2022, is to accelerate innovation with exascale simulation and data science solutions. After a brief overview, we will give illustrative examples of how ECP teams are leveraging, exploiting, and advancing accelerated-node software technologies and applications on hardware such as the powerful GPUs provided by NVIDIA. We will summarize best practices and lessons learned from these accelerated-node experiences, along with ECP's plans moving into the exascale era, which is now on the near-term horizon.

These solutions will enhance U.S. economic competitiveness, change our quality of life, and strengthen our national security. ECP's mission is to deliver exascale-ready applications and solutions that address currently intractable problems of strategic importance and national interest; create and deploy an expanded and vertically integrated software stack on DOE HPC exascale and pre-exascale systems, defining the enduring US exascale ecosystem; and leverage U.S. HPC vendor research activities and products into DOE HPC exascale systems. The project is a joint effort of two DOE programs: the Office of Science Advanced Scientific Computing Research Program and the National Nuclear Security Administration Advanced Simulation and Computing Program. ECP's RD&D activities, which encompass the development of applications, software technologies, and hardware technologies and architectures, are carried out by over 100 small teams of scientists and engineers from the DOE national laboratories, universities, and industry.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8152
Streaming:
Share:
 
Abstract:
Most AI researchers and industry pioneers agree that the wide availability and low cost of highly-efficient and powerful GPUs and accelerated computing parallel programming tools (originally developed to benefit HPC applications) catalyzed the m ...Read More
Abstract:

Most AI researchers and industry pioneers agree that the wide availability and low cost of highly-efficient and powerful GPUs and accelerated computing parallel programming tools (originally developed to benefit HPC applications) catalyzed the modern revolution in AI/Deep Learning. Now, AI methods and tools are starting to be applied to HPC applications to great effect. This talk will describe an emergent workflow that uses traditional HPC numeric simulations to generate the labeled data sets required to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow on HPC applications and system architecture in a post-Moore's Law world considered.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8175
Streaming:
Share:
 
Abstract:
For the first time in human history we're using Artificial Intelligence technology to automate tasks and decision-making, and in most cases we don't understand how the technology works. This lack of understanding creates distrust and can ...Read More
Abstract:

For the first time in human history we're using Artificial Intelligence technology to automate tasks and decision-making, and in most cases we don't understand how the technology works. This lack of understanding creates distrust and can disenfranchise the users the technology is intended to benefit the most. This is compounded in highly-regulated spaces, such as the U.S. government. In this session, we'll cover the shortcomings of how Machine Learning and AI technologies are being applied in the USG today and how you can establish a trusted environment for successful human and machine collaboration.

  Back
 
Topics:
HPC and Supercomputing, Data Center and Cloud Computing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8214
Streaming:
Download:
Share:
 
Abstract:
The Hartree Centre, a department of the UK National Labs, focusses on industry-led challenges in HPC, High Performance Data Analytics, and AI. Its mission is to make UK industry more competitive through the uptake of novel technologies. Historic ...Read More
Abstract:

The Hartree Centre, a department of the UK National Labs, focusses on industry-led challenges in HPC, High Performance Data Analytics, and AI. Its mission is to make UK industry more competitive through the uptake of novel technologies. Historically the focus has been on HPC (simulation and modelling), and more recently on data-centric computing. This session focuses on how AI can best be applied to add value for industry partners.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8105
Streaming:
Download:
Share:
 
Abstract:
In pushing the limits of throughput of floating-point operations, GPUs have become a unique technology. During this session, we'll explore the current state of affairs from an application perspective. For this, we'll consider different c ...Read More
Abstract:

In pushing the limits of throughput of floating-point operations, GPUs have become a unique technology. During this session, we'll explore the current state of affairs from an application perspective. For this, we'll consider different computational science areas including fundamental research on matter, materials science, and brain research. Focusing on key application performance characteristics, we review current architectural and technology trends to derive an outlook towards future GPU-accelerated architectures.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8108
Streaming:
Download:
Share:
 
Abstract:
Do you need to compute larger or faster than a single GPU allows you to? Learn how to scale your application to multiple GPUs. Learn how to use the different available multi-GPU programming models and about their individual advantages. All programmin ...Read More
Abstract:
Do you need to compute problems larger, or faster, than a single GPU allows? Learn how to scale your application to multiple GPUs. Learn how to use the available multi-GPU programming models and the individual advantages of each. All programming models will be introduced using the example of applying a domain decomposition strategy.  Back
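As a rough illustration of one such model, the sketch below (hypothetical kernel and sizes; a single process driving all visible GPUs) applies a 1-D domain decomposition: each GPU owns a sub-domain plus one ghost cell per side, and ghost cells are exchanged with cudaMemcpyPeer before every update step. Other models, such as one MPI rank per GPU, follow the same decomposition but replace the peer copies with message passing.

    // Minimal sketch: 1-D domain split across all visible GPUs, one ghost cell
    // per boundary exchanged each step; "stencil" stands in for the real solver.
    #include <vector>
    #include <utility>
    #include <cuda_runtime.h>

    __global__ void stencil(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x + 1;
        if (i <= n - 2) out[i] = 0.5f * (in[i - 1] + in[i + 1]);
    }

    int main() {
        int ngpu = 0;
        cudaGetDeviceCount(&ngpu);
        const int local = 1 << 20;                 // interior cells per GPU
        const int n = local + 2;                   // plus one ghost cell per side
        std::vector<float*> in(ngpu), out(ngpu);

        for (int g = 0; g < ngpu; ++g) {           // one sub-domain per device
            cudaSetDevice(g);
            cudaMalloc(&in[g],  n * sizeof(float));
            cudaMalloc(&out[g], n * sizeof(float));
            cudaMemset(in[g],  0, n * sizeof(float));
            cudaMemset(out[g], 0, n * sizeof(float));
        }

        for (int step = 0; step < 100; ++step) {
            // Halo exchange: last interior cell of g -> left ghost of g+1, and back.
            for (int g = 0; g + 1 < ngpu; ++g) {
                cudaMemcpyPeer(in[g + 1],         g + 1, in[g] + local, g,     sizeof(float));
                cudaMemcpyPeer(in[g] + local + 1, g,     in[g + 1] + 1, g + 1, sizeof(float));
            }
            for (int g = 0; g < ngpu; ++g) {       // update each sub-domain on its GPU
                cudaSetDevice(g);
                stencil<<<(local + 255) / 256, 256>>>(in[g], out[g], n);
            }
            for (int g = 0; g < ngpu; ++g) std::swap(in[g], out[g]);
        }
        for (int g = 0; g < ngpu; ++g) {
            cudaSetDevice(g);
            cudaDeviceSynchronize();
            cudaFree(in[g]); cudaFree(out[g]);
        }
        return 0;
    }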
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8121
Streaming:
Download:
Share:
 
Abstract:
In this talk attendees will learn how key algorithms for Numerical Weather Prediction were ported to the latest GPU technology and the substantial benefits gained from doing so. We will showcase the power of individual Voltas and the impressive perfo ...Read More
Abstract:
In this talk attendees will learn how key algorithms for Numerical Weather Prediction were ported to the latest GPU technology and the substantial benefits gained from doing so. We will showcase the power of individual Voltas and the impressive performance of the cutting-edge DGX-2 server with multiple GPUs connected by a high-speed interconnect.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8195
Streaming:
Download:
Share:
 
Abstract:
In this session, we will describe how we successfully extended a large legacy Fortran code to GPUs using OpenACC. Based on a state of the art code for combustion simulation AVBP (http://www.cerfacs.fr/avbp7x/), our objective is to keep the code as si ...Read More
Abstract:
In this session, we will describe how we successfully extended a large legacy Fortran code to GPUs using OpenACC. Based on a state-of-the-art code for combustion simulation, AVBP (http://www.cerfacs.fr/avbp7x/), our objective is to keep the code as simple as possible for the AVBP community while taking advantage of high-end computing resources such as GPUs; OpenACC provides the flexibility to carry out the extension within these constraints. This session will present the various strategies we tried during the refactoring of the application, including the limitations of the directive-only approach, which can severely impair performance on particular parts of the code. The lessons learned are applicable to a wide range of codes in the research community.   Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8217
Streaming:
Download:
Share:
 
Abstract:
This session will describe strategies to achieve an efficient implementation of a parallel high fidelity CFD solver that runs on GPUs. The solver is based on a nodal Discontinuous Galerkin Flux Reconstruction spatial discretisation. The strong data l ...Read More
Abstract:
This session will describe strategies to achieve an efficient implementation of a parallel high-fidelity CFD solver that runs on GPUs. The solver is based on a nodal Discontinuous Galerkin Flux Reconstruction spatial discretisation. The strong data locality of the resulting scheme makes it very attractive for GPU implementation. Details of the implementation of the most time-consuming kernels are provided, with emphasis on the extensive use of GPU shared memory to minimize the memory access time penalty. Communication between GPUs also plays a big role in the solver's parallel performance. The benefits of overlapping communication and computation will also be quantified. The resulting solver is able to perform LES and DNS simulations of low-pressure turbine blades within engine design time scales.  Back
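The solver's actual kernels are not shown here, but the shared-memory staging pattern the abstract alludes to can be sketched as follows (hypothetical operator and node count): each block loads one element's nodal values into shared memory once, so all subsequent reuse is served on-chip rather than from global memory.

    // Illustrative pattern only (not the solver's kernel): stage element-local
    // data in shared memory, then reuse it many times without global traffic.
    #include <cuda_runtime.h>

    constexpr int NODES = 64;   // hypothetical nodes per element

    // One block per element; one thread per node.
    __global__ void apply_derivative(const float* __restrict__ u,   // [elements][NODES]
                                     const float* __restrict__ D,   // [NODES][NODES] operator
                                     float* __restrict__ du) {
        __shared__ float u_s[NODES];                 // element-local data, on chip
        const int elem = blockIdx.x;
        const int node = threadIdx.x;

        u_s[node] = u[elem * NODES + node];          // single coalesced global read
        __syncthreads();

        float acc = 0.0f;
        for (int k = 0; k < NODES; ++k)              // NODES reuses from shared memory
            acc += D[node * NODES + k] * u_s[k];
        du[elem * NODES + node] = acc;
    }
    // Launch (sketch): apply_derivative<<<num_elements, NODES>>>(u, D, du);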
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8223
Download:
Share:
 
Abstract:
Come and learn how GPUs help identify biological activity on nearby exoplanets. Deployed on the Japanese Subaru telescope at 4,200m elevation atop Maunakea, Hawaii, the GPU hardware technology constitutes the backbone of the adaptive optics, which ...Read More
Abstract:
Come and learn how GPUs help identify biological activity on nearby exoplanets. Deployed on the Japanese Subaru telescope at 4,200m elevation atop Maunakea, Hawaii, the GPU hardware technology constitutes the backbone of the adaptive optics, which drives the real-time correction of the optical aberrations introduced by Earth's atmosphere. Using machine learning techniques and advanced linear algebra algorithms accelerated by GPUs, a predictive control problem can now be solved at the multi-kHz frame rate required to keep up with turbulence changes. This represents the first successful on-sky result of this approach for exoplanet imaging.  Back
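In the simplest possible terms, the per-frame work reduces to applying a precomputed predictive/reconstruction matrix to the latest sensor vector. The sketch below (hypothetical sizes; not the Subaru pipeline) shows how such a step can be kept resident on the GPU with cuBLAS so that only sensor readouts and actuator commands cross the bus each frame.

    // Minimal sketch: per-frame matrix-vector product with cuBLAS, all data
    // kept on the GPU to sustain multi-kHz control rates.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    int main() {
        const int n_sensors = 2048, n_actuators = 1024;   // hypothetical sizes
        float *M, *meas, *cmd;                            // GPU-resident buffers
        cudaMalloc(&M,    sizeof(float) * n_actuators * n_sensors);
        cudaMalloc(&meas, sizeof(float) * n_sensors);
        cudaMalloc(&cmd,  sizeof(float) * n_actuators);
        cudaMemset(M, 0, sizeof(float) * n_actuators * n_sensors);
        cudaMemset(meas, 0, sizeof(float) * n_sensors);

        cublasHandle_t h;
        cublasCreate(&h);
        const float one = 1.0f, zero = 0.0f;

        for (int frame = 0; frame < 10000; ++frame) {
            // (sensor readout would update `meas` here, e.g. via cudaMemcpyAsync)
            // cmd = M * meas  -- column-major gemv, M is n_actuators x n_sensors
            cublasSgemv(h, CUBLAS_OP_N, n_actuators, n_sensors,
                        &one, M, n_actuators, meas, 1, &zero, cmd, 1);
            // (actuator commands in `cmd` would go to the deformable mirror here)
        }
        cudaDeviceSynchronize();
        cublasDestroy(h);
        cudaFree(M); cudaFree(meas); cudaFree(cmd);
        return 0;
    }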
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8251
Streaming:
Download:
Share:
 
Abstract:
Learn how GPU-based Computational Fluid Dynamics (CFD) paves the way for affordable high-fidelity simulations for simulation-based design. This talk gives insights into the key ingredients of academic and commercial GPU-accelerated CFD solvers and di ...Read More
Abstract:
Learn how GPU-based Computational Fluid Dynamics (CFD) paves the way for affordable high-fidelity simulations for simulation-based design. This talk gives insights into the key ingredients of academic and commercial GPU-accelerated CFD solvers and discusses the technical and physical challenges of (near) real-time simulations of complex flows. Then, ultraFluidX is presented, a recently released commercial GPU-based CFD solver. The solver was specifically designed to leverage the massively parallel architecture of GPUs. With its multi-GPU implementation based on CUDA-aware MPI, the tool can achieve turnaround times of just a few hours for simulations of fully detailed production-level passenger and heavy-duty vehicles. Basics of the solver and several selected application examples are presented.  Back
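CUDA-aware MPI, mentioned above, lets ranks hand device pointers directly to MPI calls so halo data moves GPU-to-GPU (via GPUDirect where available) without explicit staging through host memory. A minimal sketch, assuming one GPU per rank and an MPI build with CUDA support:

    // Sketch of a CUDA-aware MPI halo exchange: device pointers passed straight
    // to MPI_Isend/MPI_Irecv. Assumes the MPI library was built with CUDA support.
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        cudaSetDevice(rank % 4);                       // hypothetical: 4 GPUs per node

        const int halo = 1 << 16;                      // cells in one halo layer
        float *send_d, *recv_d;                        // device buffers
        cudaMalloc(&send_d, halo * sizeof(float));
        cudaMalloc(&recv_d, halo * sizeof(float));
        cudaMemset(send_d, 0, halo * sizeof(float));

        int right = (rank + 1) % size, left = (rank - 1 + size) % size;
        MPI_Request req[2];
        // With a CUDA-aware MPI, these calls accept device pointers directly.
        MPI_Irecv(recv_d, halo, MPI_FLOAT, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(send_d, halo, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &req[1]);
        // ... interior computation could be overlapped here ...
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

        cudaFree(send_d); cudaFree(recv_d);
        MPI_Finalize();
        return 0;
    }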
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8253
Streaming:
Download:
Share:
 
Abstract:
We present in this talk a portable matrix assembly strategy used in solving PDEs, suited for co-execution on both the CPUs and accelerators. In addition, a dynamic load balancing strategy is considered to balance the workload among the different ...Read More
Abstract:

We present in this talk a portable matrix assembly strategy used in solving PDEs, suited for co-execution on both CPUs and accelerators. In addition, a dynamic load balancing strategy is considered to balance the workload among the different CPUs and GPUs available on the cluster. Numerical methods for solving partial differential equations (PDEs) involve two main steps: the assembly of an algebraic system of the form Ax=b and the solution of it with direct or iterative solvers. The assembly step consists of a loop over elements, faces and nodes in the case of the finite element, finite volume, and finite difference methods, respectively. It is computationally intensive and does not involve communication. It is therefore well suited for accelerators.
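A common GPU realization of that element loop maps one thread per element and scatters the local contributions into the global system with atomic additions, since neighbouring elements share nodes. A minimal sketch (hypothetical element type; right-hand-side assembly only):

    // Illustrative sketch of GPU assembly: one thread per element computes a
    // local contribution and scatters it into the global right-hand side.
    #include <cuda_runtime.h>

    constexpr int NODES_PER_ELEM = 4;   // hypothetical (e.g. linear tetrahedra)

    __global__ void assemble_rhs(const int* __restrict__ connectivity, // [nelem][4] node ids
                                 const float* __restrict__ elem_load,  // per-element source term
                                 float* __restrict__ b,                // global RHS vector
                                 int nelem) {
        int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= nelem) return;

        // Local "element integration" (placeholder for the real quadrature).
        float local = 0.25f * elem_load[e];

        // Scatter-add into the shared global vector: atomics resolve the races
        // caused by adjacent elements touching the same nodes.
        for (int a = 0; a < NODES_PER_ELEM; ++a) {
            int node = connectivity[e * NODES_PER_ELEM + a];
            atomicAdd(&b[node], local);
        }
    }
    // Launch (sketch): assemble_rhs<<<(nelem + 255) / 256, 256>>>(conn, load, b, nelem);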

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8292
Streaming:
Download:
Share:
 
Abstract:
This talk provides an overview of the key strategies used to design and implement OpenStaPLE, an application for Lattice QCD (LQCD) Monte Carlo simulations. LQCD are an example of HPC grand challenge applications, where the accuracy of results s ...Read More
Abstract:

This talk provides an overview of the key strategies used to design and implement OpenStaPLE, an application for Lattice QCD (LQCD) Monte Carlo simulations. LQCD simulations are an example of HPC grand-challenge applications, where the accuracy of results strongly depends on available computing resources. OpenStaPLE has been developed on top of the MPI and OpenACC frameworks: MPI manages the parallelism across multiple computing nodes and devices, while OpenACC exploits the high-level parallelism available on modern processors and accelerators, enabling a good level of portability across different architectures. After an initial overview, we also present performance and portability results on different architectures, highlighting the key hardware and software improvements that may lead this class of applications to better performance.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8317
Streaming:
Download:
Share:
 
Abstract:
We present our experiences implementing GPU acceleration in the massively parallel, real space FHI-aims electronic structure code for computational materials science. For fourteen years, FHI-aims has focused on high numerical accuracy for curren ...Read More
Abstract:

We present our experiences implementing GPU acceleration in the massively parallel, real space FHI-aims electronic structure code for computational materials science. For fourteen years, FHI-aims has focused on high numerical accuracy for current methods, such as Kohn-Sham density-functional theory and beyond, and on outstanding scaling on distributed-parallel high-performance computers. We show how to exploit vectorized implementations in FHI-aims to achieve an overall 3x-4x GPU acceleration with minimal code rewrite for complete simulations. Furthermore, FHI-aims' domain decomposition scheme on non-uniform grids enables compute and memory-parallel computing across thousands of GPU-containing nodes for real-space operations.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8321
Streaming:
Download:
Share:
 
Abstract:
This talk will present the roadmap, the strategy and the currently ongoing efforts to port the fundamental building blocks of the QuantumESPRESSO suite of codes to accelerated architectures. QuantumESPRESSO is an integrated suite of codes provid ...Read More
Abstract:

This talk will present the roadmap, the strategy, and the currently ongoing efforts to port the fundamental building blocks of the QuantumESPRESSO suite of codes to accelerated architectures. QuantumESPRESSO is an integrated suite of codes providing computational methods to estimate a vast number of physical properties at the nanoscale. It features high modularity and a user-oriented design, and it can efficiently exploit standalone workstations as well as state-of-the-art HPC systems. The differences characterizing this new work and the original GPU porting done in CUDA C back in 2012 will be used to discuss aspects of code evolution and maintainability. Special attention will also be devoted to the performance-critical kernels shared by most of the components of the suite.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8340
Streaming:
Download:
Share:
 
Abstract:
VASP is a software package for atomic-scale materials modeling. It's one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. We'll give an overview on the status of porting VASP to ...Read More
Abstract:

VASP is a software package for atomic-scale materials modeling. It's one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. We'll give an overview on the status of porting VASP to GPUs with OpenACC. Parts of VASP were previously ported to CUDA C with good speed-ups on GPUs, but also with an increase in the maintenance workload, because VASP is otherwise written wholly in Fortran. We'll discuss OpenACC performance relative to CUDA, the impact of OpenACC on VASP code maintenance, and challenges encountered in the port related to management of aggregate data structures. Finally, we'll discuss possible future solutions for data management that would simplify both new development and the maintenance of VASP and similar large production applications on GPUs.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8367
Streaming:
Download:
Share:
 
Abstract:
Today we are investigating different technologies and architectures, and we will present the first hardware and software prototype that will evolve into a system able to overcome an unprecedented challenge. To probe the predictions of t ...Read More
Abstract:

Today we are investigating different technologies and architectures, and we will present the first hardware and software prototype that will evolve into a system able to overcome an unprecedented challenge.

To probe the predictions of the Standard Model of Particle Physics, the Large Hadron Collider at CERN will be upgraded by 2026 to produce 6 billion proton collisions every second at the centre of the Compact Muon Solenoid (CMS) detector. These collisions produce events in which new particles, which did not exist before the collision, are generated.

The CMS experiment will be able to observe and record the most energetic and rare of these events.

Observing the details of all these events requires reading and analyzing almost 100TB of data every second... and CMS is working on a hybrid approach to tackle this challenge: ASICs and FPGAs will be used for the first level of data reduction, while a hybrid cluster of computer servers and GPUs will be used for the full event reconstruction and final online selection.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8382
Streaming:
Download:
Share:
 
Abstract:
This talk will cover the background and distinguishing features of the European Interactive Computing E-Infrastructure (ICEI) project, which will offer a set of federated services to realize the Fenix infrastructure (https://fenix-ri.eu). For decade ...Read More
Abstract:
This talk will cover the background and distinguishing features of the European Interactive Computing E-Infrastructure (ICEI) project, which will offer a set of federated services to realize the Fenix infrastructure (https://fenix-ri.eu). For decades, high-performance computing, networking, and storage technologies have been among the driving forces behind numerous scientific discoveries and breakthroughs. Recently, the X-as-a-service model offered by several cloud technologies has enabled researchers, particularly in the fields of data science, to access resources and services in an on-demand and elastic manner. Complex workflows in different domains, such as the European Human Brain Project (HBP), however, require a converged, consolidated, and flexible set of infrastructure services to support their performance and accessibility requirements.   Back
 
Topics:
HPC and Supercomputing, Graphics Virtualisation
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8106
Streaming:
Download:
Share:
 
Abstract:
Classical molecular dynamics (MD) simulations will be able to reach sampling in the second timescale within five years thanks to GPUs, producing petabytes of simulation data at current force field accuracy. Notwithstanding this, MD will still be in t ...Read More
Abstract:
Classical molecular dynamics (MD) simulations will be able to reach sampling in the second timescale within five years thanks to GPUs, producing petabytes of simulation data at current force field accuracy. Notwithstanding this, MD will still be in the regime of low-throughput, high-latency predictions with average accuracy. We envisage that machine learning (ML) will be able to solve both the accuracy and time-to-prediction problem by learning predictive models using expensive simulation data. The synergies between classical, quantum simulations and ML methods, such as artificial neural networks, have the potential to drastically reshape the way we make predictions in computational structural biology and drug discovery.  Back
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8116
Streaming:
Share:
 
Abstract:
Nuclear fusion is the process that powers the sun, and it is one of the best hopes of achieving a virtually unlimited energy source for the future of humanity. However, reproducing sustainable nuclear fusion reactions here on Earth is a tremendous sc ...Read More
Abstract:
Nuclear fusion is the process that powers the sun, and it is one of the best hopes of achieving a virtually unlimited energy source for the future of humanity. However, reproducing sustainable nuclear fusion reactions here on Earth is a tremendous scientific and technical challenge. Special devices - called tokamaks - have been built around the world, with JET (Joint European Torus, in the UK) being the largest tokamak currently in operation. Such devices confine matter and heat it up to extremely high temperatures, creating a plasma where fusion reactions begin to occur. JET has over one hundred diagnostic systems to monitor what happens inside the plasma, and each 30-second experiment generates about 50 GB of data to be analyzed. In this talk, we will show how Convolutional Neural Networks (CNNs) can be used to reconstruct the 2D plasma profile inside the device based on data coming from those diagnostics. We will also discuss how Recurrent Neural Networks (RNNs) can be used to predict plasma disruptions, which are one of the major problems affecting fusion devices today. Training of such networks is done on NVIDIA GPUs.  Back
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8144
Streaming:
Download:
Share:
 
Abstract:
In this talk we will give an overview of the benefits GPU computing can provide to the Structural Bioinformatics field. We will explain how most biomolecular simulation methods can be efficiently accelerated using massively parallel architec ...Read More
Abstract:
In this talk we will give an overview of the benefits GPU computing can provide to the Structural Bioinformatics field. We will explain how most biomolecular simulation methods can be efficiently accelerated using massively parallel architectures, and will show several fundamental research and technology transfer success cases.  Back
 
Topics:
HPC and Supercomputing, Bioinformatics, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8287
Streaming:
Download:
Share:
 
Abstract:
In 2014, GENCI set up a French technology watch group that targets the provisioning of test systems, selected as part of the prospective approach among partners from GENCI. This was done in order to prepare scientific commun ...Read More
Abstract:

In 2014, GENCI set up a French technology watch group targeting the provisioning of test systems, selected as part of a prospective approach among GENCI's partners. This was done in order to prepare scientific communities and users of GENCI's computing resources for the arrival of the next "Exascale" technologies. The talk will present results obtained on the OpenPOWER platform bought by GENCI and opened to the scientific community. We will present the first results obtained for a set of scientific applications using the available environments (CUDA, OpenACC, OpenMP, ...), along with results obtained for AI applications using IBM's PowerAI software distribution.

  Back
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8288
Streaming:
Download:
Share:
 
Abstract:
Legacy, performance hungry and cutting edge deep learning workloads require best of breed cloud services and hardware. Enterprises require low cost and financial flexibility. Learn how Oracle and NVIDIA have partnered together to solve these cha ...Read More
Abstract:

Legacy, performance-hungry, and cutting-edge deep learning workloads require best-of-breed cloud services and hardware. Enterprises require low cost and financial flexibility. Learn how Oracle and NVIDIA have partnered to solve these challenges with a bare-metal NVIDIA Tesla GPU offering that squeezes every ounce of performance at a fraction of the cost. We'll also detail the ability to use NVIDIA GPU CLOUD to streamline the experience for customers launching and running clusters of GPU virtual machines or bare-metal instances for AI or HPC workloads. Come see live demos and learn what Oracle Cloud Infrastructure is doing in this space!

  Back
 
Topics:
HPC and Supercomputing, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8528
Streaming:
Download:
Share:
 
Abstract:
CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest release of CUDA, preview upcoming GPU programming technology, gain in ...Read More
Abstract:
CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest release of CUDA, preview upcoming GPU programming technology, gain insight into the philosophy driving the development of CUDA, and see how it will take advantage of current and future GPUs. You'll also learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.   Back
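For orientation, the programming model the talk builds on can be summarized in a few lines: a kernel is launched over a grid of thread blocks, and (since CUDA 6) managed memory can be touched from both host and device. A minimal sketch:

    // Minimal CUDA programming-model example: a SAXPY kernel launched over a
    // grid of threads, using managed memory visible to CPU and GPU.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(float a, const float* x, float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));        // accessible from host and device
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);  // grid of 256-thread blocks
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);                     // expect 4.0
        cudaFree(x); cudaFree(y);
        return 0;
    }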
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8128
Streaming:
Download:
Share:
 
Abstract:
Microsoft Azure's N-Series VMs powered by latest NVIDIA GPUs enable a range of new accelerated scenarios. Learn how you can take advantage of GPUs in Azure - from Workstation Graphics and Visualization, to HPC simulation, to training models ...Read More
Abstract:

Microsoft Azure's N-Series VMs powered by the latest NVIDIA GPUs enable a range of new accelerated scenarios. Learn how you can take advantage of GPUs in Azure - from workstation graphics and visualization, to HPC simulation, to training models for artificial intelligence. This session will delve deep into today's exciting offerings with live examples and offer a view of what's to come in the future.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8500
Streaming:
Share:
 
Abstract:
For more than 20 years, Amazon has invested heavily in artificial intelligence. Our mission now is to share that experience and our machine learning capabilities as fully managed services and put them in the hands of every developer and data scientist. AWS offers a range of AI services that address different use cases and requirements through cloud-native machine learning and deep learning technologies. ...Read More
Abstract:
For more than 20 years, Amazon has invested heavily in artificial intelligence. Our mission now is to share that experience and our machine learning capabilities as fully managed services and put them in the hands of every developer and data scientist. AWS offers a range of AI services that address different use cases and requirements through cloud-native machine learning and deep learning technologies. These services give every developer access to natural language understanding (NLU), automatic speech recognition (ASR), visual search and image recognition, text-to-speech (TTS), and the latest machine learning (ML) technology. Whether you are just getting started with AI or are already a deep learning expert, this session will show you how to innovate with AI in the AWS cloud, scale your AI applications, and make them more efficient.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8020
Streaming:
Download:
Share:
 
Abstract:
The Tensor Cores introduced with NVIDIA Volta GPUs deliver up to 125 teraFLOPS from IEEE half-precision inputs, allowing mixed-precision training to run far faster than single precision. We will explain three essential techniques for mixed-precision training: loss scaling, FP32 master weights, and choosing the appropriate precision for specific operations. ...Read More
Abstract:
The Tensor Cores introduced with NVIDIA Volta GPUs deliver up to 125 teraFLOPS from IEEE half-precision inputs, allowing mixed-precision training to run far faster than single precision. We will explain three essential techniques for mixed-precision training: loss scaling, FP32 master weights, and choosing the appropriate precision for specific operations. These techniques make it possible to match the model accuracy of a single-precision network without changing hyperparameters or the training schedule. Finally, we will explain how to enable Tensor Cores for your network and how to verify that they are being used, illustrating all of the above with a simple yet complete PyTorch example.  Back
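The talk's worked example is in PyTorch; as a language-neutral illustration of two of the ingredients (FP32 master weights and loss scaling), the sketch below shows a CUDA update step that un-scales FP16 gradients, applies the update to FP32 master weights, and writes back an FP16 working copy. This is illustrative only and not the talk's code.

    // Conceptual sketch of mixed-precision training ingredients: gradients arrive
    // in FP16 scaled by loss_scale, the update goes to FP32 master weights, and an
    // FP16 working copy is refreshed for the next forward/backward pass.
    #include <cuda_fp16.h>
    #include <cuda_runtime.h>

    __global__ void sgd_mixed_precision(float* __restrict__ master_w,     // FP32 master weights
                                        __half* __restrict__ w_half,      // FP16 working copy
                                        const __half* __restrict__ grad,  // FP16 grads, pre-scaled
                                        float lr, float loss_scale, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float g = __half2float(grad[i]) / loss_scale;   // undo loss scaling in FP32
        master_w[i] -= lr * g;                          // accumulate in full precision
        w_half[i] = __float2half(master_w[i]);          // refresh the FP16 copy
    }
    // Launch (sketch): sgd_mixed_precision<<<(n + 255)/256, 256>>>(w32, w16, g16, 1e-3f, 1024.f, n);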
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8021
Streaming:
Download:
Share:
 
Abstract:
NVIDIA Tesla HGX is a platform architecture that delivers the highest-performance end-to-end solution for artificial intelligence, deep learning, and high-performance computing. In this session we will cover the HGX data-center product roadmap, how the architecture standardizes the design of accelerated AI data centers, ...Read More
Abstract:
NVIDIA Tesla HGX is a platform architecture that delivers the highest-performance end-to-end solution for artificial intelligence, deep learning, and high-performance computing. In this session we will cover the HGX data-center product roadmap, how the architecture standardizes the design of accelerated AI data centers, the latest technologies behind its next-generation performance, and how we are working with OEM and ODM partners to bring the HGX platform to the cloud.  Back
 
Topics:
HPC and Supercomputing, Deep Learning and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8006
Streaming:
Download:
Share:
 
Abstract:
To help enterprises and government agencies build AI environments effectively and explore the many ways AI can raise industrial competitiveness, HPE (慧與科技) has introduced the AI CookBook. It lets AI practitioners deploy AI environments quickly from the CookBook's contents ...Read More
Abstract:
To help enterprises and government agencies build AI environments effectively and explore the many ways AI can raise industrial competitiveness, HPE (慧與科技) has introduced the AI CookBook. It lets AI practitioners deploy AI environments quickly from the CookBook's contents and pursue AI applications across different industries with professional consulting from the AI and data science experts at the HPE AI Center.  Back
 
Topics:
HPC and Supercomputing, Deep Learning and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8007
Streaming:
Download:
Share:
 
Abstract:
Deep learning demystified, shared with the public for the first time at GTC Taiwan: What is deep learning? Why has deep learning suddenly become such a prominent field? Why do GPUs play such an important role in deep learning? ...Read More
Abstract:
Deep learning demystified, shared with the public for the first time at GTC Taiwan: What is deep learning? Why has deep learning suddenly become such a prominent field? Why do GPUs play such an important role in deep learning? How do you get started? Is my company a good fit for adopting deep learning? Let NVIDIA unravel the mystery.  Back
 
Topics:
HPC and Supercomputing, Deep Learning and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8008
Streaming:
Download:
Share:
 
Abstract:
An overview of the many GPU hardware platforms designed for today's AI/machine learning and HPC workloads, including custom solutions for deep learning inference and deep learning training. ...Read More
Abstract:
An overview of the many GPU hardware platforms designed for today's AI/machine learning and HPC workloads, including custom solutions for deep learning inference and deep learning training. The session also covers systems based on PCIe GPUs as well as GPU systems with NVLink interconnects.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8031
Download:
Share:
 
Abstract:
The simulation of the behavior of the human brain is one of the most important challenges in the recent history of computing, with a large number of practical applications. The main constraint is to simulate efficiently a huge number of neurons using ...Read More
Abstract:
The simulation of the behavior of the human brain is one of the most important challenges in the recent history of computing, with a large number of practical applications. The main constraint is to simulate efficiently a huge number of neurons using current computer technology. One of the most efficient ways in which the scientific community attempts to simulate the behavior of the human brain consists of three major steps: computing 1) the voltage on the neuron morphology, 2) the synaptic elements in each neuron, and 3) the connectivity between neurons. In this work, we focus on the first step, which is one of the most time-consuming steps of the simulation. It is also strongly linked with the remaining steps. All these steps must be carried out for each of the neurons (between 50 and 100 billion neurons in the human brain), which differ widely in size and shape.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8328
Streaming:
Download:
Share: