GTC ON-DEMAND

5G & Edge
Abstract:

Unlike typical network adapters, Mellanox SmartNICs maximize the performance and agility of modern data centers without sacrificing efficiency. Mellanox ConnectX and BlueField SmartNICs offer state-of-the-art intelligent hardware offloads that accelerate a variety of cloud workloads, including AI/ML, HPC, big data, 5G core and edge services, and cloud computing. In this session, learn how Mellanox SmartNICs together with NVIDIA GPUs push the envelope of cloud data center innovation to achieve the ultimate in performance, agility, and efficiency.

Topics:
5G & Edge
Type:
Talk
Event:
MWC
Year:
2019
Session ID:
mwcla922
AI & Deep Learning Research
Abstract:

We'll demonstrate how to build a scalable, high-performance, data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, low-latency network interconnects for both InfiniBand and Ethernet. We'll present state-of-the-art techniques for distributed machine learning and explain the special requirements they impose on the system. There will be an overview of the interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and the in-network computing platform used to accelerate large-scale deployments in HPC and artificial intelligence.

Topics:
AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Washington D.C.
Year:
2019
Session ID:
DC91167
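The interconnect technologies this session covers (RDMA, GPUDirect, in-network computing) largely exist to accelerate one collective operation: the gradient all-reduce at the heart of data-parallel training. As an illustrative sketch, not code from the talk, here is a plain-NumPy simulation of the classic ring all-reduce, the communication pattern those technologies speed up:

```python
import numpy as np

def ring_allreduce(vectors):
    """Simulate ring all-reduce: n workers each start with one gradient vector
    and every worker ends with the elementwise sum across all workers.
    Production systems run the same pattern over NCCL/RDMA; this is a
    plain-NumPy stand-in for illustration."""
    n = len(vectors)
    # Each worker splits its local vector into n chunks.
    chunks = [list(np.array_split(np.asarray(v, dtype=float), n)) for v in vectors]

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the fully
    # summed chunk (i + 1) % n.
    for step in range(n - 1):
        for src in range(n):
            c = (src - step) % n
            dst = (src + 1) % n
            chunks[dst][c] = chunks[dst][c] + chunks[src][c]

    # Phase 2, all-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        for src in range(n):
            c = (src + 1 - step) % n
            dst = (src + 1) % n
            chunks[dst][c] = chunks[src][c].copy()

    return [np.concatenate(c) for c in chunks]

# Four workers, each with a different "gradient"; all end with the sum.
grads = [np.arange(8.0) * (w + 1) for w in range(4)]
result = ring_allreduce(grads)
# each worker now holds arange(8) * (1 + 2 + 3 + 4)
```

Each worker sends and receives only 2(n-1)/n of the data volume, which is why the ring schedule, rather than a naive gather-and-broadcast, is what high-bandwidth fabrics are asked to sustain.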
 
Abstract:
Join a special presentation from our 2018-2019 Graduate Fellowship recipients to learn what's next from the world of research and academia. Sponsored projects involve a variety of technical challenges, including topics such as 3D scene understanding, new programming models for tensor computations, HPC physics simulations for astrophysics, deep learning algorithms for AI natural language learning, and cancer diagnosis. We believe these students will lead the future of our industry, and we're proud to support the 2018-2019 NVIDIA Graduate Fellows. For more information on the NVIDIA Graduate Fellowship program, visit www.nvidia.com/en-us/research/graduate-fellowships.
 
Topics:
AI & Deep Learning Research, Virtual Reality & Augmented Reality, Graphics and AI, Computational Biology & Chemistry, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9976
 
Abstract:
We'll review our study of using artificial intelligence to augment various domains of computational science in order to improve time to solution for HPC problems. We'll discuss the current state-of-the-art approaches and performance gains where applicable. We'll also investigate current barriers to adoption and consider possible solutions.
 
Topics:
AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8242
 
Abstract:
The Department of Energy (DOE) entered into a partnership with the National Cancer Institute (NCI) of the National Institutes of Health (NIH) to accelerate cancer research. This "Cancer Moonshot" aims to tackle three main objectives: better understand the mechanisms of cancer, use large amounts of diverse medical data to build predictive models, and enable precision medicine by providing treatment guidance for individual patients. Leveraging DOE's expertise in high-performance computing (HPC) and new methods for deep learning in artificial intelligence, this HPC+AI approach aims to create a single scalable deep neural network code called CANDLE (CANcer Distributed Learning Environment) that will be used to address all three challenges. This talk gives an overview of the project and highlights how GPU-accelerated systems in the DOE ecosystem, Summit and Sierra, have contributed to it.
 
Topics:
AI & Deep Learning Research, HPC and AI, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81033
AI Application, Deployment & Inference
Abstract:
We'll cover recent developments in the merger of high-performance computing and machine learning (ML) within computational fluid dynamics at the National Energy Technology Laboratory, in collaboration with NVIDIA. We'll address three main topics: the use of machine learning frameworks (TensorFlow) for traditional HPC applications; performance benchmarks comparing traditional HPC methods with the same methods in TensorFlow on a variety of hardware; and integration of machine learning within the framework to accelerate computations. Replacing CFD solvers with machine-learning-accelerated solvers, in configurations that control tolerances, showed that significant accelerations are possible. While this work is specific to CFD, the principles learned can easily be adopted by many other branches of engineering and science to achieve similar accelerations.
 
Topics:
AI Application, Deployment & Inference, HPC and AI
Type:
Talk
Event:
GTC Washington D.C.
Year:
2019
Session ID:
DC91158
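For a concrete picture of the kind of kernel this session is about, here is a minimal Jacobi relaxation sweep for the 2-D Laplace equation in plain NumPy. The TensorFlow-based approach described above expresses the same stencil as tensor operations so it runs on GPUs, and then goes further by accelerating such solver loops with learned components. This sketch is illustrative only, not code from the talk:

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi relaxation sweep for Laplace's equation: each interior
    point becomes the average of its four neighbors."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

# 32x32 grid, one edge held at 1.0, the other boundaries at 0.0.
u = np.zeros((32, 32))
u[0, :] = 1.0
for _ in range(500):
    u = jacobi_step(u)
# u now approximates the steady-state field; the center value tends
# toward roughly 0.25 as the iteration converges.
```

The array-slicing form is what makes the port mechanical: replacing the NumPy arrays and slices with framework tensors keeps the algorithm unchanged while moving execution to the GPU.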
 
Abstract:

Exploring the Best Server for AI
Speakers: Samuel D. Matzek, Sr. Software Engineer; Maria Ward, IBM Accelerated Server Offering Manager
Explore the server at the heart of the Summit and Sierra supercomputers, and the best server for AI. We will discuss the technical details that set this server apart and why they matter for your machine learning and deep learning workloads.

IBM Cloud for AI at Scale
Speaker: Alex Hudak, IBM Cloud Offering Manager
AI is fast changing the modern enterprise with new applications that are resource demanding but provide new capabilities to drive insight from customer data. IBM Cloud is partnering with NVIDIA to provide a world-class, customized cloud environment to meet the needs of these new applications. Learn about the wide range of NVIDIA GPU solutions inside the IBM Cloud virtual and bare-metal server portfolio, and how customers are using them across deep learning, analytics, HPC workloads, and more.

IBM Spectrum LSF Family Overview & GPU Support
Speaker: Larry Adams, Global Architect - Cross Sector, Developer, Consultant, IBM Systems

How to Fuel the Data Pipeline
Speaker: Kent Koeninger, IBM

IBM Storage Reference Architecture for AI with Autonomous Driving
Speaker: Kent Koeninger, IBM

Topics:
AI Application, Deployment & Inference
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91053
 
Abstract:
Now that deep learning has moved out of the lab and into production, how do you provide training environments to all your internal customers across business units with different requirements, without provisioning separate clusters? IBM has applied decades of HPC experience to build a production-ready deep learning stack, including servers accelerated with NVIDIA GPUs, workload and resource management software, and ready-to-use open-source frameworks, all covered by IBM support. The solution provides a secure multi-tenant environment so multiple data scientists can share a common set of resources, eliminating silos, while running multiple instances of the same or different applications. The deep learning effort is enhanced with end-to-end pipeline support, from data ingestion and preparation, through model training and tuning, to inference. In this session, we will explore what an enterprise deep learning environment looks like and provide insights into the unique IBM value for accelerating the use of deep learning across a wide variety of industries.
 
Topics:
AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81049
 
Abstract:
How do meteorologists predict weather or weather events such as hurricanes, typhoons, and heavy rain? Predicting weather events has traditionally been done with supercomputer (HPC) simulations using numerical models such as WRF, UM, and MPAS. Recently, however, many deep learning-based studies have shown outstanding results. We'll introduce several case studies related to meteorological research. We'll also describe how meteorological tasks differ from general deep learning tasks, their detailed approaches, and their input data, such as weather radar images and satellite images. We'll also cover typhoon detection and tracking, rainfall amount prediction, forecasting future cloud imagery, and more.
 
Topics:
AI Application, Deployment & Inference, Climate, Weather & Ocean Modeling, Computer Vision, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8816
 
Abstract:
We'll talk about how to use Singularity to containerize deep learning applications. We'll provide compelling reasons to choose Singularity over Docker. We'll cover deep learning frameworks, including TensorFlow, NV-Caffe, MXNet, and others. We'll present the current challenges and workarounds when using Singularity in an HPC cluster. We'll compare the performance of Singularity to bare-metal systems.
 
Topics:
AI Application, Deployment & Inference, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8368
AI in Healthcare
Abstract:

For more than a decade, GE has partnered with NVIDIA in healthcare to power our most advanced modality equipment, from CT to ultrasound. Part 1 of this session will offer an introduction to the deep learning efforts at GEHC and the platform we're building on top of NGC to accelerate new algorithm development, then a deep dive into a case study of the evolution of our cardiovascular ultrasound scanner and the underlying extensible software stack. It will contain three main parts: (a) cardiovascular ultrasound imaging from a user perspective, the problems we need to solve for our customers, and the impact of cardiovascular disease in a global perspective; (b) an introduction to the Vivid E95 and the cSound platform, GPU-based real-time image reconstruction and visualization, how GPU performance can be translated to customer value and outcomes, and how this has evolved the platform during the last 2 ½ years; (c) the role of deep learning in cardiovascular ultrasound imaging, how we are integrating deep learning inference into our imaging system, and preliminary results from automatic cardiac view detection.

Topics:
AI in Healthcare, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8849
Accelerated Data Science
Abstract:
We'll discuss how Dell EMC is using AI, a technology that's revolutionizing mobility, healthcare, business, research, public safety, national security, and more. Our HPC and AI Innovation Lab is dedicated to designing solutions and staying on the leading edge of new and emerging technologies in an evolving landscape. We'll provide an overview of current projects that demonstrate best practices for scaling out and training deep learning models, and present a variety of use cases on NVIDIA GPUs and Dell EMC hardware.
 
Topics:
Accelerated Data Science, AI & Deep Learning Research
Type:
Sponsored Talk
Event:
GTC Washington D.C.
Year:
2019
Session ID:
DC91426
 
Abstract:

NVSwitch on the DGX-2 is a super crossbar switch that greatly increases application performance in several ways. First, it increases the problem-size capacity, traditionally limited by a single GPU's memory, to the aggregate DGX-2 GPU memory of 512 GB. Second, the NUMA effects of traditional multi-GPU servers are greatly reduced, growing memory bandwidth with the number of GPUs. Finally, ease of use improves, as apps written for a smaller number of GPUs can be ported more easily given the large memory space.

Topics:
Accelerated Data Science
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1808
 
Abstract:

OpenMP has a 20-year history in HPC and has been used by NERSC developers for node-level parallelization on several generations of NERSC flagship systems. Recent versions of the OpenMP specification include features that enable accelerator programming generally, and GPU programming in particular. Given the extensive use of OpenMP on previous NERSC systems, and the GPU-based node architecture of NERSC-9, we expect OpenMP to be important in helping users migrate applications to NERSC-9. In this talk we'll give an overview of the current usage of OpenMP at NERSC, describe some of the new features we think will be important to NERSC-9 users, and give a high-level overview of a collaboration between NERSC and NVIDIA to enable OpenMP for GPUs in the PGI Fortran, C, and C++ compilers.

Topics:
Accelerated Data Science
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1817
 
Abstract:

The next big step in data science combines the ease of use of common Python APIs with the power and scalability of GPU compute. The RAPIDS project is the first step in giving data scientists the ability to use familiar APIs and abstractions for data science while taking advantage of the GPU-accelerated hardware commonly found in HPC centers. This session discusses RAPIDS, how to get started, and our roadmap for accelerating more of the data science ecosystem.

Topics:
Accelerated Data Science
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1824
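The "familiar APIs" idea behind RAPIDS is concrete: cuDF mirrors the pandas DataFrame API. The sketch below runs on CPU with pandas as written; on a machine with RAPIDS installed, swapping the import to `import cudf as pd` is the commonly shown drop-in pattern (treat that swap, and any speedup, as assumptions about your environment; this is not code from the session):

```python
import pandas as pd  # with RAPIDS installed, `import cudf as pd` is the usual drop-in swap

# A small frame standing in for a large dataset of sensor readings.
df = pd.DataFrame({
    "sensor":  ["a", "b", "a", "b", "a"],
    "reading": [1.0, 2.0, 3.0, 4.0, 8.0],
})

# Familiar split-apply-combine; cuDF executes the same call on the GPU.
means = df.groupby("sensor")["reading"].mean()
print(means["a"])  # 4.0  (mean of 1, 3, 8)
print(means["b"])  # 3.0  (mean of 2, 4)
```

Because the call sites are unchanged, the porting cost is concentrated in data loading and environment setup rather than in rewriting analysis logic.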
 
Abstract:

Exciting advances in technology have propelled AI computing to the forefront of mainstream applications. The desire to drive advanced visualization with photorealistic real-time rendering, and efficient exascale-class high-performance computing fed by huge-scale data collection, has driven development of the key elements needed to build the most advanced AI computational engines. While these engines, connected with advanced high-speed buses like NVLink, now provide true scalable AI computation within single systems, the challenge of breaking out of the box with large-scale AI is upon us. In this talk we will discuss insights gained from creating NVIDIA's SATURNV AI supercomputer, enabling efficient use of this new class of dense AI computational engines, and keys to optimizing data centers for multi-node GPU computing specifically targeted at today's neural net and HPC workloads.

Topics:
Accelerated Data Science
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1827
 
Abstract:
Learn how the breakthrough HPE Superdome Flex platform equips scientists, engineers, and business lines with in-memory computing at unparalleled scale to solve complex, data-intensive problems holistically, accelerate analytics, and, coupled with NVIDIA GPU technology, leverage large-scale data visualization to speed time to discovery and innovation.
 
Topics:
Accelerated Data Science, Computational Fluid Dynamics, Computer Aided Engineering, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8973
 
Abstract:

It is common knowledge that GPUs can dramatically accelerate HPC and machine learning/AI workloads, but can they do the same for general-purpose analytics? In this talk, Todd Mostak, CEO of MapD, will provide real-world examples of how a new generation of GPU-powered analytics platforms can enable enterprises from a range of verticals to dramatically accelerate the process of insight generation at scale. In particular, he will focus on how GPUs' key technical differentiators (massive computational bandwidth, fast memory, and a native rendering pipeline) make them uniquely suited to letting analysts and data scientists query, visualize, and power machine learning over large, often high-velocity datasets. Using the open-source MapD analytics platform as an example, Todd will detail the technical approaches his team took to leverage the full parallelism of GPUs and demo how the platform allows analysts to interactively explore datasets containing tens of billions of records.

Topics:
Accelerated Data Science, AI Startup, GIS
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81008
 
Abstract:

A key driver for pushing high-performance computing is the enablement of new research. One of the biggest and most exciting scientific challenges requiring high-performance computing is decoding the human brain. Many of the research topics in this field require scalable compute resources or the use of advanced data analytics methods (including deep learning) for processing extreme-scale data volumes. GPUs are a key enabling technology, and we will thus focus on the opportunities for using them for computing, data analytics, and visualization. GPU-accelerated servers based on POWER processors are of particular interest here due to the tight integration of CPU and GPU using NVLink and the enhanced data transport capabilities.

Topics:
Accelerated Data Science, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23189
 
Abstract:
In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructured-grid approach to accommodate geometric complexity. Furthermore, turbulent flows encountered in aerospace applications generally require highly anisotropic meshes, driving the need for implicit solution methodologies to efficiently solve the discrete equations. To prepare NASA Langley Research Center's FUN3D CFD solver for the future HPC landscape, we port two representative kernels to NVIDIA Pascal and Volta GPUs and present performance comparisons with a common multi-core CPU benchmark.
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1710
 
Abstract:
The TSUBAME3 supercomputer at the Tokyo Institute of Technology came online in August 2017 and became the greenest supercomputer in the world on the Green500 at 14.11 GFlops/W. The other aspect of TSUBAME3 is that it embodies various BYTES-oriented features to allow for HPC and BD/AI convergence at scale, including significant scalable horizontal bandwidth, support for a deep memory hierarchy and large capacity, and high flops in low-precision arithmetic for deep learning.
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1720
 
Abstract:
Artificial intelligence, specifically deep learning, is rapidly becoming an important workload within the high-performance computing space. This talk will present a couple of successful systems-design approaches HPE has provided to customers to help them enable AI and deep learning within their HPC ecosystems.
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1731
 
Abstract:
In this talk we will survey the current state of high-performance computing and look ahead toward exascale. In addition, we will examine some issues that can help reduce the power consumption of linear algebra computations.
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1733
 
Abstract:
The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Features like independent thread scheduling and game-changing Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance of any comparable processor. Join two lead hardware and software architects for Volta on a tour of the features that will make Volta the platform for your next innovation in AI and HPC supercomputing.
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1739
Algorithms & Numerical Techniques
Abstract:
The present study deals with porting the scalable parallel CFD application HiFUN to NVIDIA GPUs using an off-load strategy. The strategy focuses on improving the single-node performance of the HiFUN solver with the help of GPUs. This work clearly brings out the efficacy of the off-load strategy using OpenACC directives on GPUs and may be considered one of the attractive models for porting legacy CFD codes to GPU-based HPC and supercomputing platforms.
 
Topics:
Algorithms & Numerical Techniques, Computational Fluid Dynamics, Computer Aided Engineering
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8799
 
Abstract:

Attendees will learn how the behavior of the human brain is simulated using current computers, and the different challenges the implementation has to deal with. We cover the main steps of the simulation and the methodologies behind it. In particular, we highlight and focus on the transformations and optimizations carried out to achieve good performance on NVIDIA GPUs.

Topics:
Algorithms & Numerical Techniques, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23076
 
Abstract:
With the Tegra X1 and the Pascal-architecture Tesla P100 GPUs, NVIDIA introduced hardware-based computation on FP16 numbers, also called half-precision arithmetic. We'll introduce the steps required to build a viable benchmark for this new arithmetic format, including the connections to established IEEE floating-point standards and existing HPC benchmarks. The discussion will focus on the performance and numerical stability issues that are important for this kind of benchmarking and how they relate to NVIDIA platforms.
 
Topics:
Algorithms & Numerical Techniques, Artificial Intelligence and Deep Learning, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7676
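The numerical-stability side of that benchmarking can be previewed without GPU hardware: NumPy's `float16` implements the same IEEE 754 binary16 format. A small probe of the format's limits (the values shown are properties of binary16 itself, not results from the talk):

```python
import numpy as np

info = np.finfo(np.float16)
print(info.eps)   # machine epsilon, 2**-10 (10-bit significand)
print(info.max)   # largest finite value, 65504.0

# Integers above 2048 are not all exactly representable, so naive FP16
# accumulation stalls: 2048 + 1 rounds back to 2048.
acc = np.float16(2048) + np.float16(1)
print(acc)        # 2048.0

# Overflow to infinity also happens early, which is one reason FP16
# benchmarks typically accumulate partial sums in FP32.
print(np.float16(70000))  # inf
```

Stalled accumulation and early overflow are exactly the effects a half-precision benchmark must be designed around, e.g. by mixed-precision accumulation or by scaling inputs into the representable range.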
Application Design & Porting Techniques
Abstract:

NAMD and VMD provide state-of-the-art molecular simulation, analysis, and visualization tools that leverage a panoply of GPU acceleration technologies to achieve performance levels that enable scientists to routinely apply research methods that were formerly too computationally demanding to be practical. To make state-of-the-art MD simulation and computational microscopy workflows available to a broader range of molecular scientists including non-traditional users of HPC systems, our center has begun producing pre-configured container images and Amazon EC2 AMIs that streamline deployment, particularly for specialized occasional-use workflows, e.g., for refinement of atomic structures obtained through cryo-electron microscopy. This talk will describe the latest technological advances in NAMD and VMD, using CUDA, OpenACC, and OptiX, including early results on ORNL Summit, state-of-the-art RTX hardware ray tracing on Turing GPUs, and easy deployment using containers and cloud computing infrastructure.

Topics:
Application Design & Porting Techniques
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1803
 
Abstract:

NVIDIA offers several containerized applications in HPC, visualization, and deep learning. We have also enabled a broad array of container-related technologies for GPUs, with upstreamed improvements to community projects and with tools that are seeing broad interest and adoption. Furthermore, NVIDIA is acting as a catalyst for the broader community in enumerating key technical challenges for developers, admins, and end users, and is helping to identify gaps and drive them to closure. This talk describes NVIDIA's new developments and upcoming efforts. It outlines progress in the most important technical areas, including multi-node containers, security, and scheduling frameworks. It highlights the breadth and depth of interactions across the HPC community that are making the latest high-quality HPC applications available on platforms that include GPUs.

Topics:
Application Design & Porting Techniques
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1821
 
Abstract:
Containers simplify application deployment in the data center by wrapping applications in an isolated virtual environment. Because they include all application dependencies, such as binaries and libraries, application containers run seamlessly in any data center environment. The HPC application containers available on NVIDIA GPU Cloud (NGC) dramatically improve ease of application deployment while delivering optimized performance. However, if the desired application is not available in the NGC registry, building an HPC container from scratch trades one set of challenges for another. Parts of the software environment typically provided by the HPC data center must be redeployed inside the container. For those used to just loading the relevant environment modules, installing a compiler, an MPI library, CUDA, and other core HPC components from scratch may be daunting. HPC Container Maker (HPCCM) is an open-source project that addresses the challenges of creating HPC application containers. Scott McMillan will present how HPCCM makes it easier to create HPC application containers by separating the choice of what should go into a container image from the details of the container specification, and will cover best practices for minimizing container development effort, minimizing image size, and taking advantage of image layering.
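As a rough illustration of the recipe idea (the names below are hypothetical, not the real HPCCM API), a recipe can declare *what* goes into the image while a generator emits the container specification, collapsing each building block into a single layer to keep the image small:

```python
# Toy sketch of the HPCCM idea: declarative recipe in, Dockerfile out.
# building_block() and render_dockerfile() are illustrative inventions.

def building_block(name, commands):
    """A reusable component: a named, ordered set of shell commands."""
    return {"name": name, "commands": commands}

def render_dockerfile(base_image, blocks):
    """Emit a Dockerfile from a base image plus an ordered list of blocks."""
    lines = ["FROM " + base_image]
    for block in blocks:
        lines.append("# --- {} ---".format(block["name"]))
        # Chain the commands into one RUN instruction: one layer per block.
        lines.append("RUN " + " && \\\n    ".join(block["commands"]))
    return "\n".join(lines)

recipe = [
    building_block("gnu", ["apt-get update",
                           "apt-get install -y gcc g++ gfortran"]),
    building_block("openmpi", ["wget -q https://example.com/openmpi.tar.bz2",
                               "tar -xjf openmpi.tar.bz2",
                               "cd openmpi-src && ./configure && make install"]),
]
dockerfile = render_dockerfile("nvidia/cuda:10.0-devel-ubuntu16.04", recipe)
print(dockerfile)
```

The real HPCCM goes further: the same recipe can target either Docker or Singularity, which is exactly the "what vs. how" separation the talk describes.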

Topics:
Application Design & Porting Techniques
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1843
Artificial Intelligence and Deep Learning
Abstract:

Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and the special requirements they impose on the system, followed by an overview of the interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing used to accelerate large-scale deployments in HPC and artificial intelligence.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Special Event
Event:
GTC Israel
Year:
2018
Session ID:
SIL8145
 
Abstract:
Neural networks have capitalized on recent advances in HPC, GPUs, and GPGPU computing, and on the rising amounts of publicly available labeled data. In doing so, NNs have revolutionized, and will continue to revolutionize, virtually every current application domain, as well as enable novel ones built on recognition, autonomous, predictive, resilient, self-managed, adaptive, and evolving applications.
Nevertheless, NN training is resource intensive in data, time, and energy; the resulting trained models are valuable assets representing intellectual property that is imperative to protect.
Furthermore, in the wake of edge computing, NNs are progressively deployed across decentralized landscapes; as a consequence, IP owners are very protective of their NN-based software products.
In this session, we propose to leverage Fully Homomorphic Encryption (FHE) to simultaneously protect the IP of trained NN-based software as well as the input and output data.
Within the context of a smart city scenario, we outline our NN model-agnostic approach, which approximates and decomposes the NN operations into linearized transformations while employing SIMD vectorization.
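A minimal sketch of the linearization idea (not the authors' implementation): FHE schemes evaluate only additions and multiplications, so a nonlinear activation such as the sigmoid is replaced by a low-degree polynomial before the network is evaluated homomorphically. The coefficients below are the degree-3 Taylor expansion of the sigmoid around 0:

```python
# FHE-friendly "neuron": dot product + polynomial activation, using only
# the additions and multiplications a homomorphic scheme can evaluate.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly(x):
    # sigmoid(x) ~ 1/2 + x/4 - x^3/48, accurate for inputs near 0
    return 0.5 + x / 4.0 - x ** 3 / 48.0

def neuron(weights, inputs, bias):
    """One linearized neuron: all operations are FHE-evaluable."""
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return sigmoid_poly(z)

for x in (-1.0, 0.0, 0.5, 1.0):
    print("x={:+.1f}  sigmoid={:.4f}  poly={:.4f}".format(
        x, sigmoid(x), sigmoid_poly(x)))
print(neuron([0.5, -0.25], [1.0, 2.0], 0.1))
```

The price of the approximation is accuracy outside a small input range, which is why such approaches typically normalize activations before applying the polynomial.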

Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8147
 
Abstract:
For your business projects, you want to rely on solid partners to master development and deployment. How can you avoid the nightmare of cost overruns or missed deadlines? How can you benefit from industrialized solutions rather than demos freshly issued from the lab?
In this session, you will learn how Atos, with a proven set of products and services, helps you accelerate your projects in the HPC, enterprise, and Internet of Things domains, from cloud to on-premises and from central to edge, while leveraging the most powerful NVIDIA technologies.
Because AI applications and models rely on secure, reliable, and up-to-date data, this session will also introduce how Atos manages, updates, and secures data, and will end with a presentation of operational applications in the domains of image recognition, video intelligence, prescriptive maintenance, and cybersecurity.

Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8492
 
Abstract:
The University of Queensland needed to solve problems at a scale that had never been contemplated before. Enormous challenges in scientific research imaging, modeling, and analysis on the path to curing diseases such as Alzheimer's, along with increasingly demanding cases in machine vision for digital skin cancer pathology, were all mounting up against traditional HPC infrastructure. UQ took a considered leap towards GPUs. This is UQ's architectural journey: how it built one of the most successful supercomputing facilities the state had ever created, the ways in which key components and architectural choices play a pivotal role in artificial intelligence and inference performance, and a whole-of-system balance approach to getting HPC right in the era of the GPU. A presentation for C-level, AI practitioner, and HPC professional attendees alike, this talk will provide something useful and refreshing for all who attend.

Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS8015
 
Abstract:
In the time it takes to read this abstract, someone could solve a detective puzzle, if only they had enough quantitative evidence on which to prove their suspicions. Equally, one could use visualisation, AI, and computational tools to seek a new cure for cancer or to predict how hospitalisations can be prevented. This presentation will demonstrate visual analytics techniques that use various mixed reality approaches to link simulations and AI with collaborative, complex, and interactive data exploration, placing the human in the loop. In recent years, thanks to advances in graphics hardware and compute power (especially GPGPU and modern Big Data/HPC infrastructures), the opportunities have become immense, especially for improving our understanding of complex models that represent real or hybrid worlds. Use cases will be drawn from ongoing research at CSIRO Data61 and the Expanded Perception and Interaction Centre (EPICentre) at UNSW, using world-class GPU clusters and high-end visualisation capabilities. Highlights will include defence projects, massive graph visualisation, and medicine.

Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80017
 
Abstract:

Since the Turing machine was first proposed in 1936, the capability of machines to perform intelligent tasks has grown exponentially. Artificial Intelligence (AI), as an essential accelerator, pursues the goal of making machines as intelligent as human beings. It has already reformed how we live, work, learn, discover, and communicate. In this talk, I will review our recent progress on AI by introducing some representative advances from algorithms to applications, and illustrate the stairs to its realization, from perceiving to learning, reasoning, and behaving. To push AI from the narrow to the general, many challenges lie ahead. I will bring some examples out into the open and shed light on our future targets. Today, we teach machines how to be as intelligent as ourselves. Tomorrow, they will be our partners in daily life. HPC services are rapidly evolving to meet the demands of an AI-intensive research landscape. At the University of Sydney, we have embraced the rapid change in technology to build a dynamic and hybrid HPC system called Artemis, focusing on smart resourcing and a heterogeneous architecture to support our academics and students in their groundbreaking research. Artemis 3 was the first university supercomputer to deploy the NVIDIA V100 at scale and now represents a flagship capability for the University and its partners.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80031
 
Abstract:
Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and the special requirements they impose on the system, followed by an overview of the interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing used to accelerate large-scale deployments in HPC and artificial intelligence.

Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Israel
Year:
2017
Session ID:
SIL7120
 
Abstract:

We'll introduce PowerAI and the S822LC for HPC. PowerAI is an optimized software stack for AI designed to take advantage of Power processor performance features, and in particular of the new NVLink interface between the Power processor and the NVIDIA Tesla P100 GPU accelerator, first introduced with the S822LC for HPC. We'll introduce performance enhancements in PowerAI, including IBM Caffe with its performance optimizations centered on enhanced communications, and other enhancements to frameworks, libraries, and the deep learning ecosystem for Power. With its high-performance NVLink connection, the new-generation S822LC for HPC server is the first to offer a sweet spot of scalability, performance, and efficiency for deep learning applications. Together, these hardware and software enhancements enabled the first release of PowerAI to achieve best-in-industry training times for AlexNet and VGGNet.
 
Topics:
Artificial Intelligence and Deep Learning, Tools & Libraries, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7368
 
Abstract:

A variety of deep learning frameworks now make it simple to train deep neural networks of many types. However, scaling deep learning frameworks to large models with data-parallel training on many GPUs remains a challenge, as the default utilities for inter-device and inter-node communication provided by these frameworks are often not optimal. Using examples from several frameworks, we demonstrate that linear strong scaling to many nodes and many devices can be achieved by augmenting deep learning frameworks with CUDA-aware MPI allreduce and allgather operations, which allow them to be used in an HPC setting where multi-GPU nodes are augmented with high-speed InfiniBand interconnects. We'll show that these operations allow us to quickly train very large speech recognition models.
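The allreduce pattern at the heart of this approach can be illustrated, at the level of the arithmetic, by a single-process simulation of the classic ring allreduce (reduce-scatter followed by allgather). This is a sketch only; real systems issue the exchanges through MPI (e.g. `MPI_Allreduce`) or NCCL, directly from GPU memory:

```python
# Synchronous simulation of ring allreduce over n logical workers.
# Each worker holds a gradient vector; afterwards every worker holds the
# element-wise sum, using only nearest-neighbor exchanges on a ring.

def ring_allreduce(grads):
    n = len(grads)
    dim = len(grads[0])
    assert dim % n == 0, "sketch assumes the vector splits evenly"
    chunk = dim // n
    data = [list(g) for g in grads]

    def seg(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: after n-1 steps, worker i owns the full sum of
    # chunk (i+1) mod n.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i - step) % n
            sends.append((c, data[i][seg(c)]))          # slice copies
        for i in range(n):
            c, payload = sends[(i - 1) % n]             # from left neighbor
            s = seg(c)
            data[i][s] = [a + b for a, b in zip(data[i][s], payload)]

    # Allgather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i + 1 - step) % n
            sends.append((c, data[i][seg(c)]))
        for i in range(n):
            c, payload = sends[(i - 1) % n]
            data[i][seg(c)] = payload
    return data

print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Each worker sends and receives only `dim / n` elements per step, which is why this variant is bandwidth-optimal and scales to many nodes.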
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7543
 
Abstract:
Recently, machine learning leaped into the computing mainstream, and ML is now advancing across all enterprise applications. GPU usage models are penetrating new industries, and advanced servers with GPUs will take deep learning to new performance levels that augment artificial intelligence. New server architecture innovations will drive higher levels of performance in ML applications. As GPUs become more powerful, GPU networks will need to become more efficient as well. Supermicro has advanced the state of the art in GPU-optimized server architectures, perfect for emerging deep learning applications. Hear the latest on GPU server architectures, along with customer case studies of the incredible deep learning results customers have achieved with Supermicro solutions.

Topics:
Artificial Intelligence and Deep Learning, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7834
 
Abstract:

Come and learn from Mellanox how to build an advanced, data-centric AI system. Mellanox is a provider of high-performance, scalable, low-latency networking equipment, including InfiniBand and Ethernet products. We will introduce the key technologies and requirements of distributed machine learning, along with the latest networking technology for large-scale distributed machine learning systems: in-network computing, which will be an important way to resolve the scaling bottlenecks of very large HPC and AI systems.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC China
Year:
2019
Session ID:
CN9496
Astronomy & Astrophysics
Abstract:

Previously, FPGAs were known to be highly energy efficient, but notoriously difficult to program and unsuitable for complex HPC applications. This is changing due to new technology developments: a high-level programming language (OpenCL), hard floating-point units, and tight integration with CPU cores. We'll compare FPGAs and GPUs with respect to architecture, programming model, programming effort, performance, and energy efficiency, using some radio-astronomical signal-processing and imaging algorithms as examples. Can FPGAs compete with GPUs?
 
Topics:
Astronomy & Astrophysics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8310
Autonomous Vehicles
Abstract:

AI is revolutionizing the $10T transportation industry. Every vehicle will be autonomous: cars, trucks, taxis, buses, and shuttles. AI is core to enabling autonomous driving, but AI is also being applied to mobility, logistics, connected vehicles, the connected factory, customer experience, and a myriad of other use cases in automotive. Come learn from experts at Audi, BMW, and VW how they are applying data ingestion, labeling, discovery, and exploration to develop trained AI models, with significant reductions in development time thanks to GPU-accelerated computing infrastructures.

Topics:
Autonomous Vehicles
Type:
Panel
Event:
GTC Europe
Year:
2018
Session ID:
E8468
Climate, Weather & Ocean Modeling
Abstract:

We'll discuss the revolution in computing, modeling, data handling, and software development that's needed to advance U.S. weather-prediction capabilities in the exascale computing era. Pushing prediction models to cloud-resolving 1-km-resolution scales will require an estimated 1,000-10,000 times more computing power, but existing models can't exploit exascale systems with millions of processors. We'll examine how weather-prediction models must be rewritten to incorporate new scientific algorithms, improved software design, and new technologies such as deep learning to speed model execution, data processing, and information processing. We'll also offer a critical and visionary assessment of the key technologies and developments needed to advance U.S. operational weather prediction in the next decade.
 
Topics:
Climate, Weather & Ocean Modeling, AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9750
Cloud Computing
Abstract:
Come learn about Google Cloud solutions with NVIDIA GPUs. We will show why Google Cloud is the best choice for running your NVIDIA GPU instances. You will learn how Google's fundamental principles around infrastructure, data intelligence, and openness help provide the best services for your HPC and ML deployments. In addition, we'll announce exciting details of our new NVIDIA GPU offerings. This is a talk that technical leaders, developers, data scientists, and anyone with an interest in cloud and GPUs will not want to miss!

Topics:
Cloud Computing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1805
 
Abstract:

You'd imagine that, with the growth of the public cloud, the majority of HPC workloads and applications would have transitioned to the cloud. However, almost all enterprise HPC workloads are still running in on-premises data centers, which means millions of mission-critical use cases, such as engineering crash simulations and cancer research, are still constrained by on-premises environments. Learn how Oracle Cloud Infrastructure is solving these problems with cutting-edge GPU and HPC infrastructure, along with data-center-level features that make it attractive for enterprises to migrate, enabling new use cases, such as using data from Oracle databases to run deep learning training instantly, that add more value to your data and business. This is a session not to be missed!
Topics:
Cloud Computing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1829
 
Abstract:
Zenotech Ltd is a UK-based company developing the latest in computational fluid dynamics solvers and cloud-based HPC systems. Its computational fluid dynamics solver, zCFD, has been engineered to take full advantage of the latest developments in GPU technology. This talk will present the performance advantages Zenotech sees when using GPUs and the impact this has on its customers. It will showcase industrial problems that have been solved quickly and cost-effectively using a combination of zCFD and the P100 and V100 GPUs available on AWS. Traditionally these cases are run on in-house parallel computing clusters, but the larger number of GPUs per node on AWS has enabled the solving of large CFD problems on a single instance. Benchmarking with zCFD demonstrates that a single P3 node provides performance equivalent to over 1,100 CPU cores. As well as performance benefits, the spot market and on-demand nature of AWS provide real cost savings for Zenotech's customers and open up a scale of simulation that was previously not affordable. The session will present real-world examples from Zenotech's customers in the aerospace, renewables, and automotive sectors. It will also show how Zenotech's EPIC platform makes the combination of zCFD and AWS GPUs a simple and cost-effective solution for engineers.

Topics:
Cloud Computing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1835
Computational Biology & Chemistry
Abstract:

Learn how we're bringing Gromacs up to speed with the latest cutting-edge multi-GPU technology. Gromacs, a simulation package for biomolecular systems, is one of the most highly used HPC applications globally. It already benefits from GPU acceleration to allow fast simulation of large and complex systems. However, as GPUs become more powerful and increasingly sophisticated multi-GPU systems become available, Gromacs must adapt to benefit optimally from the full extent of the performance on offer. We will describe work to port all significant remaining computational kernels to the GPU, and to perform the required inter-GPU communications using peer-to-peer memory copies, such that the GPU is exploited throughout and repeated PCIe transfers are avoided. We will present performance results to show the impact of our developments, and also describe the Gromacs performance model we've created to guide our work.
 
Topics:
Computational Biology & Chemistry, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9270
 
Abstract:

We'll discuss cuTENSOR, a high-performance CUDA library for tensor operations that efficiently handles the ubiquitous presence of high-dimensional arrays (i.e., tensors) in today's HPC and DL workloads. This library supports highly efficient tensor operations such as tensor contractions (a generalization of matrix-matrix multiplication), point-wise tensor operations such as tensor permutations, and tensor decompositions (a generalization of matrix decompositions). While providing high performance, cuTENSOR also allows users to express their mathematical equations for tensors in a straightforward way that hides the complexity of dealing with these high-dimensional objects behind an easy-to-use API. CUDA 10.1 enables CUDA programmers to utilize Tensor Cores directly with the new mma.sync instruction. In this presentation, we describe the functionality of mma.sync and present strategies for implementing efficient matrix multiply computations in CUDA that maximize performance on NVIDIA Volta GPUs. We then describe how CUTLASS 1.3 provides reusable components embodying these strategies. CUTLASS 1.3 demonstrates a median 44% speedup of CUDA kernels executing layers from real-world deep learning workloads.
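As a reminder of what a tensor contraction generalizes, the following pure-Python sketch (illustrative only; cuTENSOR expresses such equations through its own API, and this shows just the arithmetic) contracts a matrix against a 3-D tensor over a shared index, the same triple-loop pattern as matrix multiply but with an extra free index:

```python
# C[m][n][p] = sum_k A[m][k] * B[k][n][p]
# A matrix-times-3D-tensor contraction: matrix multiply with one more
# free index carried along.

def contract_mk_knp(A, B):
    M, K = len(A), len(A[0])
    K2, N, P = len(B), len(B[0]), len(B[0][0])
    assert K == K2, "contracted dimension must match"
    return [[[sum(A[m][k] * B[k][n][p] for k in range(K))
              for p in range(P)]
             for n in range(N)]
            for m in range(M)]

A = [[1, 2],
     [3, 4]]                      # shape (2, 2)
B = [[[1, 0], [0, 1]],
     [[2, 0], [0, 2]]]            # shape (2, 2, 2)
C = contract_mk_knp(A, B)         # shape (2, 2, 2)
print(C)
```

When P = 1 this degenerates to an ordinary matrix-matrix multiplication, which is exactly the sense in which contractions generalize GEMM.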
 
Topics:
Computational Biology & Chemistry, Tools & Libraries, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9593
 
Abstract:

Learn how high-resolution imaging is revolutionizing science and dramatically changing how we process, analyze, and visualize at this new scale. We will show the journey a researcher can take to produce images capable of winning a Nobel prize. We'll review the last two years of development in single-particle cryo-electron microscopy processing, with a focus on accelerated software, and discuss benchmarks and best practices for common software packages in this domain. Our talk will include videos and images of atomic-resolution molecules and viruses that demonstrate our success in high-resolution imaging.
 
Topics:
Computational Biology & Chemistry, In-Situ & Scientific Visualization, Data Center & Cloud Infrastructure, HPC and AI, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9664
 
Abstract:

The existing drug discovery process is costly, slow, and in need of innovation. At ATOM, a public-private consortium consisting of LLNL, GSK, UCSF, and FNL, we built an HPC-driven drug discovery pipeline that is supported by GPU-enabled supercomputers and containerized infrastructure. We'll describe the pipeline's infrastructure, including our data lake and model zoo, and share lessons learned along the way. We'll discuss the data-driven modeling pipeline we're using to create thousands of optimized models and the critical role of GPUs in this work. We'll also share model performance results and touch on how these models are integral to ATOM's new drug discovery paradigm. By building GPU-accelerated tools, we aim to transform drug discovery from a time-consuming and sequential process to a highly parallelized and integrated approach.
 
Topics:
Computational Biology & Chemistry, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9950
 
Abstract:

We will demonstrate the features and capabilities of OpenACC for porting and optimizing ParDOCK, the docking module of the Sanjeevini suite for computer-aided drug discovery developed at the HPC and Supercomputing Facility for Bioinformatics and Computational Biology at the Indian Institute of Technology Delhi. We used OpenACC to efficiently port the existing C++ code of the ParDOCK software, with minimal modifications, to run on the latest NVIDIA P100 GPU. These code modifications and tuning resulted in an average six-times speedup in turnaround time. With OpenACC, the code is now able to sample ten times more ligand conformations, leading to an increase in accuracy. The OpenACC-ported ParDOCK code now predicts a correct pose of a protein-ligand interaction 96.8 percent of the time, compared with 94.3 percent earlier (for poses under 1 A), and 89.9 percent of the time, compared with 86.7 percent earlier (for poses under 0.5 A).
 
Topics:
Computational Biology & Chemistry, Performance Optimization, Genomics & Bioinformatics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8188
 
Abstract:

We'll describe recent work to map comparative genomics algorithms to GPU-accelerated leadership-class systems. The explosion in availability of genomic data holds promise for enabling determination of the genetic causes of phenotypic characteristics, with applications to problems such as the discovery of the genetic roots of diseases. The growing sizes of these datasets and the quadratic and cubic scaling properties of the algorithms necessitate the use of leadership-scale accelerated computing. We'll discuss the mapping of two-way and three-way algorithms for comparative genomics calculations to large-scale GPU-accelerated systems. Focusing primarily on the Proportional Similarity metric and the Custom Correlation Coefficient, we'll discuss issues of optimal mapping of the algorithms to GPUs, eliminating redundant calculations due to symmetries, and efficient mapping to many-node parallel systems. We'll also present results scaled to thousands of GPUs on the ORNL Titan system.
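The two-way comparison pattern and the symmetry-based elimination of redundant work can be sketched as follows. This assumes the common sum-of-minima form of the Proportional Similarity metric for frequency vectors that each sum to 1; it illustrates the structure of the computation, not the talk's GPU implementation:

```python
# Pairwise Proportional Similarity over a set of frequency profiles,
# computing only the upper triangle (i < j) since the metric is symmetric.

def proportional_similarity(p, q):
    """PS(p, q) = sum_i min(p_i, q_i); equals 1.0 for identical profiles."""
    return sum(min(a, b) for a, b in zip(p, q))

def pairwise_upper(vectors):
    """Return {(i, j): similarity} for i < j only: half the naive work."""
    out = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            out[(i, j)] = proportional_similarity(vectors[i], vectors[j])
    return out

profiles = [
    [0.5, 0.3, 0.2],
    [0.5, 0.3, 0.2],   # identical to the first: similarity 1.0
    [0.2, 0.3, 0.5],
]
sims = pairwise_upper(profiles)
print(sims)
```

The quadratic number of (i, j) pairs is precisely the scaling property that, at genomic dataset sizes, pushes this computation onto accelerated many-node systems.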
 
Topics:
Computational Biology & Chemistry, Algorithms & Numerical Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7156
 
Abstract:

Imaging datasets are becoming larger and larger as new-generation equipment provides higher-definition imaging and scanning modalities. Part of analysing these datasets involves choosing the optimal hardware and software. We'll look at the design choices and workflow adopted for processing cryo-electron microscopy data, with results from an NVIDIA DGX-1 and cloud-provisioned HPC.
 
Topics:
Computational Biology & Chemistry, Healthcare and Life Sciences, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7232
Computational Fluid Dynamics
Abstract:
Learn how we explored the feasibility of porting YALES2 to GPUs. YALES2 is an HPC application for turbulent combustion modeling, from primary atomization to pollutant prediction, on massive complex meshes. It runs over thousands of CPU cores, solving several-billion-element meshes through MPI+OpenMP programming. The work presented here focuses on a preliminary feasibility study of GPU porting. In this session we will describe: a methodology for porting a large code to GPU; the choices that were made regarding the different constraints; and the performance results. We will also present the final benchmarks run across several platforms, from a classic Intel+Kepler cluster at the ROMEO HPC Center (University of Reims, France) to prototypes with IBM POWER8+Pascal at IDRIS (CNRS, France).
 
Topics:
Computational Fluid Dynamics, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23254
 
Abstract:
Learn how one of the leading institutes for global weather prediction, the European Centre for Medium-Range Weather Forecasts (ECMWF), is preparing for exascale supercomputing and the efficient use of future HPC hardware. I will outline the main reasons why it is difficult to design efficient weather and climate models and provide an overview of the ongoing community effort to achieve the best possible model performance on existing and future HPC architectures. I will present the EU H2020 projects ESCAPE and ESiWACE and discuss recent approaches to increasing computing performance in weather and climate modelling, such as the use of reduced numerical precision and deep learning.
 
Topics:
Computational Fluid Dynamics, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23348
Computational Physics
Abstract:
AWS offers the most powerful GPU-accelerated cloud infrastructure that delivers unparalleled computational efficiency for advanced engineering simulations and analysis, enabling High Performance Computing (HPC) workloads to run in the cloud at scale. This session features a real-world use case from the advanced product engineering team at Western Digital, which uses HPC solutions to model new technologies and capabilities prior to production. Western Digital's computational tools incorporate a description of the physics occurring during the HDD recording process and ultimately produce input to a recording-subsystem channel model, which produces an error rate. The length scales involved in the recording model range from a few nanometers in the description of the recording media to microns in the description of the recording head. The power of the current generation of NVIDIA GPUs allows Western Digital to generate enough simulation data that the same recording-subsystem channel model used in experiments can be employed in studies that include fabrication-process variances.
 
Topics:
Computational Physics, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81041
 
Abstract:
We explore using OpenACC to migrate applications required for modeling solar storms from CPU HPC clusters to an "in-house" multi-GPU system. We describe the software pipeline and the use of OpenACC in the computationally heavy codes. A major step forward is the initial implementation of OpenACC in our magnetohydrodynamics code MAS. We discuss strategies for overcoming some of the difficulties encountered, including handling Fortran derived types, array reductions, and performance tuning. Production-level time-to-solution results will be shown for multi-CPU and multi-GPU systems of various sizes. The timings show that it is possible to achieve acceptable time-to-solution on a single multi-GPU server or workstation for problems that previously required multiple HPC CPU nodes.
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8847
Computer Vision
Abstract:
The functional mapping of man-made facilities from high-resolution remote sensing images provides timely, high-fidelity land-use information and population-distribution estimates, which benefits federal agencies, non-governmental agencies, and industry. We'll share our journey to deliver functional maps of the world that include building extraction, human settlement maps, mobile home parks, and facility mapping using a variety of remote sensing imagery. Our research addresses three frontier challenges: (1) the distinct characteristics of remote sensing data for deep learning, including the model distribution shifts encountered with remote sensing images, multisensor sources, and multimodal data; (2) training very large deep-learning models using multi-GPU and multi-node HPC platforms; and (3) large-scale inference using ORNL's Titan and Summit with NVIDIA TensorRT. We'll also talk about developing workflows to minimize I/O inefficiency, performing parallel gradient-descent learning, and managing remote sensing data in an HPC environment.
 
Topics:
Computer Vision, GIS, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8420
Data Center & Cloud Infrastructure
Abstract:
We'll discuss how Microsoft and NVIDIA have partnered to bring a broad portfolio of GPU products to Azure to support the demands of the most bleeding-edge customers. Our talk will cover how Azure's industry-leading accelerator technology, delivered in multiple formats, puts demanding applications in an environment in which needed resources are available on demand. From high-performance networking and storage to AI-aware cluster management and job-orchestration tools, Azure takes the work out of running high-performance workloads.
 
Topics:
Data Center & Cloud Infrastructure, GPU Virtualization
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91017
 
Abstract:
Migrating and building solutions in the cloud can be challenging, expensive, and less performant than on-premises deployments. Oracle Cloud Infrastructure (OCI) has been working with NVIDIA on giving you the on-premises performance you need with the cloud benefits and flexibility you expect. In this session we'll discuss how you can take big data and analytics workloads, database workloads, or traditional enterprise HPC workloads that require multiple components along with a portfolio of accelerated hardware, and not only migrate them to the cloud but run them successfully. We'll discuss solution architectures, showcase demos and benchmarks, and take you through the cloud migration journey. We'll detail the latest instances that OCI provides, along with cloud-scale services.
 
Topics:
Data Center & Cloud Infrastructure, Performance Optimization
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91026
 
Abstract:
Learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and examine what special requirements these techniques impose on the system. We'll also give an overview of interconnect technologies used to scale and accelerate distributed machine learning including RDMA, NVIDIA's GPUDirect technology, and in-network computing that accelerates large-scale deployments in HPC and artificial intelligence.
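The gradient exchange that dominates distributed training is typically an allreduce collective, and interconnect features such as RDMA and in-network computing exist to accelerate exactly this step. Below is an in-process Python sketch of the ring allreduce algorithm; the worker data and function names are illustrative, not a Mellanox, MPI, or NCCL API.

```python
# In-process sketch of ring allreduce: n workers each hold a vector,
# and after a reduce-scatter pass plus an allgather pass (2*(n-1)
# steps, one chunk exchanged per worker per step) every worker holds
# the elementwise sum. Names and data are illustrative.

def ring_allreduce(workers):
    n = len(workers)
    m = len(workers[0]) // n            # chunk length (assumes divisibility)
    chunks = [[w[c * m:(c + 1) * m] for c in range(n)] for w in workers]

    # Reduce-scatter: after n-1 steps, worker i holds the complete sum
    # for chunk (i + 1) % n.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i - step) % n          # chunk worker i passes along this step
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], chunks[i][c])]

    # Allgather: circulate each completed chunk around the ring.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i + 1 - step) % n      # completed chunk worker i forwards
            chunks[dst][c] = list(chunks[i][c])

    # Reassemble the fully reduced vector on every worker.
    return [[v for chunk in w for v in chunk] for w in chunks]

workers = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
reduced = ring_allreduce(workers)       # every row is the elementwise sum
```

The ring pattern keeps per-step traffic at one chunk per link, which is why low-latency fabrics and offloads like GPUDirect RDMA pay off at scale.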
 
Topics:
Data Center & Cloud Infrastructure, Deep Learning & AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9268
 
Abstract:
Do you have a GPU cluster or air-gapped environment that you are responsible for, but don't have an HPC background? NVIDIA DGX POD is a new way of thinking about AI infrastructure, combining DGX servers with networking and storage to accelerate AI workflow deployment and time to insight. We'll discuss lessons learned about building, deploying, and managing AI infrastructure at scale, from design to deployment to management and monitoring. We will show how the DGX POD management software (DeepOps), along with our storage partners' reference architectures, can be used for the deployment and management of multi-node GPU clusters for deep learning and HPC environments in an on-premises, optionally air-gapped datacenter. The modular nature of the software also allows experienced administrators to pick and choose items that may be useful, making the process compatible with their existing software or infrastructure.
 
Topics:
Data Center & Cloud Infrastructure, AI Application, Deployment & Inference
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9334
 
Abstract:
Whether it's for AI, data science and analytics, or HPC, GPU-accelerated software can make possible the previously impossible. But it's well known that these cutting-edge software tools are often complex to use, hard to manage, and difficult to deploy. We'll explain how NGC solves these problems and gives users a head start on their projects by simplifying the use of GPU-optimized software. NVIDIA product management and engineering experts will walk through the latest enhancements to NGC and give examples of how software from NGC can improve GPU-accelerated workflows.
 
Topics:
Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9504
 
Abstract:
NVIDIA offers several containerized applications in HPC, visualization, and deep learning. We have also enabled a broad array of container-related technologies for GPUs, with upstreamed improvements to community projects and with tools that are seeing broad interest and adoption. In addition, NVIDIA is a catalyst for the broader community in enumerating key technical challenges for developers, admins, and end users, and is helping to identify gaps and drive them to closure. Our talk describes NVIDIA's new developments and upcoming efforts. We'll detail progress in the most important technical areas, including multi-node containers, security, and scheduling frameworks. We'll also offer highlights of the breadth and depth of interactions across the HPC community that are making the latest high-quality HPC applications available to platforms that include GPUs.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9525
 
Abstract:
Users of HPC Systems have diverse needs and requirements for their applications and ML/DL environments. Containers help streamline and simplify environment creation, but security concerns generally prohibit popular container environments such as Docker from running in shared computing environments. Alternate container systems for HPC address security concerns but have less documentation and resources available for users. We'll describe how our pipeline and resources at MITRE enable users to quickly build custom environments and run their code on the HPC system while minimizing startup time. Our process implements LXD containers, Docker, and Singularity on a combination of development and production HPC systems using a traditional scheduler.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9958
 
Abstract:
The impact of the recent Spectre and Meltdown security vulnerabilities has reached every corner of the compute ecosystem. Red Hat's Performance Engineering team has a keen interest in quantifying a wide variety of workloads in order to provide feedback to upstream developers working on these problems. This presentation will detail our team's involvement over the last several months, share selected performance impacts from a variety of common enterprise and HPC workloads, explain how to potentially mitigate the overheads, and describe what's being done to reduce impacts going forward.
 
Topics:
Data Center & Cloud Infrastructure, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81017
 
Abstract:
The Krylov Project is the key component in eBay's AI Platform initiative, providing an easy-to-use, open, and fast AI orchestration engine deployed as managed services in the eBay cloud. The main goals of the project are: every AI and machine learning algorithm should be shareable and easily implementable with a choice of frameworks; machine learning engineers should be able to build end-to-end training pipelines that distribute and parallelize over many machines; training models should be automated and allow easy access to vast eBay datasets; and engineers should be able to search past job submissions, view results, and share them with others. We have built Krylov from the ground up, leveraging the JVM, Python, and Go as the main technologies for the Krylov components, while standing on the shoulders of technology giants such as Docker, Kubernetes, and Apache Hadoop. Using Krylov, AI scientists can access eBay's massive datasets; build and train AI models; spin up powerful compute (high-memory or GPU instances) on the Krylov HPC cluster; and set up machine learning pipelines, such as by using declarative constructs that stitch together the pipeline lifecycle.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8277
 
Abstract:
A discussion and demonstration of the potential of running HPC and VDI workloads on common clusters in a modern datacenter: a Dr. Jekyll and Mr. Hyde scenario. Explore the coexistence of CUDA-based HPC job engines with both Linux and Windows machines used for virtual desktop infrastructure. The demonstration will focus on a very minimal VMware cluster deployment using vSAN storage to host both a Linux HPC multi-node cluster for CUDA workloads and a VMware Horizon View deployment for Linux and Windows virtual desktops performing DirectX, OpenGL, and CUDA-based visualization workloads as used by engineering and analysis power users.
 
Topics:
Data Center & Cloud Infrastructure, GPU Virtualization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8209
 
Abstract:
Are you wondering whether the cloud is relevant to HPC and how it works? Increasingly, applications in high-performance computing are using containers to ease deployment. In this talk, you'll learn what containers are, how they are orchestrated to run together in the cloud, and how communication among containers works. You'll get a snapshot of current support from the ecosystem, and gain insight into why NVIDIA is leading the charge to provide the best performance and usability.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8642
 
Abstract:
Attendees will learn how NVIDIA's Jetson TX-series processors can be scaled out to create an adaptive HPC and supercomputing platform for bespoke deployments and edge computing environments. Advancements in composable infrastructure technology now make it possible to pool and orchestrate Jetson processors for deployments with specialized parallel computing requirements. Use cases include Jetson deployments in non-embedded environments for edge computing where traditional HPC architectures are not hospitable. Clusters of NVIDIA Jetson TX-series devices can be deployed in edge compute environments connected to arrays of sensors for neural net training, pattern recognition, and deep learning. Applications for autonomous transportation can also benefit from clustering massive numbers of Jetson TX-series devices to simulate fleets of vehicles to train machine learning algorithms in parallel. Jetson use cases can be expanded well beyond embedded applications when deployed with PCIe fabric-based composable infrastructure technology, permitting a 16x networking performance improvement over the embedded 1Gb Ethernet interface.
 
Topics:
Data Center & Cloud Infrastructure, Graphics and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8539
 
Abstract:
Enterprise digital workspaces support diverse workloads, including virtual desktops, deep learning, and big data. NVIDIA GPUs bring high performance computing (HPC) to graphics, GPGPU, and especially machine learning workloads. They also provide hardware encode and decode to accelerate the processing of video content. In this session, we will explore performance and resource utilization of various workloads that leverage different capabilities of the GPU, such as graphics, compute, and H.264 hardware encode/decode. NVIDIA virtualized GPUs and VMware vSphere bring tremendous combined benefits for both GPU-based workloads and datacenter management via virtualization. We will present results of our research on running diverse workloads on the vSphere platform using NVIDIA GRID GPUs. We explore the vSphere features of suspend/resume and vMotion of vGPU-based virtual machines. We will quantify the benefits of vGPU for datacenter management using VMware vSphere and describe techniques for efficient management of workloads and datacenter resources.
 
Topics:
Data Center & Cloud Infrastructure, GPU Virtualization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8250
 
Abstract:
NVIDIA DCGM is a monitoring and management daemon, GPU Diagnostic, and SDK geared towards managing GPUs in a cluster environment. DCGM is widely deployed both internally at NVIDIA and externally at large HPC labs and Cloud Service Providers. We will go over the core features of DCGM and features that have been added in the last year. We will also demonstrate how DCGM can be used to monitor GPU health and alert on GPU errors using both the dcgmi command-line tools and the DCGM SDK.
 
Topics:
Data Center & Cloud Infrastructure, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8505
 
Abstract:
Pre- and post-process CAE data near your cloud compute to save time, money, and IT headaches. Whether you're building the next supercar or visualizing a medical dataset, you can now eliminate the need for data transfer to and from on-premises systems by running professional design and engineering applications in the cloud. See new Oracle Cloud Infrastructure GPUs in live demonstrations of data transfer, CAD pre-processing, and CAE post-processing.
 
Topics:
Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8988
 
Abstract:
New algorithms leverage the algebraic strengths of GPUs far beyond rendering visuals. They unlock opportunities for data analysis leveraging computer vision and artificial neural networks. Earlier this year we set out to investigate the deployment of power-efficient GPUs in commodity hardware. We did not focus on supercomputers, but instead exercised GPUs within a homogeneous set of compute nodes like those used to scale Apache Hadoop or Apache Spark clusters. Our work focused on inference (deploying models and GPU acceleration for analysis tasks such as feature extraction, identification, and classification), not on training or building models, tasks likely better suited to HPC-class machines. Our experiments investigated applications that aren't feasible at scale on existing CPUs, such as malware detection and object detection in images. We'll cover inference on Tesla P4 GPUs in scale-out architectures, leveraging nvidia-docker, Caffe, Torch, and TensorRT.
 
Topics:
Data Center & Cloud Infrastructure, Artificial Intelligence and Deep Learning, Accelerated Data Science
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7190
 
 
Topics:
Data Center & Cloud Infrastructure
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1719
 
Abstract:
Why do HPC in a cloud? How to do HPC (as a service), with GPU passthrough, in OpenStack? How to create a full GPU HPC cluster, from scratch, on demand, in under five minutes, all equipped with NVIDIA's DCGM, the CUDA environment, and deep learning libraries/frameworks? Hybrid clouds with GPUs spanning OpenStack and AWS? How to easily and automatically move HPC user data and workloads between the private and public cloud? How to dynamically scale a virtualized HPC cluster, both horizontally (within the private cloud) and vertically (to the public cloud)? We'll answer these questions during a deep dive into the world of HPC on top of OpenStack and AWS. We'll discuss the many ways OpenStack private clouds can be used for bursting HPC workloads, HPC-as-a-service, XaaS (anything-as-a-service), and creating hybrid clouds composed of an on-prem private/community-cloud OpenStack deployment that dynamically scales to public clouds like AWS. The session includes a demo.
 
Topics:
Data Center & Cloud Infrastructure, Federal, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7161
 
Abstract:
Learn how you can scale your deep learning and traditional HPC-based workloads in Azure using powerful NVIDIA Tesla-based GPUs, and scale out using Azure's low-latency networking backed by InfiniBand infrastructure. This is a great session to learn about Azure's accelerated offerings and future roadmap. This session will cover specific announcements on what's to come in both hardware and software. This is a session you don't want to miss!
 
Topics:
Data Center & Cloud Infrastructure, Artificial Intelligence and Deep Learning, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7204
 
Abstract:
Learn how you can integrate GPU-enabled HPC and cloud computing by building on recent container technologies and integration. This presentation will highlight the efforts we are making as part of the EU Horizon 2020 project CloudLighting, where we look at how to integrate heterogeneous computing with cloud technologies.
 
Topics:
Data Center & Cloud Infrastructure, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7296
 
Abstract:
Conducting deep learning research and development requires a combination of cutting-edge hardware, elastic software frameworks, and a collaborative research community. We'll provide the scaffolding for participants to construct an enterprise-scale, GPU-enabled high performance computing solution for machine learning and data science by drawing on the experiences gained while IBM Research built its Cognitive Computing Cluster. We'll start by discussing how to build a secure, shared-resource computing cluster optimized for deep learning. Next, we'll cover how to provide deep learning frameworks supporting speech, vision, language, and text processing and their underlying primitives. Finally, we'll discuss how to build a best practice knowledge base to improve research quality and accelerate discovery.
 
Topics:
Data Center & Cloud Infrastructure, Artificial Intelligence and Deep Learning, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7350
 
Abstract:
M3 is the latest-generation system of the MASSIVE project, an HPC facility specializing in characterization science (imaging and visualization). Using OpenStack as the compute provisioning layer, M3 is a hybrid HPC/cloud system, custom-integrated by Monash's R@CMon Research Cloud team. Built to support Monash University's high-throughput instrument processing requirements, M3 is half GPU-accelerated and half CPU-only. We'll discuss the design and technology used to build this innovative platform, as well as detailing approaches and challenges to building GPU-enabled HPC clouds.
 
Topics:
Data Center & Cloud Infrastructure, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7366
 
Abstract:
We'll explain our strategy for designing large-scale deep learning platforms using HPC and Docker technology to realize high-performance training and scoring on GPU clusters. Topics will include how to analyze a deep learning GPU application's characteristics, such as GPU memory bandwidth, memory capacity, and GPU utilization, when run on a GPU cluster with the Teye tool; how to handle big data and improve data-reading performance with Lustre; how to optimize network communication with InfiniBand technology; and how to ease deployment and scheduling of different deep learning frameworks on a large GPU cluster with Docker.
 
Topics:
Data Center & Cloud Infrastructure, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7678
Download:
Share:
 
Abstract:
Learn why Scyld Cloud Workstation, a browser-based, high-quality, low-bandwidth, 3D-accelerated desktop, can greatly streamline cloud HPC workflows. Penguin's Scyld Cloud Workstation provides significant time savings by eliminating the need to download large data files when using the cloud for HPC workflows. For example, a typical manufacturing engineer using a Scyld Cloud Workstation can run real-time, interactive GUI tools and 3D visualizations in the cloud, removing the traditional barrier of moving large files down to on-premises workstations. Additionally, since no browser plug-in or application installation is needed, the difficulty of security changes is eliminated, allowing for easy integration with industry security policies.
 
Topics:
Data Center & Cloud Infrastructure, Other, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7818
Download:
Share:
Deep Learning & AI Frameworks
Presentation
Media
Abstract:
Learn how to implement and analyze a simple deep learning input pipeline pattern that prevents slowdowns from input queue exhaustion on accelerated HPC systems with limited impact on model performance. Queue exhaustion occurs because the throughput-driven dequeue rate is greater than the enqueue rate, which is bound by storage access bandwidth. In this session we will describe a technique that prevents queue exhaustion by artificially slowing the effective dequeue rate without sacrificing substantial validation-set performance. An example using TensorFlow is presented, and the resultant optimization-step speedup and model performance are analyzed across several HPC resource configurations.
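The throttling idea this abstract describes can be sketched with a toy discrete-time queue model in plain Python. All rates and sizes here are hypothetical, and the session's actual example uses TensorFlow; this only illustrates why capping the dequeue rate at the storage-bound enqueue rate prevents exhaustion.

```python
def simulate(enqueue_per_tick, dequeue_per_tick, ticks=1000, start_depth=100):
    """Discrete-time model of an input queue.

    enqueue_per_tick: items the storage system can load per tick
    dequeue_per_tick: items the optimizer tries to consume per tick
    Returns (minimum queue depth seen, number of stalled ticks)."""
    depth = start_depth
    min_depth, stalled = depth, 0
    for _ in range(ticks):
        depth += enqueue_per_tick            # producer, bound by storage bandwidth
        take = min(depth, dequeue_per_tick)  # consumer (one training step)
        if take < dequeue_per_tick:
            stalled += 1                     # queue exhausted: the step waits on input
        depth -= take
        min_depth = min(min_depth, depth)
    return min_depth, stalled

# A consumer faster than the producer eventually drains the queue and stalls;
# throttling the dequeue rate down to the enqueue rate keeps the queue non-empty.
fast_min, fast_stalls = simulate(enqueue_per_tick=4, dequeue_per_tick=5)
paced_min, paced_stalls = simulate(enqueue_per_tick=4, dequeue_per_tick=4)
```

In the unthrottled run the queue depth drifts down by one item per tick until it hits zero, after which every step stalls; the paced run holds its initial depth indefinitely.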
 
Topics:
Deep Learning & AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8674
Streaming:
Download:
Share:
 
Abstract:
This talk will introduce two programming models, OpenSHMEM and SharP, to address the programming challenges of HPC systems with multiple GPUs per node, a high-performing network, and large amounts of hierarchical, heterogeneous memory. SharP uses a distributed data-structure approach to abstract the memory and provide uniform interfaces for data abstraction, locality, sharing, and resiliency across these memory systems. OpenSHMEM is a well-established, library-based PGAS programming model for programming HPC systems. We show how NVSHMEM, an implementation of OpenSHMEM, can enable communication in CUDA kernels and realize the OpenSHMEM model for GPU-based HPC systems. These two complementary programming models provide the ability to program emerging architectures for Big-Compute and Big-Data applications. After the introduction, we will present experimental results for a wide variety of applications and benchmarks.
 
Topics:
Deep Learning & AI Frameworks
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1716
Share:
 
Abstract:
This talk summarizes the ongoing HPC visualization activities and describes the technologies behind the developer zone shown in the booth.
 
Topics:
Deep Learning & AI Frameworks
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1735
Download:
Share:
 
 
Topics:
Deep Learning & AI Frameworks
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1736
Download:
Share:
Federal
Presentation
Media
Abstract:
We'll highlight Sentinel, a system for real-time in-situ intelligent video analytics on mobile computing platforms. Sentinel combines state-of-the-art techniques in HPC with dynamic mode decomposition (DMD), a proven method for data reduction and analysis. By leveraging CUDA, our early system prototype achieves significantly better-than-real-time performance for DMD-based background/foreground separation on high-definition video streams, thereby establishing the efficacy of DMD as the foundation on which to build higher level real-time computer vision techniques. We'll present an overview of the Sentinel system, including the application of DMD to background/foreground separation in video streams, and outline our ongoing efforts to enhance and extend the prototype system.
 
Topics:
Federal, Intelligent Video Analytics, In-Situ & Scientific Visualization, Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7685
Download:
Share:
Finance
Presentation
Media
Abstract:
The Pascal generation of GPUs is bringing increased compute density to data centers, and NVLink on IBM POWER8 CPUs makes this compute density ever more accessible to HPC applications. However, reduced memory-to-compute ratios present some unique challenges for the cost of throughput-oriented compute. We'll present a case study of moving production Monte Carlo GPU codes up to IBM's "Minsky" S822LC servers with NVIDIA Tesla P100 GPUs.
 
Topics:
Finance, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7668
Download:
Share:
GPU Virtualization
Presentation
Media
Abstract:
VDI users across multiple industries can now harness the power of the world's most advanced virtual workstation to enable increasingly demanding workflows. This session brings together graphics virtualization thought leaders and experts from across the globe who have deep knowledge of NVIDIA virtual GPU architecture and years of experience implementing VDI across multiple hypervisors. Panelists will discuss how they transformed organizations, including how they leveraged multi-GPU support to boost GPU horsepower for photorealistic rendering and data-intensive simulation, and how they stood up GPU-accelerated deep learning or HPC VDI environments with ease using NGC containers.
 
Topics:
GPU Virtualization
Type:
Panel
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9870
Streaming:
Download:
Share:
 
Abstract:
Universities have increasing demand for deep learning/AI classrooms and labs but are constrained by the cost and availability of physical classroom labs. Students require access to a lab 24x7 to work on projects and assignments, and find that they have to wait for HPC clusters to be free when submitting their jobs for training. In the past, students and researchers were tethered to expensive data-scientist workstations. Virtual GPUs provide a highly secure, flexible, accessible solution to power AI and deep learning coursework and research. Learn how Nanjing University is using vGPUs with NGC for teaching AI and deep learning courses, empowering researchers with the GPU power they need, and providing students with the mobility to do coursework anywhere. Similarly, discover how other universities are maximizing their data center resources by running VDI, HPC, and AI workloads on common infrastructure, and even how companies like Esri are using virtualized deep learning classes to educate their user base. Discover the benefits of vGPUs for AI, how you can set up your environment to achieve optimum performance, and the tools you can use to manage and monitor your environment as you scale.
 
Topics:
GPU Virtualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9888
Streaming:
Download:
Share:
 
Abstract:
A discussion and demonstration of the potential of running HPC and VDI workloads on common clusters in a modern data center: a Dr. Jekyll and Mr. Hyde scenario. Explore the coexistence of CUDA-based HPC or deep learning job engines alongside both Linux and Windows machines used for virtual desktop infrastructure. The demonstration will focus on a minimal VMware vSphere cluster deployment using VSAN storage, or a Red Hat RHVM cluster deployment, hosting both a Linux HPC multi-node cluster for CUDA workloads and a VMware Horizon View or Citrix XenDesktop deployment for Linux and Windows virtual desktops performing DirectX, OpenGL, OpenCL, and CUDA-based visualization workloads as used by engineering and analysis power users.
 
Topics:
GPU Virtualization, HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8107
Streaming:
Share:
 
Abstract:
What if you could combine VDI, HPC, deep learning, and AI all together on one platform with VMware vSphere 6.7 and NVIDIA virtual GPU (vGPU) technology? In this session, we'll guide you through how to set up a uniform, well-performing platform. We will cover the virtualisation of HPC, the sharing of compute resources with VDI, the implementation of mixed workloads leveraging NVIDIA vGPU technology, and the automation of the platform. If you want to have fun at work while preparing for the future, don't miss this N3RD session!
 
Topics:
GPU Virtualization
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8156
Streaming:
Download:
Share:
 
Abstract:
With the latest release of NVIDIA vGPU software, the world's most powerful virtual workstation gets even more powerful. Learn how our latest enhancements enable your data center to be more agile and to scale to meet the needs of thousands, tens of thousands, and even hundreds of thousands of users. The newest release of NVIDIA virtual GPU software adds support for more powerful VMs, which can be managed from the cloud, an on-premises data center, or a private cloud. With support for live migration of GPU-enabled VMs, IT can truly deliver high availability and a quality user experience. IT can further ensure they get the most out of their investments with the ability to re-purpose the same infrastructure that runs VDI during the day to run HPC and other compute workloads at night. In this session, we will unveil the new features of NVIDIA vGPU solutions and demonstrate how GPU virtualization enables you to easily support the most demanding users and scale virtualized digital workspaces on an agile, flexible infrastructure, from the cloud as well as the on-premises data center.
 
Topics:
GPU Virtualization
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8513
Streaming:
Download:
Share:
 
Abstract:
Enabling access to NVIDIA compute acceleration is a key component of VMware's approach to enabling HPC and ML workloads on vSphere. In this talk we will discuss the various available options, provide performance results, and share some hardware and software tips and guidance to help you meet the needs of your organization's data scientists and researchers.
 
Topics:
GPU Virtualization
Type:
Talk
Event:
VMWorld
Year:
2019
Session ID:
VM9046
Download:
Share:
 
Abstract:
Come by and we can chat about how you can use NVIDIA's GPU Cloud (NGC) container registry to deploy your GPU-accelerated application on a Kubernetes cluster.
 
Topics:
GPU Virtualization
Type:
Talk
Event:
VMWorld
Year:
2019
Session ID:
VM9047
Share:
Graphics and AI
Presentation
Media
Abstract:
We'll discuss the challenges of GPU/DRAM bandwidth in high-performance systems. Graphics memory is a key differentiator for addressing these challenges in AI, in areas ranging from the data center to the smart edge. We'll compare discrete GDDR memory and high-bandwidth memory to identify the solution space for these options. We'll also discuss how applications in graphics, HPC, and AI benefit from more bandwidth during presentations at the Micron booth on the exhibit floor.
 
Topics:
Graphics and AI, HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9968
Streaming:
Download:
Share:
 
Abstract:
In the time it takes to read this abstract, someone could solve a detective puzzle, if only they had enough quantitative evidence with which to prove their suspicions. One could also use visualisation and computational tools, like a microscope, to seek a new cure for cancer or predict how to prevent hospitalisation. In this presentation, we will demonstrate new visual analytics techniques that use various mixed reality approaches to link simulations with collaborative, complex, and interactive data exploration, placing the human in the loop. Thanks to recent advances in graphics hardware and compute power (especially GPGPU and modern Big Data/HPC infrastructures), the opportunities are immense, especially in improving our understanding of complex models that represent real or hybrid worlds. Use cases presented will be drawn from ongoing research at CSIRO and the Expanded Perception and Interaction Centre (EPICentre) using world-class GPU clusters and visualisation capabilities.
 
Topics:
Graphics and AI, In-Situ & Scientific Visualization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8317
Streaming:
Download:
Share:
HPC and AI
Presentation
Media
Abstract:
We'll discuss the plan for the human exploration of Mars, which will require the retropropulsive deceleration of payloads. Conventional computational capabilities don't allow for the simulation of interactions between the atmosphere and retropropulsion exhaust plumes at sufficient spatial resolution to resolve governing phenomena with a high level of confidence. Researchers from NASA Langley and Ames Research Centers, NVIDIA, and Old Dominion University have developed a GPU-accelerated version of Langley's FUN3D flow solver. An ongoing campaign on the Summit supercomputer at Oak Ridge National Lab is using this capability to apply detached eddy simulation methods to retropropulsion in atmospheric environments for nominal operation of a human-scale Mars lander concept. We'll give an overview of the Mars lander fluid dynamics project, the history and details of FUN3D GPU development, and the optimization and performance of the code on emerging HPC architectures.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Washington D.C.
Year:
2019
Session ID:
DC91220
Download:
Share:
 
Abstract:
Demand for GPUs in high performance computing is only growing, and it is costly and difficult to keep pace in an entirely on-premises environment. We will hear from Schlumberger on why and how they are utilizing cloud-based, GPU-enabled computing resources from Google Cloud to supply their users with the computing power they need, from exploration and modeling to visualization.
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91040
Streaming:
Download:
Share:
 
Abstract:
We will characterize the performance of multi-GPU systems in an effort to determine their viability for running physics-based applications using Fast Fourier Transforms (FFTs). Additionally, we'll discuss how multi-GPU FFTs allow available memory to exceed the limits of a single GPU and how they can reduce computational time for larger problem sizes.
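The memory-scaling point in this abstract can be illustrated with the simplest decomposition: a batch of independent FFTs split across devices. The sketch below is pure Python, with a naive DFT standing in for a per-GPU FFT library call and list slices standing in for per-device memory; it is not NVIDIA's multi-GPU cuFFT API, only the decomposition idea.

```python
import cmath

def dft(x):
    # Naive O(n^2) DFT; stands in for a per-GPU FFT library call.
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def batched_fft(signals, num_devices):
    """Distribute a batch of independent FFTs across `num_devices` workers.

    Each worker holds only len(signals)/num_devices signals at a time, so the
    aggregate working set can exceed any single device's memory capacity."""
    per_dev = (len(signals) + num_devices - 1) // num_devices
    out = []
    for d in range(num_devices):
        chunk = signals[d * per_dev:(d + 1) * per_dev]  # device d's slice
        out.extend(dft(s) for s in chunk)               # computed independently
    return out

batch = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]] * 4
split_result = batched_fft(batch, num_devices=4)
whole_result = [dft(s) for s in batch]                  # single-device reference
```

A single large 2D/3D FFT needs a slab or pencil decomposition with all-to-all exchanges instead, which is where the inter-GPU bandwidth characterized in the session matters.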
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9158
Streaming:
Download:
Share:
 
Abstract:
Learn about the science of magnetically confined plasmas to develop the predictive capability needed for a sustainable fusion energy source. Gyrokinetic simulations are one of the most useful tools for understanding fusion science. We'll explain the CGYRO code, built by researchers at General Atomics to effectively and efficiently simulate plasma evolution over multiple scales that range from electrons to heavy ions. Fusion plasma simulations are compute- and memory-intensive and usually run on leadership-class, GPU-accelerated HPC systems like Oak Ridge National Laboratory's Titan and Summit. We'll explain how we designed and implemented CGYRO to make good use of the tens of thousands of GPUs on such systems, which provide simulations that bring us closer to fusion as an abundant clean energy source. We'll also share benchmarking results of both CPU- and GPU-based systems.
 
Topics:
HPC and AI, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9202
Streaming:
Download:
Share:
 
Abstract:
Learn about advanced features in MVAPICH2 that accelerate HPC and AI on modern dense GPU systems. We'll talk about how MVAPICH2 supports MPI communication from GPU memory and improves it using the CUDA toolkit for optimized performance on different GPU configurations. We'll examine recent advances in MVAPICH2 that support large message collective operations and heterogeneous clusters with GPU and non-GPU nodes. We'll explain how we use the popular OSU micro-benchmark suite, and we'll provide examples from HPC and AI to demonstrate how developers can take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We'll also provide guidance on issues like processor affinity to GPUs and networks that can significantly affect the performance of MPI applications using MVAPICH2.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9476
Streaming:
Download:
Share:
 
Abstract:
Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frameworks from an architectural/performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training. We'll highlight challenges for message passing interface runtimes to efficiently support DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.
 
Topics:
HPC and AI, Deep Learning & AI Frameworks
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9501
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how code teams at Lawrence Livermore National Laboratory (LLNL) are porting our production applications to Sierra, LLNL's flagship NVIDIA GPU-based supercomputer. In general, our codes rely on a three-stage process of investment, porting, and performance tuning to achieve performance on NVIDIA Tesla V100 GPUs while maintaining portability to our other supported platforms. We'll explain why this process poses many challenges and how LLNL code teams have worked with the Sierra Center of Excellence to build experience and expertise in porting complex multi-physics simulation tools to NVIDIA GPU-based HPC systems. We'll also provide an overview of this porting process, the abstraction technologies employed, lessons learned, and current challenges.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9512
Streaming:
Download:
Share:
 
Abstract:
Tensor Cores, introduced with the Volta GPU architecture, achieve up to 125 teraflops of throughput by mixing half- and single-precision floating point operations. We'll show how to take advantage of Tensor Cores for applications in deep learning and HPC. We will discuss how to use mixed precision to decrease memory use during deep learning training and deployment, a technique that allows for larger model sizes. We will also demonstrate programming matrix-multiply-and-accumulate on Tensor Cores for HPC applications. Using NVIDIA Nsight, we will profile an application to understand its use of Tensor Cores.
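The mixed-precision multiply-accumulate pattern the abstract mentions can be mimicked in plain Python: round the inputs to fp16 storage precision (via the `struct` module's `'e'` half-precision format) while keeping the accumulator in full precision. This is a sketch of the numerics only, not the CUDA `wmma` API the session demonstrates.

```python
import struct

def to_half(x):
    # Round-trip through IEEE binary16 storage (struct format 'e'),
    # emulating the fp16 inputs that Tensor Cores consume.
    return struct.unpack('e', struct.pack('e', x))[0]

def mma(A, B, C):
    """D = A @ B + C with fp16 inputs and full-precision accumulation,
    mimicking the numerics of a Tensor Core multiply-accumulate."""
    n, k, m = len(A), len(B), len(B[0])
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = C[i][j]                    # accumulator stays full precision
            for p in range(k):
                acc += to_half(A[i][p]) * to_half(B[p][j])
            D[i][j] = acc
    return D

# Small integers are exact in fp16, so this multiply-accumulate is exact:
D = mma([[1.0, 2.0], [3.0, 4.0]],
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.0], [0.0, 0.5]])
```

Accumulating in single precision is what keeps long dot products from losing accuracy even though each input is stored in half precision; values like 0.1 already round when stored as fp16.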
 
Topics:
HPC and AI, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9542
Streaming:
Download:
Share:
 
Abstract:
We'll describe how Lawrence Livermore National Laboratory (LLNL) prepared a large existing application base for our recently deployed Sierra supercomputer, which is designed to harness over 17,000 NVIDIA V100 GPUs to tackle the nation's most challenging science and national security problems. We will discuss how this multi-year effort paid off with exciting possibilities for new science. We'll also outline how using GPUs in our traditional HPC platforms and workflows is adding an exciting new dimension to simulation-based science, prompting LLNL to rethink how we perform future simulations as intelligent simulation. We'll give an overview of the application preparation process that led up to Sierra's deployment, as well as a look at current and planned research aimed at riding the AI and machine learning waves in pursuit of game-changing science.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9560
Streaming:
Download:
Share:
 
Abstract:
We'll introduce the fundamental concepts behind NVIDIA GPUDirect and explain how GPUDirect technologies are leveraged to scale out performance. GPUDirect technologies can provide even faster results for compute-intensive workloads, including those running on a new breed of dense, GPU-accelerated servers such as the Summit and Sierra supercomputers and the NVIDIA DGX line of servers.
 
Topics:
HPC and AI, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9653
Streaming:
Download:
Share:
 
Abstract:
We'll present the latest developments in the NCCL library, which provides optimized inter-GPU communication primitives to make distributed computing easy and universal. Since 2015, NCCL has enabled deep learning and HPC applications to scale to thousands of GPUs. We'll also discuss the state of integration of NCCL in deep learning frameworks.
 
Topics:
HPC and AI, Deep Learning & AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9656
Streaming:
Share:
 
Abstract:
We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects, including PCIe, NVLink, and InfiniBand. NVSHMEM makes it possible to initiate communication from within a CUDA kernel. As a result, CUDA kernel boundaries are not forced on an application by its communication requirements. Less synchronization on the CPU helps strong-scaling efficiency, and the ability to initiate fine-grained communication from inside the CUDA kernel helps achieve better overlap of communication with computation. QUDA is a popular GPU-enabled QCD library used by several popular packages like Chroma and MILC, and NVSHMEM enables better strong scaling in QUDA. NVSHMEM not only benefits latency-bound applications like QUDA, but can also help improve performance and reduce the complexity of codes like FFT that are bandwidth-bound, and codes like breadth-first search that have a dynamic communication pattern.
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9677
Streaming:
Download:
Share:
 
Abstract:
We'll discuss our work using neural networks to fit the interatomic potential function and describe how we tested the network's potential function in atomic simulation software. This method has lower computational cost than traditional density functional theory methods. We'll show how our work is applicable to different atom types and architectures and how it avoids relying on the physical model. Instead, it uses a purely mathematical representation, which reduces the need for human intervention.
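As a toy illustration of the idea of fitting a potential with a purely mathematical representation, the sketch below trains a tiny hand-rolled tanh network on samples of a hypothetical Lennard-Jones-style pair potential. The target function, network size, and learning rate are all invented for illustration; a real workflow would fit DFT-computed energies with a far richer descriptor and architecture.

```python
import math, random

def target(r):
    # Toy Lennard-Jones-style pair potential used as synthetic training data;
    # a production workflow would fit DFT-computed energies instead.
    return (1.0 / r) ** 12 - 2.0 * (1.0 / r) ** 6

random.seed(0)
rs = [0.9 + 0.02 * i for i in range(60)]        # sampled interatomic distances
ys = [target(r) for r in rs]

H = 8                                            # hidden units
w1 = [random.uniform(-1, 1) for _ in range(H)]
b1 = [random.uniform(-1, 1) for _ in range(H)]
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def predict(r):
    return sum(w2[h] * math.tanh(w1[h] * r + b1[h]) for h in range(H)) + b2

def mse():
    return sum((predict(r) - y) ** 2 for r, y in zip(rs, ys)) / len(rs)

initial_loss = mse()
lr = 0.01
for _ in range(500):                             # plain SGD with manual backprop
    for r, y in zip(rs, ys):
        hs = [math.tanh(w1[h] * r + b1[h]) for h in range(H)]
        err = sum(w2[h] * hs[h] for h in range(H)) + b2 - y
        for h in range(H):
            grad_h = err * w2[h] * (1.0 - hs[h] ** 2)  # d(err^2)/2 wrt pre-activation
            w2[h] -= lr * err * hs[h]
            w1[h] -= lr * grad_h * r
            b1[h] -= lr * grad_h
        b2 -= lr * err
final_loss = mse()
```

Once fitted, evaluating the network is far cheaper than a DFT calculation, which is the computational advantage the abstract describes.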
 
Topics:
HPC and AI, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9843
Streaming:
Download:
Share:
 
Abstract:
The conventional trial-and-error development approach to materials science is time-consuming and expensive. More efficient in silico techniques based on simulations or machine learning have emerged during the past two decades. We'll talk about recent trends and solutions for accelerating materials discovery and discuss future prospects.
 
Topics:
HPC and AI, Industrial Inspection
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9967
Streaming:
Download:
Share:
 
Abstract:
SK hynix began developing HBM (High Bandwidth Memory) technology in 2011, when it became evident that memory density and bandwidth scaling is critical for next-generation architectures. HBM is now widely adopted in various applications, and it will lead the future memory trend owing to the growth of AI, ML, and HPC applications. We will give a technical overview of HBM technology and discuss future trends in HBM.
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9978
Streaming:
Download:
Share:
 
Abstract:
We'll talk about how the huge computing advances made in AI by the deep learning revolution of the last five years have pushed legacy hardware to its limits, with the CPU now running workloads it was never tailored for. This comes at a time when Moore's Law is tapering off and single-threaded performance gains are slowing, requiring a new compute paradigm: accelerated computing, powered by massively parallel GPUs.
 
Topics:
HPC and AI, AI & Deep Learning Research
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9981
Streaming:
Download:
Share:
 
Abstract:
We'll discuss the challenges uncovered in AI and deep learning workloads, the most efficient approaches to handling data, and use cases in autonomous vehicles, retail, health care, finance, and other markets. Our talk will cover the complete requirements of the data life cycle, including initial acquisition, processing, inference, long-term storage, and driving data back into the field to sustain ever-growing processes of improvement. As the data landscape evolves with emerging requirements, the relationship between compute and data is undergoing a fundamental transition. We will provide examples of production data life cycles driving diverse architectures, from turnkey reference systems with DGX and DDN A3I to tailor-made solutions.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9983
Streaming:
Share:
 
Abstract:
Modern-day enablement of AI has been achieved by GPU acceleration of deep learning. We are now entering the realm of ever-more complex deep learning tasks, involving complicated algorithms, deeper and more sophisticated network layers, and rapidly growing data sets, for which a handful of GPUs is proving insufficient. By designing and building large-scale HPC machines with extensive GPU-based vector/tensor processing capabilities, such as Tsubame3, ABCI, and Post-K, as well as designing new scalable learning algorithms, we are overcoming these challenges. In particular, the ABCI grand challenge enabled three research groups, including ours at Tokyo Tech, to scale ImageNet training to over 4,000 GPUs, with training times measured in minutes. This paves the way for an era in which AI is as scalable as traditional HPC.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1807
Download:
Share:
 
Abstract:
For job allocation decisions, current batch schedulers have access to and use only information on the number of nodes and runtime, because it is readily available at submission time from user job scripts. User-provided runtimes are typically inaccurate because users overestimate or lack understanding of job resource requirements. Beyond node counts and runtime, other system resources, including IO and network, are not available to the scheduler yet play a key role in system performance. In this talk, we tackle the need for automatic, general, and scalable tools that provide accurate resource usage information to schedulers with our tool for Predicting Runtime and IO using Neural Networks and GPUs (PRIONN). PRIONN automates prediction of per-job runtime and IO resource usage, enabling IO-aware scheduling on HPC systems. The novelty of our tool is that whole job scripts are fed into deep learning models, allowing complete automation of runtime and IO resource predictions.  Back
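The core idea of predicting resource usage directly from job-script text can be illustrated in miniature. The sketch below is hypothetical and greatly simplified relative to PRIONN (which uses deep models on raw script text): it featurizes each script as a bag of tokens and fits a least-squares model to made-up runtime observations.

```python
import re
import numpy as np

# Hypothetical sketch of script-to-runtime prediction: bag-of-tokens
# features plus a least-squares fit stand in for PRIONN's deep models.
# Scripts and runtimes below are invented for illustration.
scripts = [
    "#SBATCH --nodes=4\n#SBATCH --time=02:00:00\nmpirun ./sim --steps 1000",
    "#SBATCH --nodes=1\n#SBATCH --time=00:30:00\npython train.py --epochs 5",
    "#SBATCH --nodes=8\n#SBATCH --time=04:00:00\nmpirun ./sim --steps 4000",
    "#SBATCH --nodes=2\n#SBATCH --time=01:00:00\npython train.py --epochs 20",
]
runtimes = np.array([110.0, 25.0, 230.0, 55.0])  # observed minutes (made up)

def tokenize(s):
    return re.findall(r"[A-Za-z0-9_.#-]+", s)

vocab = sorted({t for s in scripts for t in tokenize(s)})
idx = {t: i for i, t in enumerate(vocab)}

def featurize(s):
    """Count how often each vocabulary token appears in a script."""
    v = np.zeros(len(vocab))
    for t in tokenize(s):
        if t in idx:
            v[idx[t]] += 1
    return v

X = np.stack([featurize(s) for s in scripts])
w, *_ = np.linalg.lstsq(X, runtimes, rcond=None)  # fit runtime model
pred = X @ w                                       # predicted runtimes
```

A real system would of course train on many thousands of historical jobs and predict IO volumes as well as runtime, but the input/output shape of the problem is the same.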
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1810
Download:
Share:
 
Abstract:
This talk will focus on the latest developments in the MVAPICH2-GDR MPI library, which helps HPC and deep learning applications exploit maximum performance and scalability on GPU clusters. Multiple designs focusing on GPUDirect RDMA (GDR), managed and unified memory support, datatype processing, and support for OpenPOWER and NVLink will be highlighted for HPC applications. We will also present novel designs and enhancements to the MPI library that boost the performance and scalability of deep learning frameworks on GPU clusters. Container-based solutions for GPU-based cloud environments will also be highlighted.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1812
Download:
Share:
 
Abstract:
AI methods and tools are starting to be applied to HPC applications by a growing number of brave researchers in diverse scientific fields. This talk will describe an emergent workflow that uses traditional HPC numeric simulations to generate the labeled data sets required to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow for HPC applications and system architecture in a post-Moore's Law world will be considered.  Back
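The simulate-then-surrogate workflow described above can be sketched end to end in a few lines. This is a deliberately toy illustration, not any speaker's implementation: a cheap function stands in for an expensive simulation, a polynomial stands in for the trained AI model.

```python
import numpy as np

def simulate(x):
    """Stand-in for an expensive HPC simulation (hypothetical)."""
    return np.sin(3 * x) + 0.5 * x ** 2

# 1) Run the "simulation" to generate a labeled training set.
x_train = np.linspace(-2, 2, 50)
y_train = simulate(x_train)

# 2) Fit a cheap surrogate model on the simulated data (a polynomial
#    here; the workflow in the talk would use a neural network).
coeffs = np.polyfit(x_train, y_train, deg=11)
surrogate = np.poly1d(coeffs)

# 3) Query the surrogate instead of re-running the simulation.
x_new = np.linspace(-2, 2, 201)
err = float(np.max(np.abs(surrogate(x_new) - simulate(x_new))))
```

The gain comes from step 3: once trained, the surrogate answers new queries at a tiny fraction of the simulation's cost, which is what enables the efficiency improvements the talk describes.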
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1814
Download:
Share:
 
Abstract:
The Perlmutter machine will be delivered to NERSC/LBNL in 2020 and will contain a mixture of CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk we will describe the analysis we performed to optimize this design to meet the needs of the broad NERSC workload. We will also discuss our application readiness program, the NERSC Exascale Science Applications Program (NESAP), in which we work with our users to optimize their applications to maximize their performance on the GPUs in Perlmutter.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1816
Download:
Share:
 
Abstract:
Rapid progress in atmospheric science has been fueled in part over the years by faster computers. However, progress has slowed over the last decade due to three factors: the plateauing of core speeds, the increasing complexity of atmospheric models, and the mushrooming of data volumes. Our team at the National Center for Atmospheric Research is pursuing a hybrid approach to surmounting these barriers that combines machine learning techniques and GPU acceleration to produce, we hope, a new generation of ultra-fast models with enhanced fidelity to nature and increased value to society.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1818
Download:
Share:
 
Abstract:
The recent success of deep learning has been driven by the ability to combine significant GPU resources with extremely large labeled datasets. However, many labels are extremely expensive to obtain, or can be observed only once, such as a specific astronomical event or scientific experiment. By combining vast amounts of labeled surrogate data with advanced few-shot learning, we have demonstrated success in leveraging small data in deep learning. In this talk, we will discuss these exciting results and explore the scientific innovations that made this possible.  Back
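A common metric-based approach to few-shot classification can be sketched briefly. This is a generic prototypical-network-style illustration, not the speakers' method: class prototypes are the mean of a few labeled "support" embeddings, and queries are assigned to the nearest prototype. The synthetic vectors stand in for features learned on abundant surrogate data.

```python
import numpy as np

# Generic few-shot sketch: nearest-prototype classification over
# synthetic embeddings (stand-ins for learned features).
rng = np.random.default_rng(1)
dim, shots, n_cls = 8, 3, 4
centers = rng.normal(0, 3, (n_cls, dim))  # underlying class structure

# A few labeled support examples and some queries per class.
support = centers[:, None, :] + rng.normal(0, 0.3, (n_cls, shots, dim))
queries = centers[:, None, :] + rng.normal(0, 0.3, (n_cls, 10, dim))

prototypes = support.mean(axis=1)          # one prototype per class

q = queries.reshape(-1, dim)
d = np.linalg.norm(q[:, None, :] - prototypes[None, :, :], axis=2)
pred = d.argmin(axis=1)                    # nearest prototype wins
truth = np.repeat(np.arange(n_cls), 10)
accuracy = float((pred == truth).mean())
```

The appeal for one-shot scientific events is that the classifier needs only a handful of labeled examples per class at test time; all the heavy training happens beforehand on surrogate data.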
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1822
Download:
Share:
 
Abstract:
Pacific Northwest National Laboratory's scientific mission spans energy and molecular science to national security. Under the Deep Learning for Scientific Discovery Initiative, PNNL has invested in integrating advanced machine learning with traditional scientific methods to push the state of the art in many disciplines. We will provide an overview of some of the thirty projects we have stewarded, demonstrating how we have leveraged computing and analytics in fields ranging from ultrasensitive detection to metabolomics to atmospheric science.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1823
Download:
Share:
 
Abstract:
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1833
Download:
Share:
 
Abstract:
We have developed an HPC ML training algorithm that can reduce training time on petabytes of data from days and weeks to minutes. Using the same research, we can now conduct inferencing on completely encrypted data. We have built a distributed ML framework on commodity Azure VMs that scales to tens of terabytes and thousands of cores, while achieving better accuracy than the state of the art.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1842
Download:
Share:
 
Abstract:
PSC's "Bridges" was the first system to successfully converge HPC, AI, and Big Data. Designed for the U.S. national research community and supported by NSF, it now serves approximately 1,600 projects and 7,500 users at over 350 institutions. Bridges emphasizes "nontraditional" uses that span the life, physical, and social sciences, engineering, and business, many of which are based on AI or AI-enabled simulation. We describe the characteristics of Bridges that have made it a success, and we highlight several inspirational results and how they benefited from the system architecture. We then introduce "Bridges AI", a powerful new addition for balanced AI capability and capacity that includes NVIDIA's DGX-2 and HPE NVLink-connected 8-way Volta servers.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1832
Download:
Share:
 
Abstract:
Containers simplify application deployments in data centers by wrapping applications in an isolated virtual environment. By including all application dependencies, such as binaries and libraries, application containers run seamlessly in any data center environment. The HPC application containers available on NVIDIA GPU Cloud (NGC) dramatically improve ease of application deployment while delivering optimized performance. However, if the desired application is not available on NGC, building HPC containers from scratch trades one set of challenges for another. Parts of the software environment typically provided by the HPC data center must be redeployed inside the container. For those used to simply loading the relevant environment modules, installing a compiler, MPI library, CUDA, and other core HPC components from scratch may be daunting. HPC Container Maker (HPCCM) [1] is an open-source project that addresses the challenges of creating HPC application containers. HPCCM encapsulates the best practices of deploying core HPC components into modular building blocks that follow container best practices, reducing container development effort, minimizing image size, and taking advantage of image layering. HPCCM makes it easier to create HPC application containers by separating the choice of what should go into a container image from the specification details of how to configure, build, and install a component. This separation also enables the best practices of HPC component deployment to transparently evolve over time.  Back
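To make the building-block idea concrete, here is a minimal HPCCM recipe sketch. The component versions and base image are assumptions for illustration; HPCCM recipes are Python fragments in which building blocks like `baseimage`, `gnu`, and `openmpi` are added to a stage and then rendered to a Dockerfile or Singularity definition file.

```python
# recipe.py -- a minimal HPCCM recipe sketch (versions are assumptions).
# Render with:  hpccm --recipe recipe.py --format docker > Dockerfile
Stage0 += baseimage(image='nvcr.io/nvidia/cuda:11.0-devel-ubuntu18.04')
Stage0 += gnu()                # GNU compiler building block
Stage0 += mlnx_ofed()          # Mellanox OFED network stack
Stage0 += openmpi(cuda=True, infiniband=True, version='3.1.2')
```

Note how the recipe states only *what* goes into the image; the configure/build/install details for each component live inside the building blocks, which is exactly the separation the abstract describes.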
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80022
Download:
Share:
 
Abstract:
In the new era of AI, enterprises have an outstanding opportunity to innovate and lead. In practice, however, AI initiatives often stall because of the complexity of scaling infrastructure. In this session, we will share how the new scaling capabilities of NVIDIA DGX systems, paired with Pure Storage FlashBlade flash storage, can deliver insights within hours and bring AI to enterprise scale.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8029
Download:
Share:
 
Abstract:
We'll highlight IBM POWER9 system and NVIDIA Volta GPU characteristics such as compute, memory, and NVLink capabilities. We'll also take the audience through HPC application performance observations and tuning. IBM POWER9 with NVIDIA Volta is a state-of-the-art system designed for HPC and cognitive computing. It also introduces NVLink 2.0 high-speed connectivity between CPU and GPU, along with coherent device memory. System characteristics such as CPU and GPU compute and memory throughput, NVLink latency, and bandwidth play key roles in application performance. We'll demonstrate how each of these influences application performance through a case study.  Back
 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8309
Streaming:
Download:
Share:
 
Abstract:
The Center for Accelerated Application Readiness within the Oak Ridge Leadership Computing Facility is a program to prepare scientific applications for next-generation supercomputer architectures. Currently the program consists of thirteen domain science application development projects focused on preparing codes for efficient use on Summit. Over the last three years, these teams have developed and executed a development plan based on detailed information about Summit's architecture and system software stack. This presentation will highlight the progress made by the teams using Titan, the 27 PF Cray XK7 with NVIDIA K20X GPUs; SummitDev, an early-access IBM Power8+ system with NVIDIA P100 GPUs; and, very recently, Summit, OLCF's new IBM Power9 system with NVIDIA V100 GPUs. The program covers a wide range of domain sciences, with applications including ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, and XGC.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8908
Streaming:
Download:
Share:
 
Abstract:
Python is a programming language with increasing adoption in the development community due to its fast learning curve, flexibility, and ease of use and integration with other technologies. Thanks to its level of abstraction, the same Python code can run on different platforms such as x86, RISC, and ARM. The Python development community is growing fast, and many members are interested in moving to GPU-accelerated programming but don't know where to start or what is needed. We'll go through the steps and adoption path for developing Python solutions that take advantage of GPU acceleration, including details, advantages, and challenges for the strongest and most popular Python 3 modules for GPUs: scikit-cuda, PyCUDA, Numba, cudamat, and CuPy. Code samples and program execution statistics will be shown as a performance analysis exercise as well.  Back
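A common first step on this adoption path is writing code against a swappable array backend, so the same source runs on CPU (NumPy) or GPU (CuPy, one of the modules listed above). The sketch below is a generic pattern, not material from the talk; it probes for a usable GPU and falls back to NumPy otherwise.

```python
# Generic "array backend" adoption pattern: the same code runs on the
# GPU via CuPy when available, or on the CPU via NumPy otherwise.
try:
    import cupy as xp
    xp.zeros(1)               # raises here if no usable GPU is present
    on_gpu = True
except Exception:
    import numpy as xp        # CPU fallback
    on_gpu = False

def saxpy(a, x, y):
    """a*x + y -- identical source for the CPU and GPU paths."""
    return a * x + y

x = xp.arange(5, dtype=xp.float32)
y = xp.ones(5, dtype=xp.float32)
z = saxpy(2.0, x, y)

# Bring the result back to host memory when it lives on the GPU.
result = xp.asnumpy(z) if on_gpu else z
```

Because NumPy and CuPy share most of their API surface, this pattern lets a team prototype on laptops and deploy the same code on GPU nodes, which is part of why CuPy appears on most Python-to-GPU adoption paths.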
 
Topics:
HPC and AI, Tools & Libraries, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8214
Streaming:
Download:
Share:
 
Abstract:
An overview of numerous GPU hardware platforms designed for today's taxing AI/machine learning and HPC workloads, including custom solutions targeted at deep learning inferencing and deep learning training. The talk will cover systems based on PCIe GPUs as well as GPU systems with the NVLink interface.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8999
Streaming:
Share:
 
Abstract:
Multiphysics and multiscale simulations are found in a variety of computational science subfields, but their disparate computational characteristics can make GPU implementations complex and often difficult. Simulations of supernovae are ideal examples of this complexity. We use the scalable FLASH code to model these astrophysical cataclysms, incorporating hydrodynamics, thermonuclear kinetics, and self-gravity across considerable spans in space and time. Using OpenACC and GPU-enabled libraries coupled to new NVIDIA GPU hardware capabilities, we have improved the physical fidelity of these simulations by increasing the number of evolved nuclear species by more than an order of magnitude. I will discuss these and other performance improvements to the FLASH code on the Summit supercomputer at Oak Ridge National Laboratory.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8926
Streaming:
Share:
 
Abstract:
OpenMP has a long history on shared memory, CPU-based machines, but has recently begun to support offloading to GPUs and other parallel accelerators. This talk will discuss the current state of compilers for OpenMP on NVIDIA GPUs, showing results and best practices from real applications. Developers interested in writing OpenMP codes for GPUs will learn how best to achieve good performance and portability.  Back
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8344
Streaming:
Share:
 
Abstract:
Computer simulations offer great insight into complex, dynamical systems but can be difficult to navigate through a large set of control/design parameters. Deep learning methods, applied on fast GPUs, can provide an ideal way to improve scientific and engineering workflows. In this talk, Vic will discuss an application of machine learning to develop a fast-running surrogate model that captures the dynamics of an industrial multiphase fluid flow. He will also discuss an improved population search method that can help the analyst explore a high-dimensional parameter space to optimize production while reducing the model uncertainty.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8828
Streaming:
Download:
Share:
 
Abstract:
We present Unicorn, a novel parallel programming model for GPU clusters. It shows that distributed shared memory systems can be efficient with the help of transactional semantics and deferred data synchronizations, and thus the simplicity of distributed shared memory systems can be carried over to CPUs and GPUs in a cluster. Unicorn is designed for easy programmability and provides a deterministic execution environment. Device, node, and cluster management are completely handled by the runtime, and no related API is exposed to the application programmer. Load balancing, scheduling, and scalability are also fully transparent to the application code. Programs written on one cluster can be run verbatim on a different cluster. Application code is agnostic to data placement within the cluster as well as to changes in network interfaces and data availability patterns. Unicorn's programming model, being deterministic, by design eliminates several data races and deadlocks. Unicorn's runtime employs several data optimizations, including prefetching and subtask streaming, to overlap communication and computation. It employs pipelining at two levels: first to hide data transfer costs among cluster nodes, and second to hide transfer latency between CPUs and GPUs on all nodes. Among other optimizations, Unicorn's work-stealing scheduler employs a two-level victim selection technique to reduce the overhead of steal operations, along with a proactive, aggressive stealing mechanism to prevent the aforementioned pipelines from stalling during a steal. We will showcase the scalability and performance of Unicorn on several scientific workloads, and demonstrate the load balancing achieved in some of these experiments and the amount of time the runtime spends in communication.
We find that parallelizing coarse-grained applications like matrix multiplication or 2D FFT with our system requires only about 30 lines of C code to set up the runtime; the rest of the application code is a regular single-CPU/GPU implementation. This indicates the ease of extending sequential code to a parallel environment. We will show the efficiency of our abstraction, with minimal performance loss, on recent GPU architectures such as Pascal and Volta, and compare our approach to similar implementations such as StarPU-MPI and G-Charm.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8565
Streaming:
Download:
Share:
 
Abstract:
In this session we present MPI collective algorithms optimized for distributed deep learning frameworks. The performance of large-message MPI collectives such as broadcast, allreduce, and reduce is critical to the performance of these workloads. A novel approach is needed for the design of large-scale collective communication algorithms for CUDA-aware MPI runtimes. The session will take a deep dive into our implementation of these collectives and its performance advantages on IBM POWER9 systems with NVIDIA V100 GPUs for the OSU benchmarks and distributed TensorFlow.  Back
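One widely used algorithm for large-message allreduce is the ring: each rank's buffer is split into P chunks, partial sums circulate for P-1 reduce-scatter steps, and the reduced chunks circulate for P-1 more allgather steps. The sketch below simulates that generic pattern in plain Python (it is not the session's MVAPICH2-GDR implementation, and real collectives overlap the send/receive steps across ranks).

```python
import numpy as np

# Simulated ring allreduce over P "ranks", each holding a length-N vector.
P, N = 4, 8
data = [np.arange(N, dtype=float) * (r + 1) for r in range(P)]
expected = sum(data)                     # elementwise sum across ranks

chunks = [np.array_split(d.copy(), P) for d in data]

# Reduce-scatter: after P-1 steps, rank r owns fully reduced chunk (r+1)%P.
for step in range(P - 1):
    # All "sends" are computed from the state at the start of the step,
    # mimicking the simultaneous exchanges of a real ring step.
    sends = [((r + 1) % P, (r - step) % P, chunks[r][(r - step) % P].copy())
             for r in range(P)]
    for dest, idx, payload in sends:
        chunks[dest][idx] += payload     # accumulate the partial sum

# Allgather: circulate the reduced chunks so every rank ends with all of them.
for step in range(P - 1):
    sends = [((r + 1) % P, (r + 1 - step) % P,
              chunks[r][(r + 1 - step) % P].copy()) for r in range(P)]
    for dest, idx, payload in sends:
        chunks[dest][idx] = payload      # overwrite with the reduced chunk

results = [np.concatenate(c) for c in chunks]
```

The ring moves 2(P-1)/P of the buffer per rank regardless of P, which is why it is a common starting point for large-message allreduce in deep learning workloads.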
 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8306
Streaming:
Download:
Share:
 
Abstract:
Simulation and analysis of flow and combustion processes in propulsion and power systems presents many new and interesting challenges. A multitude of strongly coupled fluid dynamic, thermodynamic, transport, chemical, multiphase, and heat transfer processes are intrinsically coupled and must be considered simultaneously in the complex domains associated with devices such as gas-turbine and rocket engines. The problem is compounded by the effects of turbulence and high-pressure phenomena, which require treatment of nonideal fluid mixtures at supercritical conditions. The combination of complex multicomponent property evaluations and the computational grid resolution requirements makes these simulations expensive and cumbersome. Recent advances in high performance computing (HPC) systems, such as graphics processing unit (GPU) based architectures, provide an opportunity for significant advances in dealing with these complexities while reducing the time to solution.  Back
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8910
Streaming:
Share:
 
Abstract:
We propose a new framework for parallelizing deep neural network training that maximizes the amount of data ingested by the training algorithm. Our proposed framework, Livermore Tournament Fast Batch Learning (LTFB), targets large-scale data problems. The LTFB approach creates a set of deep neural network (DNN) models and trains each instance independently and in parallel. Periodically, each model selects another model to pair with, exchanges models, and runs a local tournament against held-out tournament datasets. The winning model continues training on the local training dataset. This approach maximizes computation and minimizes the amount of synchronization required in training deep neural networks, a major bottleneck in existing synchronous deep learning algorithms. We evaluate our proposed algorithm on two HPC machines at Lawrence Livermore National Laboratory, including an early-access IBM Power8+ machine with NVIDIA Tesla P100 GPUs. Experimental evaluations of the LTFB framework on two popular image classification benchmarks, CIFAR10 and ImageNet, show significant speedups compared to the sequential baseline.  Back
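The tournament dynamic described above, with independent local training punctuated by pairwise winner-takes-both exchanges on held-out data, can be sketched with toy models. This is a hypothetical simplification (linear models and SGD stand in for DNNs), not the LTFB implementation.

```python
import numpy as np

# Toy tournament-style parallel training: several models train
# independently; periodically, random pairs compete on held-out data
# and the winner's weights replace the loser's.
rng = np.random.default_rng(2)

true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w

X_hold, y_hold = X[:50], y[:50]      # held-out tournament set
X_tr, y_tr = X[50:], y[50:]          # local training set

models = [rng.normal(size=2) for _ in range(4)]

def loss(w, Xs, ys):
    return float(np.mean((Xs @ w - ys) ** 2))

loss_before = min(loss(w, X_hold, y_hold) for w in models)

for _ in range(5):                   # alternating train/tournament rounds
    for i, w in enumerate(models):   # independent local SGD
        for _ in range(20):
            j = rng.integers(len(X_tr))
            g = 2 * (X_tr[j] @ w - y_tr[j]) * X_tr[j]
            models[i] = w = w - 0.05 * g
    order = rng.permutation(len(models))
    for a, b in zip(order[::2], order[1::2]):   # pairwise tournaments
        if loss(models[a], X_hold, y_hold) < loss(models[b], X_hold, y_hold):
            models[b] = models[a].copy()
        else:
            models[a] = models[b].copy()

loss_after = min(loss(w, X_hold, y_hold) for w in models)
```

Note that the only inter-model communication is the occasional pairwise weight exchange, which is the source of the reduced synchronization the abstract emphasizes relative to synchronous data-parallel training.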
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8829
Streaming:
Share:
 
Abstract:
We will discuss challenges and lessons learned from deploying multiple large-scale HPC and AI clusters in different industries, focusing on end-to-end aspects of designing and deploying large-scale GPU clusters: data center and environmental challenges, network performance and optimization, data pipeline and storage challenges, and workload orchestration and optimization. You will learn about open architectures for HPC, AI, and deep learning that combine flexible compute architectures, rack-scale platforms, and software-defined networking and storage to provide a scalable software-defined deep learning environment. We will discuss strategies, providing insight into everything from specialty compute for training vs. inference, to high-performance storage for data workflows, to orchestration and workflow management tools. We will also discuss deploying deep learning environments from development to production at scale, from private cloud to public cloud.  Back
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8972
Streaming:
Download:
Share:
 
Abstract:
Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs, in combination with third-party devices. We''ll introduce the fundamen ...Read More
Abstract:

Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs, in combination with third-party devices. We''ll introduce the fundamental concepts behind GPUDirect and present the latest developments, such as changes to the pre-existing APIs, the new APIs recently introduced. We''ll also discuss the expected performance in combination with the new computing platforms that emerged last year.

 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8474
Download:
Share:
 
Abstract:
Learn how the Julia programming language can be used for GPU programming, both for (1) low-level kernel programming, and (2) high-level array and AI libraries. This full-stack support drastically simplifies code bases, and GPU programmers can take advantage of all of Julia's most powerful features: generic programming, n-dimensional kernels, higher-order functions, and custom numeric types. We'll give an overview of the compiler's implementation and its performance characteristics via the Rodinia benchmark suite. We'll show how these techniques enable highly flexible AI libraries with state-of-the-art performance, and allow a major government user to run computationally intensive threat modelling on terabytes of data in real time.
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8269
Download:
Share:
 
Abstract:
Participants will take part in in-depth discussions revolving around the revolutionary HBM (High Bandwidth Memory) product, its distinguishing technical features, and the role it plays in expanding the boundaries of the AI revolution. The session will also cover current technical and business challenges, as well as future considerations for the next-generation HBM lineup.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8949
Streaming:
Download:
Share:
 
Abstract:
Scientific simulations typically store just a small fraction of their computed timesteps--as few as one in 500--due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles that simulation scientists are loath to sacrifice. We propose the use of the special-purpose video processing unit (VPU), currently unutilized in the HPC context, for inexpensive lossy encoding. Our work demonstrates that video encoding quality is suitable for volumes and recommends encoder settings. We'll show that data can be encoded in parallel with a hybrid CPU/GPU computation with minimal impact on run time. We'll also demonstrate that decoding is fast enough for on-the-fly decompression during analysis.
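As a rough, CPU-only illustration of the trade-off the abstract describes — lossy encoding shrinking stored timesteps at a bounded quality cost — here is a minimal NumPy sketch. The real work uses a hardware video encoder; this stand-in only quantizes a float volume to 8 bits, and all names are illustrative:

```python
import numpy as np

def encode_8bit(volume):
    """Lossy-encode a float volume to uint8 -- a stand-in for the
    quantization a hardware video encoder performs internally."""
    lo, hi = float(volume.min()), float(volume.max())
    scale = (hi - lo) or 1.0
    codes = np.round((volume - lo) / scale * 255).astype(np.uint8)
    return codes, lo, scale

def decode_8bit(codes, lo, scale):
    """Cheap on-the-fly decompression for analysis."""
    return codes.astype(np.float32) / 255 * scale + lo

rng = np.random.default_rng(0)
vol = rng.random((16, 16, 16)).astype(np.float32)
codes, lo, scale = encode_8bit(vol)
recon = decode_8bit(codes, lo, scale)

# 4x smaller than float32, with bounded quantization error.
assert codes.nbytes * 4 == vol.nbytes
assert float(np.abs(recon - vol).max()) <= scale / 255
```

The point of the sketch is only that the error is bounded by the quantization step, which is the kind of quality guarantee the talk evaluates for real video codecs.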
 
Topics:
HPC and AI, 5G & Edge, In-Situ & Scientific Visualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8561
Streaming:
Download:
Share:
 
Abstract:
There is a huge opportunity for businesses to use advanced AI methods to extract insights from their data faster. Imagine training your models in minutes or hours rather than days or weeks. Think how much more money you can make by getting algorithms to market faster and getting the most productivity out of your researchers. In this session, Greg Schmidt introduces the new HPE Apollo 6500 Gen10 System with NVLink for the enterprise. This innovative system design allows for a high degree of flexibility, with a range of configuration and topology options to match your workloads. Learn how the Apollo 6500 unlocks business value from your data for AI.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8969
Streaming:
Share:
 
Abstract:
HPC centers have traditionally been configured for simulation workloads, but deep learning is increasingly applied alongside simulation on scientific datasets. These frameworks do not always fit well with job schedulers, large parallel file systems, and MPI backends. We'll discuss examples of how deep learning workflows are being deployed on next-generation systems at the Oak Ridge Leadership Computing Facility. We'll share benchmarks comparing natively compiled frameworks versus containers on Power systems such as Summit, as well as best practices for deploying deep learning frameworks and models on HPC resources for scientific workflows.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8551
Streaming:
Download:
Share:
 
Abstract:
Addressing the apparent Amdahl's fraction of synchronizing with the CPU for communication is critical for strong scaling of applications on GPU clusters. GPUs are designed to maximize throughput and have enough state and parallelism to hide long latencies to global memory. It's important to take advantage of these inherent capabilities of the GPU and the CUDA programming model when tackling communication between GPUs. NVSHMEM provides a Partitioned Global Address Space (PGAS) that spans memory across GPUs and provides an API for fine-grained GPU-GPU data movement and synchronization from within a CUDA kernel. NVSHMEM also provides a CPU-side API for GPU-GPU data movement, giving applications a migration path to NVSHMEM. CPU-side communication can be issued in stream order, similar to CUDA operations. NVSHMEM implements the OpenSHMEM programming model, which is of great interest to government agencies and national labs. We'll give an overview of the capabilities, API, and semantics of NVSHMEM, and use examples from a varied set of applications (HPGMG, Multi-GPU Transpose, Graph500, etc.) to demonstrate its use and benefits.
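The PGAS put/get shape that NVSHMEM exposes can be sketched without GPUs. The toy below is an assumption-laden illustration of the semantics only — one symmetric buffer per PE, remote writes and reads addressed by (PE, offset) — and is not the NVSHMEM or OpenSHMEM API:

```python
import numpy as np

class SymmetricHeap:
    """Toy PGAS: one buffer per PE. put/get address (pe, offset) pairs,
    mimicking the shape (not the API) of OpenSHMEM-style put/get."""
    def __init__(self, n_pes, words_per_pe):
        self.mem = np.zeros((n_pes, words_per_pe))

    def put(self, dest_pe, offset, values):
        # One-sided write into a remote PE's partition.
        self.mem[dest_pe, offset:offset + len(values)] = values

    def get(self, src_pe, offset, n):
        # One-sided read from a remote PE's partition.
        return self.mem[src_pe, offset:offset + n].copy()

heap = SymmetricHeap(n_pes=4, words_per_pe=8)
# "PE 0" writes into PE 3's partition; any PE can read it back.
heap.put(dest_pe=3, offset=0, values=[1.0, 2.0, 3.0])
assert list(heap.get(src_pe=3, offset=0, n=3)) == [1.0, 2.0, 3.0]
```

The one-sided character — the target PE takes no action in the transfer — is the property that lets fine-grained GPU-GPU communication be issued from inside a kernel.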
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8595
Streaming:
Download:
Share:
 
Abstract:

This session presents an overview of the hardware and software architecture of the DGX-2 platform. This talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA developers will learn ways to use the full GPU connectivity to quickly build complex applications and to exploit the high-bandwidth NVLink connections to scale up performance.

 
Topics:
HPC and AI, Data Center & Cloud Infrastructure, AI & Deep Learning Business Track (High Level), HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8688
Streaming:
Download:
Share:
 
Abstract:
XGC is a kinetic whole-volume modeling code with unique capabilities to study tokamak edge plasmas in real geometry and answer important questions about the design of ITER and other future fusion reactors. The main technique is the Particle-in-Cell method, which models the plasma as billions of quasiparticles representing ions and electrons. Ostensibly, the process of advancing each particle in time is embarrassingly parallel. However, the electric and magnetic fields must be known in order to push the particle, which requires an implicit gather operation from XGC's sophisticated unstructured mesh. In this session, we'll show how careful mapping of field and particle data structures to GPU memory allowed us to decouple the performance of the critical electron push routine from the size of the simulation mesh and let the true particle parallelism dominate. This improvement enables performant, high-resolution, ITER-scale simulations on Summit.
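The gather-then-push structure described above can be shown with a deliberately simplified sketch. XGC gathers from an unstructured mesh and includes magnetic fields; this 1-D electrostatic NumPy toy (all names assumed) only illustrates why the field gather couples otherwise independent particle pushes:

```python
import numpy as np

def gather_field(e_grid, dx, x):
    """Linearly interpolate grid field values to particle positions --
    the gather step that couples the otherwise independent pushes."""
    i = np.clip((x / dx).astype(int), 0, len(e_grid) - 2)
    frac = x / dx - i
    return (1 - frac) * e_grid[i] + frac * e_grid[i + 1]

def push(x, v, e_grid, dx, dt, qm=1.0):
    """Simple leapfrog push: update v from the gathered field, then x."""
    v = v + qm * gather_field(e_grid, dx, x) * dt
    x = x + v * dt
    return x, v

dx, dt = 0.1, 0.01
e_grid = np.linspace(0.0, 1.0, 11)        # E(x) = x on [0, 1]
x = np.array([0.25, 0.5])
v = np.zeros(2)
for _ in range(100):
    x, v = push(x, v, e_grid, dx, dt)

assert np.all(v > 0)                      # positive field accelerates
assert np.all(x > np.array([0.25, 0.5]))  # particles drift downfield
```

Every particle push is independent, but each one reads shared mesh data through the gather — which is why the data layout of fields and particles in GPU memory dominates performance at scale.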
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8909
Streaming:
Download:
Share:
 
Abstract:

Do you need to compute larger problems, or compute faster, than a single GPU allows? Then come to this session and learn how to scale your application to multiple GPUs. You will learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, applying a domain decomposition strategy.
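The domain decomposition strategy the session uses can be sketched without GPUs: split an array across two subdomains (standing in for two devices), exchange one halo cell per step, and check that the decomposed update matches the global one. A minimal NumPy illustration, with all names assumed:

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi smoothing step on interior points; boundaries fixed."""
    out = u.copy()
    out[1:-1] = 0.5 * (u[:-2] + u[2:])
    return out

def decomposed_step(left, right):
    """Same step on two subdomains (standing in for two GPUs), each
    padded with one halo cell exchanged before the update."""
    left_ext = np.concatenate([left, right[:1]])    # halo from neighbor
    right_ext = np.concatenate([left[-1:], right])
    return jacobi_step(left_ext)[:-1], jacobi_step(right_ext)[1:]

u = np.linspace(0.0, 1.0, 10) ** 2
ref = jacobi_step(jacobi_step(u))         # single-domain reference

a, b = u[:5], u[5:]                       # split across two "devices"
for _ in range(2):
    a, b = decomposed_step(a, b)          # exchange halos, then update

assert np.allclose(np.concatenate([a, b]), ref)
```

The halo exchange is where the programming models the talk compares (e.g. explicit copies, peer access, or message passing) differ; the numerics stay identical.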

 
Topics:
HPC and AI, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23031
Download:
Share:
 
Abstract:

Murex has been an early adopter of GPUs for pricing and risk management of complex financial options. GPU adoption has boosted the performance of its software while reducing its cost of use. Each new GPU generation has also shown how important it is to reshape the architecture of the software behind its GPU-accelerated analytics. Minsky, featuring far better GPU memory bandwidth and GPU-CPU interconnect, raises the bar even further. Murex will show how it has handled this new challenge for its business.

 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23209
Download:
Share:
 
Abstract:

HPE Deep Learning solutions empower innovation at any scale, building on our purpose-built HPC systems and technologies, applications, and support services. Deep learning demands massive amounts of computational power, usually involving heterogeneous computing resources, e.g., GPUs and InfiniBand, as installed on HPE Apollo systems. NovuMind's NovuForce system leverages state-of-the-art technologies to make the deployment and configuration procedure fast and smooth. The NovuForce deep learning software, delivered as a Docker image, has been optimized for the latest technologies such as NVIDIA Pascal GPUs and InfiniBand GPUDirect RDMA. This flexibility of the software, combined with the broad range of GPU servers in the HPE portfolio, makes this one of the most efficient and scalable solutions.

 
Topics:
HPC and AI, Performance Optimization, Tools & Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23250
Download:
Share:
 
Abstract:

Discover how we designed and optimized a highly scalable dense solver for the Maxwell equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it with programmability concerns in mind. We show how we solved the challenge of tightly combining tasks with MPI, and illustrate how this scaled up to 50,000 CPU cores, reaching 1.38 petaflops. We then focus on the integration of GPUs into this model, along with a few implementation tricks to ensure truly asynchronous programming. Finally, after briefly detailing how we added hierarchical compression techniques into our distributed solver over CPUs, we describe how we plan to unlock the challenges that have so far prevented porting it to GPUs.
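The hierarchical compression mentioned at the end builds on low-rank approximation of matrix blocks. Here is a minimal, illustrative NumPy sketch of that building block — a truncated SVD of one dense block; the production solver uses more elaborate hierarchical formats, and all names are assumed:

```python
import numpy as np

def low_rank_compress(block, tol=1e-8):
    """Truncated-SVD compression of a dense block: keep only singular
    values above tol relative to the largest, store two thin factors."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))
    return u[:, :rank] * s[:rank], vt[:rank]

rng = np.random.default_rng(1)
# A numerically low-rank block, as arises for well-separated interactions.
a = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 100))

us, vt = low_rank_compress(a)
assert us.shape[1] == 5                   # numerical rank recovered
assert us.size + vt.size < a.size         # far fewer stored entries
assert np.allclose(us @ vt, a)            # block reconstructed
```

Storing 100x5 + 5x100 entries instead of 100x100 is the memory and flop saving that makes such blocks attractive — and the irregular, rank-dependent work is also what makes porting the compressed solver to GPUs hard.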

 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23277
Download:
Share:
 
Abstract:

We leverage NVIDIA GPUs for connected components labeling and image classification applied to Digital Rock Physics (DRP), to help characterize reservoir rocks and study their pore distributions. We show in this talk how NVIDIA GPUs helped us satisfy the strict real-time constraints dictated by the imaging hardware used to scan the rock samples. We present a detailed description of the workflow from a DRP perspective, our algorithm and optimization techniques, and performance results on the latest NVIDIA GPU generations.
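The core labeling operation can be sketched on the CPU to show its semantics. This BFS-based 4-connected labeling of a tiny binary "pore" image is purely illustrative (names assumed) and unrelated to the GPU implementation discussed in the talk:

```python
import numpy as np
from collections import deque

def label_components(img):
    """4-connected component labeling by BFS over foreground pixels."""
    labels = np.zeros(img.shape, dtype=int)
    current = 0
    for seed in zip(*np.nonzero(img)):
        if labels[seed]:
            continue                      # already part of a component
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                        and img[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current

pores = np.array([[1, 1, 0, 0],
                  [0, 1, 0, 1],
                  [0, 0, 0, 1],
                  [1, 0, 0, 0]])
labels, n = label_components(pores)
assert n == 3    # two pore clusters plus the isolated corner pixel
```

Counting and sizing these components per scanned volume is what characterizes pore distributions; GPU versions replace the sequential BFS with parallel label-propagation or union-find schemes.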

 
Topics:
HPC and AI, HPC and Supercomputing, Video & Image Processing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23303
Download:
Share:
 
Abstract:

To prepare the scientific communities, GENCI and its partners have set up a technology watch group and lead collaborations with vendors, relying on HPC experts and early-adopted HPC solutions. The two main objectives are to provide guidance and to prepare the scientific communities for the challenges of exascale architectures. The talk will present the OpenPOWER platform purchased by GENCI and provided to the scientific community, then the first results obtained on the platform for a set of about 15 applications using all the solutions provided to users (CUDA, OpenACC, OpenMP, ...). Finally, one specific application will be presented in detail, covering its porting effort and the techniques used for GPUs with both OpenACC and OpenMP.

 
Topics:
HPC and AI, Performance Optimization, Programming Languages
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23183
Download:
Share:
 
Abstract:

Wireless VR is widely regarded as the key to maximum immersion. But why? Is it only the obvious omission of the heavy, inflexible cable? There is more behind it. Learn how the development of tracking technology goes hand in hand with the increasing demand for wireless VR hardware, what hardware is on the market now and what is coming, and how wireless solutions - whether standalone devices or add-ons - can create higher value for your VR application. We'll also look at how large-scale, location-based VR and hardware manufacturers are expanding the boundaries of the VR industry, both for entertainment and B2B.

 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23388
Download:
Share:
 
Abstract:

With over 5000 GPU-accelerated nodes, Piz Daint has been Europe's leading supercomputing system since 2013, and is currently one of the most performant and energy-efficient supercomputers on the planet. It has been designed to optimize the throughput of multiple applications, covering all aspects of the workflow, including data analysis and visualisation. We will discuss ongoing efforts to further integrate these extreme-scale compute and data services with infrastructure services of the cloud. As a PRACE Tier-0 system, Piz Daint is accessible to all scientists in Europe and worldwide, and it provides a baseline for future development of exascale computing. We will present a strategy for developing exascale computing technologies in domains such as weather and climate or materials science.

 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23429
Download:
Share:
 
Abstract:

The presentation will give an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhanced features like NVLink2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like Independent Thread Scheduling and the Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming-model enhancements and performance improvements in the latest CUDA 9 release.

 
Topics:
HPC and AI, Programming Languages, Tools & Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23434
Download:
Share:
 
Abstract:
We introduce how to optimize DL inference using TensorRT. In this session, you will walk through the process of applying TensorRT in practice and learn what to consider about performance and the inference environment during optimization. In particular, we will provide tips on topics that come up during adoption, such as TensorRT's development languages (C++/Python), low-precision support (FP16/INT8), and RNN support.
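One of the low-precision ideas above, INT8 quantization, can be illustrated with a simple symmetric per-tensor scheme. TensorRT's actual INT8 path calibrates activation ranges rather than using this simple max rule, so the NumPy sketch below (all names assumed) only shows the size/accuracy trade-off:

```python
import numpy as np

def int8_quantize(w):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|]
    onto [-127, 127] with a single scale factor."""
    scale = float(np.abs(w).max()) / 127 or 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(2)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = int8_quantize(w)

assert q.nbytes * 4 == w.nbytes                  # 4x smaller than FP32
assert float(np.abs(q * scale - w).max()) <= scale / 2  # bounded error
```

The same storage-versus-rounding-error trade-off, applied to weights and calibrated activations, is what makes INT8 inference both faster and accurate enough in practice.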
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8118
Streaming:
Download:
Share:
 
Abstract:
This talk illustrates how CUDA knowledge can help DL developers understand and tune their deep learning applications. We explain how to implement TensorFlow custom operations to utilize the GPU more efficiently when running DL workloads, especially BERT inference for SQuAD. We also deliver key insights into why the techniques introduced here achieve better performance, by examining the profiling results.
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2019
Session ID:
SKR9108
Download:
Share:
 
Abstract:
As a core pillar of next-generation data centers and the 5G edge, NVIDIA's DPU (Data Processing Unit) is built on an Arm-based SoC design, combining hardware acceleration, software programmability, and trusted security. It provides the unified, standardized DOCA SDK to offload and accelerate data center networking, storage, and security workloads.
 
Topics:
HPC and AI, Accelerated Data Science
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20020
Download:
Share:
 
Abstract:
Ampere Computing this year launched an 80-core, Arm-based cloud-native processor with leading performance and extremely high scalability. Combined with the NVIDIA GPU product line, it provides a brand-new Arm-native platform for a wide range of cloud computing applications, with Android cloud gaming as a representative example.
 
Topics:
HPC and AI, Accelerated Data Science
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20026
Download:
Share:
 
Abstract:
As computing power has become decisive in key fields, Great Wall is committed to building a complete compute-power ecosystem around Arm + NVIDIA. This talk shares Great Wall's Arm-based server products and solutions for key industries, describes the deployment of Great Wall servers in smart-city applications with NVIDIA's support, and closes with the state of the Great Wall + NVIDIA application ecosystem.
 
Topics:
HPC and AI, Accelerated Data Science
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20033
Download:
Share:
 
Abstract:
This talk introduces NVIDIA's work related to Android cloud gaming.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20038
Download:
Share:
 
Abstract:
Powerleader shares its independently developed products and ecosystem building, focusing on its Phytium-based server product line.
 
Topics:
HPC and AI, Accelerated Data Science
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20048
Download:
Share:
 
Abstract:
With China's new-infrastructure initiative in full swing, diversified computing power combining general-purpose and specialized processors has become its foundation, and the fusion of the Phytium and NVIDIA ecosystems opens up enormous possibilities. This technical exchange presents a renewed and transformed Phytium; reports the progress in qualifying Phytium CPUs and ecosystem partners' complete systems with the full range of NVIDIA products, including GeForce graphics cards, Tesla GPUs, Ethernet, and InfiniBand adapters; and, building on this ecosystem fusion, describes the application prospects and solutions in areas such as joint laboratory construction, 5G MEC, data centers, energy, transportation, finance, education, healthcare, and industrial manufacturing.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20057
Download:
Share:
 
Abstract:
Artificial intelligence (AI) and high-performance computing (HPC) are converging, and top supercomputers in China and abroad have all added AI compute capability. Neither traditional supercomputing benchmarks such as LINPACK nor current AI performance benchmarks such as MLPerf can adequately evaluate the AI compute capability of present and future intelligent supercomputers. We propose an end-to-end benchmark based on automated machine learning (AutoML), using a metric analogous to the FLOPS measure of the supercomputing field. The benchmark's automated scalability and stability have been validated on different heterogeneous platforms, and the AIPerf500 ranking of AI compute capability was launched at the second China supercomputing conference, with Peng Cheng Laboratory's Peng Cheng Cloud Brain II taking first place. In addition, Peng Cheng Laboratory operates an Arm server cluster (a developer cloud) and is actively building an Arm + GPU ecosystem; it has already implemented a cloud platform for virtualized mobile operating systems (supporting cloud gaming, mobile office work, and other services) and made progress in several scientific computing domains.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20091
Download:
Share:
 
Abstract:
This talk presents NVIDIA's progress and latest work on CUDA support for the Arm architecture.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20096
Download:
Share:
 
Abstract:
The oil and gas industry faces severe data processing challenges: traditional computing methods can no longer keep up with rapidly growing data processing demands. Paradigm's new GPU-based seismic processing solution, optimized for the V100, speeds up full-azimuth imaging and migration by a factor of 3-5, greatly reducing data processing time and accelerating the discovery of oil and gas resources.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20218
Download:
Share:
 
Abstract:
On November 17, at the 2020 global supercomputing conference, NVIDIA announced the NVIDIA A100 80GB GPU. The new A100 uses HBM2e technology to double the high-bandwidth memory of the A100 40GB GPU to 80GB, delivering over 2TB/s of memory bandwidth. Third-generation NVLink and NVSwitch double GPU-to-GPU bandwidth over the previous interconnect generation, raising GPU data transfer speeds for data-intensive workloads to 600GB/s. Data can thus be fed quickly to the A100, the world's fastest data center GPU, letting researchers accelerate their applications faster and handle the largest models and datasets. The increased high-bandwidth memory capacity also greatly helps HPC applications such as molecular dynamics, high-energy physics, and cryo-electron microscopy run at larger scales and with higher performance. This talk introduces the NVIDIA A100 80GB GPU in three parts: 1. Computing trends and challenges. 2. An introduction to the NVIDIA A100 80GB GPU. 3. An introduction to NVIDIA's end-to-end platform. Through these three parts, we hope you will gain a deeper understanding of the NVIDIA A100 80GB GPU for HPC and AI computing.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20256
Download:
Share:
 
Abstract:
This talk is a quick-start guide to the BML CodeLab development environment, a zero-barrier entry into machine learning development. You will learn: 1) The background and features of the interactive development environment CodeLab, and how to quickly install and use it. Pandas and Sklearn perform poorly on large datasets and can only process modest data volumes. CodeLab is a more usable JupyterLab that can be deployed flexibly on a developer's local machine, on IDC machines, or on managed cloud resources; it is highly optimized for performance, adds many enterprise-grade features, and scales seamlessly to a cloud cluster when single-machine resources run out. 2) The principles of the high-performance data science engine used to speed up analysis and model building. Using GPU and CPU many-core parallel acceleration and hybrid computing, very-large-data processing, and efficient data storage, it keeps data science development as simple as a single machine while rivaling the processing power of a distributed system; with the built-in high-performance engine, CodeLab outperforms the open-source product by nearly an order of magnitude. 3) The built-in, easy-to-use development plugins that improve productivity. Based on the open-source JupyterLab extension mechanism, CodeLab integrates many feature-rich, easy-to-use development tools, such as a lightweight machine learning mini-app plugin that publishes analysis and training results as high-performance applications from simple Python code, and an AI workflow plugin that orchestrates workflows and tracks experiments to speed up iteration.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20368
Download:
Share:
 
Abstract:
This talk presents the network design of Baidu's large-scale AI clusters, which use a massive number of NVIDIA GPUs to run most of Baidu's distributed AI training jobs. In building AI clusters at this scale, network design plays a crucial role. The talk details the main considerations in designing a high-performance, highly available large-scale AI cluster, covering access bandwidth, network architecture, RDMA, communication algorithms, and job scheduling.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20395
Download:
Share:
 
Abstract:
Vector search is an important building block of deep learning, and a basic component of any business application built with deep learning techniques. It is also a compute-intensive workload with high demands on computing resources. NVIDIA GPUs can dramatically speed up vector indexing and vector search. In this talk, we present how NVIDIA GPUs accelerate the Milvus vector search engine.
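The core operation being accelerated — top-k similarity search — can be sketched as a brute-force matrix-vector product. This NumPy illustration is not Milvus's API, and all names are assumptions; real engines add indexing structures on top of exactly this kernel:

```python
import numpy as np

def cosine_search(index, query, k=3):
    """Brute-force top-k cosine-similarity search over a vector index:
    normalize, take one matrix-vector product, sort."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sims = index_n @ query_n              # the compute-heavy step
    top = np.argsort(-sims)[:k]
    return top, sims[top]

rng = np.random.default_rng(3)
vectors = rng.standard_normal((1000, 64)).astype(np.float32)
# A query that is a slightly perturbed copy of entry 42.
query = vectors[42] + 0.01 * rng.standard_normal(64).astype(np.float32)

ids, scores = cosine_search(vectors, query)
assert ids[0] == 42                       # nearest neighbor is the source
```

The matrix-vector product dominates the cost and is embarrassingly parallel, which is why offloading it to GPUs pays off so well.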
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20437
Download:
Share:
 
Abstract:
Professor Hai-Bin Luo's research team at Sun Yat-sen University developed GA-FEP, a GPU-accelerated free energy perturbation method for computing absolute binding free energies, achieving a breakthrough in key drug design technology. In the fight against COVID-19, the method identified dipyridamole as the drug with the best inhibitory activity against the main protease Mpro, and multi-center clinical trials showed that the drug achieved good therapeutic results in COVID-19 patients. This GPU-accelerated method provides a domestically developed implementation of free energy perturbation (FEP) for drug design. For the first time, GA-FEP completed a high-precision screen of a repurposed-drug database within one week, predicting 25 drugs with high affinity for Mpro; subsequent in vitro activity assays confirmed 15 Mpro inhibitors, a high hit rate of active compounds. Notably, dipyridamole, the most active inhibitor, achieved good clinical outcomes against COVID-19, further validating the reliability of the GA-FEP method. GA-FEP clearly improves the accuracy of drug-target affinity prediction while also improving prediction speed: traditional FEP methods take 30-60 days per compound, which GA-FEP shortens to under one day, a 30-60x efficiency gain, raising the success rate of novel drug screening and reducing R&D time. The method can also be applied to other drug design tasks, such as scaffold hopping and de novo drug design, to improve the efficiency of lead compound discovery and optimization.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC China
Year:
2020
Session ID:
CNS20563
Download:
Share: