
GTC ON-DEMAND

Topic(s) Filter: HPC and AI
Abstract:
Demand for GPUs in High Performance Computing is only growing, and it is costly and difficult to keep pace in an entirely on-premises environment. We will hear from Schlumberger on why and how they are using cloud-based, GPU-enabled computing resources from Google Cloud to supply their users with the computing power they need, from exploration and modeling to visualization.
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91040
 
Abstract:
We will characterize the performance of multi-GPU systems in an effort to determine their viability for running physics-based applications using Fast Fourier Transforms (FFTs). Additionally, we'll discuss how multi-GPU FFTs allow available memory to exceed the limits of a single GPU and how they can reduce computation time for larger problem sizes.
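The decomposition behind a multi-GPU FFT can be sketched on the CPU with NumPy: each "device" transforms its own slab of rows, a global transpose (the inter-GPU all-to-all exchange) redistributes the data, and a second pass of 1D FFTs completes the transform. This is an illustrative sketch of the slab-decomposition pattern, not any particular library's implementation:

```python
import numpy as np

def slab_fft2(x, n_devices=4):
    """2D FFT via slab decomposition, the pattern multi-GPU FFT
    libraries use, emulated here with NumPy on the CPU.
    Each 'device' owns a contiguous slab of rows."""
    slabs = np.array_split(x, n_devices, axis=0)
    # Stage 1: each device FFTs its own rows independently.
    slabs = [np.fft.fft(s, axis=1) for s in slabs]
    # Global transpose: on a real system this is the all-to-all
    # exchange between GPUs over NVLink or InfiniBand.
    y = np.vstack(slabs).T
    # Stage 2: FFT along the other axis, now contiguous per device.
    slabs = np.array_split(y, n_devices, axis=0)
    slabs = [np.fft.fft(s, axis=1) for s in slabs]
    return np.vstack(slabs).T

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
# The two-stage result matches a direct 2D FFT.
assert np.allclose(slab_fft2(a), np.fft.fft2(a))
```

Splitting across more "devices" only changes who owns which rows, not the result, which is why aggregate multi-GPU memory can hold problems a single GPU cannot.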
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9158
 
Abstract:
Learn about the science of magnetically confined plasmas to develop the predictive capability needed for a sustainable fusion energy source. Gyrokinetic simulations are one of the most useful tools for understanding fusion science. We'll explain the CGYRO code, built by researchers at General Atomics to effectively and efficiently simulate plasma evolution over multiple scales that range from electrons to heavy ions. Fusion plasma simulations are compute- and memory-intensive and usually run on leadership-class, GPU-accelerated HPC systems like Oak Ridge National Laboratory's Titan and Summit. We'll explain how we designed and implemented CGYRO to make good use of the tens of thousands of GPUs on such systems, which provide simulations that bring us closer to fusion as an abundant clean energy source. We'll also share benchmarking results of both CPU- and GPU-based systems.
 
Topics:
HPC and AI, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9202
 
Abstract:
Learn about advanced features in MVAPICH2 that accelerate HPC and AI on modern dense GPU systems. We'll talk about how MVAPICH2 supports MPI communication from GPU memory and improves it using the CUDA toolkit for optimized performance on different GPU configurations. We'll examine recent advances in MVAPICH2 that support large message collective operations and heterogeneous clusters with GPU and non-GPU nodes. We'll explain how we use the popular OSU micro-benchmark suite, and we'll provide examples from HPC and AI to demonstrate how developers can take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We'll also provide guidance on issues like processor affinity to GPUs and networks that can significantly affect the performance of MPI applications using MVAPICH2.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9476
 
Abstract:

Learn about the current wave of advances in AI and HPC technologies to improve performance of DNN training on NVIDIA GPUs. We'll discuss exciting opportunities for HPC and AI researchers and give an overview of interesting trends in DL frameworks from an architectural/performance standpoint. Several modern DL frameworks offer ease of use and flexibility to describe, train, and deploy various types of DNN architectures. These typically use a single GPU to accelerate DNN training and inference. We're exploring approaches to parallelize training. We'll highlight challenges for message passing interface runtimes to efficiently support DNN training and discuss how efficient communication primitives in MVAPICH2 can support scalable DNN training. We'll also talk about how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.

 
Topics:
HPC and AI, Deep Learning & AI Frameworks
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9501
 
Abstract:
We'll discuss how code teams at Lawrence Livermore National Laboratory (LLNL) are porting our production applications to Sierra, LLNL's flagship NVIDIA GPU-based supercomputer. In general, our codes rely on a three-stage process of investment, porting, and performance tuning to achieve performance on NVIDIA Tesla V100 GPUs while maintaining portability to our other supported platforms. We'll explain why this process poses many challenges and how LLNL code teams have worked with the Sierra Center of Excellence to build experience and expertise in porting complex multi-physics simulation tools to NVIDIA GPU-based HPC systems. We'll also provide an overview of this porting process, the abstraction technologies employed, lessons learned, and current challenges.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9512
 
Abstract:
Tensor Cores, introduced with the Volta GPU architecture, achieve up to 125 teraflops of throughput by mixing half- and single-precision floating-point operations. We'll show how to take advantage of Tensor Cores for applications in deep learning and HPC. We will discuss how to use mixed precision to decrease memory use during deep learning training and deployment, a technique that allows for larger model sizes. We will also demonstrate programming matrix multiply-and-accumulate on Tensor Cores for HPC applications. Using NVIDIA Nsight, we will profile an application to understand its use of Tensor Cores.
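The arithmetic contract described here, FP16 multiplies with FP32 accumulation, can be illustrated on the CPU with NumPy. This is a toy comparison of accumulation strategies under assumed random data, not Tensor Core code, but it shows why single-precision accumulation matters:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((64, 64)).astype(np.float16)
b = rng.standard_normal((64, 64)).astype(np.float16)

# Tensor-Core-style contract: FP16 inputs, products and sums
# accumulated in float32 (f16*f16 is exact in f32).
prod = a[:, :, None].astype(np.float32) * b[None, :, :].astype(np.float32)
mixed = prod.sum(axis=1)

# Naive alternative: round the products to FP16 and also
# accumulate the running sum in FP16.
half = (a[:, :, None] * b[None, :, :]).sum(axis=1, dtype=np.float16)

ref = a.astype(np.float64) @ b.astype(np.float64)
err_mixed = np.abs(mixed - ref).max()
err_half = np.abs(half.astype(np.float64) - ref).max()
# FP32 accumulation keeps the error near the FP16 input
# quantization; all-FP16 accumulation compounds rounding error.
assert err_mixed < err_half
```

The same trade-off is why mixed-precision training stores weights and activations in FP16 while keeping accumulations (and typically a master weight copy) in FP32.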
 
Topics:
HPC and AI, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9542
 
Abstract:

We'll describe how Lawrence Livermore National Laboratory (LLNL) prepared a large existing application base for our recently deployed Sierra supercomputer, which is designed to harness over 17,000 NVIDIA V100 GPUs to tackle the nation's most challenging science and national security problems. We will discuss how this multi-year effort paid off with exciting possibilities for new science. We'll also explain how using GPUs in our traditional HPC platforms and workflows adds an exciting new dimension to simulation-based science, prompting LLNL to rethink how we perform future simulations as intelligent simulation. We'll give an overview of the application preparation process that led up to Sierra's deployment, as well as a look at current and planned research aimed at riding the AI and machine learning waves in pursuit of game-changing science.

 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9560
 
Abstract:
We'll introduce the fundamental concepts behind NVIDIA GPUDirect and explain how GPUDirect technologies are leveraged to scale out performance. GPUDirect technologies can provide even faster results for compute-intensive workloads, including those running on a new breed of dense, GPU-accelerated servers such as the Summit and Sierra supercomputers and the NVIDIA DGX line of servers.
 
Topics:
HPC and AI, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9653
 
Abstract:
We'll present the latest developments in the NCCL library, which provides optimized inter-GPU communication primitives to make distributed computing easy and universal. Since 2015, NCCL has enabled deep learning and HPC applications to scale to thousands of GPUs. We'll also discuss the state of NCCL's integration in deep learning frameworks.
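The workhorse primitive here is all-reduce, which NCCL commonly implements as a ring: in the reduce-scatter phase each rank passes partial sums of one chunk around the ring, then an all-gather phase circulates the fully reduced chunks, so each rank sends only 1/N of the data per step. A pure-Python sketch of that algorithm (sequential simulation, written for clarity rather than performance):

```python
def ring_allreduce(buffers):
    """Simulate ring all-reduce over n 'ranks'; returns the final
    buffer of every rank (all equal to the elementwise sum)."""
    n = len(buffers)
    chunks = [list(b) for b in buffers]        # work on copies
    per = len(chunks[0]) // n                  # assume size % n == 0

    def chunk(r, c):
        return chunks[r][c * per:(c + 1) * per]

    def set_chunk(r, c, vals):
        chunks[r][c * per:(c + 1) * per] = vals

    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s)
    # to rank r+1, which adds it into its own copy.
    for s in range(n - 1):
        for r in range(n):
            dst, c = (r + 1) % n, (r - s) % n
            set_chunk(dst, c, [x + y for x, y in zip(chunk(r, c), chunk(dst, c))])
    # Phase 2: all-gather. Each fully reduced chunk circulates the
    # ring until every rank holds the complete result.
    for s in range(n - 1):
        for r in range(n):
            dst, c = (r + 1) % n, (r + 1 - s) % n
            set_chunk(dst, c, chunk(r, c))
    return chunks

bufs = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [5, 6, 7, 8]]
total = [sum(col) for col in zip(*bufs)]
assert all(c == total for c in ring_allreduce(bufs))
```

Each rank sends 2*(N-1)/N of its buffer in total, independent of N, which is what makes the ring bandwidth-optimal for large messages.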
 
Topics:
HPC and AI, Deep Learning & AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9656
 
Abstract:
We'll discuss NVSHMEM, a PGAS library that implements the OpenSHMEM specification for communication across NVIDIA GPUs connected by different types of interconnects, including PCIe, NVLink, and InfiniBand. NVSHMEM makes it possible to initiate communication from within a CUDA kernel, so CUDA kernel boundaries are not forced on an application by its communication requirements. Less synchronization on the CPU improves strong-scaling efficiency, and the ability to initiate fine-grained communication from inside a CUDA kernel helps achieve better overlap of communication with computation. QUDA is a popular GPU-enabled QCD library used by packages like Chroma and MILC, and NVSHMEM enables better strong scaling in QUDA. NVSHMEM not only benefits latency-bound applications like QUDA; it can also help improve performance and reduce complexity in bandwidth-bound codes like FFT and in codes with dynamic communication patterns like breadth-first search.
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9677
 
Abstract:
We'll discuss our work using neural networks to fit the interatomic potential function and describe how we tested the network's potential function in atomic simulation software. This method has a lower computational cost than traditional density functional theory methods. We'll show how our work is applicable to different atom types and architectures and how it avoids relying on a physical model: instead, it uses a purely mathematical representation, which reduces the need for human intervention.
 
Topics:
HPC and AI, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9843
 
Abstract:
The conventional trial-and-error development approach to materials science is time-consuming and expensive. More efficient in silico techniques based on simulations or machine learning have emerged during the past two decades. We'll talk about recent trends and solutions for accelerating materials discovery and discuss future prospects.
 
Topics:
HPC and AI, Industrial Inspection
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9967
 
Abstract:
SK hynix began developing HBM (High Bandwidth Memory) technology in 2011, when it became evident that memory density and bandwidth scaling are critical for next-generation architectures. HBM is now widely adopted across applications, and it will lead future memory trends owing to the growth of AI, ML, and HPC workloads. We will give a technical overview of HBM technology and discuss future HBM trends.
 
Topics:
HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9978
 
Abstract:
We'll talk about how the huge computing advances made in AI by the deep learning revolution of the last five years have pushed legacy hardware to its limits, with CPUs now running workloads they were not tailored for. This comes at a time when Moore's Law is tapering off and single-threaded performance gains are slowing, requiring a new compute paradigm: accelerated computing, powered by massively parallel GPUs.
 
Topics:
HPC and AI, AI & Deep Learning Research
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9981
 
Abstract:
We'll discuss the challenges uncovered in AI and deep learning workloads, examine the most efficient approaches to handling data, and look at use cases in autonomous vehicles, retail, health care, finance, and other markets. Our talk will cover the complete requirements of the data life cycle, including initial acquisition, processing, inference, long-term storage, and driving data back into the field to sustain ever-growing processes of improvement. As the data landscape evolves with emerging requirements, the relationship between compute and data is undergoing a fundamental transition. We will provide examples of production data life cycles driving diverse architectures, from turnkey reference systems with DGX and DDN A3I to tailor-made solutions.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9983
 
Abstract:

Modern AI has been enabled by GPU-accelerated deep learning. We are now entering the realm of ever more complex deep learning tasks, involving complicated algorithms, deeper and more sophisticated network layers, and rapidly growing data sets, for which a handful of GPUs is proving insufficient. By designing and building large-scale HPC machines with extensive GPU-based vector/tensor processing capabilities, such as Tsubame3, ABCI, and Post-K, and by designing new scalable learning algorithms, we are overcoming these challenges. In particular, the ABCI grand challenge enabled three research groups, including ours at Tokyo Tech, to scale ImageNet training to over 4,000 GPUs, with training times measured in minutes. This paves the way for an era in which AI is as scalable as traditional HPC.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1807
 
Abstract:

For job allocation decisions, current batch schedulers have access to and use only information on the number of nodes and the runtime, because that is readily available at submission time from user job scripts. User-provided runtimes are typically inaccurate because users overestimate or lack understanding of job resource requirements. Beyond the number of nodes and runtime, other system resources, including IO and network, are not available but play a key role in system performance. In this talk, we tackle the need for automatic, general, and scalable tools that provide accurate resource usage information to schedulers with our tool for Predicting Runtime and IO using Neural Networks and GPUs (PRIONN). PRIONN automates prediction of per-job runtime and IO resource usage, enabling IO-aware scheduling on HPC systems. The novelty of our tool is the input of whole job scripts into deep learning models, which allows complete automation of runtime and IO resource predictions.
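The whole-script-to-prediction idea can be illustrated with a toy model: featurize the raw job-script text and regress runtime from it. Everything below (the character-frequency feature map, the scripts, the runtimes) is synthetic and purely illustrative; PRIONN itself feeds whole scripts into deep networks, not least-squares regression:

```python
import numpy as np

def featurize(script, dim=64):
    """Hypothetical stand-in for a learned text encoder:
    normalized character-frequency histogram of the raw script."""
    v = np.zeros(dim)
    for ch in script:
        v[ord(ch) % dim] += 1.0
    return v / max(len(script), 1)

# Synthetic job scripts and runtimes (seconds), for illustration only.
scripts = [
    "#SBATCH -N 4\n#SBATCH -t 02:00:00\nsrun ./lammps in.melt",
    "#SBATCH -N 1\n#SBATCH -t 00:10:00\npython postprocess.py",
    "#SBATCH -N 16\n#SBATCH -t 12:00:00\nsrun ./cgyro big.cfg",
    "#SBATCH -N 2\n#SBATCH -t 01:00:00\nsrun ./gromacs md.tpr",
]
runtimes = np.array([7000.0, 450.0, 41000.0, 3200.0])

# Fit a linear model from script features to runtime.
X = np.stack([featurize(s) for s in scripts])
w, *_ = np.linalg.lstsq(X, runtimes, rcond=None)
pred = X @ w
```

The point of the sketch is the pipeline shape (raw script in, resource prediction out) that makes the approach fully automatic: no hand-built features from the scheduler are needed.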

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1810
 
Abstract:

This talk will focus on the latest developments in the MVAPICH2-GDR MPI library, which helps HPC and deep learning applications exploit maximum performance and scalability on GPU clusters. For HPC applications, we will highlight multiple designs focusing on GPUDirect RDMA (GDR), managed and unified memory support, datatype processing, and support for OpenPOWER and NVLink. We will also present novel designs and enhancements to the MPI library that boost the performance and scalability of deep learning frameworks on GPU clusters, and we'll highlight container-based solutions for GPU-based cloud environments.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1812
 
Abstract:

AI methods and tools are starting to be applied to HPC applications by a growing number of brave researchers in diverse scientific fields. This talk will describe an emergent workflow that uses traditional HPC numeric simulations to generate the labeled data sets required to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow for HPC applications and system architecture in a post-Moore's Law world will be considered.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1814
 
Abstract:

The Perlmutter machine will be delivered to NERSC/LBNL in 2020 and will contain a mixture of CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk, we will describe the analysis we performed to optimize this design to meet the needs of the broad NERSC workload. We will also discuss our application readiness program, the NERSC Exascale Science Applications Program (NESAP), in which we will work with our users to optimize their applications to maximize their performance on GPUs in Perlmutter.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1816
 
Abstract:

Rapid progress in atmospheric science has been fueled in part over the years by faster computers. However, progress has slowed over the last decade due to three factors: the plateauing of core speeds, the increasing complexity of atmospheric models, and the mushrooming of data volumes. Our team at the National Center for Atmospheric Research is pursuing a hybrid approach to surmounting these barriers that combines machine learning techniques and GPU acceleration to produce, we hope, a new generation of ultra-fast models with enhanced fidelity to nature and increased value to society.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1818
 
Abstract:

The recent success of deep learning has been driven by the ability to combine significant GPU resources with extremely large labeled datasets. However, many labels are extremely expensive to obtain, and for some, such as a specific astronomical event or scientific experiment, more than one example may be impossible to obtain. By combining vast amounts of labeled surrogate data with advanced few-shot learning, we have demonstrated success in leveraging small data in deep learning. In this talk, we will discuss these exciting results and explore the scientific innovations that made this possible.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1822
 
Abstract:

Pacific Northwest National Laboratory's scientific mission spans energy, molecular science, and national security. Under the Deep Learning for Scientific Discovery Initiative, PNNL has invested in integrating advanced machine learning with traditional scientific methods to push the state of the art in many disciplines. We will provide an overview of some of the thirty projects we have stewarded, demonstrating how we have leveraged computing and analytics in fields as diverse as ultrasensitive detection, metabolomics, and atmospheric science.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1823
 
Abstract:
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
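Because RAPIDS cuDF deliberately mirrors the pandas API, a pipeline written against pandas typically ports by swapping the import. A sketch of that drop-in pattern; the try/except fallback is a common idiom, shown here so the example also runs on machines without a GPU:

```python
# cuDF (RAPIDS) mirrors the pandas DataFrame API, so the same
# pipeline runs on GPU or CPU depending on which library loads.
try:
    import cudf as df_lib      # GPU DataFrames (RAPIDS); needs a GPU
except ImportError:
    import pandas as df_lib    # CPU fallback with the same API surface

df = df_lib.DataFrame({
    "sensor": ["a", "a", "b"],
    "reading": [1.0, 3.0, 5.0],
})
# Identical groupby/aggregate code either way.
means = df.groupby("sensor").reading.mean()
```

On a GPU system the only change relative to a pandas script is the import line, which is the "user-friendly Python interfaces" point the abstract makes.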
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1833
 
Abstract:

We have developed an HPC ML training algorithm that can reduce training time on petabytes of data from days and weeks to minutes. Using the same research, we can now conduct inferencing on completely encrypted data. We have built a distributed ML framework on commodity Azure VMs that scales to tens of terabytes and thousands of cores, while achieving better accuracy than the state of the art.

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1842
 
Abstract:

PSC's "Bridges" was the first system to successfully converge HPC, AI, and Big Data. Designed for the U.S. national research community and supported by NSF, it now serves approximately 1600 projects and 7500 users at over 350 institutions. Bridges emphasizes "nontraditional" uses that span the life, physical, and social sciences, engineering, and business, many of which are based on AI or AI-enabled simulation. We describe the characteristics of Bridges that have made it a success, and we highlight several inspirational results and how they benefited from the system architecture. We then introduce "Bridges AI", a powerful new addition for balanced AI capability and capacity that includes NVIDIA's DGX-2 and HPE NVLink-connected 8-way Volta servers. 

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1832
 
Abstract:
Containers simplify application deployments in data centers by wrapping applications in an isolated virtual environment. By including all application dependencies, such as binaries and libraries, application containers run seamlessly in any data center environment. The HPC application containers available on NVIDIA GPU Cloud (NGC) dramatically improve ease of application deployment while delivering optimized performance. However, if the desired application is not available on NGC, building HPC containers from scratch trades one set of challenges for another: parts of the software environment typically provided by the HPC data center must be redeployed inside the container. For those used to just loading the relevant environment modules, installing a compiler, MPI library, CUDA, and other core HPC components from scratch may be daunting. HPC Container Maker (HPCCM) [1] is an open-source project that addresses the challenges of creating HPC application containers. HPCCM encapsulates the best practices of deploying core HPC components into modular building blocks that follow container best practices, reducing container development effort, minimizing image size, and taking advantage of image layering. HPCCM makes it easier to create HPC application containers by separating the choice of what should go into a container image from the specification details of how to configure, build, and install a component. This separation also enables the best practices of HPC component deployment to evolve transparently over time.
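The building-block idea can be sketched in plain Python: each block encapsulates how to install one HPC component, and a recipe composes blocks into a container specification, so the "what goes in the image" choice is separated from the "how each piece is installed" details. The helper names below are hypothetical stand-ins for illustration, not HPCCM's actual API:

```python
# Hypothetical mimic of the building-block pattern HPCCM uses:
# each function returns the container-spec lines for one component.

def baseimage(image):
    return [f"FROM {image}"]

def gnu_compilers():
    return ["RUN apt-get update && apt-get install -y gcc g++ gfortran"]

def openmpi(version):
    # A real block encapsulates download/configure/build/cleanup
    # best practices; elided here to one representative line.
    return [f"RUN build-and-install-openmpi {version}"]

def recipe(blocks):
    """Compose building blocks into a single container file."""
    return "\n".join(line for block in blocks for line in block)

containerfile = recipe([
    baseimage("nvidia/cuda:10.0-devel"),
    gnu_compilers(),
    openmpi("3.1.2"),
])
print(containerfile)
```

Because each block owns its own install logic, improving (say) the MPI build procedure updates every recipe that uses the block, which is the "best practices evolve transparently" point above.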
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80022
 
Abstract:
In the new era of AI, enterprises have an excellent opportunity to achieve innovation and leadership. In practice, however, AI initiatives often stall because of the complexity of scaling infrastructure. In this session, we will share how the new scaling capabilities of NVIDIA DGX systems, paired with Pure Storage FlashBlade flash storage, can deliver insights within hours and enable AI at enterprise scale.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8029
 
Abstract:
We'll highlight IBM POWER9 system and NVIDIA Volta GPU characteristics such as compute, memory, and NVLink capabilities. We'll also take the audience through HPC application performance observations and tuning. IBM POWER9 with NVIDIA Volta is a state-of-the-art system designed for HPC and cognitive computing. It introduces NVLink 2.0 high-speed connectivity between CPU and GPU, along with coherent device memory. System characteristics such as CPU and GPU compute and memory throughput, NVLink latency, and bandwidth play key roles in application performance. We'll demonstrate how each of these influences application performance through a case study.
 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8309
 
Abstract:
The Center for Accelerated Application Readiness within the Oak Ridge Leadership Computing Facility is a program to prepare scientific applications for next-generation supercomputer architectures. The program currently consists of thirteen domain science application development projects focused on preparing codes for efficient use on Summit. Over the last three years, these teams have developed and executed a development plan based on detailed information about Summit's architecture and system software stack. This presentation will highlight the progress made by the teams, which have used Titan, the 27 PF Cray XK7 with NVIDIA K20X GPUs; SummitDev, an early-access IBM POWER8+ system with NVIDIA P100 GPUs; and, very recently, Summit, OLCF's new IBM POWER9 system with NVIDIA V100 GPUs. The program covers a wide range of domain sciences, with applications including ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, and XGC.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8908
Streaming:
Download:
Share:
 
Abstract:
Python is a programming language with increasing adoption in the development community due to its fast learning curve, flexibility, and ease of use and integration with other technologies. Because of its level of abstraction, the same Python code can run on different platforms such as x86, RISC, and ARM. The Python development community is growing fast, and many of its members are interested in moving to GPU-accelerated programming but don't know where to start or what is needed. We'll go through the steps and adoption path for developing Python solutions that take advantage of GPU acceleration, including details, advantages, and challenges of the strongest and most popular Python 3 modules for use with GPUs: scikit-cuda, PyCUDA, Numba, cudamat, and CuPy. We'll also show code samples and program execution statistics as a performance-analysis exercise.
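As a taste of that adoption path, here is a minimal sketch (ours, not from the talk) of the pattern that makes the move gentle: GPU array libraries such as CuPy deliberately mirror the NumPy API, so code written against an injected array module runs unchanged on CPU or GPU. It assumes NumPy is installed; `saxpy` and the `xp` parameter are illustrative names.

```python
import numpy as np

def saxpy(xp, a, x, y):
    """Compute a*x + y with whichever array module xp provides (numpy or cupy)."""
    return a * xp.asarray(x) + xp.asarray(y)

# CPU today, via NumPy:
out = saxpy(np, 2.0, [1.0, 2.0], [10.0, 20.0])
print(out.tolist())  # [12.0, 24.0]
# GPU later: `import cupy as cp` and call saxpy(cp, ...) unchanged.
```

The same dependency-injection idea is how much real-world code migrates: start on NumPy, validate, then swap in the GPU module.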
 
Topics:
HPC and AI, Tools & Libraries, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8214
Streaming:
Download:
Share:
 
Abstract:
We'll give an overview of the many GPU hardware platforms designed for today's taxing AI/machine learning and HPC workloads, including custom solutions targeted at deep learning inference and deep learning training. The talk will cover systems based on PCIe GPUs as well as GPU systems with the NVLink interface.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8999
Streaming:
Share:
 
Abstract:
Multiphysics and multiscale simulations are found in a variety of computational science subfields, but their disparate computational characteristics can make GPU implementations complex and often difficult. Simulations of supernovae are ideal examples of this complexity. We use the scalable FLASH code to model these astrophysical cataclysms, incorporating hydrodynamics, thermonuclear kinetics, and self-gravity across considerable spans in space and time. Using OpenACC and GPU-enabled libraries coupled to new NVIDIA GPU hardware capabilities, we have improved the physical fidelity of these simulations by increasing the number of evolved nuclear species by more than an order of magnitude. I will discuss these and other performance improvements to the FLASH code on the Summit supercomputer at Oak Ridge National Laboratory.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8926
Streaming:
Share:
 
Abstract:
OpenMP has a long history on shared memory, CPU-based machines, but has recently begun to support offloading to GPUs and other parallel accelerators. This talk will discuss the current state of compilers for OpenMP on NVIDIA GPUs, showing results and best practices from real applications. Developers interested in writing OpenMP codes for GPUs will learn how best to achieve good performance and portability.
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8344
Streaming:
Share:
 
Abstract:
Computer simulations offer great insight into complex dynamical systems, but navigating a large set of control/design parameters can be difficult. Deep learning methods, applied on fast GPUs, can provide an ideal way to improve scientific and engineering workflows. In this talk, Vic will discuss an application of machine learning to develop a fast-running surrogate model that captures the dynamics of an industrial multiphase fluid flow. He will also discuss an improved population search method that can help the analyst explore a high-dimensional parameter space to optimize production while reducing model uncertainty.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8828
Streaming:
Download:
Share:
 
Abstract:
We present Unicorn, a novel parallel programming model for GPU clusters. It shows that distributed shared memory systems can be efficient with the help of transactional semantics and deferred data synchronizations, and thus the simplicity of distributed shared memory can be carried over to CPUs and GPUs in a cluster. Unicorn is designed for easy programmability and provides a deterministic execution environment. Device, node, and cluster management are completely handled by the runtime, and no related API is exposed to the application programmer. Load balancing, scheduling, and scalability are also fully transparent to the application code. Programs written on one cluster can be run verbatim on a different cluster. Application code is agnostic to data placement within the cluster as well as to changes in network interfaces and data availability patterns. Unicorn's programming model, being deterministic, by design eliminates several data races and deadlocks. Unicorn's runtime employs several data optimizations, including prefetching and subtask streaming, in order to overlap communication and computation. Unicorn employs pipelining at two levels: first to hide data transfer costs among cluster nodes, and second to hide transfer latency between CPUs and GPUs on all nodes. Among other optimizations, Unicorn's work-stealing-based scheduler employs a two-level victim selection technique to reduce the overhead of steal operations. Further, it employs a proactive, aggressive stealing mechanism to prevent these pipelines from stalling during a steal operation. We will showcase the scalability and performance of Unicorn on several scientific workloads. We will also demonstrate the load balancing achieved in some of these experiments and the amount of time the runtime spends in communication.
We find that parallelizing coarse-grained applications like matrix multiplication or 2D FFT using our system requires only about 30 lines of C code to set up the runtime. The rest of the application code is a regular single-CPU/GPU implementation. This indicates the ease of extending sequential code to a parallel environment. We will show the efficiency of our abstraction, with minimal loss of performance, on the latest GPU architectures like Pascal and Volta. We will also compare our approach to other similar implementations like StarPU-MPI and G-Charm.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8565
Streaming:
Download:
Share:
 
Abstract:
In this session we present MPI collective algorithms optimized for distributed deep learning frameworks. The performance of large-message MPI collectives such as broadcast, allreduce, and reduce is critical to the performance of these workloads. There is a need for a novel approach to the design of large-scale collective communication algorithms for CUDA-aware MPI runtimes. The session will deep-dive into our implementation of these collectives and its performance advantages on IBM POWER9 systems with NVIDIA V100 GPUs for the OSU benchmarks and distributed TensorFlow.
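To make the algorithmic side concrete, here is a toy, single-process sketch (ours, not the session's implementation) of the ring allreduce commonly used for large messages: a reduce-scatter phase followed by an allgather phase around a ring of ranks. Real implementations do this with MPI or NCCL over GPU buffers; here the "ranks" are just Python lists.

```python
def ring_allreduce(vectors):
    p = len(vectors)                                   # number of simulated ranks
    n = len(vectors[0])
    bounds = [(c * n // p, (c + 1) * n // p) for c in range(p)]
    data = [list(v) for v in vectors]                  # per-rank buffers

    # Phase 1, reduce-scatter: in p-1 steps each chunk travels around the ring
    # accumulating partial sums; afterwards rank r holds reduced chunk (r+1) % p.
    for s in range(p - 1):
        msgs = []
        for r in range(p):                             # rank r sends chunk (r - s) % p
            c = (r - s) % p
            lo, hi = bounds[c]
            msgs.append((c, data[r][lo:hi]))
        for r in range(p):                             # receive from left neighbor, add
            c, payload = msgs[(r - 1) % p]
            lo, hi = bounds[c]
            for i, val in enumerate(payload):
                data[r][lo + i] += val

    # Phase 2, allgather: circulate the reduced chunks so every rank gets them all.
    for s in range(p - 1):
        msgs = []
        for r in range(p):                             # rank r sends chunk (r + 1 - s) % p
            c = (r + 1 - s) % p
            lo, hi = bounds[c]
            msgs.append((c, data[r][lo:hi]))
        for r in range(p):                             # receive from left neighbor, overwrite
            c, payload = msgs[(r - 1) % p]
            lo, hi = bounds[c]
            data[r][lo:hi] = payload
    return data

result = ring_allreduce([[1, 2, 3, 4], [5, 6, 7, 8]])
print(result[0])  # [6, 8, 10, 12]: every rank ends with the elementwise sum
```

The appeal for large messages is that each rank sends 2(p-1)/p of the vector in total, regardless of p, which is why ring-style algorithms dominate deep learning allreduce.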
 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8306
Streaming:
Download:
Share:
 
Abstract:
Simulation and analysis of flow and combustion processes in propulsion and power systems presents many new and interesting challenges. A multitude of fluid dynamic, thermodynamic, transport, chemical, multiphase, and heat transfer processes are intrinsically coupled and must be considered simultaneously in complex domains associated with devices such as gas-turbine and rocket engines. The problem is compounded by the effects of turbulence and high-pressure phenomena, which require treatment of nonideal fluid mixtures at supercritical conditions. The combination of complex multicomponent property evaluations along with the computational grid resolution requirements makes these simulations expensive and cumbersome. Recent advances in high performance computing (HPC) systems, such as graphics processing unit (GPU) based architectures, provide an opportunity for significant advances in dealing with these complexities while reducing the time to solution.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8910
Streaming:
Share:
 
Abstract:
We propose a new framework for parallelizing deep neural network training that maximizes the amount of data ingested by the training algorithm. Our proposed framework, called Livermore Tournament Fast Batch Learning (LTFB), targets large-scale data problems. The LTFB approach creates a set of deep neural network (DNN) models and trains each instance of these models independently and in parallel. Periodically, each model selects another model to pair with, exchanges models, and then runs a local tournament against held-out tournament datasets. The winning model continues training on the local training dataset. This new approach maximizes computation and minimizes the amount of synchronization required in training deep neural networks, a major bottleneck in existing synchronous deep learning algorithms. We evaluate our proposed algorithm on two HPC machines at Lawrence Livermore National Laboratory, including an early-access IBM Power8+ machine with NVIDIA Tesla P100 GPUs. Experimental evaluations of the LTFB framework on two popular image classification benchmarks, CIFAR10 and ImageNet, show significant speedup compared to the sequential baseline.
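The tournament scheme can be sketched in miniature. The toy objective, names, and pairing below are ours, standing in for real DNN training and held-out evaluation: each "model" is a number nudged toward a target by a noisy local step, and the tournament keeps whichever partner scores better on the held-out loss.

```python
import random

def ltfb_round(models, train_step, score, pairs):
    """One LTFB-style round: independent training, then pairwise tournaments."""
    models = [train_step(m) for m in models]           # independent local "training"
    winners = list(models)
    for a, b in pairs(len(models)):                    # exchange partners, run tournament
        best = min(models[a], models[b], key=score)    # lower held-out score wins
        winners[a] = winners[b] = best
    return winners

random.seed(0)
target = 3.0                                           # toy "ground truth"
score = lambda m: abs(m - target)                      # held-out tournament loss
step = lambda m: m + 0.5 * (target - m) + random.uniform(-0.1, 0.1)
pairs = lambda n: [(i, i + 1) for i in range(0, n - 1, 2)]

models = [0.0, 10.0, -5.0, 7.0]
for _ in range(8):
    models = ltfb_round(models, step, score, pairs)
print(max(score(m) for m in models) < 0.5)  # True: all surviving models near the target
```

Note that the only synchronization is the pairwise exchange; no global barrier or parameter server is needed, which is the property the abstract emphasizes.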
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8829
Streaming:
Share:
 
Abstract:
We will discuss challenges and lessons learned from deploying multiple large-scale HPC and AI clusters in different industries. Lessons learned will focus on end-to-end aspects of designing and deploying large-scale GPU clusters, including datacenter and environmental challenges; network performance and optimization; data pipeline and storage challenges; and workload orchestration and optimization. You will learn more about open architectures for HPC, AI, and deep learning that combine flexible compute architectures, rack-scale platforms, and software-defined networking and storage to provide a scalable software-defined deep learning environment. We will discuss strategies, providing insight into everything from specialty compute for training vs. inference, to high-performance storage for data workflows, to orchestration and workflow-management tools. We will also discuss deploying deep learning environments from development to production at scale, from private cloud to public cloud.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8972
Streaming:
Download:
Share:
 
Abstract:

Hear about the latest developments concerning the NVIDIA GPUDirect family of technologies, which are aimed at improving both the data and the control path among GPUs in combination with third-party devices. We'll introduce the fundamental concepts behind GPUDirect and present the latest developments, such as changes to the pre-existing APIs and the newly introduced APIs. We'll also discuss the expected performance in combination with the new computing platforms that emerged last year.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8474
Download:
Share:
 
Abstract:
Learn how the Julia programming language can be used for GPU programming, both for (1) low-level kernel programming and (2) high-level array and AI libraries. This full-stack support drastically simplifies code bases, and GPU programmers can take advantage of all of Julia's most powerful features: generic programming, n-dimensional kernels, higher-order functions, and custom numeric types. We'll give an overview of the compiler's implementation and performance characteristics via the Rodinia benchmark suite. We'll show how these techniques enable highly flexible AI libraries with state-of-the-art performance, and allow a major government user to run highly computational threat modelling on terabytes of data in real time.
 
Topics:
HPC and AI, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8269
Download:
Share:
 
Abstract:
Participants will take part in in-depth discussions of the revolutionary HBM (High Bandwidth Memory) product, its distinguishing technical features, and the role it plays in expanding the boundaries of the AI revolution. The session will also cover current technical and business challenges and future considerations for the next-generation HBM line-up.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8949
Streaming:
Download:
Share:
 
Abstract:
Scientific simulations typically store just a small fraction of their computed timesteps (as few as one in 500) due to I/O and storage limitations. Previous work has demonstrated in situ software-based compression, but at the cost of compute cycles that simulation scientists are loath to sacrifice. We propose the use of the special-purpose video processing unit (VPU), currently unutilized in the HPC context, for inexpensive lossy encoding. Our work demonstrates that video encoding quality is suitable for volumes and recommends encoder settings. We'll show that data can be encoded in parallel with a hybrid CPU/GPU computation with minimal impact on run time. We'll also demonstrate that decoding is fast enough for on-the-fly decompression during analysis.
 
Topics:
HPC and AI, Telecoms and Accelerated Computing, In-Situ & Scientific Visualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8561
Streaming:
Download:
Share:
 
Abstract:
There is a huge opportunity for businesses to use advanced AI methods to extract insights from their data faster. Imagine training your models in minutes or hours rather than days or weeks. Think how much more money you can make by getting algorithms to market faster and getting the most productivity out of your researchers. In this session, Greg Schmidt introduces the new HPE Apollo 6500 Gen10 System with NVLink for the enterprise. This innovative system design allows a high degree of flexibility, with a range of configuration and topology options to match your workloads. Learn how the Apollo 6500 unlocks business value from your data for AI.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8969
Streaming:
Share:
 
Abstract:
HPC centers have traditionally been configured for simulation workloads, but deep learning is increasingly applied alongside simulation on scientific datasets. These frameworks do not always fit well with job schedulers, large parallel file systems, and MPI backends. We'll discuss examples of how deep learning workflows are being deployed on next-generation systems at the Oak Ridge Leadership Computing Facility. We'll share benchmarks comparing natively compiled frameworks with containers on Power systems like Summit, as well as best practices for deploying learning frameworks and models on HPC resources for scientific workflows.
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8551
Streaming:
Download:
Share:
 
Abstract:
Addressing the apparent Amdahl's fraction of synchronizing with the CPU for communication is critical for strong scaling of applications on GPU clusters. GPUs are designed to maximize throughput and have enough state and parallelism to hide long latencies to global memory. It's important to take advantage of these inherent capabilities of the GPU and the CUDA programming model when tackling communication between GPUs. NVSHMEM provides a Partitioned Global Address Space (PGAS) that spans memory across GPUs, along with an API for fine-grained GPU-GPU data movement and synchronization from within a CUDA kernel. NVSHMEM also provides a CPU-side API for GPU-GPU data movement that gives applications a migration path to NVSHMEM. CPU-side communication can be issued in stream order, similar to CUDA operations. NVSHMEM implements the OpenSHMEM programming model, which is of great interest to government agencies and national labs. We'll give an overview of the capabilities, API, and semantics of NVSHMEM, and use examples from a varied set of applications (HPGMG, Multi-GPU Transpose, Graph500, etc.) to demonstrate its use and benefits.
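The PGAS semantics can be illustrated with a toy, single-process model (ours; real NVSHMEM is a C/CUDA API operating on GPU memory): every processing element (PE) allocates the same "symmetric" buffer, and any PE can put data directly into, or get data directly from, another PE's copy without that PE participating.

```python
class SymmetricHeap:
    """Toy stand-in for an OpenSHMEM/NVSHMEM-style symmetric heap."""
    def __init__(self, n_pes, size):
        self.mem = [[0] * size for _ in range(n_pes)]  # one identical buffer per PE

    def put(self, dest_pe, offset, values):
        """One-sided write into dest_pe's symmetric buffer (like shmem_put)."""
        self.mem[dest_pe][offset:offset + len(values)] = values

    def get(self, src_pe, offset, count):
        """One-sided read from src_pe's symmetric buffer (like shmem_get)."""
        return self.mem[src_pe][offset:offset + count]

heap = SymmetricHeap(n_pes=4, size=8)
heap.put(dest_pe=2, offset=0, values=[7, 7])   # PE 0 writes into PE 2's memory
print(heap.get(src_pe=2, offset=0, count=2))   # [7, 7]
```

The one-sided nature is the point: in NVSHMEM these puts and gets are issued from inside CUDA kernels, so communication never has to bounce through the CPU.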
 
Topics:
HPC and AI, Tools & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8595
Streaming:
Download:
Share:
 
Abstract:

This session presents an overview of the hardware and software architecture of the DGX-2 platform. The talk will discuss the NVSwitch hardware that enables all 16 GPUs on the DGX-2 to achieve 24x the bandwidth of two DGX-1V systems. CUDA developers will learn ways to utilize the full GPU connectivity to quickly build complex applications and to use the high-bandwidth NVLink connections to scale up performance.
 
Topics:
HPC and AI, Data Center & Cloud Infrastructure, AI & Deep Learning Business Track (High Level), HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8688
Streaming:
Download:
Share:
 
Abstract:
XGC is a kinetic whole-volume modeling code with unique capabilities to study tokamak edge plasmas in real geometry and answer important questions about the design of ITER and other future fusion reactors. The main technique is the Particle-in-Cell method, which models the plasma as billions of quasiparticles representing ions and electrons. Ostensibly, the process of advancing each particle in time is embarrassingly parallel. However, the electric and magnetic fields must be known in order to push each particle, which requires an implicit gather operation from XGC's sophisticated unstructured mesh. In this session, we'll show how careful mapping of field and particle data structures to GPU memory allowed us to decouple the performance of the critical electron push routine from the size of the simulation mesh and allowed the true particle parallelism to dominate. This improvement enables performant, high-resolution, ITER-scale simulations on Summit.
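The gather-then-push pattern described above can be sketched as follows (a toy 1D model of ours, not XGC code): each particle locates its mesh cell, gathers the field there, and is then advanced independently, which is the embarrassingly parallel part.

```python
def push_particles(positions, velocities, field, cell_size, dt):
    """Advance each particle one step: gather the field at its cell, then push."""
    new_p, new_v = [], []
    for x, v in zip(positions, velocities):
        cell = min(int(x / cell_size), len(field) - 1)  # gather: locate mesh cell
        e = field[cell]                                  # field value at the particle
        v = v + e * dt                                   # accelerate
        new_p.append(x + v * dt)                         # move
        new_v.append(v)
    return new_p, new_v

# Two particles in opposite-signed field cells drift apart.
p, v = push_particles([0.5, 1.5], [0.0, 0.0], field=[1.0, -1.0], cell_size=1.0, dt=0.1)
print(p, v)
```

On a GPU, the per-particle loop body becomes one thread; the gather (`field[cell]`) is the memory-irregular part whose layout the abstract says had to be mapped carefully.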
 
Topics:
HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8909
Streaming:
Download:
Share:
 
Abstract:

Do you need to compute larger problems, or compute faster, than a single GPU allows? Then come to this session and learn how to scale your application to multiple GPUs. You will learn how to use the different available multi-GPU programming models and what their individual advantages are. All programming models will be introduced using the same example, which applies a domain decomposition strategy.
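The domain decomposition strategy can be sketched without any GPU at all. Below, a 1D Jacobi smoother is split across two simulated devices with a halo exchange each iteration (all names are ours); real multi-GPU code would place one subdomain per GPU and exchange halos with peer-to-peer copies or MPI.

```python
def smooth(u):
    """One Jacobi step; endpoints are held fixed (Dirichlet boundaries)."""
    return [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]

def jacobi_decomposed(grid, iters):
    """Smooth `grid` split across two subdomains, each padded with one halo cell."""
    cut = len(grid) // 2
    left = grid[:cut + 1]        # owns cells [0, cut); last entry is a halo
    right = grid[cut - 1:]       # owns cells [cut, n); first entry is a halo
    for _ in range(iters):
        left[-1], right[0] = right[1], left[-2]   # halo exchange between "devices"
        left, right = smooth(left), smooth(right)
    return left[:-1] + right[1:]                  # drop halos and stitch back together

grid = [0.0] * 7 + [1.0]
a = jacobi_decomposed(grid, 10)

# Reference: smoothing the undecomposed grid gives the identical answer.
b = grid
for _ in range(10):
    b = smooth(b)
print(a == b)  # True
```

The same structure carries over to every model the session compares: only the mechanism of the halo exchange changes (cudaMemcpyPeer, MPI, NCCL, ...), not the decomposition itself.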
 
Topics:
HPC and AI, Programming Languages, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23031
Download:
Share:
 
Abstract:

Murex has been an early adopter of GPUs for pricing and risk management of complex financial options. GPU adoption has boosted the performance of its software while reducing its usage cost. Each new generation of GPU has also shown how important it is to reshape the architecture of the software around its GPU-accelerated analytics. Minsky, featuring far better GPU memory bandwidth and GPU-CPU interconnect, raises the bar even further. Murex will show how it has handled this new challenge for its business.
 
Topics:
HPC and AI, Performance Optimization
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23209
Download:
Share:
 
Abstract:

HPE Deep Learning solutions empower innovation at any scale, building on our purpose-built HPC systems and technologies, applications, and support services. Deep learning demands massive amounts of computational power, which usually involves heterogeneous computing resources, e.g., GPUs and InfiniBand as installed on HPE Apollo. NovuMind's NovuForce system leverages state-of-the-art technologies to make the deployment and configuration procedure fast and smooth. The NovuForce deep learning software within the Docker image has been optimized for the latest technology like the NVIDIA Pascal GPU and InfiniBand GPUDirect RDMA. This flexibility of the software, combined with the broad range of GPU servers in the HPE portfolio, makes it one of the most efficient and scalable solutions.
 
Topics:
HPC and AI, Performance Optimization, Tools & Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23250
Download:
Share:
 
Abstract:

Discover how we designed and optimized a highly scalable dense solver for Maxwell's equations on our GPU-powered supercomputer. After describing our industrial application and its heavy computation requirements, we detail how we modernized it with programmability concerns in mind. We show how we solved the challenge of tightly combining tasks with MPI, and illustrate how this scaled up to 50,000 CPU cores, reaching 1.38 petaflops. We then focus on the integration of GPUs in this model, along with a few implementation tricks to ensure truly asynchronous programming. Finally, after briefly detailing how we added hierarchical compression techniques to our distributed solver over CPUs, we describe how we plan to overcome the challenges that have so far prevented porting them to GPUs.
 
Topics:
HPC and AI, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23277
Download:
Share:
 
Abstract:

We leverage NVIDIA GPUs for connected-components labeling and image classification applied to Digital Rock Physics (DRP), to help characterize reservoir rocks and study their pore distributions. We show in this talk how NVIDIA GPUs helped us satisfy the strict real-time restrictions dictated by the imaging hardware used to scan the rock samples. We present a detailed description of the workflow from a DRP perspective, our algorithm and optimization techniques, and performance results on the latest NVIDIA GPU generations.
 
Topics:
HPC and AI, HPC and Supercomputing, Video & Image Processing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23303
Download:
Share:
 
Abstract:

In order to prepare the scientific communities, GENCI and its partners have set up a technology watch group and lead collaborations with vendors, relying on HPC experts and early-adopted HPC solutions. The two main objectives are to provide guidance and to prepare the scientific communities for the challenges of exascale architectures. The talk will present the OpenPOWER platform bought by GENCI and provided to the scientific community. It will then present the first results obtained on the platform for a set of about 15 applications using all the solutions provided to the users (CUDA, OpenACC, OpenMP, ...). Finally, we will present one specific application in detail, covering its porting effort and the techniques used for GPUs with both OpenACC and OpenMP.
 
Topics:
HPC and AI, Performance Optimization, Programming Languages
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23183
Download:
Share:
 
Abstract:

Wireless VR is widely regarded as the key to maximum immersion. But why? Is it only the obvious omission of the heavy and inflexible cable? There is more behind it. Learn how the development of tracking technology goes hand in hand with the increasing demand for wireless VR hardware solutions, what hardware is on the market now and what is coming, and how wireless solutions, whether standalone devices or add-ons, can create higher value for your VR application. We'll also look at how large-scale, location-based VR and hardware manufacturers are expanding the boundaries of the VR industry, both for entertainment and B2B.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23388
Download:
Share:
 
Abstract:

With over 5,000 GPU-accelerated nodes, Piz Daint has been Europe's leading supercomputing system since 2013 and is currently one of the most performant and energy-efficient supercomputers on the planet. It has been designed to optimize throughput of multiple applications, covering all aspects of the workflow, including data analysis and visualisation. We will discuss ongoing efforts to further integrate these extreme-scale compute and data services with infrastructure services of the cloud. As a Tier-0 system of PRACE, Piz Daint is accessible to all scientists in Europe and worldwide. It provides a baseline for the future development of exascale computing. We will present a strategy for developing exascale computing technologies in domains such as weather and climate or materials science.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23429
 
Abstract:
This presentation gives an overview of the new NVIDIA Volta GPU architecture and the latest CUDA 9 release. The NVIDIA Volta architecture powers the world's most advanced data center GPU for AI, HPC, and graphics. Volta features a new Streaming Multiprocessor (SM) architecture and includes enhancements like NVLink 2 and the Multi-Process Service (MPS) that deliver major improvements in performance, energy efficiency, and ease of programmability. New features like independent thread scheduling and Tensor Cores enable Volta to deliver performance that is both fast and accessible. CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest CUDA 9 release.
 
Topics:
HPC and AI, Programming Languages, Tools & Libraries
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23434
 
Abstract:
We introduce approaches to optimizing deep learning inference with TensorRT. By walking through the process of actually applying TensorRT, this session will help you understand the performance and inference-environment considerations that arise during optimization. In particular, we'll offer tips on issues that come up in practice, such as TensorRT's development languages (C++/Python), low-precision support (FP16/INT8), and RNN support.
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8118
 
Abstract:
This talk illustrates how CUDA knowledge can help deep learning developers understand and tune their applications. We explain how to implement TensorFlow custom operations to use the GPU more efficiently for DL workloads, especially BERT inference for SQuAD. We also share key insights into why the techniques introduced here achieve better performance, drawn from examining the profiling results.
 
Topics:
HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2019
Session ID:
SKR9108
 
Abstract:
Learn how to design GPU-based systems for different application scenarios. We'll explain how to design data centers for different scales, application scenarios, and standards for enterprises and hyperscalers. We'll cover AI training and inference applications and edge computing for OCP and ODCC standard data centers. We'll discuss the challenges involved and share our experience designing a GPU platform for data centers. We'll also explore the problems attendees are facing and see how we can work together to solve them.
 
Topics:
AI & Deep Learning Research, HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91013
 
Abstract:
Data centers today benefit from highly optimized hardware architectures and performance metrics that enable efficient provisioning and tuning of compute resources. But these architectures and metrics, honed over decades, are sternly challenged by the rapid rise of AI applications and neural-net workloads, where the impact of memory metrics like bandwidth, capacity, and latency on overall performance is not yet well understood. Get the perspectives of AI HW/SW co-design experts from Google, Microsoft, Facebook, and Baidu, and technologists from NVIDIA and Samsung, as they evaluate the AI hardware challenges facing data centers and brainstorm current and necessary advances in architectures, with particular emphasis on memory's impact on both training and inference.
 
Topics:
Data Center & Cloud Infrastructure, Performance Optimization, Speech & Language Processing, HPC and AI, HPC and Supercomputing
Type:
Panel
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91018
 
Abstract:
We'll discuss how we built a highly scalable deep learning training system and trained ImageNet in four minutes. For dense GPU clusters, we optimize the training system with a mixed-precision training method that significantly improves the throughput of a single GPU without losing accuracy, and an optimization approach for extremely large mini-batch sizes (up to 64K) that can train CNN models on the ImageNet dataset without losing accuracy. We also propose highly optimized all-reduce algorithms that achieve up to 3x and 11x speedups on AlexNet and ResNet-50, respectively, over NCCL-based training on a cluster of 1,024 Tesla P40 GPUs. Our training system achieves 75.8% top-1 test accuracy in only 6.6 minutes using 2,048 Tesla P40 GPUs. When training AlexNet for 95 epochs, it reaches 58.7% top-1 test accuracy within 4 minutes using 1,024 Tesla P40 GPUs, outperforming all other existing systems.
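The all-reduce step such systems optimize can be illustrated with a toy, pure-Python ring all-reduce (our own sketch for illustration only; real implementations such as NCCL move the chunks between GPUs over NVLink or the network):

```python
# Toy, pure-Python simulation of the ring all-reduce used to sum
# gradients across workers (illustrative only -- not the authors' system).

def ring_allreduce(grads):
    """grads: one equal-length gradient list per worker.
    Returns each worker's buffer after all-reduce (element-wise sum)."""
    n, size = len(grads), len(grads[0])
    bounds = [(c * size) // n for c in range(n + 1)]  # n chunk boundaries
    data = [list(g) for g in grads]

    # Phase 1 -- reduce-scatter: at step s, worker i adds its chunk
    # (i - s) mod n into its right neighbour's buffer. After n-1 steps,
    # worker i holds the complete sum of chunk (i + 1) mod n.
    for s in range(n - 1):
        for i in range(n):
            dst, c = (i + 1) % n, (i - s) % n
            for j in range(bounds[c], bounds[c + 1]):
                data[dst][j] += data[i][j]

    # Phase 2 -- all-gather: circulate the completed chunks so every
    # worker ends up with the whole summed vector.
    for s in range(n - 1):
        for i in range(n):
            dst, c = (i + 1) % n, (i + 1 - s) % n
            data[dst][bounds[c]:bounds[c + 1]] = data[i][bounds[c]:bounds[c + 1]]
    return data

workers = ring_allreduce([[1.0, 2.0, 3.0, 4.0],
                          [5.0, 6.0, 7.0, 8.0],
                          [9.0, 10.0, 11.0, 12.0]])
print(workers[0])  # [15.0, 18.0, 21.0, 24.0]
```

Each element is involved in 2(n-1) chunk transfers, so per-worker bandwidth stays roughly constant as the cluster grows, which is what makes ring-style all-reduce attractive at 1,024-GPU scale.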
 
Topics:
Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9146
 
Abstract:
Learn about using Tensor Cores to perform very fast matrix multiply-accumulate steps like those required in AI training. The key to Tensor Core performance is the use of 16-bit floating-point arithmetic, but that causes significant rounding error. Although algorithms like binomial correction or Karatsuba can reduce rounding error considerably, they require additional calculations. We'll detail the performance of these algorithms based on the Warp Matrix Multiply-Accumulate (WMMA) API.
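The rounding-error problem and the compensation idea can be sketched with NumPy (our illustration, not the speakers' code): split each FP32 input into an FP16 "high" part plus an FP16 residual, then expand the product into three FP16 multiplies.

```python
import numpy as np

# Minimal sketch of FP16 rounding error and a split-and-correct scheme
# (our illustration, not the speakers' code). Rounding to FP16 loses
# mantissa bits; storing each value as an FP16 high part plus an FP16
# residual and expanding the product recovers most of the lost accuracy
# at the cost of extra multiplies.

rng = np.random.default_rng(0)
a = rng.standard_normal(1024).astype(np.float32)
b = rng.standard_normal(1024).astype(np.float32)
exact = np.dot(a.astype(np.float64), b.astype(np.float64))

def dot16(x, y):
    """Dot product with FP16 inputs and FP32 accumulation."""
    return np.dot(x.astype(np.float32), y.astype(np.float32))

# Naive: just round both inputs to FP16.
a_hi, b_hi = a.astype(np.float16), b.astype(np.float16)
naive = dot16(a_hi, b_hi)

# Compensated: a = a_hi + a_lo, b = b_hi + b_lo, each part an FP16
# value; a.b ~= a_hi.b_hi + a_hi.b_lo + a_lo.b_hi (a_lo.b_lo is tiny).
a_lo = (a - a_hi.astype(np.float32)).astype(np.float16)
b_lo = (b - b_hi.astype(np.float32)).astype(np.float16)
comp = dot16(a_hi, b_hi) + dot16(a_hi, b_lo) + dot16(a_lo, b_hi)

print(abs(naive - exact), abs(comp - exact))  # compensated error should be far smaller
```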
 
Topics:
Algorithms & Numerical Techniques, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9176
 
Abstract:
Learn how to take an application from slow, serial execution to blazing-fast GPU execution using OpenACC, a directive-based parallel programming model that works with C, C++, and Fortran. By the end of this session, participants will know the basics of using OpenACC to write an accelerated application that runs on multicore CPUs and GPUs with minimal code changes. No prior GPU programming experience is required, but the ability to read C, C++, or Fortran code is necessary.
 
Topics:
Programming Languages, HPC and AI
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9262
 
Abstract:
As GPU computing nodes pack in an increasing number of GPUs, programming to maximize performance across all the GPUs in a system is becoming a challenge. We'll discuss techniques for extending your GPU applications from one GPU to many. By the end of the session, you'll understand the relative trade-offs of each approach and how to choose the best one for your application. Some prior OpenACC or GPU computing experience is recommended for this talk.
 
Topics:
Programming Languages, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9263
 
Abstract:
Learn how to scale distributed training of TensorFlow and PyTorch models with Horovod, a library designed to make distributed training fast and easy to use. Although frameworks like TensorFlow and PyTorch simplify the design and training of deep learning models, difficulties usually arise when scaling models to multiple GPUs in a server or multiple servers in a cluster. We'll explain the role Horovod plays in taking a model designed on a single GPU and training it on a cluster of GPU servers.
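The core step Horovod adds to training, replacing each worker's gradient with the cross-worker average before the optimizer update, can be sketched as a toy, single-process simulation (illustrative only; Horovod performs this with an MPI- or NCCL-backed all-reduce):

```python
# Toy single-process simulation (not Horovod itself) of synchronized
# data-parallel SGD: after each backward pass, every worker's gradient
# is replaced by the average across workers, so all model replicas
# apply the same update and stay in sync.

def averaged_step(params, worker_grads, lr):
    """One synchronized SGD step given each worker's local gradients."""
    n = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg)]

params = [1.0, -2.0]                                  # shared weights
grads = [[0.25, 0.5], [0.5, 0.25], [0.75, 0.75]]      # three "GPUs"
params = averaged_step(params, grads, lr=0.5)
print(params)  # [0.75, -2.25]
```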
 
Topics:
Deep Learning & AI Frameworks, AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9321
 
Abstract:
Training intelligent agents with reinforcement learning is a notoriously unstable process. Although massive parallelization on GPUs and distributed systems can reduce instabilities, the success of training remains strongly influenced by the choice of hyperparameters. We'll describe a novel meta-optimization algorithm for distributed systems that solves a set of optimization problems in parallel while searching for the optimal hyperparameters, and we'll show how it applies to deep reinforcement learning. We'll demonstrate how the algorithm can fine-tune hyperparameters while learning to play different Atari games. Compared with existing approaches, our algorithm frees more computational resources during training by means of a stochastic scheduling procedure. It has been implemented on top of MagLev, the NVIDIA AI training and inference infrastructure.
 
Topics:
Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9414
 
Abstract:
NVIDIA offers several containerized applications in HPC, visualization, and deep learning. We have also enabled a broad array of container-related technologies for GPUs, with upstreamed improvements to community projects and with tools that are seeing broad interest and adoption. In addition, NVIDIA is a catalyst for the broader community in enumerating key technical challenges for developers, admins, and end users, and is helping to identify gaps and drive them to closure. Our talk describes NVIDIA's new developments and upcoming efforts. We'll detail progress in the most important technical areas, including multi-node containers, security, and scheduling frameworks. We'll also highlight the breadth and depth of interactions across the HPC community that are making the latest high-quality HPC applications available on platforms that include GPUs.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9525
 
Abstract:
Loading and storing file content on remote disks in CUDA applications typically requires the traditional file and network abstractions provided by the operating system. We'll show how GPUDirect changes that, making it possible to expose GPU memory directly to third-party devices such as NVM Express (NVMe) drives. We'll introduce our proof-of-concept software library for creating GPU-oriented storage applications with GPUDirect-capable GPUs and commodity NVMe disks residing in multiple remote hosts. Learn how we use the memory-mapping capabilities of PCIe non-transparent bridges to set up efficient I/O data paths between GPUs and disks that are attached to different root complexes (hosts) in a PCIe network. We'll demonstrate how our solution can initiate remote disk I/O from within a CUDA kernel, compare our approach to state-of-the-art NVMe over Fabrics, and share our results from running a distributed workload on multiple GPUs using a remote disk.
 
Topics:
Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9563
 
Abstract:
We'll discuss cuTENSOR, a high-performance CUDA library for tensor operations that efficiently handles the high-dimensional arrays (i.e., tensors) that are ubiquitous in today's HPC and DL workloads. The library supports highly efficient tensor operations such as tensor contractions (a generalization of matrix-matrix multiplication), point-wise tensor operations such as tensor permutations, and tensor decompositions (a generalization of matrix decompositions). While providing high performance, cuTENSOR also lets users express their mathematical equations for tensors in a straightforward way that hides the complexity of dealing with these high-dimensional objects behind an easy-to-use API. CUDA 10.1 enables CUDA programmers to utilize Tensor Cores directly with the new mma.sync instruction. We'll describe the functionality of mma.sync and present strategies for implementing efficient matrix-multiply computations in CUDA that maximize performance on NVIDIA Volta GPUs. We'll then describe how CUTLASS 1.3 provides reusable components embodying these strategies. CUTLASS 1.3 demonstrates a median 44% speedup on CUDA kernels executing layers from real-world deep learning workloads.
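The point that a tensor contraction generalizes matrix multiplication can be made concrete with NumPy (this shows the math cuTENSOR accelerates, not cuTENSOR's actual API):

```python
import numpy as np

# A tensor contraction sums over one or more shared modes of
# higher-dimensional arrays -- a generalization of matrix-matrix
# multiplication (NumPy illustration, not the cuTENSOR API).

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3, 4))      # modes m, k1, k2
B = rng.standard_normal((3, 4, 5))      # modes k1, k2, n

# Contract over the shared modes k1, k2: C[m, n] = sum_{k1,k2} A*B
C = np.einsum('mab,abn->mn', A, B)

# The same contraction, done by flattening the contracted modes into a
# single index, which reduces it to an ordinary matrix multiply. This
# "reshape to GEMM" view is why contractions map so well to GPUs.
C_gemm = A.reshape(2, 12) @ B.reshape(12, 5)
print(np.allclose(C, C_gemm))  # True
```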
 
Topics:
Computational Biology & Chemistry, Tools & Libraries, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9593
 
Abstract:
Learn why deep learning scales so well and how to apply it to important open problems. Deep learning has enabled rapid progress on diverse problems in vision, speech, and beyond. Driving this progress are breakthroughs in algorithms that can harness massive datasets and powerful compute accelerators like GPUs. We'll combine theoretical and experimental insights to help explain why deep learning scales predictably with bigger datasets and faster computers. We'll also show how some problems are relatively easier than others and how to tell the difference. Learn about open problems that cannot be solved by individual computers but are within reach of the largest machines in the world. We'll also make the case for optimizing data centers to run AI workloads. Finally, we'll outline a high-level architecture for an AI data center and leave you with powerful tools to reach beyond human accuracy and confront some of the hardest open problems in computing.
 
Topics:
AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9643
 
Abstract:
Learn about the Kokkos C++ Performance Portability EcoSystem, a production-level solution for writing modern C++ applications in a hardware-agnostic way. The ecosystem is part of the U.S. Department of Energy's Exascale Computing Project, a national effort to prepare the HPC community for the next generation of supercomputing platforms. We'll give an overview of what the Kokkos EcoSystem provides, including its programming model, math kernels library, tools, and training resources, and share success stories of Kokkos adoption in large production applications on the leading supercomputing platforms in the U.S. We'll focus particularly on early results from two of the world's most powerful supercomputers, Summit and Sierra, both powered by NVIDIA Tesla V100 GPUs. We'll also describe how the Kokkos EcoSystem anticipates the next generation of architectures and share early experiences of incorporating NVSHMEM into Kokkos.
 
Topics:
Programming Languages, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9662
 
Abstract:
Learn how high-resolution imaging is revolutionizing science and dramatically changing how we process, analyze, and visualize data at this new scale. We'll show the journey a researcher can take to produce images capable of winning a Nobel prize. We'll review the last two years of development in single-particle cryo-electron microscopy processing, with a focus on accelerated software, and discuss benchmarks and best practices for common software packages in this domain. Our talk will include videos and images of atomic-resolution molecules and viruses that demonstrate our success in high-resolution imaging.
 
Topics:
Computational Biology & Chemistry, In-Situ & Scientific Visualization, Data Center & Cloud Infrastructure, HPC and AI, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9664
 
Abstract:
We'll talk about UCX, a unified communication library written in C that acts as middleware for programming models like MPI, PGAS, and other task-based runtimes. As HPC and data science applications move to Python for prototyping and ease of development, we provide Python bindings and an object-oriented interface to UCX, implemented with Cython. We'll explain how this lets Python applications quickly use many of UCX's communication primitives, such as send-receive, distributed load-store, and callback facilities. In particular, UCX provides CUDA awareness, making it possible to transfer objects in CUDA memory among Python processes.
 
Topics:
Tools & Libraries, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9679
 
Abstract:
Users of HPC systems have diverse needs and requirements for their applications and ML/DL environments. Containers help streamline and simplify environment creation, but security concerns generally prohibit popular container environments such as Docker from running in shared computing environments. Alternative container systems for HPC address the security concerns but have less documentation and fewer resources available for users. We'll describe how our pipeline and resources at MITRE enable users to quickly build custom environments and run their code on the HPC system while minimizing startup time. Our process implements LXD containers, Docker, and Singularity on a combination of development and production HPC systems using a traditional scheduler.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9958
 
Abstract:
We'll discuss the challenges of GPU/DRAM bandwidth in high-performance systems. Graphics memory is a key differentiator for addressing these challenges in AI, in areas ranging from the data center to the smart edge. We'll compare discrete GDDR memory and high-bandwidth memory to identify the solution space for these options. We'll also discuss how applications in graphics, HPC, and AI benefit from more bandwidth during presentations at the Micron booth on the exhibit floor.
 
Topics:
Graphics and AI, HPC and AI
Type:
Sponsored Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9968
 
Abstract:
AI and related technologies are beginning to revolutionize astronomy and astrophysics. As facilities like the Large Synoptic Survey Telescope and the Wide Field Infrared Survey Telescope come online, data volumes in astronomy will increase dramatically. We'll describe a deep learning framework that allows astronomers to identify and categorize astronomical objects in enormous datasets with more fidelity than ever. We'll also review new applications of AI in astrophysics, including data analysis and numerical simulation.
 
Topics:
Astronomy & Astrophysics, Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9508
 
Abstract:
Learn how GPUs are pushing the limits of the largest astronomical telescopes on Earth and how they'll be used to image life-bearing planets outside our solar system. Thanks to hardware features such as Tensor Cores and mixed-precision support, plus optimized AI frameworks, GPU technology is changing how large data streams from optical sensors are digested in real time. We'll discuss how real-time AI made possible by GPUs opens up new ways to optimally control the system and calibrate images, which will help scientists get the most out of the largest optical telescopes. GPUs will also benefit future extreme-size facilities like the European Extremely Large Telescope, because the complexity of maintaining exquisite image quality grows with the square of the telescope's diameter. We'll present on-sky results obtained on the 8.2-meter Subaru Telescope and explain why these techniques will be essential to future giant telescopes.
 
Topics:
Astronomy & Astrophysics, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9634
 
Abstract:
We'll discuss the revolution in computing, modeling, data handling, and software development that's needed to advance U.S. weather-prediction capabilities in the exascale computing era. Advancing prediction models to cloud-resolving 1-km scales will require an estimated 1,000-10,000 times more computing power, but existing models can't exploit exascale systems with millions of processors. We'll examine how weather-prediction models must be rewritten to incorporate new scientific algorithms, improved software design, and new technologies such as deep learning to speed model execution, data processing, and information processing. We'll also offer a critical and visionary assessment of the key technologies and developments needed to advance U.S. operational weather prediction in the next decade.
 
Topics:
Climate, Weather & Ocean Modeling, AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9750
 
Abstract:
Learn how GPU-based computational fluid dynamics (CFD) paves the way for fast, affordable, high-fidelity simulations for automotive aerodynamics. Highly resolved transient CFD simulations on pure CPU systems are computationally expensive and constrained by available resources; for years, this posed a problem for automotive OEMs working on aerodynamic design. We'll describe our solution, ultraFluidX, a novel CFD solver designed to leverage the massively parallel architecture of GPUs, and outline how its efficient multi-GPU implementation achieves turnaround times of just a few hours on a single GPU machine. This makes it possible to simulate fully detailed, production-level passenger and heavy-duty vehicles overnight, a breakthrough for simulation-based design.
 
Topics:
Computer Aided Engineering, Computational Fluid Dynamics, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9719
 
Abstract:
We'll do a deep dive into previously undisclosed architectural details of NVIDIA's Turing T4 cloud GPU, which we unearthed via micro-benchmarks, and compare the architecture's features with previous generations of NVIDIA GPUs. We'll reveal the geometry and latency of Turing's complex memory hierarchy, the format of its encoded instructions, and the latency of those instructions. Learn how developers can use this knowledge to design workloads that adapt exactly to the characteristics of the T4 GPU. We'll also explain how to manually assemble binary code that squeezes every bit of bare-metal performance from the hardware by maximizing dual issue and avoiding bank conflicts.
 
Topics:
Finance - Deep Learning, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9839
 
Abstract:
We'll discuss how the latest advances in GPU technology have made it possible to reduce MRI scan time and increase reconstruction accuracy. These advances have added previously unseen capabilities to nuclear medical imaging by simulating photon trajectories with excellent precision and speed. They've also accelerated work in cancer therapy with real-time simulation of the physics of thermal tumor ablation. Learn how we've unleashed the potential of high performance computing and deep learning on GPUs to drive medical-imaging innovation.
 
Topics:
Medical Imaging & Radiology, AI in Healthcare, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9993
 
Abstract:
We'll discuss the Clara Platform, which is designed to bring NVIDIA technology and expertise in high performance computing, artificial intelligence, and photorealistic rendering to the medical-imaging industry. Our talk will focus on how developers from industry and institutions are leveraging the platform to integrate artificial intelligence into hospitals to bend the cost curve and improve patient outcomes.
 
Topics:
Medical Imaging & Radiology, AI in Healthcare, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9994
 
Abstract:
Learn how to make your irregular algorithm perform on GPUs. We provide insights into our research on a tasking framework for synchronization-critical applications on GPUs. We discuss the requirements of GPU architectures and programming models f ...Read More
Abstract:

Learn how to make your irregular algorithm perform on GPUs. We provide insights into our research on a tasking framework for synchronization-critical applications on GPUs. We discuss the requirements of GPU architectures and programming models for implementing efficient tasking frameworks. Participants will learn about the pitfalls for tasking arising from the architectural differences between latency-driven CPUs and throughput-driven GPUs. To overcome these pitfalls, we consider programming concepts such as persistent threads, warp-aware data structures and CUDA asynchronous task graphs. In addition, we look at the latest GPU features such as forward progress guarantees and grid synchronization that facilitate the implementation of tasking approaches. A task-based fast multipole method for the molecular dynamics package GROMACS serves as use case for our considerations.

 
Topics:
HPC and Supercomputing, Computational Biology & Chemistry, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9548
Streaming:
Download:
 
Abstract:
We'll present an overview of the upcoming NERSC-9 system architecture, throughput model, and application readiness efforts.
 
Topics:
HPC and Supercomputing, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9809
Streaming:
Download:
 
Abstract:
Abstract:
We'll present a broad set of cases in which scientific computing is using deep learning and machine learning to accelerate the solution of all kinds of high performance computing problems. We'll discuss the latest technical advances and the performance gains they bring, and examine the current barriers to applying AI in scientific computing, along with some possible solutions.
 
Topics:
Genomics & Bioinformatics, HPC and AI
Type:
Talk
Event:
GTC China
Year:
2018
Session ID:
CH8407
Download:
 
Abstract:
Abstract:
With the rapid development of computer hardware, algorithmic models, and big data technologies, artificial intelligence in China has entered another round of deep application. Starting from AI development trends, this talk introduces the new opportunities and challenges facing AI, and briefly discusses approaches to building AI computing and data application service platforms and their future direction.
 
Topics:
Science and Research, HPC and AI
Type:
Talk
Event:
GTC China
Year:
2018
Session ID:
CH8403
 
Abstract:
  • Cognitive computing / artificial intelligence: PowerAI & Watson
  • Examples, scenarios, and demonstrations in smart finance, manufacturing, and healthcare
  • End-to-end IBM cognitive-systems solutions deployed in your own data center
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8028
Download:
 
Abstract:
Facebook's edge in AI innovation lies in its ability to move cutting-edge research into large-scale production quickly with a versatile set of tools. Learn how the Open Neural Network Exchange (ONNX) and PyTorch 1.0 help accelerate the path from research to production through seamless, interoperable AI development. This talk covers Facebook's plans for PyTorch 1.0 and other ecosystem projects, including Detectron for object detection, the ELF inference platform, and Tensor Comprehensions and Glow for generating optimized neural network code.
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
GTC Taiwan
Year:
2018
Session ID:
STW8030
Download:
 
Abstract:
Learn how to use Gunrock, a state-of-the-art CUDA-based graph-processing library specifically designed for the GPU, to develop fast, efficient, and complex graph primitives. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. Gunrock is a stable, powerful, and forward-looking substrate for GPU-based, graph-centric research and development. Like many graph frameworks, it leverages a bulk-synchronous programming model and targets iterative convergent graph computations. We believe that Gunrock offers both the best performance on GPU graph analytics and the widest range of primitives.
 
Topics:
Accelerated Data Science, Tools & Libraries, HPC and AI
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8586
Streaming:
 
Abstract:
We'll present a new technique for improving the efficiency of inference and training in deep learning in the presence of sparse workloads. We'll start with a brief overview of applications of sparse linear algebra in engineering and data analysis. Then, we'll analyze the presence of sparsity in both the training and inference phases of deep learning. To exploit this sparsity, we present our method of improving the memory locality of sparse applications. We'll establish lower and upper bounds for sparse matrix operations and their crossover with dense matrix operations. We'll demonstrate how to minimize memory traffic by tiling matrix operations and making efficient use of the L2 cache, L1 cache, and shared memory (SMEM). We'll conclude with a performance comparison of our method with existing techniques on real pruned weight matrices from GoogLeNet and OpenNMT's multiway translation network. This is the joint work of Michael Frumkin, Jeff Pool, and Lung Sheng Chien.
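The sparse/dense crossover mentioned in the abstract can be illustrated with a back-of-the-envelope storage model: compare the memory footprint of a dense matrix with a CSR representation and find the density at which CSR stops paying off. This is only an illustrative calculation, not the talk's actual bound analysis; the 4-byte value and index sizes and the CSR layout are assumptions.

```python
# Estimate memory footprint (a proxy for memory traffic) of a dense vs. CSR
# representation of an m x n matrix, and find the density crossover where
# CSR is no longer smaller. Hypothetical sizes: 4-byte values, 4-byte indices.

def dense_bytes(m, n, value_size=4):
    return m * n * value_size

def csr_bytes(m, n, density, value_size=4, index_size=4):
    nnz = int(m * n * density)
    # one value + one column index per nonzero, plus one row pointer per row
    return nnz * (value_size + index_size) + (m + 1) * index_size

def crossover_density(m, n):
    """Smallest density (in 1% steps) at which CSR is no longer smaller."""
    for pct in range(1, 101):
        d = pct / 100
        if csr_bytes(m, n, d) >= dense_bytes(m, n):
            return d
    return 1.0

if __name__ == "__main__":
    m, n = 1024, 1024
    print(f"CSR stops winning near density {crossover_density(m, n):.2f}")
```

With equal value and index widths the crossover sits near 50% density, which is why pruned networks need high sparsity before sparse kernels win even before accounting for irregular access costs.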
 
Topics:
Algorithms & Numerical Techniques, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8458
Streaming:
 
Abstract:
Inspur has been deploying AI solutions with customers such as Microsoft, Alibaba, Baidu, and BMW for many years. We will share AI use cases showing how we deploy AI at scale and take a close look at the technologies that enable these deployments.
 
Topics:
AI Application, Deployment & Inference, AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8996
Streaming:
 
Abstract:
This year witnessed the rapid growth of AI cloud and AI-as-a-service, alongside the broader AI explosion. We'll report our continued progress on bringing NVIDIA GPUs to container clouds, optimizing GPU scheduling, and running AI workloads on a container cloud. First, building on the work we reported at GTC 2017, we'll cover the latest GPU features we have added to Kubernetes, including two advanced GPU schedulers and GPU resource namespace control. This year, we brought GPU-enabled Kubernetes to IBM Cloud Private, IBM's commercial on-premise container cloud, and to several other important IBM products, including our AI product PowerAI Vision. We also continue to share our technology with the open source community. Second, we'll share lessons learned about how to design, manage, optimize, and operate an AI cloud, drawn from two years of product experience and user feedback.
 
Topics:
Data Center & Cloud Infrastructure, Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8287
Streaming:
 
Abstract:
Learn how the breakthrough HPE Superdome Flex platform equips scientists, engineers, and business lines with in-memory computing at unparalleled scale to solve complex, data-intensive problems holistically, accelerate analytics, and, coupled with NVIDIA GPU technology, leverage large-scale data visualization to speed time to discovery and innovation.
 
Topics:
Accelerated Data Science, Computational Fluid Dynamics, Computer Aided Engineering, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8973
Streaming:
Download:
 
Abstract:
AWS offers the most powerful GPU-accelerated cloud infrastructure that delivers unparalleled computational efficiency for advanced engineering simulations and analysis, enabling High Performance Computing (HPC) workloads to run in the cloud at scale. This session features a real-world use case from the advanced product engineering team at Western Digital, which is using HPC solutions to model new technologies and capabilities prior to production. Western Digital's computational tools incorporate a description of the physics occurring during the HDD recording process and ultimately produce input to a recording subsystem channel model that yields an error rate. The length scales involved in the recording model range from a few nanometers in the description of the recording media to microns in the description of the recording head. The power of the current generation of NVIDIA GPUs allows Western Digital to generate enough simulation data that the same recording subsystem channel model used in experiments can be employed in studies that include fabrication process variances.
 
Topics:
Computational Physics, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81041
Streaming:
Download:
 
Abstract:
The Krylov Project is the key component in eBay's AI Platform initiative, providing an easy-to-use, open, and fast AI orchestration engine deployed as managed services in eBay's cloud. The main goals of the project are: every AI and machine learning algorithm should be shareable and easily implementable with a choice of frameworks; machine learning engineers should be able to build end-to-end training pipelines that distribute and parallelize over many machines; model training should be automated and have easy access to vast eBay datasets; and engineers should be able to search past job submissions, view results, and share them with others. We built Krylov from the ground up, using the JVM, Python, and Go as the main technologies for its components, while standing on the shoulders of giants such as Docker, Kubernetes, and Apache Hadoop. Using Krylov, AI scientists can access eBay's massive datasets; build and train AI models; spin up powerful compute (high-memory or GPU instances) on the Krylov HPC cluster; and set up machine learning pipelines, using declarative constructs that stitch together the pipeline lifecycle.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8277
Streaming:
Download:
 
Abstract:
Horovod makes it easy to train a single-GPU TensorFlow model on many GPUs, both on a single server and across multiple servers. We'll cover Uber's explorations of distributed deep learning, how to use Horovod, and what kind of performance you can get on standard models such as Inception V3 and ResNet-101. Learn how to speed up training of your TensorFlow model with Horovod.
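Horovod's core communication primitive is the ring-allreduce over gradients. As a rough illustration of that pattern, and not Horovod's actual NCCL-based GPU implementation, here is a single-process, pure-Python simulation of the reduce-scatter and allgather phases that average one gradient vector per worker:

```python
# Toy ring-allreduce: each of n workers holds a gradient vector; after a
# reduce-scatter phase (accumulate chunks around the ring) and an allgather
# phase (circulate the reduced chunks), every worker holds the average.

def ring_allreduce(grads):
    """All-reduce (average) equal-length gradient vectors, one per worker."""
    n = len(grads)                 # number of workers in the ring
    chunk = len(grads[0]) // n     # vector length assumed divisible by n
    bufs = [list(g) for g in grads]
    # Reduce-scatter: after n-1 steps, worker r holds the full sum of
    # chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n     # chunk worker r passes to its neighbor
            for i in range(c * chunk, (c + 1) * chunk):
                bufs[(r + 1) % n][i] += bufs[r][i]
    # Allgather: circulate each fully reduced chunk around the ring.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            for i in range(c * chunk, (c + 1) * chunk):
                bufs[(r + 1) % n][i] = bufs[r][i]
    return [[v / n for v in b] for b in bufs]
```

Each worker sends and receives only one chunk per step, so the bytes moved per worker are independent of the ring size, which is what makes this scheme scale to many GPUs.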
 
Topics:
AI & Deep Learning Research, Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8152
Streaming:
 
Abstract:

As multi-GPU deep learning performance improves, the performance of the storage system hosting a dataset becomes critical in keeping these GPUs fully utilized. We survey the different methods for providing training data to a TensorFlow application on a GPU, and benchmark data throughput for a variety of popular neural network architectures. We look at performance and potential bottlenecks for local storage technologies (SCSI SSD and NVMe), high performance network-attached file systems, TensorFlow native connectors (HDFS and S3), and FUSE-connected object storage.
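A minimal sketch of the kind of measurement the survey describes: timing chunked sequential reads from a local file, the access pattern an input pipeline applies to raw record files. The file and chunk sizes here are illustrative, and the real benchmarks used TensorFlow input pipelines against the storage backends listed above rather than this stand-in.

```python
# Minimal storage-throughput probe: time one sequential, chunked pass over a
# file, the access pattern an input pipeline uses on raw record files.
import os
import tempfile
import time

def read_throughput(path, chunk_size=1 << 20):
    """Return (bytes_read, MB/s) for one sequential pass over `path`."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, total / max(elapsed, 1e-9) / 1e6

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 << 20))   # 8 MiB of dummy "training data"
    nbytes, mbps = read_throughput(tmp.name)
    print(f"read {nbytes} bytes at {mbps:.0f} MB/s")
    os.unlink(tmp.name)
```

Comparing the measured rate against the aggregate consumption rate of the GPUs (samples per second times bytes per sample) indicates whether storage, not compute, is the bottleneck.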

 
Topics:
Data Center & Cloud Infrastructure, AI Startup, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8544
Streaming:
 
Abstract:
Recurrent neural networks are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modeling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. To address this issue, we prune blocks of weights in a layer instead of individual weights. Using these techniques, we can create block-sparse RNNs with sparsity ranging from 80% to 90% with a small loss in accuracy. This technique allows us to reduce the model size by 10x. Additionally, we can prune a larger dense network to recover this loss in accuracy while maintaining high block sparsity and reducing the overall parameter count. Our technique works with a variety of block sizes up to 32x32. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity.
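The block-pruning idea can be sketched in a few lines: score each b x b block of a weight matrix by its mean magnitude and zero out low-scoring blocks, rather than pruning individual weights. This is an illustrative stand-in in plain Python with a hypothetical threshold, not the authors' training-time pruning schedule:

```python
# Block pruning sketch: zero out whole b x b blocks of a weight matrix whose
# mean absolute value falls below a threshold, as opposed to pruning
# individual weights. Threshold and block size here are illustrative.

def block_prune(w, b, threshold):
    """Zero every b x b block of matrix `w` whose mean |weight| < threshold.
    Returns (pruned matrix, fraction of blocks zeroed)."""
    rows, cols = len(w), len(w[0])
    pruned = [row[:] for row in w]
    zeroed = total = 0
    for i in range(0, rows, b):
        for j in range(0, cols, b):
            total += 1
            block = [abs(pruned[x][y])
                     for x in range(i, min(i + b, rows))
                     for y in range(j, min(j + b, cols))]
            if sum(block) / len(block) < threshold:
                zeroed += 1
                for x in range(i, min(i + b, rows)):
                    for y in range(j, min(j + b, cols)):
                        pruned[x][y] = 0.0
    return pruned, zeroed / total
```

Because whole blocks are zeroed, the surviving weights can be stored and multiplied as dense tiles, which is what recovers hardware efficiency relative to unstructured sparsity.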
 
Topics:
AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8924
Streaming:
Download:
 
Abstract:
Learn how to effectively leverage CUDA Fortran to port scientific applications written in Fortran to GPUs. We'll present in detail the porting effort of Quantum ESPRESSO's Plane-Wave Self-Consistent Field (PWscf) solver, from profiling and identifying time-consuming procedures to performance analysis of the GPU-accelerated solver on several benchmark problems on systems ranging in size from small workstations to large distributed GPU clusters. We'll highlight several tools available in CUDA Fortran to accomplish this, from high-level CUF kernel directives to lower level kernel programming, and provide guidance and best practices in several use cases with detailed examples.
 
Topics:
Computational Biology & Chemistry, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8446
Streaming:
 
Abstract:
A GPU-accelerated direct sparse matrix solver has been in use at ANSYS since 2016. It achieves high performance on CPUs and GPUs for a wide range of electromagnetic problems, in comparison with state-of-the-art commercial and open-source software. We'll review the current GPU acceleration technique and describe our recent improvements to the GPU-enabled matrix solver, observing up to 1.5x speedup over the existing GPU algorithm. This improvement enables GPU acceleration of matrix computations that would not have benefited from GPUs before.
 
Topics:
Algorithms & Numerical Techniques, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8161
Streaming:
 
Abstract:
Attendees will learn how deep learning models identify severe weather hazards, how deep learning severe weather diagnosis compares with other machine learning methods, and what weather features deep learning considers most important for determining whether a storm will produce severe weather or not. Severe weather hazards, such as tornadoes, hail, high winds, and flash floods, cause billions of dollars in property damage and injure or kill hundreds of people in the U.S. each year. Improved forecasts of the potential for severe weather enable decision makers to take actions to save lives and property. Machine learning and deep learning models extract spatial information from observations and numerical weather prediction model output to predict the probability of severe weather based on whether or not some form of severe weather was reported by the public. Convolutional neural networks and generative adversarial networks are compared against principal component analysis encodings to determine how much skill deep learning adds over traditional methods. The deep learning models are interrogated to identify important variables and spatial features for severe weather prediction.
 
Topics:
Advanced AI Learning Techniques, Climate, Weather & Ocean Modeling, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8455
Streaming:
 
Abstract:
We'll take a deep dive into best practices and real-world examples of leveraging the power and flexibility of local GPU workstations, such as the DGX Station, to rapidly develop and prototype deep learning applications. We'll demonstrate the setup of our small lab, which is capable of supporting a team of several developers and researchers, and our journey as we moved from lab to data center. Specifically, we'll walk through our experience of building the TensorRT Inference Demo, aka Flowers, used by Jensen to demonstrate the value of GPU computing at GTCs worldwide. As an added bonus, get first-hand insights into the latest advancements coming to AI workstations this year. The flexibility for fast prototyping provided by our lab was an invaluable asset as we experimented with different software and hardware components. As the models and applications stabilized and we moved from lab to data center, we were able to run fully load-balanced across 64 V100s, serving video inference that demonstrated Software-in-the-Loop (SIL) ReSim capabilities for autonomous vehicles at GTC EU. Real live examples will be given.
 
Topics:
Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8263
Streaming:
 
Abstract:
Discussion and demonstration of the potential of running HPC and VDI workloads on common clusters in a modern data center: a Dr. Jekyll and Mr. Hyde scenario. Explore the coexistence of CUDA-based HPC job engines with both Linux and Windows machines used for virtual desktop infrastructure. The demonstration will focus on a minimal VMware cluster deployment using VSAN storage to host both a multi-node Linux HPC cluster for CUDA workloads and a VMware Horizon View deployment of Linux and Windows virtual desktops performing DirectX, OpenGL, and CUDA-based visualization workloads as used by engineering and analysis power users.
 
Topics:
Data Center & Cloud Infrastructure, GPU Virtualization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8209
Streaming:
 
Abstract:
Learn how to use GPUs to accelerate gradient boosting on decision trees. We'll discuss the CUDA implementation of CatBoost, an open-source library that successfully handles categorical features and shows better quality than other open-source gradient boosted decision tree libraries. We'll provide a brief overview of problems that can be solved with CatBoost, then discuss challenges and key optimizations in the most significant computation blocks. We'll describe how to efficiently build histograms in shared memory to construct decision trees and how to avoid atomic operations during this step. We'll provide benchmarks showing that our GPU implementation is five to 40 times faster than Intel server CPUs, as well as a performance comparison against GPU implementations of gradient boosting in other open-source libraries.
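The histogram step the abstract highlights boils down to: bucket per-sample gradient statistics by a feature's bin, then scan the bin boundaries for the split with the best gain. A pure-Python sketch of that computation follows; the gain formula and names are a generic gradient boosting formulation for illustration, not CatBoost's exact code:

```python
# Histogram-based split search, the core computation GPU gradient boosting
# libraries accelerate: accumulate per-bin gradient sums and counts, then
# scan bin boundaries for the split maximizing a standard gain criterion.

def build_histogram(bins, grads, n_bins):
    """Per-bin [gradient sum, sample count] for one binned feature."""
    hist = [[0.0, 0] for _ in range(n_bins)]
    for b, g in zip(bins, grads):
        hist[b][0] += g
        hist[b][1] += 1
    return hist

def best_split(hist):
    """Return (bin index to split after, gain), with gain
    sum_left^2 / n_left + sum_right^2 / n_right."""
    tot_sum = sum(h[0] for h in hist)
    tot_cnt = sum(h[1] for h in hist)
    best, best_gain = None, float("-inf")
    left_sum, left_cnt = 0.0, 0
    for i, (s, c) in enumerate(hist[:-1]):
        left_sum += s
        left_cnt += c
        right_sum, right_cnt = tot_sum - left_sum, tot_cnt - left_cnt
        if left_cnt == 0 or right_cnt == 0:
            continue
        gain = left_sum ** 2 / left_cnt + right_sum ** 2 / right_cnt
        if gain > best_gain:
            best, best_gain = i, gain
    return best, best_gain
```

On a GPU, many threads accumulate into the same per-bin counters, which is why the talk's shared-memory layout and atomic-free accumulation matter.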
 
Topics:
AI Application, Deployment & Inference, Tools & Libraries, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8393
Streaming:
 
Abstract:
Are you wondering whether the cloud is relevant to HPC and how it works? Increasingly, applications in high-performance computing are using containers to ease deployment. In this talk, you'll learn what containers are, how they are orchestrated to run together in the cloud, and how communication among containers works. You'll get a snapshot of current support from the ecosystem, and gain insight into why NVIDIA is leading the charge to provide the best performance and usability.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8642
Streaming:
Download:
 
Abstract:

We'll present a multi-node distributed deep learning framework called ChainerMN. Even though GPUs are continuously gaining more computation throughput, it is still very time-consuming to train state-of-the-art deep neural network models. For better scalability and productivity, it is paramount to accelerate the training process by using multiple GPUs. To enable high-performance and flexible distributed training, ChainerMN was developed and built on top of Chainer. We'll first introduce the basic approaches to distributed deep learning, then explain the design choices, basic usage, and implementation details of Chainer and ChainerMN. To demonstrate ChainerMN's scalability and efficiency, we'll discuss the remarkable results from training the ResNet-50 classification model on the ImageNet database using 1,024 Tesla P100 GPUs on our in-house cluster, MN-1.

 
Topics:
AI & Deep Learning Research, AI Startup, Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8889
Streaming:
Download:
 
Abstract:
We'll review our study of the use of artificial intelligence to augment various domains of computational science in order to improve time to solution for various HPC problems. We'll discuss the current state-of-the-art approaches and performance gains where applicable. We'll also investigate current barriers to adoption and consider possible solutions.
 
Topics:
AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8242
Streaming:
Download:
 
Abstract:
Many scientific and engineering fields increasingly rely on complex and time consuming computational simulation as part of the modern scientific workflow. In many applications, such as High Energy Particle Physics, Cosmology, Geophysics, and others, simulations are the computational bottleneck for producing and testing results. We introduce the usage of Generative Adversarial Networks (GAN) as a potential tool for speeding up expensive theoretical models and simulations in scientific and engineering applications, ushering in a new era of deep learning-powered scientific discovery. We will show that using a GAN-based High Energy Physics fast simulator on GPUs can provide speedups of up to 100,000x when compared to traditional simulation software, while retaining high levels of precision. Finally, we will discuss modeling and architectural considerations in this domain with the hope of directly empowering scientists and engineers in other fields to experiment with Generative Adversarial Networks in order to speed up simulation across scientific domains.
 
Topics:
AI & Deep Learning Research, Advanced AI Learning Techniques, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81001
Streaming:
Download:
 
Abstract:
We'll describe how we accelerate the estimation of multiple subsurface properties with GPU-equipped cloud computers while saving cost at the same time. Traditionally, institutions spend millions of dollars to build and maintain computing infrastructures that are rarely occupied at full capacity. Cloud computing offers a solution via on-demand provisioning that can flexibly meet an institution's needs, but it comes with two potential problems: preemption and no guarantee of low-latency inter-node communication. To sidestep these issues, we implement a pipeline processing model that fully utilizes CPU memory and GPU global memory to hide latency without having to decompose the computational domain across multiple nodes.
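The pipeline model described above can be sketched as a bounded producer/consumer queue that overlaps data staging with compute. This is a toy single-machine analogy with stand-in stage functions, not the authors' CPU-memory/GPU-memory implementation:

```python
# Pipeline sketch: overlap data fetch with compute using a bounded queue,
# the same structure used to hide transfer latency between staging memory
# and the processor doing the work. Stage functions here are stand-ins.
import queue
import threading

def run_pipeline(items, fetch, compute, depth=2):
    """Fetch items in a background thread while computing on fetched ones."""
    q = queue.Queue(maxsize=depth)   # bounded: caps in-flight staging memory
    SENTINEL = object()

    def producer():
        for it in items:
            q.put(fetch(it))         # "fetch" overlaps with consumer compute
        q.put(SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while True:
        staged = q.get()
        if staged is SENTINEL:
            break
        results.append(compute(staged))
    return results

if __name__ == "__main__":
    out = run_pipeline(range(5), fetch=lambda i: i * 10, compute=lambda x: x + 1)
    print(out)   # [1, 11, 21, 31, 41]
```

The bounded queue depth plays the role of the CPU/GPU staging buffers: deep enough to keep the compute stage busy, small enough to bound memory use.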
 
Topics:
Algorithms & Numerical Techniques, Seismic & Geosciences, Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8405
Streaming:
Download:
 
Abstract:
We'll present HornetsNest, a framework for developing static and dynamic graph algorithms with relative ease. Through a small subset of graph primitives, which form the API of our framework, it is possible to implement parallel graph algorithms in a fairly small number of lines of code. These graph primitives are optimized in the backend, so programmers can focus on algorithm design rather than load balancing, system utilization, and optimizations. Using these primitives, it's possible to implement BFS in roughly 10 lines of code. Performance-wise, this BFS performs as well as its counterpart in the Gunrock library. More importantly, HornetsNest is the first framework to support a wide range of high-performing dynamic graph analytics, including new algorithms for dynamic triangle counting, dynamic PageRank, and dynamic Katz centrality. Finally, we'll cover the performance of numerous graph algorithms.
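As a flavor of the "BFS in roughly 10 lines" claim, here is a frontier-based BFS written against two primitives in pure Python; `advance` and `filter_visited` are illustrative stand-ins for the kind of operators such frameworks expose, not HornetsNest's real API:

```python
# Frontier-style BFS on top of two primitives: `advance` expands a frontier
# along outgoing edges, `filter_visited` keeps only newly reached vertices.
# Graphs are adjacency dicts: {vertex: [neighbor, ...]}.

def advance(graph, frontier):
    return {v for u in frontier for v in graph.get(u, [])}

def filter_visited(candidates, dist):
    return {v for v in candidates if v not in dist}

def bfs(graph, source):
    dist, frontier, level = {source: 0}, {source}, 0
    while frontier:
        level += 1
        frontier = filter_visited(advance(graph, frontier), dist)
        dist.update((v, level) for v in frontier)
    return dist
```

The framework's job is to make `advance` load-balanced across irregular vertex degrees on the GPU; the algorithm author only writes the loop above.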
 
Topics:
Accelerated Data Science, Algorithms & Numerical Techniques, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8297
Streaming:
Download:
 
Abstract:
How do meteorologists predict weather events such as hurricanes, typhoons, and heavy rain? Predicting weather events has traditionally been done with supercomputer (HPC) simulations using numerical models such as WRF, UM, and MPAS. Recently, however, much deep learning-based research has been showing outstanding results of various kinds. We'll introduce several case studies related to meteorological research, describe how meteorological tasks differ from general deep learning tasks, and cover their detailed approaches and their input data, such as weather radar images and satellite images. We'll also cover typhoon detection and tracking, rainfall amount prediction, forecasting future cloud cover, and more.
 
Topics:
AI Application, Deployment & Inference, Climate, Weather & Ocean Modeling, Computer Vision, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8816
Streaming:
Download:
 
Abstract:
Microsoft Research leverages a wide variety of open-source, free, and custom tools to manage a complex infrastructure for doing research. We are in a unique position at Microsoft and in the industry: we serve academic experts who expect access to the latest open source tools, in an environment where Microsoft solutions should also be considered. See examples of how we manage popular, constrained assets and enforce fairness across many systems. Linux/Docker, Windows, on-site, Azure, or a hybrid of all of the above: we see it all. In this session, you will learn what tools can be easily leveraged to manage your own on-site and cloud GPU infrastructure. We touch on cluster management fabrics, scheduling, authentication, hot storage, configuration management, software portability and container management, and high-performance hardware selection.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8663
Streaming:
Download:
 
Abstract:
We'll talk about how to use Singularity to containerize deep learning applications. We'll provide compelling reasons to choose Singularity over Docker. We'll cover deep learning frameworks, including TensorFlow, NV-Caffe, MXNet, and others. We'll present the current challenges and workarounds when using Singularity in an HPC cluster. We'll compare the performance of Singularity to bare-metal systems.
 
Topics:
AI Application, Deployment & Inference, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8368
 
Abstract:
Hardware is getting smarter every day. GPUs, hardware-accelerated networks, and non-volatile memories are increasingly replacing capabilities offered today by operating systems and software libraries, and they are becoming available on premise and in clouds. Leveraging them in your application can yield orders-of-magnitude improvements in latency and throughput, along with much smaller code bases. We present simple abstractions exposing hardware capabilities, plus work-in-progress demos: data storage using the hardware erasure codes present in recent network adapters, streaming data from storage to GPUs using RDMA, and executing a deep learning distributed compute graph entirely in hardware using GPUDirect Async. Our demos are attempts to replace large code bases with a few lines of Python, using interchangeable and unified hardware abstractions, so data and control events can flow directly device-to-device.
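The talk's demo offloads erasure coding to the network adapter; as a purely illustrative software sketch of the underlying idea, here is the simplest single-parity scheme (RAID-5 style), where one lost data shard can be rebuilt by XOR-ing the survivors:

```python
# Minimal software sketch of a single-parity erasure code.
# This pure-Python version only illustrates the idea that the
# session's hardware-offloaded demo implements on the NIC.

def encode(shards):
    """Return the data shards plus one XOR parity shard."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return list(shards) + [bytes(parity)]

def recover(shards, lost_index):
    """Rebuild one missing shard by XOR-ing all the others."""
    length = len(next(s for s in shards if s is not None))
    rebuilt = bytearray(length)
    for idx, shard in enumerate(shards):
        if idx == lost_index:
            continue
        for i, b in enumerate(shard):
            rebuilt[i] ^= b
    return bytes(rebuilt)

data = [b"abcd", b"efgh", b"ijkl"]
coded = encode(data)
coded[1] = None                 # simulate losing one shard
print(recover(coded, 1))        # b'efgh'
```

Production codes (like the Reed-Solomon variants NICs accelerate) tolerate multiple losses, but the recovery principle is the same linear-combination trick.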
 
Topics:
Data Center & Cloud Infrastructure, Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8154
 
Abstract:
You'll learn about enabling virtualized GPUs for machine learning workloads on VMware vSphere, combining GPU performance with the data center management benefits of vSphere. NVIDIA's Pascal GPU is the first GPU to offer both virtualized compute/CUDA and virtualized graphics, supporting multiple virtual machines (VMs) sharing a GPU for both capabilities. We will present our research results for machine learning workloads on the vSphere platform using NVIDIA's virtualized GPUs. Learn different ways to deploy GPU-based workloads developed with popular machine learning frameworks like TensorFlow and Caffe using VMware DirectPath I/O and NVIDIA vGPU solutions. We will discuss use cases for the scheduling methods Equal Share, Fixed Share, and Best Effort for virtualized GPUs and illustrate their benefits via our performance study. We address the scalability of machine learning workloads in terms of the number of VMs per vSphere server and the number of GPUs per VM. Data center resource utilization of these workloads on vSphere with NVIDIA GPUs is also analyzed and presented.
 
Topics:
GPU Virtualization, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8249
 
Abstract:
We'll present Hornet, formerly known as cuSTINGER, a data structure designed for sparse dynamic graphs and matrices. Hornet scales to massive datasets while supporting very fast updates: over 200 million updates per second on a single Tesla P100 GPU. We'll show that replacing CSR, a popular data structure for sparse data, with Hornet does not change the execution time, and that the memory utilization of Hornet is comparable to that of CSR and COO. We'll also briefly show performance results of several analytics using Hornet; the programming model for Hornet is covered in a separate talk.
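For context on the static format Hornet is designed to replace, here is a minimal CSR (compressed sparse row) build from an edge list. This is generic CSR, not Hornet's API:

```python
# Build a CSR adjacency structure from an edge list.
# offsets[v]..offsets[v+1] bounds vertex v's neighbors in `adjacency`.

def build_csr(num_vertices, edges):
    """edges: iterable of (src, dst) pairs. Returns (offsets, adjacency)."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]          # prefix sum -> row starts
    adjacency = [0] * len(edges)
    cursor = offsets[:-1].copy()              # next free slot per row
    for src, dst in edges:
        adjacency[cursor[src]] = dst
        cursor[src] += 1
    return offsets, adjacency

offsets, adjacency = build_csr(4, [(0, 1), (0, 2), (2, 3), (3, 0)])
print(offsets)    # [0, 2, 2, 3, 4]
print(adjacency)  # [1, 2, 3, 0]
```

The tightly packed arrays are what make CSR fast to traverse but slow to update: inserting a single edge shifts every later entry, which is exactly the cost dynamic structures like Hornet are built to avoid.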
 
Topics:
Accelerated Data Science, Tools & Libraries, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8177
 
Abstract:
Enterprise digital workspaces support diverse workloads, including virtual desktops, deep learning, and big data. NVIDIA GPUs bring high performance computing (HPC) to graphics, GPGPU, and especially machine learning workloads; they also provide hardware encode and decode to accelerate the processing of video content. In this session, we will explore the performance and resource utilization of various workloads that leverage different capabilities of the GPU, such as graphics, compute, and H.264 hardware encode/decode. NVIDIA virtualized GPUs and VMware vSphere together bring tremendous benefits for both GPU-based workloads and data center management via virtualization. We will present results of our research on running diverse workloads on the vSphere platform using NVIDIA GRID GPUs, and explore the vSphere features of suspend/resume and vMotion for vGPU-based virtual machines. We will quantify the benefits of vGPU for data center management using VMware vSphere and describe techniques for efficient management of workloads and data center resources.
 
Topics:
Data Center & Cloud Infrastructure, GPU Virtualization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8250
 
Abstract:
The functional mapping of man-made facilities from high-resolution remote sensing images provides timely, high-fidelity land-use information and population distribution estimates, which helps federal and non-governmental agencies and industry operate efficiently. We'll share our journey to deliver functional maps of the world that include building extraction, human settlement maps, mobile home parks, and facility mapping using a variety of remote sensing imagery. Our research addresses three frontier challenges: 1) distinct characteristics of remote sensing data for deep learning, including the model distribution shifts encountered with remote sensing images, multi-sensor sources, and data multi-modalities; 2) training very large deep-learning models using multi-GPU and multi-node HPC platforms; 3) large-scale inference using ORNL's Titan and Summit with NVIDIA TensorRT. We'll also talk about developing workflows to minimize I/O inefficiency, doing parallel gradient-descent learning, and managing remote sensing data in an HPC environment.
 
Topics:
Computer Vision, GIS, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8420
 
Abstract:
We'll present the architectural details of the Volta GPU discovered via our micro-benchmarks, revealing the geometry and latency of Volta's complex memory hierarchy, the format of its encoded instructions, and the latency of commonly used instructions. This knowledge enables developers to craft better optimized code than is currently possible through publicly available information and tool chains.
 
Topics:
Finance - Quantitative Risk & Derivative Calculations, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8122
 
Abstract:
We'll review recent efforts to compress fully connected layers in machine learning via tensor networks, including the Tensor Train format, the Tensor Contraction Layer, the Tensor Regression Layer, and the Tensor Ring decomposition. These decompositions, in supplementing or replacing fully connected layers, dramatically reduce the number of parameters required by the network without resorting to sparsity and without increasing error: we've shown 55-80 percent compression of the entire network with less than one percent loss in accuracy. These tensor layers can be used in end-to-end training, fine-tuning, and transfer learning by initializing the decomposition with a pre-trained fully connected layer. Furthermore, because the forward and backward passes of the network rely on dense tensor contractions, these methods retain high computational intensity and can be efficiently evaluated on GPUs.
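The simplest relative of these tensor decompositions is a rank-r matrix factorization of a dense layer's weight matrix. A NumPy sketch (the shapes and rank are arbitrary assumptions, not the talk's settings) makes the parameter savings concrete:

```python
# Low-rank factorization of a fully connected layer W (m x n) into
# A (m x r) @ B (r x n), initialized from a truncated SVD.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 512, 256, 16            # layer shape and chosen rank
W = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]              # m x r factor (singular values folded in)
B = Vt[:r, :]                     # r x n factor

x = rng.standard_normal(n)
y_full = W @ x                    # original dense layer
y_lowrank = A @ (B @ x)           # two thin matmuls instead of one dense

params_full = m * n               # 131072
params_lowrank = r * (m + n)      # 12288 -> roughly 10x fewer parameters
print(params_full, params_lowrank)
```

For a random matrix, rank 16 is of course a crude approximation; in practice, as the abstract describes, the factors are initialized from a pre-trained layer and then fine-tuned end to end, and the tensor formats in the talk generalize this idea beyond a single matrix factorization.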
 
Topics:
AI & Deep Learning Research, Algorithms & Numerical Techniques, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8807
 
Abstract:
The Department of Energy (DOE) entered into a partnership with the National Cancer Institute (NCI) of the National Institutes of Health (NIH) to accelerate cancer research. This "Cancer Moonshot" aims to tackle three main objectives: better understand the mechanisms of cancer, use large amounts of diverse medical data for predictive models, and enable precision medicine by providing treatment guidance to individual patients. Leveraging DOE's expertise in high performance computing (HPC) and new methods for deep learning in artificial intelligence, this HPC+AI approach aims to create a single scalable deep neural network code called CANDLE (CANcer Distributed Learning Environment) that will be used to address all three challenges. This talk gives an overview of the project and highlights how GPU-accelerated systems in the DOE ecosystem, Summit and Sierra, have contributed to it.
 
Topics:
AI & Deep Learning Research, HPC and AI, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81033
 
Abstract:
This talk presents a groundbreaking multi-factor genomic, phenotypic & clinical data association platform and its use in building accurate disease risk models and clinical decision support tools.
 
Topics:
Computational Biology & Chemistry, Intelligent Machines, IoT & Robotics, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23018
 
Abstract:

Across the Mediterranean basins, the Messinian salinity crisis resulted in the deposition of an up to 2-km-thick, multi-layered evaporitic succession consisting of alternating layers of halite and clastics. Such geological objects obscure seismic imaging and may even be over-pressurized, posing potential drilling hazards that are often hard to predict. We demonstrate the TPDOT&TWSM approach developed at IPGG SB RAS by evaluating the penetration of interference wavefield fragments into the shadow zone for a real geological case from the Levant Basin, offshore Israel. Using GPUs accelerated the TWSM algorithm, which is based on many large matrix-vector operations, by a factor of several hundred or more.
 
Topics:
Other
Type:
Poster
Event:
GTC Europe
Year:
2017
Session ID:
P23046
 
Abstract:

Learn how GPU-based computational fluid dynamics (CFD) paves the way for affordable high-fidelity simulations of automotive aerodynamics. Highly resolved, transient CFD simulations on pure CPU systems are computationally expensive and constrained by available computational resources, which has posed a major challenge for automotive OEMs in their aerodynamic design process for many years. To overcome this problem, we present ultraFluidX, a novel CFD solver specifically designed to leverage the massively parallel architecture of GPUs. With its multi-GPU implementation based on CUDA-aware MPI, the tool can achieve turnaround times of just a few hours for simulations of fully detailed, production-level passenger and heavy-duty vehicles: a breakthrough for simulation-based design.
 
Topics:
Computational Fluid Dynamics, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23327
 
Abstract:

Come and learn about new fast low-rank matrix computations on GPUs! By exploiting the low-rank off-diagonal block structure, we design and implement fast linear algebra operations on massively parallel hardware architectures. The main idea is to refactor the numerical algorithms and the corresponding implementations by aggregating similar numerical operations into highly optimized batched kernels. Applications in weather prediction, seismic imaging, and materials science are employed to assess the trade-off between numerical accuracy and parallel performance of these fast matrix computations compared to more traditional approaches.
 
Topics:
Algorithms & Numerical Techniques, Tools & Libraries, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23367
 
Abstract:

Attendees will learn how the behavior of the human brain is simulated using current computers and the different challenges the implementation has to deal with. We cover the main steps of the simulation and the methodologies behind it, highlighting in particular the transformations and optimizations carried out to achieve good performance on NVIDIA GPUs.
 
Topics:
Algorithms & Numerical Techniques, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23076
 
Abstract:

As embedded software in intelligent vehicles becomes more and more complex, it becomes critical for automakers and suppliers to use advanced and efficient software solutions. Learn how to dramatically reduce development cycles and simplify the deployment of critical real-time applications on embedded targets. In this presentation we will show how RTMaps Embedded facilitates porting designs from early prototyping stages on PCs down to the most recent ECUs designed for production. RTMaps is component-based software that facilitates the design and execution of ADAS and highly automated driving (HAD) applications. It offers an easy-to-use drag-and-drop approach for GPU-based computer vision and AI systems, including an integration of the NVIDIA DriveWorks software modules as independent building blocks.
 
Topics:
Autonomous Vehicles, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23106
 
Abstract:

The goal of the session is to dive deep into the key technical building blocks of interactive computer-aided engineering (CAE) and to understand, through specific prototypes, how GPU computing will impact it. Using the example of interactive design assistants, we will explain the ingredients of future GPU-based simulation codes: (i) multi-level voxel geometry representation, from integration to finite elements; (ii) indirect (weak) realization of boundary conditions; (iii) (non-linear) geometric multi-grid methods. By streamlining all algorithms with respect to the GPU, state-of-the-art industrial solutions are outperformed by orders of magnitude in computational efficiency while conserving accuracy. This is shown with a few prototypes working towards the vision of a virtual maker space.
 
Topics:
Algorithms & Numerical Techniques, Computer Aided Engineering, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23113
 
Abstract:

Polymatica is an OLAP and data mining server with a hybrid CPU+GPU architecture that turns analytical work on billion-record data volumes into a proactive process with no waiting. Polymatica uses NVIDIA multi-GPU systems (e.g., DGX-1) for critical operations on billions of raw business data records, which eliminates pauses and accelerates analytical operations by up to a hundred times. You'll see the performance difference on a real analytical process in retail across different hardware: 1) CPU-only calculations on 2x Intel Xeon, no GPU; 2) 2x Intel Xeon + a single Tesla P100; 3) DGX-1: 2x Intel Xeon + 8x Tesla P100. Polymatica on DGX-1 becomes the fastest OLAP and data mining engine, allowing advanced analytics on datasets of billions of records.
 
Topics:
Accelerated Data Science, Algorithms & Numerical Techniques, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23164
 
Abstract:

Come and learn how the grand challenge of controlling adaptive optics systems on future Extremely Large Telescopes is being solved using GPUs. As part of Green Flash, an EU-funded international joint industrial and academic project, our team is developing GPU-based solutions for the real-time control of large optical systems operating in tough environments. This includes the hard real-time data pipeline, the soft real-time supervisor module, and a real-time-capable numerical simulation to test and verify the proposed solutions. We will discuss how the unprecedented memory bandwidth provided by HBM2 on the new Pascal architecture is changing the game in dimensioning these complex real-time computers, which crunch up to 200 Gb/s of noisy data.
 
Topics:
Astronomy & Astrophysics, Computer Vision, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23171
 
Abstract:
New high order, high resolution hybrid MPI-CUDA codes for the simulation of turbulent flows on many distributed GPUs will be presented.
 
Topics:
Computational Fluid Dynamics, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23179
 
Abstract:

Learn a simple strategy guideline for optimizing application runtime. The strategy is based on four steps and illustrated on a two-dimensional discontinuous Galerkin solver for computational fluid dynamics on structured meshes. Starting from a sequential CPU code, we guide the audience through the different steps that allowed us to speed up the code on a GPU by around 149 times relative to the original runtime (evaluated on a K20Xm). The same optimization strategy applied to the CPU code yields a speedup of around 35 times (evaluated on an E5-1650v3 processor). Based on this methodology, we finally end up with an optimized unified version of the code which can run on both GPU and CPU architectures.
 
Topics:
Algorithms & Numerical Techniques, Computational Fluid Dynamics, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23191
 
Abstract:
This talk will give a guideline of what it takes to create the next level of immersive entertainment. VR location-based entertainment lives at the forefront of this exciting new medium. Technologies such as laser scanning, photogrammetry, and volumetric capture will be showcased to give an understanding of how we can achieve photo-real results. Leif has a 20-year history in visual effects and has been a virtual reality enthusiast since the '90s. His talk will show first-hand examples of these technologies and how they can be applied, and will outline how modern GPUs help create the next level of immersion. Recently finished VR projects for Audi, as well as the location-based multiplayer platform HOLOGATE, will serve as examples of how this technology can be applied.
 
Topics:
Gaming and AI, HPC and AI, Real-Time Graphics
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23229
 
Abstract:
Learn how we explored the feasibility of porting YALES2 to the GPU. YALES2 is an HPC application for turbulent combustion modeling, from primary atomization to pollutant prediction, on massive complex meshes. It runs over thousands of CPU cores, solving meshes of several billion elements through MPI+OpenMP programming. The work presented here focuses on a preliminary feasibility study of GPU porting. In this session we will describe a methodology for porting a large code to the GPU, the choices that were made regarding the different constraints, and the performance results. We will also present the final benchmarks run across several platforms, from a classic Intel+Kepler cluster at the ROMEO HPC Center (University of Reims, France) to prototypes with IBM POWER8+Pascal at IDRIS (CNRS, France).
 
Topics:
Computational Fluid Dynamics, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23254
 
Abstract:

Learn how large requests on big datasets, such as production or finance data, can benefit from hybrid engine approaches to calculating on in-memory databases. While hybrid architectures are state of the art in specialized calculation scenarios (e.g., linear algebra), multi-GPU or even multicore usage in database servers is still far from everyday use. In general, the usual approach to handling requests on large datasets is to scale the database resources by adding new hardware nodes to the compute cluster. Instead, we use intelligent request planning and load balancing to distribute the calculations to multi-GPU and multicore engines in one node. These calculation engines are specifically designed for handling hundreds of millions of cells in parallel with minimal merging overhead.
 
Topics:
Accelerated Data Science, Performance Optimization, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23294
 
Abstract:

This session will show how we combine high-performance GPU processing with deep learning (DL). We use automated tomographic imaging microscopes for various studies in physics and biology. These systems have a raw data flow of up to 2 GB/s, making real-time (RT) data processing mandatory. To make the system more intelligent, an advanced processing pipeline must be incorporated, but so far DL inference speed doesn't allow us to apply it to all the data. To address the problem, we are designing a hybrid system that allows DL usage for high-throughput microscopy in RT. The concepts and approaches we use to design the system will be illustrated with examples from high energy physics and biology.
 
Topics:
Computer Vision, HPC and AI, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23307
 
Abstract:

Online shopping is nothing if not efficient. Walmart, together with New Jersey startup Jet, takes things a step further, using AI and deep learning to optimize their entire e-commerce business. The first AI application we discuss is Jet's unique smart merchant selection: the platform finds the best merchant and warehouse combination in real time so that the total order cost is as low as possible. Then we show how to efficiently pack fresh and frozen orders with deep reinforcement learning. The value of this approach is not just in finding the best boxes and the tightest packing, but also the least amount of coolant and its placement so that the temperature of all items stays within the required limits during shipment.
 
Topics:
Algorithms & Numerical Techniques, Other, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23145
 
Abstract:

Come join us and learn how to build data-centric GPU clusters for artificial intelligence. We will briefly present the state-of-the-art techniques for distributed machine learning and the special requirements they impose on the GPU cluster, along with an overview of the interconnect technologies used to scale and accelerate distributed machine learning. During the session we will cover RDMA, NVIDIA's GPUDirect RDMA and GPUDirect Async, as well as in-network computing, and how the use of these technologies enables a new level of scalability and performance in large-scale deployments for artificial intelligence and high performance computing.
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23200
 
Abstract:

Calculation of surface normals can be crucial to the process of extracting useful information from point clouds. Surface normals give an estimate of the objects in the scene, which can be important for more complex algorithms like feature extraction using machine learning techniques. In this poster, we present our implementation of normal estimation on a GPU and a CPU and show results for both platforms. We show that the GPU implementation on a Quadro M4000 graphics card can be up to an order of magnitude faster or more when compared to a CPU implementation on a rather modest desktop Xeon workstation. To substantiate our findings, we also share profiling information and plots of the distribution of errors in our approach.
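A standard way to estimate a point's normal, and a plausible sketch of what such an implementation computes (not the poster's actual code), is PCA over the point's neighborhood: the normal is the eigenvector of the neighborhood covariance with the smallest eigenvalue.

```python
# PCA-based surface-normal estimation for a point cloud: the normal at
# a point is the direction of least variance among its neighbors.
import numpy as np

def estimate_normal(points, index, k=8):
    """Unit normal at points[index], estimated from its k nearest neighbors."""
    d = np.linalg.norm(points - points[index], axis=1)
    neighbors = points[np.argsort(d)[:k]]
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    return eigvecs[:, 0]                      # smallest-eigenvalue direction

# Points scattered on the z = 0 plane: the estimated normal
# should be the z axis, up to sign.
rng = np.random.default_rng(1)
cloud = np.column_stack([rng.uniform(size=50), rng.uniform(size=50),
                         np.zeros(50)])
print(estimate_normal(cloud, 0))
```

The per-point independence of this computation is what makes it map so well to a GPU: each point's neighborhood search and small eigendecomposition can run in its own thread or block.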
 
Topics:
Algorithms & Numerical Techniques, HD Mapping, HPC and AI
Type:
Poster
Event:
GTC Europe
Year:
2017
Session ID:
P23018
 
Abstract:

Deep learning optimization in real-world applications is often limited by the lack of valuable data, either due to missing labels or the sparseness of relevant events (e.g., failures, anomalies) in the dataset. We face this problem when optimizing dispatching and rerouting decisions in the Swiss railway network, where the recorded data is variable over time and only contains a few valuable events. To overcome this deficiency, we use the high computational power of modern GPUs to simulate millions of physically plausible scenarios. We use this artificial data to train our deep reinforcement learning algorithms to find and evaluate novel, optimal dispatching and rerouting strategies.
 
Topics:
Accelerated Data Science, Other, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23163
 
Abstract:

A key driver for pushing high-performance computing is the enablement of new research. One of the biggest and most exiting scientific challenge requiring high-performance computing is to decode the human brain. Many of the research topics in this field require scalable compute resources or the use of advance data analytics methods (including deep learning) for processing extreme scale data volumes. GPUs are a key enabling technology and we will thus focus on the opportunities for using these for computing, data analytics and visualisation. GPU-accelerated servers based on POWER processors are here of particular interest due to the tight integration of CPU and GPU using NVLink and the enhanced data transport capabilities.

 
Topics:
Accelerated Data Science, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23189
 
Abstract:

We present our experience of running computationally intensive camera-based perception algorithms on NVIDIA GPUs. Geometric (depth) and semantic (classification) information is fused in the form of semantic stixels, which provide a rich and compact representation of the traffic scene. We present several strategies for reducing the computational complexity of the algorithms. Using synthetic data generated by the SYNTHIA tool, including slanted roads from a simulation of the city of San Francisco, we evaluate latencies and frame rates on a DRIVE PX 2-based platform.

 
Topics:
Autonomous Vehicles, Computer Vision, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23196
 
Abstract:

Learn how one of the leading institutes for global weather prediction, the European Centre for Medium-Range Weather Forecasts (ECMWF), is preparing for exascale supercomputing and the efficient use of future HPC hardware. I will outline the main reasons why it is difficult to design efficient weather and climate models and provide an overview of the ongoing community effort to achieve the best possible model performance on existing and future HPC architectures. I will present the EU H2020 projects ESCAPE and ESiWACE and discuss recent approaches to increasing computing performance in weather and climate modelling, such as the use of reduced numerical precision and deep learning.
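The reduced-numerical-precision approach mentioned in this abstract trades accuracy for speed and memory, and that trade-off has to be validated carefully. The following Python sketch (an illustration with invented numbers, not ECMWF code; the `quantize` helper is a hypothetical stand-in for a low-precision number format) truncates the floating-point mantissa and shows how accumulating many small tendencies, as a time-stepping model does, can stagnate at reduced precision:

```python
import math

def quantize(x, mantissa_bits):
    """Round x to the nearest float with the given mantissa width,
    emulating a reduced-precision number format."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

# Accumulate many small "tendencies", as a time-stepping model would,
# at full (double) precision vs. a half-precision-like 10-bit mantissa.
full, reduced = 0.0, 0.0
for _ in range(100_000):
    full += 1e-4
    reduced = quantize(reduced + quantize(1e-4, 10), 10)

print(full, reduced)  # full reaches ~10.0; reduced stalls well below 1.0
```

Once the running sum grows large enough that each small increment falls below half a unit in the last place, the rounded sum stops changing, which is exactly the kind of behaviour mixed-precision model designs have to guard against.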

 
Topics:
Computational Fluid Dynamics, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23348
 
Abstract:

NVIDIA DGX Systems powered by Volta deliver breakthrough performance for today's most popular deep learning frameworks. Attend this session to hear from DGX product experts and gain insights that will help researchers, developers, and data science practitioners accelerate training and iterate faster than ever. Learn (1) best practices for deploying an end-to-end deep learning practice, (2) how the newest DGX systems, including DGX Station, address the bottlenecks impacting your data science, and (3) how DGX software, including optimized deep learning frameworks, gives your environment a performance advantage over GPU hardware alone.

 
Topics:
Accelerated Data Science, Computer Vision, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23370
 
Abstract:

The WCHG and BDI at the University of Oxford have an established research computing platform for genomics, statistical genetics, and structural biology research, and I will outline how we are developing this platform to include a significant GPU infrastructure that supports our researchers' great wave of enthusiasm for exploring the potential of deep learning and AI in life sciences research. We are deploying a mixture of GPU architectures and deep learning frameworks, and I will report on our current plans and the initial areas of research in the life sciences that show promise for AI.

 
Topics:
Computational Biology & Chemistry, HPC and AI
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23451
 
Abstract:
We look ahead at the future of computing technology and how NVIDIA is preparing for it. We also present how RAPIDS, a GPU-accelerated platform designed for data science and machine learning, extends the technology into the last domains that GPUs had not yet accelerated.
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8101
 
Abstract:
DGX SATURNV is the deep learning training infrastructure that NVIDIA uses internally. Its architecture was published in the DGX POD white paper, and the deployment know-how was released through the open-source NVIDIA DeepOps project. In this talk, we examine the latest NVIDIA technologies used in a deep learning training infrastructure built this way, and how they can be applied in practice.
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8103
 
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8110
 
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8111
 
Abstract:
NVIDIA announced the Turing architecture, which delivers significant advances in deep learning inference, professional graphics applications, and PC gaming, and released CUDA 10, highly optimized for this next-generation architecture. In this talk, we introduce their key features, including the new Tensor Cores, RT Cores, CUDA Graphs, library improvements, and new developer tools, while emphasizing their importance for the future of parallel software development.
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8116
 
Abstract:
This session explains how to build an effective GPU-based inference platform. We introduce the hardware features and performance figures of the Tesla T4, which is optimized for inference, and present the new updates in TensorRT 5.0 that build on it. Finally, we explain how to build an effective production deployment cluster for deep learning using the TensorRT Inference Server, covering the client SDK sample code and Kubernetes integration.
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8120
 
Abstract:
Five years have passed since NVIDIA introduced the industry's first virtual GPU solution. Through continuous research and development, the innovative concept of virtualizing and sharing GPUs can now be applied to every field that requires GPU acceleration. We present how virtual GPUs can evolve AI and deep learning workloads into a truly virtualized environment.
 
Topics:
Artificial Intelligence and Deep Learning, HPC and AI
Type:
Talk
Event:
AI Conference Korea
Year:
2018
Session ID:
SKR8123