GTC ON-DEMAND
Abstract:
We'll show how to plan the deployment of AI infrastructure at scale with DGX software using DeepOps, from design and deployment to management and monitoring. DeepOps, an NVIDIA open-source project, is used for the deployment and management of DGX POD clusters. It's also used to deploy Kubernetes and Slurm in an on-premises, optionally air-gapped data center. The modularity of the Ansible scripts in DeepOps gives experienced DevOps administrators the flexibility to customize the deployment to their specific IT infrastructure requirements, whether that means implementing a high-performance benchmarking cluster or providing a data science team with Jupyter Notebooks that tap into GPUs. We'll also describe how to leverage AI infrastructure at scale to support interactive training, machine learning pipelines, and inference use cases.
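To illustrate the modularity described above: a DeepOps deployment is driven by an ordinary Ansible inventory, so administrators choose which groups of nodes run which playbooks. The sketch below shows the general shape only — the hostnames, addresses, and exact group layout are placeholders for illustration, not the project's shipped defaults:

```
# Illustrative Ansible inventory for a small DGX cluster managed with DeepOps.
# Hostnames and IPs are placeholders; group names should match the playbooks
# you intend to run (e.g., a Slurm cluster vs. a Kubernetes cluster).
[all]
mgmt01  ansible_host=10.0.0.2
dgx01   ansible_host=10.0.0.11
dgx02   ansible_host=10.0.0.12

# For a Slurm-based batch cluster:
[slurm-master]
mgmt01

[slurm-node]
dgx01
dgx02

# Or, for a Kubernetes-based deployment:
[kube-master]
mgmt01

[kube-node]
dgx01
dgx02
```

Because each playbook targets its own groups, the same inventory can drive a benchmarking cluster, a Jupyter-for-data-science setup, or both, without changing the other pieces.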
 
Topics:
AI & Deep Learning Research, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Washington D.C.
Year:
2019
Session ID:
DC91207
 
Abstract:

Do you have a GPU cluster or air-gapped environment that you are responsible for, but don't have an HPC background? NVIDIA DGX POD is a new way of thinking about AI infrastructure, combining DGX servers with networking and storage to accelerate AI workflow deployment and time to insight. We'll discuss lessons learned about building, deploying, and managing AI infrastructure at scale, from design to deployment to management and monitoring. We'll show how the DGX POD management software (DeepOps), along with our storage partners' reference architectures, can be used to deploy and manage multi-node GPU clusters for deep learning and HPC environments in an on-premises, optionally air-gapped data center. The modular nature of the software also lets experienced administrators pick and choose the pieces that are useful to them, making the process compatible with their existing software or infrastructure.
 
Topics:
Data Center & Cloud Infrastructure, AI Application, Deployment & Inference
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9334
 
Abstract:
This tutorial will cover the issues encountered when deploying NVIDIA DGX-1 or DGX Station systems into a secure environment. For security reasons, some installations require that systems be isolated from the internet or outside networks. Since most DGX-1 software updates are delivered over the network from NVIDIA servers, this session will walk participants through how updates can be made by maintaining an intermediary server. The session combines lecture, live demos, and detailed instructions.
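One common shape for the intermediary-server approach is to mirror the required package repositories on a machine that can reach both networks, then point the air-gapped systems' package sources at that mirror. The fragment below is a generic Ubuntu/apt sketch of the second half of that pattern, not NVIDIA's official procedure — the hostname, repository paths, and suite name are all placeholders that depend on your environment and DGX OS release:

```
# /etc/apt/sources.list on an air-gapped DGX system (illustrative).
# "mirror.internal" is a placeholder for the intermediary server that
# carries a copy of the upstream repositories; "<suite>" depends on the
# Ubuntu base of your DGX OS release.
deb http://mirror.internal/ubuntu <suite> main restricted universe multiverse
deb http://mirror.internal/ubuntu <suite>-updates main restricted universe multiverse
deb http://mirror.internal/ubuntu <suite>-security main restricted universe multiverse
```

With this in place, routine `apt` update workflows on the isolated systems proceed normally, while only the intermediary server ever touches the outside network.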
 
Topics:
AI & Deep Learning Research, Data Center & Cloud Infrastructure
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8568
 
 