GTC On-Demand

Abstract:
The sheer capacity of the current generation of GPUs to create and run large volumes of AI models has exposed numerous new problems, and we will show how to fix them. Exponential compute growth has opened the door to creating and testing hundreds or thousands of models at once, rather than the one-by-one approach of the past. These models consume and generate data from both batch and real-time sources. As data is enriched and parameters are tuned and explored, everything needs to be versioned, including the data itself. The issues resemble familiar software engineering problems, but the complexity introduced by vast amounts of data demands new approaches. We will discuss these specific problems and the approaches that fix them.
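To illustrate the kind of "version everything" workflow the talk describes, here is a minimal sketch (not the speakers' implementation) that content-addresses a dataset snapshot and records it alongside each model's parameters; the paths, parameter names, and registry file are hypothetical.

import hashlib
import json
import pathlib
import time

def snapshot_digest(data_dir: str) -> str:
    """Hash every file under a data directory to produce a stable data-version id."""
    digest = hashlib.sha256()
    for path in sorted(pathlib.Path(data_dir).rglob("*")):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()[:12]

def record_experiment(data_dir: str, params: dict, registry: str = "experiments.jsonl") -> dict:
    """Append one experiment record tying the data version to the model parameters."""
    entry = {"data_version": snapshot_digest(data_dir), "params": params, "timestamp": time.time()}
    with open(registry, "a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry

# Hypothetical usage: record_experiment("datasets/clickstream", {"lr": 1e-3, "layers": 4})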
 
Topics:
AI Application Deployment and Inference, Data Center and Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9683
 
Abstract:

The audience will learn, through examples and common practices, how to use Kubernetes to leverage NVIDIA GPU computing power when building deep learning (DL) models. We use a converged data platform as the data infrastructure, providing a distributed file system, key-value storage, and streams. Kubernetes serves as the orchestration layer, managing containers to scale out training and deployment of DL models across heterogeneous GPU clusters. We also leverage the platform's publish/subscribe streams to build next-generation applications with DL models and to monitor model performance and shifts in feature distributions.
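To illustrate the orchestration pattern described above, here is a minimal sketch (not code from the session) that uses the official Kubernetes Python client to request NVIDIA GPUs for a training container; it assumes the NVIDIA device plugin is installed on the cluster, and the container image and command are placeholders.

from kubernetes import client, config

def launch_training_pod(name: str = "dl-train", gpus: int = 1) -> None:
    """Create a single pod whose container requests `gpus` NVIDIA GPUs."""
    config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
    container = client.V1Container(
        name=name,
        image="nvcr.io/nvidia/tensorflow:19.03-py3",  # placeholder image
        command=["python", "train.py"],               # placeholder entry point
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": str(gpus)}      # GPUs exposed by the NVIDIA device plugin
        ),
    )
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

# Hypothetical usage: launch_training_pod(gpus=2) schedules the job on a node with two free GPUs.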
 
Topics:
Deep Learning and AI, Data Center and Cloud Computing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8138