GTC On-Demand

AI Application Deployment and Inference
Maximizing Utilization for Data Center Inference with TensorRT Inference Server
Abstract:
As the use of AI has grown, so has the need for a production-quality AI inference solution. We'll discuss the latest additions to NVIDIA's TensorRT Inference Server and describe deployment examples to help you plan your data center production inference architecture. NVIDIA TensorRT Inference Server makes it possible to use inference efficiently in applications without reinventing the wheel. We'll talk about how TensorRT Inference Server supports the top AI frameworks and custom backends, and how it maximizes utilization by hosting multiple models per GPU and across GPUs with dynamic request batching. Our talk will also cover how the inference server supports Kubernetes with health and latency metrics and integrates with Kubeflow for simplified deployment.
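
To make the idea of dynamic request batching concrete, here is a minimal Python sketch. It is not the inference server's implementation or API; the function name `form_batches` and its parameters `max_batch_size` and `max_queue_delay` are hypothetical, chosen only for illustration. It assumes requests arrive one at a time and are grouped into a batch until either the batch is full or too much time has passed since the batch's first request.

```python
from typing import List, Tuple


def form_batches(
    arrivals: List[Tuple[float, str]],
    max_batch_size: int,
    max_queue_delay: float,
) -> List[List[str]]:
    """Group individual requests into batches (illustrative only).

    arrivals: (arrival_time_seconds, request_id) pairs, sorted by time.
    A batch is closed when it reaches max_batch_size, or when the next
    request arrives more than max_queue_delay seconds after the first
    request of the current batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    window_start = 0.0

    for arrival_time, request_id in arrivals:
        # The waiting window of the current batch has expired: dispatch it.
        if current and arrival_time - window_start > max_queue_delay:
            batches.append(current)
            current = []
        # The first request of a new batch opens a new waiting window.
        if not current:
            window_start = arrival_time
        current.append(request_id)
        # The batch is full: dispatch it immediately.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []

    # Dispatch whatever is left once the stream ends.
    if current:
        batches.append(current)
    return batches


requests = [(0.000, "a"), (0.001, "b"), (0.002, "c"), (0.010, "d"), (0.011, "e")]
print(form_batches(requests, max_batch_size=4, max_queue_delay=0.005))
```

In this sketch a larger `max_queue_delay` trades a little latency for fuller batches, which is the general tension any batching scheduler has to balance between GPU utilization and response time.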
 
Topics:
AI Application Deployment and Inference, Data Center and Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9438