GTC ON-DEMAND

Abstract:
We'll present a fast, highly accurate, and customizable object-detection network optimized for training and inference on GPUs. After describing the network architecture, we'll dive into how the different stages of the training workflow are accelerated. Our techniques include data ingestion and augmentation, mixed precision, and multi-GPU training. We'll demonstrate how we optimized our network for deployment without loss of accuracy using ONNX and NVIDIA TensorRT. We'll also show how to create TensorRT plugins for post-processing to perform inference entirely on the GPU. This session will be a combination of lecture and demos.
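The session's network and training code are not part of this listing; purely as an illustration of the mixed-precision technique mentioned above, here is a minimal PyTorch sketch using torch.cuda.amp with a toy model and synthetic data. The model, shapes, and hyperparameters are placeholders, not the session's; for multi-GPU training the model would additionally be wrapped in torch.nn.parallel.DistributedDataParallel.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy stand-in for the detection network; the session's actual architecture is not reproduced here.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    ).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    scaler = torch.cuda.amp.GradScaler()   # dynamic loss scaling keeps FP16 gradients from underflowing

    for step in range(10):
        # Synthetic batch standing in for the accelerated data-ingestion pipeline discussed in the talk.
        images = torch.randn(8, 3, 64, 64, device="cuda")
        targets = torch.randint(0, 10, (8,), device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():    # forward pass runs eligible ops in FP16
            loss = F.cross_entropy(model(images), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()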
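For the deployment half (ONNX into TensorRT), a minimal sketch against the TensorRT Python API might look like the following. The file name detector.onnx is a placeholder for an exported model, the FP16 flag stands in for the accuracy-preserving optimizations the talk covers, build_serialized_network requires TensorRT 8 or newer (older releases used builder.build_engine), and the custom post-processing plugins mentioned above are not shown.

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network definition, as required by the ONNX parser.
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open("detector.onnx", "rb") as f:   # placeholder path for the exported detector
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)    # allow FP16 kernels where accuracy permits

    serialized_engine = builder.build_serialized_network(network, config)  # TensorRT 8+
    with open("detector.plan", "wb") as f:
        f.write(serialized_engine)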
 
Topics: AI Application, Deployment & Inference, Deep Learning & AI Frameworks, Computer Vision
Type: Talk
Event: GTC Silicon Valley
Year: 2019
Session ID: S9243
 
Abstract:
Learn how you can utilize TensorRT and NVIDIA Docker to quickly configure and deploy a GPU-accelerated inference server and start gaining insights from your trained deep neural network (DNN) models. TensorRT is a high-performance tool for low-latency, high-throughput DNN inference. The latest release of TensorRT introduces a novel, framework-agnostic network definition format called the Universal Framework Format (UFF), which allows TensorRT to support and optimize DNN models trained in multiple deep learning frameworks. We'll leverage the TensorRT Python API to create a lightweight Python Flask application capable of serving multiple DNN models trained using TensorFlow, PyTorch, and Caffe, and discuss how to containerize this inference service using NVIDIA Docker for ease of deployment at scale. This session will consist of a lecture, live demos, and detailed instructions.
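As a rough sketch of the serving pattern described here, a lightweight Flask front end routing requests to per-model backends, the skeleton below uses a plain-NumPy placeholder backend where the real service would call into a TensorRT execution context; the route shape and the toy softmax "model" are illustrative assumptions, not the session's code.

    import numpy as np
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def softmax_backend(inputs):
        """Placeholder backend. In the service described above, this is where a
        TensorRT engine (built from a TensorFlow, PyTorch, or Caffe model) would
        be executed; device buffer management is omitted here."""
        x = np.asarray(inputs, dtype=np.float32)
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    # One entry per deployed model; real backends would each wrap a deserialized engine.
    BACKENDS = {"toy_classifier": softmax_backend}

    @app.route("/v1/models/<name>/infer", methods=["POST"])
    def infer(name):
        if name not in BACKENDS:
            return jsonify({"error": f"unknown model '{name}'"}), 404
        inputs = request.get_json()["inputs"]
        outputs = BACKENDS[name](inputs)
        return jsonify({"model": name, "outputs": outputs.tolist()})

    if __name__ == "__main__":
        # Inside a container, this port would be published with `docker run -p 8000:8000 ...`.
        app.run(host="0.0.0.0", port=8000)

Each real backend would own a deserialized engine and its device buffers; containerizing the service then amounts to packaging this script and its engines into an image built on a CUDA/TensorRT base image and exposing the port.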
 
Topics: AI Application, Deployment & Inference, Tools & Libraries, Data Center & Cloud Infrastructure
Type: Tutorial
Event: GTC Silicon Valley
Year: 2018
Session ID: S8495
 
Abstract:
Come learn how you can optimize the deployment of your trained neural networks using the GPU-accelerated inference library TensorRT. TensorRT is a high-performance tool for low-latency, high-throughput deep neural network (DNN) inference that runs on NVIDIA GPUs. The latest release of TensorRT introduces a novel, framework-agnostic network definition format called the Universal Framework Format (UFF), allowing TensorRT to support and optimize DNN models trained in multiple deep learning frameworks like Caffe and TensorFlow. It also provides the capability to run inference at reduced precision, giving developers the ability to take advantage of new GPU hardware features like the Volta Tensor Core architecture. This session will be a combination of lecture and live demos.
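As a rough sketch of the workflow this talk describes, the snippet below imports a UFF model and builds a reduced-precision engine using the TensorRT 5-era Python API that matches this 2018 session. The tensor names, shapes, and file paths are placeholders; UFF has since been deprecated in favor of ONNX, and newer TensorRT releases configure precision through a builder config object rather than builder attributes.

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    # TensorRT 5/6-era API, matching this session; UFF support was later deprecated.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        parser.register_input("input", (3, 224, 224))   # placeholder tensor name and CHW shape
        parser.register_output("prob")                   # placeholder output tensor name
        parser.parse("model.uff", network)               # UFF file exported from a frozen TF graph

        builder.max_batch_size = 8
        builder.max_workspace_size = 1 << 30             # 1 GiB of scratch space for tactic selection
        builder.fp16_mode = True                         # reduced precision to exploit Volta Tensor Cores

        engine = builder.build_cuda_engine(network)
        with open("model.engine", "wb") as f:
            f.write(engine.serialize())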
 
Topics: AI Application, Deployment & Inference, Tools & Libraries, Performance Optimization, Data Center & Cloud Infrastructure
Type: Talk
Event: GTC Silicon Valley
Year: 2018
Session ID: S8496