We'll take a deep dive into best practices and real-world examples of using the power and flexibility of local GPU workstations, such as the DGX Station, to rapidly develop and prototype deep learning applications. The journey runs from fast, frequent experimentation, to obtaining a trained model, to eventually deploying scale-out GPU inference servers in a datacenter. The tools covered are NGC (NVIDIA GPU Cloud), TensorRT, TensorRT Inference Server, and YAIS. NGC is a cloud registry of Docker images; TensorRT is an inference-optimizing compiler; TensorRT Inference Server is a containerized microservice that maximizes GPU utilization and runs multiple models from different frameworks concurrently on a node; and YAIS is a C++ library for developing compute-intensive asynchronous microservices using gRPC. The session consists of a lecture, live demos, and detailed instructions covering inference compute options, pre- and post-processing considerations, inference serving options, monitoring considerations, and scalable workload orchestration using Kubernetes.