We'll take a deep dive into best practices and real-world examples of leveraging the power and flexibility of local GPU workstations, such as the DGX Station, to rapidly develop and prototype deep learning applications. This journey takes you from experimenting and iterating quickly and often, to obtaining a trained model, to eventually deploying scale-out GPU inference servers in a datacenter. The tools covered are NGC (NVIDIA GPU Cloud), TensorRT, TensorRT Inference Server, and YAIS. NGC is a cloud registry of Docker images; TensorRT is an inference-optimizing compiler; TensorRT Inference Server is a containerized microservice that maximizes GPU utilization and runs multiple models from different frameworks concurrently on a node; and YAIS is a C++ library for developing compute-intensive asynchronous microservices with gRPC. The session consists of a lecture, live demos, and detailed instructions covering inference compute options, pre- and post-processing considerations, inference serving options, monitoring, and scalable workload orchestration using Kubernetes.
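The session itself builds its microservices with YAIS in C++; as a rough, language-agnostic illustration of the request/response pattern such an inference microservice exposes, here is a minimal Python gRPC client sketch. The generated stubs, RPC name, endpoint, and message fields are hypothetical placeholders, not the actual YAIS or TensorRT Inference Server interface.

```python
# Hypothetical sketch of calling a GPU inference microservice over gRPC.
# inference_pb2 / inference_pb2_grpc, the Infer RPC, and the message fields
# are placeholders, not a real published API.
import grpc
import numpy as np

import inference_pb2        # hypothetical: generated from a simple inference.proto
import inference_pb2_grpc   # hypothetical: generated gRPC stubs

def infer(host="localhost:50051", model="resnet50", batch=8):
    # Open a channel to the inference server running on the workstation or cluster.
    channel = grpc.insecure_channel(host)
    stub = inference_pb2_grpc.InferenceStub(channel)

    # Pack a batch of preprocessed images as raw bytes (NCHW float32 here).
    inputs = np.random.rand(batch, 3, 224, 224).astype(np.float32)
    request = inference_pb2.InferRequest(
        model_name=model,
        raw_input=inputs.tobytes(),
        batch_size=batch,
    )

    # Blocking call; an asynchronous variant would use stub.Infer.future(request).
    response = stub.Infer(request)
    return np.frombuffer(response.raw_output, dtype=np.float32)

if __name__ == "__main__":
    print(infer().shape)
```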
Building upon the foundational understanding of how deep learning is applied to image classification, this lab explores different approaches to the more challenging problem of detecting whether an object of interest is present within an image and recognizing its precise location. Numerous approaches have been proposed for training deep neural networks for this task, each with pros and cons in terms of model training time, model accuracy, and speed of detection during deployment. On completion of this lab, you will understand each approach and its relative merits. You'll receive hands-on training applying cutting-edge object detection networks, trained using NVIDIA DIGITS, to a challenging real-world dataset.
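Detection accuracy is usually judged not only by whether an object is found but by how tightly the predicted box matches the ground truth; intersection over union (IoU) is the standard measure. The short sketch below is illustrative background only and is not taken from the lab materials.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the overlapping rectangle (empty if the boxes are disjoint).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

# A detection is typically counted as correct when IoU exceeds a threshold, e.g. 0.5.
print(iou((10, 10, 50, 50), (20, 20, 60, 60)))  # ~0.39
```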
In this lab you will test three different approaches to deploying a trained DNN for inference. The first approach is to use the inference functionality built directly into a deep learning framework, in this case DIGITS and Caffe. The second approach is to integrate inference into a custom application through a deep learning framework API, again using Caffe but this time through its Python API. The final approach is to use the NVIDIA GPU Inference Engine (GIE), which automatically creates an optimized inference runtime from a trained Caffe model and network description file. You will learn about the role of batch size in inference performance, as well as various optimizations that can be made in the inference process. You'll also explore inference for a variety of different DNN architectures.
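As a minimal sketch of the second approach (inference through a framework API), the snippet below loads a trained Caffe model via pycaffe and runs a batched forward pass. The file names, input shape, and output blob name ("prob") are typical placeholders, not the lab's actual files.

```python
# Sketch of inference through Caffe's Python API.
# "deploy.prototxt", "weights.caffemodel", and the input shape are placeholders.
import numpy as np
import caffe

caffe.set_mode_gpu()                       # run the forward pass on the GPU
net = caffe.Net("deploy.prototxt",         # network description for inference
                "weights.caffemodel",      # trained weights (e.g. exported from DIGITS)
                caffe.TEST)

batch_size = 8                             # batch size strongly affects throughput
net.blobs["data"].reshape(batch_size, 3, 224, 224)

# In practice these would be preprocessed images (mean-subtracted, channel-swapped).
net.blobs["data"].data[...] = np.random.rand(batch_size, 3, 224, 224)

output = net.forward()                     # dict of output blob name -> ndarray
probs = output["prob"]                     # (batch_size, num_classes) for a classifier
print(probs.argmax(axis=1))
```

Larger batches generally improve GPU utilization and throughput at the cost of per-request latency, which is one of the trade-offs the lab examines.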
Cray Cluster Systems have long been used to support supercomputing and scientific applications. In this talk we'll demonstrate how these same systems can be easily configured to support Docker and, in turn, various machine learning software packages, including NVIDIA's DIGITS software. Additionally, these systems can be set up so that their Docker containers pull data from Cray's Sonexion scale-out Lustre storage system. With this configuration, the systems gain maximum application flexibility through Docker while simultaneously supporting the high-performance storage requirements of many types of machine learning workloads through a connection with our Lustre ecosystem.
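As a rough illustration of this mount pattern (not Cray's actual configuration), the sketch below uses the Docker SDK for Python to launch a DIGITS container with a Lustre-backed directory bind-mounted into it. The image tag, host path, and port are assumptions.

```python
# Illustrative sketch only: launch a DIGITS container with a Lustre-backed dataset
# directory mounted in. The image tag and the /lus/... path are assumptions, not
# Cray's or NVIDIA's actual configuration.
import docker

client = docker.from_env()

container = client.containers.run(
    "nvidia/digits:6.0",                      # assumed DIGITS image name and tag
    detach=True,
    runtime="nvidia",                         # expose GPUs via the nvidia-docker runtime
    ports={"5000/tcp": 5000},                 # DIGITS web UI
    volumes={
        "/lus/sonexion/datasets": {           # assumed Lustre (Sonexion) mount on the host
            "bind": "/data",                  # where the container sees the dataset
            "mode": "ro",
        },
    },
)
print(container.id)
```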
The distributed shared-memory implementation of the coupled-cluster singles and doubles with perturbative triples algorithm, CCSD(T), in the GAMESS chemistry package was ported to the GPU using the directive-based OpenACC standard. The focus of this port was to achieve maximum strong-scaling performance for small molecular systems (