Learn how GPU Coder automatically produces high-performance CUDA code from a high-level algorithm description in MATLAB. Write your deep learning application with the expressive power of MATLAB, which lets you describe not only how to run your trained deep learning model in inference mode but also how to perform data augmentation and post-processing of the results, yielding a complete deployment-ready application. GPU Coder can then generate optimized inference code for the whole application. The deep learning inference model is compiled down to TensorRT, while the rest of the application logic is parallelized through the creation of CUDA kernels and integration with other CUDA-optimized libraries such as cuBLAS and cuFFT. The generated code can be cross-compiled to any NVIDIA GPU device that supports TensorRT. This lets engineers and scientists retain the expressive ease of use of the MATLAB programming language while unlocking deep learning performance through TensorRT.
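As a rough sketch of what such an application can look like, the entry-point function below follows this pattern. The function name myDetector, the network file myNet.mat, and the image sizes are hypothetical placeholders; coder.loadDeepLearningNetwork, coder.gpuConfig, and coder.DeepLearningConfig are the documented GPU Coder APIs.

```matlab
function out = myDetector(img) %#codegen
% Hypothetical entry-point function: pre-process the input, run the
% trained network in inference mode, then post-process the scores.
persistent net;
if isempty(net)
    % Load a trained network saved earlier, e.g. save('myNet.mat','net')
    net = coder.loadDeepLearningNetwork('myNet.mat');
end

imgResized = imresize(img, [224 224]);  % pre-processing / data preparation
scores = net.predict(imgResized);       % inference (compiled to TensorRT)
[~, out] = max(scores);                 % post-processing: top-1 class index
end
```

Code generation for the whole function, with the network targeting TensorRT, would then look roughly like this:

```matlab
% Configure GPU code generation and select TensorRT for the network
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config cfg myDetector -args {ones(480,640,3,'single')}
```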
Learn how to design, develop, and deploy computer vision and deep learning automotive applications onto GPUs, whether on your desktop, a cluster, or embedded Tegra platforms, including Jetson TK1/TX1/TX2 and DRIVE PX boards. The workflow starts with algorithm design in MATLAB, which enjoys broad appeal among engineers and scientists for its expressive power and ease of use. The algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB. Next, the networks are trained using MATLAB's GPU and parallel computing support, either on the desktop, a local compute cluster, or in the cloud. Finally, a new compiler (released in September 2017) auto-generates portable, optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. We present benchmarks showing the superior performance of the auto-generated CUDA code (~7x faster than TensorFlow).
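A minimal sketch of the training step, assuming an image datastore imds and a layer array layers have already been defined; the hyperparameter values are illustrative only, and trainingOptions/trainNetwork are the documented Neural Network Toolbox APIs.

```matlab
% Train on all local GPUs; 'parallel' instead targets a cluster or cloud
opts = trainingOptions('sgdm', ...
    'MiniBatchSize', 128, ...
    'MaxEpochs', 20, ...
    'ExecutionEnvironment', 'multi-gpu');

net = trainNetwork(imds, layers, opts);
save('myNet.mat', 'net');   % saved network, ready for code generation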
Learn how to adopt a MATLAB-centric workflow to design, develop, and deploy computer vision and deep learning applications onto GPUs, whether on your desktop, a cluster, or embedded Tegra platforms. The workflow starts with algorithm design in MATLAB. The deep learning network is defined in MATLAB and trained using MATLAB's GPU and parallel computing support. The trained network is then augmented with traditional computer vision techniques, and the complete application can be verified in MATLAB. Finally, a compiler auto-generates portable, optimized CUDA code from the MATLAB algorithm, which can be cross-compiled to Tegra. Performance benchmarks for AlexNet inference show that the auto-generated CUDA code is ~2.5x faster than MXNet, ~5x faster than Caffe2, and ~7x faster than TensorFlow.
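A minimal sketch of the cross-compilation step, assuming the GPU Coder Support Package for NVIDIA GPUs is installed and alexnet_predict.m is an entry-point function wrapping the trained network (the function name and build directory are placeholders; the board's address and credentials are configured through the support package):

```matlab
% Cross-compile an executable for a Jetson board via a remote build
cfg = coder.gpuConfig('exe');
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';  % emit a test main()
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';          % build dir on the board

% AlexNet takes a 227x227x3 input image
codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')}
```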
Large datasets of imaging and genomic data have become available for research into the correlation between the genome and brain structure in Alzheimer's disease. We'll present a GPU-enabled tool that permits interactive correlation between MRI voxel attributes and single-nucleotide polymorphisms (SNPs) in the DNA sequences of Alzheimer's patients. The system runs on a desktop PC and is several orders of magnitude faster than the original MATLAB version.