GTC ON-DEMAND

 

Abstract:
Learn how to design a deep learning algorithm in MATLAB and deploy it to an embedded Tegra platform, including Jetson TK1, TX1, TX2, and DRIVE PX boards. The workflow starts with algorithm design in MATLAB, which enjoys universal appeal among engineers and scientists because of its expressive power and ease of use. The algorithms combine deep learning with traditional computer vision. Networks are then trained using NVIDIA GPUs and MATLAB's parallel computing support on the desktop, on a local compute cluster, or in the cloud. Finally, a compiler auto-generates portable, optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. The generated code is highly optimized: we present benchmarks showing that it runs about two-and-a-half times faster than MXNet, about five times faster than Caffe2, and about seven times faster than TensorFlow, and is on par with an optimized TensorRT implementation.
 
Topics:
Computer Vision, Intelligent Machines, IoT & Robotics, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7151