In this talk, we will introduce the NVIDIA VisionWorks toolkit, a software development package for computer vision (CV) and image processing. VisionWorks(TM) implements and extends the Khronos OpenVX standard, and it is optimized for CUDA-capable GPUs and SoCs, enabling computer vision applications on a scalable and flexible platform. VisionWorks implements a thread-safe API and a framework for seamlessly adding user-defined primitives. The talk will give an overview of the VisionWorks toolkit, the OpenVX API and framework, and the VisionWorks-plus modules, including the VisionWorks Structure From Motion and Object Tracker modules. It will also present computer vision pipeline samples that show how to integrate the library API into a computer vision pipeline on Tegra platforms.
The new NVIDIA® CUDA® Toolkit 8 introduces major improvements to the memory model and profiling tools, along with new libraries. These changes let you improve performance, simplify memory usage, and profile and debug your applications more efficiently.
Learn how updates to the CUDA toolkit improve the performance of GPU-accelerated applications. Through benchmark results, we will review the impact of new libraries, updates to memory management, and mixed-precision programming. The session will cover the performance of CUDA toolkit components, including the libraries and the compiler.
This talk will provide an overview of new debugging and profiling features added in the CUDA 8.0 Toolkit.
The new CUDA Toolkit 8 includes support for Pascal GPUs, up to 2TB of Unified Memory, and new automated critical path analysis for effortless performance optimization. This is the most powerful and easiest-to-use version of the CUDA Toolkit to date.
New to CUDA? Join this free foundational webinar on Wednesday, June 8 to gain essential programming knowledge.
Even those with some CUDA experience can benefit from refreshing the key concepts required for future optimization tutorials.
The course begins with a brief overview of CUDA and data parallelism before focusing on the GPU programming model, the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and the thread hierarchy.
Join this session to learn how to use GPUs and CUDA programming to achieve order-of-magnitude speedups even for large codes that are more complex than tutorial examples. We'll cover our multi-year effort on heterogeneous CPU-GPU acceleration of the GROMACS package for molecular dynamics simulations on a wide range of architectures. We'll introduce new results where CUDA has made it possible to accelerate the costly 3D image reconstruction used in single-particle cryo-electron microscopy (cryo-EM) by 20-200x. You'll learn how to use these tools in your own applications, and which strategies to pursue to accelerate difficult codes where neither libraries nor directives are useful and even moving computational kernels to CUDA seems to fail.
Learn how to deploy GPU clustering at scale by integrating Chelsio's 40GbE iWARP (RDMA/TCP) into your GPU applications. GPUs have demonstrated paradigm-shifting performance in a wide variety of applications, but network infrastructure still poses challenges to adopting GPUs at scale, especially in large-scale cloud environments. We present 40GbE iWARP as the no-risk path to Ethernet-based, large-scale GPU clustering: it leverages existing Ethernet infrastructure and requires no new protocols, no interoperability work, and no long maturity period. The session provides a technical overview of 40GbE iWARP, including best practices and tuning for GPU applications.
This session will provide a step-by-step walkthrough of new features added in NVIDIA Visual Profiler and nvprof. It will show how these profiling tools can be used to identify optimization opportunities at the application, kernel, and source-line levels.