GTC On-Demand

Aerospace and Defense
XMP: An NVIDIA CUDA®-Accelerated Big Integer Library
Justin Luitjens (NVIDIA)
We'll introduce the XMP library, which provides CUDA-accelerated implementations of many large-integer arithmetic operations. These operations are generally used to implement encryption and decryption routines, including RSA, ECC, and Diffie-Hellman key exchange. We'll focus on the library's capabilities and how to use it efficiently.
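The abstract doesn't show the library's API, so as a rough illustration of the kind of primitive such a library accelerates, here is a minimal CUDA sketch of batched multi-precision addition with one thread per independent big integer. The limb count, layout, and kernel name are assumptions for illustration, not XMP's actual interface.

#include <cstdint>

// Illustrative only: each thread adds two 1024-bit integers stored as 32
// little-endian 32-bit limbs, rippling the carry sequentially. Libraries
// like XMP batch such operations across thousands of independent instances
// (e.g., the modular arithmetic inside RSA exponentiations).
#define LIMBS 32

__global__ void bigint_add(const uint32_t* a, const uint32_t* b,
                           uint32_t* sum, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;               // one thread per big integer
    const uint32_t* x = a + i * LIMBS;
    const uint32_t* y = b + i * LIMBS;
    uint32_t* s = sum + i * LIMBS;
    uint64_t carry = 0;
    for (int l = 0; l < LIMBS; ++l) {     // ripple-carry across limbs
        uint64_t t = (uint64_t)x[l] + y[l] + carry;
        s[l] = (uint32_t)t;
        carry = t >> 32;
    }
}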
 
Keywords:
Aerospace and Defense, Tools and Libraries, GTC 2016 - ID S6151
Streaming:
Download:
Algorithms
Testing Chordal Graphs with CUDA®
Agnieszka Lupinska (Jagiellonian University)
We'll present a CUDA implementation of an algorithm that tests graph chordality using parallel partition refinement with pivots. A graph is chordal if each cycle of length greater than three has a chord, that is, an edge between two non-adjacent vertices of the cycle. In total, the algorithm takes O(N) time on a grid of N threads and performs O(N+M) work for graphs of N vertices and M edges. We'll compare the performance of the CUDA implementation on an NVIDIA GeForce GTX TITAN X against a sequential implementation on a four-core (eight-thread) CPU, presenting test results for cliques, sparse graphs, dense graphs, and random chordal graphs.
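Partition refinement repeatedly splits vertex classes by adjacency to a chosen pivot, and that splitting step is what parallelizes naturally. Below is a hypothetical CUDA sketch of the marking phase only, assuming a CSR adjacency layout (row_ptr/col_idx) and a flag array that a subsequent compaction pass would use to split each class; all names are illustrative, not the authors' code.

// One thread per edge of the pivot: flag every neighbor of the pivot so a
// later pass can split each partition class into neighbors / non-neighbors.
// Launch with enough threads to cover row_ptr[pivot+1] - row_ptr[pivot].
__global__ void mark_pivot_neighbors(const int* row_ptr, const int* col_idx,
                                     int pivot, int* is_neighbor) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    int first = row_ptr[pivot];
    int last  = row_ptr[pivot + 1];
    if (first + e < last)
        is_neighbor[col_idx[first + e]] = 1;
}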
 
Keywords:
Algorithms, Big Data Analytics, GTC 2016 - ID S6489
Streaming:
Download:
Astronomy and Astrophysics
A CUDA®-Based 3D Kinetic Model for Space Plasma Physics
Shahab Fatemi (University of California, Berkeley), Andrew R. Poppe (University of California, Berkeley)
We've developed the first three-dimensional, self-consistent kinetic plasma model that runs on NVIDIA GPUs using CUDA. The model self-consistently solves the motion of charged particles and their associated electromagnetic fields. We use this model to explore the microphysics of plasma interactions with solar system objects, to understand fundamental kinetic processes of plasma, and to meet NASA's requirements for planetary and space exploration.
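As a loose illustration of the particle-push stage at the heart of such kinetic models, here is a minimal CUDA sketch that advances charged particles under the Lorentz force with an explicit Euler step. Real codes add field gathering, a Boris-style rotation, and current deposition, all omitted here; every name is an assumption, not the authors' implementation.

// One thread per particle: v += (q/m)(E + v x B) dt, then x += v dt.
struct Particle { float3 pos, vel; };

__global__ void push_particles(Particle* p, int n, float3 E, float3 B,
                               float qm, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 v = p[i].vel;
    // Lorentz acceleration a = (q/m) * (E + v x B)
    float3 a = make_float3(qm * (E.x + v.y * B.z - v.z * B.y),
                           qm * (E.y + v.z * B.x - v.x * B.z),
                           qm * (E.z + v.x * B.y - v.y * B.x));
    v.x += a.x * dt; v.y += a.y * dt; v.z += a.z * dt;
    p[i].vel = v;
    p[i].pos.x += v.x * dt; p[i].pos.y += v.y * dt; p[i].pos.z += v.z * dt;
}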
 
Keywords:
Astronomy and Astrophysics, Algorithms, Computational Physics, GTC 2016 - ID S6265
Streaming:
Download:
Big Data Analytics
Unblock Performance Limit of DNN by CUDA® in R
Patric Zhao (NVIDIA)
You'll learn technical solutions to accelerate R with CUDA. DNNs have become a very popular approach in statistical analysis. Even though there are several DNN packages in R, they are rarely used for big data and deep neural networks because the single-core performance of R is limited and the current design of DNN packages in R is not GPU-friendly. First, we'll introduce how we apply specific patterns, such as general matrix multiplication (GEMM), to DNNs in R; GEMM is a GPU-friendly pattern that can easily be accelerated by cuBLAS. Second, we'll show the tradeoff between performance and memory usage in R for DNNs. Finally, we'll package all of these CUDA approaches into an R package and publish it to CRAN so that anyone can install it in R quickly and get significant performance improvement from NVIDIA GPUs.
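The GEMM formulation is straightforward on the CUDA side: a fully connected layer's forward pass over a mini-batch is a single matrix multiply. A minimal sketch using cuBLAS (column-major convention; buffer names and sizes are assumptions, and bias/activation are omitted):

#include <cublas_v2.h>

// Y = W * X on the GPU. d_W is [out x in], d_X packs a mini-batch as
// [in x batch], d_Y is [out x batch]; all three already live in device memory.
void dense_forward(cublasHandle_t h, const float* d_W, const float* d_X,
                   float* d_Y, int out, int in, int batch) {
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                out, batch, in,
                &alpha, d_W, out,   // lda = out
                d_X, in,            // ldb = in
                &beta, d_Y, out);   // ldc = out
}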
 
Keywords:
Big Data Analytics, Deep Learning and AI, Performance Optimization, GTC 2016 - ID S6156
Streaming:
Download:
 
Data Analytics and Machine Learning at Your Finger Tips - No CUDA Required
Bryan Thompson (Blazegraph), James Lewis (Blazegraph)
Writing fast, efficient data analytics for graph and machine learning on GPUs can be hard due to the complexities of CUDA and achieving effective parallelism. DASL and SPARQL are high-level languages for graph and machine learning algorithms (DASL) and graph pattern matching (SPARQL) that provide speedups of up to 1,000x over Spark native and up to 300x over leading graph databases when executed on the BlazeGraph platform. These high-level languages are translated into task graphs that expose the available parallelism. The mapgraph runtime evaluates the task graphs and provides a scalable architecture on GPUs and GPU clusters. This presentation discusses the concepts for graph algorithms and queries, the mapgraph architecture, and how algorithms are evaluated on a GPU cluster.
 
Keywords:
Big Data Analytics, Deep Learning and AI, Aerospace and Defense, GTC 2016 - ID S6267
Streaming:
Download:
Computational Biology
Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs
Bertil Schmidt (JGU Mainz), Christian Hundt (University Mainz)
Learn how to efficiently parallelize gene set enrichment analysis (GSEA) using CUDA. GSEA is an important bioinformatics method that determines whether given sets of genes are statistically overrepresented between two phenotypes. The GSEA software from the Broad Institute is the most popular tool to perform such studies, with several thousand users. NGS technologies are gradually replacing microarrays for high-throughput gene expression studies. Size and availability of input data sets are increasing, leading to high runtimes of the desktop GSEA application. We present an efficient CUDA parallelization of the core GSEA algorithm. By using a combination of parallelization techniques, we achieve speedups of around two orders of magnitude on a single GPU.
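GSEA's permutation testing is embarrassingly parallel, which is what makes it a natural CUDA target: each permutation's enrichment score (a Kolmogorov-Smirnov-style running-sum extremum) can be computed independently. A hypothetical sketch with one thread per permutation; the simplified unweighted scoring, memory layout, and names are assumptions, not the authors' kernel.

#include <math.h>

// in_set[g] flags whether ranked gene g belongs to the gene set under this
// permutation's ranking; each thread walks one ranked list and records the
// running sum of largest magnitude as that permutation's enrichment score.
__global__ void enrichment_scores(const char* in_set, int n_genes,
                                  int n_perms, float hit_step,
                                  float miss_step, float* score) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n_perms) return;
    const char* s = in_set + (size_t)p * n_genes;
    float run = 0.0f, best = 0.0f;
    for (int g = 0; g < n_genes; ++g) {
        run += s[g] ? hit_step : -miss_step;
        if (fabsf(run) > fabsf(best)) best = run;
    }
    score[p] = best;
}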
 
Keywords:
Computational Biology, GTC 2016 - ID S6164
Streaming:
Download:
Computer Vision and Machine Vision
NVIDIA CUDA® for Mobile
Yogesh Kini (NVIDIA)
This session covers a few important mobile use cases that can be accelerated using CUDA, including image processing and camera output post-processing. Attendees will learn about: [1] the Tegra unified memory architecture, which applications can use to reduce total memory usage and power consumption; [2] CUDA interoperability with EGLImage; [3] using EGLStreams to set up a CUDA image processing pipeline; and [4] Tegra-specific enhancements to CUDA-OpenGL(ES) interop.
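For point [1], the key property is that Tegra's CPU and GPU share the same physical DRAM. A minimal sketch of how an application might exploit that with managed memory, so both processors touch one buffer without staging copies; the toy kernel and sizes are assumptions.

#include <cstdio>

__global__ void invert(unsigned char* img, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];          // trivial post-processing step
}

int main() {
    int n = 1920 * 1080;
    unsigned char* img = nullptr;
    // On Tegra, a managed allocation is visible to CPU and GPU in the same
    // physical memory, avoiding a second copy of the frame.
    cudaMallocManaged(&img, n);
    cudaMemset(img, 0x80, n);                  // stand-in for a camera frame
    invert<<<(n + 255) / 256, 256>>>(img, n);
    cudaDeviceSynchronize();                   // make GPU writes visible to CPU
    printf("first pixel after processing: %u\n", img[0]);
    cudaFree(img);
    return 0;
}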
 
Keywords:
Computer Vision and Machine Vision, Tools and Libraries, Video and Image Processing, GTC 2016 - ID S6384
Streaming:
Download:
 
VisionWorks: A CUDA-Accelerated Computer Vision Library
Elif Albuz (NVIDIA)
In this talk, we will introduce the NVIDIA VisionWorks toolkit, a software development package for computer vision (CV) and image processing. VisionWorks implements and extends the Khronos OpenVX standard, and it is optimized for CUDA-capable GPUs and SoCs, enabling computer vision applications on a scalable and flexible platform. VisionWorks implements a thread-safe API and framework for seamlessly adding user-defined primitives. The talk will give an overview of the VisionWorks toolkit, the OpenVX API and framework, the VisionWorks-Plus modules (including the Structure From Motion and Object Tracker modules), and computer vision pipeline samples showing integration of the library API into a computer vision pipeline on Tegra platforms.
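Since VisionWorks builds on OpenVX, the standard's graph API gives a feel for how applications drive it. A minimal OpenVX-style sketch using only standard Khronos calls (error checking and image I/O omitted; VisionWorks-specific extensions are not shown, and this is an assumption about usage rather than VisionWorks sample code):

#include <VX/vx.h>

// Build a one-node graph (3x3 Gaussian blur), verify it, and execute it.
// The OpenVX implementation (e.g., VisionWorks) maps the nodes to the GPU.
int main() {
    vx_context ctx = vxCreateContext();
    vx_graph graph = vxCreateGraph(ctx);
    vx_image src = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);
    vx_image dst = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);
    vxGaussian3x3Node(graph, src, dst);    // standard OpenVX kernel
    if (vxVerifyGraph(graph) == VX_SUCCESS)
        vxProcessGraph(graph);             // run the pipeline
    vxReleaseContext(&ctx);                // also releases graph and images
    return 0;
}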
 
Keywords:
Computer Vision and Machine Vision, Embedded, Self-Driving Cars, Automotive, GTC 2016 - ID S6783
Streaming:
Download:
Data Center and Cloud Computing
High-Performance CUDA® Clustering at Cloud Scale: GPUDirect RDMA over 40GbE iWARP
Tom Reu (Chelsio Communications, Inc.)
Learn how to deploy GPU clustering at scale by integrating Chelsio's 40GbE iWARP (RDMA/TCP) into your GPU applications. GPUs have demonstrated paradigm-shifting performance in a wide variety of applications. But there remain network infrastructure challenges to the adoption of GPUs operating at scale, especially in large-scale cloud environments. We present 40GbE iWARP, which leverages existing Ethernet infrastructure and requires no new protocols, interoperability, or long maturity period as the no-risk path for Ethernet-based, large-scale GPU clustering. The first part of the session is a technical overview of 40GbE iWARP, including best practices and tuning for GPU applications. The second part summarizes benchmark results showing benefits of GPUDirect RDMA using 40GbE iWARP.
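At the programming level, the centerpiece of GPUDirect RDMA is registering device memory directly with the RDMA NIC so it can be the source or target of transfers without staging through host memory. A hedged sketch using standard libibverbs calls; queue-pair setup, connection management, and error handling are omitted, and it assumes the GPUDirect RDMA kernel module is loaded on the host.

#include <cuda_runtime.h>
#include <infiniband/verbs.h>

int main() {
    int num = 0;
    struct ibv_device** devs = ibv_get_device_list(&num);
    struct ibv_context* ctx = ibv_open_device(devs[0]);   // first RNIC
    struct ibv_pd* pd = ibv_alloc_pd(ctx);

    void* gpu_buf = nullptr;
    size_t len = 1 << 20;
    cudaMalloc(&gpu_buf, len);                 // device memory, not host

    // With GPUDirect RDMA enabled, the NIC can DMA straight into this
    // device allocation; the returned rkey is exchanged with the peer.
    struct ibv_mr* mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    // ... create QP, exchange mr->rkey, post RDMA reads/writes ...
    ibv_dereg_mr(mr);
    cudaFree(gpu_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}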
 
Keywords:
Data Center and Cloud Computing, Performance Optimization, Tools and Libraries, GTC 2016 - ID S6255
Streaming:
 
Benefits of Remote GPU Virtualization: The rCUDA Perspective
Federico Silla (Technical University of Valencia)
Many applications use GPUs to accelerate their execution. However, using GPUs presents several side effects, such as increased acquisition and maintenance costs and space requirements. Moreover, these increased costs may not be easily amortized because GPUs usually present very low utilization rates. In a similar way to virtual machines, the use of virtual GPUs may overcome the concerns associated with the use of real GPU devices. The remote GPU virtualization technique allows an application executing in a computer without a GPU to transparently make use of a GPU installed in another node of the cluster. Although the use of remote GPUs may seem to be a senseless idea, it provides several benefits, as described in this talk using the rCUDA (remote CUDA) middleware as a case study.
 
Keywords:
Data Center and Cloud Computing, Tools and Libraries, HPC and Supercomputing, GTC 2016 - ID S6681
Streaming:
Download:
Deep Learning and AI
Optimizing Deep Recurrent Neural Networks With Persistent CUDA Kernels
Gregory Diamos (Baidu)
Learn a new technique for mapping deep recurrent neural networks (RNNs) efficiently onto GPUs. We show how it is possible to achieve substantially higher computational throughput at low mini-batch sizes than direct implementations of RNNs based on matrix multiplications. The key to our approach is the use of persistent computational kernels that exploit the GPU's inverted memory hierarchy to reuse network weights over time, and communication between thread blocks using a deadlock-free global barrier. Our initial implementation sustains 3 TFLOP/s at a mini-batch size of 4 on an NVIDIA TITAN X GPU. This provides a 16x reduction in activation memory footprint, allows us to train models with 12x more parameters on the same hardware, and allows us to strongly scale RNN training to 32 GPUs.
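A simplified sketch of the two ingredients combined here: a grid-wide barrier between thread blocks, and a kernel that stays resident across all timesteps so weights could be cached on-chip (this toy version rereads them from global memory). It is illustrative only, not Baidu's implementation, and it is deadlock-free only under the assumption that every block of the launch is co-resident on the GPU at once.

__device__ volatile unsigned barrier_gen = 0;
__device__ unsigned barrier_count = 0;

// Classic two-phase global barrier: the last block to arrive resets the
// counter and bumps the generation; all other blocks spin on the generation.
__device__ void global_barrier(unsigned nblocks) {
    __threadfence();                       // publish this block's writes
    __syncthreads();
    if (threadIdx.x == 0) {
        unsigned gen = barrier_gen;
        if (atomicAdd(&barrier_count, 1) == nblocks - 1) {
            barrier_count = 0;
            __threadfence();
            barrier_gen = gen + 1;         // release all waiting blocks
        } else {
            while (barrier_gen == gen) ;   // spin until released
        }
    }
    __syncthreads();
}

// One persistent launch covers all timesteps: h_new = tanh(W * h) per step.
__global__ void persistent_rnn(float* h, const float* w, int n, int steps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    for (int t = 0; t < steps; ++t) {
        float acc = 0.0f;
        if (i < n)
            for (int j = 0; j < n; ++j) acc += w[i * n + j] * h[j];
        global_barrier(gridDim.x);         // all reads of h are complete
        if (i < n) h[i] = tanhf(acc);
        global_barrier(gridDim.x);         // h fully updated for step t+1
    }
}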
 
Keywords:
Deep Learning and AI, Performance Optimization, HPC and Supercomputing, GTC 2016 - ID S6673
Streaming:
Download:
Developer - Tools & Libraries
What's New in CUDA Toolkit 8
Siddharth Sharma (NVIDIA)
The new NVIDIA® CUDA® Toolkit 8 delivers major improvements to the memory model and profiling tools, along with new libraries. These enable you to improve performance, simplify memory usage, and profile and debug your applications more efficiently.
 
Keywords:
Developer - Tools & Libraries, GTC Webinars 2016 - ID GTCE119
Streaming:
Download:
 
CUDA Toolkit 8 Performance Overview
Pramod Ramarao (NVIDIA)
Learn how updates to the CUDA toolkit improve the performance of GPU-accelerated applications. Through benchmark results, we will review the impact of new libraries, updates to memory management, and mixed-precision programming. The session will cover the performance of CUDA toolkit components, including libraries and the compiler.
 
Keywords:
Developer - Tools & Libraries, GTC Webinars 2016 - ID GTCE120
Streaming:
Download:
 
New Developer Tools Features in CUDA 8.0
Sanjiv Satoor (NVIDIA)
This talk will provide an overview of new debugging and profiling features added in the CUDA 8.0 Toolkit.
 
Keywords:
Developer - Tools & Libraries, HPC and Supercomputing, Supercomputing 2016 - ID SC6115
Streaming:
Download:
 
CUDA 8 Features Overview Webinar
Milind Kukanur (NVIDIA)
The new CUDA Toolkit 8 includes support for Pascal GPUs, up to 2TB of Unified Memory, and new automated critical path analysis for effortless performance optimization. This is the most powerful and easiest-to-use version of the CUDA Toolkit to date.
 
Keywords:
Developer - Tools & Libraries, GTC Webinars 2016 - ID GTCE125
Streaming:
Download:
 
Catch Up on CUDA
Chris Mason (Acceleware)
New to CUDA? Join this free, foundational webinar on Wednesday, June 8 at 9am PST to gain essential programming knowledge. Even those with some CUDA experience can benefit by refreshing the key concepts required for future optimization tutorials.

The course begins with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model, fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and thread hierarchy.
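For readers who want a preview of those fundamentals, here is the canonical first CUDA program those topics map onto: the host allocates memory and copies data, while each device thread derives a global index from the block/thread hierarchy and processes one element. This is a generic sketch, not Acceleware's course material.

#include <cstdio>

__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // thread hierarchy -> index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2 * i; }

    float *da, *db, *dc;                      // host manages device memory
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n); // grid of blocks of threads

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[42] = %f\n", hc[42]);           // expect 126.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}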
 
Keywords:
Developer - Tools & Libraries, GTC 2016 - ID GTCE127
Streaming:
Download:
Finance
Quants Coding CUDA® in .NET: Pitfalls and Solutions
Benjamin Eimer (Chatham Financial)
We'll cover some of the lessons we have learned in developing a hybrid GPU/CPU linear algebra library in .NET to accelerate the financial risk and derivative pricing models developed by our quant team. The purpose of this library is to allow our team to transition to GPU computing incrementally within our extensive .NET codebase. We'll present some of the difficulties encountered when .NET automated garbage collection interacts with low-level memory management, and how we addressed them. Solving these problems is essential for running CUDA code as part of a highly available web service architecture.
 
Keywords:
Finance, Tools and Libraries, Programming Languages, GTC 2016 - ID S6400
Streaming:
Download:
HPC and Supercomputing
HPC Application Porting to CUDA® at BSC
Pau Farre (Barcelona Supercomputing Center), Marc Jorda (Barcelona Supercomputing Center)
In this session, you will learn about the main challenges we have overcome at BSC to successfully accelerate two large applications using CUDA and NVIDIA GPUs: WARIS (a volcanic ash transportation model) and PELE (a drug molecule interaction simulator). We show that leveraging asynchronous execution is key to achieving high utilization of GPU resources (even for very small problem sizes) and to overlapping CPU and GPU execution. We also explain some techniques for introducing Unified Virtual Memory into your data structures for seamless CPU/GPU data sharing. Our results show an execution time improvement in WARIS of 8.6x for a 4-GPU node compared to a 16-core CPU node (using hand-written AVX vectorization and MPI). Preliminary experiments in PELE already show a 2x speedup.
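The asynchronous-execution pattern credited here is the standard CUDA streams idiom: pinned host buffers plus per-chunk streams, so one chunk's transfers overlap another's kernel. A minimal sketch under assumed chunk sizes and a toy kernel, not the WARIS or PELE code.

__global__ void process(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * d[i];
}

int main() {
    const int chunks = 4, n = 1 << 20;
    float* h; cudaMallocHost(&h, chunks * n * sizeof(float)); // pinned => async copies
    float* d; cudaMalloc(&d, chunks * n * sizeof(float));
    for (int i = 0; i < chunks * n; ++i) h[i] = 1.0f;
    cudaStream_t s[chunks];
    for (int c = 0; c < chunks; ++c) cudaStreamCreate(&s[c]);

    // Copy-in, kernel, and copy-out of chunk c queue on stream c, so the
    // hardware overlaps chunk c+1's transfers with chunk c's compute.
    for (int c = 0; c < chunks; ++c) {
        float* hp = h + c * n; float* dp = d + c * n;
        cudaMemcpyAsync(dp, hp, n * sizeof(float), cudaMemcpyHostToDevice, s[c]);
        process<<<(n + 255) / 256, 256, 0, s[c]>>>(dp, n);
        cudaMemcpyAsync(hp, dp, n * sizeof(float), cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();                  // drain all streams
    for (int c = 0; c < chunks; ++c) cudaStreamDestroy(s[c]);
    cudaFreeHost(h); cudaFree(d);
    return 0;
}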
 
Keywords:
HPC and Supercomputing, Computational Chemistry, Earth Systems Modeling, GTC 2016 - ID S6408
Streaming:
Download:
Healthcare and Life Sciences
Enabling 10x Acceleration of Research in Molecular Life Sciences Using CUDA
Erik Lindahl (Professor, Stockholm University)
Join this session to learn how to use GPUs and CUDA programming to achieve order-of-magnitude speedups even for large codes that are more complex than tutorial examples. We'll cover our multi-year effort on heterogeneous CPU-GPU acceleration of the GROMACS package for molecular dynamics simulations on a wide range of architectures. We'll introduce new results where CUDA has made it possible to accelerate the costly 3D image reconstruction used in single-particle cryo-electron microscopy (cryo-EM) by 20-200X. You'll learn how you can use these tools in your application work, and what strategies to pursue to accelerate difficult codes where neither libraries nor directives are useful, and even moving computational kernels to CUDA seems to fail.
 
Keywords:
Healthcare and Life Sciences, High Performance Computing, GTC Washington D.C. 2016 - ID DCS16141
Streaming:
Performance Optimization
NVIDIA CUDA® Optimization with NVIDIA Nsight™ Eclipse Edition: A Case Study
Christoph Angerer (NVIDIA), Mathias Wagner (NVIDIA)
We'll present a real CUDA application and use NVIDIA Nsight Eclipse Edition on Linux to optimize the performance of the code. Attendees will learn a method to analyze their codes and how to use the tools to apply those ideas.
 
Keywords:
Performance Optimization, Tools and Libraries, GTC 2016 - ID S6111
Streaming:
Download:
 
NVIDIA CUDA® Optimization with NVIDIA Nsight™ Visual Studio Edition: A Case Study
Christoph Angerer (NVIDIA), Jakob Progosch (NVIDIA)
We'll present a real CUDA application and use NVIDIA Nsight Visual Studio Edition on Windows to optimize the performance of the code. Attendees will learn a method to analyze their codes and how to use the tools to apply those ideas.
 
Keywords:
Performance Optimization, Tools and Libraries, GTC 2016 - ID S6112
Download:
 
PerfMon Redux: Analyzing a CUDA® Application With the Windows Performance Monitor
Richard Wilton (Johns Hopkins University)
Learn how to use the Performance Monitor tool ("PerfMon") in Microsoft Windows to do non-invasive real-time visualization of the performance of a CUDA application. This approach lets you aggregate performance data from the host operating system and hardware along with GPU performance metrics, and makes it possible to examine the interactions between GPU components (CUDA compute and memory activity) and non-GPU components (CPU activity, disk I/O, and host memory) throughout the execution lifetime of a complex CUDA application. Examples will be provided from the performance analysis of a pipelined CUDA application that runs kernels on multiple GPUs and that makes intensive concurrent use of CPU threads and host memory.
 
Keywords:
Performance Optimization, GTC 2016 - ID S6287
Streaming:
Download:
 
Gradually Porting an In-Use Sparse Matrix Library to Use CUDA
Mark Hoemmen (Sandia National Laboratories)
Learn how to port an existing parallel library to use CUDA, even while the library is under constant production use by applications. We did this for the Tpetra parallel sparse linear algebra library. Tpetra provides data structures, computational kernels, and MPI data redistribution for Trilinos' sparse linear solvers. We used Kokkos, an abstraction over different shared-memory parallel programming models, to rewrite Tpetra for CUDA. This, along with careful attention to backwards compatibility, unit testing, and frequent application feedback, let us undertake this rewrite gradually. It also gave both applications and Trilinos' sparse linear solver packages that depend on Tpetra a gradual path to embrace MPI + thread parallelism.
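Kokkos is the key enabler here: the same source compiles to CUDA, OpenMP, or serial backends, which is what made a gradual migration possible. A small self-contained sketch of the programming model (a generic example, not Tpetra code):

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        // Views live in the active backend's memory space (GPU under CUDA).
        Kokkos::View<double*> x("x", n);
        Kokkos::parallel_for("fill", n, KOKKOS_LAMBDA(const int i) {
            x(i) = 1.0 / (i + 1);
        });
        double sum = 0.0;
        Kokkos::parallel_reduce("sum", n, KOKKOS_LAMBDA(const int i, double& acc) {
            acc += x(i);
        }, sum);
        printf("sum = %f\n", sum);
    }
    Kokkos::finalize();
    return 0;
}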
 
Keywords:
Performance Optimization, HPC and Supercomputing, GTC 2016 - ID S6292
Streaming:
 
High-Performance CUDA® Clustering at Cloud Scale: NVIDIA GPUDirect® RDMA Over 40GbE iWARP (Presented by Chelsio Communications)
Tom Reu (Chelsio Communications, Inc)
Learn how to deploy GPU clustering at scale by integrating Chelsio's 40GbE iWARP (RDMA/TCP) into your GPU applications. GPUs have demonstrated paradigm-shifting performance in a wide variety of applications. But there remain network infrastructure challenges to the adoption of GPUs operating at scale, especially in large-scale cloud environments. We present 40GbE iWARP, which leverages existing Ethernet infrastructure and requires no new protocols, interoperability, or long maturity period as the no-risk path for Ethernet-based, large-scale GPU clustering. The session provides a technical overview of 40GbE iWARP, including best practices and tuning for GPU applications.
 
Keywords:
Performance Optimization, Data Center and Cloud Computing, Tools and Libraries, GTC 2016 - ID S6854
Streaming:
Download:
Programming Languages
Featured Presentation: CUDA 8 and Beyond
Mark Harris (NVIDIA)
CUDA is NVIDIA's parallel computing platform and programming model. In this talk, you'll learn about new features and performance improvements in CUDA 8, and get insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You'll also learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.
 
Keywords:
Programming Languages, Tools and Libraries, HPC and Supercomputing, GTC 2016 - ID S6224
Streaming:
Download:
Tools and Libraries
Transparent Checkpoint and Restart Technology for CUDA® Applications
Akira Nukada (Tokyo Institute of Technology)
Checkpoint and restart technology is useful for fault tolerance as well as for aggressive job management on large systems. System-level checkpointing minimizes the effort application developers must invest to use it; however, CUDA applications are incompatible with major checkpoint implementations such as BLCR. In this talk, we'll present our CUDA-capable checkpoint library, CRCUDA. We are working to reduce its performance overhead.
 
Keywords:
Tools and Libraries, GTC 2016 - ID S6429
Streaming:
Download:
 
Real-Time Visualization of CUDA® Data Using ArrayFire Forge
Brian Kloppenborg (ArrayFire)
We will debut ArrayFire Forge, our new general-purpose data visualization library written specifically for use with GPU-accelerated applications. By using interoperability with OpenGL, Forge enables developers to create real-time, responsive, and stunning visualizations in 2D and 3D. Forge is an open-source project distributed on GitHub.
 
Keywords:
Tools and Libraries, Graphics Virtualization, Real-Time Graphics, GTC 2016 - ID S6478
Streaming:
Download:
 
CUDA® Debugging Tools in CUDA 8
Vyas Venkataraman (NVIDIA), Kudbudeen Jalaludeen (NVIDIA)
This talk will describe new features in debugging tools in the CUDA 8.0 toolkit.
 
Keywords:
Tools and Libraries, GTC 2016 - ID S6531
Streaming:
Download:
 
Optimizing Application Performance with CUDA® Profiling Tools
Swapna Matwankar (NVIDIA)
This session will provide a step-by-step walkthrough of new features added in the NVIDIA Visual Profiler and nvprof. It will show how these profiling tools can be used to identify optimization opportunities at the application, kernel, and source-line levels.
 
Keywords:
Tools and Libraries, Performance Optimization, GTC 2016 - ID S6810
Streaming:
Download:
 
 