Please join Jonathan Cohen, supported by other members of the NVIDIA engineering team responsible for the new high-performance libraries that are part of the CUDA 6 Toolkit. In this webinar, the team will present the latest performance improvements and give attendees a chance to ask questions and suggest future enhancements, making this a must-attend webinar for any serious GPU computing developer. The CUDA 6.0 production release is now available for download: www.nvidia.com/getcuda
NVIDIA has been developing a library of high-performance parallel sparse iterative linear solvers, with an emphasis on multilevel and multigrid methods. In this presentation, I will provide an overview of the library's design and outline many of the challenges we have faced in balancing numerical behavior against parallel scalability. Our library has been integrated into ANSYS Fluent 14.5, and will be released as a fully supported feature in the upcoming Fluent 15. I will describe the collaboration between ANSYS and NVIDIA, and present benchmarking results across a variety of test problems from CFD and other fields. Finally, I will talk about our future plans and discuss some of the open research problems in the area of algebraic multigrid on massively parallel processors.
Because of their inherently parallel and high-throughput nature, NVIDIA GPUs are a natural fit for the types of data-intensive computing required in bioinformatics applications. For many genomics applications, the primary challenge is to map highly divergent and control flow-heavy code to a SIMD architecture. By transforming complex serial flow of control into a sequence of communicating sequential processors running in parallel, we are able to achieve high throughput on very branchy code, while maintaining memory coherence and avoiding execution divergence. I will present initial results from NVIDIA's internal "nvbio" project to develop efficient computational building blocks for analysis of Next-Generation Sequencing data, with a focus on implementations of BWA and Bowtie2-type aligners.
The goal of this session is to compare the performance of graph matching and graph coloring algorithms on massively parallel devices such as GPUs. We present novel algorithms that produce superior results for certain graphs, and discuss the techniques used to implement these algorithms efficiently on the GPU.
I will describe tricks for building APIs using C++ metaprogramming that generate custom kernels for complex manipulation of device-side arrays in CUDA. Using a variation of Expression Templates, multiple operations can be fused into a single kernel that executes with reasonable efficiency.
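To make the idea of fusing operations with Expression Templates concrete, here is a minimal host-side C++ sketch. It is not the API presented in the talk: the names `Expr`, `View`, `Scalar`, `Binary`, `Array`, and `fused_saxpy_demo` are all hypothetical, and a plain CPU loop stands in for the generated CUDA kernel. The operators build a small expression tree at compile time, and only the assignment walks that tree, once per element, so no intermediate arrays are created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: tags a type as an expression node so that the operators
// below apply only to our own types. (All names here are hypothetical.)
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Leaf: a non-owning view of array storage. In a CUDA version this
// would hold a device pointer instead of a host pointer.
struct View : Expr<View> {
    const float* ptr;
    explicit View(const float* p) : ptr(p) {}
    float operator[](std::size_t i) const { return ptr[i]; }
};

// Leaf: a scalar that is broadcast to every index.
struct Scalar : Expr<Scalar> {
    float value;
    explicit Scalar(float v) : value(v) {}
    float operator[](std::size_t) const { return value; }
};

struct Add { static float apply(float a, float b) { return a + b; } };
struct Mul { static float apply(float a, float b) { return a * b; } };

// Interior node: stores its operands by value (they are small views or
// other nodes), so the expression tree never holds dangling references.
template <typename L, typename R, typename Op>
struct Binary : Expr<Binary<L, R, Op>> {
    L lhs;
    R rhs;
    Binary(const L& l, const R& r) : lhs(l), rhs(r) {}
    float operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
};

// Operators do no arithmetic; they only build the expression tree.
template <typename L, typename R>
Binary<L, R, Add> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Binary<L, R, Add>(l.self(), r.self());
}

template <typename L, typename R>
Binary<L, R, Mul> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Binary<L, R, Mul>(l.self(), r.self());
}

// Owning array. Assigning an expression runs ONE loop over the whole
// fused expression; this loop is the stand-in for a single kernel launch.
struct Array {
    std::vector<float> data;
    explicit Array(std::size_t n, float v = 0.0f) : data(n, v) {}
    View view() const { return View(data.data()); }

    template <typename E>
    Array& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Usage sketch: y = 2*x + y evaluated in a single pass with no temporaries.
inline float fused_saxpy_demo() {
    Array x(4, 1.5f), y(4, 2.0f);
    y = Scalar(2.0f) * x.view() + y.view();
    assert(y.data[0] == 5.0f);  // 2 * 1.5 + 2
    return y.data[0];
}
```

In a CUDA setting, the loop body in `Array::operator=` would presumably become the body of a templated `__global__` kernel that receives the expression object by value, with `operator[]` marked `__host__ __device__`. That mapping is an assumption about how such a design could work, not a description of the speaker's implementation.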