GTC On-Demand

Application Design & Porting Techniques
An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm
Martin Burtscher (Texas State University)
This session presents a complete CUDA implementation of the irregular Barnes-Hut n-body algorithm. This algorithm repeatedly builds and traverses unbalanced trees, making it difficult to map to GPUs. We explain in detail how our code exploits the architectural features of GPUs, including lockstep operation and thread divergence, both of which are commonly viewed as hurdles to achieving high performance, especially for irregular codes. On a five million body simulation running on a Tesla C2050, our CUDA implementation is 30 times faster than a parallel pthreads version running on a high-end 6-core Xeon.

Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2111
 
GPU Task-Parallelism: Primitives and Applications
Stanley Tzeng (University of California, Davis), Anjul Patney (University of California, Davis)
We explore how a task-parallel model can be implemented on the GPU and address concerns and programming techniques for doing so. We discuss the primitives for building a task-parallel system on the GPU. This includes novel ideas for mapping tasking systems onto the GPU including task granularity, load balancing, memory management, and dependency resolution. We also present several applications which demonstrate how a task-parallel model is more suitable than the regular data parallel model. These applications include a Reyes renderer, tiled deferred lighting renderer, and a video encoding demo.

Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2138
 
Large-Scale Reservoir Simulation on GPU
Song Yu (Chemical & Petroleum Department, University of Calgary)
We develop a highly parallel GPU-based GMRES solver and several preconditioners, and couple them with our in-house reservoir simulator to speed up large-scale reservoir simulations with over one million grid blocks. For the preconditioners, we develop highly parallelized ILU(k), ILUT, block ILU(k), and block ILUT variants on the GPU, with the matrix partitioned by METIS. The resulting speedup and accurate results demonstrate the promise of GPUs for parallel reservoir simulation.

Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2190
 
Levenberg-Marquardt Using Block Sparse Matrices on CUDA
Tetsuo Tawara (Koozyt, Inc.)
This session describes our experience constructing GPU-based matrix-vector functions for block sparse matrices with multiple block sizes, along with a domain-specific numerical Jacobian generation function. The bundle adjustment algorithm is an optimization procedure that attempts to refine the relative camera pose and 3D structure location variables estimated from multiple sets of images. The Conjugate Gradient algorithm is used to solve the normal equations that appear in the inner loop of the non-linear least squares problem.
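
As a rough illustration of the inner loop mentioned above, the following sketch solves damped normal equations (J^T J + lam I) x = J^T r with Conjugate Gradient, without ever forming J^T J. It is a CPU-side toy in plain Python, not the session's CUDA code: the dense list-of-rows matrix, the function names, and the damping parameter `lam` are all assumptions made for the example.

```python
def matvec(J, x):
    """Compute J @ x for a dense list-of-rows matrix (illustrative storage only)."""
    return [sum(a * b for a, b in zip(row, x)) for row in J]


def rmatvec(J, y):
    """Compute J^T @ y without building the transpose."""
    out = [0.0] * len(J[0])
    for row, yi in zip(J, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_normal_equations(J, r, lam=0.0, iters=50, tol=1e-20):
    """Solve (J^T J + lam*I) x = J^T r by Conjugate Gradient.

    lam is a hypothetical Levenberg-Marquardt damping term; lam=0 gives
    the plain normal equations.
    """
    x = [0.0] * len(J[0])
    g = rmatvec(J, r)              # residual of the system at x = 0
    p = list(g)                    # first search direction
    gg = dot(g, g)
    for _ in range(iters):
        if gg <= tol:
            break
        Jp = matvec(J, p)
        # p^T (J^T J + lam*I) p equals |J p|^2 + lam*|p|^2
        alpha = gg / (dot(Jp, Jp) + lam * dot(p, p))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        Ap = [q + lam * pi for q, pi in zip(rmatvec(J, Jp), p)]
        g = [gi - alpha * ai for gi, ai in zip(g, Ap)]
        gg_new = dot(g, g)
        p = [gi + (gg_new / gg) * pi for gi, pi in zip(g, p)]
        gg = gg_new
    return x


J = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
r = [1.0, 2.0, 3.0]
print(cg_normal_equations(J, r))
```

Only products with J and J^T are needed, which is why matrix-vector kernels for the sparse Jacobian are the natural building block on a GPU.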

Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2231
 
LAtoolbox: A Multi-platform Sparse Linear Algebra Toolbox
Dimitar Lukarski (Karlsruhe Institute of Technology (KIT)), Jan-Philipp Weiss (Karlsruhe Institute of Technology)
Find out about an easy way to build sparse linear solvers for GPUs and multi-/many-core platforms. Based on data abstraction and virtualization of the hardware, the LAtoolbox supports several platforms such as GPUs, multi-core CPUs, and accelerators. The various backends (CUDA, OpenCL, OpenMP, ...) use optimized, platform-specific routines and allow seamless integration of GPUs into scientific applications. By means of unified interfaces across all platforms, the library enables you to build generic linear solvers and preconditioners on a single code base without specific knowledge of your hardware. We demonstrate the portability and flexibility of our open-source approach on heterogeneous platforms.

Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2291
 
Using GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation
Thomas Benson (Georgia Tech Research Institute)
This presentation describes our development of a GPU-accelerated backpropagation implementation for Synthetic Aperture Sonar systems that supports multiple nodes via MPI and multi-GPU nodes. This implementation can form a complex-valued gigapixel image in one hour on a single C2050. We further scale this implementation to the Keeneland system where we can form the same gigapixel image in 21 seconds on 48 nodes with 144 C2070 Tesla GPUs. Our talk will discuss the details of our implementation, including our optimizations and scaling results for various node and GPU configurations, as well as the applicability to other domains, including Synthetic Aperture Radar.

Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2316
Astronomy and Astrophysics
Scalable Frameworks and Algorithms for Terascale Radio Astronomy Images
Christopher Fluke (Swinburne University of Technology - Centre for Astrophysics and Supercomputing)
Learn how the oldest science is using the newest processors to solve a critical problem: how to accomplish traditional image analysis and visualization tasks when the images are terabytes in size? Simple, standard operations such as displaying 2-d slices, evaluating image statistics, and applying histogram equalization become manifestly challenging when images dramatically exceed single-node memory capacity. We will explain how our hybrid CPU-GPU cluster framework - which can volume render a 200GB image at >50fps! - will support traditional radio astronomy tasks for the colossal images that the Square Kilometre Array and its precursor, the Australian SKA Pathfinder, will generate.

Keywords:
Astronomy and Astrophysics, GTC 2012 - ID S2022
 
GPU Acceleration of Dense Stellar Clusters Simulation
Bharath Pattabiraman (Northwestern University), Stefan Umbreit (Northwestern University)
Computing the interactions between stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. This paper presents the parallelization of a Monte Carlo algorithm for simulating stellar cluster evolution using programmable Graphics Processing Units. The kernels of this algorithm exhibit high levels of data-dependent decision making and unavoidable non-contiguous memory accesses. Nevertheless, we adopt various parallelization strategies and use the high computing power of the GPU to obtain substantial near-linear speedups that cannot easily be achieved on a CPU-based system. This acceleration allows us to explore physical regimes that were out of reach of current simulations.

Keywords:
Astronomy and Astrophysics, GTC 2012 - ID S2087
 
Signal Processing on GPUs for Radio Telescopes
John Romein (ASTRON)
In this talk, we will present GPU implementations of four highly compute-intensive algorithms used by radio telescopes.

Keywords:
Astronomy and Astrophysics, GTC 2012 - ID S2124
 
GPUs for Radio Imaging
Vamsi Krishna Veligatla (University of Groningen)
With the advent of a new breed of telescopes such as the Low Frequency Array (LOFAR), which rely on software to process the large data sets they generate, the software must run as fast as possible to process those data sets in a reasonable time. In this session we describe how we have used the computing power of GPUs to improve the performance of standard radio imaging techniques, and how this computational power is useful for creating a new generation of radio imaging algorithms.

Keywords:
Astronomy and Astrophysics, GTC 2012 - ID S2187
 
Accelerating Radio Astronomy Cross-Correlation Beyond 1 Tflops Using Fermi
Michael Clark (NVIDIA)
Radio astronomy is a signal processing application that requires extreme supercomputing. While today's radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase to 1 Exaflops. The most compute-intensive part of this problem is the so-called cross-correlation algorithm, which is a linear-algebra problem. In this session we demonstrate that the Fermi architecture is ideally suited to this problem, and that by exploiting the Fermi memory hierarchy it is possible to achieve close to 80% of peak performance in a real application.
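
To make the "linear-algebra problem" concrete, here is a minimal sketch of a cross-correlation step as it is commonly formulated: for every antenna pair (i, j), accumulate the product of one signal with the complex conjugate of the other over time. The data layout (`samples[t][i]`) and the function name are assumptions for illustration; this plain-Python loop shows only the arithmetic, not the Fermi-optimized kernel discussed in the session.

```python
def correlate(samples):
    """Accumulate the visibility matrix V[i][j] = sum_t x_i[t] * conj(x_j[t]).

    samples[t][i] is the complex voltage of antenna i at time step t
    (a hypothetical layout chosen for this example).
    """
    n = len(samples[0])
    V = [[0j] * n for _ in range(n)]
    for x in samples:
        for i in range(n):
            # V is Hermitian, so only the lower triangle is accumulated.
            for j in range(i + 1):
                V[i][j] += x[i] * x[j].conjugate()
    # Recover the upper triangle from Hermitian symmetry.
    for i in range(n):
        for j in range(i + 1, n):
            V[i][j] = V[j][i].conjugate()
    return V


samples = [[1 + 1j, 2 + 0j], [0 + 1j, 1 - 1j]]
for row in correlate(samples):
    print(row)
```

Each time step contributes a rank-one outer product, so the work grows with the square of the number of antennas while the input grows only linearly, which is what makes the problem compute-bound.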

Keywords:
Astronomy and Astrophysics, GTC 2012 - ID S2347
Audio, Image and Video Processing
Using the GPU Direct for Video API
Thomas True (NVIDIA), Alina Alt (NVIDIA)
This tutorial will demonstrate how video I/O devices can take advantage of the GPU Direct for Video API to optimize the data transfer performance for digital video, film and broadcast applications and computer vision applications. The GPU Direct for Video API is a technology that permits the DMA transfer of data buffers between video I/O devices and the GPU through the use of a shared system memory buffer for immediate processing by OpenGL, DirectX, CUDA and OpenCL. This direct transfer can improve synchronization and eliminate latency between video capture, GPU processing and video output.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2049
 
Fast High Quality Image and Video Background Removal with CUDA
Timo Stich (NVIDIA)
A tool that cuts out objects from a photograph efficiently and easily has great practical value. In this session we present aspects of how to implement such a tool efficiently with CUDA and the NPP library, based on the GrabCut approach by Rother et al. Through GPU acceleration, both runtime and accuracy are improved compared to CPU-based implementations such as the one in MS Word 2011. We further show how to extend our GPU implementation to enable live background removal in a webcam video stream.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2052
 
Cost-effective GPU Acceleration of a Video Restoration and Archiving Workflow
Klaus Gaedke (Technicolor)
The goal of this session is to present a complex GPU-accelerated video restoration and archiving workflow. The workflow consists of many different processing steps and a final review application. Fast and cost-effective processing and real-time display of the processed video material are key requirements. It will be shown in detail how GPU-based acceleration can be achieved for many different processing steps and for the review application, using OpenCV, OpenCL, and OpenGL. Furthermore, an object-oriented software architecture supporting the acceleration of several different processing tasks on the same graphics adapter will be presented.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2073
 
Multi-GPU Real-Time Ptychographic X-ray Image Reconstruction
Filipe Maia (Lawrence Berkeley National Laboratory)
Learn how a new imaging technique, combined with the computational power of GPUs and the brightness of modern X-ray synchrotrons, can quickly and easily produce images with nanometer-level resolution. Ptychography is a recent X-ray imaging technique in which overlapping regions of a sample are exposed in quick succession and the resulting scattering is used to reconstruct a high-resolution image of the sample. Discover why GPUs can substitute for the lack of X-ray lenses and how they enabled a dramatic reduction in the feedback time for users of the technique, from days to seconds.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2131
 
Rapid Training of Acoustic Models Using GPUs
Jike Chong (Carnegie Mellon University), Ian Lane (Carnegie Mellon University Co)
Learn how to realize robust and accurate speech recognition systems by training acoustic models on GPUs. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data, which can take weeks even with a large cluster of machines. To overcome this development bottleneck, we propose a new framework for rapid training of acoustic models using highly parallel GPUs. With a single NVIDIA GTX580 GPU, our proposed approach is shown to be 51x faster than a sequential CPU implementation, enabling a moderately sized acoustic model to be trained on 1000-hour speech data in just over 9 hours.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2222
 
Building Real-Time Professional Visualization Solutions with OpenCL
Kristof Denolf (Barco), Samuel Maroy (Barco)
Professional visualization solutions, such as high-quality, high-resolution medical displays or very large screens for surveillance or entertainment, benefit from the image and graphics compute capabilities of GPUs to achieve real-time performance, but add specific constraints such as low latency, multiple HD streams, and strict synchronization. This talk first motivates the industrial relevance of development in OpenCL on heterogeneous devices. It then explains the techniques currently being explored to meet these design constraints, with a main focus on parallel data transfer and compute. The lessons learned are illustrated with a real-life example.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2252
 
Sensor Processing with Rugged Kepler GPUs (Presented by GE Intelligent Platforms)
Dustin Franklin (GE Intelligent Platforms)
Swimming in sensors and drowning in data? Turn the tide on high-bandwidth sensors with rugged next-generation GPUs from NVIDIA. See how we deploy NVIDIA GPUs into the most extreme of environments, providing GPGPU capabilities onboard platforms where SWaP and GFLOPS/watt are key. Dig into four realtime CUDA sensor processing applications: Hyperspectral Imaging, Wide-Area Surveillance, 360° Situational Awareness, and GSM cellular SIGINT. Discuss the CUDA algorithms, interconnects, and rugged platforms behind each. Learn how we utilize GPUDirect and realtime Linux for improved latency and determinism.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2253
 
Fast JPEG Coding on the GPU
Fyodor Serzhenko (Fastvideo), Victor Podlozhnyuk (NVIDIA)
The goal of this session is to demonstrate how high-speed JPEG compression and decompression can be implemented efficiently on the GPU using CUDA. In this session we will present a detailed analysis of the Baseline JPEG compression and decompression processes and their constituent parts (such as Huffman Coding, RLE, Differential Coding, Quantization, and the Discrete Cosine Transform) and their suitability for the GPU architecture; an analysis of the achieved results and a comparison with existing implementations; and applications to high-speed imaging.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2273
 
Best Practices in GPU-Based Video Processing
Thomas True (NVIDIA)
The combination of the GPU's massively parallel compute engine with extremely high memory bandwidth and new programming paradigms such as CUDA and OpenCL has made the GPU well suited for image and video processing applications. This session will explore best practices and techniques for the development of efficient GPU-based video and image processing applications. Topics to be discussed include image segmentation and threading models for efficient parallelism, optimal memory usage strategies to reduce expensive data movement, and multi-GPU considerations. Case studies and examples specific to video and image processing will be presented.

Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2328
Bioinformatics & Genomics
Algorithms and Tools for Bioinformatics on GPUs
Bertil Schmidt (Nanyang Technological University)
Learn how to use GPUs to accelerate compute- and data-intensive applications and algorithms in bioinformatics. High-throughput techniques for DNA sequencing and gene expression analysis with microarrays have led to rapid growth in the amount of digital biological data; for example, the NCBI Sequence Read Archive (SRA) houses raw sequence data generated by next-generation sequencing (NGS) technologies, which exceeds 25 trillion base pairs. Modern bioinformatics tools therefore need to be scalable, i.e., they need to deal with an ever-growing amount of data. GPUs and CUDA provide the opportunity to significantly reduce the runtime of many biological algorithms on inexpensive hardware.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2008
 
SeqNFind: Application Of CUDA GPU Technologies To Sequence Alignment Techniques
D. Andrew Carr (Accelerated Technology Laboratories Inc.)
Explosive growth in the amount of genomic data has created a need for faster systems that align and compare nucleotide sequences. With the development of tools for leveraging the massively parallel architecture of NVIDIA GPUs, it is a logical next step to construct algorithms for genomic analysis on GPU clouds and clusters. Although this is a seemingly simple task, there are a number of challenges in deploying the current algorithms. Every algorithm from Smith-Waterman to BLAST has its own unique set of barriers. Presented here are some of the lessons learned and how ongoing genomic research projects have benefited from the increased speed and accuracy.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2037
 
Swift: A GPU-based Smith-Waterman Sequence Alignment Program
Pankaj Gupta (St Jude Children's Research Hospital)
This session describes Swift, a GPU-based Smith-Waterman implementation for aligning short DNA sequences to large genomes. Swift has been designed to reduce computation time and lower hardware cost. Also, unlike other leading GPU-based Smith-Waterman sequence alignment programs like CUDASW++ and SWCUDA which focus on protein sequence alignment, Swift has been developed for DNA sequence alignment. Swift performs 200x faster than CUDASW++ using a test data set containing 1000 reads (100 bases each) and 1000 references (1000 bases each), and it performs 11x faster than the CPU-based implementation of Smith-Waterman using 24 million reads (100 bases each) and human chromosome 1.
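
For readers unfamiliar with the underlying recurrence, the sketch below computes a Smith-Waterman local alignment score for a short read against a reference. It is a textbook single-threaded version with a linear gap penalty; the scoring values and the function name are assumptions for the example and are not taken from Swift.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between read and ref.

    Uses a linear gap penalty and keeps only two rows of the DP matrix,
    so memory is O(len(ref)). Scoring parameters are illustrative defaults.
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for a in read:
        cur = [0]
        for j, b in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(smith_waterman("ACGT", "TTACGTTT"))  # prints 8: four matches at +2 each
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent, and separate read/reference pairs are independent of each other; both properties are what GPU implementations typically exploit.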

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2083
 
CUMACH - A Fast GPU-based Genotype Imputation Tool
Agatha Hu (NVIDIA)
The goal of this session is to introduce a GPU-implemented tool in bioinformatics. Genotype imputation is a method that extrapolates genetic correlations from a densely characterized reference panel to a sparsely typed study sample. Many CPU-based tools already exist, but they all take a long time on large data sets. In this session, we implement a GPU-based imputation tool that achieves relatively good results at high speed. The session has three main parts: 1) the background and its HMM-based algorithm, 2) GPU implementation and optimization, and 3) results.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2084
 
SOAP3: GPU-based Compressed Indexing and Ultra-fast Parallel Alignment of Short Reads
BingQiang Wang (BGI)
We give the first implementation of a compressed index (Burrows-Wheeler Transform) on the GPU, supporting very efficient parallel alignment of short patterns (reads) onto the human genome. The new alignment software SOAP3 is tens of times faster than existing tools and can keep up with the throughput (Giga to Tera bp) of next-generation DNA sequencers. It takes 2.4 seconds to perform exact matching for one million length-100 reads (tens of seconds for small-error approximate matching). Technically, we show how to minimize memory accesses to the index from individual threads and how to control the branching and divergence of the threads.
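
The exact-matching step on a Burrows-Wheeler index can be sketched with the standard backward-search procedure. The toy below builds an uncompressed index for a tiny text and looks up a pattern; the names `bwt_index` and `find`, the naive suffix sort, and the full occurrence table are simplifications assumed for the example, whereas a real aligner such as SOAP3 uses a compressed, sampled index.

```python
def bwt_index(text):
    """Build a toy BWT index: suffix array, first-column offsets, occurrence table."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to the sentinel
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][k] = number of occurrences of c in bwt[:k]
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return sa, C, occ


def find(pattern, index):
    """Return the sorted start positions of all exact occurrences of pattern."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("ACGTACGA")
print(find("ACG", index))  # prints [0, 4]
```

Each pattern character costs two table lookups at data-dependent positions, which hints at why the abstract stresses minimizing index memory accesses per thread.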

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2109
 
Accurate Sequence Alignment Using Distributed Filtering on GPU Clusters
Reza Farivar (University of Illinois at Urbana-Champaign), Shivaram Venkataraman (UC Berkeley)
Learn how GPUs enable new ways to rethink a complex bioinformatics problem: Accurate sequence alignment. What was once prohibitive to compute can become the basic block of novel GPU-based algorithms. Modern DNA sequencing machines generate enormous amounts of short sequences within minutes, and they should be aligned to a reference genome in real time. Most solutions only find a few locations that match a short sequence. We introduce a new technique to find all matching locations inside a reference sequence for a given number of mismatches. Our technique is based on a distributed filtering scheme and GPU based processing.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2152
 
Towards Computing the Cure for Cancer
Wu Feng (Virginia Tech), Heshan Lin (Virginia Tech)
Learn about how to create "designer" genomic analysis pipelines as part of the "Compute the Cure" for cancer initiative from NVIDIA Foundation. Get an overview of an open-source framework that enables the creation of customized genomic analysis pipelines. Discover how different plug-ins from the "mapping/realignment/discovery" repositories, respectively, can be composed to form a genomic analysis pipeline. Learn to use next-generation sequencing data to characterize previously undetectable genetic changes between normal and malignant cells. Find out how you can contribute to the "Compute the Cure" cause.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2156
 
High-Throughput Epistasis Screening Using GPUs
Mark Seligman (Insilicos LLC)
Epistasis is the interaction of two or more genes in coding for a biological property. Epistasis is believed to be an important factor in an individual's susceptibility to disease, and the search for epistasis is a major component in the development of personalized approaches to genomic medicine. Statistical tests for epistasis are typically confounded by the multiple-testing problem, that is, the aggregated loss of precision incurred through repeated hypothesis testing. One way to circumvent this problem is to simulate a false-discovery rate via resampling. We report success in using GPUs to accelerate these highly compute-intensive resampling techniques.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2337
 
GPGPU Accelerated Protein Similarity Measures Identifying Biological Relevant Structure
Edward Lowe (Vanderbilt University), Nils Woetzel (Vanderbilt University)
Atomic structure similarity measures for proteins help in de novo protein structure prediction. For a large set of computationally generated protein structures (~20k), all pairwise similarities have to be calculated in order to cluster the structures. Common similarity measures are root mean square deviation (RMSD) and global distance test total score (GDT_TS). Although GDT_TS has advantages over RMSD, it is not used because it is time-consuming to calculate. These and other similarity measures are ported for parallel execution on GPGPUs, making them practical for clustering de novo generated structural models to find the largest cluster, which represents the biologically relevant protein conformations.
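
A simplified sketch of the all-pairs workload described above: RMSD between two structures given as equal-length lists of (x, y, z) coordinates, evaluated for every pair of models. The function names are hypothetical, and the sketch assumes the structures are already superimposed; protein RMSD is normally computed after an optimal superposition step, which is omitted here.

```python
import math


def rmsd(a, b):
    """Root mean square deviation between two pre-aligned coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def pairwise_rmsd(models):
    """Fill a symmetric matrix of RMSD values for every pair of models."""
    n = len(models)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = rmsd(models[i], models[j])
    return d


models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
for row in pairwise_rmsd(models):
    print(row)
```

With about 20k models the matrix has roughly 200 million independent entries, so each pair can be assigned to its own thread or thread block.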

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2346
 
Dynamic Programming on CUDA: Finding the Most Similar DNA Sequence
Grzegorz Kokosinski (IBM Poland), Krzysztof Zarzycki (IBM Poland)
Learn a couple of techniques to speed up compute-heavy Dynamic Programming algorithms on the GPU. Our particular problem concerned DNA sequences: given a reference sequence, how do we find the one most similar to it in a large database? The sequences are millions of characters long, and their similarity is calculated with a (quadratic) DP algorithm, which makes the problem very tough even for GPUs. We speed up both the theoretical and the practical side: we present programming techniques that enable Dynamic Programming to be performed at hardware speed, and improvements to the algorithm itself that drastically lower the execution time.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2376
 
The Advantage of GPU Computation for Analyzing Complex Traits
Jun Zhu (Zhejiang University)

Most important agricultural traits and human diseases are complex traits controlled by gene networks with gene-by-gene interaction (epistasis) and gene-by-environment interaction (GE). New statistical methods and software have been developed for analyzing the genetic architecture of complex traits based on genome-wide association studies (GWAS). When dealing with large mapping populations and huge amounts of molecular information, GPU computation has an advantage over CPU computation. We will demonstrate the newly developed GPU-based software QTLNetwork V3.0 and GWAS-GMDR for mapping genes with epistasis and GE interaction for complex traits of humans, crops, and mice.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2516
 
GPU Accelerated Bioinformatics Research at BGI
BingQiang Wang (BGI)

After the DNA double helix has been digitized by sequencing, computation is the key to connecting raw sequences with life science discoveries. As massive amounts of data are generated, how to process, analyze, and store them efficiently turns out to be a major challenge. By developing GPU-accelerated bioinformatics tools and integrating them into pipelines, BGI researchers now run analysis pipelines in several hours instead of several days. These tools include the SOAP3 aligner, SNP calling, and a tool for population genomics. The speedup is generally around 10-50x compared with traditional counterparts.

Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2519

Climate, Weather, Ocean Modeling
Real Time GPU-Based Marine Scenes Simulation
Jerome Graindorge (ALYOTECH), Julien Houssay (ALYOTECH)

Marine survey, carried out by sea or by air, is of major concern for current defense and security applications. Essential surveillance, observation, and identification systems involve electro-optics (visible and infrared) and radar. Optimizing their performance requires large amounts of expensive observational data spanning the wide variability of the marine environment. Computer simulation provides a valuable, flexible, and inexpensive alternative. Since 2007, ALYOTECH, in partnership with IFREMER (French Research Institute for Exploration of the Sea), has been developing a GPU-based real-time ocean scene simulator for visible, infrared, and radar sensors, in order to meet the challenging requirements arising from marine survey issues.

Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID S2053
 
A Stencil Library for the New Dynamic Core of COSMO
Tobias Gysi (SCS), Peter Messmer (NVIDIA)

We will present a stencil library used at the heart of the COSMO numerical weather prediction model. During the talk we'll show how we implemented an abstraction that allows easy development of new stencils and solvers on top of a framework allowing execution on both CPU and GPU. The library makes efficient use of GPU resources and we will show how to structure memory accesses and computation optimally. Developers involved in porting or writing fully-featured C++ libraries for CUDA will also be interested in attending.

Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID S2256

Cloud Visualization
Graphics in the Cloud - How NVIDIA is Enabling Cloud Visualization
Will Wade (NVIDIA)

Engineers, artists, scientists, and gamers are the most demanding visual thinkers on the planet, and as such have not been willing to move their computing environments to the infamous "cloud". These remotely accessed systems are seen as slow and not up to the visual experience that users expect when dealing with these types of applications. NVIDIA aims to change that perception with the NVIDIA Virtual Graphics Platform. In this session you will hear about the technologies behind accelerating graphics in the cloud, and some of the industry partnerships that are enabling it.

Keywords:
Cloud Visualization, GTC 2012 - ID S2254
 
Scalable GPU Computing Service Architecture
Henrik Hoj Madsen (LEGO), Michael Scholer (LEGO)

In this session we describe our GPU-accelerated computing service, which supports several internal business processes in a large-scale company setup. The service supports diverse computational needs such as on-demand rendering, mesh optimization, a Massively Multiplayer Online Game (MMO), product visualizations, and other demanding computational tasks. We present the architectural considerations for a service-oriented computational framework and the practical lessons and opportunities encountered during development of an enterprise system using technologies such as CUDA, OptiX, OpenGL, and OpenCL. Our aim is to share knowledge and present LEGO's vision for a GPU-accelerated computational platform as a business-driven technology.

Keywords:
Cloud Visualization, GTC 2012 - ID S2261
 
Delivering 3D Professional Graphics from the Cloud with Citrix XenDesktop
Derek Thorslund (Citrix Systems, Inc.)

Recent technological advances have made it practical to deliver 3D professional graphics applications from the Cloud (private or public) with a high quality user experience and at an attractive cost. Organizations can keep their intellectual property safe in the data center since only fully-rendered screen images are sent over the network. Users in remote locations no longer have to wait for large file transfers. And they can access 3D models from a wide variety of devices, including iPads and Android tablets. Learn how Citrix XenDesktop, XenServer and Receiver technologies have made all of this a reality for many organizations today.

Keywords:
Cloud Visualization, GTC 2012 - ID S2413
 
Accelerating Cloud Graphics
Franck Diard (NVIDIA)

A new NVIDIA SDK provides access to a class of key components which allow optimal capture, compression, streaming and low latency display of high performance games from the cloud. We demonstrate how all these components fit together to deliver an ultimate cloud gaming experience for the customer, but also how they help optimize the relevant metrics for cloud gaming companies.

Keywords:
Cloud Visualization, GTC 2012 - ID S2627
 
Interactive Preclinical Analytics via GPU Cloud Platform (Presented by Penguin Computing)
Matt Jacobs (Penguin Computing), David Weinstein (Numira Biosciences)

David Weinstein, CTO of Numira Biosciences, and Matt Jacobs, SVP of Corporate Development for Penguin Computing, will discuss how Penguin's On-Demand GPU compute environment (POD) and Numira's specialized medical imaging services have been forged into a single, service-based offering for the pharmaceutical and bioinformatics markets. Attendees will learn more about the nature of GPU-based cloud resources and the benefits and challenges associated with bringing a commercial medical imaging service to market on such a platform.

Keywords:
Cloud Visualization, GTC 2012 - ID S2639

Clusters & GPU Management
Best Practices for Architecting and Managing High-Performance GPU Clusters
Dale Southard (NVIDIA)

An overview of designing, deploying, and managing GPU clusters for HPC. Learn to build and operate top500-class GPU computing resources that provide users with the latest CUDA features.

Keywords:
Clusters & GPU Management, GTC 2012 - ID S2119
 
Tesla Cluster Monitoring & Management APIs
Robert Alexander (NVIDIA)

Learn more about cluster management and monitoring of Tesla and Quadro products. This includes a detailed description of the NVIDIA Management Library (NVML) and user-facing third-party software. Additionally, a brief summary of our out-of-band capabilities will be provided.

Keywords:
Clusters & GPU Management, GTC 2012 - ID S2238
 
Dynamically Allocating GPGPU to Host Nodes (Servers)
Saeed Iqbal (Dell)

Learn how to remotely change the mapping of GPUs to hosts based on application needs. The audience will then be presented with example scripts and a demo illustrating how this can be implemented to improve system resource utilization.

Keywords:
Clusters & GPU Management, GTC 2012 - ID S2309

Computational Fluid Dynamics
Unstructured Grid Numbering Schemes for GPU Coalescing Requirements
Andrew Corrigan (Naval Research Laboratory), Johann Dahm (University of Michigan)

Learn how to achieve high performance for computational fluid dynamics (CFD) solvers over unstructured grids using numbering schemes tailored for GPU coalescing requirements. Using these techniques, unstructured grid CFD solvers can make more effective use of memory bandwidth, which is an otherwise significant performance bottleneck that has so far led to relatively limited performance gains on GPUs in comparison to structured grid CFD solvers. Performance benchmarks will be shown using the Jet Engine Noise Reduction (JENRE) code.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2031
 
A Massively Parallel Two-Phase Solver for Incompressible Fluids on Multi-GPU Clusters
Peter Zaspel (University of Bonn)

Join our presentation of a multi-GPU fluid solver for high performance GPU compute clusters. We use high-order scientific techniques to simulate the interaction of two fluids like air and water. Scientists, engineers and even the computer animation industry will profit from the enormous compute power of tens or hundreds of GPUs. A major focus in this talk will be on the applied GPU implementation techniques and the performance results including performance per Watt and performance per dollar results. We also highlight the lessons we learned from porting the complex CPU CFD code NaSt3DGPF to the GPU.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2044
 
Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations
Rohit Gupta (Delft University of Technology)

Get a closer look at how the parallel conjugate gradient (CG) method can gain an edge over its optimized CPU implementation. We have developed preconditioning techniques for CG which are suited to the GPU and match Block-IC in terms of numerical performance. We present our results for two-level preconditioned CG on the GPU and also compare it with multi-CPU implementations. Our results show that for large problem sizes (1 million unknowns and above) it is possible to achieve speedups of an order of magnitude or more for the two-level preconditioned CG method.
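
For readers unfamiliar with the method, the overall shape of a preconditioned CG iteration can be sketched as below. This is a minimal sketch under stated assumptions: it uses a simple Jacobi (diagonal) preconditioner as a stand-in, not the two-level preconditioner or Block-IC discussed in the talk, it runs on a small dense matrix on the CPU, and the names `pcg`, `dot`, and `matvec` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product (a sparse format would be used in practice)."""
    return [dot(row, v) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # initial search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # converged on residual norm
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

Each iteration consists of one matrix-vector product, a few inner products, and vector updates. The preconditioner application is the step that a GPU-oriented design would replace with something stronger than the diagonal scaling used here.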

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2063
 
Particleworks: Particle-based CAE Software Fully Ported on Multi-GPU
Yoshiaki Hanada (Prometech Software Inc.), Issei Masaie (Prometech Software Inc.)

Get the latest information on particle-based fluid simulation and multi-GPU computing in "Particleworks", a commercial CAE software package from Japan. In this session, we cover (1) particle simulation trends in CAE, (2) particle simulation development in Japanese industry, (3) implementation and performance of the full GPU port, and (4) multi-GPU scaling in several client cases.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2066
 
A Monte Carlo Thermal Radiation Solver in GPU/CPU Hybrid Architecture
Gaofeng Wang (Laboratoire E.M2.C, Ecole Centrale Paris), Oliver Gicquel (Laboratoire E.M2.C, Ecole Centrale Paris)

A Monte Carlo ray-tracing code has been developed to predict radiative heat transfer behaviour in CFD simulations of combustion phenomena. Using the emission-reciprocal method, the random ray casting for each node can be conducted independently, which allows parallel computation. The code is efficiently implemented on hybrid GPU/CPU HPC resources using a dedicated dynamic load balancing strategy. Linear speedup scaling on hybrid HPC resources has been shown in a demonstration calculation of radiative heat transfer in a helicopter engine's combustion chamber, where adding one GPU to the HPC resource pool is equivalent to adding nine CPU cores.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2129
 
Efficient Implementation of CFD Algorithms on GPU Accelerated Supercomputers
Ali Khajeh Saeed (University of Massachusetts, Amherst)

The goal of this session is to introduce the concepts necessary to solve large computational fluid dynamics (CFD) problems on collections of many GPUs. Communication and computation overlapping schemes become even more critical when using fast compute engines such as GPUs that are connected via a relatively slow interconnect (such as MPI on InfiniBand). The algorithms presented are validated on unsteady CFD simulations of turbulence using 192 graphics processors to update half a billion unknowns per computational timestep. The performance results from three different GPU-accelerated supercomputers (Lincoln, Forge, and Keeneland) are compared with a large CPU-based supercomputer (Ranger).

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2217
 
RANS CFD Solver on Fermi
Peng Wang (NVIDIA), James Lin (Shanghai Jiao Tong University)

SJTU-NS3D is an in-house CFD code co-developed by SJTU and COMAC for large civil aircraft. It solves the 3D Reynolds-Averaged Navier-Stokes (RANS) equations on structured grids with the finite volume method and can be used in designing wing models. In this talk, we will present the design and further optimization of the CUDA version of SJTU-NS3D, which achieves a 20-fold speedup for the standard M6 wing model and a 37-fold speedup for a wing model candidate from COMAC on a single Fermi C2050.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2251
 
Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python
Michal Januszewski (University of Silesia in Katowice; Google Switzerland)

Learn how Run-Time Code Generation (RTCG) techniques allowed for fast development of a lattice Boltzmann (LB) fluid dynamics solver called Sailfish. Sailfish is completely open source, supports a wide variety of LB models (single and multiple relaxation times, the entropic model; single and binary fluids) and can take advantage of multiple GPUs. Even though the project is written predominantly in Python, no performance compromises are made. This talk will introduce the basic design principles of Sailfish and illustrate how RTCG makes it possible to exploit the power of GPUs with minimal programmer effort.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2258
 
CU++: An Object-Oriented Framework for Computational Fluid Dynamics (CFD) Applications
Dominic Chandar (University of Wyoming)

In this session, I will elucidate the power of blending C++ expression templates and CUDA, which has resulted in a smart framework, CU++, for solving Computational Fluid Dynamics problems on structured and unstructured meshes. Briefly, CU++ allows a code developer with just C/C++ knowledge to write computer programs that will execute on the GPU with minimal knowledge of specific programming techniques in CUDA. It allows the user to reuse existing C/C++ CFD codes with minimal changes. Codes written in CU++ can also be compiled in serial mode to be executed on a CPU using the tool ugc.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2264
 
Virtual Process Engineering - Realtime Simulation of Multiphase Systems
Wei Ge (Chinese Academy of Sciences)

Realtime simulation and virtual reality with quantitatively correct physics for industrial processes with multi-scale and multiphase systems was once a remote dream for process engineering, but is now becoming reality with CPU-GPU hybrid supercomputing. Numerical and visualization methods for such simulations on thousands of GPUs will be reported, with applications in the chemical and energy industries.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2268
 
Culises - A Library for Accelerated CFD on Hybrid GPU-CPU Systems
Bjoern Landmann (FluiDyna GmbH)

The vast majority of CFD simulations rely on the solution of large-scale systems of linear equations (SLE), where the solution of a system can consume most of the total CPU time. We have developed a library (Culises) for state-of-the-art solution of SLE that is targeted at hybrid GPU-CPU platforms. Culises can be connected to MPI-parallelized CFD codes (e.g. OpenFOAM) via an application-specific interface. In this talk, we focus on efficient implementation of preconditioned Krylov subspace methods. Using the computing power of GPUs, Culises can significantly accelerate pure CPU computations for a multitude of industrial CFD applications.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2293
 
A GPU-Enabled SPH Method for Micro and Nanofluidic Simulations
Daniel Gaudlitz (FluiDyna GmbH)

With smoothed particle hydrodynamics (SPH) methods, multi-phase flows within complex geometries can be efficiently investigated. Physical effects present in micro- and nanofluidic applications can also be described with little effort using the SPH methodology. In order to investigate microfluidic applications relevant to industry, large domains and high spatial resolutions are required. Therefore, an SPH method for accelerated computations on GPUs is currently being developed. The code features dynamic casting of computational data into blocks of appropriate size to fit the GPU memory layout. Tree-like data structures for efficient manipulation of particle distributions also help to obtain significant performance gains on GPU hardware.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2296
 
Large Scale Computational Fluid Dynamics Simulations on Hybrid Supercomputers
John Humphrey (EM Photonics), Eric Kelmelis (EM Photonics)

Learn how to approach the all-too-common problem of trying to retrofit a major application for speed in the modern era of the hybrid supercomputer. In this talk, we will focus on computational fluid dynamics (CFD) codes that are run on Top500 supercomputers. Many of these applications have existed for 20 or more years, so the process of adding the GPU and getting wall-clock improvements in performance can be very challenging. Our talk will discuss how to properly target your effort, the impact of directives-based coding, and how to maintain efficiency across a hybrid cluster.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2304
 
Classical Algebraic Multigrid for CFD with CUDA
Simon Layton (Boston University)

Classical algebraic multigrid (AMG) is one of the most popular algorithms used in engineering, and the engine in many successful commercial packages. Among sparse linear solvers, it is known for being fast, parallel and scalable, yet it maps to GPU architecture with some considerable difficulty. We have tackled these difficulties and currently have a full CUDA implementation of classical AMG, which has been validated against the gold-standard, Hypre. Significant effort was dedicated to reducing thread divergence and optimizing memory access, and we continue to work on performance improvements. We are aiming for a competitive AMG code for fluid dynamics applications.

Keywords:
Computational Fluid Dynamics, GTC 2012 - ID S2305

Computational Photography
Accelerate a Fully Functional Photo Editing Software with GPU
Kaiyong Zhao (Hong Kong Baptist University)

We introduce how to design fully functional GPU-based photo editing software that provides features like layering and selection and integrates various adjustment tools and image filters. The design contains a fast layer rendering engine and an image filter framework that manages different filters and supports visual feedback for filter parameter adjustment. We will also introduce how to design an undo system for GPU-based image processing software. Specifically, a CUDA-accelerated HDR tool will be presented in detail.

Keywords:
Computational Photography, GTC 2012 - ID S2281
 
Live 3D-Video with a Lightfield Camera
Christian Perwass (Raytrix GmbH)

In this session you will learn what a lightfield camera is, how it works, and what you can do with it. In addition to the theoretical presentation, we give a live demo of the camera system developed by our company Raytrix, which gives you 3D live video from a single camera through a single lens, currently at up to 10 fps with a maximum effective resolution of 3 megapixels synthesized from an 11 megapixel sensor using CUDA algorithms on a GTX580. Post-production features include pixel-wise focusing, depth zoom, variable stereo base-line and base-line rotation.

Keywords:
Computational Photography, GTC 2012 - ID S2335
 
Tools for Mobile Computational Photography
Alejandro Troccoli (NVIDIA)

This session will talk about advances in Mobile Computational Photography and the tools that NVIDIA is putting together to enable these on Tegra powered devices. It will demonstrate the use of FCam, an Application Programming Interface (API) that allows for easy and precise control of the camera system. In addition, the FCam API can enable the application developer to replace basic camera routines such as metering, which are typically hidden inside black boxes in traditional camera programming models.

Keywords:
Computational Photography, GTC 2012 - ID S2526

Computational Physics
Multiparticle Collision Dynamics on GPUs
Elmar Westphal (Forschungszentrum Juelich)

See how we employ GPUs to simulate the interaction of millions of solvent and solute particles of a fluid system. Often the domain of large cluster systems, the most time-consuming part of our simulations can now be done on desktop PCs in reasonable time. This contribution shows how GPUs can effectively be used to accelerate existing programs and how techniques like streaming and increased data locality significantly enhance calculation throughput. It also shows how a GPU-optimized program structure yields usually expensive additional functionality "almost free". Furthermore, a well-scaling single-node/multi-GPU implementation of the program is presented.

Keywords:
Computational Physics, GTC 2012 - ID S2036
 
Application of the GPU to a Two-Part Computational Electromagnetic Algorithm
Eric Dunn (SAIC)

The shooting and bouncing ray (SBR) method is one way to simulate electromagnetic field radiation. As with all methods, there are certain problems where it does not yield accurate results. In this presentation, we will explain one such case that consists of an antenna resonating between two metal plates. We will discuss how we used the graphics processing unit (GPU) to separate the problem into two parts. Each part is simulated individually with SBR, producing an improved result. Such a GPU-accelerated, two-part approach can be applied to other more general hybrid simulations.

Keywords:
Computational Physics, GTC 2012 - ID S2046
 
PIConGPU - Bringing large-scale Laser Plasma Simulations to GPU Supercomputing
Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf), Guido Juckeland (Center for Information Services and High Performance Computing & Technical University Dresden)

With powerful lasers breaking the Petawatt barrier, applications for laser-accelerated particle beams are gaining more interest than ever. Ion beams accelerated by intense laser pulses foster new ways of treating cancer and make them available to more people than ever before. Laser-generated electron beams can drive new compact x-ray sources to create snapshots of ultrafast processes in materials. With PIConGPU, laser-driven particle acceleration can be computed in hours, compared to weeks on standard CPU clusters. We present the techniques behind PIConGPU, a detailed performance analysis, and the benefits of PIConGPU for real-world physics cases.

Keywords:
Computational Physics, GTC 2012 - ID S2067
 
Porting Legacy Plasma Codes to GPU
Peng Wang (NVIDIA)
Learn how to port legacy Fortran plasma codes to GPUs. Many legacy plasma codes are written in Fortran and have many lines of code. We will discuss techniques for porting such legacy codes easily and efficiently to CUDA C/C++. Performance analysis of major algorithmic patterns in plasma codes will be discussed. The discussion will use the GTC and GeFi plasma codes as realistic examples.

Keywords:
Computational Physics, GTC 2012 - ID S2245
 
The Fast Multipole Method on CPU and GPU Processors
Eric Darve (Stanford)
The fast multipole method (FMM) is a widely used numerical algorithm in computational engineering. Accelerating the FMM on CUDA-enabled GPUs is challenging because the FMM has a complicated data access pattern, mostly during the so-called multipole-to-local (M2L) operation. We have created several schemes to optimize the M2L and have attained a performance of over 350 Gflop/s in single precision and over 160 Gflop/s in double precision. The optimal algorithm was incorporated into a complete FMM code, which can accept any smooth kernel specified by the user, making it very flexible. We have also developed a highly efficient CPU version.

Keywords:
Computational Physics, GTC 2012 - ID S2334
 
Unraveling the Mysteries of Quarks with Hundreds of GPUs
Ronald Babich (NVIDIA)
Dive into the world of quarks and gluons, and hear how GPU computing is revolutionizing the way many calculations in lattice quantum chromodynamics (lattice QCD) are performed. The main computational challenge in such calculations is to repeatedly solve large systems of linear equations arising from a four-dimensional finite-difference problem. In this session, we'll discuss strategies for parallelizing such a solver across hundreds of GPUs. These include techniques and algorithms for reducing memory traffic and inter-GPU communication. The net result is an implementation that achieves better than 20 Tflops on 256 GPUs, realized in the open-source "QUDA" library.

Keywords:
Computational Physics, GTC 2012 - ID S2368
Computational Structural Mechanics
Particle Dynamics with MBD and FEA Using CUDA
Graham Sanborn (FunctionBay)
Systems of many spherical particles are solved with the Discrete Element Method (DEM) and simulated on GPUs. A fast algorithm calculates Hertzian contact forces between the particles (from 100,000 to 1,000,000 of them), and NVIDIA's CUDA is used to accelerate the calculation. The particles are simulated together with MBD and FEA entities within the commercial software RecurDyn. Several models are built and simulated: a forklift with sand, oil in an oil tank, an oil-filled engine system, and a water-filled washing machine. All models are simulated on NVIDIA GPUs, and the results are shown.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2055
 
MSC Nastran Sparse Direct Solvers for Tesla GPUs
Cheng Liao (MSCsoftware)
The current implementation of MSC Nastran's MSCLDL and MSCLU sparse direct solvers for multiple Tesla GPUs is presented. The matrix is first statically decomposed into a prescribed number of domains. The Schur complements are then calculated with CPUs and GPUs, and the residual structure is solved afterward. Back-substitution is used to find the solution at every grid point. Merits of this method are discussed and performance comparisons are made.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2064
 
Large-Scale Matrix-Free Topology Optimization on the GPU
Krishnan Suresh (University of Wisconsin)
Popular topology optimization methods today are based on the SIMP concept. Unfortunately, SIMP leads to ill-conditioned stiffness matrices that are difficult to solve on GPU architectures. In this talk, I will present a new topology optimization method called PareTO that relies on the concepts of topological sensitivity and Pareto-tracing. The resulting stiffness matrices are well conditioned, and one can now fully exploit GPU architectures for fast matrix-free implementation of the finite element method. Numerical experiments demonstrate the efficacy of PareTO.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2070
 
Fluid-Structure-Interaction Using SPH and GPGPU Technology
Wayne Mindle (IMPETUS Afea)
There are two goals when developing engineering analysis software: one is accuracy and the other is speed. In the area of Fluid-Structure Interaction (FSI), computational time has always been the major impediment to solving large, realistic engineering problems. In our implementation, the fluid/structural dynamics solver uses a combination of GPU/CPU processing. The added benefit of using a powerful GPU workstation is that it is roughly 10 times less expensive than a regular CPU cluster. In this paper, we present the use of GPU technology as implemented in the explicit dynamic finite element software IMPETUS Afea Solver.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2143
 
GPU Based Stacking Sequence Optimization For Composite Skins Using GA
P.V. Chandrasekhar (Infosys Ltd.), Amit Padalkar (Infosys Ltd.)
The goal of this session is to showcase how GPUs can be used to achieve high performance in genetic-algorithm-based optimization. The application domain is stacking sequence optimization of aircraft wing skins. The concepts illustrated use CUDA but apply to any other GPU language. It is assumed that the registrants have exposure to optimization in the engineering domain.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2214
 
Speedup Altair RADIOSS Solvers Using NVIDIA GPU
Eric Lequiniou (Altair), Hongwei Zhou (Altair)
Solvers are the heart of Altair's HyperWorks computer-aided engineering simulation software. In this session, you will learn how GPUs can improve their performance. The direct solver is widely used in structural analysis and sensitivity calculations. You will discover how offloading the intensive matrix computation to the GPU and using heterogeneous computing can increase its speed compared to a multi-core approach. The iterative solver is particularly suited to solving large problems with millions of degrees of freedom. An innovative hybrid parallelization using multiple GPUs and MPI, which allows a dramatic reduction in solution time, will be presented.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2225
 
Evolving Use of GPU for Dassault Systems Simulation Products
Luis Crivelli (Dassault Systemes, SIMULIA)
SIMULIA, the Dassault Systems brand for simulation, has been working with NVIDIA GPGPU cards to accelerate the computation required in doing large-scale structural finite-element simulations with the widely used Abaqus product line. SIMULIA's initial efforts with GPGPUs have been focused on accelerating particularly costly parts of the code when running both on workstations and clusters. We will look at success in these areas with existing products. Further, SIMULIA is now looking at how evolving programming models like OpenACC open the door to using GPUs as a compute platform rather than only as accelerators for limited parts of an application.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2431
 
GPU Computing: From Sand to Tank Dynamics
Dan Negrut (University of Wisconsin-Madison)
This talk explores the use of heterogeneous CPU/GPU computing, as enabled by an in-house developed Heterogeneous Computing Template (HCT), for physics-based simulations of mechanical systems. HCT draws on five components: advanced modeling techniques (formulating the governing equations); algorithmic support (solving these equations); proximity computation; domain decomposition/data exchange (for multi-node distributed CPU/GPU computing); and post-processing/visualization. These five components provide the foundation of a computational framework used to analyze mechanical systems with millions of interacting elements. Example applications will include granular terrain simulation, tracked and wheeled vehicle mobility studies (tanks, rovers), fluid-solid interaction and nonlinear finite element analysis.

Keywords:
Computational Structural Mechanics, GTC 2012 - ID S2518
Computer Graphics
GPU-Accelerated Path Rendering
Mark Kilgard (NVIDIA)
Standards such as Scalable Vector Graphics (SVG), PostScript, TrueType outline fonts, and immersive web content such as Flash depend on a resolution-independent 2D rendering paradigm that GPUs have not traditionally accelerated. This tutorial explains a new opportunity to greatly accelerate vector graphics, path rendering, and immersive web standards using the GPU. By attending, you will learn how to write OpenGL applications that accelerate the full range of path rendering functionality. Not only will you learn how to render sophisticated 2D graphics with OpenGL, you will learn to mix such resolution-independent 2D rendering with 3D rendering and do so at dynamic, real-time rates.

Keywords:
Computer Graphics, GTC 2012 - ID S2024
 
Flame On: Real-Time Fire Simulation for Video Games
Simon Green (NVIDIA), Christopher Horvath (Pixar)
Fire and explosions are common elements in video games and other virtual environments. We present a real-time fire simulator inspired by the paper "Directable, High-Resolution Simulation of Fire on the GPU" [Horvath and Geiger 2009], but this time implemented entirely in CUDA and targeted at adding interactive fire to video games. This talk will describe both the tricks necessary to implement an efficient fluid simulator in CUDA, and techniques for rendering the results to achieve realistic looking fire.

Keywords:
Computer Graphics, GTC 2012 - ID S2102
 
Optimized Texture Transfers
Shalini Venkataraman (NVIDIA), Gerhard Lang (VizRT)
Many real-world graphics applications need to transfer textures efficiently in and out of GPU memory in the form of 2D images, 2.5D terrains or 3D volumes as well as their time-varying counterparts. The first part of this talk covers technical pointers on how to optimize your OpenGL application to overlap transfers with rendering using the NVIDIA Copy Engines. The second part demonstrates the integration and performance of this feature within a real-world, latency-sensitive broadcast graphics application from VizRT.

Keywords:
Computer Graphics, GTC 2012 - ID S2356
 
NURBS Tessellation with CUDA
Brent Oster (NVIDIA)
NURBS, or Non-Uniform Rational B-Splines, are a curved surface representation commonly used in computer-aided design and digital content creation. This recursive representation gives a great deal of flexibility, allowing arbitrary surface order and knot vectors, enabling a single NURBS surface to contain many contiguous patches. However, this recursive representation is also expensive to compute, so a NURBS surface is often converted into multiple Bezier patches before being tessellated. In this implementation, we present an efficient method for directly tessellating NURBS surfaces using the NVIDIA CUDA computing API.

Keywords:
Computer Graphics, GTC 2012 - ID S2403
 
Stochastic Rasterization
Eric Enderton (NVIDIA), Morgan McGuire (NVIDIA and Williams College)
Learn how to render transparency, motion blur, and depth of field effects in real time using random sampling. These effects combine multiple objects in each pixel, making them expensive to compute directly. But recent research shows that, with stratified sampling and clever reconstruction, good image quality can be achieved with surprisingly small numbers of samples per pixel. We will explain how to do this on the GPU, and explore trade-offs of performance, quality, accuracy, and noise.

Keywords:
Computer Graphics, GTC 2012 - ID S2409
 
Computational Graphics: An Overview of Graphics Research at NVIDIA
David Luebke (NVIDIA Research)
The future of computer graphics presents many challenges. The worlds we render will be vastly more complex in geometry and artistic texture. Real-time rendering will use global illumination to achieve a far richer appearance, robustly. And content creation, which has grown to be the dominant cost of producing both games and film, must get simpler and less expensive. The NVIDIA Graphics Research group addresses these challenges with a focus on Computational Graphics: using general-purpose computation to enhance and extend the traditional pipelines and capabilities of real-time rendering. In this talk David Luebke, who leads graphics research, will give an overview of recent and ongoing work in computational graphics at NVIDIA Research.

Keywords:
Computer Graphics, GTC 2012 - ID S2609
 
Octree-Based Sparse Voxelization for Real-Time Global Illumination
Cyril Crassin (NVIDIA)
Discrete voxel representations are generating growing interest in a wide range of applications in computational sciences and particularly in computer graphics. A new real-time usage of dynamic voxelization inside a sparse voxel octree is to compute voxel-based global illumination. When used in real-time contexts, it becomes critical to achieve fast 3D scan conversion (also called voxelization) of traditional triangle-based surface representations. This talk describes a new surface voxelization algorithm that produces a sparse voxel representation of a triangle mesh scene in the form of an octree structure using the GPU hardware rasterizer. In order to scale to very large scenes, our approach avoids relying on an intermediate full regular grid to build the structure and constructs the octree directly.

Keywords:
Computer Graphics, GTC 2012 - ID S2610
 
Edge-Aware Shaders for Real-Time Computer Graphics
Peter-Pike Sloan (NVIDIA)
The most common approach in rendering is to define behavior at a point in terms of material properties and incident illumination. That approach works well when the geometry and material properties are well-known, and the light physics are simulated accurately. We present a technique to help in situations where the model and/or physics are incomplete. This technique augments shaders with information about nearby edges, such as corners and boundaries between materials, and makes it natural to add richness procedurally near these visually critical regions.

Keywords:
Computer Graphics, GTC 2012 - ID S2611
 
Lenovo ThinkStation Accelerates Medical Research with Beckman Coulter (Presented by Lenovo)
Scott Ruppert (Lenovo), Tanmay Dharmadhikari (Beckman-Coulter)
Lenovo ThinkStations utilize Nvidia Maximus technology to accelerate mission critical applications across multiple industries, including manufacturing, media & entertainment, and Life Sciences. Discover how GPUs are used to accelerate medical research from product experts with Lenovo and Beckman Coulter. Beckman Coulter has utilized Nvidia GPUs to reduce software development and test cycles by 50% with their Kaluza software. Kaluza is a revolutionary flow cytometry analysis software solution that provides visualization tools, speed and an innovative simplicity to the flow community. See how Kaluza allows users to analyze 10 million cells in real time. Session attendees will receive a drawing entry to win a brand new ThinkPad Tablet.

Keywords:
Computer Graphics, GTC 2012 - ID S2638
Computer Vision and Machine Vision
Inverse 3D Vision: Detection and Tracking of NVIDIA Glasses
Anton Obukhov (Ubiquiti Networks)
Computer Vision is becoming increasingly popular and important. With the advent of powerful mobile devices and the increasing power of desktop PCs, it is important to improve user experience by tackling the hardest problems of real-time interaction with the user. These include body-part tracking and face and gesture recognition. This talk discusses techniques behind an interaction pattern between a user and a 3D visualization system, in which the system tracks the position of NVIDIA 3D Vision Glasses and accounts for this information during rendering. These techniques include Histograms of Oriented Gradients and Template Matching. The system implementation is discussed too.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2062
 
Oculus Real-Time Modular Cognitive Vision System
Jeremie Papon (University of Gottingen), Alexey Abramov (University of Gottingen)
This session will explore ways to integrate GPU processing into a real-time computer vision architecture. While there has been a rapid push to move vision algorithms onto GPUs, integration into an efficient vision system architecture remains elusive. We will discuss our development of a modular vision system architecture that enables rapid prototyping of complex pipelines using multiple GPUs. The system incorporates modules for segmentation, disparity mapping, optical flow and particle filter tracking on the GPU. Our talk will explore the various difficulties associated with developing such a system and will give a hands-on demonstration of Oculus, our vision platform.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2075
 
Parallel Computing In Mobile Robotics for RISE
Janusz Bedkowski (Institute of Mathematical Machines, Warsaw, Poland)
RISE (Risky Intervention and Surveillance Environment) is a very demanding task. This presentation covers three areas of research: 3D data registration, robot navigation, and 3D point cloud processing. An approach based on robust k-nearest-neighbor (KNN) search, applied to improve the ICP algorithm, is shown. A parallel path-planning approach based on the wave propagation method is shown. Online segmentation of 3D point clouds based on normal vector computation is given. The proposed algorithms were tested with NVIDIA CUDA on a GF 580 GPU, and the results are satisfactory.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2081
 
Point Cloud Library (PCL) on CUDA
Radu Rusu (Willow Garage Inc), Michael Dixon (Willow Garage Inc)
The Point Cloud Library (PCL - http://pointclouds.org) is a large-scale, open project for 3D point cloud processing. The PCL framework contains numerous state-of-the-art algorithms including filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation. Due to the massively parallel nature of many of the above algorithms, GPGPU acceleration holds great potential for achieving real-time performance in numerous applications. In this work we demonstrate some of the recent advances in GPGPU programming for 3D point cloud processing, and outline plans for future development.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2088
 
GPU Implementation of Deep Learning for Intelligent Computer Vision
Ben Goertzel (Novamente LLC)
Learn how to use GPU supercomputing for intelligent computer vision, via deep learning algorithms. We will focus on a case study of visual object and event recognition in a humanoid robotics context, involving a port to CUDA of the DeSTIN "compositional spatiotemporal deep learning network" vision processing algorithm (originally implemented at the University of Tennessee in Knoxville for conventional serial computers). The audience will learn how to use the open-source DeSTIN CUDA code, and also how to port other deep learning algorithms to CUDA.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2104
 
VScreen: a Real-Time augmented video method
Francisco J. Hernandez-Lopez (CIMAT A.C.), Mariano Rivera (CIMAT A.C.)
We present a tool for image editing that allows us to replace a region of any image or video with another image or video. This application is useful for advertisements, commercials, music videos, movies, etc. We named our development "Virtual Screen", or just VScreen. The main difference between editing (augmenting) videos and fixed images is that occlusions must be managed: moving objects in the foreground may occlude the augmented region in the background. We therefore use a procedure for foreground-background video segmentation, implemented on NVIDIA video cards to meet the real-time requirement.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2128
 
Computer Vision Libraries with GPUs
Eric Young (NVIDIA)
Learn how Computer Vision libraries can take advantage of GPUs. Computer Vision algorithms are extremely well suited for GPU architectures because they demand large computational power that GPUs offer over CPUs. This talk provides an overview of the different GPU libraries (such as OpenCV, GPUCV, PCL, and NPP) and online resources (GPU4Vision and OpeNVIDIA) available for developers today. Examples and demonstrations of practical applications making use of these libraries will also be shown throughout the talk.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2404
 
High Performance 3D Perception
Chris Slaughter (University of Texas Perception, Lynx Labs)
The path to general-purpose graphics programming was driven by computer graphics: the process of rendering 3D models into 2D viewpoints. With the advent of flexible programming of GPGPU processing, this process can be reversed. 3D perception is the problem of inferring structure and motion of the physical world from 2D and 3D measurements. In this talk, we will demonstrate the role GPGPU plays in a diverse set of applications in high-speed 3D perception and discuss optimization of these techniques for the GPGPU. We also demonstrate several capabilities of future systems which are enabled by GPGPU technologies.

Keywords:
Computer Vision and Machine Vision, GTC 2012 - ID S2607
Databases, Data Mining, Business Intelligence
30x Faster Regular Expressions on a GPU
David Lehavi (HP Labs)
We present a regular expression (regex) engine on a GPU. We utilize the highly parallel architecture of GPUs to accelerate such searches. We believe that previous attempts to utilize the GPU for this task did not fully tap its potential. Regexes present imbalanced compute workloads which are very different from common GPU applications (CFD, CG and image processing). Hence, they can teach us general lessons on how to utilize GPUs for more general workloads. Our initial results show a 30x improvement in running time relative to single-threaded commercial regex engines.

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2012 - ID S2043
 
Efficient Top-Down Planning in Business Intelligence
Tobias Lauer (Jedox AG), Alexander Haberstroh (Jedox AG)
In business intelligence, tasks like corporate planning or what-if analysis complement traditional reporting and analysis. One main difference is that while the latter only read data, the former require changing possibly large numbers of existing data records and creating new ones in the business model, preferably in real time. In this session, we describe the extension of an existing BI tool, Jedox OLAP, by GPU-based parallel algorithms for interactive planning scenarios. Compared to sequential in-memory algorithms, our CUDA approach yields tremendous speedups and can also cope with large amounts of data by using multiple GPUs.

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2012 - ID S2219
 
Intra-Day Risk-Management with Parallelized Algorithms on GPUs
Partha Sen (Fuzzy Logix)
The challenge with intra-day risk management is that a very large number of calculations must be performed in a very short amount of time. Typically, we may be interested in calculating VaR for 100 to 1000 securities per second based on 100 million potential scenarios. The magnitude of these calculations is not Utopian; it reflects the reality of modern financial institutions and exchanges. In this presentation, we outline how the complex problem of intra-day risk management can be solved using parallelized algorithms on GPUs. The methodology has been proven in a proof of concept (POC) at two financial institutions.

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2012 - ID S2427
Developer - Algorithms
Leveraging Matrix Block Structure In Sparse Matrix-Vector Multiplication
Steve Rennich (NVIDIA)
The commonly occurring block structure of sparse matrices can be effectively leveraged to improve the performance of Sparse Matrix-Vector multiplication (SpMV) on GPUs. This session will present one such algorithm and discuss both its design and its performance relative to other SpMV algorithms. In particular, aspects of GPU floating point performance, GPU memory use, and data structure translation effort will be detailed.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2029
 
GPU Parallelization of Gibbs Sampling: Abstractions, Results, and Lessons Learned
Alireza Mahani (Sentrana)
Monte-Carlo-Markov-Chain (MCMC) estimation of Hierarchical Bayesian (HB) models is not only time-consuming, but also difficult to parallelize due to its sequential (Markovian) nature. We present an abstraction of a widely-used MCMC algorithm, called Gibbs sampling. We define a taxonomy of variable blocks, and for each type of variable block we offer suitable parallelization strategies, along with their corresponding CUDA implementations. For large problems where model estimation may take several hours or days using single-threaded software, we see speedups in the 30x-100x range, thereby reducing estimation time to a few hours. In addition to lower computation cost relative to MPI-based parallelization, the reduction in estimation time allows for a more interactive modeling experience. We offer an extensive discussion of lessons learned for the broader scientific computing field, including an analysis of tradeoffs between computation costs and development costs, implications of our tradeoff analysis for optimal software development and parallelization, and some practical tips and gotchas for rookie GPU programmers.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2035
 
Solving Challenging Numerical Linear Algebra Algorithms Using Multiple GPU Accelerators
Hatem Ltaief (KAUST Supercomputing Laboratory), Stanimire Tomov (University of Tennessee)
See the newest features integrated in MAGMA (Matrix Algebra on GPU and Multicore Architectures) to tackle the multiple GPU-based systems for numerical linear algebra. In this talk, we describe how we leveraged MAGMA to solve existing and new challenging numerical problems on multiple hardware accelerators. Using a hybridization methodology, the new multiGPU-enabled MAGMA is characterized by a representation of linear algebra algorithms as directed acyclic graphs, where nodes correspond to tasks and edges to data dependencies among them, and a dynamic runtime system environment StarPU used to schedule various computational kernels over hybrid architectures of GPUs and homogeneous multicores.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2042
 
Warped Parallel Nearest Neighbor Searches Using KD-Trees
Andrei Tchouprakov (D4D Technologies), Roman Sokolov (D4D Technologies)
We propose a nearest neighbor search algorithm for a set of closely located query points that utilizes GPU parallelism and is optimized for a single CUDA warp. Instead of each query point traversing its own distinct path, a combined non-divergent path suitable for the entire query set can be constructed. Therefore, for a single warp a single stack can be maintained for the entire set of query points, allowing for efficient utilization of the shared memory and a number of simultaneous queries equal to the number of threads in a warp.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2079
 
Floating Point and IEEE 754 Compliance for NVIDIA GPUs: Precision & Performance
Alex Fit-Florea (NVIDIA)
As a result of continuing improvements, NVIDIA offers GPU-accelerated floating-point performance in compliance with IEEE 754. It is our experience that a number of issues related to floating point accuracy and compliance are a frequent source of confusion both on CPUs and GPUs. The purpose of this talk is to discuss the most common ones related to NVIDIA GPUs and to supplement the documentation in the CUDA C Programming Guide.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2085
 
Summed Area Ripmaps
Gernot Ziegler (NVIDIA)
In this presentation, we show how ripmaps can replace Summed Area Tables (SATs) for the purpose of computing a large number of spatially varying box filter kernels throughout the input data, providing both higher accuracy and higher speed for typical use cases. For this purpose, we demonstrate an implementation of ripmap generation in CUDA C (accelerated by shared memory usage), and a texture-cache based box filter for spatially varying kernel sizes, which can be implemented in both CUDA C and graphics-based APIs (e.g. OpenGL and DirectX).
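
For context, the Summed Area Table baseline that this abstract compares against can be sketched in a few lines. This is a minimal serial Python illustration, not the speaker's CUDA implementation and not the ripmap scheme itself; the function names `summed_area_table` and `box_sum` are hypothetical.

```python
def summed_area_table(img):
    """Build a table where sat[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    # One extra row and column of zeros removes boundary special cases.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of img[y][x] for x0 <= x < x1 and y0 <= y < y1."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sat = summed_area_table(img)
whole = box_sum(sat, 0, 0, 3, 3)        # every pixel
lower_right = box_sum(sat, 1, 1, 3, 3)  # the 2x2 block 5, 6, 8, 9
```

Each box sum costs four lookups regardless of box size, which is what makes spatially varying kernel sizes cheap. The table entries grow with the image area, so floating-point tables lose precision on large inputs; that is presumably part of the accuracy argument the abstract makes for ripmaps.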

Keywords:
Developer - Algorithms, GTC 2012 - ID S2096
 
GPU Based Numerical Methods in Mathematica
Ulises Cervantes-Pimentel (Wolfram Research), Abdul Dakkak (Wolfram Research)
A fast way of developing, prototyping, and deploying numerical algorithms that can take advantage of CUDA-capable systems is available in Mathematica 8. Over the past year, educators, scientists, and business users have benefited from the support for GPU programming in Mathematica. By integrating and implementing CUDA/OpenCL in their programs, users take a hybrid approach, combining the speed-up that GPUs offer with a powerful numerical development system. This presentation covers several examples of numerical applications, including deconvolution of MRI images, linear solvers for FEM, systems of ODEs, and line integral convolution visualization.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2106
 
Specialized Sparse Matrix Formats and SpMV Kernel Tuning for GPUs
Alexander Monakov (ISP RAS), Arutyun Avetisyan (ISP RAS)
This session focuses on optimizing the sparse matrix-vector product for NVIDIA GPUs. This is a frequently studied kernel that appears in applications employing iterative methods for solving systems of linear equations. In the majority of cases the computation is memory bandwidth bound. Our study focuses on developing specialized sparse matrix storage formats and corresponding CUDA SpMV implementations that achieve high performance at the cost of additional start-up time required for conversion and tuning. The proposed storage formats reduce the required memory bandwidth by providing compact coding for the locations of some frequently observed patterns of non-zero elements.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2115
 
On the Parallel Solution of Sparse Triangular Linear Systems
Maxim Naumov (NVIDIA)
A parallel algorithm for solving a sparse triangular linear system on the GPU is proposed. It implements the solution of the triangular system in two phases. The analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. The solve phase obtains the full solution by iterating sequentially across the constructed levels. The solution elements corresponding to each level are obtained in parallel. Numerical experiments are presented, showing that the incomplete-LU and Cholesky preconditioned iterative methods can achieve a 2x speedup on the GPU over their CPU implementation.
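
The two phases described above can be sketched as follows. This is a minimal serial Python illustration under stated assumptions, not the speaker's GPU code: the matrix is lower triangular and stored as one dict per row (column index to value), and the names `build_levels` and `solve_lower` are hypothetical. The inner loop over a level is the part a GPU would run in parallel.

```python
def build_levels(rows):
    """Analysis phase: group rows so each level depends only on earlier levels.

    rows[i] maps column j to the entry L[i][j], with j <= i.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j in row if j != i]
        level[i] = 1 + max(deps) if deps else 0
    levels = [[] for _ in range(max(level) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def solve_lower(rows, b):
    """Solve phase: walk the levels in order and solve L x = b."""
    x = [0.0] * len(rows)
    for group in build_levels(rows):
        # Rows in one level do not depend on each other, so this loop
        # could be executed by independent threads.
        for i in group:
            s = sum(v * x[j] for j, v in rows[i].items() if j != i)
            x[i] = (b[i] - s) / rows[i][i]
    return x


rows = [
    {0: 2.0},
    {1: 4.0},
    {0: 1.0, 2: 1.0},
    {1: 2.0, 2: 1.0, 3: 2.0},
]
b = [2.0, 8.0, 4.0, 11.0]
levels = build_levels(rows)
x = solve_lower(rows, b)
```

In this small example rows 0 and 1 form the first level, row 2 the second, and row 3 the third, so the available parallelism depends entirely on the sparsity pattern.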

Keywords:
Developer - Algorithms, GTC 2012 - ID S2149
 
Performance of 3-D FFT Using Multiple GPUs with CUDA 4
Akira Nukada (Tokyo Institute of Technology)
Get the latest information on the performance of the 3-D fast Fourier transform using multiple GPU devices. CUDA 4.0 enables efficient data transfer between GPUs. This is especially important in FFT computation, which requires a large amount of all-to-all data exchange between GPUs. The peer-to-peer communication feature of GPUDirect V2 improves the communication between devices on the same node. GPUDirect also accelerates the communication between GPUs on different nodes. We will present the latest performance results on a four-GPU system and up to 128 compute nodes of TSUBAME 2.0.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2209
 
1024 Bit Parallel Rational Arithmetic Operators for the GPU
Robert Zigon (Beckman Coulter)
Learn how to create a set of rational arithmetic operators that manipulate 1024 bit operands on a Tesla C2050. These operators are used to create a numerically stable implementation for Bessel functions. Naive implementations of the Bessel functions produce unreliable results when they are used to solve Maxwell's equations by way of Mie theory. Maxwell's equations are used to model the scattering of light by small particles. Light scatter is used in Particle Characterization to measure the quality of materials like cocoa, cement and pharmaceuticals.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2221
 
Large Graphs on Multi-GPUs
Enrico Mastrostefano (Sapienza University di Roma)
The goal of this session is to propose new paradigms to explore large graphs on GPUs. Graphs with billions of edges don't fit within the memory of a single GPU. A possible solution is to resort to multiple GPUs. Most common graph algorithms show low arithmetic intensity and irregular access patterns. These features lead to poor load balance among threads and uncoalesced memory accesses. We show how to balance the load so that all threads are used as fully as possible, and then how to use fast algorithms, such as radix sort and scan, to rearrange data before processing it.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2241
 
3D ADI Method for Fluid Simulation on Multiple GPUs
Nikolai Sakharnykh (NVIDIA), Nikolay Markovskiy (NVIDIA)
Find out about a multiple GPU implementation of the Alternating Direction Implicit method for large 3D domains. The ADI technique is applied towards direct numerical fluid simulation. Modeling complex flows demands extremely large grids and a distributed computation is required for sharing the memory among multiple GPUs. In this session a novel distributed tridiagonal solver as well as parallelization and load balancing strategies will be covered in detail. Finally, a comprehensive performance analysis and scaling studies for different input geometries and possible future improvements will be discussed.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2247
 
Excitements, Challenges, and Rewards in Optimizing GPGPU Kernels
Rajib Nath (University of California San Diego), Stanimire Tomov (University of Tennessee Knoxville)
Learn about the excitements and challenges in optimizing CUDA kernels for the last two generations of NVIDIA GPGPUs. Autotuning, although crucially important, is hardly a silver bullet for porting code from one generation of GPU to another. The process required many steps: (a) architecture-specific algorithms, (b) tuning algorithms, (c) finding innovative tricks to handle generic cases, (d) tweaking the GPU's internal scheduling to handle partition camping, and (e) above all, the dedication of many enthusiastic programmers. We will share our experiences and discoveries through the development of MAGMABLAS, a subset of CUDA BLAS highly optimized for NVIDIA GPGPUs.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2248
 
Telecom Systems Simulations Acceleration via CPU/GPU Co-Processing: Turbo Codes Case Study
Paolo Spallaccini (Ericsson)
Learn how the struggle to accelerate simulations of a Serially Concatenated turbo code (SCCC) led to new techniques applicable to a broad range of non-natively parallel physical layer telecommunication system simulations. The overall architectural features of CUDA inspired newer parallelization techniques involving algorithm engineering; the simulation acceleration attained for an iterative SCCC decoder exemplifies the efficiency of heterogeneous GPU-CPU coprocessing. Attendees will dive deep into data set and task organization strategies, as well as results and insights, all presented and discussed at length.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2255
 
Fast Adaptive Sampling Technique for Multi-Dimensional Integral Estimation Using GPUs
Pradeep Rao (Infosys Technologies Ltd)
Evaluating multi-dimensional integrals is a commonly encountered problem in many areas of science, including physics and volume estimation of convex bodies. One of the widely used techniques for integral evaluation in large dimensions is the Monte Carlo method. Vanilla Monte Carlo methods of integral estimation use uniform sampling techniques. The statistical error of such uniform sampling decreases only as 1/√(sample size), which is too slow for most real-life applications. In this study, we discuss an adaptive sampling technique called VEGAS, which reduces the variance at a much faster rate than uniform sampling. We present a new parallel implementation of VEGAS based on CUDA that can significantly reduce the computation time of multi-dimensional integrals. We show that our GPU-based implementation of VEGAS achieves up to a 45x speedup over an equivalent CPU-based implementation.
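
The slow convergence of uniform sampling mentioned above can be seen in a small experiment. The sketch below is plain (non-adaptive) Monte Carlo in serial Python, not VEGAS and not the speakers' CUDA code; the function name `mc_integrate`, the test integrand, and the sample sizes are assumptions chosen for illustration.

```python
import math
import random


def mc_integrate(f, dim, n, rng):
    """Estimate the integral of f over [0, 1]^dim with n uniform samples.

    Returns the estimate and its estimated standard error.
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f([rng.random() for _ in range(dim)])
        total += y
        total_sq += y * y
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance / n)


def integrand(x):
    # Sum of squares; its exact integral over [0, 1]^3 is 1.0.
    return sum(t * t for t in x)


rng = random.Random(0)  # fixed seed so runs are repeatable
est_small, err_small = mc_integrate(integrand, 3, 1_000, rng)
est_large, err_large = mc_integrate(integrand, 3, 100_000, rng)
ratio = err_small / err_large
```

Using 100 times more samples shrinks the standard error by only about a factor of 10, which is the 1/√N behavior that motivates adaptive schemes such as VEGAS.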

Keywords:
Developer - Algorithms, GTC 2012 - ID S2271
 
Optimization of a Sparse Matrix-Matrix Multiplication on the GPU
Julien Demouth (NVIDIA)
The goal of this session is to present advanced techniques to optimize CUDA code on the GPU. In particular, we will demonstrate the use of advanced CUDA instructions (inline PTX, warp instructions, "extended" syncthreads) and load-balancing strategies to improve the performance of a sparse matrix-matrix multiplication on the GPU.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2285
 
Fine-Grained Parallel Preconditioners for Fast GPU-based Solvers
Dimitar Lukarski (Karlsruhe Institute of Technology (KIT)), Jan-Philipp Weiss (Karlsruhe Institute of Technology)
Leverage the power of GPUs for efficient parallel solution of large sparse linear systems of equations by means of fine-grained and scalable parallel preconditioners. In this session we describe parallel preconditioners for GPUs based on multicolor re-ordering for Gauss-Seidel-type and ILU-type preconditioners as well as approximate inverse (FSAI) preconditioners. With the power(q)-pattern method we detail a novel method for controlling the fill-in pattern of ILU(p) factorizations that introduces a high degree of parallelism in the preconditioning phase. We demonstrate significant improvements with respect to solver time for various problem scenarios and different Krylov-type solvers.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2289
 
Algorithm Acceleration for Geospatial Analysis
James Goodman (HySpeed Computing LLC), Matthew Sellitto (Northeastern University)
Learn how the power of GPU computing is being leveraged to accelerate algorithms in the field of geospatial image analysis. The data volume and computation requirements associated with geospatial imagery are rapidly expanding as a result of the increasing number of satellite and airborne sensors, greater data accessibility, and expanded utilization of data intensive technologies. This equates to a growing need for high-performance computing in this field. We demonstrate the capacity for GPU computing to meet this need by accelerating a complex non-linear optimization algorithm used for the mapping and assessment of coral reef ecosystems.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2290
 
New Advances in GPU Linear Algebra
John Humphrey (EM Photonics), Kyle Spagnoli (EM Photonics)
Hear product experts explain how we have created two of the most widely used libraries in the GPU computing ecosystem. The CULA library for dense linear algebra has been expanding to multi-GPU and out-of-core applications, meaning that users are no longer limited by the onboard GPU memory for their work. In this field, effectively using multiple GPUs is significantly more challenging than a single GPU! The brand new CULA Sparse library tackles the tough world of sparse linear algebra and achieves 10x speedups. Learn more about what makes these two libraries work in this session.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2307
 
Recent Trends in Hierarchical N-body Methods on GPUs
Rio Yokota (King Abdullah University of Science and Technology)
See the newest developments in the area of hierarchical N-body methods for GPU computing. Hierarchical N-body methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. In this session we will cover topics such as hybridization of treecodes and fast multipole methods, auto-tuning kernels for heterogeneous systems, fast tree construction based on prefix sums, fast load balancing of global trees, and more. Examples will be given using ExaFMM, an open source hierarchical N-body library for heterogeneous systems developed by the speaker and released at SC11.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2308
 
Efficient Graph Matching and Coloring on the GPU
Patrice Castonguay (NVIDIA), Jonathan Cohen (NVIDIA)
The goal of this session is to compare the performance of graph matching and graph coloring algorithms on massively parallel devices such as GPUs. We present novel algorithms, which produce superior results for certain graphs and also discuss the techniques used to efficiently implement these algorithms on the GPU.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2332
 
Tree Accumulation on the GPU
Scott Rostrup (Synopsys Inc)
Learn how to map irregular tree structured computations to the GPU efficiently. See how extremely irregular data-dependent computations can be implemented by composing them out of regular data-parallel primitives. In particular we focus on the problem of tree accumulation, a generalization of the scan primitive to arbitrary tree data structures. We first show how tree orderings and properties can be computed using the Euler tour technique and standard scan primitives. Using these orderings we then develop our new approach to computing tree accumulations in parallel.
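
One simple instance of the idea above is an upward accumulation (subtree sums) computed from a depth-first ordering plus a prefix sum. The sketch below is serial Python under stated assumptions, not the speaker's method: it uses the entry and exit positions that an Euler tour would provide, builds them with an explicit stack rather than in parallel, and the names `dfs_order` and `subtree_sums` are hypothetical.

```python
from itertools import accumulate


def dfs_order(children, root):
    """Return the visit order plus each node's first and last tour position."""
    enter, leave = {}, {}
    order = []
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            # All descendants have been appended to the order by now.
            leave[node] = len(order) - 1
            continue
        enter[node] = len(order)
        order.append(node)
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return order, enter, leave


def subtree_sums(children, values, root):
    """Sum the values in every subtree using one scan over the ordering."""
    order, enter, leave = dfs_order(children, root)
    prefix = [0] + list(accumulate(values[n] for n in order))
    # A subtree occupies a contiguous slice of the ordering, so its sum
    # is a difference of two prefix-sum entries.
    return {n: prefix[leave[n] + 1] - prefix[enter[n]] for n in order}


children = {0: [1, 2], 1: [3, 4]}
values = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
sums = subtree_sums(children, values, 0)
```

Once the ordering exists, the irregular tree computation reduces to a regular scan followed by independent lookups, which is the kind of composition from data-parallel primitives the abstract describes.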

Keywords:
Developer - Algorithms, GTC 2012 - ID S2349
 
Lossless Data Compression on GPUs
Ritesh Patel (University of California Davis), Jason Mak (University of California Davis)
In this talk, we will discuss common data compression algorithms used in the bzip2 implementation. We will also discuss our efforts towards parallelizing the Burrows-Wheeler Transform, Move-to-Front Transform, and Huffman encoding. The Burrows-Wheeler Transform is an algorithm used in both lossless data compression and bioinformatics. We'll explain how it was computed using a parallel string-sorting algorithm. We will also show performance comparisons to serial implementations of each algorithm.
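
For readers unfamiliar with the first two transforms named above, here is a minimal serial Python sketch. It is not the speakers' parallel implementation and it omits the other stages of bzip2; the function names `bwt` and `move_to_front`, and the `$` sentinel (assumed absent from the input and smaller than every input character), are illustrative assumptions.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform by sorting all rotations of the input."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def move_to_front(data, alphabet):
    """Replace each symbol by its index in a self-reordering symbol table."""
    table = list(alphabet)
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        # Move the symbol to the front so repeats encode as small numbers.
        table.insert(0, table.pop(idx))
    return out


transformed = bwt("banana")
codes = move_to_front(transformed, sorted(set(transformed)))
```

The sort is the expensive step, which matches the abstract's remark that the transform was computed with a parallel string-sorting algorithm. The transform tends to cluster equal symbols, and Move-to-Front then maps such runs to zeros that an entropy coder such as Huffman can compress well.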

Keywords:
Developer - Algorithms, GTC 2012 - ID S2361
 
Computing Hausdorff Distances between Freeforms on the GPU
Sara McMains (UC Berkeley), Adarsh Krishnamurthy (UC San Diego)
We present new GPU algorithms for computing the directed Hausdorff distance between freeform surfaces, with applications in shape matching, mesh simplification, and geometric approximation and optimization. Our algorithms run in real-time with very small error bounds for parametric models defined by complex NURBS surfaces and can be used to interactively compute the Hausdorff distance for models made of dynamic deformable surfaces. We discuss implementation decisions and tradeoffs between OpenGL, CUDA, and Thrust, and the advantages and disadvantages of parallel hierarchical culling methods for this application.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2410
 
Artifact-Free Cloud-Based CAD Rendering
Sara McMains (UC Berkeley), Sushrut Pavanaskar (UC Berkeley)
Cloud computing for mechanical CAD provides centrally stored and synchronized models for concurrent engineering. For compactness, trimmed parametric NURBS surface representations are optimal for data transfer to client devices, which must evaluate and render models locally. Direct GPU rendering without pre-tessellation is an attractive solution in this context, both for speed and to preserve fidelity to the original geometry. However, existing data-parallel direct rendering approaches for NURBS suffer from rendering artifacts at trim boundaries. This talk proposes a solution to address these rendering artifacts that are still preventing wide-scale adoption of all such direct rendering algorithms for trimmed parametric models.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2411
 
An Accelerated Weeks Method for Numerical Laplace Transform Inversion
Patrick Kano (Acunum Algorithms and Simulations, LLC)
Mathematical methods based on the use of the Laplace transform are a standard component of undergraduate education. Real world problems however often yield Laplace space solutions which are too complex to be analytically inverted to expressions in physically meaningful variables. A robust numerical inversion approach is thus desirable. In this talk, I present one of the approaches to compute an approximate inverse, the Weeks method. I will also discuss the difficulties in performing numerical inversion. Finally, I will show how we have been able to utilize Jacket from AccelerEyes in MATLAB to more efficiently and robustly implement the Weeks method.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2415
 
New Ideas for Massively Parallel Preconditioners
John Appleyard (Polyhedron Software Ltd), Jeremy Appleyard (Polyhedron Software Ltd)
Linear solvers on serial machines tend to be highly recursive, but that's not an option on GPUs. In this paper we describe a new preconditioner for GMRES and similar Krylov subspace linear solvers that is highly parallel, but also provides effective mechanisms to reconcile remote driving forces in a spatially discretized system. We will present results, taken from some real-world studies using a commercial oil reservoir simulator, showing how it compares with a state-of-the-art serial solver, and showing how performance scales in a domain decomposition formulation run on a multiple CPU+GPU cluster.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2432
 
Faster Finite Elements for Wave Propagation Codes
Max Rietmann (Institute for Computational Science / USI Lugano, Switzerland)
Learn how to develop faster and better finite-element codes for wave propagation using GPUs and MPI combined with overlapping techniques to hide the cost of communications and of host/device memory copies. Different options based on mesh coloring or on atomic operations will be presented. The difficulty of defining speedup will also be discussed (speedup versus what? using what definition of "cost"?). Examples will be given using SPECFEM3D, a highly optimized spectral finite-element code that has won the Gordon Bell SuperComputing award and the BULL Joseph Fourier award, and that can run on CPU or GPU clusters.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2508
 
Scalable GPU Graph Traversal
Duane Merrill (NVIDIA)
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter. We present a BFS parallelization focused on fine-grained task management built on an efficient prefix sum, which achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations on both CPU and GPU platforms.
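
As a rough illustration of the prefix-sum idea mentioned in this abstract (a minimal serial sketch, not the speaker's implementation), the Python below runs a level-synchronous BFS over a graph in CSR form. An exclusive prefix sum over the frontier's vertex degrees gives every frontier vertex its own range of output slots, which is what would let GPU threads write neighbors without contending for a shared queue. The names `exclusive_scan`, `bfs_levels`, `row_offsets`, and `col_indices` are assumptions made for this example.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] = sum(values[:i]). Also returns the total."""
    if not values:
        return [], 0
    inclusive = list(accumulate(values))
    return [0] + inclusive[:-1], inclusive[-1]


def bfs_levels(row_offsets, col_indices, source):
    """Level-synchronous BFS on a CSR graph; returns each vertex's depth (-1 if unreached)."""
    n = len(row_offsets) - 1
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        # Degrees of the frontier vertices, scanned into per-vertex write offsets.
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in frontier]
        offsets, total = exclusive_scan(degrees)
        # Expansion: each frontier vertex copies its neighbors into its own slot range.
        gathered = [0] * total
        for v, off in zip(frontier, offsets):
            start = row_offsets[v]
            for k in range(row_offsets[v + 1] - start):
                gathered[off + k] = col_indices[start + k]
        # Contraction: keep only vertices seen for the first time.
        level += 1
        next_frontier = []
        for u in gathered:
            if depth[u] == -1:
                depth[u] = level
                next_frontier.append(u)
        frontier = next_frontier
    return depth


# Directed edges: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
row_offsets = [0, 2, 3, 4, 5, 5, 5]
col_indices = [1, 2, 3, 3, 4]
print(bfs_levels(row_offsets, col_indices, 0))  # [0, 1, 1, 2, 3, -1]
```

Each edge is gathered at most once and each vertex enters a frontier at most once, so the total work in this sketch stays proportional to |V|+|E|.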

Keywords:
Developer - Algorithms, GTC 2012 - ID S2600
 
Developer - Programming Languages
Accelerator Directives, OpenACC and OpenMP4ACC
Rather than require the programmer to rewrite code for accelerators, several directive sets have been created and proposed to support non-cache-coherent and cache-coherent accelerators. This talk will present the OpenACC specification and its implementation for Cray developers, as well as touch on a similar proposal being evaluated by the OpenMP language committee. The presentation will start by discussing the memory and execution models needed to allow a programmer to write codes that will run effectively on both distinct memory systems and unified memory systems. Once a proper background has been set, the directives will be examined via usage examples.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2089
 
Compiling CUDA and Other Languages for GPUs
This talk gives an overview of the technology behind NVIDIA's CUDA C and OpenCL C compilers, as well as the GPU architecture as seen from a compiler's perspective. Similarities and differences with compiling to a CPU are also discussed. We provide insights into how compiler optimizations affect performance and how other languages could be targeted to GPUs.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2235
 
Harnessing GPU Compute with C++ AMP (Part 1 of 2)
Daniel Moth (Microsoft)
C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11. After a quick overview of the technology, its goals, and its differentiation from other approaches, we will dive into the programming model and its modern C++ API. This is a code-heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2242
 
Harnessing GPU Compute with C++ AMP (Part 2 of 2)
Daniel Moth (Microsoft)
C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11. After a quick overview of the technology, its goals, and its differentiation from other approaches, we will dive into the programming model and its modern C++ API. This is a code-heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2244
 
Exploiting Fault Tolerant Heterogeneous Parallelism with SPM.Python
Minesh B. Amin (MBA Sciences)
In this session, we shall review how SPM.Python enables the exploitation of parallelism across servers, cores and GPUs in a fault-tolerant manner. We will start off by describing how and why SPM.Python augments traditional (serial) Python with parallel concepts like parallel task managers and communication primitives. Specifically, the context for and solutions to three formally open technical problems will be described. We will conclude by reviewing examples of how SPM.Python can be used to exploit both coarse- and fine-grained parallelism using GPUs within and across servers in a fault-tolerant manner.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2299
 
Jet: A Domain-Specific Approach to Parallelism for Film Fluid Simulation
Dan Bailey (Double Negative)
Discover how a domain-specific language can provide not only fast parallel performance but also a simpler user experience in an environment that highly values flexibility. This talk will present the Jet language and heterogeneous compiler built on the LLVM compiler framework that enables efficient generation of x86 machine code or NVIDIA PTX for stencil computation on structured grids. We show that moving target-specific optimizations upstream into the compiler can greatly improve the ability to manipulate the logic of the solver and thus lower the barrier to entry for artists and developers without compromising on performance.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2300
 
Accelerating miniFE: A Finite Element Mini-application
Justin Luitjens (NVIDIA)
The Mantevo performance project is a collection of self-contained proxy applications that illustrate the main performance characteristics of important algorithms. miniFE is intended to be an approximation to an unstructured implicit finite element or finite volume application. Our work investigated algorithms for assembling a matrix on the GPU. Parallelization algorithms using both 1 thread and 8 threads per element were investigated. Using these approaches, we obtained a significant speedup (over 60x for double precision) compared to the serial algorithm.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2302
 
New Features In the CUDA Programming Model
Stephen Jones (NVIDIA)
The continuing evolution of the GPU brings with it new hardware capabilities and new functionality. Simultaneously, ongoing development of CUDA and its tools, libraries and ecosystem brings new features to the software stack as well. Come and learn from one of CUDA's programming model architects about what's new in the GPU, what's coming in the next release of CUDA, how it works, and how it all fits together.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2338
 
Delite: A Framework for Implementing Heterogeneous Parallel DSLs
Domain-specific languages can be a solution for heterogeneous parallel computing since they provide higher productivity and performance. To lower the barrier for DSL development, we implemented the Delite compiler framework and runtime. DSL developers can easily extend the framework to build a new DSL. The framework provides various optimization facilities and automatically generates code for heterogeneous hardware, including GPUs. The runtime executes the generated code in parallel by scheduling the kernels on target devices and managing the memory allocations and data transfers. This talk will cover the details of Delite with examples from OptiML, a machine learning DSL implemented with the framework.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2365
 
Physis: An Implicitly Parallel Framework for Stencil Computations
Naoya Maruyama (RIKEN Advanced Institute for Computational Science)
This session presents how to implement finite difference methods in a concise, readable, and portable way, yet achieving good scalability over hundreds of GPUs, using the Physis high-level application framework. Physis extends the standard C language with a small set of custom declarative constructs for expressing stencil computations with multidimensional structured grids, which are automatically translated to CUDA for GPU acceleration and MPI for node-level parallelization with automatic domain-specific optimizations such as overlapped boundary exchanges. We demonstrate the programmability improvement and performance of Physis using hundreds of GPUs on TSUBAME2.0.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2367
 
GPU Performance Analysis and Optimization
Paulius Micikevicius (NVIDIA)
This session will present the fundamental performance-optimization concepts and illustrate their practical application in the context of programming for Fermi and Kepler GPUs. The goal is twofold: to make the optimization process a methodical sequence of steps, and to facilitate making performance-aware algorithmic decisions before coding even starts. In order to maximize GPU performance, a code should have sufficient parallelism, access memory in a coalesced pattern, and be amenable to vector execution within warps (groups of 32 threads). We will show how to quantify these requirements for a specific GPU in order to determine performance limiters and their importance for a given code. To address the limiters, we will review hardware operation specifics and related optimization techniques. The optimization process will be illustrated using NVIDIA profiling tools and kernel case studies.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2514
 
Multi-GPU Programming
Paulius Micikevicius (NVIDIA)
CUDA releases starting with 4.0 include a number of features that facilitate multi-GPU programming and computing. In this session we will review the features useful for programming for multiple GPUs, both within a single node and across a network. We will cover peer-to-peer GPU communication, communication patterns for various GPU topologies, as well as streams in the context of multiple GPUs. Concepts will be illustrated with a case study of 3D forward wave modeling, common in seismic computing.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2515
 
Copperhead: Data Parallel Python
Bryan Catanzaro (NVIDIA)
Copperhead is a data parallel language suitable for GPU programming, embedded in Python, which aims to provide both a productive programming environment and excellent computational efficiency. Copperhead programs are written in a small, restricted subset of the Python language, using standard constructs like map and reduce, along with traditional data parallel primitives like scan and sort. Copperhead programs interoperate with existing Python numerical and visualization libraries such as NumPy, SciPy, and Matplotlib. In this talk, we will discuss the Copperhead language, the open-source Copperhead runtime, and selected example programs.
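
The data parallel constructs named in this abstract (map, reduce, scan) can be illustrated with ordinary Python. The sketch below uses only the standard library and deliberately does not use the Copperhead API; the function names `saxpy`, `dot`, and `running_sum` are assumptions chosen for this example.

```python
from functools import reduce
from itertools import accumulate
import operator


def saxpy(a, x, y):
    """Elementwise a*x + y, expressed as a map over two sequences."""
    return list(map(lambda xi, yi: a * xi + yi, x, y))


def dot(x, y):
    """Dot product: a map (multiply) followed by a reduce (add)."""
    return reduce(operator.add, map(operator.mul, x, y), 0)


def running_sum(x):
    """Inclusive scan (prefix sum) of a sequence."""
    return list(accumulate(x, operator.add))


x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
print(saxpy(2, x, y))   # [12, 24, 36, 48]
print(dot(x, y))        # 300
print(running_sum(x))   # [1, 3, 6, 10]
```

Programs written in this style have no loop-carried side effects, which is the property that makes them amenable to data parallel execution.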

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2525
 
An Introduction to the Thrust Parallel Algorithms Library
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances developer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB and OpenMP) facilitates integration with existing software. In this talk we'll walk through the library's main features and explain how developers can build high-performance applications rapidly with Thrust.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2602
 
NVIDIA OpenACC
Duncan Poole (NVIDIA)
OpenACC is a directives-based programming standard for parallel computing on accelerators (including GPUs). It is designed to harness the transformative power of heterogeneous computing systems easily and quickly. Adding simple compiler hints to your code to express parallelism allows the compiler to map computation onto an accelerator. OpenACC directives allow developers to make simple and portable code changes, enabling an easier migration to accelerated computing. This talk discusses the merits of this model, and provides an overview of, and guidance on, the tools available to the developer from the OpenACC members.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2621
 
The PGI Fortran and C99 OpenACC Compilers
Brent Leback (Portland Group)
Experienced GPU programmers will learn about the latest PGI OpenACC Fortran and C compilers. This session discusses how and where to apply the Parallel and Kernels constructs and the differences between the two. It includes a review of the latest PGI release and a comparison of the OpenACC standard to the PGI Accelerator Model. A live component demonstrates how to interpret compiler feedback, how to use it to enable better performance, and how to interoperate with lower-level explicit GPU languages like CUDA and OpenCL. The presentation wraps up with a look at planned future enhancements.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2622
 
CUDA 5 and Beyond
Mark Harris (NVIDIA)
CUDA, NVIDIA's platform for parallel computing, has grown rapidly in the past 5 years. The performance and efficiency of software built on CUDA, combined with a thriving ecosystem of programming languages, libraries, tools, training, and service providers, have helped make GPU computing a leading HPC technology. CUDA 5 and the Kepler GPU architecture don't just increase application performance; they enable a more powerful parallel programming model that expands the possibilities of GPU computing, and language features that improve programmer productivity. In this talk you'll hear about these revolutionary features and get insight into the philosophy driving the development of new CUDA hardware and software. You will learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2641
 
Inside Kepler
In this talk, individuals from the GPU architecture and CUDA software groups will dive into the features of the compute architecture for Kepler, NVIDIA's new transistor GPU. From the reorganized processing cores with new instructions and processing capabilities, to an improved memory system with faster atomic processing and low-overhead ECC, we will explore how the Kepler GPU achieves world-leading performance and efficiency, and how it enables wholly new types of parallel problems to be solved.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2642
 
Massively Parallel Code Development on Stelletto CDA (Presented by Creative Consultants)
Come participate in the global launch of Stelletto, a multi-node, office-based, GPU-accelerated conSTELLAtion compute platform. Join Rob Farber (author/scientist), Denis Gerrer (CAPS Enterprise), and Greg Scantlen (Creative Consultants) to learn how to create and leverage massively parallel applications. Whether you are porting legacy code or developing new code from scratch, the Stelletto Code Development Appliance offers a cost-effective methodology for producing scalable apps. In 50 minutes you will learn the essentials of assembling a complete hardware and software solution for scalable many-core and GPU-accelerated code development, from plug-in Stelletto to massively parallel executable code.

Keywords:
Developer - Programming Languages, GTC 2012 - ID S2646
 
Developer - Tools & Libraries
Debugging Experience with CUDA-GDB and CUDA-MEMCHECK
The CUDA debugger tools CUDA-GDB and CUDA-MEMCHECK provide a whole new feature set to help improve your CUDA application development cycle. This session is a detailed walk-through of the key new features and advanced techniques for using CUDA-GDB and CUDA-MEMCHECK together to improve overall code productivity. This tutorial will also include live demos.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2027A
 
Teraflop GPU Acceleration Of Large Matrix Algebra
Ronald Young (Multipath Corporation)
Learn how Multipath's Fast Matrix Solver (FMS) is setting performance records using multiple GPUs solving large matrices in production applications. By (1) leveraging NVIDIA's CUBLAS library, (2) operating multiple GPUs in parallel and (3) overlapping data transfers with computation, FMS averages over 2 teraflops of performance, even on jobs lasting for days. The presentation also includes a description of what problems FMS solves and how it is incorporated into application programs.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2032
 
PFAC Library: GPU-Based String Matching Algorithm
Cheng-Hung Lin (National Taiwan Normal University)
In this session, we first propose an exact string matching algorithm, called the Parallel-Failureless Aho-Corasick (PFAC) algorithm, which is used to match input texts against a set of string patterns on GPUs. The string patterns are compiled into a finite state machine similar to that of the well-known Aho-Corasick algorithm. Furthermore, to accommodate a large number of patterns, we present two kinds of hash functions which are adopted to compress the state transition table. The experimental results show that the PFAC library achieves significant performance on NVIDIA GPUs. Finally, the PFAC library has been released on Google Code (http://code.google.com/p/pfac/).
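
To make the "failureless" idea concrete, here is a minimal serial sketch in Python. It is written under the assumption that each input position is handled by its own logical thread, which walks a goto-only trie and stops at the first missing transition. It is not the PFAC library's code or API, it omits the hash-based table compression, and the names `build_trie`, `match_from`, and `pfac_match` are hypothetical.

```python
def build_trie(patterns):
    """Build a goto-only trie (no failure links). Returns (transitions, outputs)."""
    transitions = [{}]   # transitions[state][char] -> next state
    outputs = [None]     # outputs[state] -> index of the pattern ending at this state
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            nxt = transitions[state].get(ch)
            if nxt is None:
                nxt = len(transitions)
                transitions[state][ch] = nxt
                transitions.append({})
                outputs.append(None)
            state = nxt
        outputs[state] = idx
    return transitions, outputs


def match_from(text, start, transitions, outputs):
    """Work of one logical thread: walk from `start` until no transition exists."""
    hits = []
    state = 0
    for pos in range(start, len(text)):
        state = transitions[state].get(text[pos])
        if state is None:
            break  # failureless: the thread simply terminates
        if outputs[state] is not None:
            hits.append((start, outputs[state]))
    return hits


def pfac_match(text, patterns):
    """Return (start position, pattern index) pairs for every match in `text`."""
    transitions, outputs = build_trie(patterns)
    results = []
    for start in range(len(text)):  # on a GPU, each start would be a separate thread
        results.extend(match_from(text, start, transitions, outputs))
    return results


patterns = ["he", "she", "his", "hers"]
print(pfac_match("ushers", patterns))  # [(1, 1), (2, 0), (2, 3)]
```

Because no thread ever follows a failure link, the threads are independent of one another, at the cost of re-reading characters that a classic Aho-Corasick automaton would visit only once.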

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2054
 
The High-Level Linear Algebra Library ViennaCL And Its Applications
Karl Rupp (TU Wien)
Get to know ViennaCL, a high-level OpenCL linear algebra library that offers the speed of GPU computing at the convenience level of the C++ Boost libraries. Decrease the development and execution time of applications by utilizing our well-tested and widely used library, instead of spending days on learning details of GPU architectures and debugging. We provide examples that demonstrate not only how quickly and efficiently existing applications can be ported from single-threaded execution to fully utilizing multi-threaded environments, but also how to utilize the rich set of functionalities ranging from common BLAS routines to iterative solvers.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2071
 
Techniques for Designing GPGPU Games
Learn how to develop faster and better games with GPGPU through the use of game GPU tricks. Normally, games process most of their tasks on the CPU, using the GPU only for graphics processing. This session shows some techniques for making better use of GPGPU power to process all the game logic, achieving speedups when compared to CPU and traditional GPU models. This session also shows some examples of these techniques in practice.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2074
 
Panoptes: A Binary Instrumentation Framework for CUDA
Christopher Kennelly (D. E. Shaw Research)
Traditional CPU-based computing environments offer a variety of binary instrumentation frameworks, while the instrumentation and analysis tools available to date for GPU environments have been more limited. Here we present Panoptes, a binary instrumentation framework for CUDA that targets the GPU. By exploiting the GPU to run modified kernels, Panoptes allows computationally intensive programs to be run at the native parallelism of the device during analysis. To demonstrate the instrumentation capabilities of Panoptes, we will present our work on a memory addressability and validity checker that targets CUDA programs.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2078
 
Debugging GPU Applications For Correctness and Performance
David Lecomber (Allinea Software)
This session reveals how debugging CUDA applications is made straightforward with the powerful Allinea DDT debugger. New features enabling greater understanding of performance optimizations will be explored, showing how they can be used to produce better, faster CUDA code. Coupled with newly released support for multiple languages and compilers, we will also show how Allinea DDT is enabling developers on desktops and the largest supercomputers to achieve both correct and efficient GPU applications.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2099
 
Software Architecture to Facilitate CUDA Development
We describe a workflow architecture and its use in developing Schrodinger's core-hopping application. The application supplies the stages as callbacks. A stage may have multiple implementations; for example, CUDA and CPU. An implementation can be assigned a maximum number of simultaneous threads. When any stage completes, a scheduling algorithm determines which implementation of which stage will be launched next. The application may detect "special" environments, such as CUDA, and set up its stages accordingly, or it may allow specification of which implementation of each stage to run. This makes it easy to develop and debug CUDA stages flexibly and incrementally.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2121
 
ASI Parallel Fortran: A General-Purpose Fortran to GPU Translator
Rainald Lohner (George Mason University)
Over the last 3 years we have developed a general-purpose Fortran-to-GPU translator: ASI Parallel Fortran. The talk will detail its purpose, design layout and capabilities, and show how it is used and implemented. The use of ASI Parallel Fortran will be shown for large-scale CFD/CEM codes as well as other general-purpose Fortran codes.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2218
 
Trace Based Performance Analysis For GPU Accelerated Multi-Hybrid Applications
Guido Juckeland (TU Dresden - ZIH)
Get in contact with performance tuning experts for multi-hybrid applications and see first hand how VampirTrace/Vampir can significantly speed up application porting and development.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2257
 
Jacket for Multidimensional Scaling in Genomics
Chris McClanahan (AccelerEyes)
In this tutorial, we will present AccelerEyes' Jacket software, which enables GPU computing in MATLAB, through a user case study entitled "Multidimensional Scaling for Genomics". We show how Jacket enables developers to write and run code on the GPU in the native M-Language used in MATLAB. By simply casting data to Jacket's GPU data structure, MATLAB functions are transformed into GPU functions. We will also include demos of running MATLAB code on the GPU for image and signal processing, life science, finance, and other applications. A Q&A session will enable audience members to ask specific questions about Jacket.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2287
 
Performance Tools for GPU-Powered Scalable Heterogeneous Systems
Allen Malony (University of Oregon)
Discover the latest parallel performance tool technology for understanding and optimizing parallel computations on scalable heterogeneous platforms. The session will present the TAU performance system and its support of measurement and analysis of heterogeneous platforms composed of clusters of shared-memory nodes with GPUs. In particular, TAU's integration of the CUPTI 4.1+ technology will be described and demonstrated through CUDA SDK examples and the SHOC benchmarks. Attendees will be provided LiveDVDs containing the TAU toolsuite and many pre-installed parallel tool packages. The LiveDVDs will also include the latest CUDA driver, runtime library, and CUPTI.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2298
 
PTask: OS Support for GPU Dataflow Programming
This session considers the PTask API, a set of OS-level abstractions that support GPUs as first-class computing resources and support a dataflow programming model. With PTask, the programmer specifies where data goes, rather than how and when it should get there, allowing the system to provide fairness and isolation guarantees, streamline data movement in ways that currently require direct programmer involvement, and enable code portability across diverse GPU-based platforms. Our experience building the PTask APIs shows that PTask can provide important system-wide guarantees and can enable significant performance benefits, for example improving the throughput of hand-tuned CUDA programs by up to 2x.
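
As a rough illustration of the dataflow idea in this abstract, the following Python sketch lets the programmer declare only where data goes (which output feeds which input port), while a small scheduler decides when each task runs. It is not the PTask API: `Task`, `connect`, and `run_graph` are hypothetical names invented for this example, and the tasks run on the CPU.

```python
from collections import deque


class Task:
    """A dataflow node that fires once every input port holds a value."""

    def __init__(self, name, fn, num_inputs):
        self.name = name
        self.fn = fn
        # One FIFO queue per input port (num_inputs is assumed to be >= 1).
        self.inputs = [deque() for _ in range(num_inputs)]
        # Downstream edges as (task, input port index) pairs.
        self.outputs = []

    def ready(self):
        return all(len(queue) > 0 for queue in self.inputs)


def connect(src, dst, port):
    """Declare where data goes: the output of src feeds input `port` of dst."""
    src.outputs.append((dst, port))


def run_graph(tasks, sources):
    """Seed the graph with (task, port, value) triples and run until idle.

    Returns a dict mapping the name of each sink task (a task with no
    outgoing edges) to the list of values it produced.
    """
    for task, port, value in sources:
        task.inputs[port].append(value)
    results = {}
    progress = True
    while progress:
        progress = False
        for task in tasks:
            if task.ready():
                args = [queue.popleft() for queue in task.inputs]
                out = task.fn(*args)
                if task.outputs:
                    for dst, port in task.outputs:
                        dst.inputs[port].append(out)
                else:
                    results.setdefault(task.name, []).append(out)
                progress = True
    return results


# Hypothetical three-task graph: two independent stages feed an adder.
scale = Task("scale", lambda xs: [2 * x for x in xs], 1)
offset = Task("offset", lambda xs: [x + 1 for x in xs], 1)
add = Task("add", lambda a, b: [p + q for p, q in zip(a, b)], 2)
connect(scale, add, 0)
connect(offset, add, 1)

# The task list order does not matter; execution order follows the data.
results = run_graph([add, scale, offset],
                    [(scale, 0, [1, 2, 3]), (offset, 0, [1, 2, 3])])
```

Because the caller never states an execution order, a runtime built this way is free to reorder or batch work, which is the kind of freedom the abstract attributes to the system.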

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2320
 
ArrayFire Graphics: A Tutorial
Chris McClanahan (AccelerEyes)
Learn how to use the graphics primitives for GPU computing available in ArrayFire, a new C and C++ library for GPU computing in both CUDA and OpenCL. In this session, we will cover the capabilities of ArrayFire's graphics primitives and show how to build fast, visual computing applications. The tutorial centers around the construction of an application for the computation of optical flow on the GPU and will illustrate how to couple graphics with compute using ArrayFire's graphics primitives. We will also show how the graphics primitives can be composed to result in scalable, fast graphics that complement GPU applications.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2325
 
GMAC-2: Easy and Efficient Programming for CUDA-Based Systems
In this talk we introduce GMAC-2, a framework that eases the development of CUDA applications and tools while achieving similar or better performance than hand-tuned code. The new features implemented in GMAC-2 allow programmers to further fine-tune their code and remove some limitations found in the original GMAC library. For example, memory objects can now be mapped arbitrarily on several devices without restrictions, and a host thread can launch kernels on any GPU in the system. Moreover, GMAC-2 transparently takes advantage of new hardware features such as GPUDirect 2 peer-to-peer communication.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2333
 
Debug Multi-GPU Applications on CUDA-Accelerated Clusters with TotalView
Chris Gottbrath (Rogue Wave Software)
Learn how TotalView can help you develop CUDA applications on single servers, multi-GPU servers, and HPC-style clusters. For more than 20 years the TotalView debugger has set the standard for parallel and multi-core debugging on Linux, HPC clusters and custom supercomputers such as the Cray XT/XE/XK series. CUDA developers deal with the same types of complexity and can realize the same productivity benefits. This talk will introduce TotalView for CUDA and show how you can program more easily with CUDA 3.2, 4.0 and 4.1.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2340
 
A High Level Programming Environment for Accelerated Computing
Luiz DeRose (Cray Inc.)
One of the critical hurdles for the widespread adoption of accelerated computing in HPC is programming difficulty. Users need a simple programming model that is portable and is not significantly different from the approaches used on current multi-core x86 processors. In this talk I will present Cray's approach to accelerator programming, which is based on a high-level programming environment with tightly coupled compilers, libraries, and tools. Ease of use comes from compilers that make it feasible for users to write applications in Fortran, C, and C++, tools that help users port and optimize for accelerators, and auto-tuned scientific libraries.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2407
 
Optimizing Application Performance with CUDA Profiling Tools
David Goodwin (NVIDIA)
NVIDIA provides two powerful profiling tools that you can use to maximize your application's performance. The NVIDIA Visual Profiler helps you understand your application's behavior with a detailed timeline and data from GPU performance counters. The Visual Profiler also provides an automatic, data-driven analysis engine that provides suggestions on potential optimization strategies for your application. Nvprof is a command-line profiler that provides gprof-like functionality for the GPU. Nvprof provides summary information about where your application is spending the most time, so that you can focus your optimization efforts. This session will provide a step-by-step walkthrough of both of these profiling tools, showing how you can use these tools to identify optimization opportunities at the application, kernel, and source-line levels.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2419A
 
Nsight IDE for Linux and Mac
Nsight IDE for Linux and Mac is an all-in-one development environment that lets you develop, debug and optimize CUDA code in an integrated UI environment. If you were waiting for an IDE on Linux and Mac then this session is for you. This session provides a detailed usage walkthrough of a fully CUDA-aware source editor, build integration of the CUDA toolchain, a graphical debugger for both CPU and GPU, and a graphical profiler to enable performance optimization.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2420
 
Panini: A GPU Aware Array Class
We present a new templated C++ class library, PANINI, for use in the development of large-scale scientific simulations in a heterogeneous computing environment. The key feature of this new library is a generic parallel array class built on advanced generic programming methodologies, where the details of parallelization are hidden inside the array class itself. This library will be used for Poisson solvers, advection-diffusion, and other equations.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2428
 
Developing Next-Generation CUDA Acceleration in Wolfram's Mathematica with Parallel Nsight
Since version 8, Mathematica has offered advanced support for GPU acceleration with optimized CUDA functions and a built-in framework for developing scientific CUDA kernel code. In this session, the Wolfram development team will share their experience developing their next-generation CUDA support in Mathematica. From the unique ability of Parallel Nsight to attach its CUDA debugger to a running process, to the new parallel Warp Watch for warp-wide variable views and expression evaluation, to the latest runtime CUDA profiling experiments, they will demonstrate how they were able to take advantage of Parallel Nsight to get the most out of CUDA and the GPU.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2430
 
cudaDMA: Emulating DMA engines on GPUs for Performance and Programmability
Brucek Khailany (NVIDIA)
The CudaDMA library is a collection of DMA objects that support efficient movement of data between off-chip global memory and on-chip shared memory in CUDA kernels. CudaDMA objects support many different data transfer patterns including sequential, strided, gather, scatter, and halo patterns. The library encapsulates efficient synchronization and data transfer implementations to achieve high memory bandwidth utilization. Programmer productivity is achieved by avoiding the need for thread array shapes to match data layout. Using CudaDMA, speedups of up to 1.37x on synthetic micro-benchmarks and 1.15x-3.2x on kernels from scientific applications have been demonstrated.
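
To make the "strided" transfer pattern and the decoupling of thread count from data layout concrete, here is a small CPU-side Python model. It is only a sketch of the idea, not the CudaDMA API: `strided_load` and all of its parameters are hypothetical names, a Python list stands in for global memory, and the returned list stands in for a dense shared-memory buffer.

```python
def strided_load(global_mem, base, elem_size, stride, num_elems, num_threads):
    """Gather num_elems chunks of elem_size words, spaced stride words apart.

    The chunks are packed densely into the returned staging buffer. The work
    is split across num_threads modeled threads, and num_threads does not
    need to match num_elems or elem_size.
    """
    total_words = num_elems * elem_size
    shared = [None] * total_words
    for tid in range(num_threads):  # each iteration models one thread
        # Thread tid copies words tid, tid + num_threads, tid + 2*num_threads, ...
        for w in range(tid, total_words, num_threads):
            elem, offset = divmod(w, elem_size)
            shared[w] = global_mem[base + elem * stride + offset]
    return shared


# Hypothetical layout: four 2-word elements, one every 8 words.
global_mem = list(range(32))
staged = strided_load(global_mem, base=0, elem_size=2, stride=8,
                      num_elems=4, num_threads=3)
print(staged)  # [0, 1, 8, 9, 16, 17, 24, 25]
```

Setting `stride` equal to `elem_size` reduces this to the sequential pattern, and the result is the same for any positive `num_threads`, which is the property that lets the transfer shape be chosen independently of the compute threads.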

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2605
 
CUDA Debugger Training on Windows
NVIDIA Developer Tools Team (NVIDIA)
Nsight offers a powerful set of CUDA debugging features that enables developers to quickly spot bugs. From the memory checker to advanced breakpoints and the variable warp watch panel, a developer can quickly isolate memory access errors, filter thousands of threads down to a specific thread, and quickly spot abnormal variable value ranges. Through a set of comprehensive exercises, the attendee will be able to utilize these features to become fully proficient at developing CUDA code. Please note this is a hands-on lab and seating is very limited. If this session is full and you still wish to participate, please come by and sign up to be on the waitlist.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2811
 
CUDA Profiler Training on Windows
NVIDIA Developer Tools Team (NVIDIA)
Nsight offers a comprehensive set of performance analysis tools. From tracing complete system activity across multi-core CPUs and multiple GPUs to profiling CUDA kernels with precise profiling experiments, developers can identify system-level optimization opportunities as well as expensive and inefficient CUDA kernels requiring in-depth analysis with the CUDA profiler. Through a set of comprehensive exercises, the attendee will be able to utilize these features to become fully proficient at optimizing complex CUDA applications. Please note this is a hands-on lab and seating is very limited. If this session is full and you still wish to participate, please come by and sign up to be on the waitlist.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2813
 
Digital Content Creation & Film
Presentation
Media
Interacting with Huge Particle Simulations in Maya with the GPU
Wil Braithwaite (NVIDIA)
We present a plug-in for Maya which enables an artist to simulate huge particle counts in real-time by leveraging the NVIDIA GPU. Being able to interact with the simulation opens up new possibilities for modifying the workflow. We will demonstrate the plug-in, and provide insight into the algorithms used.

Keywords:
Digital Content Creation & Film, GTC 2012 - ID S2364
 
GPU Enablement in Adobe Photoshop
Jerry Harris (Adobe Systems), Jeff Chien (Adobe Systems)
Photoshop is one of the most popular products in history. It attempts to delight the customers with an immersive experience. Since CS4, Adobe has been tapping into the horsepower of the GPU to create a compelling playground for the imaginations of creative pros. Please join us to review the latest developments on how GPUs have been an enabling force.

Keywords:
Digital Content Creation & Film, GTC 2012 - ID S2395
 
Hate to Wait? Flash Memory for Full-Throttle GPU Acceleration (Presented by Fusion-io)
Vincent Brisebois (Fusion-io), Robert Wipfel (Fusion-io)
Are you guilty of ever not trying out an idea because of the time it would take to process the effect? With flash memory throttling your system like jet fuel for your GPU, you can finally make sluggish application performance a bad memory. This session will couple a technical overview of the latest in PCIe-attached flash memory technology for accelerating graphics processing with developer best practices and tuning for GPU applications using flash memory for image compositing, editing, video playback, 3D content creation, video capture and many other data-intensive tasks.

Keywords:
Digital Content Creation & Film, GTC 2012 - ID S2619
 
Learn How Adobe After Effects CS6 Takes Advantage of NVIDIA Optix Technology for 3D Ray Tracing (Presented by Adobe)
Colin Smith (Adobe)
Adobe After Effects CS6 unveils an amazing new 3D ray-traced rendering engine based on NVIDIA Optix technology, with GPU acceleration up to 50x faster than a CPU alone. This enables simple and quick designs of realistic geometric text and shapes in 3D space. Motion graphics artists can now create more physically accurate scenes with beautiful results such as reflections, transparency, soft shadows, and depth-of-field blur directly in After Effects. GPU-accelerated ray tracing drastically improves the workflow by enabling motion graphics artists to develop these 3D effects entirely within After Effects.

Keywords:
Digital Content Creation & Film, GTC 2012 - ID S2632

Electronic Design Automation
Presentation
Media
GPU Computing Advances in 3D Electromagnetic Simulation
Fabrizio Zanella (CST of America), Andreas Buhr (CST AG)
Learn about the latest developments in GPU acceleration for 3D Full Wave Electromagnetic simulation. The latest version of CST Studio Suite supports the full range of Tesla products on both Windows and Linux operating systems. Using GPU, multi-GPU and MPI-GPU Computing drastically reduces the simulation times for CST customers. We will provide a status of current and future GPU developments at CST and share detailed simulation results.

Keywords:
Electronic Design Automation, GTC 2012 - ID S2069
 
Compiling a Parallel Domain Specific Language to GPUs
Ramesh Narayanaswamy (Synopsys Inc.)
Discuss techniques for compiling Parallel DSLs to GPUs. Verilog is a Domain Specific Language for Hardware Description. Verilog users express parallelism with guarded processes similar to Occam's guarded commands. Review Verilog semantics, and different approaches to compiling Verilog to parallel architectures and to GPUs. Discuss challenges with (a) Verilog description's runtime behavior (b) managing process dependency. Discuss approaches and challenges in compiling a parallel DSL to CUDA C.

Keywords:
Electronic Design Automation, GTC 2012 - ID S2317
 
Using GPUs to Speedup Computational Lithography
Constantin Chuyeshov (Cadence Design Systems)
In this paper we show how GPUs can be used to significantly speed up computational lithography, which is heavily used in the Electronic Design Automation (EDA) industry. In particular, we demonstrate a noticeable performance increase in several basic optical lithography algorithms as well as a speedup of the full-chip verification software, crucial parts of which were ported to NVIDIA's GPUs. We summarize the advantages, disadvantages and challenges of using GPUs and compare them with more traditional multithreading and distributed computing alternatives for the same applications.

Keywords:
Electronic Design Automation, GTC 2012 - ID S2329
 
Using GPUs to Speedup Chip Verification
Tomer Ben-David (Rocketick)
As VLSI designs become more complex, the process of verifying them becomes increasingly expensive and time consuming. Verification of such designs has become quite taxing as they take simulators to the edge in terms of both runtime demands and host memory requirements. In order to reduce verification time, different verification methodologies have been adopted including the use of emulators. However, emulators' price point is high and so is the engineering time to set them up. Rocketick develops a Verilog co-simulator that uses GPUs as an acceleration platform. Rocketick's product, RocketSim® is now part of NVIDIA's design flow and it is being used to accelerate simulations by 10X-30X compared to the standard simulator and to reduce the memory footprint by 5X. In this session RocketSim® will be presented using some real-world examples of verification flows.

Keywords:
Electronic Design Automation, GTC 2012 - ID S2520

Emerging Companies Summit
Presentation
Media
Emerging Companies Summit Fireside Chat with Jensen Huang of NVIDIA and Tim Bajarin of Creative Strategies
NVIDIA CEO and co-founder Jensen Huang will take part in a fireside chat with Tim Bajarin, one of the IT world's pre-eminent analysts and president of Creative Strategies. They will discuss trends in mobile, visual and parallel computing, and the transformational changes ahead for the industry.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2003
 
Emerging Companies Summit: CEO on Stage Featuring Raytrix
Christian Perwass (Raytrix)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2034
 
Emerging Companies Summit: CEO on Stage Featuring Playcast Media Systems
Meir Friedlander (Playcast Media Systems)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2035
 
Emerging Companies Summit: CEO on Stage Featuring Universal Robotics
David Peters (Universal Robotics)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2036
 
Emerging Companies Summit: CEO on Stage Featuring Unity Technologies
David Helgason (Unity Technologies)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2022
 
Emerging Companies Summit: CEO on Stage Featuring Mirriad
Mark Popkiewicz (Mirriad)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2023
 
Emerging Companies Summit: CEO on Stage Featuring BioDigital
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2024
 
Emerging Companies Summit: CEO on Stage Featuring Rocketick
Tomer Ben-David (Rocketick)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2020
 
Emerging Companies Summit: CEO on Stage Featuring Cortexica
Iain McCready (Cortexica)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2021
 
Emerging Companies Summit: CEO on Stage Featuring eyeSight
Gideon Shmuel (eyeSight)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2025
 
Emerging Companies Summit: CEO on Stage Featuring Numira Biosciences
David Weinstein (Numira Biosciences)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2026
 
Emerging Companies Summit: CEO on Stage Featuring Ubitus
Wesley Kuo (Ubitus)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2027
Download:
 
Emerging Companies Summit: CEO on Stage Featuring Gaikai
David Perry (Gaikai)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2028
Download:
 
Emerging Companies Summit: CEO on Stage Featuring Immersive Media
Myles McGovern (Immersive Media)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2029
Download:
 
Emerging Companies Summit: CEO on Stage Featuring Numecent
Osman Kent (Numecent)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2030
Download:
 
Emerging Companies Summit: CEO on Stage Featuring Realview Imaging
Shaul Gelman (Realview Imaging)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2031
Download:
 
Emerging Companies Summit: CEO on Stage Featuring Elemental Technologies
Sam Blackman (Elemental Technologies)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2032
Download:
 
Emerging Companies Summit: CEO on Stage Featuring Mersive
Rob Balgley (Mersive)
See the hottest new technologies from startups that are transforming computing. In a lively and fast-paced exchange, the Emerging Companies Summit CEO on Stage sessions will feature CEOs from three startups who will each have 15 minutes to introduce their companies and interact with a panel of leading venture capitalists, technology executives, and industry analysts.

Keywords:
Emerging Companies Summit, GTC 2012 - ID S2033
Download:
Energy Exploration
Presentation
Media
Memory Efficient Reverse Time Migration in 3D
Chris Leader (Stanford Exploration Project)
Learn how we can image the interior of the Earth in three dimensions using Reverse Time Migration. We discuss how GPUs accelerate this method using parallel wave propagation kernels, texture memory, and minimal device-to-host transfers. We also discuss how the progression to 3D presents a multitude of new problems, particularly memory-related ones that cause the system to be I/O-limited. By manipulating boundary positions and values into a pseudo-random form, we show how many of these memory restrictions can be diminished and how detailed subsurface images can be fully constructed using GPUs.

Keywords:
Energy Exploration, GTC 2012 - ID S2125
Streaming:
Download:
 
Accelerating Reservoir Simulation and Algebraic Multigrid with GPUs
Kenneth Esler (Stone Ridge Technology), Vincent Natoli (Stone Ridge Technology)
Given a model of a reservoir's rock and well properties, a reservoir simulator solves the PDEs for the multiphase flow through porous rock to predict well production. Over the past several decades, simulation has progressed from coarse 2D models to detailed 3D models, providing strong fidelity to empirical production rates. By reformulating the Marathon Oil Corporation's Multiscale Flow Simulator to use GPUs, we improve the overall execution speed by a factor of over 100, allowing fast turnaround on a GPU workstation. We also introduce GAMPACK, a fully-accelerated GPU algebraic multigrid solver, and demonstrate its performance relative to CPU solvers.

Keywords:
Energy Exploration, GTC 2012 - ID S2140
Streaming:
Download:
 
GPU-Based Monte Carlo Ray Tracing Simulation for Solar Power Plants
Claus Nilsson (Tietronix Software Inc.), Michel Izygon (Tietronix Software Inc.)
Learn about real-time simulations of Concentrating Thermal Solar Power using GPU technology to enable performance optimization of these utility-scale plants. By leveraging the power of GPUs and the inherent parallelism of a field of thousands of sun-tracking mirrors, we have cut the computation time by orders of magnitude compared with the minutes to hours previously required. We will present an overview of the problem domain and describe how we used the GPU to derive a Monte Carlo physics ray tracing method to simulate the flux reflected by the mirrors onto the solar receiver.

Keywords:
Energy Exploration, GTC 2012 - ID S2321
Streaming:
Download:
 
GPU Acceleration for Seismic Interpretation Algorithms
Jonathan Marbach (TerraSpark Geosciences)
The oil and gas industry is already leveraging GPUs for seismic data processing, but what about 3D seismic interpretation? This session will cover how the GPU is being used by TerraSpark Geosciences to dramatically decrease the runtime of algorithms for enhancing faults, computing horizon orientation, and calculating volumetric curvature. We will share our experiences in porting these techniques to the GPU, the challenges encountered, the solutions found, and, of course, the benefits to execution time.

Keywords:
Energy Exploration, GTC 2012 - ID S2336
Streaming:
Download:
 
GPU-Accelerated Parallel Computing for Simulation of Seismic Wave Propagation
Taro Okamoto (Department of Earth and Planetary Sciences, Tokyo Institute of Technology)
We adopted GPUs to accelerate large-scale, parallel finite-difference (FDTD) simulation of seismic wave propagation. An effective parallel implementation is needed because the memory of a single GPU is too small for real applications. We therefore describe the memory optimization, the three-dimensional domain decomposition, and the overlapping of communication and computation adopted in our program. So far we have achieved single-precision performance of about 61 TFlops using 1200 GPUs of TSUBAME-2.0, the GPU supercomputer at the Tokyo Institute of Technology, Japan. As an important application, we show the results of the simulation of the 2011 Tohoku-Oki mega-quake.

Keywords:
Energy Exploration, GTC 2012 - ID S2352
Streaming:
Download:
 
Accelerated FDTD Technique for Marine Controlled Source Electromagnetic Imaging
Geoff Clark (Acceleware Ltd.), Michal Okoniewski (Acceleware Ltd.)
Find out about the newest method for Marine Hydrocarbon Exploration. In this session we will profile the use of the Finite Difference Time Domain (FDTD) technique in combination with Mittet's method and GPUs to produce faster, cheaper, more accurate forward modeling for electromagnetic imaging (Controlled Source Electromagnetic, or CSEM). Unlike many frequency-domain CSEM techniques, this accelerated method does not require simplifying assumptions to reduce the memory and computational burden, and it has excellent scaling properties (essentially linear) across clusters of GPU-accelerated nodes. CSEM is used in the industry to enhance confidence in hydrocarbon reservoir discoveries.

Keywords:
Energy Exploration, GTC 2012 - ID S2433
Streaming:
Download:
 
Schlumberger LiveQuest: Application Delivery and Collaboration Solution
Mario Dean (Schlumberger)
The LiveQuest application delivery and collaboration solution allows petro-technical professionals to securely access and share exploration and production (E&P) applications and data, including 3D visualization applications, anytime, anywhere. By utilizing web and thin-client technologies, LiveQuest provides platform-independent and application-agnostic real-time collaboration. In this session, Mario Dean will provide an introduction to the needs of O&G exploration from an application and large-data 3D visualization perspective. He will discuss the LiveQuest solution stack, with specific focus on the 3D remote visualization technology, and share customer deployment examples and overall ROI considerations.

Keywords:
Energy Exploration, GTC 2012 - ID S2434
Streaming:
Download:
 
Explore New Techniques in Volume Rendering/Segmentation with Open Inventor
Mike Heck (VSG)
The goal of this session is to show the improvements in quality, performance and flexibility of the volume rendering implementation of Open Inventor. The latest GPU techniques, such as virtual textures and ray casting, have been combined into a flexible shader API and applied on out of core data. The techniques of volume rendering, sugarcube rendering, basic and complex clipping, sculpting, editing and segmentation will be demonstrated using examples from a geobody extraction workflow. The great ease and flexibility of the shader pipeline API will be illustrated, and we will discuss the broad future perspectives of that technology.

Keywords:
Energy Exploration, GTC 2012 - ID S2444
Streaming:
Download:
 
3D Helmholtz Solver with a Shifted Laplace Multigrid on Multi-GPUs
Kees Lemmens (Delft University of Technology)
Learn about an iterative solver for the 3D Helmholtz equation on multiple GPUs using CUDA. The Helmholtz equation, discretized by second-order finite differences, is solved with Bi-CGSTAB preconditioned by a shifted Laplace multigrid method. Two multi-GPU approaches are considered: data parallelism and algorithm-split. Their implementations on a multi-GPU architecture are compared to multi-threaded CPU and single-GPU implementations. The results show that the data-parallel implementation suffers from communication between the GPUs and the CPU, but is still several times faster than a many-core CPU. The algorithm-split approach across GPUs limits communication and delivers speedups comparable to a single-GPU implementation.

Keywords:
Energy Exploration, GTC 2012 - ID S2511
Streaming:
Download:
 
GPUs in Energy & Exploration: Software Development and Production
Paulius Micikevicius (NVIDIA), Paulo Souza (Petrobras), Alexander Loddoch (Chevron), Dave Nichols (Schlumberger), Mauricio Araya (Repsol)
This session will feature expert panelists who will share their experience adopting GPUs in their respective environments. Since 2009, these production systems have been boosting throughput and shortening cycle times while delivering enhanced images using NVIDIA technologies. Featured panelists will include: Hess, Schlumberger, Petrobras, Chevron and more.

Keywords:
Energy Exploration, GTC 2012 - ID S2628
Download:
 
Hybrid Architectures for Advanced Seismic Imaging: Recent Experiences at Bull (Presented by Bull)
Guy Gueritz (Bull), Mathieu Dubois (Bull)
The two-part presentation describes Bull's system architecture for accelerated seismic applications using GPUs, together with the parallel programming aspects involved and some examples of recent work. The first part covers hybrid system architectures, basic principles of Reverse Time Migration and the numerical methods used to implement it in various forms, together with the architectural features needed, depending on the specific algorithms used. The second part examines CUDA programming aspects and the use of compiler-based directives and libraries to convert existing codes for maximum performance and scalability on GPU architectures.

Keywords:
Energy Exploration, GTC 2012 - ID S2643
Streaming:
Download:
Finance
Presentation
Media
Real-Time Risk Simulation: The GPU Revolution In Profit Margin Analysis
Gilles Civario (ICHEC), Renato Miceli (ICHEC)
Discover how ICHEC helped a world-leading company in its sector dramatically speed up and improve the quality of its real-time risk management tool chain. In this session, we present the method used for porting the core part of the simulation engines to GPUs using CUDA. This porting was carried out on two very different simulation algorithms and resulted in speed-ups of 2 to 3 orders of magnitude, allowing much greater accuracy of the results in a real-time environment.

Keywords:
Finance, GTC 2012 - ID S2034
Streaming:
Download:
 
Mathematica as a Practical Platform for GPU-Accelerated Finance
Dylan Roeh (Wolfram Research Inc), Abdul Dakkak (Wolfram Research Inc)
With the introduction of GPU support in version 8, Mathematica has become an excellent environment for integrating CUDA with high-level code for interpretation or visualization. In this presentation, we will show the usefulness of Mathematica in the field of computational finance. In addition to demonstrating the GPU-accelerated financial computations which can be readily performed within Mathematica, we will show that these calculations can easily be integrated with third-party data sources including Microsoft Excel and databases. Furthermore, we will cover the UnRisk Mathematica package written by MathConsult, which seamlessly adds GPU-accelerated complex model calibration algorithms to Mathematica's repertoire.

Keywords:
Finance, GTC 2012 - ID S2100
Streaming:
Download:
 
Monte-Carlo Pricing Under a Hybrid Local Volatility Model
Sebastien Gurrieri (Mizuho International)
This session shows how to calculate the prices of several financial products, vanilla and exotic, under Dupire's Local Volatility model. We start with vanilla options on the foreign exchange rate and explain how to rescale the Local Volatility matrix in order to take advantage of the fast texture memory interpolation. We then extend this framework to two factors by including stochastic interest rates following the Hull-White model, and show how to price Power-Reverse Dual Coupon swaps with an exotic TARN feature. We provide details of the algorithms and compare accuracy and speed with the typical performance of single-core production implementations.

Keywords:
Finance, GTC 2012 - ID S2206
Download:
 
From GPU Computing Toward Full HPC In Finance with GPUs
Pierre Spatz (Murex SAS)
At the previous GTC, Murex showed how the company had adapted its generic Monte-Carlo and PDE codes compatible with a payoff language. With one more year of experience with GPUs and OpenCL, Murex will show how the company has broadened its usage of GPUs to other subjects such as vanilla screening or model calibration, and will focus on its new challenge: 'use as many GPUs as possible' for one single computation.

Keywords:
Finance, GTC 2012 - ID S2250
Streaming:
Download:
 
C++ Data Marshalling Best Practices
Cliff Woolley (NVIDIA)
When integrating CUDA C++ kernels into existing C++ applications, it is at times desirable to migrate a C++ object instance from the host to the device or vice versa. Given variations among host compilers regarding structure layout, accomplishing this data marshalling in a manner that is reliable, simple, and efficient is a complex issue. cudaMemcpy is our primary means to transfer data to the GPU, but memcpy-style operations are more readily amenable to C-style structures and arrays than to C++ objects or collections of objects. In this session, we will cover the caveats and best practices for marshalling C++ data.

Keywords:
Finance, GTC 2012 - ID S2377
Streaming:
Download:
 
Speedup Derivatives and Structured Products Pricing, Reduce TCO Using GPUs
Ghali Boukfaoui (Numerix)
Numerix will share its experience using GPUs to significantly reduce its customers' Total Cost of Ownership (TCO) and accelerate forward Monte Carlo pricing methods and hybrid models of complex financial structured products and variable annuities. Numerix will describe how it combines complex financial and actuarial modeling with user scripting to drive GPU execution from a script interpreted at run time. This architecture is well suited to financial services firms with portfolios of many different types of structured products where deals are represented independently from the models used to price them.

Keywords:
Finance, GTC 2012 - ID S2383
Download:
 
New Generation GPU Accelerated Financial Quant Libraries
Daniel Egloff (QuantAlea GmbH)
Learn from industry experts how new-generation GPU-accelerated solutions for derivative pricing, hedging, and risk management can be built more efficiently with modern technology and functional programming languages like F# on .NET or Scala on the Java VM. As a concrete example, we report on a large derivative pricing project developed in F# on .NET. We will introduce the key design concepts and parallelization strategies, which lead to efficient and transparent GPU acceleration. Several examples will illustrate the benefit of the functional approach compared to the classical object-oriented approach.

Keywords:
Finance, GTC 2012 - ID S2405
Streaming:
Download:
 
High Productivity Computational Finance on GPUs
Peter Phillips (Aon Benfield Securities), Aamir Mohammad (Aon Benfield Securities)
Learn how Aon Benfield helps clients use GPUs to develop and accelerate Monte Carlo derivatives pricing models. We will present our PathWise software tools, used by actuaries and quants to rapidly develop and deploy production-quality, GPU-grid-enabled Monte Carlo models using only high-level languages and tools, without requiring any knowledge of CUDA or C/C++. We will describe our approach of using Code Generation, Visual Programming, Domain Specific Languages and scripting languages to create a High Productivity Computing software stack for financial services applications.

Keywords:
Finance, GTC 2012 - ID S2418
Download:
 
Leveraging GPGPU Technology for Valuation of Complex Insurance Products
Chris Stiefeling (Oliver Wyman Financial Services)
We share our experiences moving a mature, large-scale insurance application from a CPU to a GPU environment. This session explores the nuances of porting a C++ application when 'blank sheet' re-architecture is not an option. This session will cover: how insurance differs from other financial products (and the implications for the GPU); considerations when moving an existing, fully featured C++ system to a GPGPU platform; supporting CPU and GPU implementations from a single code base; supporting user-defined code extensions on the GPU; CUDA 4.0 C++ extensions (experiences, challenges, and limitations); and a performance case study.

Keywords:
Finance, GTC 2012 - ID S2435
Streaming:
Download:
 
kdb+ and GPUs for Market Data Analytics and Trading
Philip A. Beasley-Harling (Bank of America Merrill Lynch)
Market data volumes increase year-on-year with the occasional extraordinary capacity-breaking peak. We must capture, store and process these data to gain insights for quantitative and algorithmic trading using a variety of market data analytics and techniques. kdb+ from KX Systems is a memory-based column database, written in the vector-functional language q, often used in finance for these analyses. In this session we demonstrate a method for the enhanced performance of general programs written in q and kdb+ by executing them on the GPU.

Keywords:
Finance, GTC 2012 - ID S2656
Streaming:
Download:
General Interest
Presentation
Media
GPUs for Fast Triggering in NA62 Experiment
Marco Sozzi (Physics Department of Pisa)
We discuss an approach for using commercial graphics processors (GPUs) at the earliest trigger stages in high-energy physics experiments, and study its implementation on a real trigger system in preparation. In particular, we focus on the possibility of reconstructing rings in a Cherenkov detector as a building block of a selective trigger condition for rare decay searches. Latency and processing rate measurements on several state-of-the-art devices are presented, and the potential issues related to processing time jitter and data transfer throughput are discussed.

Keywords:
General Interest, GTC 2012 - ID S2013
Streaming:
Download:
 
Academic Research Programs & Sponsored Research
David Luebke (NVIDIA Research)
We invite you to a special presentation from our 2011-2012 Graduate Fellowship recipients to learn "what's next" in the world of research and academia. The NVIDIA Graduate Fellowship recipients were selected from 200 applications in 27 countries. Sponsored projects involve a variety of technical challenges, including computer architecture, computer vision, programmability and optimization for heterogeneous systems, automotive computing and much more. We believe that these minds lead the future in our industry and we are proud to support the 2011-2012 NVIDIA Graduate Fellows. For more information on the 2011-2012 NVIDIA Graduate Fellows, please visit www.NVIDIA.com/fellowship.

Keywords:
General Interest, GTC 2012 - ID S2016
Streaming:
Download:
 
Introducing CUDA in KBE Applications for Digital Vehicle Development Programs
Avijit Santra (Tata Motors Limited)
Get the latest developments in Next Generation Knowledge Based Engineering (KBE) software, which provides real results over the traditional design approach. Today there exist numerous KBE applications in the fields of vehicle ergonomics, suspension, NVH, safety, regulations, etc. that deal with a huge number of iterations and mathematical algorithms. With GPU computing and CUDA, the KBE kernel is restructured to incorporate a parallel programming model, which helps the applications run faster and reduces run times from hours to seconds. The KBE geometry kernel also benefits from enabling CUDA in topology-based operations, which take a lot of time when performed on the CPU.

Keywords:
General Interest, GTC 2012 - ID S2040
Download:
 
High Performance Logic Simulation with GPUs
Yangdong Deng (Tsinghua University)
Verification has become the bottleneck of the IC design process due to its fast-increasing complexity. The fundamental means of verifying digital circuits is logic simulation, which can be performed at both the register-transfer level (RTL) and the gate level. In this work, we developed GPU-based logic simulation solutions. We implemented a Chandy-Misra-Bryant parallel simulation protocol on GPUs for sufficient parallelism. A dynamic GPU memory allocator was introduced to efficiently manage GPU memory resources. RTL simulation is performed in a compiled-code scheme by translating Verilog code into equivalent CUDA code. Experimental results showed that the GPU simulators significantly outperform their CPU counterparts.

Keywords:
General Interest, GTC 2012 - ID S2050
Streaming:
Download:
 
Satellite HUB Communication System GPU Based
Gaetano Mendola (MBI srl), Francesco Basile (MBI srl)
In the last few years, increasing GPU computational power has opened new perspectives in telecommunications through the SDR (software defined radio) approach. Some tasks, such as the one we had to deal with, leave no room to compromise on execution speed because the radio signal must be analyzed in real time. We implemented the lowest layer of the protocol stack for a land mobile satellite communication system, and we were able to deliver a product with a shorter time to market than the traditional FPGA approach allows.

Keywords:
General Interest, GTC 2012 - ID S2065
 
On the Integration of OpenCL into a Software Defined Radio
Michael Dickens (University of Notre Dame)
Learn about software-defined radio (SDR) techniques for heterogeneous real-time signal processing including a GPU via OpenCL. A brief background on SDR will be provided, as well as an overview of our Surfer SDR project and how it allows for heterogeneous processing. Finally, an example of SDR will be demonstrated via an OFDM waveform. This live demo highlights a number of interesting properties inherent in Surfer, including the ability to switch computations between the host computer's processors -- a laptop's CPU and GPU in this case -- during runtime and seamlessly from the user's perspective.

Keywords:
General Interest, GTC 2012 - ID S2134
 
A High Performance Platform for Real-Time X-Ray Imaging
Suren Chilingaryan (Karlsruhe Institute of Technology)
We will share our experience developing a GPU-based platform for synchrotron-based X-ray imaging aimed at the analysis of dynamic processes. The complete data flow from the camera to the data storage will be discussed, with a special focus on I/O issues, the hardware platform, and ways to utilize the available system resources. An efficient GPU implementation of filtered back projection will be presented, highlighting differences between the implementations for the GT200, Fermi, and AMD Cypress architectures. We will introduce our software platform, which abstracts the current configuration of the imaging station and simplifies the development of parallel image processing algorithms.

Keywords:
General Interest, GTC 2012 - ID S2259
 
Teaching Applied Parallel Computing with GPUs
Chris Lupo (California Polytechnic State University)
Learn how the next generation of HPC developers are learning hands-on skills with GPUs, and how GPU computing is being incorporated into Computer Science courses. We will discuss how GPUs are being used to enhance student learning of parallel computing concepts through a cross-teaching approach, where students with different domain expertise are grouped into teams and tasked with parallelizing an application such as ray tracing. We'll show that student projects that emphasize optimization of architectural resources and performance tuning allow students with no prior experience to parallelize a large-scale application with significant performance improvement in as little as six weeks.

Keywords:
General Interest, GTC 2012 - ID S2311
 
Bcl::ChemInfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs
Edward Lowe (Vanderbilt University), Nils Woetzel (Vanderbilt University)
High-throughput screening data allows the training of machine learning quantitative structure activity relationship models, which can be used for in silico drug discovery screening. Here, we present bcl::ChemInfo, a GPU-accelerated suite for descriptor generation, model training, feature selection, and data set similarity analysis. The suite provides functionality for the analysis of constructed models as well as for screening external libraries of compounds. We examine case studies illustrating how this workflow can now be completed in a single day on a Tesla-equipped workstation, with speedups reaching 300x, providing a complete GPU-accelerated cheminformatics framework for drug discovery.

Keywords:
General Interest, GTC 2012 - ID S2354
 
Set GPUs Free: Integrating a File System with CUDA Programs
Mark Silberstein (UT Austin), Emmet Witchel (UT Austin)
This session seeks the answer to the question: "Can we simplify and speed up CUDA programs by allowing them to access files residing on a host?" To prove our affirmative answer, we demonstrate how the concept of a file system enables programs with non-trivial CPU-GPU and GPU-GPU interactions to be efficiently and easily implemented on top of a new GPU file-system layer. We also show that such a file system enables implementation of fully stand-alone GPU programs without any CPU wrapper code. Finally we outline the details of the file system design which contributed to scalability, data consistency and performance.

Keywords:
General Interest, GTC 2012 - ID S2360
 
GPU-Based High-Performance Simulations for Spintronics
Jan Jacob (University of Hamburg - Institute of Applied Physics and Microstructure Research Center)
The joint utilization of the electron's charge and spin in "spintronics" represents a promising technology for data processing and storage in nanostructures. Complex quantum effects in these devices, such as the spin-Hall effect, require demanding numerical simulations that provide a convenient link between idealized analytical models and the often very complex results from measurements. These simulations involve multiplications and inversions of large matrices, which makes them an ideal showcase for the performance gained by employing GPGPUs to execute the algebraic routines in computing environments where algorithms are shared across multiple nodes with multiple GPGPUs and CPU cores.

Keywords:
General Interest, GTC 2012 - ID S2379
 
Desktop Supercomputing in the Soft-Matter Physics Laboratory
Peter Lu (Harvard University)
While many GPGPU applications reside on large clusters, in many laboratories the time to move data to an external cluster would exceed the time to analyze it upon arrival. By bringing high-throughput computational power to the data in the laboratory, GPUs offer new capabilities in doing science. This session offers a number of ways in which GPUs are making a significant impact on our research in experimental physics, biology and chemistry, from designing and building apparatus (Quadro and Tesla), to collecting data on portable devices (Tegra), to high-throughput analysis of large data sets (Tesla). It also presents results from studies investigating the motion of diffusing and aggregating colloidal particles and swimming bacteria, observing liquid-gas phase separation onboard the International Space Station, applying high dynamic-range techniques to optical tomography, and using low-cost devices to detect chemical and microbial contamination in the third world.

Keywords:
General Interest, GTC 2012 - ID S2521
 
GPUs and the Next-Generation Aerial Surveillance
Nikola Bozinovic (MotionDSP)
Graphics processors are already used for computationally intensive video tasks in many ISR (Intelligence, Surveillance, Reconnaissance) applications; a GPU-based system for video enhancement and analytics outperforms a similarly priced CPU-based system 5-to-1 at HD resolutions. Our initial tests on 64 megapixel Wide Area Aerial Surveillance (WAAS) data show at least a 10x speedup on tasks such as super-resolution or moving target indication. In this talk, we'll discuss the unique design and implementation challenges of real-time processing of very large video data sets. We will demonstrate our existing GPU-based software, IKENA ISR, and discuss its video-processing pipeline and innovative processing solutions that promise to dramatically expand the capabilities of emerging aerial surveillance platforms.

Keywords:
General Interest, GTC 2012 - ID S2527
 
Supermicro: Worldwide leader in GP/GPU Servers and Workstation Platforms (Presented by Supermicro)
Don Clegg (Supermicro)
Discover the measurable advantages that make Supermicro the time-to-market leader in GPU platform enablement. See how Supermicro's innovative Application-Optimized designs enable partners to both scale-up and scale-out for maximum return on investment. Review actual case studies that highlight Supermicro's leadership in Compute Density, Peak Performance, Scalability, Power Efficiency, Manageability, Reliability and Cost Effectiveness.

Keywords:
General Interest, GTC 2012 - ID S2636
 
From DataCenters to Supercomputers - A Deep Dive Into ASUS Solutions (Presented by ASUS)
Chris Liang (ASUS Computer International)
Join us for an intro into ASUS DataCenter and Supercomputer solutions. Explore real life case studies that illustrate how ASUS has revolutionized Server and Workstation platforms, marking ASUS as a leader in GPU technology. Other topics include mass storage for enterprises, 2U hybrid computing, I/O integration, expandability and power efficiency.

Keywords:
General Interest, GTC 2012 - ID S2648
 
Languages, APIs and Development Tools for GPU Computing
Will Ramey (NVIDIA)
Get a head start on the conference with this first-day introduction to key technologies for GPU Computing. This 90-minute tutorial session will cover the key features and differences between the major programming languages, APIs and development tools available today. Attendees will also learn several high level design patterns for consumer, professional and HPC applications, with practical programming considerations for each.

Keywords:
General Interest, GTC 2012 - ID S2005
 
CCOE Achievement Presentations and Awards
David Luebke (NVIDIA), Stan Tomov (University of Tennessee Knoxville), Lincoln Greenhill (Harvard University), Jesus Labarta (Barcelona Supercomputing Center), Satoshi Matsuoka (Tokyo Tech)
To highlight and reward the excellent research taking place at our CCOEs, we hosted an event during GTC 2012 to showcase four of their top achievements.  Each of our 18 CCOEs was asked to submit an abstract describing what they considered to be their top achievement in GPU Computing over the past 18 months.  An NVIDIA panel selected four exemplars from these submissions to represent their work on GPU Computing research.  Each of our CCOEs has made amazing contributions, but the four CCOEs selected to showcase their work were:

 

  • Barcelona Supercomputing Center, OmpSs: Leveraging CUDA for Productive Programming in Clusters of Multi-GPU Systems
  • Harvard University, Massive Cross-correlation in radio Astronomy with Graphics Processing Units
  • Tokyo Tech, TSUBAME 2.0
  • University of Tennessee, MAGMA: A breakthrough in Solvers for Eigenvalue Problems

 

Each of the four CCOE finalists will be awarded an HP ProLiant SL250 Gen8 GPU system configured with dual NVIDIA Tesla K10 GPU accelerators in recognition of this accomplishment.  After the four presentations, the CCOE representatives were asked to vote for their favorite presentation and achievement. Tokyo Tech was voted as the audience favorite, and thus wins the extra bragging rights of being honored by their peers as the inaugural recipient of the CUDA Achievement Award 2012.

Keywords:
General Interest, GTC 2012 - ID S4000
HPC and Supercomputing
Presentation
Media
Leveraging NVIDIA GPUDirect on APEnet+ 3D Torus Cluster Interconnect
Davide Rossetti (Italian National Institute for Nuclear Physics)
APEnet+ is a novel cluster interconnect, based on a custom PCI card which features a PCI Express Gen2 X8 link and a re-configurable HW component (FPGA). It supports a 3D Torus topology and has special acceleration features specifically developed for NVIDIA Fermi GPUs. An introduction to the basic features and the programming model of APEnet+ will be followed by a description of its performance on some numerical simulations, e.g. High Energy Physics simulations.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2282
 
Scaling Applications to a Thousand GPUs and Beyond
Alan Gray (The University of Edinburgh), Roberto Ansaloni (Cray Italy)
Discover how to scale scientific applications to thousands of GPUs in parallel. We will demonstrate our techniques using two codes representative of a wide spectrum of programming methods. The Ludwig lattice Boltzmann package, capable of simulating extremely complex fluid dynamics models, combines C, MPI and CUDA. The Himeno three-dimensional Poisson equation solver benchmark combines Fortran (using the new coarray feature for communication) with prototype OpenMP accelerator directives (a promising new high-productivity GPU programming method). We will present performance results using the cutting-edge massively-parallel Cray XK6 hybrid supercomputer featuring the latest NVIDIA Tesla 2090 GPUs.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2286
 
Maximizing Performance on Multi-GPU Systems
Kenneth Czechowski (Georgia Tech)
Are 512 CUDA Cores not enough? This session is for power users that are looking to scale applications to multi-GPU systems. We will take a holistic approach towards optimization. Rather than just focusing on CUDA programming, this session will cover techniques for reducing pressure on the PCIe bus, using CUDA Streams to improve load balance, dealing with NUMA impacts, and taking advantage of CPU threads. This talk will also cover strategies for developing applications that run on clusters with 100 or more GPUs.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2362
 
A 2-Petaflops Stencil Application with Stereoscopic 3D Visualization - Gordon Bell Prize 2011
Takayuki Aoki (Tokyo Institute of Technology)
Most stencil applications, such as CFD and structural analysis, are memory-bound problems. GPUs offer high performance in both computation and memory bandwidth, which suits these problems well. The TSUBAME 2.0 supercomputer, with 4224 GPUs, has been in operation since November 2010. We study metal dendritic solidification by solving the phase-field model. A performance of 2.0 petaflops was achieved for a 4,096x6,500x10,400 mesh on 4000 GPUs, and we received the ACM Gordon Bell Prize in 2011. We also demonstrate several large-scale stencil applications (Lattice Boltzmann, weather prediction and so on) with stereoscopic 3D visualization.
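
The opening claim, that stencil codes tend to be memory-bound, can be checked with a little arithmetic. The sketch below is not the phase-field kernel from the talk: it uses a generic 7-point averaging stencil, and the peak rates passed to `is_memory_bound` are illustrative assumptions for a hypothetical GPU, not measured TSUBAME 2.0 figures. It compares the kernel's arithmetic intensity (flops per byte moved) with the machine balance (peak flops per byte of bandwidth).

```python
def jacobi_sweep_7pt(grid):
    """One sweep of a 7-point averaging stencil over the interior of a 3D grid.

    `grid` is a nested list indexed as grid[z][y][x]; boundary values are copied.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    out = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # 5 additions + 1 division = 6 floating-point operations
                out[z][y][x] = (
                    grid[z][y][x - 1] + grid[z][y][x + 1]
                    + grid[z][y - 1][x] + grid[z][y + 1][x]
                    + grid[z - 1][y][x] + grid[z + 1][y][x]
                ) / 6.0
    return out


FLOPS_PER_POINT = 6    # counted from the update above
BYTES_PER_POINT = 16   # best case with perfect caching: one 8-byte read, one 8-byte write


def arithmetic_intensity(flops, bytes_moved):
    """Flops performed per byte of memory traffic."""
    return flops / bytes_moved


def is_memory_bound(intensity, peak_gflops, peak_gbytes_per_s):
    """A kernel is memory-bound when its intensity is below the machine balance."""
    return intensity < peak_gflops / peak_gbytes_per_s


intensity = arithmetic_intensity(FLOPS_PER_POINT, BYTES_PER_POINT)
print(intensity)                                  # 0.375
# Assumed peak rates (hypothetical GPU): 500 GFLOP/s and 150 GB/s
print(is_memory_bound(intensity, 500.0, 150.0))   # True
```

Under these assumptions the stencil performs well under one flop per byte, while the hypothetical GPU needs more than three flops per byte to keep its arithmetic units busy, so memory bandwidth sets the pace.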

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2412
 
Exascaling Your Apps
Mike Bernhardt (The Exascale Report), Satoshi Matsuoka (Titech), Jeff Vetter (Oak Ridge National Laboratory), Olav Lindtjorn (Schlumberger), Steve Scott (NVIDIA)
In the global exascale race, hardware often takes center stage. But the race might ultimately be won or lost based on how well the industry optimizes new and existing applications for extreme parallelism. Today's apps will not just run on tomorrow's systems, so we must think strategically and creatively about how to design applications that take maximum advantage of the first power-efficient, accelerator-driven exascale systems. This panel of HPC, software and computer science experts will discuss what we can, and should be doing, including a review of new scientific and commercial HPC requirements, programming model options and how to best align architecture and software design processes.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2531
 
GPU-Accelerated Science on Titan: Tapping into the World's Preeminent GPU Supercomputer to Achieve Better Science
Jack Wells Ph.D. (Oak Ridge National Laboratory)
This year, the leadership-class computing facility at Oak Ridge National Labs is upgrading its largest supercomputer for open science, "Jaguar", to employ high-performance, power-efficient GPUs. Once the transition is complete, the machine will be known as "Titan". In this extended GTC session, we will feature a range of presenters showcasing research codes that will run computational science on the GPU at scale. Through these selected presentations, we will investigate the progress and anticipated results of GPU-acceleration of these significant codes. In this session, we will also explain how research scientists interested in tapping into the immense capabilities of Titan can do so, through programs such as the INCITE program sponsored by the US Department of Energy. The presenters include:

  • Jacqueline H. Chen (Combustion Research Facility, Sandia National Laboratories), "Direct Numerical Simulation of Turbulence-Chemistry Interactions: Fundamental Insights Towards Predictive Models"
  • Ray Grout (National Renewable Energy Laboratory), "S3D Direct Numerical Simulation - Preparations for the 10-100PF Era"
  • William Tang (Director, Fusion Simulation Program at the Princeton Plasma Physics Laboratory (PPPL), Princeton), "Fusion Energy Sciences & Computing at the Extreme Scale"
  • John A. Turner (Group Leader of Computational Engineering & Energy Sciences, Oak Ridge National Laboratory), "Transforming Modeling and Simulation for Nuclear Energy Applications"
  • Loukas Petridis (Staff Scientist, Oak Ridge National Laboratory), "Computer Simulation of Lignocellulosic Biomass"
  • Jeroen Tromp (Director, Princeton Institute for Computational Science, Princeton), "Toward Global Seismic Imaging based on Spectral-Element and Adjoint Methods"

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2606
 
Toward Global Seismic Imaging based on Spectral-Element and Adjoint Methods
Jeroen Tromp (Princeton Institute for Computational Science, Princeton)
Precise information about the structure of the solid Earth comes from seismograms recorded at the surface of a highly heterogeneous lithosphere. Seismic imaging based on spectral-element and adjoint methods can assimilate this information into three-dimensional models of elastic and anelastic structure. These methods fully account for the physics of wave excitation, propagation, and interaction by numerically solving the inhomogeneous equations of motion for a heterogeneous anelastic solid. Such methods require the execution of complex computational procedures that challenge the most advanced high-performance computing systems. Current research is petascale; future research will require exascale capabilities. We illustrate the current state-of-the-art based on an inversion for European upper-mantle structure. Our ultimate goal is to move toward "adjoint tomography" of the entire planet. This session is part of "S0606 - GPU-accelerated Science on Titan: Tapping into the World's Preeminent GPU Supercomputer to Achieve Better Science" mini-track with Dr. Jack Wells.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2608
 
Best Practices of a 800TFlop Hybrid Supercomputer Implementation (Presented by Appro)
Steve Lyness (Appro), Taisuke Boku (University of Tsukuba)
Learn about the "Frontier Computing System", deployed by Appro for the University Of Tsukuba Center Of Computational Sciences in Japan and containing over half a million GPU cores. Learn how reliability, availability, manageability and compatibility were essential for this successful 800TF hybrid supercomputing implementation. Explore new techniques by which HA-PACS accelerates large-scale parallel code, combining CPU/GPU processing cluster configurations for scientific research such as astrophysics and climate modeling. Learn how to improve data I/O performance and overcome memory size limitations in hybrid systems configured with the Lustre file system, offering the best performance per dollar and excellent memory capacity per FLOP.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2618
 
VSIPL++: A High-Level Programming Model for Productivity and Performance (Presented by Mentor Graphics Corporation)
Brooks Moses Ph.D. (Mentor Graphics Corporation)
Learn how VSIPL++ can improve your productivity and provide software portability, without sacrificing performance. We will describe how VSIPL++'s open-standard high-level programming model addresses the challenges of writing high-performance embedded software on GP-GPUs and other heterogeneous hardware, using advanced C++ techniques and data abstraction -- and how we make this work in the real world. We will also present a comparison of performance results from various configurations of CPU and GP-GPU processing engines for a signal processing application developed using VSIPL++.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2620
 
Visualizing Heterogeneous Performance Tested on MPI+CUDA Gigapixel Panorama Stitching
Aaditya Landge (University of Utah)
This session consists of two technical parts. In the first part, we explain the use and implementation of a hybrid Poisson solver for gradient domain processing of massive images. Specifically, we provide a parallel out-of-core method for the seamless stitching of gigapixel panoramas in a parallel CUDA + MPI environment. In the second part, we cover ongoing work on novel visualization techniques for understanding the performance data of heterogeneous computing clusters. The Poisson solver application serves as the example for demonstrating various features of this performance visualization tool.
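
To make the idea of gradient domain stitching concrete, here is a minimal one-dimensional sketch. It is not the session's hybrid out-of-core CUDA + MPI solver: it assumes two short 1D "images", a seam gradient of zero, fixed values at both ends, and a plain Gauss-Seidel iteration. The function name `stitch_1d` and the sample data are hypothetical. The solver looks for values whose neighbor differences match the source gradients as closely as possible, which amounts to solving a discrete Poisson equation.

```python
def stitch_1d(gradients, left_value, right_value, sweeps=500):
    """Solve the 1D discrete Poisson problem u[i-1] - 2*u[i] + u[i+1] = g[i] - g[i-1]
    with fixed end values, using Gauss-Seidel sweeps.

    gradients[i] is the desired difference u[i+1] - u[i].
    """
    n = len(gradients) + 1
    # Initial guess: straight line between the two fixed end values
    u = [left_value + (right_value - left_value) * i / (n - 1) for i in range(n)]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            divergence = gradients[i] - gradients[i - 1]
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - divergence)
    return u


# Two hypothetical strips with a brightness jump of 7 at the seam
left = [0.0, 1.0, 2.0, 3.0]
right = [10.0, 10.0, 10.0, 10.0]

# Keep each strip's own gradients and ask for a flat (zero) gradient across the seam
grads = (
    [b - a for a, b in zip(left, left[1:])]
    + [0.0]
    + [b - a for a, b in zip(right, right[1:])]
)

result = stitch_1d(grads, left[0], right[-1])
print([round(v, 6) for v in result])
# [0.0, 2.0, 4.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

The jump of 7 at the seam disappears: the mismatch is spread evenly across all seven intervals, so each strip keeps its relative shape while the transition becomes smooth. The same principle applies in 2D, where the system is far larger and calls for the kind of parallel solver the session describes.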

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2623
 
S3D Direct Numerical Simulation - Preparations for the 10-100PF Era
Ray Grout (National Renewable Energy Laboratory)
The evolution of supercomputing into the mid-petaflop era has been typified by heterogeneous compute nodes with the majority of the compute capability delivered by a large number of lightweight cores. In order to prepare for the extension of this trend, the DNS code S3D has been retooled in anticipation of a target architecture offering tens of thousands of heterogeneous nodes containing many x86 cores as well as GPU-derived accelerators. Movement of outer loops to the highest level in the code facilitates hybrid MPI-OpenMP performance and an elegant path to accelerated kernels using OpenACC. It is anticipated that relevant scientific simulations at this scale will have a per-node footprint that can be contained entirely on the accelerator, so provision is made to maintain primary solution variables in accelerator memory, with specific regions moved to the CPU for inter-node communication and workload balancing. With the current performance, it is estimated that the new code will make it possible to meet early science goals with the full build-out of the anticipated Titan system as well as provide a platform to transition into the exascale software research space.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2625
 
Learn about new Hewlett-Packard GPU Systems, Solutions, and Applications (Presented by Hewlett-Packard)
David Korf (Hewlett-Packard), John Brown (Hewlett-Packard)
Learn how to shorten time to discovery, gain faster insight, and beat the barriers to innovation, with performance, efficiency and agility! Hear the latest on how you can do this and more with HP's purpose-built SL server line. Servers are specifically designed for GPUs with HP ProActive Insight Architecture. Discover what a new generation of workstation desktop GPU computing technology from HP and NVIDIA can do for you! HP will compare and contrast GPU compute performance on the PCI Express Gen2 architecture available in HP's Z800 Workstation to the PCI Express Gen3 architecture in HP's latest Z820 Workstation.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2633
 
How to Bake Portable Many-Core Programs (Presented by CAPS)
Francois Bodin (CAPS)
A legacy code, a cool many-core accelerator and a directive-based programming environment are the main ingredients of the recipe to transform your legacy code into a portable many-core one. This presentation shows by example how to exploit accelerators in legacy code without sacrificing portability. We describe a methodology and the use of directives, such as HMPP and OpenACC, to exploit the massive parallelism provided by many-core devices. Throughout the presentation we use numerous examples to show how to analyze performance, tune accelerator code, reduce data transfers, deal with libraries, exploit multiple accelerators, etc.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2635
 
Analyzing Performance and Power of Applications with GPUs on Dell 12G Platforms (Presented by Dell)
Jeff Layton (Dell)
In this talk, both performance and power aspects of running various applications on NVIDIA GPUs on Dell 12G platforms will be presented. These platforms utilize the latest PCIe Gen 3 slots and processors in conjunction with varying numbers of NVIDIA GPUs and are tested with several applications from both a performance perspective and a power perspective.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2637
 
Effective HPC Architecture - Design, Develop, Implement (Presented by ELEKS)
Oleh Khoma (ELEKS)
An effective HPC system is so much more than just GPGPU. Real-world applications often need to stream large amounts of data from across system boundaries to dozens of worker nodes in the most scalable and efficient way. They usually require storing huge amounts of data, scheduling computation jobs, monitoring system health, and visualizing results. Having first-hand experience in the design, development and implementation of end-to-end HPC solutions, our engineers will share their experience of some of the pitfalls to avoid and things to consider when planning your next HPC system that works.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2647
 
Fusion Energy Sciences & Computing at the Extreme Scale
William Tang (Fusion Simulation Program at the Princeton Plasma Physics Laboratory (PPPL), Princeton)
The fusion energy sciences community has made excellent progress in developing advanced codes for which computer run-time and problem size scale well with the number of processors on massively parallel supercomputers. A good example is the effective usage of the full power of modern leadership class computational platforms from the terascale to the petascale and beyond to produce nonlinear particle-in-cell simulations which have accelerated progress in understanding the nature of plasma turbulence in magnetically-confined high temperature plasmas. Illustrative results provide great encouragement for being able to include increasingly realistic dynamics in extreme-scale computing campaigns to enable predictive simulations with unprecedented physics fidelity. William Tang's session is part of "S0606 - GPU-accelerated Science on Titan: Tapping into the World's Preeminent GPU Supercomputer to Achieve Better Science" mini-track with Dr. Jack Wells.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2654
 
Direct Numerical Simulation of Turbulence-Chemistry Interactions: Fundamental Insights Towards Predictive Models
Jacqueline H. Chen (Combustion Research Facility, Sandia National Laboratories)
Recent petascale direct numerical simulations (DNS) of turbulent combustion have transformed our ability to interrogate fine-grained "turbulence-chemistry" interactions in canonical laboratory configurations. In particular, three-dimensional DNS, at moderate Reynolds numbers and with complex chemistry, is providing unprecedented levels of detail to understand fundamental coupling between turbulence, mixing and reaction. This information is leading to new physical insight and is providing unique validation data for assessing model assumptions in coarse-grained engineering CFD approaches used to design modern combustors. The role of petascale DNS is illustrated through selected examples relevant to controlling ignition and combustion rates in homogeneous charge compression ignition engines and to fuel injection processes in stationary gas turbines for power generation. Petascale simulations presently generate upwards of a petabyte of complex, multi-scale, time-varying data used by combustion modelers to validate subfilter combustion and mixing models in large-eddy simulation. With the advent of 10-20 petaflop hybrid architectures with accelerators like Titan at Oak Ridge National Laboratory, it will be possible to dramatically increase the chemical complexity of DNS. This will help accelerate the development of predictive subprocess models which will be used by engine developers to better understand and tailor the combustion of gasoline and new, more complex types of fuels in advanced engines. With Titan, simulations will move beyond today's studies of simple fuels (hydrogen, syngas and methane) to more complex, larger-molecule hydrocarbon fuels like isooctane (a surrogate for gasoline), commercially important oxygenated alcohols (for example, ethanol and butanol), and biofuel surrogates.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2655
 
Applying for INCITE Program, Conclusions, Q&A
Jack Wells Ph.D. (Oak Ridge National Laboratory)
This session offers a wrap-up of "GPU-accelerated Science on Titan: Tapping into the World's Preeminent GPU Supercomputer to Achieve Better Science" session with Jack Wells.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2657
 
Computer Simulation of Lignocellulosic Biomass
Loukas Petridis (Oak Ridge National Laboratory)
Biomass from terrestrial plants offers the potential of an abundant source of cellulosic ethanol. However, technical problems arising from the recalcitrance of biomass to hydrolysis still hinder the cost-effective conversion of biomass to ethanol. Here, computer simulation of biomass is employed to understand the physical origins of biomass recalcitrance. The temperature-dependent structure and dynamics of lignin polymers in aqueous solution are examined using extensive molecular dynamics simulations. Neutron scattering experiments and molecular dynamics simulations reveal the structure of lignin aggregates. Finally, the interaction of lignin with cellulose is examined and differential binding to crystalline and amorphous cellulose is explained thermodynamically. This session is part of "S0606 - GPU-accelerated Science on Titan: Tapping into the World's Preeminent GPU Supercomputer to Achieve Better Science" mini-track with Dr. Jack Wells.

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2659
 
Transforming Modeling and Simulation for Nuclear Energy Applications
John A. Turner (Oak Ridge National Laboratory)
The Consortium for Advanced Simulation of Light-Water Reactors (CASL) is a U.S. Department of Energy Innovation Hub, established in July 2010 to develop and apply advanced modeling and simulation to operating nuclear power plants. Through increases in power, plant lifetime extension, higher fuel burnup, and enhanced safety, CASL will reduce operating costs and enable delivery of more carbon-free electricity to the U.S. power grid. To achieve these goals, CASL is building the Virtual Environment for Reactor Applications (VERA), a system for analysis of phenomena within nuclear reactors. Since computational demands are considerable, VERA is being developed as a scalable system, able to take advantage of platforms ranging from high-end workstations to the largest leadership-class supercomputers such as Titan at Oak Ridge National Laboratory (ORNL).

Keywords:
HPC and Supercomputing, GTC 2012 - ID S2660
Life & Material Science
GPU-Enabled Spatiotemporal Model of Stochastic Cardiac Calcium Dynamics and Arrhythmias
M. Saleet Jafri (George Mason University), Hoang-Tron Minh Tuan (George Mason University)
Calcium ions play a central role in controlling the contraction of the heart to pump blood. This requires tight regulation of cellular calcium dynamics, which depends upon over 1,000,000 calcium channels that open and close stochastically and have a very specific spatial arrangement. In the School of Systems Biology at George Mason University, CUDA technology coupled to novel algorithms for Monte Carlo simulation has made possible this computationally expensive spatiotemporal model of calcium dynamics in the heart muscle cell, used to study the regulation of calcium dynamics and which aberrations lead to cardiac arrhythmia.

Keywords:
Life & Material Science, GTC 2012 - ID S2072
 
GPU-Accelerated Model-Based Drug Development
Chee Ng (Children's Hospital of Philadelphia/University of Pennsylvania)
Explore how GPUs can be used to improve the efficiency of drug development. Drug development is a very time-consuming, complex and expensive process that has a low success rate. A model-based drug development paradigm has been proposed as a possible solution to overcome these problems. A key challenge is to develop computationally intensive drug and disease-specific models from a large quantity of highly complicated preclinical and clinical data. This session will describe how GPUs can and will play a key role in shortening model development times and improving the efficiency of model-based drug development.

Keywords:
Life & Material Science, GTC 2012 - ID S2262
 
GPU GWAS - CUDA Based Genome Wide Association Studies
Tim Bi (Johns Hopkins University / George Mason University)
We have developed a CUDA-based GWAS analyzer that has achieved a 10x analysis speed-up per GPU. Genome-wide association studies scan through millions of SNP markers across the human genome, seeking the genetic basis of life-threatening diseases such as coronary artery disease and prostate cancer. The prospect of the $1,000 genome heralds a potential new scale of GWAS involving hundreds of thousands of patients. We will discuss how we utilized the Python, R, and C languages to produce a robust GWAS algorithm that can be extended to multiple GPUs and GPU clusters.

Keywords:
Life & Material Science, GTC 2012 - ID S2272
 
Large and Sparse - Mass Spectrometry Data Processing in the GPU
Jose de Corral (Waters Corporation)
Learn how the GPU helps identify millions of ions in datasets of several billion points of four-dimensional sparse data. The data is first reduced to 3D to locate regions of dense data, and then only those regions are processed in 4D. Processing involves combining several steps of convolution filters in three axes, finding local maximums in volumes of data, and extracting information from the data around each local maximum.

Keywords:
Life & Material Science, GTC 2012 - ID S2327
Machine Learning & Deep Learning
Designing Killer CUDA Applications for X86, multiGPU, and CPU+GPU
Robert Farber (BlackDog Endeavors, LLC)
CUDA redefined software development with 10 to 1000-times faster GPU applications. Now a single CUDA source tree can support the x86 mass market (no GPU required) and 1/3 billion CUDA-enabled GPUs. MultiGPU and CPU+GPU apps utilize all system resources. GPUDirect, UVA, caches, prefetching, ILP (instruction-level parallelism), automated analysis tools and more offer ease, capability, and performance. The overall impact on software investment, scalability, balance metrics, programming API, and lifecycle will be considered. Working real-time video and other examples from my book, "CUDA Application Design and Development", provide practical insight to enable augmented reality and your killer apps.

Keywords:
Machine Learning & Deep Learning, GTC 2012 - ID S2038
 
Improving Mars Rover Image Compression Via GPUs And Genetic Algorithms
Brendan Babb (University of Alaska Anchorage)
Learn how to use Jacket to accelerate genetic algorithm (GA) image compression. Our research uses a GA to optimize lossy compression transforms that outperform state-of-the-art wavelet-based approaches for a variety of image classes, including fingerprints, satellite, medical, and images transmitted from the Mars Exploration Rovers. A typical training run evolves a population of transforms over many generations; since each transform must be applied to each image from the training set, each run entails thousands of independent, parallelizable fitness evaluations. By using MATLAB and Jacket to perform 2D convolution on the GPU, we have greatly reduced the total computation time needed.

Keywords:
Machine Learning & Deep Learning, GTC 2012 - ID S2133
 
Efficient k-Nearest Neighbor Search Algorithms on GPUs
Nikos Pitsianis (Aristotle University Greece), Xiaobai Sun (Duke University)
Come see how to select the k smallest elements from an unsorted list. We present a selection and combination of different algorithms that perform exact k-nearest neighbors search (k-NNS) on GPUs and outperform the competition. In this session we present four different selection algorithms, each designed to exploit the parallelism of the GPU differently according to the relative size of the corpus data set, the size of the query set and the number of neighbors sought. We show the application of Logo Retrieval with SIFT vector matching on two different GPUs, the Tesla C1060 and the Fermi GTX480.
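
As a rough illustration of the selection problem described above, the following sequential Python sketch picks the k smallest distances with a quickselect-style partition and uses them for an exact brute-force k-nearest-neighbor query. It is not one of the four GPU algorithms from the session, which the abstract does not specify; the function names `k_smallest` and `knn` and the squared Euclidean distance are assumptions made for this example.

```python
import random


def k_smallest(values, k):
    """Return the k smallest items of values, in ascending order.

    Quickselect-style: partition around a random pivot and keep only
    the side that still contains undecided items (hypothetical helper).
    """
    items = list(values)
    if k <= 0:
        return []
    if k >= len(items):
        return sorted(items)
    rng = random.Random(0)  # fixed seed so runs are repeatable
    result = []
    while True:
        pivot = rng.choice(items)
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        larger = [v for v in items if v > pivot]
        need = k - len(result)
        if need < len(smaller):
            # All remaining answers lie strictly below the pivot.
            items = smaller
        elif need <= len(smaller) + len(equal):
            # The boundary falls inside the pivot group: done.
            result += smaller + equal[:need - len(smaller)]
            return sorted(result)
        else:
            # Everything up to the pivot is accepted; continue above it.
            result += smaller + equal
            items = larger


def knn(corpus, query, k):
    """Indices of the k corpus points closest to query (exact search)."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), idx)
        for idx, point in enumerate(corpus)
    ]
    return [idx for _, idx in k_smallest(dists, k)]
```

Selection avoids fully sorting the distance list, which matters when the corpus is large and k is small; a GPU version would additionally have to decide how to split this work across threads.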

Keywords:
Machine Learning & Deep Learning, GTC 2012 - ID S2314
Medical Imaging
4D Medical Image Processing with CUDA
Anders Eklund (Linkoping University)
Learn how to do 4D image processing with CUDA, especially for medical imaging applications. In this session we will give a couple of examples of how 4D image processing can take advantage of the computational power of the GPU. We will present how to use the GPU for functional magnetic resonance imaging (fMRI) analysis and true 4D image denoising. Most of our examples use the GPU both to speed up the analysis and to visualize the results.

Keywords:
Medical Imaging, GTC 2012 - ID S2017
 
Hardware Acceleration for Vessel Visualization Tasks
Christoph Kubisch (NVIDIA)
To analyze datasets visually, systems with fast feedback loops on user interaction are beneficial. In this session, rendering and preprocessing techniques for medical volume data will be presented using OpenGL and CUDA. In the context of coronary artery disease, the analysis of individual vessel branches is important. We show how local transfer function application and generation by means of histogram analysis can help in navigating and finding details in the datasets. Furthermore, domain-specific acceleration and illustration techniques for volume rendering are also applied to datasets from brain aneurysms.

Keywords:
Medical Imaging, GTC 2012 - ID S2105
 
GPU-Accelerated Optical Coherence Tomography Imaging
Kang Zhang (GE Global Research)
We developed a series of GPU-based technologies to accelerate the imaging reconstruction and visualization for optical coherence tomography (OCT). Several GPU-based algorithms such as non-uniform fast Fourier transform, numerical dispersion compensation, simultaneous phase modulation and multi-GPU implementation were developed to achieve improved impulse response, better SNR, doubled imaging range and higher system stability. The GPU-accelerated 4D-OCT system was validated by imaging both in vivo and ex vivo biological tissues. This technology overcomes the imaging reconstruction and visualization bottlenecks that widely exist in current ultrahigh speed OCT systems and opens the way to interventional OCT imaging for applications in guided microsurgery.

Keywords:
Medical Imaging, GTC 2012 - ID S2141
 
GPU Acceleration for Threshold Based Region Growth Algorithms
Supratik Moulik (University of Pennsylvania), Jason Walsh (University of Pennsylvania 3D lab)
Come learn how the massively parallel computing power of modern GPUs helps to create faster and more accurate volume rendered images for the medical imaging community. Attendees of this session will gain insight into how GPUs can accelerate region growth algorithms and how these algorithms can be optimized for the latest generation of NVIDIA hardware. Topics covered will include fundamentals of region growth, GPU implementations, and practical examples of vessel tracking algorithms based on GPU accelerated algorithms.
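
For readers unfamiliar with threshold-based region growth, the sketch below shows the basic sequential idea on a small 2D image: start from a seed pixel and repeatedly absorb neighboring pixels whose intensity lies inside a threshold window. It is a minimal CPU illustration, not the presenters' GPU implementation; the function name `region_grow`, the 4-connected neighborhood, and the inclusive `[low, high]` window are assumptions made for this example.

```python
from collections import deque


def region_grow(image, seed, low, high):
    """Return the set of (row, col) pixels connected to seed whose
    intensity lies in [low, high] (hypothetical helper, 4-connectivity)."""
    rows, cols = len(image), len(image[0])
    seed_row, seed_col = seed
    if not (low <= image[seed_row][seed_col] <= high):
        return set()  # the seed itself fails the threshold test
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        # Visit the four axis-aligned neighbors of the current pixel.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and low <= image[nr][nc] <= high:
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region


# A bright, vessel-like path with one isolated bright pixel at (1, 3).
image = [
    [9, 9, 0, 0],
    [0, 9, 0, 8],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
vessel = region_grow(image, (0, 0), 5, 10)
```

Here `vessel` contains the five connected bright pixels and leaves out the isolated one, since connectivity to the seed matters as well as the threshold. The frontier is processed one pixel at a time in this version, and that sequential step is the natural candidate for parallelization.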

Keywords:
Medical Imaging, GTC 2012 - ID S2303
 
GPU Implementation for Rapid Iterative Image Reconstruction in Nuclear Medicine
Jakub Pietrzak (Centre of Oncology, Warsaw, Poland)
GPU implementation can greatly accelerate iterative techniques of 3D image reconstruction in nuclear medicine imaging. Single Photon Emission Computed Tomography (SPECT) is a functional imaging modality widely used in clinical diagnosis. To obtain high quality images within reduced scanning times, high sensitivity collimators need to be used and their response function modeled in the reconstruction. This is in general very computationally intensive and unfeasible with CPU and algorithm implementations. Our software is able to perform the reconstruction of patient data within clinically acceptable times using relatively low cost and widely available hardware.

Keywords:
Medical Imaging, GTC 2012 - ID S2312
 
GPUs Open New Avenues in Medical MRI
Chris A. Cocosco (University Medical Center Freiburg, Dept. of Radiology, Medical Physics)
See how GPUs enable exciting new developments in medical Magnetic Resonance Imaging (MRI). Their computational power now makes practical new MRI techniques that can bring shorter imaging sessions, better images, and more insight into human physiology. Learn about the characteristics of the general computational approach for obtaining the final image, and how it can be implemented using an iterative conjugate gradient algorithm. The algorithm exhibits massive parallelism and fits the GPU architecture well. Learn about its CUDA implementation details and MATLAB integration. See throughput measurements of Tesla GPUs compared to top of the line many-core and large RAM CPU systems.
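
The abstract does not state the exact system solved during reconstruction, so the following is only a generic sketch of the conjugate gradient iteration it mentions, assuming the problem has been posed as a symmetric positive-definite linear system A x = b. The names `conjugate_gradient`, `matvec`, and the small 2x2 test matrix are hypothetical and chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, max_iterations=100, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A.

    matvec(v) must return A applied to v; A itself is never formed,
    which is why the method suits large imaging operators.
    """
    x = [0.0] * len(b)
    residual = list(b)          # r = b - A x, with x = 0 initially
    direction = list(residual)
    rs_old = dot(residual, residual)
    if rs_old ** 0.5 < tol:
        return x
    for _ in range(max_iterations):
        a_dir = matvec(direction)
        alpha = rs_old / dot(direction, a_dir)   # step length
        x = [xi + alpha * di for xi, di in zip(x, direction)]
        residual = [ri - alpha * ai for ri, ai in zip(residual, a_dir)]
        rs_new = dot(residual, residual)
        if rs_new ** 0.5 < tol:
            break
        beta = rs_new / rs_old                   # keeps directions conjugate
        direction = [ri + beta * di for ri, di in zip(residual, direction)]
        rs_old = rs_new
    return x


# Small symmetric positive-definite example (hypothetical data).
A = [[4.0, 1.0], [1.0, 3.0]]


def apply_a(v):
    return [dot(row, v) for row in A]


solution = conjugate_gradient(apply_a, [1.0, 2.0])
```

Each iteration consists of one operator application plus a few vector updates and inner products, all of which are element-wise or reduction operations. That structure is consistent with the abstract's remark that the algorithm exhibits massive parallelism.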

Keywords:
Medical Imaging, GTC 2012 - ID S2348
Mobile Applications
File Sharing Plus Real Time Media and Document Collaboration
Kevin Jackson (Viewpartners)
Studiopass is a cloud-based file sharing and visual collaboration tool which allows participants to collaborate on Microsoft documents and media files including 1080p video. It is graphics-intensive and requires the best GPU performance to push playback of heavy files. This session will discuss how NVIDIA Tegra powered devices deliver the graphics and video performance needed for efficient collaboration and how the new Tegra 3 Quad Core plus 1 will bring further acceleration. Studiopass collaboration is not only accelerated by Tegra devices but also leverages NVIDIA Tesla accelerated transcoding running on Amazon Web Services.

Keywords:
Mobile Applications, GTC 2012 - ID S2425
Molecular Dynamics
Towards Routine Microsecond Molecular Dynamics Simulations on Commodity Hardware
Ross Walker (University of California San Diego)
The original AMBER 11 provided performance on one GPU equivalent to an 8 node cluster, and almost 60ns/day for 8 GPUs running the JAC production benchmark without additional approximations, outstripping the performance of all conventional supercomputers. Here we describe further optimization of the code, coupled with hardware and software advances on the part of NVIDIA, that provides performance of >50ns/day on a single GPU, with multiple GPUs providing simulation rates approaching a microsecond per day on systems the size of DHFR. This brings performance on desktops and commodity hybrid clusters to levels previously only considered possible using custom silicon.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2010
 
GPU-Accelerated Molecular Dynamics Simulation of Solid Covalent Crystals
Wei Ge (Institute of Process Engineering, Chinese Academy of Sciences)
An efficient and highly scalable algorithm for molecular dynamics (MD) simulation (using sophisticated many-body potentials) of solid covalent crystals is presented. Its effective memory throughput on a single C2050 GPU board reached 102 GB/s (81% of the peak), the instruction throughput reached 412 Ginstr/s (80% of the peak), and 27% of the peak flops of a single GPU was obtained. Parallel efficiency of the algorithm can be as high as 95% on all 7168 GPUs of Tianhe-1A, possibly reaching a record for high-performance MD simulation of 1.87 Pflops in single precision.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2057
 
Advancing GPU Molecular Dynamics: Rigid Bodies in HOOMD-blue
Joshua Anderson (University of Michigan)
Learn how rigid body dynamics are implemented in HOOMD-blue. Previous releases were capable of executing classical molecular dynamics -- where free particles interact via smooth potentials and their motion through time is computed using Newton's laws. The latest version allows particles to be grouped into bodies that move as rigid units. Users can now simulate materials made of cubes, rods, bent rods, jacks, plates, patchy particles, bucky balls, or any other arbitrary shapes. This talk covers how these algorithms are implemented on the GPU, tuned to perform well for bodies of any size, and discusses several use-cases relevant to research.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2058
 
An Innovative Massively Parallelized Molecular Dynamic Software
Thomas Guignon (IFPEN)
In this session, we present how we sped up the electronic structure calculator VASP by more than an order of magnitude. Recent research at IFP Energies Nouvelles has shown that by coupling traditional clusters or High Performance Computing (HPC) machines with accelerators based on graphics processing units (GPUs), recoding the most time-consuming parts of the codes (with programming languages like CUDA or OpenCL), and offloading them to the graphics chips, it is possible to reduce the computing time by a factor of 5 to 15.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2108
 
Computational Screening of Novel Carbon Capture Materials
Jihan Kim (Berkeley Lab), Berend Smit (UC Berkeley/Berkeley Lab)
Discover how GPUs are used to identify optimal framework structures for carbon dioxide separation with the goal of reducing carbon emission. We describe the algorithm behind our GPU software tool that iterates through a database of hypothetical zeolites and computes the selectivity of each of the structures. The code can be easily extended to simulate other adsorbent structures such as ZIFs (zeolitic imidazolate frameworks) and provide valuable insights to both theorists and experimentalists who have interest in carbon capture research.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2122
 
Petascale Molecular Dynamics Simulations on GPU-Accelerated Supercomputers
James Phillips (University of Illinois at Urbana-Champaign)
The highly parallel molecular dynamics code NAMD was chosen in 2006 as a target application for the NSF petascale supercomputer now known as Blue Waters. NAMD was also one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007. How do the Cray XK6 and modern GPU clusters compare to 300,000 CPU cores for a hundred-million-atom Blue Waters acceptance test? Come learn the opportunities and pitfalls of taking GPU computing to the petascale and the importance of CUDA 4.0 features in combining multicore host processors and GPUs in a legacy message-driven application.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2127
 
GPU-Based Molecular Dynamics Simulations of Protein and RNA Assembly
Samuel Cho (Wake Forest University)
Protein and RNA biomolecular folding and assembly problems have important applications because misfolding is associated with diseases like Alzheimer's and Parkinson's. However, simulating complex biomolecules on the same timescales as experiments is an extraordinary challenge due to a bottleneck in the force calculations. To overcome these hurdles, we perform coarse-grained molecular dynamics simulations where biomolecules are reduced into simpler components. Furthermore, our GPU-based simulations have a significant performance improvement over CPU-based simulations, which are limited to systems of 50-150 residues/nucleotides. The GPU-based code can simulate protein/RNA systems of 400-10,000+ residues/nucleotides, and we present ribosome assembly simulations.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2139
 
VMD: High Performance Molecular Visualization and Analysis on GPUs
John Stone (University of Illinois at Urbana-Champaign)
This talk will present recent successes in the use of GPUs to accelerate interactive molecular visualization and analysis tasks on desktop computers, and batch-mode simulation and analysis jobs on GPU-accelerated HPC clusters. We'll present Fermi-specific algorithms and optimizations and compare with those for other devices. We'll also present performance and performance/watt results for VMD analysis calculations on GPU clusters, and conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2142
 
A Study of Persistent Threads Style Programming Model for GPU Computing
Kshitij Gupta (UC Davis), Jeff Stuart (UC Davis)
We examine the usefulness of a new style of GPU programming called Persistent Threads (PT), known to be useful on irregular workloads. First, we formally define the PT model. Second, we categorize the use of PT into four "use cases" and present micro-benchmark analyses of when this model is useful over traditional kernel formulations. Third, we show a full speech recognition application that uses all four PT use cases. Finally, we conclude by suggesting modifications to GPU hardware, software, and APIs that would make PT kernels both easier to implement and more efficient.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2157
 
Terascale Volume Visualization in Neuroscience
Johanna Beyer (KAUST), Markus Hadwiger (KAUST)
Learn how to create a scalable volume visualization system for interactive rendering of terascale EM data. We will describe the major design principles, how we can avoid the standard approach of pre-computing a 3D multi-resolution hierarchy such as an octree, and how to handle continuous streaming of newly acquired data. For rendering we build upon a visibility-driven approach and 3D virtual texturing, and perform interactive volume rendering of a "virtual" volume, where the corresponding physical storage is only represented and populated in a sparse manner with 2D instead of 3D image data on the fly during rendering.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2202
 
GPU Enabled Macromolecular Simulation: Challenges and Opportunities
Michela Taufer (University of Delaware), Sandeep Patel (University of Delaware)
GPU-enabled, fully atomistic macromolecular simulation is rapidly gaining momentum, driven by the massive parallelism of GPUs and the parallelizability of various components of the underlying algorithms and methodologies. This massive parallelism, on the order of several hundred to a few thousand cores, presents opportunities as well as implementation challenges. In this talk, dive deep into the key aspects of simulation methodologies for macromolecular systems specifically adapted to GPUs. Learn some of the underlying challenges and get the latest solutions devised to tackle them in the FEN ZI code for fully atomistic macromolecular simulations.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2207
 
Probing Bio-Nano Interface Structure from Microsecond Molecular Dynamics on GPUs
Olexandr Isayev (Case Western Reserve University)
Using the latest algorithmic developments in molecular dynamics on multiple GPUs over MPI, and technologies like GPUDirect, it is now possible to address problems of interaction at the bio-nano interface via large-scale atomistic simulations. This talk will discuss aspects of DNA-nanotube interactions and SWCNT-induced conformational changes in DNA nucleosome structure. We will also address technical challenges in porting and tuning the AMBER 11 code on the Condor GPU cluster at AFRL.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2315
 
Strong Scaling for Molecular Dynamics Applications
Sarah Tariq (NVIDIA)
In this session we will talk about how to improve strong scaling for molecular dynamics applications. Using the NAMD molecular dynamics code as our primary case study, we will discuss the types of issues that can impede scaling, how to use already available and custom tools to discover such issues, and how to build a model to help analyze and predict scaling performance. Although this session is primarily focused on molecular dynamics applications, most of the lessons can be applied equally well to many other areas and applications.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2351
 
Efficient Molecular Dynamics on Heterogeneous GPU Architectures in GROMACS
Molecular Dynamics is an important application for GPU acceleration, but many algorithmic optimizations and features still rely on code that prefers traditional CPUs. Only with the latest hardware and software have we been able to realize a heterogeneous GPU/CPU implementation and reach performance significantly beyond the state of the art of hand-tuned CPU code in our GROMACS program. The sub-millisecond iteration time poses challenges at all levels of parallelization. Come and learn about our new atom-cluster pair interaction approach for non-bonded force evaluation, which achieves 60% work-efficiency, and other innovative solutions for heterogeneous GPU systems.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2363
 
Molecule Dynamics, GPUs, and EC2 (Presented by Amazon)
Scott Le Grand (Amazon Web Services)
GPUs have made molecular dynamics simulations faster, better, and cheaper, achieving supercomputer performance from a single GPU without sacrificing stability or accuracy. In this talk we demonstrate how the GPU refactoring of AMBER 12 Molecular Dynamics has led to an implementation that produces results that are indistinguishable from the original CPU code. In addition, we describe the GPU compute instances available on the Amazon EC2 platform to show how anyone can run any number of AMBER 12 simulations, anytime from anywhere.

Keywords:
Molecular Dynamics, GTC 2012 - ID S2644
 
Quantum Chemistry
Enabling Faster Material Science Modeling Using the Accelerated Quantum ESPRESSO
Filippo Spiga (Irish Centre for High-End Computing)
The goal of this session is to present the advantages of mixing CUDA libraries and CUDA kernels to deliver a robust community package for material science modeling that fully exploits multi-core systems equipped with GPUs. The Plane-Wave Self-Consistent Field (PWscf) code of the Quantum ESPRESSO suite is the focus of this work. During the session, the main computation-dependent components, which also represent fundamental building blocks for many other quantum chemistry codes, will be discussed and analyzed. Subsequently, an in-depth performance assessment of several realistic scientific cases will be presented, ranging from single workstations to large clusters equipped with hundreds of GPUs.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2220
 
A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters
Antonino Tumeo (Pacific Northwest National Laboratory), Oreste Villa (Pacific Northwest National Laboratory)
This talk discusses the development of a Domain-Specific Language (DSL), the tools, and the related runtime for efficiently generating tensor contractions (generalized matrix multiplications), an important part of many quantum chemistry methods (e.g. Coupled Cluster Theory). Starting from a high-level description of the computation, the tool analyzes it and generates optimized C, OpenCL, or CUDA implementations. The runtime, which supports a task-based computation model, is then able to execute the generated code on large-scale GPU-accelerated heterogeneous clusters, maximizing the utilization of the processing elements and minimizing communication costs.
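
As an illustration of why a tensor contraction can be called a generalized matrix multiplication, here is a minimal Python sketch. It is not taken from the talk or from the DSL it describes; the function names, shapes, and sample values are all assumptions chosen for the example. It contracts a matrix A[i][l] with a rank-3 tensor B[l][j][k] over the shared index l, once with explicit loops and once by flattening (j, k) into a single column index and calling an ordinary matrix product.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def contract_direct(a, b):
    """C[i][j][k] = sum over l of A[i][l] * B[l][j][k], as nested loops."""
    n_l, n_j, n_k = len(b), len(b[0]), len(b[0][0])
    return [[[sum(a[i][l] * b[l][j][k] for l in range(n_l))
              for k in range(n_k)]
             for j in range(n_j)]
            for i in range(len(a))]


def contract_via_matmul(a, b):
    """Same contraction: flatten (j, k) into one index, multiply, unflatten."""
    n_j, n_k = len(b[0]), len(b[0][0])
    # Each row of b_flat holds B[l][j][k] for all (j, k) in row-major order.
    b_flat = [[b[l][j][k] for j in range(n_j) for k in range(n_k)]
              for l in range(len(b))]
    c_flat = matmul(a, b_flat)
    # Cut every flat row back into n_j pieces of length n_k.
    return [[row[j * n_k:(j + 1) * n_k] for j in range(n_j)]
            for row in c_flat]


# Hypothetical sample data: A has indices (i, l), B has indices (l, j, k).
A = [[1, 2], [3, 4]]
B = [[[1, 0], [0, 1]], [[2, 2], [3, 3]]]

C_direct = contract_direct(A, B)
C_matmul = contract_via_matmul(A, B)
```

Both routes give the same result here, which is the property a code generator could exploit by mapping contractions onto tuned matrix-multiplication kernels.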

Keywords:
Quantum Chemistry, GTC 2012 - ID S2343
 
VASP Accelerated with GPUs
Maxwell Hutchinson (University of Chicago)
This session will detail the performance and capabilities of GPU-accelerated VASP, explain design decisions made in porting VASP to CUDA, and present a roadmap for GPU accelerated VASP development. We've achieved performance improvements up to around 20x on systems of around 100 ions and have implemented exact-exchange. We are working on ports of more conventional functionality.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2378
 
Large-Scale First Principle Pseudopotential DFT Calculations on GPU Clusters
WeiLe Jia (Supercomputing Center of CNIC, Chinese Academy of Sciences)
In this session, we will present a series of works on density functional theory (DFT) plane wave pseudopotential (PWP) calculations on GPU clusters. The GPU version is based on a CPU DFT-PWP code, PEtot, which can calculate ~1000 atoms on thousands of processors. Our tests indicate that the GPU version achieves a ~20x speedup over the CPU code. A detailed analysis of the speedup and of the scaling with the number of CPU/GPU units (up to 256) will be presented. As far as we know, this is the first GPU DFT-PWP code scalable to a large number of CPU/GPU units.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2392
 
Quantum Chemistry: Automated Code Generation and Optimization for GPU Kernels
Alexey Titov (Stanford), Ivan Ufimtsev (Stanford)
In this session we discuss the challenges encountered in developing quantum chemistry software for GPUs from scratch and in optimizing the kernels for the best performance. We attempt to create a unified framework for automatic generation of efficient quantum chemistry codes tailored individually to various GPU (NVIDIA, ATI) and CPU architectures and programming languages (CUDA, OpenCL, C/C++), using a meta-programming approach based on a computer algebra system. We demonstrate its utility by generating highly optimized GPU and CPU kernels for various integrals over Gaussian basis functions implemented in the TeraChem quantum chemistry package.

Keywords:
Quantum Chemistry, GTC 2012 - ID S2429
 
Rendering and Ray Tracing
OptiX for DirectX Programmers - Eve Online's GPU-Raytraced Portraits
Bert Peers (CCP Games)
By integrating NVIDIA's OptiX system for real-time GPU raytracing into a DirectX9 based engine, CCP Games enables high-quality raytraced player portraits for the single shard MMO EVE Online, reusing the game's assets and pipeline. We selectively add stochastic effects while closely maintaining the look of the DX9-based renderer that Art Direction aimed for. In this talk we approach OptiX from the point of view of a programmer familiar with DirectX, discuss integrating these two systems, and show how we reproduced some DirectX-based effects like transparency and subsurface scattering within OptiX.

Keywords:
Rendering and Ray Tracing, GTC 2012 - ID S2021
 
Advanced Driver Assistance System Testing Using OptiX
Erwin Roth (Technische Universitaet Muenchen), Tugkan Calapoglu (VIRES Simulationstechnologie GmbH)
Learn in this session how AUDI AG and its partners use OptiX as a unified platform for simulating perception sensors that rely on different physical measurement principles, e.g. video camera, LIDAR, and ultrasonic. The aim is to generate synthetic sensor data with realistic measurement errors for testing Advanced Driver Assistance Systems. Get details about the challenges they faced while implementing the tools needed to validate the sensor models, and join the discussion when they describe the upcoming challenges related to real-time ray tracing and advanced material descriptions when multiple sensors are simulated simultaneously.

Keywords:
Rendering and Ray Tracing, GTC 2012 - ID S2319
 
OptiX Out-of-Core and CPU Rendering
David McAllister (NVIDIA), James Bigler (OptiX group NVIDIA)
OptiX has broken some major barriers recently by enabling out-of-GPU-core memory rendering and by adding a CPU rendering back-end when an OptiX-capable GPU is not present in the system. OptiX users and CUDA developers will be interested in how we accomplished these feats within the existing GPU architecture. This talk will provide a brief introduction to OptiX and then dive into what the new features provide. We will then go under the covers and show how we pulled it off.

Keywords:
Rendering and Ray Tracing, GTC 2012 - ID S2366
 
Visualization
Mixing Graphics and Compute with Multiple GPUs
Alina Alt (NVIDIA)
In this session we will cover all the different aspects of interaction between graphics and compute. The first part of the session will focus on compute API interoperability with OpenGL (using CUDA and OpenCL APIs), while the second part of the session will delve into interoperability at a system level. In particular we will go through the challenges and benefits of dedicating one GPU to compute and another to graphics, how different system configurations affect data transfer between two GPUs, and how this translates into application design decisions that help enable efficient cross-GPU interoperability between compute and graphics contexts. This talk is repeated on Thursday at 3:30 PM (S0267B).

Keywords:
Visualization, GTC 2012 - ID S2267A
 
Mixing Graphics and Compute with Multiple GPUs
Alina Alt (NVIDIA)
In this session we will cover all the different aspects of interaction between graphics and compute. The first part of the session will focus on compute API interoperability with OpenGL (using CUDA and OpenCL APIs), while the second part of the session will delve into interoperability at a system level. In particular we will go through the challenges and benefits of dedicating one GPU to compute and another to graphics, how different system configurations affect data transfer between two GPUs, and how this translates into application design decisions that help enable efficient cross-GPU interoperability between compute and graphics contexts. This talk is repeated on Tuesday at 5:00 PM (S0267A).

Keywords:
Visualization, GTC 2012 - ID S2267B
 
Warping & Blending for Multi-Display Systems
Shalini Venkataraman (NVIDIA)
This talk will describe how to scale up from one to many displays for high-end visualization. You will learn about NVIDIA's new Warp and Blend capability that allows you to create a truly seamless logical display comprised of many individual display outputs. With this new capability you can project your graphics onto curved surfaces and perform the required display transformations entirely on the GPU, without any external hardware.

Keywords:
Visualization, GTC 2012 - ID S2322
 
Content Generation and Real-Time Hologram Computation for Holographic 3D-Displays
Enrico Zschau (SeeReal Technologies GmbH)
This session will introduce SeeReal's sub-hologram technology, which massively reduces hologram computation effort in comparison to classic holography, and show how SeeReal implemented those still compute-intensive algorithms on the GPU to enable viewing of interactive, rich 3D-content on holographic 3D-displays using off-the-shelf graphics hardware. You will also explore why classic holography is not well suited to interactive applications. Furthermore, guidelines for creating appropriate 3D-content are presented, including aspects regarding transparency in holograms. Finally, the specification and some impressions of SeeReal's 20 holographic prototype will be presented, which allows viewing of live computed holograms showing 3D-content and 3D-video.

Keywords:
Visualization, GTC 2012 - ID S2324
 
Next Generation InfoWall
Andrew Page (NVIDIA), Ian Williams (NVIDIA), Shalini Venkataraman (NVIDIA), Alina Alt (NVIDIA)
Learn how you can use a multiple display configuration to render video content captured from multiple sources, utilizing the power of GPUs to achieve unprecedented performance.

Keywords:
Visualization, GTC 2012 - ID S2326
 
Volumetric Processing and Visualization on Heterogeneous Architecture
Wei Li (Siemens Corporation)
Volumetric data is typically very large and involves intensive computation for processing and visualization. We have developed an OpenCL-based framework that can utilize all available resources in a system or a cluster of systems. The framework manages one or more OpenCL devices. A large volume is partitioned into bricks. Each OpenCL device is associated with a set of brick producers that generate the contents of bricks, optionally using other bricks as input. The framework also includes a scheduler that distributes brick workloads to different devices and chooses an optimized processing order according to certain criteria.
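
To make the brick idea concrete, here is a minimal Python sketch of partitioning a volume into bricks and dealing them out to devices. It is not the Siemens framework: the function names, the device labels, and the volume and brick sizes are hypothetical, and the round-robin policy is a deliberately simple stand-in for the optimized scheduler the abstract mentions.

```python
from itertools import product


def partition_into_bricks(volume_shape, brick_shape):
    """Return (origin, size) for every brick; edge bricks are clipped."""
    axes = [range(0, dim, step)
            for dim, step in zip(volume_shape, brick_shape)]
    bricks = []
    for origin in product(*axes):
        size = tuple(min(step, dim - start)
                     for start, step, dim
                     in zip(origin, brick_shape, volume_shape))
        bricks.append((origin, size))
    return bricks


def assign_round_robin(bricks, device_names):
    """Hypothetical scheduler: hand bricks to devices in turn."""
    schedule = {name: [] for name in device_names}
    for index, brick in enumerate(bricks):
        schedule[device_names[index % len(device_names)]].append(brick)
    return schedule


# Assumed example: a 100 x 64 x 64 volume cut into 64^3 bricks.
bricks = partition_into_bricks((100, 64, 64), (64, 64, 64))
schedule = assign_round_robin(bricks, ["gpu0", "cpu0"])
```

A real scheduler would also weigh device speed, data locality, and dependencies between bricks; the sketch only shows the partition-then-distribute structure.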

Keywords:
Visualization, GTC 2012 - ID S2342
 
Programming Multi-GPUs for Scalable Rendering
Shalini Venkataraman (NVIDIA)
Multi-GPU configurations are becoming common, affordable options for OpenGL applications to scale performance, data size, display size, and image quality. We show how to structure your application for multi-GPU rendering by using multiple threads and OpenGL contexts, and how to handle synchronization and data transfer. We conclude with a discussion of how to implement common parallel rendering approaches such as sort-first, sort-last, and hybrid techniques.

Keywords:
Visualization, GTC 2012 - ID S2353
 
Seamless Scalable Displays - Using NVIDIA Warp + Intensity API
Rajeev Surati (Scalable Display Technologies)
In this talk we will discuss how we use the NVIDIA Warp and Intensity API to create seamless displays made up of multiple projectors, based on our camera feedback systems. We will show and discuss production case studies including a 25 megapixel touch wall, military dome simulation systems, VR Walls, VR Caves, and immersive conference rooms that are made affordable and enabled by this technology.

Keywords:
Visualization, GTC 2012 - ID S2355
 
Integrated GPU Acceleration With Real Time Visualization Of Terabyte Data
Kelly Walker (Hue)
Computation and visualization don't necessarily have to act as two separate entities. This talk explains the integration of real-time compute with real-time visualization. Industry and academia have provided attractive solutions for compiler-directive-optimized code for computations. To support cases that involve massive yet ad-hoc data I/O and computation with interactive visualization, Hue developed a different model that bridges the gap between "complete system rewrite" and "compiler directive optimized code". The talk explains how highly optimized data I/O mechanisms, coupled with predefined input and output definitions for kernels, provide excellent scalability and interactivity at runtime.

Keywords:
Visualization, GTC 2012 - ID S2436
 
Interactive and Scalable Subsurface Data Visualization Framework
Tom-Michael Thamm (NVIDIA ARC), Marc Nienhaus (NVIDIA ARC)
The goal is to present an interactive visualization framework for large geo-spatial data. This framework has been developed by the NVIDIA Advanced Rendering Center for the oil and gas (hydrocarbon) industry. The CUDA-based application runs in the cloud at interactive frame rates. Visualization is delivered remotely to clients in a browser, including on tablets. The scalable visualization framework can handle terabytes of data.

Keywords:
Visualization, GTC 2012 - ID S2507
 
 
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2017 NVIDIA Corporation