GTC On-Demand

Acoustics & Audio Processing
Exploring Recognition Network Representations for Efficient Speech Inference on the GPU
Jike Chong
We explore two contending recognition network representations for speech inference engines, the linear lexical model (LLM) and the weighted finite state transducer (WFST), on NVIDIA GTX285 and GTX480 GPUs. We demonstrate that while an inference engine using the simpler LLM representation evaluates 22x more transitions per second than one using the more advanced WFST representation, the simple structure of the LLM representation allows 4.7-6.4x faster evaluation and 53-65x faster operand gathering for each state transition. We illustrate that the performance of a speech inference engine based on the LLM representation is competitive with the WFST representation on highly parallel GPUs.

Keywords:
Acoustics & Audio Processing, GTC 2010 - ID P10C01
 
Efficient Automatic Speech Recognition on the GPU
Jike Chong
Automatic speech recognition (ASR) technology is emerging as a critical component in data analytics for the wealth of media data being generated every day. ASR-based applications contain fine-grained concurrency that has great potential to be exploited on the GPU. However, the state-of-the-art ASR algorithm involves a highly parallel graph traversal on an irregular graph with millions of states and arcs, which makes efficient parallel implementations highly challenging. We present four generalizable techniques: a dynamic data-gather buffer, find-unique, lock-free data structures using atomics, and hybrid global/local task queues. Used together, these techniques can effectively resolve ASR implementation challenges on an NVIDIA GPU.
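The find-unique technique named above removes duplicate destination states produced when many arcs are evaluated in parallel. The sketch below is a minimal CPU-side emulation, assuming integer state IDs in `range(num_states)`; the function name `find_unique` and the flag-array-plus-prefix-sum formulation are illustrative assumptions, not the authors' implementation.

```python
from itertools import accumulate


def find_unique(dest_states, num_states):
    """Return the distinct state IDs in dest_states, in ascending order.

    Emulates a GPU-style find-unique (hypothetical formulation):
    1. every arc writes 1 into a per-state flag array (duplicate writes
       are harmless, so no atomics are needed),
    2. an exclusive prefix sum over the flags gives each flagged state
       its slot in the output,
    3. flagged states are scattered into a compact list.
    """
    flags = [0] * num_states
    for s in dest_states:            # step 1: one "thread" per arc
        flags[s] = 1
    # step 2: exclusive scan of the flags
    offsets = [0] + list(accumulate(flags))[:-1]
    unique = [0] * sum(flags)
    for s in range(num_states):      # step 3: one "thread" per state
        if flags[s]:
            unique[offsets[s]] = s
    return unique


print(find_unique([7, 2, 7, 5, 2, 2], num_states=10))  # [2, 5, 7]
```

The flag array trades memory proportional to the number of states for avoiding a sort; whether that pays off depends on how sparse the active set is.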

Keywords:
Acoustics & Audio Processing, GTC 2010 - ID P10C02
Astronomy & Astrophysics
Black Holes in Galactic Nuclei Simulated with Large GPU Clusters in CAS
Rainer Spurzem
- National Astronomical Observatories, Chinese Academy of Sciences
Many, if not all, galaxies harbour supermassive black holes. If galaxies merge, which is quite common in the process of hierarchical structure formation in the universe, their black holes sink to the centre of the merger remnant and form a tight binary. Depending on initial conditions and time, supermassive black hole binaries are prominent gravitational wave sources if they ultimately come close together and coalesce. We model such systems as gravitating N-body systems (stars) with two or more massive bodies (black holes), including, where necessary, relativistic corrections to the classical Newtonian gravitational forces (Kupi et al. 2006, Berentzen et al. 2009).
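The Newtonian part of such a model reduces to a direct O(N^2) force summation, which maps naturally onto GPUs with one thread per body. A minimal sketch in units where G = 1, assuming an optional Plummer softening length `eps` (an illustrative parameter); the relativistic corrections and the time integrator used in the cited work are deliberately omitted.

```python
import math


def accelerations(pos, mass, eps=0.0):
    """Direct-summation Newtonian accelerations in units with G = 1.

    pos  : list of (x, y, z) tuples
    mass : list of masses, same length as pos
    eps  : Plummer softening length (assumed; 0.0 means pure Newtonian)

    Body i accumulates m_j * r_ij / |r_ij|^3 from every other body j.
    On a GPU the outer loop would typically be one thread per body.
    """
    n = len(pos)
    acc = []
    for i in range(n):
        ax = ay = az = 0.0
        xi, yi, zi = pos[i]
        for j in range(n):
            if j == i:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += mass[j] * dx * inv_r3
            ay += mass[j] * dy * inv_r3
            az += mass[j] * dz * inv_r3
        acc.append((ax, ay, az))
    return acc


# Two unit masses 2 apart attract each other with |a| = 1 / 2**2 = 0.25.
print(accelerations([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0]))
```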
 
Keywords:
Astronomy & Astrophysics, GTC 2010 - ID P10B01
Computational Fluid Dynamics
High-Order Unstructured Compressible Flow Solver on the GPU
Patrice Castonguay
- Stanford University
The objective of this project is to develop a scalable and efficient high-order unstructured compressible flow solver for GPUs. The solver can achieve arbitrary orders of accuracy for flows over complex geometries. High-order solvers require more operations per degree of freedom, which makes them well suited to massively parallel processors. Preliminary results indicate speedups of up to 70x on the Tesla C1060 compared to an Intel i7 CPU. Memory access was optimized using shared and texture memory.
 
Keywords:
Computational Fluid Dynamics, GTC 2010 - ID P10D01
 
Parallel 3D Geometric Multigrid Solver on GPU Clusters
Dana Jacobsen
- Boise State University
An investigation of the performance and scalability of a multigrid pressure Poisson equation solver running on a GPU cluster.
 
Keywords:
Computational Fluid Dynamics, GTC 2010 - ID P10D02
 
Acceleration of mesh-free CFD using CUDA
Gilles Civario
- Irish Centre for High-End Computing
In this work, a mesh-free Computational Fluid Dynamics (CFD) code is accelerated using CUDA. The poster gives an overview of the CUDA implementation strategy and the resulting performance increase.
 
Keywords:
Computational Fluid Dynamics, GTC 2010 - ID P10D03
 
Airblast Modelling on Multiple Tesla units
Sean Lovett
- University of Cambridge
We used NVIDIA Tesla GPUs to accelerate the solution of hyperbolic partial differential equations, with application to modelling airblast generated by industrial bench mining operations. Parallelisation over multiple GPUs was achieved using MPI.
 
Keywords:
Computational Fluid Dynamics, GTC 2010 - ID P10D04
 
Implementation of High-Order Adaptive CFD Methods on GPGPUs
Z.J. Wang
- Iowa State University
This poster describes our implementation of adaptive high-order CFD methods on GPUs. A speedup factor of up to 44 has been achieved for 2D flow problems.
 
Keywords:
Computational Fluid Dynamics, GTC 2010 - ID P10D05
 
Computational Fluid Dynamics on GPU
Long Wang
- Supercomputing Center, Chinese Academy of Sciences
Computational Fluid Dynamics, an important branch of the HPC field, has a long history of demanding ever higher computational performance. The traditional way to meet this demand is to use faster machines or supercomputers, yet these approaches are inconvenient and costly for many individual researchers. We investigated using GPUs to accelerate CFD codes and tested their performance on the CUDA and OpenCL platforms. We have ported 2D cavity flow, a 2D Riemann problem, and 2D flow over a RAE2882 airfoil to the GPU and explored several GPU-specific optimization strategies. In most cases, speedups of approximately 16x to 63x can be achieved.
 
Keywords:
Computational Fluid Dynamics, GTC 2010 - ID P10D06
Computer Graphics
Dynamic and Implicit Trees for Graphics and Visualization on the GPU
Nathan Andrysco
- Purdue University
We propose a new way to represent trees that enables faster algorithms, is simple to implement (especially on the GPU), and has a lower memory overhead than previous approaches. Using our data structure, we have seen significant improvements over previous state-of-the-art methods in both volume ray casting and ray tracing applications.
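The abstract does not spell out the new representation. For context, a classic implicit (pointerless) tree stores a complete binary tree in a flat array and locates children by index arithmetic, so no pointers are stored. The sketch below builds such a max-tree, of the kind used for empty-space skipping in volume ray casting; it illustrates the general idea only and is not the authors' data structure. `build_max_tree` and `range_max` are hypothetical names.

```python
NEG_INF = float("-inf")


def build_max_tree(values):
    """Build an implicit complete binary max-tree in a flat list.

    Node k has children 2k+1 and 2k+2 (and parent (k-1)//2), so no child
    pointers are stored. Leaves are padded to a power of two with -inf,
    and each internal node holds the maximum of its subtree.
    Returns (tree, n_leaves).
    """
    n = 1
    while n < len(values):
        n *= 2
    tree = [NEG_INF] * (2 * n - 1)
    tree[n - 1:n - 1 + len(values)] = list(values)  # leaves fill the last n slots
    for k in range(n - 2, -1, -1):                  # internal nodes, bottom-up
        tree[k] = max(tree[2 * k + 1], tree[2 * k + 2])
    return tree, n


def range_max(tree, n, lo, hi, k=0, node_lo=0, node_hi=None):
    """Maximum over leaves lo..hi (inclusive), walking implicit indices."""
    if node_hi is None:
        node_hi = n - 1
    if hi < node_lo or node_hi < lo:        # node range disjoint from query
        return NEG_INF
    if lo <= node_lo and node_hi <= hi:     # node range fully inside query
        return tree[k]
    mid = (node_lo + node_hi) // 2
    return max(range_max(tree, n, lo, hi, 2 * k + 1, node_lo, mid),
               range_max(tree, n, lo, hi, 2 * k + 2, mid + 1, node_hi))


tree, n = build_max_tree([3, 1, 4, 1, 5, 9, 2])
print(range_max(tree, n, 0, 3))  # 4
```

A ray caster can skip an entire subtree whenever its stored maximum falls below the transfer function's visibility threshold.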
 
Keywords:
Computer Graphics, GTC 2010 - ID P10E01
 
Fragment-Parallel Composite and Filter
Anjul Patney
- University of California, Davis
In this poster, we describe our recent work on programmable graphics pipelines: a fragment-parallel formulation of an A-buffer-style composite and filter equation, and its implementation on a modern GPU.
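An A-buffer-style composite sorts each sample's fragments by depth and blends them front to back with the "over" operator; the filter then combines a pixel's samples. The per-sample sketch below assumes non-premultiplied fragment colors and a box filter, both illustrative choices. The poster's contribution is parallelizing this work over fragments rather than pixels, which the sketch does not attempt.

```python
def composite(fragments):
    """Front-to-back 'over' compositing of one sample's fragments.

    fragments: list of (depth, (r, g, b), alpha), colors non-premultiplied
    (an assumption). Returns premultiplied (r, g, b, alpha).
    """
    r = g = b = a = 0.0
    for _, (fr, fg, fb), fa in sorted(fragments, key=lambda f: f[0]):
        w = (1.0 - a) * fa      # remaining transmittance times coverage
        r += w * fr
        g += w * fg
        b += w * fb
        a += w
    return (r, g, b, a)


def filter_pixel(samples):
    """Box-filter the composited samples of one pixel (assumed filter).

    samples: list of fragment lists, one list per subpixel sample.
    """
    comps = [composite(s) for s in samples]
    k = len(comps)
    return tuple(sum(c[i] for c in comps) / k for i in range(4))


# Half-transparent red in front of opaque blue.
print(composite([(2.0, (0.0, 0.0, 1.0), 1.0), (1.0, (1.0, 0.0, 0.0), 0.5)]))
# (0.5, 0.0, 0.5, 1.0)
```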
 
Keywords:
Computer Graphics, GTC 2010 - ID P10E02
Computer Vision & Machine Vision
Architecture Aware Design for a Parallel Object Recognition System
Bor-Yiing Su
- University of California, Berkeley
We have developed a parallel object recognition system using CUDA, achieving a 70x-80x speedup over the original serial implementation. To optimize our implementation, we evaluated the performance of different parallelization strategies on key computations in the object recognition system. We found that the performance of the parallel implementation is sensitive to properties of the input data, so the parallelization strategy for key computations should be adjusted dynamically at runtime.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F01
 
Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow
Narayanan Sundaram
- University of California, Berkeley
In this poster, we discuss a method for computing point trajectories based on a fast parallel implementation of a recent optical flow algorithm that tolerates fast motion. The parallel implementation of large displacement optical flow runs about 78x faster than the serial C++ version. We use this implementation in a point tracking application. Our resulting technique tracks up to three orders of magnitude more points and is 46% more accurate than the Kanade-Lucas-Tomasi tracker. Compared to the Particle Video tracker, we achieve 66% better accuracy while retaining the ability to handle large displacements and running an order of magnitude faster.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F02
 
Visual Cortex on a Chip: Large-scale, Real-Time Functional Models of Visual Cortex on a GPGPU
Steven Brumby
- Los Alamos National Laboratory
Los Alamos National Laboratory's Petascale Synthetic Visual Cognition project is exploring full-scale, real-time functional models of human visual cortex to understand how human vision achieves its accuracy, robustness and speed. Commercial off-the-shelf hardware to support this modeling is rapidly improving; for example, a teraflop GPGPU card costs about $500 and is roughly the size of a mouse cortex. We present results demonstrating image classification on UAV aerial video with a visual cortex model running on a 240-core NVIDIA GeForce GTX285, and observe a speedup of more than 10x. As this technology continues to improve, cortical modeling on GPGPU devices has the potential to revolutionize computer vision.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F03
 
Fermi in Action: Robust Background Subtraction for Real-time Video Analysis
Melvin Wong
- Institute for Infocomm Research
Background subtraction is an important image processing step for video surveillance and for many computer vision problems such as tracking and recognition. However, robust background subtraction that adapts well to variable environment changes is computationally intensive and consumes a large amount of memory, so its practical application is often limited. Here, we aim to expand its usage and tackle vision problems that require high-frame-rate cameras, such as real-time sports analysis and real-time object detection and recognition. Using recent advances in accelerator hardware (the NVIDIA Fermi architecture) and taking advantage of heterogeneous computing, we achieve performance good enough for these practical applications.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F04
 
Bridging Neuroscience and GPU Computing to Build General Purpose Computer Vision
Nicolas Pinto
- Massachusetts Institute of Technology
The construction of artificial vision systems and the study of biological vision are naturally intertwined, as they represent simultaneous efforts to forward- and reverse-engineer systems with similar goals. Here, we present a high-throughput approach to more expansively explore biologically-inspired models by leveraging GPUs. We show that this approach can yield significant gains in performance on object and face recognition (including the "Labeled Faces in the Wild" challenge and faces from Facebook), consistently outperforming the state of the art. We highlight how flexible programming tools, such as high-level scripting and template metaprogramming/auto-tuning, can enable large performance gains while managing complexity for the developer.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F05
 
CUDA for Vision and Imaging Library
Salman Ul Haq
- TunaCode
CUVI Lib (CUDA for Vision and Imaging Library) is a software library that provides a set of GPU-accelerated computer vision and image processing functions. CUVI can be used both as an add-on library for NVIDIA's NPP (NVIDIA Performance Primitives), since it complements the functionality present in NPP, and as a standalone library ready to be plugged into end-user C/C++ applications.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F06
 
GPU-Friendly Multi-View Stereo Reconstruction Using Surfel Representation and Graph Cuts
In Kyu Park
- Inha University
We present a new surfel (surface element) based multi-view stereo algorithm that runs entirely on the GPU. We exploit the flexibility of the surfel-based 3D shape representation and global optimization by graph cuts in the same framework. The orientation of the constructed surfel candidates imposes an effective constraint that reduces the effect of the minimal surface bias. The entire processing pipeline is implemented on the latest GPU to speed up processing significantly. Experimental results show that the proposed approach reconstructs the 3D shape of an object accurately and efficiently, running more than 100 times faster than on the CPU.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F07
 
CUDA Accelerated Face Recognition
Jayadeep Vijayan
- NeST Software
A GPU-based implementation of a face recognition solution using PCA with the Eigenfaces algorithm.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F08
 
GPU Driven Dense Reconstruction for Community Photo Collections
Jan-Michael Frahm
- University of North Carolina, Chapel Hill
We present a system to reconstruct dense 3D models from community photo collections. First, images are described using GIST and clustered using Hamming distances. Each of these clusters is geometrically verified and connected using geotags. Connected clusters are bundle adjusted, and the obtained registration is used to estimate depth maps that are finally fused to obtain dense 3D models. Each of the above steps except bundle adjustment is implemented in CUDA and runs on multiple GPUs. Compared to the state-of-the-art method, our pipeline is two orders of magnitude faster while processing an order of magnitude more images.
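Clustering by Hamming distance presupposes binary codes. A minimal sketch, assuming each GIST descriptor has already been binarized into an integer bit string, and using greedy "leader" clustering with a bit threshold as a hypothetical stand-in for whatever clustering the authors used:

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def leader_cluster(codes, threshold):
    """Greedy leader clustering (illustrative, not the authors' method).

    Each code joins the first cluster whose leader is within `threshold`
    bits; otherwise it starts a new cluster with itself as leader.
    Returns a list of clusters, each a list of indices into `codes`.
    """
    leaders, clusters = [], []
    for i, code in enumerate(codes):
        for leader, members in zip(leaders, clusters):
            if hamming(code, leader) <= threshold:
                members.append(i)
                break
        else:  # no leader was close enough
            leaders.append(code)
            clusters.append([i])
    return clusters


print(leader_cluster([0b1111, 0b1110, 0b0000, 0b0001], threshold=1))
# [[0, 1], [2, 3]]
```

XOR followed by a population count is what makes Hamming distance cheap on GPUs as well: each comparison is a few integer instructions.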
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F09
 
Portable Central Vision Enhancement System for Macular Degeneration Patients
Chloe Vaniet
- Imperial College London
Vision enhancement systems are alternative visual aid devices that enhance the remaining vision of visually impaired subjects. Our aim is to develop a mobile central vision enhancement system for macular degeneration patients. Three different types of enhancement algorithms have been developed, and their efficiency was tested on low-vision patients. These three algorithms have been implemented on a portable low-power device; the NVIDIA Tegra system-on-a-chip was chosen for this implementation.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F10
 
Dense Stereo Vision on GPU
Esubalew Bekele
- Universal Robotics Inc.
A dense stereo vision system has been implemented for a dual-arm industrial material-handling robot, with rectification, stereo correspondence, and 3D pose from depth ported to the GPU using CUDA.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F11
 
Upsampling Range Data in Dynamic Environments
Hendrik Dahlkamp
- Stanford University
We present a flexible, parallelized method for fusing information from optical and range sensors based on an accelerated high-dimensional filtering approach. Our system takes as input a sequence of monocular camera images as well as a stream of sparse range measurements as obtained from a laser or other sensor system. Our method produces a dense, high-resolution depth map of the scene, automatically generating confidence values for every interpolated depth point. We describe how to integrate priors on object shape, motion and appearance and how to achieve an efficient implementation using parallel processing hardware such as GPUs.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F12
 
GPU Accelerated Marker-less Motion Capture
Varun Ganapathi
- Stanford University
In this work, we derive an efficient filtering algorithm for tracking human pose at 4-10 frames per second from a stream of monocular depth images. The key idea is to combine an accurate generative model (achievable in this setting using state-of-the-art GPU hardware) with a discriminative model that feeds data-driven evidence about body part locations. We describe a novel algorithm for propagating the noisy evidence about body part locations up the kinematic chain using the unscented transform. We provide extensive experimental results on 28 real-world sequences using automatic ground-truth annotations from a commercial motion capture system.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F13
 
3D Facial Feature Modeling with Active Appearance Models
Tim Llewellynn
- nViso / EPFL
Active Appearance Models (AAMs) are a powerful tool for modeling and matching objects under shape deformations and texture variations. An AAM learns the characteristics of objects by building a compact statistical model, applying Principal Component Analysis (PCA) to a set of labeled data. Although AAMs have been widely applied in computer vision thanks to their flexible framework, they still cannot satisfy real-time requirements. To alleviate this problem, we address the computational complexity of the fitting procedure by running the AAM optimization algorithm on a GPU using a hybrid CPU/GPU block processing architecture.
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F14
 
OpenCV on GPU
Anatoly Baksheev
- ITEEZ
OpenCV is a free, open-source library of computer vision algorithms. Recently, a new module of functions implemented on the GPU was introduced in OpenCV. It includes several methods for calculating the stereo correspondence between two images, which is used to reconstruct a 3D scene. A simple block-matching algorithm runs up to 10x faster than the CPU implementation in OpenCV, providing real-time processing of HD stereo pairs on Tesla cards. Belief-propagation-based algorithms show a 20-50x speedup over a CPU implementation.
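Block matching, as referenced above, chooses for each pixel the disparity whose surrounding window best matches the other image. A minimal winner-takes-all sketch using the sum of absolute differences (SAD), assuming rectified grayscale images given as nested lists; OpenCV's actual GPU implementation is not reproduced here, and `block_match` is a hypothetical name.

```python
def block_match(left, right, max_disp, radius=1):
    """Winner-takes-all SAD block matching on rectified grayscale images.

    left, right: equally sized 2D lists (rows of numbers).
    For each left pixel (x, y), every disparity d in [0, max_disp] is
    scored by the sum of absolute differences between the
    (2*radius+1)^2 window around (x, y) in `left` and the window around
    (x - d, y) in `right`; the lowest-cost d wins. Pixels whose window
    would leave the image keep disparity 0. Each output pixel is
    independent, which is why the method suits one GPU thread per pixel.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_cost, best_d = None, 0
            for d in range(max_disp + 1):
                if x - d - radius < 0:       # window would leave the right image
                    break
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp


# Synthetic pair: every left pixel appears 2 columns to the left in `right`.
right = [[x * x + y for x in range(10)] for y in range(5)]
left = [[(x - 2) ** 2 + y for x in range(10)] for y in range(5)]
print(block_match(left, right, max_disp=4)[2])  # disparity 2 wherever x >= 3
```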
 
Keywords:
Computer Vision & Machine Vision, GTC 2010 - ID P10F15
Databases, Data Mining, Business Intelligence
Speculative Query Processing
Peter Volk
With an increasing amount of data and user demands for fast query processing, the optimization of database operations continues to be a challenging task. A common optimization method is to leverage parallel hardware architectures. With the introduction of general-purpose GPU computing, massively parallel hardware has become available in commodity systems. To exploit this technology efficiently, we introduce speculative query processing, which works on index structures to efficiently support heavily used database operations. To show the benefits and opportunities of our approach, we present fine-grained and coarse-grained implementations for multidimensional queries.

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2010 - ID P10G02
 
Virtual Local Stores
Henry Cook
We propose a mechanism to provide the benefits of a software-managed memory hierarchy on top of a hierarchy of hardware-managed caches. A virtual local store (VLS) is mapped into the virtual address space of a process and backed by physical main memory, but is stored in a partition of the hardware-managed cache when active. This reduces context switch cost, and allows VLSs to migrate with their process thread. The partition allocated to the VLS can be rapidly reconfigured without flushing the cache, allowing programmers to selectively use VLS in a library routine with low overhead.

Keywords:
Databases, Data Mining, Business Intelligence, GTC 2010 - ID P10G03
Developer - Algorithms
Accelerating Symbolic Computations on NVIDIA Fermi
Pavel Emeliyanenko
- Max-Planck Institute for Informatics
We present the first implementation of a complete modular resultant algorithm on graphics hardware. Our recent developments, which take advantage of the new NVIDIA Fermi GPU architecture and instruction set, allowed us to achieve a speedup of about 150x over the modular resultant algorithm in Maple 13.
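A modular resultant algorithm computes the resultant's image modulo many primes (independent work that parallelizes well) and recovers the integer result by Chinese remaindering. The sketch below obtains each modular image in one simple way, the determinant of the Sylvester matrix by Gaussian elimination mod p; this is an assumption for illustration, not necessarily the algorithm used on the GPU. Coefficients are listed from the highest degree down, the moduli are assumed to be distinct primes, and their product must exceed twice the resultant's absolute value for the reconstruction to be exact.

```python
def sylvester(f, g):
    """Sylvester matrix of f and g (coefficient lists, highest degree first)."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = [[0] * i + f + [0] * (size - m - 1 - i) for i in range(n)]
    rows += [[0] * i + g + [0] * (size - n - 1 - i) for i in range(m)]
    return rows


def det_mod(mat, p):
    """Determinant of an integer matrix modulo a prime p (Gaussian elimination)."""
    a = [[x % p for x in row] for row in mat]
    n = len(a)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:                      # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)      # inverse via Fermat's little theorem
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def resultant_mod(f, g, p):
    """Image of Res(f, g) modulo p."""
    return det_mod(sylvester(f, g), p)


def crt_symmetric(residues, primes):
    """Combine residues into the unique integer in the symmetric range mod prod(primes)."""
    x, M = 0, 1
    for r, p in zip(residues, primes):
        t = (r - x) * pow(M, -1, p) % p       # solve x + M*t = r (mod p)
        x += M * t
        M *= p
    return x - M if x > M // 2 else x


# Res(x - 2, x - 3) = -1, recovered from its images mod 7 and mod 11.
print(crt_symmetric([resultant_mod([1, -2], [1, -3], p) for p in (7, 11)], [7, 11]))
```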
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A02
 
Particle-In-Cell Simulations on the GPU
Hartmut Ruhl
- Ludwig-Maximilians-University
Particle-In-Cell simulations are an important technique in the field of kinetic plasma simulation. 2D particle pushing and conserved current aggregation have been implemented in CUDA. On a Tesla C1060, the CUDA code is 4 times faster than SSE2-optimized code on a quad-core Intel Xeon processor.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A03
 
Parallel Ant Colony Optimization with CUDA
Octavian Nitica
- University of Delaware
The Ant Colony Optimization (ACO) algorithm is a metaheuristic used to find shortest paths in graphs. By using CUDA to implement an ACO algorithm, we achieved a significant performance improvement over a highly tuned sequential CPU implementation. In the construction step of the ACO algorithm, each ant creates an independent solution, and this step accounts for most of the computation. Since the construction step is the same for most ACO variants, parallelizing it should also allow easy adaptation to different pheromone updating functions. Our research currently tests this hypothesis on the travelling salesman problem.
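The construction step described above can be sketched for the TSP with the classic Ant System transition rule: an ant at city i picks an unvisited city j with probability proportional to tau[i][j]**alpha * (1/dist[i][j])**beta. The parameter values, the seeded generator, and the function names are illustrative assumptions; the point is that each ant's loop is independent, which is what a CUDA version would distribute.

```python
import random


def construct_tours(dist, tau, n_ants, alpha=1.0, beta=2.0, seed=0):
    """ACO construction step for the TSP (Ant System transition rule).

    dist: symmetric matrix of positive off-diagonal distances.
    tau : pheromone matrix of the same shape.
    Each ant starts at a random city and repeatedly moves to an unvisited
    city chosen with probability proportional to
    tau[i][j]**alpha * (1 / dist[i][j])**beta.
    Ants never interact during construction, so each ant could be one
    GPU thread (or block).
    """
    rng = random.Random(seed)
    n = len(dist)
    tours = []
    for _ in range(n_ants):
        current = rng.randrange(n)
        tour, unvisited = [current], set(range(n)) - {current}
        while unvisited:
            cands = sorted(unvisited)
            weights = [tau[current][j] ** alpha * (1.0 / dist[current][j]) ** beta
                       for j in cands]
            current = rng.choices(cands, weights=weights)[0]
            tour.append(current)
            unvisited.remove(current)
        tours.append(tour)
    return tours


def tour_length(dist, tour):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
tau = [[1.0] * 4 for _ in range(4)]
for t in construct_tours(dist, tau, n_ants=3):
    print(t, tour_length(dist, t))
```

The pheromone update that follows construction is where ACO variants differ, which is why the abstract treats construction as the reusable part.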
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A04
 
High Performance and Scalable Radix Sorting for GPU Stream Architectures
Duane Merrill
- University of Virginia
The need to rank and order data is pervasive, and sorting operations are fundamental to many algorithms. This poster presents a very efficient method for sorting large sequences of fixed-length keys (and values) using GPU stream processors. Compared to the state of the art, our implementation achieves speedups of up to 3.8x on all NVIDIA GPGPUs. For this domain of sorting problems, we believe our sorting primitive to be the fastest available for any fully programmable microarchitecture: our stock NVIDIA GTX480 sorting results exceed an average sorting rate of 1G keys/sec (i.e., one billion 32-bit keys sorted per second).
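A least-significant-digit (LSD) radix sort processes a fixed number of bits per pass, and each pass consists of a digit histogram, an exclusive prefix sum, and a stable scatter. The serial sketch below, assuming unsigned 32-bit keys and 4-bit digits, shows that structure only; it is not the poster's GPU implementation, which distributes these phases across thread blocks.

```python
def radix_sort(keys, bits=32, radix_bits=4):
    """LSD radix sort of unsigned integer keys (each < 2**bits).

    Each pass handles one radix_bits-wide digit:
      1. histogram: count occurrences of each digit value,
      2. exclusive prefix sum of the counts gives each bucket's start,
      3. stable scatter: keys move to their bucket in input order.
    Stability of step 3 is what lets later (more significant) passes
    preserve the order established by earlier ones.
    """
    radix = 1 << radix_bits
    mask = radix - 1
    for shift in range(0, bits, radix_bits):
        counts = [0] * radix
        for k in keys:                       # 1. histogram
            counts[(k >> shift) & mask] += 1
        offsets, total = [0] * radix, 0
        for d in range(radix):               # 2. exclusive prefix sum
            offsets[d], total = total, total + counts[d]
        out = [0] * len(keys)
        for k in keys:                       # 3. stable scatter
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys


print(radix_sort([3, 0xFFFFFFFF, 17, 0, 17, 256]))
# [0, 3, 17, 17, 256, 4294967295]
```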
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A05
 
Task Management for Irregular Workloads on the GPU
Stanley Tzeng
- University of California, Davis
We explore software mechanisms for managing irregular tasks on graphics processing units. Traditional GPU programming guidelines teach us how to program the GPU efficiently for data-parallel pipelines with regular input and output. We present a strategy for task-parallel pipelines that can handle irregular workloads on the GPU. We demonstrate that dynamic scheduling and efficient memory management are critical to achieving high efficiency on irregular workloads. We showcase our results on a real-time Reyes rendering pipeline.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A06
 
A Hybrid Method for Solving Tridiagonal Systems on GPU
Yao Zhang
- University of California, Davis
Tridiagonal linear systems are important to many problems in numerical analysis and computational fluid dynamics, as well as to computer graphics applications in video games and computer-animated films. This poster presents our study of the performance of multiple tridiagonal algorithms on a GPU. We design a novel hybrid algorithm that combines a work-efficient algorithm with a step-efficient algorithm in a way well suited to a GPU architecture. Our hybrid solver achieves 8x and 2x speedups in single and double precision, respectively, over a multi-threaded, highly optimized CPU solver, and a 2x speedup over a basic GPU solver.
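The poster does not name the two algorithms in its hybrid. Parallel cyclic reduction (PCR) is a common step-efficient choice, so the sketch below shows PCR alone, under that assumption: every row is updated independently at each step, and the system decouples after about log2(n) steps. It does no pivoting and assumes, for example, a diagonally dominant matrix.

```python
def pcr_solve(a, b, c, d):
    """Parallel cyclic reduction (PCR) for a tridiagonal system.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] = 0 and c[n-1] = 0.
    At each step, every row i subtracts multiples of rows i-stride and
    i+stride to eliminate its neighbours at distance `stride`, so it then
    couples x[i] only to x[i-2*stride] and x[i+2*stride]. All rows are
    updated from the previous step's values, so each row could be one GPU
    thread. Once stride >= n, every row is decoupled: x[i] = d[i] / b[i].
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):
            lo, hi = i - stride, i + stride
            has_lo, has_hi = lo >= 0, hi < n
            k1 = a[i] / b[lo] if has_lo else 0.0
            k2 = c[i] / b[hi] if has_hi else 0.0
            na[i] = -a[lo] * k1 if has_lo else 0.0
            nc[i] = -c[hi] * k2 if has_hi else 0.0
            nb[i] = (b[i] - (c[lo] * k1 if has_lo else 0.0)
                     - (a[hi] * k2 if has_hi else 0.0))
            nd[i] = (d[i] - (d[lo] * k1 if has_lo else 0.0)
                     - (d[hi] * k2 if has_hi else 0.0))
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    return [d[i] / b[i] for i in range(n)]


# 5x5 system with b = 4, off-diagonals 1, built so that x = [1, 2, 3, 4, 5].
print(pcr_solve([0, 1, 1, 1, 1], [4] * 5, [1, 1, 1, 1, 0], [6, 12, 18, 24, 24]))
```

PCR does O(n log n) work in O(log n) steps; a work-efficient method such as cyclic reduction does O(n) work but needs more steps with shrinking parallelism, which is the trade-off a hybrid can exploit.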
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A07
 
Development of Desktop Computing Applications and Engineering Tools on GPUs
Hans Henrik B. Soerensen
- Technical University of Denmark
A GPU competence center and laboratory for research and collaboration between academia and industry partners was established in 2008 at the Section for Scientific Computing, DTU Informatics, Technical University of Denmark. In GPULab, we focus on using GPUs for high-performance computing applications and software tools in science and engineering, inverse problems, visualization, imaging, and dynamic optimization. This poster illustrates the latest and most interesting projects developed at our center.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A08
 
Ballot Counting for Optimal Binary Prefix Sum
David Whittaker
- University of Alabama at Birmingham
This poster describes a new technique for performing binary prefix sums using Fermi's new __ballot() and __popc() functions. These instructions greatly improve intra-warp communication, allowing an 80% speedup over standard GPU methods in applications like radix sort. The poster also points to future research that will enable suffix array construction, the Burrows-Wheeler Transform, and the BZIP algorithm to take advantage of these instructions for efficient GPU compression.
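A common way to combine these two intrinsics, and presumably the idea behind the technique above, is for lane i of a warp to compute `__popc(__ballot(pred) & ((1u << i) - 1))`, which counts the set predicates in lower-numbered lanes. (Later CUDA versions deprecate `__ballot()` in favor of `__ballot_sync()`.) The Python sketch below emulates both intrinsics for a 32-lane warp; the helper names are hypothetical.

```python
WARP_SIZE = 32


def ballot(preds):
    """Emulate __ballot(): bit i of the result is lane i's predicate."""
    mask = 0
    for lane, p in enumerate(preds):
        if p:
            mask |= 1 << lane
    return mask


def popc(x):
    """Emulate __popc(): population count of a 32-bit word."""
    return bin(x & 0xFFFFFFFF).count("1")


def warp_binary_prefix_sum(preds):
    """Exclusive prefix sum of 0/1 predicates across one warp.

    Every lane sees the same ballot word; lane i masks off the bits at
    and above its own position and counts the rest, so no shared memory
    or multi-step scan is needed. Returns (per-lane sums, warp total).
    """
    assert len(preds) == WARP_SIZE
    b = ballot(preds)
    sums = [popc(b & ((1 << lane) - 1)) for lane in range(WARP_SIZE)]
    return sums, popc(b)


sums, total = warp_binary_prefix_sum([lane % 3 == 0 for lane in range(32)])
print(sums[:8], total)  # [0, 1, 1, 1, 2, 2, 2, 3] 11
```

In a radix sort, these per-lane counts are exactly the scatter offsets for keys whose current bit is set.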
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A09
 
Deriving Parallelism and GPU Acceleration of Algorithms with Inter-Dependent Data Fields
James Malcolm
- Accelereyes
This poster presents an approach to deriving parallelism in algorithms that involve building a sparse matrix representing the relationships between inter-dependent data fields, and to enhancing their performance on the GPU. This work compares the algorithm's performance on the GPU with a CPU variant that employs the traditional sparse matrix-vector multiplication (SpMV) approach. We also compared our algorithm's performance with CUSP SpMV on the GPU. The software used in this work is MATLAB and Jacket, a GPU engine for MATLAB.
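For reference, the traditional SpMV baseline mentioned above is often written over the compressed sparse row (CSR) format, one of several formats CUSP supports. A minimal sketch; the one-thread-per-row mapping noted in the docstring is the simplest GPU strategy, not necessarily the one compared in the poster.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    Row r's nonzeros are vals[row_ptr[r]:row_ptr[r+1]], located in
    columns col_idx[row_ptr[r]:row_ptr[r+1]]. Rows are independent, so
    the simplest ("scalar") GPU kernel assigns one thread per row.
    """
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
print(csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2], [4.0, 1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0]))
# [7.0, 4.0, 18.0]
```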
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A10
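
For reference, the baseline mentioned above is a conventional SpMV. The minimal Python sketch below performs SpMV over the compressed sparse row (CSR) format. The CSR layout is an assumption, since the poster does not say which format it uses. The per-row loop mirrors the common GPU mapping of one thread or warp per row.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # on a GPU, each row would map to a thread or a warp
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y

# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
```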
 
Parallelizing the Particle Level Set Method
Wen Zheng
- Stanford University
The particle level set method is widely used as an accurate interface tracking tool in simulation, computer vision and other related fields. However, its high computational cost prevents applying it to real-time and interactive scenarios. This work makes intensive use of parallel design patterns implemented in the Thrust library, such as compaction, reduction and scattering, to parallelize the particle level set method and attain real-time performance.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A11
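
Compaction, one of the Thrust patterns named above, is typically built from an exclusive scan followed by a scatter. The plain-Python sketch below shows that composition for a hypothetical step that discards particles that have drifted too far from the interface. The `(id, phi)` particle tuples and the distance threshold are illustrative assumptions, not details from the poster. Thrust itself offers the whole operation as `thrust::copy_if` or `thrust::remove_if`.

```python
from itertools import accumulate

def compact(items, keep):
    """Stream compaction: exclusive scan of 0/1 flags, then scatter the kept items."""
    flags = [1 if keep(item) else 0 for item in items]
    # Exclusive scan gives each kept item its position in the output array.
    positions = [0] + list(accumulate(flags))[:-1]
    out = [None] * sum(flags)
    for item, flag, pos in zip(items, flags, positions):
        if flag:  # scatter step
            out[pos] = item
    return out

# Hypothetical particles as (id, signed distance phi); keep those near the interface.
particles = [(0, 0.1), (1, 2.5), (2, -0.3), (3, 4.0)]
near_interface = compact(particles, lambda p: abs(p[1]) < 1.0)
```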
 
Accelerating CUDA Graph Algorithms at Maximum Warp
Sungpack Hong
- Stanford University
Graphs are powerful data representations favored in many computational domains. GPUs have shown promising results in this domain, but their performance degrades when the graph is highly irregular. In this study, we propose three general schemes to accelerate graph algorithms on a modern GPU architecture: (i) deferred processing of outliers, (ii) efficient dynamic workload balancing, and (iii) warp-based execution that exploits threads in a SIMD-like manner. Our evaluation shows that these schemes deliver up to a 9x speedup over previous GPU algorithms and 23x over single-CPU execution on irregular graphs. They also yield up to a 30% improvement even for regular graphs.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A12
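
To make scheme (iii) more concrete, the Python sketch below emulates a level-synchronous BFS. Each frontier vertex is assigned to a group of `W` lanes (a "virtual warp") that stride through the vertex's adjacency list together. A high-degree vertex is therefore shared among lanes instead of serializing a single thread. Here the lanes run sequentially. The group size `W`, the adjacency-list layout and the function name are assumptions for illustration.

```python
def warp_bfs(adj, source, W=4):
    """Level-synchronous BFS in which W cooperating lanes expand each frontier vertex."""
    INF = float("inf")
    level = [INF] * len(adj)
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for v in frontier:                   # one virtual warp per frontier vertex
            neighbors = adj[v]
            for lane in range(W):            # lanes would run in lockstep on a GPU
                for u in neighbors[lane::W]: # lane-strided slice of the edge list
                    if level[u] == INF:
                        level[u] = depth + 1
                        next_frontier.append(u)
        frontier = next_frontier
        depth += 1
    return level
```

On a GPU, concurrent lanes may race on `level[u]`. Every writer stores the same value `depth + 1`, so the result is still correct, although a vertex may be appended to the next frontier more than once.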
 
Implementation of Adaptive Cross Approximation on NVIDIA GPUs
Daniel Faircloth
- Georgia Tech Research Institute
The Method of Moments (MoM) is a popular computational method for solving integral equations in electromagnetics. However, it suffers from high computational and memory costs, since it requires the solution of a dense linear system. The Adaptive Cross Approximation (ACA) is an effective technique for compressing the system matrix, thereby reducing both the required storage and the number of operations needed to solve the system. Accelerating the ACA MoM with NVIDIA GPUs can finally enable the solution of "real world" scattering problems on a personal workstation in a practical timeframe.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A13
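
For readers unfamiliar with ACA, below is a minimal pure-Python sketch of ACA with partial pivoting. It builds a low-rank approximation `A ~ sum_k u_k v_k^T` from individual rows and columns, without ever forming the matrix in full. The `entry` callback, the tolerance, and the simplified stopping test are assumptions for illustration. The stopping test ends when `||u_k|| ||v_k||` falls below `tol` times the norm of the first cross. The poster's GPU implementation is not described in the abstract.

```python
import math

def aca(entry, m, n, tol=1e-10, max_rank=None):
    """Partially pivoted ACA. Returns U, V with A[i][j] ~= sum_k U[k][i] * V[k][j].
    entry(i, j) returns one matrix element; the full matrix is never stored."""
    max_rank = max_rank or min(m, n)
    U, V = [], []
    used_rows = set()
    first_norm = None
    i = 0  # first pivot row
    while len(U) < max_rank:
        used_rows.add(i)
        # Residual of row i: original row minus the current low-rank approximation.
        row = [entry(i, j) - sum(u[i] * v[j] for u, v in zip(U, V)) for j in range(n)]
        j = max(range(n), key=lambda c: abs(row[c]))
        if abs(row[j]) > 1e-14:
            v_new = [x / row[j] for x in row]
            # Residual of the pivot column j.
            u_new = [entry(r, j) - sum(u[r] * v[j] for u, v in zip(U, V)) for r in range(m)]
            U.append(u_new)
            V.append(v_new)
            norm = math.sqrt(sum(x * x for x in u_new)) * math.sqrt(sum(x * x for x in v_new))
            if first_norm is None:
                first_norm = norm
            if norm <= tol * first_norm:  # simplified stopping test
                break
        remaining = [r for r in range(m) if r not in used_rows]
        if not remaining:
            break
        # Next pivot row: largest entry of the newest column among unused rows.
        i = max(remaining, key=lambda r: abs(U[-1][r])) if U else remaining[0]
    return U, V
```

Each cross touches one row and one column of the matrix. This is why ACA reduces both storage and the number of kernel evaluations for the compressible blocks of a MoM matrix.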
 
A GPU Accelerated Continuous-based Discrete Element Method for Elastodynamics Analysis
Zhaosong Ma
- Institute of Mechanics, Chinese Academy of Sciences
The Continuum-based Distinct Element Method (CDEM) combines the Finite Element Method (FEM) and the Discrete Element Method (DEM), and is mainly used in general structural analyses, landslide stability evaluations, and coal and gas outburst analyses. Using CUDA and a GTX 285 graphics card, the GPU version achieves speedups of hundreds of times.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A14
 
GPU Algorithms for NURBS Minimum Distance and Clearance Computations
Adarsh Krishnamurthy
- University of California, Berkeley
We present GPU algorithms and strategies for accelerating distance queries and clearance computations on models made of trimmed NURBS surfaces. We provide a generalized framework for using GPUs as co-processors to accelerate CAD operations. The accuracy of our algorithm is based on model-space precision, unlike earlier graphics algorithms that were based only on image-space precision. Our algorithms are at least an order of magnitude faster and about two orders of magnitude more accurate than the commercial solid modeling kernel ACIS.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A15
 
Gate-Level Simulation with GP-GPUs
Debapriya Chatterjee
- University of Michigan
This poster describes my research on leveraging GP-GPU execution parallelism to achieve high performance in the time-consuming problem of gate-level simulation of digital hardware designs.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A16
 
CUDA Implementation of Barrier Option Valuation using Jump-Diffusion Model and Brownian Bridge
Vincent Natoli
- Stone Ridge Technology
Impressive speedups of up to 100x over CPUs are achieved on GPUs by taking advantage of data parallelism, increased bandwidth and the ability to hide latency. We have implemented a Monte Carlo valuation of a barrier option modeled by a standard diffusion process with a jump-diffusion term obeying an underlying Poisson process to account for rare events. In addition, a Brownian bridge is incorporated to account for barrier crossings between the discrete time steps of each diffusion trajectory and to reduce bias. This option is representative of exotic options, which lack a closed-form solution and are amenable to Monte Carlo valuation methods.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A17
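
As an illustrative sketch of the two ingredients named above, and not the authors' CUDA code, the Python below prices a down-and-out call by Monte Carlo under a Merton-style jump diffusion. Between time steps it applies the standard Brownian-bridge correction. Given endpoints `S_a` and `S_b` above a barrier `B`, the diffusion crossed `B` within the step with probability `exp(-2 ln(S_a/B) ln(S_b/B) / (sigma^2 dt))`. Several choices are assumptions for illustration: the down-and-out call payoff, lognormal jump sizes, jumps arriving at the end of each step, and all parameter names.

```python
import math
import random

def bridge_hit_prob(s_a, s_b, barrier, sigma, dt):
    """Probability that a Brownian bridge between s_a and s_b dips below a down barrier."""
    if s_a <= barrier or s_b <= barrier:
        return 1.0
    return math.exp(-2.0 * math.log(s_a / barrier) * math.log(s_b / barrier)
                    / (sigma * sigma * dt))

def poisson(lam, rng):
    """Knuth's method; adequate for the small lam * dt used per step."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1

def down_and_out_call(s0, strike, barrier, r, sigma, T, steps, paths,
                      lam=0.0, jump_mu=0.0, jump_sigma=0.0, seed=1):
    rng = random.Random(seed)
    dt = T / steps
    kappa = math.exp(jump_mu + 0.5 * jump_sigma ** 2) - 1.0  # mean relative jump size
    drift = (r - 0.5 * sigma ** 2 - lam * kappa) * dt         # risk-neutral log drift
    total = 0.0
    for _ in range(paths):
        s, alive = s0, True
        for _ in range(steps):
            # Diffusion part of the step (exact for geometric Brownian motion).
            s_next = s * math.exp(drift + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            # Brownian-bridge test for a barrier crossing inside the step.
            if rng.random() < bridge_hit_prob(s, s_next, barrier, sigma, dt):
                alive = False
                break
            # Jumps, assumed here to arrive at the end of the step.
            for _ in range(poisson(lam * dt, rng)):
                s_next *= math.exp(rng.gauss(jump_mu, jump_sigma))
            if s_next <= barrier:
                alive = False
                break
            s = s_next
        if alive:
            total += max(s - strike, 0.0)
    return math.exp(-r * T) * total / paths
```

Each path is independent, which is what makes this kind of valuation map naturally onto one GPU thread per path.
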
Developer - Programming Languages
GPU-to-CPU Callbacks
Jeff Stuart
- University of California, Davis
Our poster outlines GPU-to-CPU callbacks, a method for the GPU to request work from the CPU. We motivate the approach, present the code architecture, and give samples of CPU and GPU code that show callbacks being executed.
 
Keywords:
Developer - Programming Languages, GTC 2010 - ID P10P01
 
A Speech Recognition Application Framework for Highly Parallel Implementations on the GPU
Jike Chong
- Parasians, LLC
Data layout, data placement, and synchronization are not usually part of a speech application expert's daily concerns. Yet failing to take these concerns carefully into account in a highly parallel implementation on graphics processing units (GPUs) can cost an order of magnitude in application performance. We present an application framework for parallel programming of automatic speech recognition (ASR) applications that allows a speech application expert to implement speech applications effectively on the GPU, and demonstrate how the ASR application framework has enabled a Matlab/Java programmer to achieve a 20x speedup in application performance on a GPU.
 
Keywords:
Developer - Programming Languages, GTC 2010 - ID P10R01
 
Scalable Computer Vision Applications
Rami Mukhtar
- NICTA
We are developing a domain-specific language for computer vision that facilitates rapid implementation of algorithms that are scalable and portable across CPU-GPU architectures. The presented approach significantly lowers the barrier to implementing computer vision algorithms on heterogeneous CPU-GPU architectures, and enables a single implementation to scale automatically to additional hardware as it becomes available.
 
Keywords:
Developer - Programming Languages, GTC 2010 - ID P10R02
 
Language and Compiler Extensions for Heterogeneous Computing
Albert Sidelnik
- University of Illinois at Urbana-Champaign
GPGPU architectures offer large performance gains over their traditional CPU counterparts for many applications. However, current GPU programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and difficult performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to program GPUs easily in the high-productivity programming language Chapel.
 
Keywords:
Developer - Programming Languages, GTC 2010 - ID P10R03
Developer - Tools & Libraries
Mint: An OpenMP to CUDA Translator
Didem Unat
- University of California, San Diego
We aim to facilitate GPU programming for finite difference applications. We have developed Mint, a source-to-source compiler that generates CUDA code from OpenMP code. Mint transforms omp parallel for loops into CUDA kernels and applies domain-specific optimizations such as shared memory, register and kernel fusion optimizations. Because our translator targets structured grid problems, it can optimize the code better than general-purpose compilers. In this poster, we present the translation and optimization steps along with our initial performance results.
 
Keywords:
Developer - Tools & Libraries, GTC 2010 - ID P10U01
 
Real-Time Particle Simulation in the Blender Game Engine with OpenCL
Ian Johnson
- Florida State University
The goal of this project is to produce interactive scientific visualizations that can be used in educational games. We use the computational power of OpenCL to enable features in the Blender Game Engine that would otherwise not be possible in real-time. By adding an interactive particle system to the game engine, we set the stage to demonstrate many interesting scientific phenomena (molecular dynamics, fluid dynamics, statistics) with the added benefit of real-time special effects for games in general.
 
Keywords:
Developer - Tools & Libraries, GTC 2010 - ID P10U02
 
GStream: A General-Purpose Data Streaming Framework on GPU Clusters
Yongpeng Zhang
- North Carolina State University
In this poster, we propose GStream, a general-purpose, scalable data streaming framework on GPUs. The contributions of GStream are as follows: (1) we provide powerful yet concise language abstractions suitable for describing conventional algorithms as streaming problems; (2) we map these abstractions onto GPUs to fully exploit their inherent massive data parallelism; and (3) we demonstrate the viability of streaming on accelerators. Experiments show that the proposed framework provides flexibility, programmability and performance gains for various benchmarks from a variety of domains, including but not limited to data streaming, data-parallel problems, numerical codes and text search.
 
Keywords:
Developer - Tools & Libraries, GTC 2010 - ID P10U03
 
NukadaFFT : An Auto-Tuning FFT Library for CUDA GPUs
Akira Nukada
- Tokyo Institute of Technology
We have released our FFT library for CUDA GPUs. Most of the FFT algorithms and auto-tuning techniques it uses on CUDA have already been published. The library now supports the new Fermi architecture and works with CUDA 3.0 or later.
 
Keywords:
Developer - Tools & Libraries, GTC 2010 - ID P10U04
Embedded & Automotive
Driver Assistance: Speed-Limit Sign Recognition on the GPU
Vladimir Glavtchev
- BMW
We investigate the use of different GPU-based implementations for performing real-time speed-limit sign recognition on a resource-constrained embedded system. The system recognized US and European Union speed limits with over 88% accuracy while running in real time. The system is hardware-accelerated using CUDA and OpenGL. It introduces a novel technique for detecting speed-limit signs that is only possible with the aid of GPU processing.
 
Keywords:
Embedded & Automotive, GTC 2010 - ID P10H01
 
Complex Automotive Applications
Marius Vasiliu
- University of Paris Sud
The NVIDIA GPU architecture is becoming a very interesting hardware target for complex automotive applications. We implemented the same automotive application on several different hardware targets and analyzed the maximum frame rate and the effective CPU load. This paper shows how real-time applications such as pedestrian detection and driving assistance benefit from a massively parallel "central" architecture like GPU/CUDA. Real-time performance and zero-delay transfers can be achieved with a fully asynchronous implementation. The same approach can multiply application performance by the number of GPU devices present in the embedded system, at reasonable power consumption.
 
Keywords:
Embedded & Automotive, GTC 2010 - ID P10H02
High Performance Computing
A GPU-based Architecture for Real-Time Data Assessment at Synchrotron Experiments
Suren Chilingaryan
- Karlsruhe Institute of Technology
Modern X-ray imaging cameras provide millions of pixels and several thousand frames per second. To process this amount of information, we have optimized the reconstruction software employed at the tomography beamlines of the ANKA and ESRF synchrotrons to use the computational power of modern graphics cards. Using GPUs as compute coprocessors, we reduced the reconstruction time by a factor of 30 and can process a typical 20 GB data set in 40 seconds. The time needed for the first evaluation of the reconstructed sample is reduced significantly, and quasi-real-time visualization is now possible.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I01
 
Automatic High-Performance GPU code Generation using CUDA-CHiLL
Malik M Khan
- USC/ UoU
This poster presents a system that automatically generates high-performance GPU code from an input sequential loop nest computation. The compiler analyzes the input computation in C and automatically generates a set of equivalent code variants, each represented by a transformation recipe. These recipes guide the underlying code transformation and generation framework, which applies the transformations and ultimately produces CUDA code. We use the system to generate high-performing CUDA code for four BLAS functions, matrix transpose and convolution stencils. The results mostly outperform CUBLAS 2.2 / CUDA SDK 2.2 and naive GPU kernels, reaching up to 435 GFLOPS (matrix multiply) with an average speedup of up to 1.78x.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I02
 
CSIRO Advances in GPU Computing. What could you do with 256 GPUs?
Luke Domanski
- CSIRO
The Commonwealth Scientific and Industrial Research Organisation (CSIRO) is Australia's national science agency. CSIRO is currently applying GPU computing at scales ranging from single-GPU workstations to its 256-GPU cluster. This poster showcases some of CSIRO's work in GPU-accelerated biological imaging, image deconvolution, synchrotron science and CT reconstruction, and statistical inference in complex environmental models. Speedups of between 8x and 230x have been seen across these application areas using a broad range of GPU computing platforms.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I03
 
High Performance Agent-Based Simulation with FLAME for the GPU
Paul Richmond
- University of Sheffield
The Flexible Large-scale Agent Modelling Environment for the GPU (FLAME GPU) addresses the performance and architecture limitations of previous work by presenting a flexible framework approach to agent-based modelling (ABM) on the GPU. Most importantly, it addresses the issue of agent heterogeneity through a state-machine-based agent representation. This representation allows agents to be separated into associated state lists, which are processed in batches to allow very diverse populations of agents whilst avoiding large divergence in parallel code kernels. The use of the GPU allows agent-based models to be visualised in real time, which further widens the application of ABM to real-time simulations.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I04
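
The state-list idea can be sketched in a few lines of Python. Agents are bucketed by their current state, and each bucket is then processed by a single state-specific function. On a GPU this corresponds to one kernel launch over a homogeneous list, rather than one divergent kernel over all agents. The dictionary-based agent representation, the state names and the update functions are hypothetical examples, not FLAME GPU's actual API.

```python
from collections import defaultdict

def step(agents, state_functions):
    """One simulation step: bucket agents by state, then apply one function per bucket."""
    state_lists = defaultdict(list)
    for agent in agents:
        state_lists[agent["state"]].append(agent)
    updated = []
    for state, batch in state_lists.items():
        fn = state_functions[state]
        # On a GPU, each batch would be one kernel launch in which every thread runs
        # the same agent function, so threads in a warp do not diverge on agent state.
        updated.extend(fn(agent) for agent in batch)
    return updated

# Hypothetical agent functions for a toy infection model.
def healthy(agent):
    return {**agent, "age": agent["age"] + 1}

def infected(agent):
    return {**agent, "state": "recovered"}

def recovered(agent):
    return agent

FUNCTIONS = {"healthy": healthy, "infected": infected, "recovered": recovered}
```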
 
The Scalable HeterOgeneous Computing (SHOC) Benchmark Suite
Kyle Spafford
- Oak Ridge National Lab
SHOC is a benchmark suite for heterogeneous systems. This poster describes the suite and presents recent performance measurements.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I05
 
HyperFlow: An Efficient Dataflow Architecture for Multi CPU-GPU Systems
Huy Vo
- University of Utah
We propose a new pipeline architecture that can take advantage of the many processing elements available in modern CPU-GPU systems to maximize performance in visualization and computational tasks. Our architecture is very flexible and allows the construction of classical parallel algorithms such as data streamers and map/reduce templates. We also discuss examples and performance benchmarks that demonstrate the potential of our system.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I06
 
MPI-CUDA Applications Checkpointing
Nguyen Toan
- Tokyo Institute of Technology
We propose a checkpoint/restart tool for multi-GPU applications such as MPI-CUDA applications.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I07
 
Particle Simulations using DEM on GPUs
Charles Radeke
- University Graz
Particle-based numerical methods have become an emerging field since the GPU/CUDA technique became widely accepted in recent years. 80% of all materials used in pharmaceutical technology are powders. Such materials can be simulated numerically using the Discrete Element Method (DEM). The main restrictions here are compute power and problem size: even a few tens of thousands of particles can require weeks to months of compute time to reproduce processes lasting a few minutes in real time. DEM scales excellently in the massively parallel CUDA environment, enabling us to reach the million-particle range with acceptable job runtimes.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I08
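
For context, the core of many DEM codes is a pairwise contact law evaluated independently for each particle pair, which is one reason the method maps well to CUDA. A common choice is the linear spring-dashpot normal force, and the Python sketch below computes it for two spheres in 3D. The stiffness `k`, the damping `c` and the zero clamp are assumptions. The poster does not state which contact model it uses, and tangential friction forces are omitted.

```python
import math

def normal_contact_force(x1, x2, v1, v2, r1, r2, k, c):
    """Normal force on sphere 1 from sphere 2 under a linear spring-dashpot model (3D)."""
    d = [b - a for a, b in zip(x1, x2)]           # vector from centre 1 to centre 2
    dist = math.sqrt(sum(di * di for di in d))
    overlap = r1 + r2 - dist
    if overlap <= 0.0 or dist == 0.0:
        return [0.0, 0.0, 0.0]                    # not in contact
    n = [di / dist for di in d]                   # unit normal pointing from 1 to 2
    # Relative normal velocity; negative while the spheres approach each other.
    v_n = sum((vb - va) * ni for va, vb, ni in zip(v1, v2, n))
    # Spring term resists overlap and dashpot term resists approach; clamped at zero so
    # the contact never pulls the spheres together (a common, optional choice).
    magnitude = max(k * overlap - c * v_n, 0.0)
    return [-magnitude * ni for ni in n]          # pushes sphere 1 away from sphere 2
```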
 
Mastering Multi-GPU Computing on a Torus Network
Davide Rossetti
- National Institute of Nuclear Physics
We describe APEnet+, the new generation of our 3D torus network, which scales up to tens of thousands of cluster nodes with linear cost. The basic component is a custom PCIe adapter with six high-speed links, designed around a programmable hardware component (FPGA), which provides a convenient environment for studying integration techniques between GPUs and network interfaces. The high-level programming model is MPI, while a low-level RDMA API is also available.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I09
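
In a 3D torus, each node connects to six neighbors, one in each direction along x, y and z, with wraparound at the edges. This matches the six links on the adapter described above. Because every node adds a fixed number of links, the link count grows linearly with the number of nodes. The Python sketch below computes a node's neighbors from its coordinates. The coordinate convention and the dimensions are illustrative assumptions rather than details of APEnet+ itself.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors (+x, -x, +y, -y, +z, -z) of a node in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (+1, -1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # wrap around at the torus edges
            neighbors.append(tuple(n))
    return neighbors
```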
 
Poster: Atmospheric Modelling, Simulation and Visualization using CUDA
Priyanka Sah
- Indian Institute of Technology, Delhi
The weather model of the Laboratoire de Météorologie Dynamique (LMD, CNRS) is used extensively for research and weather forecasting. Simulation of the atmospheric climate is one of the most challenging computational tasks because of its numerical complexity and simulation time. The numerical simulations obviously must run faster than real time to be useful for decision support.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I10
 
Automatic Program Generation for the Fermi - DFT Transform
Christos Angelopoulos
- Carnegie Mellon University
The goal of SPIRAL is to push the limits of automation in the software and hardware development and optimization of numerical kernels beyond what is possible with current tools. In this research, we address the problem of automatically generating efficient, high-performance libraries for NVIDIA GPU architectures. Spiral generates code that automatically works around GPU architectural restrictions such as shared memory bank conflicts and global memory coalescing, and pushes code to the limits (maximum number of threads, register pressure, etc.). The code generation procedure is fast, platform-dependent, easy to rewrite and adaptable to the problem.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I11
 
Fast N-body Algorithms for Dynamic Problems on the GPU
Qi Hu
- University of Maryland
We present an extension of the earlier algorithm by Gumerov & Duraiswami (J. Comput. Phys., 2008), which adapts the fast multipole method (FMM) to the GPU; in our extension, the data structures are also generated efficiently on the GPU. Details and performance on current architectures will be presented.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I12
 
GPU Acceleration of Cube Calculus Operations
Vamsi Parasa
- Portland State University
In our current work, we present the first massively parallel, GPU-accelerated implementation of the Cube Calculus operations for multivalued and binary logic, also called the Cube Calculus Machine (CCM). Substantial speedups of up to 85x are achieved on a CUDA-enabled NVIDIA Tesla GPU compared to the CPU implementation on a sequential processor. Cube Calculus (CC) is a very efficient and convenient mathematical formalism for the representation, processing and synthesis of binary and multivalued logic, with significant applications in logic synthesis, image processing and machine learning. The large speedups achieved with GPUs are therefore very encouraging for building future parallel VLSI EDA systems.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I13
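
As background, Cube Calculus is commonly implemented with a positional notation in which each binary variable occupies two bits, one bit per possible value. The intersection of two cubes is then a bitwise AND, and the result is empty if any variable field becomes `00`. Because these are plain bit operations over many independent cubes, they suit GPU parallelism. The specific encoding below is an assumption for illustration: `01` for the literal 0, `10` for 1 and `11` for don't-care. It is restricted to binary variables, and bit assignments vary between authors.

```python
# Hypothetical 2-bit positional encoding for one binary variable.
ZERO, ONE, DONT_CARE = 0b01, 0b10, 0b11

def encode(cube_str):
    """Pack a cube such as '1-0' (variables x0, x1, x2) into an int, 2 bits per variable."""
    code = {"0": ZERO, "1": ONE, "-": DONT_CARE}
    word = 0
    for i, ch in enumerate(cube_str):
        word |= code[ch] << (2 * i)
    return word

def intersect(a, b, n_vars):
    """Cube intersection: bitwise AND; None if some variable field becomes 00 (empty cube)."""
    c = a & b
    for i in range(n_vars):
        if ((c >> (2 * i)) & 0b11) == 0:
            return None
    return c

def decode(word, n_vars):
    symbol = {ZERO: "0", ONE: "1", DONT_CARE: "-"}
    return "".join(symbol[(word >> (2 * i)) & 0b11] for i in range(n_vars))
```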
 
An Atomic Tesla
Richard Edgar
- Massachusetts General Hospital
We examined the possibility of using an Atom-based host system to control a Tesla S1070. Our simple benchmarks found that Atom-based systems should be viable for codes whose serial portions are small enough to make Amdahl's Law irrelevant. Such systems would have a much lower power draw than 'traditional' GPU clusters.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I14
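
The Amdahl's Law argument can be made concrete. Suppose a fraction `f` of the runtime on a conventional host is serial host work, and the rest is offloaded to the GPU with speedup `s`. If a slower host stretches the serial part by a factor `h`, the overall speedup relative to the original CPU-only run becomes `1 / (h*f + (1 - f)/s)`. The small Python helper below evaluates this. The slowdown factor `h` and the example numbers are illustrative assumptions, not measurements from the poster.

```python
def amdahl_speedup(serial_fraction, gpu_speedup, host_slowdown=1.0):
    """Overall speedup vs. the original CPU run when the parallel part runs gpu_speedup
    times faster and the serial part runs host_slowdown times slower."""
    f = serial_fraction
    return 1.0 / (host_slowdown * f + (1.0 - f) / gpu_speedup)
```

For example, with 0.1% serial work and a 50x GPU speedup, the overall speedup is about 47.7x with a fast host. A host that is 4x slower on the serial part still reaches about 41.7x.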
 
ICHEC's GPU Research: Porting of Scientific Application on NVIDIA GPU
Ivan Girotto
- Irish Centre for High-End Computing
ICHEC is the Irish National HPC centre, with a mission to provide both high-performance computing resources and expertise to the Irish research community. In addition to its core mission of research enablement, ICHEC started an exploratory activity in GPGPU and CUDA programming in May 2009. Quantum Espresso is an increasingly popular molecular dynamics package, mainly developed by the DEMOCRITOS group in Trieste (IT). PWscf is the part of the Quantum Espresso suite that performs electronic and ionic structure calculations. A notable part of the PWscf port is a high-performance [ZD]gemm that executes in parallel across the CPU and GPU.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I15
 
Implementation of Smith-Waterman algorithm in OpenCL for GPUs
Dzmitry Razmyslovich
- Institute of Computer Engineering, University of Heidelberg
This poster presents an implementation of the Smith-Waterman algorithm in OpenCL. The implementation computes similarity indexes between query sequences and a reference sequence, with or without sequence alignment paths. In accordance with the requirements of the target application in cancer research, the implementation can process very long reference sequences (on the order of millions of nucleotides). Performance compares favorably against the CPU, being on the order of 14 to 610 times faster, and is 4.5 times faster than Farrar's implementation. It is also on par with CUDASW++ v2.0.1, but with fewer constraints on sequence length.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I16
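
GPU implementations of Smith-Waterman commonly exploit the fact that all cells on one anti-diagonal of the scoring matrix are independent. The Python sketch below computes the local-alignment score with a linear gap penalty, filling the matrix in anti-diagonal (wavefront) order. The scoring parameters are assumptions. The poster's OpenCL kernel, which also recovers alignment paths, may be organized differently, for example with affine gap penalties.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, filling the DP matrix one anti-diagonal at a time."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):  # anti-diagonal index d = i + j
        # Every cell on this diagonal depends only on the two previous diagonals,
        # so on a GPU the iterations of this inner loop could run in parallel.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```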
 
Computing Strongly Connected Components in Parallel on CUDA
Milan Ceska
- Masaryk University
The problem of decomposing a directed graph into its strongly connected components is a fundamental graph problem inherently present in many scientific and commercial applications. We show how existing parallel algorithms can be reformulated to be accelerated by NVIDIA CUDA technology. We design a new CUDA-aware procedure for pivot selection and redesign the parallel algorithms to allow for CUDA-accelerated computation. We experimentally demonstrate that with a single GTX 280 GPU card we can easily outperform the optimal serial CPU algorithm.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I17
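
The abstract does not name the algorithms it reformulates. The Forward-Backward (FB) algorithm is a standard parallel SCC method and serves here as an assumed example. FB picks a pivot and computes the vertices reachable forward and backward from it. Their intersection is one SCC, and the three remaining partitions can then be processed independently. The sequential Python sketch below illustrates FB. The pivot rule (an arbitrary vertex) is a placeholder, since the poster's CUDA-aware pivot selection is not described here.

```python
def reach(start, vertices, edges):
    """Vertices in `vertices` reachable from start via `edges` (a dict of adjacency lists)."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in edges.get(v, ()):
            if u in vertices and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def fb_scc(vertices, succ, pred):
    """Forward-Backward SCC decomposition; returns a list of vertex sets."""
    result, work = [], [set(vertices)]
    while work:                      # independent subproblems could run in parallel
        v_set = work.pop()
        if not v_set:
            continue
        pivot = next(iter(v_set))    # placeholder pivot rule
        fwd = reach(pivot, v_set, succ)
        bwd = reach(pivot, v_set, pred)
        scc = fwd & bwd
        result.append(scc)
        work.extend([fwd - scc, bwd - scc, v_set - fwd - bwd])
    return result
```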
 
A CUDA Runtime Target for the Sequoia Compiler
Michael Bauer
- Stanford University
We describe an implementation of the Sequoia Runtime interface in CUDA that enables the Sequoia compiler to target programs written in Sequoia for single and multiple GPU systems.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I18
 
GPU Computing for Real-Time Optical Measurement Techniques
Suren Chilingaryan
- Karlsruhe Institute of Technology
Measuring displacement and strain during the deformation of advanced materials that are too small, too big, compliant, soft or hot is a typical scenario where non-contact techniques are needed. Using Digital Image Correlation and Tracking, strain can be calculated from a series of consecutive images with sub-pixel resolution. However, the image processing is computationally intensive and cannot be performed in real time on general-purpose processors. We implemented a three-stage pipelined architecture: images are loaded, preprocessed on the CPU, and correlated on the GPUs. Using two GTX 295 cards, we achieved a 35x speedup compared to the fastest Core i7 processor.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I19
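
Digital Image Correlation typically locates each image subset in the deformed image by maximizing a correlation criterion. One common criterion is zero-normalized cross-correlation (ZNCC). Each subset can be searched independently, which makes the correlation stage a good fit for GPUs. The Python sketch below performs an integer-pixel ZNCC search for one subset. The subset size, the search radius and the function names are assumptions. The sub-pixel refinement that the poster relies on is omitted, and the caller must keep all accessed pixels in bounds.

```python
import math

def zncc(ref, cur, x0, y0, dx, dy, half):
    """ZNCC between the (2*half+1)^2 subset of ref at (x0, y0) and cur shifted by (dx, dy)."""
    f, g = [], []
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            f.append(ref[y][x])
            g.append(cur[y + dy][x + dx])
    mf, mg = sum(f) / len(f), sum(g) / len(g)
    num = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    den = math.sqrt(sum((a - mf) ** 2 for a in f) * sum((b - mg) ** 2 for b in g))
    return num / den if den else 0.0

def match_subset(ref, cur, x0, y0, half=2, radius=3):
    """Integer displacement (dx, dy) within +/- radius that maximizes ZNCC."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            score = zncc(ref, cur, x0, y0, dx, dy, half)
            if best is None or score > best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```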
 
An MPI/CUDA Implementation of Discontinuous Galerkin Time Domain Method for Maxwell's Equations
Stylianos Dosopoulos
- Ohio State University
We describe an MPI/CUDA approach to solving Maxwell's equations in the time domain by means of an Interior Penalty Discontinuous Galerkin Time Domain method and a local time-stepping algorithm. We show that MPI/CUDA provides a 10x speedup over MPI/CPU in double precision. Moreover, we present scalability results, with 85% parallel efficiency up to 40 GPUs on the Glenn cluster of the Ohio Supercomputer Center. Finally, we study an electromagnetic cloaking example for a broadband signal (8-11 GHz) to show the potential of our approach to solve real-life problems in short simulation times.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I20
 
Xiaowei Wang
- Institute of Process Engineering, Chinese Academy of Sciences
Mole-8.5 is the world's first petascale GPGPU supercomputer built with Tesla C2050 GPUs, designed and established in April 2010 by the Institute of Process Engineering (IPE), Chinese Academy of Sciences. Its design philosophy exploits the similarity between the hardware, the software, and the problems to be solved, based on the multi-scale method and discrete simulation approaches developed at IPE. With the multi-scale discrete software developed by IPE, Mole-8.5 has already carried out large-scale simulations of high scientific significance in areas such as chemical engineering, oil exploitation, and metallurgy, demonstrating the supercomputer as a paradigm of green computing in an innovative architecture.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I22
 
Early Linpack Performance Benchmarking on IPE Mole-8.5 Fermi GPU Cluster
Xianyi Zhang
- Institute of Software, Chinese Academy of Sciences
Linpack is the de facto standard benchmark for supercomputers. We introduce the implementation and tuning techniques of the Linpack benchmark on the IPE Mole-8.5 cluster equipped with NVIDIA Tesla C2050 (Fermi) GPUs, including CPU/GPU overlap, streaming (pipeline) technology, and CPU/GPU affinity. As a result, we achieved 207.3 TFLOPS, and the IPE Mole-8.5 cluster ranked No. 19 on the June 2010 TOP500 list. In addition, we analyze the bottleneck of the Linpack benchmark on this system.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I23
Life & Material Science
Generalized Linear Model (GLM) Based Quantitative Trait Locus (QTL) Analysis
Ali Akoglu
- University of Arizona
Relating Genotype to Phenotype in Complex Environments has been identified as one of the grand challenges of plant sciences. Under the umbrella of the iPlant Collaborative, funded by the Plant Science Cyberinfrastructure Collaborative program of the NSF, our goal is to develop a GPU implementation of the General Linear Model (GLM) to statistically link genotype to phenotype and dramatically decrease the execution time of GLM analyses. The highly parallel GPU-based Forward Regression stage of the GLM achieved a 177x speedup over the serial MATLAB-based version. The results of this study will enable larger, more intensive genetic mapping analyses to be conducted.
 
Keywords:
Life & Material Science, GTC 2010 - ID P10K01
 
GPU-REMuSiC: The Implementation of Constrained Multiple Sequence Alignment on Graphics Processing Units
Chun-Yuan Lin
- Chang Gung University
We implement the RE-MuSiC tool on multiple GPUs (GPU-REMuSiC) with NVIDIA CUDA. Using a special model implementation, the dynamic programming (DP) computation in GPU-REMuSiC running on one and two GeForce GTX 260 cards achieves speedups of more than 75x and 130x, respectively, over sequential RE-MuSiC running on an Intel Core i7 920 CPU.
 
Keywords:
Life & Material Science, GTC 2010 - ID P10K02
 
The Virtual Heart: Working Towards Interactive CUDA Based Simulations of Cardiac Function
Stefano Charissis
- Victor Chang Cardiac Research Institute
Heart disease is the leading cause of death in the developed world. Despite this, our understanding of cardiac dysfunction is limited. Our goal is to create a realistic virtual model of the heart to gain insight into this clinically important problem. The computational complexity of the 'virtual heart' was prohibitive until very recently. However, the continued development of massive parallelization using CUDA and GPU technology has now made this a realistic and achievable goal.
 
Keywords:
Life & Material Science, GTC 2010 - ID P10K03
Machine Learning & Deep Learning
CUDA Creatures
Andrew Hershberger
- Stanford University
CUDA Creatures applies parallel algorithms to the iterated Prisoner's Dilemma, a classic study of the evolution of cooperation. We bring interactivity to parameter space exploration by achieving 600x to 800x speedups on a GeForce GTX 260.
 
Keywords:
Machine Learning & Deep Learning, GTC 2010 - ID P10L01
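A minimal serial sketch of the underlying game, in Python, under assumed choices: the textbook payoffs T=5, R=3, P=1, S=0 and three simple strategies (the names are hypothetical, not taken from CUDA Creatures). Each match in the round robin depends only on its two strategies, so the matches are independent of one another and can in principle be evaluated in parallel.

```python
# Row player's payoff for (my_move, their_move): T=5, R=3, P=1, S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(my_hist, opp_hist):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_hist[-1] if opp_hist else "C"


def always_defect(my_hist, opp_hist):
    return "D"


def always_cooperate(my_hist, opp_hist):
    return "C"


def play(strat_a, strat_b, rounds):
    """Play one iterated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


def round_robin(strategies, rounds):
    """Total score per strategy over all pairings; self-play is counted once."""
    totals = {name: 0 for name in strategies}
    names = list(strategies)
    for i, x in enumerate(names):
        for y in names[i:]:  # every match is independent of the others
            sx, sy = play(strategies[x], strategies[y], rounds)
            totals[x] += sx
            if y != x:
                totals[y] += sy
    return totals


strategies = {
    "tit_for_tat": tit_for_tat,
    "always_defect": always_defect,
    "always_cooperate": always_cooperate,
}
print(round_robin(strategies, rounds=10))
# {'tit_for_tat': 69, 'always_defect': 74, 'always_cooperate': 60}
```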
Medical Imaging
Real-time Ultrasound Data Processing for Regional Anesthesia Guidance
Stephen Rosenzweig
- Duke University
Ultrasound imaging techniques such as Doppler flow imaging and acoustic radiation force impulse (ARFI) imaging require estimation of velocity or displacement from the received echoes. Real-time processing and display of images allow real-time guidance of procedures, improving patient safety and efficacy. Using CUDA, the processing code has been used in pre-clinical regional anesthesia studies investigating new methods for localizing where fluid is being injected. The computation time has been reduced from 20 minutes to 18 seconds, enabling rapid display of dynamic images of the fluid being injected.
 
Keywords:
Medical Imaging, GTC 2010 - ID P10M01
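For Doppler flow imaging, a widely used velocity estimator is the lag-one autocorrelation (Kasai) method: the phase of the autocorrelation of the slow-time IQ samples at one depth is proportional to the mean axial velocity. The abstract does not say which estimator the authors used, so the Python sketch below is only an illustration; the speed of sound of 1540 m/s and the function name are assumptions, and the sign convention depends on how the echoes were demodulated. ARFI displacement estimation can use the same phase-shift idea, with displacement equal to c/(4π f0) times the measured phase.

```python
import cmath
import math


def kasai_velocity(iq, prf_hz, f0_hz, c_m_s=1540.0):
    """Axial velocity (m/s) from slow-time IQ samples at a single depth.

    Uses the lag-one autocorrelation R(1) = sum_n x[n+1] * conj(x[n]);
    its phase is the mean pulse-to-pulse phase shift, and
    v = c * PRF * arg(R(1)) / (4 * pi * f0). The estimate is unambiguous
    only while |arg(R(1))| < pi, i.e. |v| < c * PRF / (4 * f0).
    """
    r1 = sum(iq[n + 1] * iq[n].conjugate() for n in range(len(iq) - 1))
    return c_m_s * prf_hz * cmath.phase(r1) / (4.0 * math.pi * f0_hz)


# Usage: synthetic ensemble of 16 pulses from a scatterer moving at 0.1 m/s.
f0, prf, c, v_true = 5e6, 5e3, 1540.0, 0.1
dphi = 4.0 * math.pi * f0 * v_true / (c * prf)  # phase step per pulse
ensemble = [cmath.exp(1j * n * dphi) for n in range(16)]
print(kasai_velocity(ensemble, prf, f0))  # approximately 0.1
```

Each depth (and each image line) is estimated independently, which is why this kind of processing parallelizes naturally across GPU threads.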
 
GPU-Accelerated Texture Decompression of Biomedical Image Stacks
Chirantan Ekbote
- Harvard University
Histopathology is the microscopic examination of tissue in order to study the manifestations of disease. High-resolution images are vital for accurate diagnoses, and a major obstacle to the use of digital imaging in histopathology has been the inability to display these large images at interactive rates. We have created a tool for interactive visualization of biomedical image stacks using GPU-accelerated on-the-fly texture decompression. The image stacks are compressed using a novel approach custom-tailored to our data, i.e., data exhibiting exceptionally high coherence between the slices of each image stack.
 
Keywords:
Medical Imaging, GTC 2010 - ID P10M02
 
Accelerated Large Scale Spherical Model Forward Solutions for the EEG/MEG using CUDA
Nitin Bangera
- MIND Research Network
The study presented in the poster examines the utility of a CUDA-based approach for improving the computational speed of the spherical-model EEG and MEG forward solution for a large-scale 3-D dipole grid (on the order of 1,000 dipoles and up) and many sensor locations (on the order of 100 and up). Fast computation of the forward solution is critical for improving the speed of the inverse solution in biosource imaging. The inverse solution gives the location of the epileptogenic foci from the EEG and MEG measurements.
 
Keywords:
Medical Imaging, GTC 2010 - ID P10M03
 
CUDA Accelerated Real Time Volumetric Cardiac Image Enhancement
Ismayil Guracar
- Siemens Medical Solutions
CUDA enables high-data-rate, real-time volumetric cardiac ultrasound image enhancement. Compared with a CPU-based approach, the CUDA implementation showed substantial improvements in processing data rate and in memory bandwidth demand.
 
Keywords:
Medical Imaging, GTC 2010 - ID P10M04
 
Efficient Visualization of Salient Manifolds in Scalar, Vector, and Tensor Fields
Samer Barakat
- Purdue University
Our research focuses on harnessing the massively parallel compute power of the GPU to visually explore complex datasets. We propose adaptive GPU-based approaches that intertwine computation and rendering. Alongside these, we present novel dynamic data structures for the GPU. Our research includes the visualization of salient structures in vector fields using Lagrangian coherent structures (LCS), the extraction of ridge and valley surfaces from volumetric scalar fields with scale analysis, and efficient volume and surface rendering.
 
Keywords:
Medical Imaging, GTC 2010 - ID P10M05
 
Highly Parallel Image Reconstruction for Positron Emission Tomography (PET)
Jingyu Cui
- Stanford University
We present a novel method of computing the line projection operations required for list-mode ordered-subsets expectation-maximization (OSEM) in fully 3-D PET image reconstruction on a GPU using the CUDA framework. Our method overcomes challenges such as compute thread divergence and exploits GPU capabilities such as shared memory and atomic operations. The new GPU-CUDA implementation is 120x faster than a reference CPU implementation. Image quality is preserved: the root-mean-square (RMS) deviation between the images generated on the CPU and on the GPU is 0.08%, which has a negligible effect in typical clinical applications.
 
Keywords:
Medical Imaging, GTC 2010 - ID P10M06
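For reference, the textbook list-mode OSEM update is shown below, with attenuation, normalization, randoms, and scatter corrections omitted; it is not necessarily the exact variant used in the poster. Here λ_j is the estimated activity in voxel j, S_m is the m-th subset of detected events, a_ij is the contribution of voxel j to the line of response of event i, and s_j is the voxel sensitivity (commonly the sum of a_ij over all detectable lines of response, divided by the number of subsets):

$$
\lambda_j^{(m+1)} = \frac{\lambda_j^{(m)}}{s_j} \sum_{i \in S_m} \frac{a_{ij}}{\sum_{k} a_{ik}\, \lambda_k^{(m)}}
$$

The inner sum over k is a forward projection along the line of event i, and the outer sum over i is a backprojection; these are the line projection operations referred to in the abstract. In a GPU backprojection many lines update the same voxels, which is presumably where atomic operations come in.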
Molecular Dynamics
Energy Evaluation of Rosetta Proteins Using CUDA
Will Kohut
- University of California, Davis
In this poster, we describe preliminary results using CUDA to accelerate the energy evaluation of proteins folded by the Rosetta software suite.
 
Keywords:
Molecular Dynamics, GTC 2010 - ID P10N01
 
GPU Accelerated Molecular Dynamics Algorithms for Soft Matter Systems using HOOMD-Blue
Carolyn Phillips
- University of Michigan
The rheological, thermodynamic, and self-assembly behavior of liquids, colloids, polymers, foams, gels, granular materials, and biological systems is often studied in simulation using coarse-grained models based on molecular dynamics algorithms. The open-source, general-purpose particle dynamics code HOOMD-Blue has been expanded to include the simulation techniques and pair potentials used to study this class of problems.
 
Keywords:
Molecular Dynamics, GTC 2010 - ID P10N02
 
Accelerating Molecular Modeling using GPUs
Wuchun Feng
- Virginia Tech
Computing electrostatic interactions in a biomolecule contributes to understanding its structure and function, e.g., ligand binding, complex formation, and proton transport. However, such calculations on a desktop computer can take on the order of days, or even weeks, to run. Consequently, scientists seek to reduce the algorithmic complexity, massively accelerate the computation with a GPU, or both. Our approach, based on an analytical linearized Poisson-Boltzmann algorithm, delivers a 120-fold speedup on a GPU (vs. a CPU version optimized with -O3 and hand-tuned SSE). When combined with our hierarchical charge partitioning (HCP) multiscale method, the delivered speedup approaches 20,000-fold.
 
Keywords:
Molecular Dynamics, GTC 2010 - ID P10N03
Neuroscience
Distributed Multi-Level Out-of-Core Volume Rendering
Markus Hadwinger
- King Abdullah University of Science and Technology
In neuroscience, scans of brain tissue are acquired using electron microscopy, resulting in extremely high-resolution volume data with sizes of many terabytes. To support the work of neurobiologists, interactive exploration of such volumes requires new approaches to distributed out-of-core volume rendering. A major goal of our distributed GPU volume rendering system is to sustain a pixel-to-voxel ratio of about 1:1. This display-aware approach effectively bounds the working set size required for ray casting, which makes it largely independent of the volume resolution. Currently, our system achieves interactive volume rendering of 43 GB and 92 GB volumes on 1 to 8 Tesla nodes.
 
Keywords:
Neuroscience, GTC 2010 - ID P10O01
Physics Simulation
Acceleration of Computational Electromagnetics Physical Optics - Shooting and Bouncing Ray Method
Huan-Ting Meng
- University of Illinois at Urbana-Champaign
Electromagnetic fields radiated by a 1964 Ford Thunderbird are calculated more than 50 times faster on a Quadro FX 5800 GPU than on a standard CPU.
 
Keywords:
Physics Simulation, GTC 2010 - ID P10Q01
 
Massively Parallel Micromagnetic FEM Calculations with Graphical Processing Units
Elmar Westphal
- Forschungszentrum Juelich
We adapted our micromagnetic simulator "TetraMag" to NVIDIA's CUDA architecture, resulting in a significant increase in calculation speed and cost efficiency over the most recent PC-based machines. The poster outlines the general challenges and the methods used to adapt the solutions to GPUs, as well as benchmark results obtained using standard micromagnetic problems.
 
Keywords:
Physics Simulation, GTC 2010 - ID P10Q02
 
Multiplying Speedups: GPU-Accelerated Fast Multipole BEM, for Applications in Protein Electrostatics
Lorena Barba
- Boston University
We have developed a fast multipole boundary element method (BEM) for biomolecular electrostatics. With GPU acceleration of the fast multipole method (FMM), the speedups multiply: one factor comes from the fast O(N) algorithm and another from the GPU hardware. With this method, we can obtain converged results for multi-million-atom systems in less than an hour using multi-GPU clusters.
 
Keywords:
Physics Simulation, GTC 2010 - ID P10Q03
 
GPU-Powered Control of a Compliant Humanoid Robot
Alan Diamond
- University of Sussex, UK
The ECCEROBOT project deals with the construction and control of a robot with a humanoid skeleton and muscle-like compliant, elastic actuators. The nonlinear passive and active coupling between the skeletal elements, combined with the effect of environmental interaction, presents an extremely complex control problem. In our solution, motor programs are found by using physics-based simulation of both the robot and its environment to locate candidate movements. For real-time control, multiple copies of the simulation must run faster than real time, which requires GPU acceleration. Further, to capture the environment, we use GPU-accelerated dense reconstruction vision.
 
Keywords:
Physics Simulation, GTC 2010 - ID P10Q04
Signal & Audio Processing
Achieving 1 TFLOP for the Radio Astronomy Correlator
Michael Clark
- Harvard University
In this work we apply CUDA, using the Fermi architecture, to the problem of cross-correlation arising in radio astronomy. Cross-correlation accounts for the bulk of the computation in radio astronomy and is essentially described by vector outer products. Traditionally this task is performed using FPGAs, and the goal of this work was to see how efficiently GPUs could be used for it. We describe the tiling strategies and optimization techniques employed to maximize performance. We achieve in excess of 1 TFLOPS using a single GeForce GTX 480, which corresponds to 78% of peak performance.
 
Keywords:
Signal & Audio Processing, GTC 2010 - ID P10S01
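The outer-product structure can be made concrete with a small Python reference correlator. It assumes the common convention V_ij = sum over t of x_i(t) * conj(x_j(t)) for complex antenna voltages, and the function name is hypothetical. Each time sample of n inputs contributes an n-by-n outer product, so every loaded sample is reused about n times; that reuse is what makes a compute-bound, tiled GPU kernel possible, although the poster's specific tiling strategies are not reproduced here.

```python
def correlate(samples):
    """Accumulate visibilities V[i][j] = sum_t x_i(t) * conj(x_j(t)).

    samples[t][i] is the complex voltage of antenna (input) i at time t.
    Each time step adds one outer product x(t) x(t)^H to V.
    """
    n = len(samples[0])
    vis = [[0j] * n for _ in range(n)]
    for x in samples:
        for i in range(n):
            xi = x[i]
            for j in range(i + 1):  # V is Hermitian: compute the lower triangle only
                vis[i][j] += xi * x[j].conjugate()
    for i in range(n):  # fill the upper triangle by Hermitian symmetry
        for j in range(i + 1, n):
            vis[i][j] = vis[j][i].conjugate()
    return vis


# Usage: two inputs, two time samples.
print(correlate([[1 + 0j, 1j], [2 + 0j, -1j]]))
```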
 
CUDA Implementation of Software for Identifying Post-Translational Modifications
Long Wang
- Supercomputing Center, Chinese Academy of Sciences
InsPecT is software for identifying post-translational modifications (PTMs) of proteins. With the help of the MS-Alignment algorithm, InsPecT can search for PTMs in unrestrictive mode and even reveal unknown types of modifications. However, MS-Alignment has tremendous time complexity and takes more than 99% of InsPecT's computing time. We accelerated MS-Alignment on GPUs. After optimization and parallelization with MPI, the result is cuda-InsPecT, a new, highly efficient open-source package based on MPI+CUDA.
 
Keywords:
Signal & Audio Processing, GTC 2010 - ID P10S02
Video & Image Processing
Neurite Detection using CUDA, GPU Accelerated Biological Imaging for High-Content Analysis
Luke Domanski
- CSIRO
The analysis of microscopic neurite structures in images is important for studying the effects of lead compounds on brain diseases or the regeneration of brain cells after trauma. In High-Content Analysis (HCA), hundreds to thousands of microscopy images are processed during automated experiments, so the speed of the image processing greatly affects workflow throughput. We report some early results on GPU acceleration of the Neurite Detection module in our group's HCA-Vision. The most time-consuming algorithm steps are accelerated by up to 13.6x, resulting in a 3.3x speedup for the entire algorithm (70% of the theoretical maximum).
 
Keywords:
Video & Image Processing, GTC 2010 - ID P10J01
 
Fast Radon Transform via Fast Non-uniform FFTs on GPUs
Chao Yang
- Lawrence Berkeley National Laboratory
A fast Radon transform is required for the X-ray phase contrast tomography performed at the Advanced Light Source, Lawrence Berkeley National Laboratory. We describe a fast implementation based on fast non-uniform FFTs on GPUs.
 
Keywords:
Video & Image Processing, GTC 2010 - ID P10J02
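One standard route to a fast Radon transform, and a likely reason non-uniform FFTs appear here, is the Fourier slice theorem: the 1-D Fourier transform of a projection at angle θ equals the 2-D Fourier transform of the image along a line through the origin at the same angle. In standard notation (not taken from the poster), with $f$ the image and $\hat f$ its 2-D Fourier transform:

$$
(\mathcal{R}f)(\theta, s) = \iint f(x, y)\, \delta(x\cos\theta + y\sin\theta - s)\, dx\, dy
$$

$$
\int (\mathcal{R}f)(\theta, s)\, e^{-2\pi i \omega s}\, ds = \hat f(\omega\cos\theta,\ \omega\sin\theta), \qquad \hat f(\xi, \eta) = \iint f(x, y)\, e^{-2\pi i (x\xi + y\eta)}\, dx\, dy
$$

The projections can therefore be obtained by evaluating $\hat f$ on a polar grid, which is non-uniform in Cartesian frequency coordinates, and then applying 1-D inverse FFTs along $\omega$. A non-uniform FFT performs the polar evaluation for an N-by-N image in roughly $O(N^2 \log N)$ operations instead of a direct sum.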
 
Projected Conjugate Gradient Solvers on GPU and Their Applications
Youzuo Lin
- Arizona State University
In this work, the focus is specifically on how to speed up the projected conjugate gradient (CG) algorithm using the GPU. It is shown that the projected CG method can be used within the single-precision accuracy of current GPUs. One benefit of the projected CG is that it reduces the total number of matrix-vector multiplications, which are usually a bottleneck for an efficient GPU-based Krylov algorithm. A modified projection-based CG algorithm is further proposed in the thesis, and it shows better performance. Numerical results using the GPU are provided to support the proposed algorithm.
 
Keywords:
Video & Image Processing, GTC 2010 - ID P10J03
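For context on the matrix-vector cost mentioned above, the Python sketch below is the standard (unprojected) conjugate gradient method, not the projected or modified variant from the poster. The operator is accessed only through a hypothetical matvec callback, which makes the single matrix-vector product per iteration explicit; on a GPU that product usually dominates each iteration's cost. Python floats are double precision, whereas the abstract targets single precision.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Standard CG for A x = b with A symmetric positive definite.

    A is only accessed through matvec(v), which returns A v; each iteration
    performs exactly one such product.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)  # the one matrix-vector product of this iteration
        alpha = rs_old / dot(p, ap)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * apk for rk, apk in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [rk + beta * pk for rk, pk in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: a small SPD system whose exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
solution = conjugate_gradient(lambda v: [dot(row, v) for row in A], [1.0, 2.0])
print(solution)
```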
 
Real-time Direct Georeferencing of Images from Airborne Line Scan Cameras
Trym Vegard Haavardsholm
- Norwegian Defence Research Establishment (FFI)
The Norwegian Defence Research Establishment (FFI) is developing a technology demonstrator for airborne real-time hyperspectral target detection. The system includes two nadir-pointing line scan cameras. The line-scanned images are georeferenced in real time by intersecting rays cast from the cameras with a 3D model of the terrain underneath. The georeferenced images can then easily be orthorectified (e.g., by using texture mapping in OpenGL) and overlaid on digital maps. This poster presents the performance of a CUDA implementation of the georeferencing method.
 
Keywords:
Video & Image Processing, GTC 2010 - ID P10J04
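Ray casting against a terrain model can be sketched as follows, assuming a height field `height_at(x, y)` (hypothetical; in practice a digital elevation model lookup) and simple fixed-step ray marching refined by bisection. The abstract does not describe the authors' intersection algorithm, so treat this Python version as an illustration only. Each image pixel casts its own ray, so the rays can be processed independently, e.g. one GPU thread per pixel.

```python
def intersect_terrain(origin, direction, height_at, step=1.0, max_dist=10000.0):
    """March along a ray until it drops below the terrain surface.

    origin: (x, y, z) camera position; direction: (dx, dy, dz), not
    necessarily normalized. height_at(x, y) returns terrain elevation
    (hypothetical DEM lookup). Returns the (x, y, z) ground point, or None.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction

    def above(t):
        # Height of the ray point above the terrain at parameter t.
        return oz + t * dz - height_at(ox + t * dx, oy + t * dy)

    t_prev, t = 0.0, step
    while t <= max_dist:
        if above(t) <= 0.0:
            lo, hi = t_prev, t
            for _ in range(50):  # bisection on the sign change
                mid = 0.5 * (lo + hi)
                if above(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            t_hit = 0.5 * (lo + hi)
            return (ox + t_hit * dx, oy + t_hit * dy, oz + t_hit * dz)
        t_prev, t = t, t + step
    return None


# Usage: camera 500 m above flat ground, ray tilted about 36.9 degrees from nadir.
hit = intersect_terrain((0.0, 0.0, 500.0), (0.6, 0.0, -0.8), lambda x, y: 0.0)
print(hit)  # approximately (375.0, 0.0, 0.0)
```

A fixed step can skip terrain features thinner than the step size; traversing the elevation model cell by cell avoids this at the cost of more bookkeeping.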
 
CUDA Acceleration of Color Histogram Matching
Antonio Sanz
- Universidad Rey Juan Carlos
Histogram matching techniques are methods for adjusting the colors of a pair of images. They can be used as a preliminary stage in several video applications, for example 3D content creation. In such an application, two cameras separated by a known distance acquire video streams that can be combined to compute a depth map. Because the two cameras capture slightly different scenes, they can be lit by different sources, producing a color shift between their streams that degrades quality and the user experience. Our approach uses an NVIDIA 3D broadcast solution with professional HD cameras.
 
Keywords:
Video & Image Processing, GTC 2010 - ID P10J05
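A classic way to match histograms is to map each intensity level through the source cumulative distribution function (CDF) and then through the inverse of the reference CDF. The Python sketch below does this for one 8-bit channel. Applying it per color channel, with one camera's frame as the reference for the other, is an assumption about how it would be used here; the poster's actual method is not specified, and the function name is hypothetical.

```python
def histogram_match(source, reference, levels=256):
    """Map source intensities so their histogram approximates the reference's.

    Both inputs are flat lists of integers in [0, levels). Returns a new list.
    """
    def cdf(values):
        hist = [0] * levels
        for v in values:
            hist[v] += 1
        total, running, out = len(values), 0, []
        for h in hist:
            running += h
            out.append(running / total)
        return out

    src_cdf = cdf(source)
    ref_cdf = cdf(reference)
    # For each source level, pick the smallest reference level whose CDF
    # reaches the source CDF; j never decreases, so the table is monotone.
    lut, j = [0] * levels, 0
    for i in range(levels):
        while j < levels - 1 and ref_cdf[j] < src_cdf[i]:
            j += 1
        lut[i] = j
    return [lut[v] for v in source]


# Usage: a dark 2-bit "image" matched to a brighter reference.
print(histogram_match([0, 0, 1, 1], [2, 2, 3, 3], levels=4))  # [2, 2, 3, 3]
```

The histogram and CDF are reductions over the frame, while the final lookup is independent per pixel; both steps have well-known parallel GPU formulations.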
 
Real-Time Color Space Conversion for High Resolution Video
Klaus Gaedke
Color space conversion or color correction is a widely used technique to adapt the color characteristics of video material to the display technology employed (e.g., CRT, LCD, projection) or to create a certain artistic look. Because color correction is often an interactive task and colorists need a direct response, state-of-the-art real-time color correction systems for video have so far been based on expensive dedicated hardware. This submission shows the feasibility of replacing dedicated color correction systems with general-purpose GPUs: a single Tesla C2050 GPU supports real-time color correction at resolutions up to 4096x2048 pixels.
 
Keywords:
Video & Image Processing, GTC 2010 - ID P10V01
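In its simplest form, a color space conversion is a fixed linear map applied to every pixel. The Python sketch below assumes ITU-R BT.709 R'G'B' to Y'CbCr with normalized (0 to 1) inputs; real color correction systems add transfer functions, 3-D lookup tables, and grading controls that the abstract does not detail, and the function names are hypothetical.

```python
# ITU-R BT.709 luma coefficients.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB  # 0.7152


def rgb_to_ycbcr709(r, g, b):
    """Convert normalized R'G'B' in [0, 1] to Y' in [0, 1] and Cb, Cr in [-0.5, 0.5]."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))  # divide by 1.8556
    cr = (r - y) / (2.0 * (1.0 - KR))  # divide by 1.5748
    return y, cb, cr


def convert_frame(pixels):
    """Apply the per-pixel conversion to a list of (r, g, b) tuples.

    Every pixel is independent, so a GPU version can assign one thread per pixel.
    """
    return [rgb_to_ycbcr709(r, g, b) for r, g, b in pixels]


print(rgb_to_ycbcr709(0.0, 0.0, 1.0))  # pure blue: Cb is approximately +0.5
```

At 4096x2048, one frame has 8,388,608 pixels, each converted independently, which is the kind of data parallelism that makes the real-time result plausible on a single GPU.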
 
3D Object Detection in Digital Holographic Microscope Images
Vilmos Szabo
Digital Holographic Microscopy (DHM) is based on the classical holographic principle invented by the Hungarian physicist Dennis Gabor. The holographic images are acquired by a CCD camera, and depth slices can be reconstructed using the Fourier transform. The numerical reconstruction and further image processing for object detection are performed on General-Purpose Graphics Processing Units (GPGPUs).
 
Keywords:
Video & Image Processing, GTC 2010 - ID P10V02
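One common way to reconstruct depth slices from a hologram with the Fourier transform is the angular spectrum method, shown below; the abstract does not say whether the authors use this or, for example, a Fresnel transform. Here U(x, y; 0) is the complex field in the hologram plane, λ the wavelength, (f_x, f_y) the spatial frequencies, and z the reconstruction distance:

$$
U(x, y; z) = \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{U(x, y; 0)\}(f_x, f_y)\, \exp\!\left( i\, 2\pi z \sqrt{\lambda^{-2} - f_x^2 - f_y^2} \right) \right\}
$$

Frequencies with $f_x^2 + f_y^2 > \lambda^{-2}$ are evanescent and are usually set to zero. The forward transform of the hologram is computed once, and each depth slice then needs only a pointwise multiplication and one inverse FFT, so a whole focal stack maps well onto GPU FFT libraries.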
 
 
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2017 NVIDIA Corporation