GTC On-Demand

Acoustics & Audio Processing
Presentation
Media
Interactive 3D Audio Rendering Systems
Nicolas Tsingos
Learn how to leverage GPUs for interactive audio rendering. This session will give a short overview of the architecture of current GPUs, emphasizing some key differences between GPU and CPU programming models for audio processing. We will illustrate the benefits of GPU-accelerated audio rendering with results from 3D audio processing and sound scattering simulations. Finally, we will discuss best practices for GPU implementations as well as future opportunities for audio rendering on massively parallel architectures.

Keywords:
Acoustics & Audio Processing, Rendering & Ray Tracing, Signal & Audio Processing, GTC 2010 - ID 2042
Streaming:
Download:
 
Implementing CUDA Audio Networks
Giancarlo Del Sordo
Learn how to implement a commercial software library that exploits CUDA for audio applications. We focus on the overall threading architecture and the underlying math for implementing general-purpose audio processing on CUDA devices. We also cover the use of inter-process communication to make a plug-in implementation loadable in 32-bit hosts installed on 64-bit systems, distributing the GPU load across remote servers, and creating a CUDA network for high-end settings such as a large recording facility.

Keywords:
Acoustics & Audio Processing, Signal & Audio Processing, GTC 2010 - ID S102076
Streaming:
Download:
 
Real-time Multichannel Audio Convolution
Jose Antonio Belloch (PhD Student)
Learn how a synthesis of 3D sound scenes can be achieved using a peer-to-peer music streaming environment and GPU. We will discuss the technical and cost benefits to this approach, while noting that it frees the CPU for other tasks.

Keywords:
Acoustics & Audio Processing, Signal & Audio Processing, GTC 2010 - ID S102116
Streaming:
Download:
 
Exploring Recognition Network Representations for Efficient Speech Inference on the GPU
Jike Chong
We explore two contending recognition network representations for speech inference engines: the linear lexical model (LLM) and the weighted finite state transducer (WFST), on NVIDIA GTX285 and GTX480 GPUs. We demonstrate that while an inference engine using the simpler LLM representation evaluates 22x more transitions per second than one using the advanced WFST representation, the simple structure of the LLM representation allows 4.7-6.4x faster evaluation and 53-65x faster operand gathering for each state transition. We illustrate that the performance of a speech inference engine based on the LLM representation is competitive with the WFST representation on highly parallel GPUs.

Keywords:
Acoustics & Audio Processing, GTC 2010 - ID P10C01
Download:
 
Efficient Automatic Speech Recognition on the GPU
Jike Chong
Automatic speech recognition (ASR) technology is emerging as a critical component in data analytics for the wealth of media data being generated every day. ASR-based applications contain fine-grained concurrency that has great potential to be exploited on the GPU. However, the state-of-the-art ASR algorithm involves a highly parallel graph traversal on an irregular graph with millions of states and arcs, making efficient parallel implementations highly challenging. We present four generalizable techniques: dynamic data-gather buffers, find-unique, lock-free data structures using atomics, and hybrid global/local task queues. When used together, these techniques can effectively resolve ASR implementation challenges on an NVIDIA GPU.

Keywords:
Acoustics & Audio Processing, GTC 2010 - ID P10C02
Download:
 
HYDRA - A Hybrid CPU/GPU Speech Recognition Engine for Real-Time LVCSR
Jungsuk Kim (Carnegie Mellon Silicon Valley)
This talk presents HYDRA, a real-time LVCSR (Large Vocabulary Speech Recognition) engine that performs decoding on CPU, GPU, or hybrid CPU/GPU platforms. While prior work has demonstrated the effectiveness of manycore graphics processing units (GPUs) for high-throughput, limited-vocabulary speech recognition, they are unsuitable for recognition with large acoustic and language models due to limited GPU memory. To overcome this limitation, we have developed a novel architecture for speech recognition decoding that jointly leverages manycore GPUs and multicore processors (CPUs) to perform speech recognition even when large acoustic and language models are applied. The proposed architecture can perform speech recognition up to 5x faster than real time with a recognition vocabulary of more than one million words.

Keywords:
Acoustics & Audio Processing, GTC 2013 - ID S3406
Streaming:
Download:
Advanced Driver Assistance Systems (ADAS)
Real-time Traffic Sign Recognition on Mobile Processors
Victor Eruhimov (Itseez, Inc.)
There is a growing need for fast and power-efficient computer vision on embedded devices. This session will focus on computer vision capabilities on embedded platforms available to ADAS developers, covering the OpenCV CUDA implementation and the new computer vision standard OpenVX. In addition, Itseez traffic sign detection will be showcased. The algorithm is capable of detecting speed limit signs for both the North America and EMEA regions, as well as several other signs, delivering faster-than-real-time performance on an embedded platform with a mobile-grade GPU.

Keywords:
Advanced Driver Assistance Systems (ADAS), Automotive, Computer Vision, GTC 2013 - ID S3548
Streaming:
Download:
 
Virtualization of Tegra in Automotive Applications: Integration of Head-Unit and Instrument Cluster
Stefaan Sonck Thiebaut (OpenSynergy)
This talk will introduce the main challenges in the next generation of automotive infotainment applications: OEMs want to take advantage of open source solutions like Linux and Android yet have very high requirements on safety, security and boot-times. In addition, to reduce costs, more functionality needs to be integrated on a single processor. An example of this is the integration of the head-unit and the instrument cluster as two displays of a single device. As a solution to these requirements, we describe a software architecture that uses virtualization with a micro-kernel and that is already implemented and available on NVIDIA Tegra3. We will give a brief outlook on the next steps regarding the sharing of the GPU and hardware virtualization.

Keywords:
Advanced Driver Assistance Systems (ADAS), Automotive, In-Vehicle Infotainment (IVI) & Safety, Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3577
Streaming:
Download:
Aerospace & Defense
XMP: An NVIDIA CUDA-Accelerated Big Integer Library
Justin Luitjens (NVIDIA)
We'll introduce the XMP library, which provides CUDA-accelerated implementations of many large-integer arithmetic operations. These operations are generally used to implement encryption and decryption routines, including RSA, ECC, and Diffie-Hellman key exchange. We'll focus on the capabilities of the library and how to use it efficiently.
 
Keywords:
Aerospace & Defense, Tools & Libraries, GTC 2016 - ID S6151
Streaming:
Download:
 
Intelligent Mobile System for Improving Spatial Design Support and Security Inside Buildings
Janusz Bedkowski (Institute of Mathematical Machines)
This talk concerns an intelligent mobile application for the spatial design support and security domain. Mobility has two aspects in our research: the first is the usage of mobile robots for 3D mapping of urban areas and for performing some specific tasks; the second is related to a novel software-as-a-service system that allows access to robotic functionalities and data over Ethernet. Thus, we demonstrate the use of the novel NVIDIA GRID technology, which virtualizes the GPU. We introduce the Complex Shape Histogram, a core component of our artificial intelligence engine, used for classifying 3D point clouds with a Support Vector Machine. We use NVIDIA CUDA for accelerating computations.
 
Keywords:
Aerospace & Defense, Data Center & Cloud Computing, Robotics & Autonomous Machines, GTC 2016 - ID S6233
Streaming:
Download:
 
Big Geospatial Data + Deep Learning + High Performance Computing = Geospatial Intelligence
Bingcai Zhang (BAE Systems)
We present two algorithms that are specifically designed to accurately detect geospatial objects in geospatial images. Combining these two algorithms with deep learning algorithms, we have achieved detection accuracy over 99% for vehicles, positional accuracy within 6 pixels, orientation accuracy of less than 10 degrees, and a false positive error rate of 0.001% with 7.5cm GSD aerial images. In essence, our algorithms induce learning capability from deep learning into template image matching in geospatial intelligence. Our algorithms reduce the false positive error rate by an order of magnitude over a softmax classifier. With over 99% accuracy, we believe this may be a game changer in the geospatial intelligence domain.
 
Keywords:
Aerospace & Defense, Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6260
Streaming:
Download:
 
GPU-Accelerated Graph Query for Cyber Applications
Jim Carbonaro (Blazegraph)
Cyberspace is a critical domain for government and commercial organizations. It is about networks, devices, and how they interact. Graphs model nodes and links and how they are connected. Defending the critical networks in cyberspace requires processing and analyzing extremely large quantities of graph data in near real time. Key cyber analytics and data sets, ranging from Topological Vulnerability Analysis and Traffic Flow Analysis to Network Attack Graphs, are graphs. This session will discuss how Blazegraph GPU meets this challenge by delivering near-real-time performance at very large data scales, using a flexible and updatable graph representation to support complex analytics, and supporting existing graph frameworks (RDF, Tinkerpop) and query languages (SPARQL).
 
Keywords:
Aerospace & Defense, Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6337
Streaming:
Download:
 
Deep Convolutional Neural Networks for Spoken Dialect Classification of Spectrogram Images Using DIGITS
Nigel Cannings (Intelligent Voice Limited)
Deep convolutional neural networks are designed for classification tasks involving static images. We'll outline the novel application of using such networks for speech processing tasks such as the identification of a speaker's dialect. Representing speech as spectrogram images, we'll show our recent results from the NIST language recognition competition, and discuss how the network training results can be improved by manipulating the spectrogram images in a way appropriate to the context of speech applications.
 
Keywords:
Aerospace & Defense, Deep Learning & Artificial Intelligence, Signal & Audio Processing, GTC 2016 - ID S6371
Streaming:
Download:
 
Real-Time Non-Rigid Image Registration Engine
Randall Miles (Propulsion Science and Technology)
Non-rigid image registration, i.e., morphing, allows a smaller footprint of seed images to be used to create a smooth and continuously changing series of images. We'll present a new high-speed toolkit for image morphing implemented using NVIDIA GPU technology. Time improvements of ~80% were seen through implementing a succession of CUDA optimizations guided by Nsight profiler results. Tests were conducted using available simulated rocket plume images to calculate run times and create performance measures.
 
Keywords:
Aerospace & Defense, Performance Optimization, Video & Image Processing, GTC 2016 - ID S6397
Streaming:
Download:
 
Missile Defense Radar through Real-Time Electromagnetic Simulation Injection
Ted Selig (FishEye Software, Inc.)
Radars are electromagnetic sensors that encode transmit signals, focus beams, extract targets from noise, and perceive targets and environments. These real-time systems are expensive and risky to build and operate because they are complex and difficult to test. The evolution of the GPU has the potential to disrupt this sensor industry by dramatically reducing the cost of radars, accelerating innovation, and reducing sensor maintenance. The presentation will discuss the processing techniques and data flow architecture required by these sensors. The discussion explores how GPU adoption can not only reduce the costs and risks of sensor development for missile defense but also enable low-cost applications like the self-driving car, weather sensing, and air traffic management.
 
Keywords:
Aerospace & Defense, Embedded, Signal & Audio Processing, GTC 2016 - ID S6434
Streaming:
Download:
 
HD GP-GPU Systems for HPC Applications
Sergio Tafur (Naval Research Laboratory)
We'll present how we fielded a High Density (HD) GP-GPU system, currently number 227 on the Top 500 list, evaluated its performance, and overcame challenges that arose during the testing phases. In addition, we will touch on using Python to code for and "glue" CPUs and GP-GPUs together in such HD GP-GPU systems.
 
Keywords:
Aerospace & Defense, Algorithms, Supercomputing & HPC, GTC 2016 - ID S6641
Streaming:
Download:
Algorithms
Parallel Low Rank LU and Cholesky Refactorization
Lung-Sheng Chien (NVIDIA)
Attendees can learn how to use a low-rank update in a linear solver during a nonlinear process--for example, linear programming, structural mechanics, and circuit simulation. A GPU-friendly version is proposed, which is mainly based on BLAS2 operations. Compared to traditional approaches, with BLAS2 operations we can hide instruction latency well and achieve the full bandwidth of a many-core processor. In this talk, we describe the basic idea of the low-rank update and show up to 5x speedup from complexity analysis.
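The low-rank update idea can be sketched in a few lines of NumPy. This is my own toy (sizes and the `solve_A` stand-in for a cached factorization are assumptions, not the speaker's code): after A changes by a rank-k correction U V^T, the Sherman-Morrison-Woodbury identity reuses the existing solve for A instead of refactorizing.

```python
import numpy as np

# Toy sketch of a low-rank update: reuse the solve for A after a rank-k
# change, via the Sherman-Morrison-Woodbury identity.
rng = np.random.default_rng(0)
n, k = 200, 3                              # k << n: a low-rank correction
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * rng.standard_normal((n, n))
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
b = rng.standard_normal(n)

def solve_A(rhs):
    # stand-in for reusing a cached LU/Cholesky factorization of A
    return np.linalg.solve(A, rhs)

# (A + U V^T)^{-1} b = A^{-1}b - A^{-1}U (I_k + V^T A^{-1}U)^{-1} V^T A^{-1}b
y = solve_A(b)
Z = solve_A(U)                 # k extra solves: BLAS2-level work per column
S = np.eye(k) + V.T @ Z        # small k-by-k system, cheap to solve
x = y - Z @ np.linalg.solve(S, V.T @ y)

assert np.allclose((A + U @ V.T) @ x, b)
```

Only the k extra solves and a k-by-k system are needed per update, which is why the work stays at the BLAS2 level rather than a full refactorization.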
 
Keywords:
Algorithms, Computer-Aided Engineering, GTC 2016 - ID S6129
Streaming:
Download:
 
Optimizing Instruction-Bound Kernels in Dissipative Particle Dynamics
Yu-Hang Tang (Division of Applied Mathematics, Brown University)
In this talk, we report algorithmic and instruction-level optimizations used in uDeviceX, a CUDA particle simulator for biomedical microfluidic devices. First, an FMA-intense random number generator (RNG) was proposed by exploiting the chaotic logist ...Read More
In this talk, we report algorithmic and instruction-level optimizations used in uDeviceX, a CUDA particle simulator for biomedical microfluidic devices. First, an FMA-intense random number generator (RNG) was proposed by exploiting the chaotic logistic map. This RNG can take advantage of the higher FP-to-integer instruction throughput ratio of CUDA GPUs to generate a large number of high quality random streams in situ. Second, warp-votes and shared memory were used to consolidate workload from diverging warps. Last, inline PTX was used to emulate 24-bit integer arithmetics by their floating point counterparts in order to increase throughput. An implementation using C++ templates ensures that no type-casting overhead is triggered and also guards the technique from unintentional usage.  Back
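The logistic-map recurrence at the heart of that RNG is tiny. A toy version (parameters and seed are illustrative, and this is not the uDeviceX source): each stream iterates x_{n+1} = 4 x_n (1 - x_n), one multiply-add per draw with no integer state. Raw draws are correlated and arcsine-distributed, so a real implementation post-processes them; this shows only the recurrence.

```python
# Toy logistic-map random stream: one FMA-like step per draw.
def logistic_stream(seed, n):
    x = seed
    out = []
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)    # maps to a fused multiply-add on the GPU
        out.append(x)
    return out

stream = logistic_stream(0.4321, 1000)
assert all(0.0 <= v <= 1.0 + 1e-12 for v in stream)
```

On a GPU, each thread holds its own `x`, so millions of independent streams cost one register each, which is the "in situ" property the abstract highlights.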
 
Keywords:
Algorithms, Computational Chemistry, Performance Optimization, GTC 2016 - ID S6140
Streaming:
Download:
 
Effective Evaluation of Betweenness Centrality on Multi-GPU Systems
Massimo Bernaschi (National Research Council of Italy)
Learn how to use (multi-)GPU and CUDA to speed up the process of ranking the importance of each node in a large-scale network. You will see how to solve an extraordinary challenge, the exact computation of Betweenness Centrality, by using relatively simple algorithms, like Breadth-First Search, as building blocks that have been highly tuned for latest-generation GPU cards. Our approach is fully scalable and overcomes the limitation on the size of the graph that can be studied on a single GPU. We'll present results obtained on both synthetic and real-world graphs.
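The standard exact method built from BFS building blocks is Brandes' algorithm; here is a serial reference sketch (illustrative code of mine, not the speakers'). The GPU version parallelizes the per-source BFS and the dependency accumulation.

```python
from collections import deque

# Serial Brandes' algorithm for exact betweenness centrality.
def betweenness(adj):
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):                     # one BFS per source vertex
        sigma = [0] * n                    # shortest-path counts
        dist = [-1] * n
        preds = [[] for _ in range(n)]
        sigma[s], dist[s] = 1, 0
        order, q = [], deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):          # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc                              # halve for the undirected convention

# path graph 0-1-2-3: the two inner nodes carry all pass-through paths
path = [[1], [0, 2], [1, 3], [2]]
assert betweenness(path) == [0.0, 4.0, 4.0, 0.0]
```

Each source's BFS is independent, which is what makes the multi-GPU decomposition the abstract describes natural: sources are distributed across devices and partial scores are summed.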
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID S6157
Streaming:
Download:
 
Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures
Adam McLaughlin (Georgia Institute of Technology)
Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memory consistency model is an NP-complete problem. This session improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. For large tests of interest, our GPU implementation achieves an average application speedup of 26x over existing techniques in use at NVIDIA.
 
Keywords:
Algorithms, Big Data Analytics, Tools & Libraries, GTC 2016 - ID S6180
Streaming:
Download:
 
Not Just a Universal Crutch: Other Useful Things to Do with atomicCAS
Elmar Westphal (Forschungszentrum Julich GmbH)
There is more to atomicCAS than the double-precision atomicAdd loop from the programming guide, something beyond the universal atomic-operation loop it usually represents. We'll show how to build shared-memory-based hash function loops to solve different counting and grouping problems at warp and block level. Variations of this loop can be used to count unique elements in a block, find threads sharing common data elements, or speed up histogram building for large numbers of bins. With the now natively implemented atomic operations on shared memory on Maxwell, these functions can be significantly faster than algorithms optimised for other architectures.
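The hash-loop pattern can be emulated on the CPU to show the control flow (structure illustrative, not the talk's code): on the GPU, `cas` would be atomicCAS on a shared-memory slot, the count update an atomicAdd, and there would be one thread per key.

```python
# CPU emulation of the atomicCAS claim-a-slot hash loop.
def cas(table, idx, expected, desired):
    # stand-in for CUDA atomicCAS: swap only if the slot holds `expected`
    old = table[idx]
    if old == expected:
        table[idx] = desired
    return old

def count_keys(keys, table_size=64):
    slots = [None] * table_size
    counts = [0] * table_size
    for k in keys:                          # one "thread" per key
        i = hash(k) % table_size
        while True:
            old = cas(slots, i, None, k)
            if old is None or old == k:     # claimed the slot, or same key
                counts[i] += 1              # would be atomicAdd on the GPU
                break
            i = (i + 1) % table_size        # linear probing on collision
    return {slots[i]: counts[i]
            for i in range(table_size) if slots[i] is not None}

assert count_keys([1, 2, 2, 3, 3, 3, 7]) == {1: 1, 2: 2, 3: 3, 7: 1}
```

The same loop skeleton yields unique-element counting, common-element detection, or sparse histogramming, depending only on what is done after the slot is claimed.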
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID S6220
Streaming:
Download:
 
Hierarchical Computations on Manycore Architectures
Hatem Ltaief (Extreme Computing Research Center, KAUST)
Learn about a new hierarchical matrix structure for fast linear algebra computations on GPUs! Recursivity, tree traversal, hierarchical data layout, and batched kernel executions are some of the ingredients of a new HPC recipe for computing challenging linear algebra operations and solving large scientific problems (e.g., spatial statistics) on GPUs. By exploiting low-rank matrix representations, the original dense matrix of the problem can be approximated, which saves memory footprint and reduces algorithmic complexity while still maintaining adequate solution accuracy. In addition, the talk showcases a new high-performance hierarchical symmetric eigensolver and SVD, juicing the horsepower out of multiple GPUs to the fullest.
 
Keywords:
Algorithms, Performance Optimization, Tools & Libraries, GTC 2016 - ID S6230
Streaming:
Download:
 
GPU Accelerated Markov Decision Process in Crowd Simulation
Benjamin Hernandez (Oak Ridge National Laboratory)
Markov decision processes (MDPs) have been used in real-world path planning, where environment information is incomplete or dynamic. The problem with the MDP formalism is that its state space grows exponentially with the number of domain variables, and its inference methods grow with the number of available actions. To overcome this issue, we formulate an MDP solver in terms of matrix multiplications, based on the value iteration algorithm; thus we can take advantage of GPUs to interactively produce obstacle-free paths in the form of an optimal policy. We'll present a performance analysis of our technique using Jetson TK1, CPU, and GPU platforms. Our algorithm achieves a 90x speedup on GPUs and a 30x speedup on the Jetson TK1 in contrast with its multi-threaded CPU version.
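The matrix-multiplication formulation can be shown on a toy MDP (the 2-state, 2-action numbers below are mine, not from the talk): each value-iteration sweep is a batched matrix-vector product followed by a max, which is exactly the GEMM-style work GPUs are good at.

```python
import numpy as np

# Value iteration written as batched matrix products.
P = np.array([[[0.9, 0.1],    # P[a]: transition matrix for action a
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],     # R[a, s]: reward for action a in state s
              [2.0, 0.5]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Q[a, s] = R[a, s] + gamma * sum_x P[a, s, x] V[x]  -- a batched mat-vec
    Q = R + gamma * np.einsum('asx,x->as', P, V)
    V_new = Q.max(axis=0)                 # greedy Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=0)     # optimal action per state
```

Because the backup touches every (action, state, next-state) triple with no data dependence inside a sweep, the whole sweep maps onto one batched GEMM plus a reduction.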
 
Keywords:
Algorithms, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6268
Streaming:
Download:
 
XMP Library Internals: Modular Multiplication on Kepler and Maxwell
Niall Emmart (University of Massachusetts)
We'll present an overview of the internals of the XMP multiple-precision library, take a detailed look at the low-level algorithms used for modular squaring and modular multiplication on Kepler, and present novel algorithms for Maxwell. Modular multiplication is a performance-critical primitive and is widely used in cryptographic algorithms, from prime testing and factorization to public-key/private-key algorithms such as RSA, Diffie-Hellman, and digital signatures.
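The talk's exact kernels aren't reproduced here, but the classical technique in this space is Montgomery multiplication, which trades the division by the modulus for shifts and masks. A textbook plain-Python sketch (not XMP's code; requires Python 3.8+ for `pow(x, -1, m)`):

```python
# Textbook Montgomery reduction: t * R^-1 mod N with R = 2^R_bits.
def redc(t, N, R_bits, N_inv):
    R_mask = (1 << R_bits) - 1
    m = ((t & R_mask) * N_inv) & R_mask   # makes t + m*N divisible by R
    u = (t + m * N) >> R_bits             # exact division by R via a shift
    return u - N if u >= N else u

def mont_mulmod(a, b, N, R_bits):
    """a * b mod N for odd N < 2^R_bits, via Montgomery form."""
    N_inv = pow(-N, -1, 1 << R_bits)      # N' with N * N' = -1 (mod R)
    R2 = (1 << (2 * R_bits)) % N          # R^2 mod N, for form conversion
    aR = redc(a * R2, N, R_bits, N_inv)   # a -> Montgomery form a*R mod N
    bR = redc(b * R2, N, R_bits, N_inv)
    abR = redc(aR * bR, N, R_bits, N_inv) # product stays in Montgomery form
    return redc(abR, N, R_bits, N_inv)    # convert back out

N = 101  # any odd modulus smaller than R = 2^8
assert mont_mulmod(77, 88, N, 8) == (77 * 88) % N
```

In a real big-integer library the same reduction runs limb by limb over multi-word operands, which is where the per-architecture tuning the abstract mentions comes in.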
 
Keywords:
Algorithms, Tools & Libraries, GTC 2016 - ID S6349
Streaming:
Download:
 
Simulating a Quantum Annealer with GPU-Based Monte Carlo Algorithms
James King (D-Wave Systems)
Learn how the world's most powerful quantum computers are simulated and benchmarked using GPU-based Monte Carlo algorithms. We'll introduce D-Wave's quantum annealing platform, describe several Monte Carlo algorithms for its simulation, and compare CPU- and GPU-based implementations of these algorithms. In particular, we'll focus on considerations of memory layout and fast mathematical functions to maximize speed. Finally, we'll present benchmarking results, including CPU-based algorithms, GPU-based algorithms, and D-Wave's latest-generation quantum annealers.
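A classical baseline of the kind such benchmarks include is Metropolis simulated annealing on an Ising model. A tiny sketch (sizes, schedule, and seed are illustrative; this is not D-Wave's benchmarking code):

```python
import math, random

# Simulated annealing on a ferromagnetic Ising ring of 16 spins.
random.seed(1)
n = 16
J = [1.0] * n                              # coupling between spin i and i+1
spins = [random.choice((-1, 1)) for _ in range(n)]

def energy(s):
    return -sum(J[i] * s[i] * s[(i + 1) % n] for i in range(n))

steps = 20000
for step in range(steps):
    beta = 0.1 + 3.0 * step / steps        # anneal: raise beta (cool down)
    i = random.randrange(n)
    # energy change from flipping spin i (only its two ring neighbors matter)
    dE = 2 * spins[i] * (J[i - 1] * spins[i - 1] + J[i] * spins[(i + 1) % n])
    if dE <= 0 or random.random() < math.exp(-beta * dE):
        spins[i] = -spins[i]               # Metropolis acceptance rule

final_energy = energy(spins)               # ring ground-state energy is -16
```

The GPU versions the talk compares run many such chains (or many spins of one large chain) in parallel, which is why memory layout and fast `exp` evaluation dominate the tuning.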
 
Keywords:
Algorithms, Computational Physics, Supercomputing & HPC, GTC 2016 - ID S6380
Streaming:
Download:
 
GPU Acceleration of Cholesky's Factorization in CHOLMOD: Batching, Hybrid and Multi-GPU
Steven Rennich (NVIDIA)
Sparse matrix factorization is a fundamental tool in scientific computing and has been shown to be well accelerated using GPUs. Yet applying the full capability of the GPU to the factorization operation remains a challenge. This talk covers the latest GPU optimizations that have been applied to the Cholesky factorization algorithm within the well-known SuiteSparse/CHOLMOD linear solver. These optimizations include new NVIDIA CUDA versions of BLAS and LAPACK routines to accelerate operations on batches of small, non-uniformly sized matrices, hybrid computing enhancements, support for multi-GPU acceleration, and further avoidance of PCIe communication through refinements to the sub-tree algorithm.
 
Keywords:
Algorithms, Performance Optimization, Tools & Libraries, GTC 2016 - ID S6387
Streaming:
Download:
 
Fast Detection of Neighboring Vectors
Krzysztof Kaczmarski (Warsaw University of Technology, Faculty of Mathematics and Information Science)
We'll present several methods for detecting pairs of vectors that are at Hamming distance 1. This problem is an important part of cell graph construction in motion planning in a space with obstacles. We'll begin with a naive square-time solution, which simply compares pairs of vectors, move through building dedicated search trees, and arrive at an optimal linear algorithm. Sequential linear-time algorithms for the problem were already known, but due to the high constants hidden in the complexity function, they turned out not to be very efficient for real-life data. Our GPU-based massively parallel solution promises acceptable execution times, opening dynamic cell graph construction to real-time applications like robotics and optimal path searching.
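One standard linear-time idea for this problem (my illustration, not necessarily the authors' algorithm): hash each vector once per bit position with that bit masked out. Two vectors at Hamming distance exactly 1 then meet in exactly one bucket, the one for their differing bit.

```python
from collections import defaultdict

# Find all pairs of bit-vectors at Hamming distance exactly 1.
def hamming_one_pairs(vectors, nbits):
    buckets = defaultdict(list)
    for idx, v in enumerate(vectors):
        for b in range(nbits):
            # key: position of the masked bit + the remaining bits
            buckets[(b, v & ~(1 << b))].append(idx)
    pairs = set()
    for idxs in buckets.values():
        for i in range(len(idxs)):
            for j in range(i + 1, len(idxs)):
                if vectors[idxs[i]] != vectors[idxs[j]]:  # skip equal vectors
                    pairs.add((min(idxs[i], idxs[j]), max(idxs[i], idxs[j])))
    return pairs

vs = [0b000, 0b001, 0b011, 0b111, 0b100]
assert hamming_one_pairs(vs, 3) == {(0, 1), (1, 2), (2, 3), (0, 4)}
```

Each vector generates `nbits` independent bucket insertions, which is the kind of uniform, data-parallel work that maps well onto one GPU thread per (vector, bit) pair.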
 
Keywords:
Algorithms, Tools & Libraries, Robotics & Autonomous Machines, GTC 2016 - ID S6402
Streaming:
 
Accelerating Approximate Weighted Matching on GPUs
Antonino Tumeo (Pacific Northwest National Laboratory)
Matching is a fundamental graph problem with numerous applications in science and engineering. This talk discusses the efficient implementation of half-approximate weighted matching on GPUs. We start by describing the Suitor algorithm, currently considered the best algorithm for this problem, and identifying its key implementation challenges. In its basic formulation, the Suitor algorithm appears poorly suited to GPUs, due to its irregular memory accesses and use of locks. We proceed by introducing four variants of the algorithm that progressively address these challenges by exploiting Kepler's hardware features. We demonstrate that the final implementation outperforms previous best GPU matching algorithms, as well as the Suitor algorithm on CPUs, by several times.
 
Keywords:
Algorithms, Big Data Analytics, Aerospace & Defense, GTC 2016 - ID S6423
Streaming:
Download:
 
Exploring Scalable Implementations of Triangle Enumeration in Graphs of Diverse Densities: Apache Spark vs. GPUs
Michela Taufer (University of Delaware)
We'll present graphs as powerful tools for analyzing complex relationships between entities. We'll share how many structures commonly found in computer science, like social networks, computer networks, and the World Wide Web, can be modeled as graphs. Since many real graphs are very large and complex, the associated analysis algorithms must be very efficient and highly parallel. We present two implementations of a key graph-based analysis, triangle enumeration, for two different parallel paradigms: GPU programming and Apache Spark. We'll reveal the performance of the two implementations as the characteristics of the graph change.
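The kernel both paradigms parallelize can be stated in a few lines (serial sketch of mine, not the authors' code): for each edge (u, v), intersect the adjacency sets of u and v; every common neighbor closes a triangle.

```python
# Count triangles by per-edge adjacency-set intersection.
def count_triangles(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u, v in edges:
        total += len(adj[u] & adj[v])   # common neighbors close a triangle
    return total // 3                   # each triangle is seen once per edge

# K4 (complete graph on 4 nodes) contains 4 triangles
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
assert count_triangles(k4) == 4
```

The per-edge intersections are independent, so a GPU assigns edges to threads while Spark shuffles adjacency lists to the edges that need them; graph density decides which distribution of that same intersection work wins.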
 
Keywords:
Algorithms, Tools & Libraries, Big Data Analytics, GTC 2016 - ID S6424
Streaming:
Download:
 
GPU-Oriented Sparse Multifrontal QR Method
Wissam Sid-Lakhdar (Texas A&M University)
We'll present a sparse direct method: a multifrontal QR factorization designed specifically for GPU accelerators. Our approach relies on a bucket scheduler that exploits irregular parallelism both at a coarse grain, among a set of fronts with different characteristics, and at a fine grain, through the staircase shape of these fronts. The scheduler then relies on dense GPU kernels whose design and implementation target recent GPU architectures.
 
Keywords:
Algorithms, Performance Optimization, Tools & Libraries, GTC 2016 - ID S6439
Streaming:
 
Quotient Filters: Approximate Membership Queries on the GPU
Afton Geil (UC Davis)
Most GPU data structures must be rebuilt (often on the CPU) any time they are modified. We'll examine the challenges of building and maintaining mutable data structures on the GPU, and will present our solution for one particular data structure: the quotient filter. A quotient filter is used for performing fast database queries, similar to a Bloom filter. We describe our search for an efficient parallelization of construction, insertion, and query operations on the quotient filter data structure. We show that this data structure can outperform a Bloom filter for database lookups and insertions, while also providing much greater flexibility.
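For readers unfamiliar with quotienting, the core trick is to split a p-bit hash fingerprint into a q-bit bucket index (the quotient) and an r-bit remainder, storing only the remainder. The following is a minimal CPU-side toy of that idea, not the talk's GPU data structure (a real quotient filter packs remainders into a flat slot array with per-slot metadata bits); all names here are illustrative.

```python
import hashlib

def fingerprint(key, p):
    """Deterministic p-bit fingerprint of a string key."""
    h = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(h, "big") & ((1 << p) - 1)

class ToyQuotientFilter:
    """Toy quotienting demo: the q high bits pick a bucket, only the r low bits are stored."""
    def __init__(self, q=8, r=8):
        self.q, self.r = q, r
        self.buckets = [set() for _ in range(1 << q)]

    def insert(self, key):
        f = fingerprint(key, self.q + self.r)
        self.buckets[f >> self.r].add(f & ((1 << self.r) - 1))

    def may_contain(self, key):
        # No false negatives; false positives only when two keys share
        # both bucket index and remainder.
        f = fingerprint(key, self.q + self.r)
        return (f & ((1 << self.r) - 1)) in self.buckets[f >> self.r]
```

Because bucket and remainder together reconstruct the whole fingerprint, the structure answers membership queries with the same one-sided error as a Bloom filter while supporting deletion and resizing in the full design.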
 
Keywords:
Algorithms, Big Data Analytics, GTC 2016 - ID S6464
Streaming:
Download:
 
Testing Chordal Graphs with CUDA
Agnieszka Lupinska (Jagiellonian University)
We'll present a CUDA implementation of an algorithm to test graph chordality, which uses parallel partition refinement with pivots. A graph is chordal if each cycle of length greater than three has a chord, that is, an edge between two non-adjacent vertices on the cycle. In total, the algorithm takes O(N) time on an N-thread grid and performs O(N+M) work for graphs of N vertices and M edges. We'll compare performance results of the CUDA implementation on an NVIDIA GeForce GTX TITAN X against a sequential implementation on a four-core (eight-thread) CPU, with tests on cliques, sparse graphs, dense graphs, and random chordal graphs.
 
Keywords:
Algorithms, Big Data Analytics, GTC 2016 - ID S6489
Streaming:
Download:
 
High-Performance Batched Computations for GPUs: Approaches and Applications
Stanimire Tomov (UTK)
Learn techniques for efficient batched computations on GPUs, where small and independent computations must be grouped and executed together to obtain high performance. These problems occur very frequently in scientific applications like machine learning, data mining, dense and sparse solvers, high-order FEM, astrophysics, and more. We will consider the development of batched computations for these applications, stressing innovative GPU techniques and algorithms for uniform as well as variable-size batches, tensor contractions, batched BLAS, and more. Batched computations can fill the GPU with work and remove scheduling overheads and costly CPU-GPU communications, often accelerating the computation by an order of magnitude compared to non-batched approaches.
 
Keywords:
Algorithms, Tools & Libraries, Performance Optimization, GTC 2016 - ID S6509
Streaming:
Download:
 
GPU Optimization of the Kripke Neutral-Particle Transport Mini-App
David Appelhans (IBM)
For Sierra, a pre-exascale CORAL supercomputer arriving at Lawrence Livermore National Lab in 2017, neutral-particle transport codes will be a primary application, and ensuring their peak performance on this system (multiple IBM POWER9 CPUs plus multiple Volta GPUs per node) is important. In preparation, transport mini-apps like Kripke are being optimized on today's hybrid CPU-GPU clusters using different programming models. This talk discusses performance issues encountered by Kripke on these systems and their solutions. Specifically, we will focus on: a) a novel implementation of the sweep algorithm; b) techniques for modeling physical problems whose memory footprint exceeds the aggregate GPU memory; and c) porting Kripke using OpenMP 4.
 
Keywords:
Algorithms, Computational Physics, Supercomputing & HPC, GTC 2016 - ID S6513
Streaming:
Download:
 
GPU Multisplit
Saman Ashkiani (University of California, Davis)
Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous buckets, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on GPUs, programmers often use a sort instead. However, sort does more work than necessary to implement multisplit, and is thus inefficient. In this work, we provide a parallel model and multiple implementations for the multisplit problem, with a focus on a small number of buckets. In our implementations, we exploit the computational hierarchy of the GPU to perform most of the work locally, with minimal usage of global operations. We use warp-synchronous programming models as well as hierarchical reordering of input elements to achieve better performance.
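The sequential essence of multisplit is three passes: a histogram of bucket sizes, an exclusive prefix sum to find each bucket's starting offset, and a stable scatter. The sketch below shows the primitive's semantics on a CPU; it is not the talk's warp-synchronous GPU implementation, which performs these steps hierarchically within warps and blocks.

```python
def multisplit(items, bucket_of, num_buckets):
    """Stably permute items into contiguous buckets chosen by bucket_of."""
    # Pass 1: histogram of bucket sizes.
    counts = [0] * num_buckets
    for x in items:
        counts[bucket_of(x)] += 1
    # Exclusive prefix sum: starting offset of each bucket in the output.
    offsets, running = [], 0
    for c in counts:
        offsets.append(running)
        running += c
    # Pass 2: stable scatter into the contiguous output.
    out = [None] * len(items)
    for x in items:
        b = bucket_of(x)
        out[offsets[b]] = x
        offsets[b] += 1
    return out
```

For example, `multisplit([3, 1, 4, 1, 5, 9, 2, 6], lambda x: x % 2, 2)` places the evens before the odds while preserving the order within each bucket, which is exactly the work a full sort would do unnecessarily.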
 
Keywords:
Algorithms, GTC 2016 - ID S6517
Streaming:
Download:
 
Portable Performance for Monte Carlo Simulations of Photon Migration in 3D Turbid Media for Single and Multiple GPUs
Leiming Yu (Northeastern University)
We present a parallel Monte Carlo (MCX) algorithm accelerated by GPUs for modeling time-resolved photon migration in 3-D turbid media. We'll present optimizations that benefit execution on a single GPU as well as on multiple GPUs. By leveraging persistent threads, our single-GPU implementation provides a high-performance parallel simulation of MCX when run on an NVIDIA GPU, and it is automatically tuned to leverage persistent threads for different GPU architectures. We achieved improvements of over 25% on the Kepler architecture and 12% on Maxwell as compared to a heuristic approach. In addition, we propose a linear programming approach based on predictive modeling to optimize MCX execution across multiple devices.
 
Keywords:
Algorithms, Performance Optimization, Rendering & Ray Tracing, GTC 2016 - ID S6635
Streaming:
Download:
 
Training Recurrent Neural Networks in FP16
Erich Elsen (Baidu USA, Inc.)
Reducing training time allows us to learn from our experiments more quickly and make new innovations based on what we've learned. Using fewer than the standard 32 bits to represent a number can help reduce training times. We'll talk about how to use 16-bit floating point, which is starting to have wide hardware support with the release of Pascal. Unfortunately, naively converting all datatypes from 32 to 16 bits doesn't work, as training stability and accuracy are compromised. We'll discuss the reasons for these difficulties and their solutions. Finally, we'll show performance and scalability improvements from using reduced precision.
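One widely used remedy for the stability problem mentioned above, offered here as general background rather than necessarily the speaker's exact recipe, is loss scaling: multiply the loss by a constant so that small gradients become representable in half precision, then divide back in 32-bit before the weight update. The underflow itself is easy to demonstrate with Python's IEEE-754 half-precision `struct` format; the scale factor below is a hypothetical choice.

```python
import struct

def to_fp16(x):
    # Round-trip a Python float through IEEE-754 half precision ('e' format).
    return struct.unpack('e', struct.pack('e', x))[0]

grad = 1e-8                       # a gradient too small for FP16
assert to_fp16(grad) == 0.0       # underflows: smallest FP16 subnormal is ~6e-8

scale = 1024.0                    # hypothetical loss-scale factor
scaled = to_fp16(grad * scale)    # now representable in half precision
recovered = scaled / scale        # unscale in FP32 before applying the update
```

The scaled gradient survives the conversion (`scaled > 0`), which is why scaling the loss, rather than each gradient individually, is enough: by the chain rule every gradient is scaled by the same factor.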
 
Keywords:
Algorithms, Deep Learning & Artificial Intelligence, Performance Optimization, GTC 2016 - ID S6661
Streaming:
Download:
 
Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions
Guy Steele (Oracle Labs)
We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model for machine learning, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses. From this table, complete partial sums are computed on the fly during a binary search. Measurements using an NVIDIA TITAN Black GPU show that for a sufficiently large number of clusters or topics (K > 200), this technique alone more than doubles the speed of a latent Dirichlet allocation (LDA) application already highly tuned for GPU execution.
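The baseline this talk improves upon is ordinary inverse-CDF sampling: build partial sums of the relative probabilities, draw a uniform number, and binary-search for the first partial sum exceeding it. A minimal sketch of that plain method follows (without the butterfly-patterned table or on-the-fly sums described in the talk):

```python
import bisect
import random

def draw(weights, rng=random):
    """Draw an index with probability proportional to weights[i]."""
    partial, total = [], 0.0
    for w in weights:              # table of partial sums (the inclusive CDF)
        total += w
        partial.append(total)
    u = rng.random() * total       # uniform draw scaled to the total weight
    # First bucket whose partial sum strictly exceeds u.
    return bisect.bisect_right(partial, u)
```

Computing the full `partial` table is the step the butterfly-patterned layout avoids: it stores a transformed table that is cheaper to build with coalesced memory accesses, reconstructing the needed partial sums only along the binary-search path.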
 
Keywords:
Algorithms, Performance Optimization, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6665
Streaming:
Download:
 
Fast Splittable Pseudorandom Number Generators
Guy Steele (Oracle Labs)
We describe two new classes of algorithm for a "splittable" pseudorandom number generator (PRNG) that is quite fast: either 9 or 11 64-bit arithmetic/logical operations per 64 bits generated. A splittable PRNG provides a "split" operation that creates a new PRNG that is computationally and statistically independent of its creator and therefore may be used in parallel. Splittable PRNG objects make it easy to organize the use of pseudorandom numbers in multithreaded programs where the number of threads may vary dynamically, but also have sufficient speed and quality to be useful when the number of threads is fixed. It is faster than MRG32k3a and of higher quality than XORWOW. No locking or synchronization is required, and the algorithm is quite suitable for SIMD or GPU implementation.
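The idea behind this family of generators can be sketched in SplitMix style: advance the state by a fixed odd "gamma" constant and scramble it with an avalanching mix function, with `split` seeding a child generator from the parent's own stream. The sketch below uses the widely circulated SplitMix64 mixing constants; it is illustrative only, and the paper's actual generators additionally derive a fresh gamma for each child.

```python
MASK = (1 << 64) - 1
GAMMA = 0x9E3779B97F4A7C15  # odd golden-ratio constant used as the stream increment

def mix64(z):
    # Avalanching finalizer: every input bit affects every output bit.
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
    return z ^ (z >> 31)

class SplitMix:
    """Toy splittable PRNG: add GAMMA, then mix; split() seeds a child."""
    def __init__(self, seed):
        self.state = seed & MASK

    def next64(self):
        self.state = (self.state + GAMMA) & MASK
        return mix64(self.state)

    def split(self):
        # The child is seeded from the parent's own stream, so parent and
        # child can afterwards be advanced independently, e.g. one per thread.
        return SplitMix(self.next64())
```

Because the state update is a simple counter and the output is a pure function of the state, there is nothing to lock: each thread or GPU lane owns its own generator object.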
 
Keywords:
Algorithms, Tools & Libraries, Performance Optimization, GTC 2016 - ID S6666
Streaming:
Download:
 
GPU Accelerated Streaming Algorithms for Halo Finders
Nikita Ivkin (Johns Hopkins University)
In this work we show the connection between two problems: halo finding and heavy hitters. Finding haloes, dense clumps of matter, in the output of cosmological simulations is crucial for verifying theoretical models against observation. Current algorithms require loading the full dataset into memory, making the computation infeasible on a desktop machine. We reduce the halo-finding problem to the problem of finding the most frequent items (heavy hitters) in streaming data, and apply two algorithms: Pick-and-Drop and Count Sketch. These algorithms can find the top 1,000 largest haloes with logarithmic memory usage, but their time performance is poor. GPU acceleration makes it possible to make several passes in reasonable time, thus helping to find more haloes in the future.
 
Keywords:
Algorithms, Astronomy & Astrophysics, GTC 2016 - ID S6671
Streaming:
Application Design & Porting Techniques
Presentation
Media
An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm
Martin Burtscher (Texas State University)
This session presents a complete CUDA implementation of the irregular Barnes-Hut n-body algorithm. This algorithm repeatedly builds and traverses unbalanced trees, making it difficult to map to GPUs. We explain in detail how our code exploits the architectural features of GPUs, including lockstep operation and thread divergence, both of which are commonly viewed as hurdles to achieving high performance, especially for irregular codes. On a five million body simulation running on a Tesla C2050, our CUDA implementation is 30 times faster than a parallel pthreads version running on a high-end 6-core Xeon.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2111
Streaming:
Download:
 
GPU Task-Parallelism: Primitives and Applications
Stanley Tzeng (University of California, Davis), Anjul Patney (University of California, Davis)
We explore how a task-parallel model can be implemented on the GPU and address concerns and programming techniques for doing so. We discuss the primitives for building a task-parallel system on the GPU, including novel ideas for mapping tasking systems onto it: task granularity, load balancing, memory management, and dependency resolution. We also present several applications that demonstrate how a task-parallel model is more suitable than the regular data-parallel model, among them a Reyes renderer, a tiled deferred lighting renderer, and a video encoding demo.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2138
Streaming:
Download:
 
Large-Scale Reservoir Simulation on GPU
Song Yu (Chemical & Petroleum Department, University of Calgary)
We develop a highly parallel GPU-based GMRES solver and several preconditioners, and couple them with an in-house reservoir simulator to speed up large-scale reservoir simulations with over one million grid blocks. For the preconditioners, we develop highly parallelized ILU(k), ILUT, block ILU(k), and block ILUT, with matrix partitioning by METIS on the GPU. The excellent speedup and accurate results demonstrate the promise of GPUs for parallel reservoir simulation.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2190
Streaming:
Download:
 
Levenberg-Marquardt Using Block Sparse Matrices on CUDA
Tetsuo Tawara (Koozyt, Inc.)
This session describes our experiences constructing GPU-based matrix-vector functions for block sparse matrices with multiple block sizes, together with a domain-specific numerical Jacobian generation function. Bundle adjustment is an optimization procedure that refines the relative camera pose and 3D structure location variables estimated from multiple sets of images. The Conjugate Gradient algorithm is used to solve the normal equations that appear in the inner loop of the non-linear least squares problem.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2231
Streaming:
Download:
 
LAtoolbox: A Multi-platform Sparse Linear Algebra Toolbox
Dimitar Lukarski (Karlsruhe Institute of Technology (KIT)), Jan-Philipp Weiss (Karlsruhe Institute of Technology)
Find out about an easy way to build sparse linear solvers for GPUs and multi-/many-core platforms. Based on data abstraction and virtualization of the hardware, the LAtoolbox supports several platforms such as GPUs, multi-core CPUs, and accelerators. The various backends (CUDA, OpenCL, OpenMP, ...) utilize optimized, platform-specific routines and allow seamless integration of GPUs into scientific applications. By means of unified interfaces across all platforms, the library enables you to build generic linear solvers and preconditioners on a single code base without hardware-specific information. We demonstrate the portability and flexibility of our open-source approach on heterogeneous platforms.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2291
Streaming:
Download:
 
Using GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation
Thomas Benson (Georgia Tech Research Institute)
This presentation describes our development of a GPU-accelerated backpropagation implementation for Synthetic Aperture Sonar systems that supports multiple nodes via MPI as well as multi-GPU nodes. This implementation can form a complex-valued gigapixel image in one hour on a single C2050. We further scale this implementation to the Keeneland system, where we can form the same gigapixel image in 21 seconds on 48 nodes with 144 Tesla C2070 GPUs. Our talk will discuss the details of our implementation, including our optimizations and scaling results for various node and GPU configurations, as well as the applicability to other domains, including Synthetic Aperture Radar.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2316
Streaming:
Download:
 
Debugging Floating Point Implementations on GPUs
Miriam Leeser (Northeastern University)
To debug GPU code it is important to understand the differences between CPU and GPU implementations. These differences arise from floating point (FP) behavior and from casting between floating point and fixed point. FP differences arise from the lack of associativity of FP arithmetic, differences in instruction implementation, and choices made by the compiler. We analyzed medical image reconstruction code for breast imaging and showed that GPU and CPU code could be made to produce identical results. We also analyze the performance implications of choosing different implementation options on the GPU and CPU to make the codes match.
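The associativity pitfall the abstract describes is easy to reproduce on any IEEE-754 machine, and it is the main reason a GPU's tree-shaped parallel reduction and a CPU's sequential loop can disagree on the "same" sum:

```python
# Floating-point addition is not associative: the same three numbers
# summed in two orders give results that differ in the last bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

# Order matters even more when magnitudes differ: a small term can be
# absorbed by a large one before cancellation has a chance to occur.
lost = sum([1e16, 1.0, -1e16])   # the 1.0 is absorbed by 1e16 -> 0.0
kept = sum([1e16, -1e16, 1.0])   # cancellation first, then 1.0 survives -> 1.0
```

Matching CPU and GPU results therefore requires controlling evaluation order (and compiler choices such as fused multiply-add), not just using the same precision on both sides.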
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID P2179
Download:
 
KILO Transactional Memory for GPU
Wilson Wai Lun Fung (University of British Columbia)
GPUs are designed to efficiently execute thousands of concurrent threads on multiple SIMT cores to hide long-latency operations. Currently, threads in different CUDA blocks can only communicate via global memory accesses, and programmers have to consider data races. Although fine-grained locks can be constructed using 32-/64-bit word atomic operations in recent GPUs, operations involving multiple locks can deadlock. We propose to solve these problems by extending GPUs to support transactional memory. Some of the major challenges are to support thousands of concurrent transactions, to commit non-conflicting transactions in parallel, and to integrate with stack-based SIMT execution.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID P2180
Download:
 
CUDA-Based GPU Computing Framework for GNU Octave
John Melonakos (AccelerEyes)
This poster presents the design of a CUDA-based GPU parallel processing framework for GNU Octave. Octave is a high-level interpreted language, primarily intended for numerical computations. As an open-source alternative to MATLAB, GNU Octave is widely used in academic and research institutions. The GPU framework allows Octave users to accelerate software written in Octave's high-level M language on GPUs with minimal code modifications. To our knowledge, this is the first attempt to build a full GPU framework for Octave, in contrast to previous attempts to provide GPU variants of a set of Octave functions.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID P2213
Download:
 
Dirk Pleiter (Juelich Supercomputing Centre)
The NVIDIA Application Lab at Jülich, established by JSC and NVIDIA in June 2012, aims at enabling scientific applications for GPU-based architectures. Selected applications and their performance characteristics will be presented, and strategies for multi-GPU parallelization (necessary to meet computing demands) will be discussed.
 
Keywords:
Application Design & Porting Techniques, Supercomputing 2012 - ID SC2007
Download:
Architectural Mapping & Event Visualization
Presentation
Media
Real-time Lighting and Rendering for Architectural Visualization
Rodrigo Lopez (Neoscape), Matt Richardson (Neoscape)
When creating computer-generated photorealistic imagery, a great deal of care is taken with materials and lighting. Having these aspects look as real as possible is essential to the overall quality and final look and feel of the image. In the past, without real-time solutions, several iterations of test renders were needed to dial in the desired settings, which was time-consuming for the artist and tied up valuable resources during rendering. With an NVIDIA Maximus system and real-time rendering solutions such as V-Ray RT and iray, this process has become greatly accelerated, giving the artist improved flexibility and more responsive interaction when fine-tuning these settings.
 
Keywords:
Architectural Mapping & Event Visualization, Manufacturing, GTC 2013 - ID S3551
Streaming:
Download:
Astronomy & Astrophysics
Presentation
Media
Gravitational N-body Simulations: How Massive Black Holes Interact with Stellar Systems
Alessandra Mastrobuono Battisti, Roberto Capuzzo-Dolcetta (Sapienza University of Rome)
Astrophysics is a field where supercomputing is a must for obtaining new scientific results. In particular, the study of the interaction between massive black holes and surrounding stars is a hot topic, which requires heavy computation to obtain a good representation of what happens in the inner regions of galaxies. We present results obtained with our high-precision N-body code, NBSymple, which exploits the joint power of a multi-core CPU system together with high-performance NVIDIA Tesla C1060 GPUs. The code is available at: astrowww.phys.uniroma1.it/dolcetta/nbsymple.html
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2010 - ID S102000
Streaming:
Download:
 
GRASSY: Leveraging GPU Texture Units for Asteroseismic Data Analysis
Matt Sinclair
Learn how to use the hidden computation capability of GPU texture units for general-purpose computation. We describe GRASSY, a system for stellar spectral synthesis where the core problem is interpolation between pre-computed intensity values. We map these pre-computed tables to the GPU's texture memory; interpolation then becomes a texture lookup where the hardware automatically performs the interpolation, albeit at very low precision. Our mathematical framework reasons about the impact of this precision, and our performance results show 500x speedups. This work generalizes the GPU texture units as computation engines and opens up new problems for GPU acceleration.
 
Keywords:
Astronomy & Astrophysics, High Performance Computing, GTC 2010 - ID S10044
Download:
 
CU-LSP: GPU-based Spectral Analysis of Unevenly Sampled Data
Richard Townsend (University of Wisconsin-Madison)
Standard FFT algorithms cannot be applied to spectral analysis of unevenly sampled data. Alternative approaches scale as O(N^2), making them an ideal target for harnessing the raw computing power of GPUs. To this end, I have developed CU-LSP, a CUDA spectral analysis code based on the Lomb-Scargle periodogram. Preliminary benchmarking indicates impressive speed-ups, on the order of 400x relative to a single core of a modern CPU. An initial application of CU-LSP will be the analysis of time-series data from planet-search and asteroseismology satellites.
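The O(N^2) cost comes from evaluating each trial frequency against every sample, with no FFT-style reuse. A naive reference implementation of the classical Lomb-Scargle periodogram (a plain CPU sketch, not CU-LSP itself) makes this structure explicit, and shows why the per-frequency loops parallelize so naturally on a GPU:

```python
import math

def lomb_scargle(t, y, freqs):
    """Naive Lomb-Scargle periodogram for unevenly sampled data.

    Cost is O(N) per trial frequency, hence O(N^2) overall when the
    number of frequencies scales with the number of samples N.
    """
    ybar = sum(y) / len(y)
    yc = [v - ybar for v in y]          # mean-subtracted signal
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        ct = sum(yi * ci for yi, ci in zip(yc, c))
        st = sum(yi * si for yi, si in zip(yc, s))
        power.append(0.5 * (ct * ct / sum(ci * ci for ci in c)
                            + st * st / sum(si * si for si in s)))
    return power
```

Run on an irregularly sampled noiseless sinusoid, the power spectrum peaks sharply at the true frequency; each trial frequency is independent of the others, which is the property a GPU implementation exploits.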
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, Signal & Audio Processing, GTC 2010 - ID S102082
Streaming:
Download:
 
Cosmology Powered by GPUs Redux
Dominique Aubert (Strasbourg University)
Cosmological simulations aim to reproduce the physical processes that have occurred on the largest scales of the Universe since the Big Bang by means of numerical calculations on supercomputers. Using CUDA, I have implemented standard cosmological techniques on GPU architectures (a PM N-body solver, hydrodynamics, and moment-based radiative transfer) and designed them to run on supercomputing facilities by means of mixed MPI+CUDA programming. These applications are able to run on 100 or more graphics devices with typical 50x accelerations over scalar code and a communication overhead limited to 15%, allowing the exploration of physical regimes that were out of reach of previous simulations.
 
Keywords:
Astronomy & Astrophysics, GTC 2010 - ID S102099
Streaming:
Download:
 
Binary Black Holes Simulations using CUDA
Abdul Mroue (CITA, University of Toronto)
Get the latest information on how to evolve binary black hole simulations on GPUs.
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, Physics Simulation, GTC 2010 - ID S102108
Streaming:
Download:
 
Using GPUs to Track Changes in the Sun
Mark Cheung (Lockheed Martin Solar & Astrophysics Laboratory)
Learn how GPU computing is enabling astrophysicists to study our closest star. NASA's recently launched Solar Dynamics Observatory is continuously streaming full-disk images of the Sun at visible, UV, and EUV wavelengths. This presentation will discuss ways that GPU computing is helping scientists cope with the analysis of the immense data volumes as well as with numerical modeling of the Sun.
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, Computer Vision & Machine Vision, Physics Simulation, GTC 2010 - ID S102178
Streaming:
Download:
 
Multiparticle Simulation
Alice Quillen
A diverse array of science, engineering, and computer graphics applications involve simulations of large numbers of particles. These involve computation of interactions between many particles, potentially mediated by a spatial data structure such as a grid. Improvements in computational efficiency can be achieved by sorting particles to determine which ones are involved in interactions or undergo close approaches. Nearest-neighbor or collision-pair groupings can reduce the total number of computation steps by reducing the number of collision queries, or can speed up and improve the accuracy of simulations via a multiple-timestep integrator. Identifying nearest-neighbor and collision-partner groupings is a task that can be efficiently implemented in parallel on the GPU, reducing the number of interactions that must be computed. A broad class of problems known as Particle-In-Cell (PIC) codes advect particles through the cells of a surrounding grid. During this roundtable we will discuss strategies for increasing the efficiency of multiparticle simulations in general, as well as challenges for multiparticle simulation in specific settings such as astrophysics, SPH, PIC, and granular flows.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09056
Streaming:
Download:
 
Astrophysical Fluid Simulation Using Adaptive Meshes
Peng Wang (NVIDIA)
Adaptive mesh fluid simulations play a crucial role in many areas of astrophysical research including the formation and explosion of stars, jets from black holes, etc. A parallel adaptive mesh multi-physics fluid code, Enzo, has been widely used in the astrophysical community in recent years. In this talk I will describe a CUDA implementation of the finite volume fluid solver used in Enzo. The GPU version shows significant speed-up compared to the CPU version.
 
Keywords:
Astronomy & Astrophysics, High Performance Computing, GTC 2009 - ID S09062
Streaming:
Download:
 
Diesel-Powered GPU Computing: Enabling a Real-Time Radio Telescope in the Australian Outback
Richard Edgar
The Murchison Widefield Array (MWA) is a next-generation radio telescope currently under construction in the remote Western Australia Outback. The raw data rate is 5 to 20 GiB/sec, precluding offline processing. Since the computing budget for calibration and imaging is 20 TFLOP/sec, a real-time high-performance computer is required on-site. We describe a scalable heterogeneous computing pipeline implementation, exploiting both the high computing density and FLOP-per-watt ratio of modern GPUs. The architecture is highly parallel within and across nodes, with all major processing elements performed by the GPUs. Necessary scatter-gather operations along the pipeline are loosely synchronized and implemented in MPI. Our initial port to NVIDIA hardware shows a typical 10x improvement over the reference CPU implementation, with some portions showing even more substantial gains. The MWA will be a frontier scientific instrument and a demonstrator for planned peta- and exascale facilities.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09065
Streaming:
Download:
 
Computational Fluid Dynamics (CFD) for the GPU
The field of computational fluid dynamics (CFD) has far-reaching applications and displays a consistent need for larger and faster simulations. At EM Photonics we have been studying this field and its computational needs for two years. We have identified the GPU as a strong performer in the CFD field and as such have implemented solvers that harness the power of GPUs in the application of CFD formulations. We will present some background on these innovations in this summary discussion.
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, Physics Simulation, GTC 2009 - ID S09074
Streaming:
Download:
 
Visualizing the Universe: Raycasting Astrophysical Simulation Data
Ralf Kaehler
We use GPU-assisted raycasting to render large, three-dimensional time-dependent astrophysical AMR data sets at interactive frame rates on standard desktop computers. Our approach allows us to embed unstructured point datasets, like stars or galaxy splats, into the rendering of gaseous interstellar or intergalactic material. The approach supports a combined color-mapping of several input data fields and allows for a very flexible adaptation to the special requirements of different types of simulations. Its interactivity makes it a useful tool for data analysis as well as for fast generation of high-quality animations from astrophysical datasets. We will show various resulting animations ranging from large scale structure formation in the early universe, to the evolution of the first stellar object and the cosmological reionization era. Finally, we will give an overview of lessons learned and opportunities for future work.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09112
Download:
 
Applications of Graphics Processing Units to the Binary Black Hole Evolutions
John Silberholz
We apply general-purpose computation on GPUs to obtain sizable speedups over a CPU in post-Newtonian evolutions of a binary black hole system. We discuss effective techniques for optimizing our GPU code on the CUDA architecture and present results demonstrating the speedups obtained. We also describe an MPI-based approach for scaling a large number of binary black hole simulations over multiple GPUs. This approach will allow us to complete the largest scientific GPU calculation to date using the NCSA Lincoln cluster.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09402
Download:
 
Directing Experiments in the International Space Station With GPU-Assisted Image Analysis
Peter Lu
We implement image correlation, a fundamental component of many real-time imaging and tracking systems, on a graphics processing unit (GPU) using NVIDIA's CUDA. We use our code to analyze images of liquid-gas phase separation in a model colloid-polymer system, photographed in the absence of gravity aboard the International Space Station (ISS). Our GPU code is 4000 times faster than simple MATLAB code performing the same calculation on a central processing unit (CPU), 130 times faster than simple C code, and 30 times faster than optimized C++ code using single-instruction, multiple-data (SIMD) extensions. The speed increases from these parallel algorithms enable us to analyze images downlinked from the ISS rapidly and send feedback to astronauts on orbit while the experiments are still being run.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09437
Download:
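The image correlation described in the session above is typically computed with FFTs, the same structure a CUDA port parallelizes with batched transforms. A minimal CPU sketch of the idea in NumPy (function name, image sizes, and the shift-recovery demo are illustrative, not the presenter's actual code):

```python
import numpy as np

def cross_correlate(a, b):
    """Circular cross-correlation of two equal-size 2-D images via the FFT.

    corr = IFFT( FFT(a) * conj(FFT(b)) ) evaluates every shift at once;
    this is the structure a GPU implementation parallelizes.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    return np.real(np.fft.ifft2(fa * np.conj(fb)))

# Shifting an image and correlating it against the original puts the
# correlation peak at the shift vector.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, 12), axis=(0, 1))
corr = cross_correlate(shifted, img)
peak = np.unravel_index(np.argmax(corr), corr.shape)
print(peak[0], peak[1])  # 5 12
```

The same recipe drops onto a GPU by swapping the transforms for a GPU FFT library; the per-pixel multiply and peak search are embarrassingly parallel.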
 
Binary Black Holes using GPUs
Frank Herrmann
We perform ensemble studies of binary black hole inspirals. The binary black hole problem is of great interest for the cosmological community (merger of galaxies with BHs at the center) as well as the gravitational wave community (where the merger of BHs is the most important signal source). The full binary black hole merger problem is computationally very demanding and even with advanced numerical techniques ensemble studies are currently not possible. Using a standard approximate solution to Einstein's equation (the post-Newtonian equations) one can accurately model the inspiral until shortly before merger, when the approximation techniques break down. Utilizing this approximation technique we study the 7-dimensional parameter space of the BH merger problem using a Monte-Carlo approach, which extends very naturally to GPUs.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09441
Download:
 
Numerical Cosmology Powered by GPUs
Dominique Aubert
By definition, cosmology cannot rely on lab experiments to reproduce the phenomena observed in the sky and test its theories. For this very reason, the use of numerical simulations is widespread within this community in order to understand the formation of astrophysical objects and to put constraints on the physical ingredients that lead to the Universe as it is currently observed. Since 2001, I have been personally involved in trying to understand these questions through the intensive use of numerical simulations that reproduce the evolution of the Universe from the Big Bang to our epoch. During the last two years, I have been investigating the new possibilities offered by GPUs to boost these numerical calculations, mostly using CUDA. At the current stage, three applications have benefited from these studies; using 8800 GTX and Tesla C1060 devices, we found accelerations ranging from factors of 20 to 80 compared to the CPU versions:
- CUDAPM, a cosmological N-body integrator. It follows the evolution of millions of particles that interact through gravitation in an expanding Universe, modelling the rise of large-scale structures.
- CUDAMOND, a non-linear full multigrid solver for the Poisson equation of modified Newtonian gravity.
- CUDATON, a cosmological radiative transfer code. It models the propagation of ionising radiation and its effect on the gas that filled the early Universe. This application is multi-GPU and currently runs on 192 devices at the CCRT supercomputing centre.
Most of the techniques used in these applications are fairly standard and are not specific to astrophysics and cosmology. Therefore, describing my own experience of porting these applications to GPUs as a physicist is likely to benefit a wide audience of numerical scientists.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09442
Download:
 
Black Holes in Galactic Nuclei Simulated with Large GPU Clusters in CAS
Rainer Spurzem
- National Astronomical Observatories, Chinese Academy of Sciences
Many, if not all, galaxies harbour supermassive black holes. If galaxies merge, which is quite common in the process of hierarchical structure formation in the universe, their black holes sink to the centre of the merger remnant and form a tight binary. Depending on initial conditions and time, supermassive black hole binaries are prominent gravitational wave sources if they ultimately come close together and coalesce. We model such systems as gravitating N-body systems (stars) with two or more massive bodies (black holes), including if necessary relativistic corrections to the classical Newtonian gravitational forces (Kupi et al. 2006, Berentzen et al. 2009).
 
Keywords:
Astronomy & Astrophysics, GTC 2010 - ID P10B01
Download:
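The direct N-body force evaluation underlying such simulations is an all-pairs O(N²) sum, the kernel that GPU N-body codes map one body per thread. A small softened-gravity sketch in NumPy (illustrative only; production codes such as those above use high-order Hermite integrators and relativistic corrections):

```python
import numpy as np

def accelerations(pos, mass, eps=1e-2):
    """All-pairs softened gravitational accelerations with G = 1.

    a_i = sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2).
    Each body's sum is independent, which is why a GPU can assign
    one body per thread.
    """
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]  # r_j - r_i
    dist2 = np.sum(diff**2, axis=-1) + eps**2
    inv_d3 = dist2**-1.5
    np.fill_diagonal(inv_d3, 0.0)  # exclude self-interaction
    return np.sum(diff * (mass[np.newaxis, :, None] * inv_d3[:, :, None]),
                  axis=1)

# Two equal masses: the accelerations are equal, opposite, and attractive.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
mass = np.array([1.0, 1.0])
acc = accelerations(pos, mass)
print(np.allclose(acc[0], -acc[1]))  # True
```

The softening parameter `eps` (an assumption of this sketch) regularizes close encounters, as is standard in collisionless N-body codes.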
 
GAMER: GPU-accelerated Adaptive-Mesh-Refinement Code for Astrophysics
Hsi-Yu Schive
- Physics Dept., NTU
 
Keywords:
Astronomy & Astrophysics, GTC Taiwan 2011 - ID GTCT1105
Download:
 
Scalable Frameworks and Algorithms for Terascale Radio Astronomy Images
Christopher Fluke (Swinburne University of Technology - Centre for Astrophysics and Supercomputing)
Learn how the oldest science is using the newest processors to solve a critical problem: how to accomplish traditional image analysis and visualization tasks when the images are terabytes in size? Simple, standard operations such as displaying 2-d slices, evaluating image statistics, and applying histogram equalization become manifestly challenging when images dramatically exceed single-node memory capacity. We will explain how our hybrid CPU-GPU cluster framework - which can volume render a 200GB image at >50fps! - will support traditional radio astronomy tasks for the colossal images that the Square Kilometre Array and its precursor, the Australian SKA Pathfinder, will generate.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2022
Streaming:
Download:
 
GPU Acceleration of Dense Stellar Clusters Simulation
Bharath Pattabiraman (Northwestern University), Stefan Umbreit (Northwestern University)
Computing the interactions between stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. This paper presents the parallelization of a Monte Carlo algorithm for simulating stellar cluster evolution using programmable Graphics Processing Units. The kernels of this algorithm exhibit high levels of data dependent decision making and unavoidable non-contiguous memory accesses. However, we adopt various parallelization strategies and utilize the high computing power of the GPU to obtain substantial near-linear speedups which cannot be easily achieved on a CPU-based system. This acceleration allows us to explore physical regimes which were out of reach of current simulations.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2087
Streaming:
Download:
 
Signal Processing on GPUs for Radio Telescopes
John Romein (ASTRON)
In this talk, we will present GPU implementations of four highly compute-intensive algorithms used by radio telescopes.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2124
Streaming:
Download:
 
GPUs for Radio Imaging
Vamsi Krishna Veligatla (University Of Groningen)
With the advent of a new breed of telescopes like the Low Frequency Array (LOFAR), which rely on software to process the large data-sets they generate, the software must run as fast as possible so that those data-sets can be processed in a reasonable time. In this session we describe how we have used the computing power of GPUs to improve the performance of standard radio imaging techniques, and how this computational power enables a new generation of radio imaging algorithms.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2187
Streaming:
Download:
 
Accelerating Radio Astronomy Cross-Correlation Beyond 1 Tflops Using Fermi
Michael Clark (NVIDIA)
Radio astronomy is a signal processing application that requires extreme supercomputing. While today's radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase to 1 Exaflops. The most compute intensive part of this problem is the so-called cross-correlation algorithm, which is a linear-algebra problem. In this session we demonstrate that the Fermi architecture is ideally suited to this problem, and through exploiting the Fermi memory hierarchy it is possible to achieve close to 80% of peak performance in a real application.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2347
Streaming:
Download:
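The cross-correlation ("X-engine") algorithm mentioned in this session reduces, per frequency channel, to accumulating X X^H over time: a Hermitian rank-k update much like GEMM, which is why it can approach peak FLOPs on GPUs. A toy NumPy version (shapes and names are ours, not the session's code):

```python
import numpy as np

def correlate(samples):
    """Accumulate the visibility matrix V[i, j] = sum_t x_i(t) * conj(x_j(t)).

    samples: complex array of shape (n_antennas, n_time). Written as the
    matrix product X @ X^H, this is the dense linear-algebra form that
    maps so efficiently onto GPU hardware.
    """
    return samples @ samples.conj().T

# Simulated noise from 4 antennas over 1000 time samples.
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 1000)) + 1j * rng.normal(size=(4, 1000))
v = correlate(x)

# The visibility matrix is Hermitian with real, positive autocorrelations
# on the diagonal.
print(np.allclose(v, v.conj().T))  # True
```

In a real correlator only the lower (or upper) triangle is computed and the accumulation is tiled through fast on-chip memory; the sketch shows just the mathematical core.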
 
Adaptive Beam-forming for Radio Astronomy on GPUS
Vamsi Krishna Veligatla (University Of Groningen)
With the advent of a new breed of telescopes like the Low Frequency Array (LOFAR), which rely on software to process the large data-sets they generate, the software must run as fast as possible so that those data-sets can be processed in a reasonable time. In this session we describe how we have used the computing power of GPUs to improve the performance of standard radio imaging techniques, and how this computational power enables a new generation of radio imaging algorithms.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2191
Download:
 
Accelerating Real-Time Processing of the ATST Adaptive Optics System
Vivek Venugopal (United Technologies Research Center)
The real-time processing of the four-meter Advanced Technology Solar Telescope (ATST) adaptive optics (AO) system, with approximately 1750 sub-apertures and 1900 actuators, requires massive parallel processing to complete the task. This parallelism is harnessed with the addition of hardware accelerators such as Graphics Processing Units (GPUs). We investigate a hybrid data processing architecture for the Shack-Hartmann correlation and wavefront reconstruction using FPGAs and GPUs. The ATST AO algorithm is implemented and benchmarked on the FPGA-GPU system and compared with the existing legacy Digital Signal Processor (DSP) based hardware system.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2446
Download:
 
Cosmological Calculations on the GPU
Deborah Bard (SLAC National Accelerator Laboratory)
Cosmological measurements often involve the calculation of non-trivial quantities over increasingly large datasets. The next generation of survey telescopes will yield information for billions of galaxies. The scale of the datasets, and the type of calculations involved, make them ideally suited to the GPU. We present two cosmological measurements, and describe the implementation and improvements found with the GPU.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2509
Download:
 
Fast Cross-Matching of Astronomical Catalogs on GPUs
Matthias Lee (Johns Hopkins University)
We present a method of cross-matching objects of large astronomical catalogs (over 150 million objects) in under 4 minutes. We utilize up to six NVIDIA C2050 GPUs and have achieved an over 40x speedup versus conventional methods.
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2524
Download:
 
"Big Data" Astronomical Data Analysis and Visualization
Amr Hassan (Swinburne University of Technology)
I will present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of "big data" astronomical data cubes. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image: volume rendering at 10 fps; computation of basic statistics in 1.7 s; and evaluation of the median in 45 s. The framework is one of the first solutions to the image analysis and visualization requirements of next-generation telescopes, including the forthcoming SKA pathfinder telescopes.
 
Keywords:
Astronomy & Astrophysics, Supercomputing 2012 - ID SB001
Download:
 
Parallel Simulation of the Galaxy with Dark Matter using GPUs/CPUs
Pawel Czarnul (Gdansk University of Technology)
The poster presents parallel simulation of the galaxy with dark matter. First, one of several models of dark matter distribution is assumed and, based on the known laws, simulation of the galaxy proceeds in successive time steps. Computations have been parallelized using both CPUs and GPUs and execution times are presented for particular devices for the aforementioned application. Furthermore, visualization of the simulation is provided, which gives a view of the universe from a desired angle.
 
Keywords:
Astronomy & Astrophysics, Scientific Visualization, GTC 2013 - ID P3141
Download:
 
Acceleration of a 3D WENO Scheme for Large-Scale Cosmological Simulations on GPU
Long Wang (Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences)
We present our implementation of a 3D 5th-order finite-difference WENO scheme in double precision on CPU/GPU clusters, targeting large-scale cosmological hydrodynamic flow simulations involving both shocks and complicated smooth solution structures. At the MPI parallelization level, we subdivide the domain cubically. On each process, we then port the WENO computation to the GPU. To work around the memory limitations of GPUs, we performed a series of optimizations. Our tests on Fermi and Kepler GPUs indicate that the GPU version achieves a 12-19x speedup, and the computation part is about 19-36 times faster than the serial Fortran code. Finally, we discuss future work.
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, GTC 2013 - ID P3157
Download:
 
GPU-enabled Precision Measurements of the Structure of the Universe
Deborah Bard (SLAC National Accelerator Laboratory)
Future astronomical surveys will characterize tens of billions of galaxies. Calculating cosmological observables, such as correlation functions, over such vast datasets poses a significant computational challenge. Such calculations are ideally suited to parallelization. This poster describes the implementation of the full two-point correlation function on the GPU, and demonstrates the improvement in accuracy compared to current fast approximation methods. We take advantage of the scaling capabilities of GPUs by showing how systematic errors can only be fully explored using the compute power of many GPUs.
 
Keywords:
Astronomy & Astrophysics, GTC 2013 - ID P3164
Download:
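The two-point correlation function in the poster above is built from brute-force pair counting, an O(N²) kernel that parallelizes trivially across GPU threads. A tiny NumPy sketch using the simple Peebles-Hauser estimator (the estimator choice, function names, and data here are illustrative; the poster does not specify its estimator):

```python
import numpy as np

def pair_count_hist(points_a, points_b, bins):
    """Histogram of pairwise separations: the O(N^2) kernel a GPU
    parallelizes, typically one block of pairs per thread block."""
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    return np.histogram(d.ravel(), bins=bins)[0]

def two_point_xi(data, rand, bins):
    """Peebles-Hauser estimator xi(r) = (DD / RR) * (N_r / N_d)^2 - 1,
    comparing data pair counts against a uniform random catalog."""
    dd = pair_count_hist(data, data, bins).astype(float)
    rr = pair_count_hist(rand, rand, bins).astype(float)
    norm = (len(rand) / len(data)) ** 2
    return dd / rr * norm - 1.0

# Uncorrelated uniform points: xi(r) should scatter around zero.
rng = np.random.default_rng(2)
data = rng.random((200, 3))
rand = rng.random((400, 3))
bins = np.linspace(0.01, 0.5, 8)
xi = two_point_xi(data, rand, bins)
print(xi.shape)  # (7,)
```

Survey-scale analyses replace the dense distance matrix with tiled, chunked pair counting so that memory stays bounded while the arithmetic remains the same.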
 
Simulating Black Holes with CUDA
Adam Lewis (Canadian Institute for Theoretical Astrophysics (CITA))
This decade will see the first detections of gravitational waves: ripples in spacetime produced most strongly by collisions of dense objects like black holes. It will then be possible to study such events through careful comparison of their gravitational radiation against predictions generated through simulations. These simulations are computationally very expensive, requiring tens of thousands of floating-point operations per grid point per time step. Using NVIDIA's CUDA framework, we have developed techniques to automatically port our black hole code to GPUs. We have also manually optimized certain key routines, which sped up by 10-50 times as a result.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID P3230
Download:
 
Black Holes and Star Clusters in Galactic Nuclei simulated with more than 100k GPU cores
Rainer Spurzem (National Astronomical Observatories, Chinese Academy of Sciences)
100k GPU core benchmark simulations of galactic nuclei and star clusters with high precision direct N-body; on the path to million cores and Exascale...
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2013 - ID P3242
Download:
 
GPU Accelerated Simulations and Real-time Control of the E-ELT Adaptive Optics Systems
Damien Gratadour (LESIA - Observatoire de Paris)
Adaptive Optics (AO) is an instrumental technique for the correction of dynamically evolving aberrations in optical systems, used on astronomical telescopes to compensate, in real-time, for atmospheric turbulence. Our team has developed a simulation code based on YoGA, an original binding between Yorick, an interpreted programming language, and CUDA. Using this code, speedups of 10x are obtained as compared to currently available CPU codes. We will present the various features of the code and its performance for various system dimensionings and GPUs. Additionally, we will present profiles of a GPU-based AO real-time controller simulator demonstrating performance compatible with real-time operations.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID P3213
Download:
 
The Telescope Array Fluorescence Detector Simulation on GPUs
Tareq AbuZayyad (University of Utah)
The Telescope Array Cosmic Rays Detector located in the Western Utah Desert is used for the observation of ultra-high energy cosmic rays. The simulation of a fluorescence detector response to cosmic-ray-initiated air showers presents many opportunities for parallelization. In this presentation we report on the Monte Carlo program used for the simulation of the Telescope Array fluorescence detector located at the Middle Drum site. The program makes extensive use of GPU acceleration to achieve a 50x speedup compared to running on a single CPU core. All of the physics simulation, from shower development, light production and propagation with atmospheric attenuation, to the realistic detector optics and electronics simulations, is done on the GPU. A detailed description of the code implementation is given, and results on the accuracy and performance of the simulation are presented as well.
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2013 - ID S3189
Streaming:
Download:
 
Powering Real-time Radio Astronomy Signal Processing with GPUs
Harshavardhan Reddy Suda (GMRT Observatory, National Centre for Radio Astrophysics, TIFR, Pune, India), Pradeep Kumar Gupta (NVIDIA)
The goal of this session is to demonstrate the power of GPUs in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this exciting new application of GPUs. Modern radio astronomy telescopes are multiple antenna instruments where the wideband data from each antenna needs to be processed in real-time to implement digital receiver systems such as correlators and beamformers. We will demonstrate how such compute and data I/O intensive algorithms can be implemented on a distributed GPGPU system, with a fully real-time realisation. Hybrid computing techniques, such as CUDA on the GPU and OpenMP & MPI to synchronise the distributed host machines and handle the large I/O between them, are key elements of such designs. Optimised implementation of signal processing algorithms such as FFT and MAC on GPUs, as well as the use of streams to optimise computing and I/O on the GPU, will be addressed in detail. All these concepts will be illustrated with the example of the prototype GPGPU correlator and beamformer that we have developed for the GMRT, which is a 30-antenna radio telescope with 400 MHz bandwidth dual-polarised signals from each antenna, arriving at a sustained input data rate of 24 GBytes/sec.
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, GTC 2013 - ID S3225
Streaming:
Download:
 
Signal Processing on GPUs for Radio Telescopes
John Romein (ASTRON Netherlands Institute for Radio Astronomy)
This talk will present research on accelerator-based computing for radio telescopes, showing GPU implementations of a dozen (signal-processing) algorithms used by radio telescopes, e.g., filtering, correlating, beam forming, dedispersion, and peak detection. Glued together, these computational kernels form several processing pipelines. Each pipeline implements an observation mode, as used by the LOFAR radio telescope. The implemented pipelines create sky images, search for pulsars, observe known pulsars, and detect ultra-high-energy particles; they were first implemented on a Blue Gene/P and then ported to GPUs. This talk will briefly explain these algorithms and processing pipelines, and show performance results, multi-GPU scaling results, and the impact on energy efficiency. The research is relevant to current radio telescopes like LOFAR, and to the future SKA telescope, which needs exascale computing power.
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, GTC 2013 - ID S2124
Streaming:
Download:
 
ENZO Hydrodynamics and Magnetohydrodynamics Solvers on GPU
Peng Wang (NVIDIA)
Learn the porting of ENZO solvers to GPU. ENZO is a block-structured adaptive mesh refinement (AMR) astrophysical fluid dynamics code used for simulating cosmological structure formation. It is one of the most commonly used community code in ast ...Read More

Learn about porting ENZO solvers to the GPU. ENZO is a block-structured adaptive mesh refinement (AMR) astrophysical fluid dynamics code used for simulating cosmological structure formation. It is one of the most commonly used community codes in astrophysics. We have ported the PPM hydrodynamics and magnetohydrodynamics solvers to the GPU and fully integrated the GPU solvers into the AMR framework. This talk will describe the porting strategy and performance results.

  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID S3401
Streaming:
Download:
 
Accelerating Radio Astronomy Cross-correlation Using the Kepler Architecture
Ben Barsdell (Harvard University)
Radio astronomy is a real-time signal processing application that requires extreme supercomputing. While today's radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase into the Exaflop ...Read More

Radio astronomy is a real-time signal processing application that requires extreme supercomputing. While today's radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase into the Exaflops regime, driven by the Hydrogen Epoch of Reionization Array (HERA) and the Square Kilometer Array (SKA). The most compute-intensive part of this problem is the so-called cross-correlation algorithm, which can be recast as a linear-algebra problem similar in spirit to DGEMM. In this session we describe the cross-correlation engine that powers the pathfinder LEDA radio telescope and has been (re)optimized for the Kepler GK110 architecture to achieve over 2.5 Tflops in sustained performance. This level of efficiency is critical to meeting strict power and space constraints imposed by the instrument's remote location.
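The recasting of cross-correlation as linear algebra can be made concrete: with the antenna sample streams arranged as rows of a complex matrix X, the visibility matrix is V = X X^H, structurally the same triple loop as a (Hermitian) matrix multiply, which is why GEMM-style GPU tiling applies. A plain-Python sketch with toy dimensions (illustrative, not the LEDA kernel):

```python
def xcorr_matrix(x):
    """Correlation matrix V = X X^H for rows of complex samples.
    V[i][j] accumulates x_i(t) * conj(x_j(t)) over time, exactly the
    i-j-t loop nest of a matrix multiplication."""
    n = len(x)       # number of signals (antennas)
    m = len(x[0])    # number of time samples accumulated
    return [[sum(x[i][t] * x[j][t].conjugate() for t in range(m))
             for j in range(n)] for i in range(n)]
```

In practice only the upper (or lower) triangle of the Hermitian result needs to be computed and stored.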

  Back
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2013 - ID S3497
Streaming:
Download:
 
Cosmology on the GPU
Claudio Gheller (ETH CSCS)
Numerical simulations represent one of the most effective tools to study and to solve astrophysical problems. Thanks to the enormous technological progress in recent years, the available supercomputers now make it possible to study the details of compl ...Read More

Numerical simulations represent one of the most effective tools to study and to solve astrophysical problems. Thanks to the enormous technological progress in recent years, the available supercomputers now make it possible to study the details of complex processes, like galaxy formation or the evolution of the large-scale structure of the universe. Sophisticated numerical codes can exploit the most advanced HPC architectures to simulate such phenomena and to process and visualize their results. Enzo, Ramses and Splotch are prime examples of such codes. Work is ongoing to port these codes to GPUs using the CUDA and OpenACC programming models. The accomplished refactoring work, together with recent tests and results, is presented.

  Back
 
Keywords:
Astronomy & Astrophysics, Supercomputing & HPC, GTC 2013 - ID S3555
Streaming:
Download:
 
Follow the Light: Plasma Physics on 18,000 GPUs
Richard Pausch (Helmholtz-Zentrum Dresden - Rossendorf), Guido Juckeland (ZIH, Technical University Dresden)
We show that with today's largest supercomputers it is possible to follow the trajectories of billions of particles, computing a unique fingerprint of their dynamics. With the use of 18,000 GPUs we could compute a 'sky map' of the radiation emitted ...Read More
We show that with today's largest supercomputers it is possible to follow the trajectories of billions of particles, computing a unique fingerprint of their dynamics. With the use of 18,000 GPUs we could compute a 'sky map' of the radiation emitted by individual electrons in a large-scale, turbulent plasma, providing unique insight into the relation between the plasma dynamics and observable radiation spectra.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4139
Streaming:
Download:
 
Real-Time Imaging in Radio-Astronomy: A Fully GPU-Based Imager
Sanjay Bhatnagar (National Radio Astronomy Observatory), Pradeep Kumar Gupta (NVIDIA)
We are implementing a fully GPU-based imager for radio interferometric imaging, for high-sensitivity near real-time imaging. Modern interferometric radio telescopes generate many terabytes of data per observation, which need to be imaged in near ...Read More

We are implementing a fully GPU-based imager for radio interferometric imaging, for high-sensitivity near real-time imaging. Modern interferometric radio telescopes generate many terabytes of data per observation, which need to be imaged in near-real time. Imaging software running on conventional computers currently takes many orders of magnitude longer. In this presentation, we will briefly describe the algorithms and describe in more detail their adaptation for GPUs in particular and for heterogeneous computing in general. We will discuss the resulting run-time performance on the GPU using real data from existing radio telescopes. Tests with our current implementation show a speed-up of up to 100x compared to the CPU implementation in the critical parts of the processing, enabling us to reduce the memory footprint by replacing compute-and-cache with on-demand computing on the GPU. For scientific use cases requiring high-resolution, high-sensitivity imaging, such a GPU-based imager is an enabling technology.

  Back
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics, GTC 2014 - ID S4223
Streaming:
Download:
 
High Resolution Astrophysical Fluid Dynamics Simulations on a GPU Cluster
Pierre Kestener (CEA)
A wide range of major astrophysical problems can be investigated by means of computational fluid dynamics methods, and performing numerical simulations of Magneto-Hydrodynamics (MHD) flows using realistic setup parameters can be very challenging. We ...Read More
A wide range of major astrophysical problems can be investigated by means of computational fluid dynamics methods, and performing numerical simulations of Magneto-Hydrodynamics (MHD) flows using realistic setup parameters can be very challenging. We will first report on technical expertise gained in developing the code Ramses-GPU, designed for the efficient use of large clusters of GPUs in solving MHD flows. We will illustrate how challenging state-of-the-art highly resolved simulations requiring hundreds of GPUs can provide new insights into real case applications: (1) the study of the Magneto-Rotational Instability and (2) high Mach number MHD turbulent flows.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, Supercomputing & HPC, GTC 2014 - ID S4274
Streaming:
Download:
 
Conquering the Titan Supercomputer: A Star-by-Star Simulation of the Milky Way Galaxy
Evghenii Gaburov (SURFsara), Jeroen Bedorf (Leiden Observatory)
In this session we demonstrate how we are able to leverage the massive parallelism of thousands of GPUs inside the Titan supercomputer and simulate the past and future of the Milky Way Galaxy on a star-by-star basis in less than 10 days. T ...Read More
In this session we demonstrate how we are able to leverage the massive parallelism of thousands of GPUs inside the Titan supercomputer and simulate the past and future of the Milky Way Galaxy on a star-by-star basis in less than 10 days. The audience will learn what it takes to parallelize an advanced hierarchical GPU tree-code to run efficiently on the Titan supercomputer. A gravitational N-body problem is by definition an all-to-all problem, and it is of utmost importance for scalability to hide data communication behind computation. This turned out to be a major challenge on the Titan supercomputer because Bonsai's GPU kernels are ~3x faster on Kepler than on Fermi, which reduced compute time and as a result hampered scalability. We solved this by redesigning the communication strategy to take full advantage of each of the 16 CPU cores while the GPUs were busy computing gravitational forces. This allowed Bonsai to scale to more than 8192 GPUs.  Back
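The all-to-all character of the gravitational N-body problem is easiest to see in the direct-summation kernel that a tree code such as Bonsai approximates for distant particles. A minimal Python sketch (G = 1, Plummer softening; illustrative only, not the Bonsai kernel):

```python
def accelerations(pos, mass, eps=1e-3):
    """O(N^2) all-pairs gravitational accelerations (G = 1) with Plummer
    softening eps. Every particle interacts with every other particle,
    which is the all-to-all communication pattern the talk describes."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc
```

A tree code replaces the inner loop over distant j with interactions against grouped tree nodes, reducing the cost from O(N^2) toward O(N log N).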
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4347
Streaming:
Download:
 
Driving the Next Generation of Extremely Large Telescopes Using Adaptive Optics with GPUs
Damien Gratadour (LESIA - Observatoire de Paris)
The European Southern Observatory is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m diameter telescope, to provide Europe with the biggest eye on the Universe ever built, with a first light foreseen in 2022. The E-E ...Read More
The European Southern Observatory is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m diameter telescope, to provide Europe with the biggest eye on the Universe ever built, with first light foreseen in 2022. The E-ELT will be the first telescope that will entirely depend, for routine operations, on adaptive optics (AO), an instrumental technique for the correction of dynamically evolving aberrations in an optical system, used on astronomical telescopes to compensate, in real time, for the effect of atmospheric turbulence. In this session, we will show how GPUs can provide the throughput required to both simulate at high framerate and drive in real time these AO systems, which provide tens of thousands of degrees of freedom activated several hundred times per second.   Back
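The throughput demand can be made concrete: the real-time step of an AO control loop is, at its core, a dense matrix-vector multiply of a reconstructor matrix with the wavefront-sensor slope vector, repeated every cycle. A toy Python sketch with a back-of-envelope FLOP estimate; the sizes in the usage example are hypothetical, chosen only to show the scale of the arithmetic:

```python
def reconstruct(R, slopes):
    """One AO control update: command vector c = R @ s (dense mat-vec)."""
    return [sum(row[j] * slopes[j] for j in range(len(slopes))) for row in R]

def matvec_gflops(n_actuators, n_slopes, rate_hz):
    """Sustained GFLOP/s needed to apply an n_actuators x n_slopes
    reconstructor at rate_hz (counting 2 flops per multiply-add)."""
    return 2.0 * n_actuators * n_slopes * rate_hz / 1e9
```

For example, a hypothetical 10,000 x 20,000 reconstructor applied 500 times per second already demands a sustained 200 GFLOP/s, before any latency constraint is considered.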
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Supercomputing & HPC, GTC 2014 - ID S4357
Streaming:
 
RAMSES on the GPU: An OpenACC-Based Approach
Claudio Gheller (ETH-CSCS)
We present the work accomplished to port the numerical code "RAMSES" to the GPU, in order to efficiently exploit hybrid accelerated HPC architectures. RAMSES is a code designed for the study of astrophysical problems on different scales (e. ...Read More
We present the work accomplished to port the numerical code "RAMSES" to the GPU, in order to efficiently exploit hybrid accelerated HPC architectures. RAMSES is a code designed for the study of astrophysical problems on different scales (e.g. star formation, galaxy dynamics, large-scale structure of the universe), treating at the same time various components (dark energy, dark matter, baryonic matter, photons) and including a variety of physical processes (gravity, magneto-hydrodynamics, chemical reactions, star formation, supernova and AGN feedback, etc.). It is implemented in Fortran 90 and adopts the OpenACC paradigm to offload some of the most computationally demanding algorithms to the GPU. Two different strategies have been pursued for code refactoring, in order to explore complementary solutions and select the most effective approach. The resulting algorithms are presented together with the results of tests, benchmarks and scientific use cases.  Back
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4365
Streaming:
Download:
 
Black Holes on the GPU: Experiences with Accelerated Relativity
Adam Lewis (University of Toronto/ CITA)
New "telescopes" that directly observe the spacetime fluctuations from black holes will come online within the next few years, but the data they generate will be meaningless unless compared against banks of known signals. Creating these ban ...Read More
New "telescopes" that directly observe the spacetime fluctuations from black holes will come online within the next few years, but the data they generate will be meaningless unless compared against banks of known signals. Creating these banks requires black hole mergers of many different masses, spins, and orbital eccentricities to be simulated. This is not yet feasible, since even a single simulation may take several months. GPU acceleration offers a theoretical speedup of 50X, but until now has been too laborious to attempt. This is no longer the case: using a combination of hand-coding in CUDA, calls to CUBLAS and cuSPARSE, and our own automatic porting routine "CodeWriter," we have successfully accelerated the C++-based "Spectral Einstein Code". I will discuss our porting strategy, the challenges we encountered, and the new science made possible by the GPU. This talk should be of particular interest to scientists working on GPU ports of their own codes.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Developer - Programming Languages, Supercomputing & HPC, GTC 2014 - ID S4423
Streaming:
 
COBALT: Creating a High-Throughput, Real-Time Production System Using CUDA, MPI and OpenMP
Wouter Klijn (ASTRON), Jan David Mol (ASTRON)
We present our experiences in designing, building and deploying a massively parallel processing system for the LOFAR radio telescope using off-the-shelf hardware and software. After numerous hurdles, we created a high-throughput system, based on CUDA ...Read More
We present our experiences in designing, building and deploying a massively parallel processing system for the LOFAR radio telescope using off-the-shelf hardware and software. After numerous hurdles, we created a high-throughput system based on CUDA, MPI and OpenMP, running on multi-GPU, multi-socket servers and InfiniBand. Each of these techniques has an established niche; however, due to conflicting memory models, incompatible requirements and abstractions, the otherwise orthogonal techniques do not cooperate well within the same application. Using the project's timeline as a guide we will answer the following questions: (1) What problems appear when combining these techniques? (2) How did we adjust both the hardware and the software to meet our requirements? (3) How did we robustly develop and deploy to both development boxes and a production cluster? And, most importantly, (4) how does the system perform?   Back
 
Keywords:
Astronomy & Astrophysics, Developer - Programming Languages, Signal & Audio Processing, Supercomputing & HPC, GTC 2014 - ID S4441
Streaming:
Download:
 
Fire and Ice: How Temperature Affects GPU Performance
Danny Price (Harvard-Smithsonian Center for Astrophysics)
Is it worth cooling your GPUs, or should you run them hot? In this session, we discuss how operating temperature affects the computational performance of GPUs. Temperature-dependent leakage current effects contribute significantly to power dissipatio ...Read More
Is it worth cooling your GPUs, or should you run them hot? In this session, we discuss how operating temperature affects the computational performance of GPUs. Temperature-dependent leakage current effects contribute significantly to power dissipation in nanometer-scale circuits; within GPUs this corresponds to decreased performance per watt. We use the CUDA-based xGPU code for radio astronomy to benchmark Fermi and Kepler GPUs while controlling the GPU die temperature, voltage, and clock speed. We report on trends and relate these measurements to physical leakage current mechanisms.  Back
 
Keywords:
Astronomy & Astrophysics, Clusters & GPU Management, GTC 2014 - ID S4484
Streaming:
Download:
 
Petascale Cross-Correlation: Extreme Signal-Processing Meets HPC
Ben Barsdell (Harvard University)
How do you cross-correlate 10,000 signals 100 million times per second? This is an example of the type of compute-bound problem facing modern radio astronomy, which, paralleling the paradigm shift in computing architectures, has transitioned from mon ...Read More
How do you cross-correlate 10,000 signals 100 million times per second? This is an example of the type of compute-bound problem facing modern radio astronomy, which, paralleling the paradigm shift in computing architectures, has transitioned from monolithic single-dish telescopes to massive arrays of smaller antennas. In this session we will describe how general-purpose HPC installations can be used to achieve scaling of a cross-correlation pipeline to petascale with all the flexibility of a purely-software implementation. Optimisations we will discuss include tuning of the GPU cross-correlation kernel, maximising concurrency between compute and network operations, and minimising bandwidth bottlenecks in a streaming application. GPUs are already powering the world's biggest radio telescope arrays, and this work paves the way for entirely off-the-shelf correlators for the future exascale-generation of instruments.  Back
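The concurrency the abstract emphasizes, overlapping network receives with compute, can be sketched host-side with a small staging queue; on the device, CUDA streams give the same double-buffering effect. A toy Python sketch (the producer thread stands in for an asynchronous network receive; names are illustrative):

```python
import queue
import threading

def streaming_pipeline(chunks, compute):
    """Overlap 'receive' with compute: the producer stages the next chunk
    while the consumer processes the current one, so neither side idles
    as long as the two-slot buffer stays non-empty and non-full."""
    staged = queue.Queue(maxsize=2)   # double buffer

    def producer():
        for c in chunks:
            staged.put(c)             # simulates an async receive completing
        staged.put(None)              # end-of-stream sentinel

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (c := staged.get()) is not None:
        results.append(compute(c))
    return results
```

The same shape, with the queue replaced by pinned host buffers and stream events, is the standard way to hide PCIe and network latency in a streaming correlator.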
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, Supercomputing & HPC, GTC 2014 - ID S4511
Streaming:
Download:
 
Real-Time RFI Rejection Techniques for the GMRT Using GPUs
Rohini Joshi (Drexel University)
Radio frequency interference (RFI) is the primary enemy of sensitive multi-element radio instruments like the Giant Metrewave Radio Telescope (GMRT, India). Signals from radio receivers are corrupted with RFI from power lines, satellite signals, ...Read More

Radio frequency interference (RFI) is the primary enemy of sensitive multi-element radio instruments like the Giant Metrewave Radio Telescope (GMRT, India). Signals from radio receivers are corrupted with RFI from power lines, satellite signals, etc. Appearing as spikes and bursts in the raw voltage data, RFI shows up statistically as outliers in a Gaussian distribution. We present an approach to tackle the problem of RFI, in real time, using a robust scale estimator, the Median Absolute Deviation (MAD). Given the large data rate from each of the 30 antennas, sampled at 16 ns, it is necessary for the filter to work well within real-time limits. To accomplish this, the algorithm has been ported to GPUs to work within the GMRT pipeline. Presently, the RFI rejection pipeline runs in real time on 0.3-0.7 sec long data chunks. The GMRT will soon be upgraded to work at 10 times the current data rate. We are now working on improving the algorithm further so as to have the RFI rejection pipeline ready for the upgraded GMRT.
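The MAD-based flagging described above is compact enough to sketch directly. A minimal Python version (the 4-sigma threshold is an illustrative choice, not necessarily the GMRT pipeline's setting); the point of MAD is that 1.4826 x MAD estimates the standard deviation of Gaussian noise robustly, so the impulsive RFI being hunted barely perturbs the cut itself:

```python
from statistics import median

def mad_flags(samples, threshold=4.0):
    """Flag impulsive RFI: a sample is an outlier if it deviates from the
    median by more than threshold * sigma, where sigma is the robust
    estimate 1.4826 * MAD (exact for Gaussian noise)."""
    med = median(samples)
    mad = median(abs(s - med) for s in samples)
    sigma = 1.4826 * mad
    return [abs(s - med) > threshold * sigma for s in samples]
```

On the GPU, median selection per data chunk is the interesting part; the per-sample comparison afterwards is embarrassingly parallel.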

  Back
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics, Signal & Audio Processing, GTC 2014 - ID S4538
Streaming:
Download:
 
GPUs In High Energy Physics: Reconstruction of Particle Trajectories
Akitaka Ariga (University of Bern, Switzerland)
The history of particle physics is a history of particle detectors, namely the development of new detectors and data analysis tools. For recent experiments, the size of data coming from particle detectors is huge and therefore a reconstruction of partic ...Read More
The history of particle physics is a history of particle detectors, namely the development of new detectors and data analysis tools. For recent experiments, the size of data coming from particle detectors is huge, and therefore a reconstruction of particle trajectories using GPUs is worth implementing. LHEP Bern pioneered the use of GPUs in this field. Here, we show some applications of GPUs to the reconstruction of particle trajectories. This work is partially related to the talk S4372 - Does Antimatter Fall On The Earth? Measurement Of Antimatter Annihilation with GPU, and more generally to high energy physics.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4228
Download:
 
Cosmology With the 3-Point Correlation Function on the GPU
Deborah Bard (SLAC National Accelerator Laboratory)
Information about the period immediately after the Big Bang is lost in most metrics used to study the large-scale structure of the Universe. However, the cosmological three-point correlation function (3ptCF) applied to galaxy positions can provide in ...Read More
Information about the period immediately after the Big Bang is lost in most metrics used to study the large-scale structure of the Universe. However, the cosmological three-point correlation function (3ptCF) applied to galaxy positions can provide information about this early time. The 3ptCF scales with the cube of the number of galaxies. Approximation functions can speed this up, but can introduce systematic errors that will be unacceptable in the coming era of large astronomical datasets. Previous work (Bard et al., 2013) established that the full calculation of the 2-point correlation function on the GPU reduces computation time by up to a factor of 140 compared to the CPU. In this work we consider the implementation of the full 3ptCF on the GPU, which presents very different challenges both cosmologically and computationally.   Back
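The cubic scaling of the 3ptCF comes from histogramming every triplet of galaxies by the shape of the triangle it forms. A brute-force Python sketch of that core loop (toy 2D points; the binning function is a stand-in for whatever triangle parametrization the analysis actually uses):

```python
from itertools import combinations
import math

def triplet_counts(points, bins):
    """Core of the full (unapproximated) 3ptCF: visit all C(N, 3)
    galaxy triplets and histogram each by its sorted triangle side
    lengths. 'bins' maps a sorted side-length triple to a bin key."""
    counts = {}
    for p, q, r in combinations(points, 3):
        sides = tuple(sorted((math.dist(p, q),
                              math.dist(q, r),
                              math.dist(p, r))))
        key = bins(sides)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

The O(N^3) triplet visitation is exactly what makes approximation tempting on CPUs and what the GPU implementation attacks head-on.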
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4236
Download:
 
Interactive Visualization of Astrophysical Data
Frederick Bogert (University of California, Santa Cruz)
The general purpose of the project is to provide a volume rendering suite that utilizes graphics cards to interactively visualize large astrophysical data sets. We are working with the open source packages PyCuda and PyOpenGL to build interoperation ...Read More
The general purpose of the project is to provide a volume rendering suite that utilizes graphics cards to interactively visualize large astrophysical data sets. We are working with the open source packages PyCuda and PyOpenGL to build interoperation between CUDA and the yt-project, which has been optimized to handle various sets of astrophysical data. The result is a robust tool that provides researchers with an interactive visual of their data.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4201
Download:
 
Using GPUs to Analyze Solar Spectral Observations and Synthesized 3D Radiative-MHD Simulations
Juan Martinez Sykra (Bay Area Environmental Research Institute)
Solar-physics observations and 3D radiative-MHD simulations of the Sun provide an enormous amount of data that is difficult to analyze. NASA's recently launched Interface Region Imaging Spectrograph (IRIS) provides a very large 4D dataset (2D sp ...Read More
Solar-physics observations and 3D radiative-MHD simulations of the Sun provide an enormous amount of data that is difficult to analyze. NASA's recently launched Interface Region Imaging Spectrograph (IRIS) provides a very large 4D dataset (2D space, time and spectra) and enables us to study in great detail the dynamics of one of the most intriguing layers of the Sun, the chromosphere. Moreover, state-of-the-art 3D radiative-MHD simulations are needed to interpret these observations. This poster will describe different tools using GPU computing that help scientists analyze the immense observational and numerical-modeling data volumes of the Sun, as well as compare the two by creating synthetic observables from the simulations using GPUs.   Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4138
Download:
 
Streaming Multiframe Deconvolution of Atmospherically Distorted Images on GPUs
Matthias Lee (Johns Hopkins University)
We present an easily extensible, open source, GPU-accelerated tool for testing, comparing and experimenting with multiple approaches to multiframe deconvolution. Currently we provide options for a Gaussian, Richardson-Lucy and damped Richardson-Lucy ...Read More
We present an easily extensible, open source, GPU-accelerated tool for testing, comparing and experimenting with multiple approaches to multiframe deconvolution. Currently we provide options for a Gaussian, Richardson-Lucy and damped Richardson-Lucy approach as well as Wavelet filtering and Robust Statistics weighting. Our tool yields an over 20x speedup over the CPU implementation, allowing for interactive experimentation of parameters.   Back
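The Richardson-Lucy option mentioned above can be sketched in 1D: each iteration multiplies the current estimate by the back-projected ratio of the observed data to the estimate blurred by the PSF. A minimal single-frame Python version (toy PSF and sizes; the tool's multiframe variants combine such updates across frames, and the damped variant tempers the ratio):

```python
def richardson_lucy(observed, psf, iterations=30):
    """1D Richardson-Lucy deconvolution: the multiplicative update
    estimate *= conv(observed / conv(estimate, psf), mirrored psf)."""
    n = len(observed)
    half = len(psf) // 2

    def conv(signal, kernel):
        return [sum(kernel[k] * signal[t + k - half]
                    for k in range(len(kernel))
                    if 0 <= t + k - half < n) for t in range(n)]

    estimate = [1.0] * n          # flat, positive starting guess
    mirrored = psf[::-1]
    for _ in range(iterations):
        blurred = conv(estimate, psf)
        ratio = [o / max(b, 1e-12) for o, b in zip(observed, blurred)]
        correction = conv(ratio, mirrored)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate
```

Because each iteration is two convolutions plus pointwise arithmetic, the whole loop maps naturally onto GPU FFTs and elementwise kernels.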
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4285
Download:
 
News From Black Holes in Galactic Nuclei Simulated With Large GPU Clusters
Rainer Spurzem (National Astronomical Observatories, Chinese Academy of Sciences)
We present direct astrophysical N-body simulations with up to a few million bodies using our parallel MPI/CUDA code on large GPU clusters in China, Ukraine and Germany, with different kinds of GPU hardware and in one case a first preliminary test wit ...Read More
We present direct astrophysical N-body simulations with up to a few million bodies using our parallel MPI/CUDA code on large GPU clusters in China, Ukraine and Germany, with different kinds of GPU hardware and, in one case, a first preliminary test with Intel PHI. Our clusters are directly linked under the Chinese Academy of Sciences special GPU cluster program, in cooperation with ICCS (International Center for Computational Science). We reach about half of the peak Kepler K20 GPU performance for our production-ready phiGPU code, in a real application scenario with individual hierarchical block time-steps, the high-order (4th, 6th and 8th) Hermite integration schemes, and a realistic core-halo density structure of the modeled stellar systems. The code is mainly used to simulate star clusters and galactic nuclei with supermassive black holes, in which correlations between distant particles (two-body relaxation) cannot be neglected.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4270
Download:
 
Lunar-Forming Giant Impact Model Utilizing GPUs
Travis Salzillo (Tarleton State University)
Recent giant impact models focus on producing a circumplanetary disk of the proper composition around Earth and defer to earlier works for the accretion of this disk into the Moon. The discontinuity between creating the circumplanetary disk and accre ...Read More
Recent giant impact models focus on producing a circumplanetary disk of the proper composition around Earth and defer to earlier works for the accretion of this disk into the Moon. The discontinuity between creating the circumplanetary disk and accretion of the Moon is unnatural and lacks simplicity. Here we return to first principles and produce a highly parallelizable model that readily produces stable Earth-Moon systems from a single, continuous simulation. The resultant systems possess an iron-deficient, heterogeneously mixed Moon and accurate axial tilt of the Earth. This project was made financially feasible by the utilization of modern GPUs.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4139
Download:
 
Large-Scale Global MHD Simulation for Solar Wind-Magnetosphere Interaction on TSUBAME 2.5
Un-Hong Wong (Tokyo Institute of Technology)
Investigations of the space plasma environment are necessary for space exploration. MHD simulation has been a powerful tool for modeling space plasmas, but it is computationally expensive. In this poster, large-scale global MHD simulations of solar win ...Read More
Investigations of the space plasma environment are necessary for space exploration. MHD simulation has been a powerful tool for modeling space plasmas, but it is computationally expensive. In this poster, large-scale global MHD simulations of the solar wind interacting with a planet's magnetosphere are presented. Simulation results for a 1350 x 900 x 900 domain of the space plasma environment around a planet were produced by our GPU-accelerated MHD simulation code, running on the GPU-rich supercomputer TSUBAME 2.5 using 324 K20x (Kepler) GPUs. Performance tests show 7.8 TFLOPS for our simulation code. Simulation results of the solar wind interacting with the Earth's magnetic field, and with dipole magnetic fields with a non-vertical magnetic pole, are presented.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4125
Download:
 
Acceleration of the Longwave Rapid Radiative Transfer Module Using GPGPU
Pragati Dharmale (SNHU, NH)
This poster presents the Weather Research and Forecasting (WRF) model, a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research communities. WRF offers multiple physics options, ...Read More
This poster presents the Weather Research and Forecasting (WRF) model, a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research communities. WRF offers multiple physics options, one of which is the Long-Wave Rapid Radiative Transfer Model (RRTM). Even with the advent of large-scale parallelism in weather models, much of the performance increase has come from increasing processor speed rather than from increased parallelism. We present an alternative method of scaling model performance.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2015 - ID P5144
Download:
 
Galaxy Classification with Deep Convolutional Neural Networks
Honghui Shi (University of Illinois, Urbana-Champaign)
There are more than 170 billion galaxies in the observable universe, and we humans have captured image data covering more than a quarter of the whole sky with our powerful telescopes and ambitious sky surveys like SDSS. The vast amount of information ...Read More
There are more than 170 billion galaxies in the observable universe, and we humans have captured image data covering more than a quarter of the whole sky with our powerful telescopes and ambitious sky surveys like SDSS. This vast amount of information is not meant for humans to process, and CPUs and traditional algorithms both hit their bottlenecks in processing it. With the help of recent deep learning technologies and powerful implementations on NVIDIA GPUs, the developed models can classify galaxies with competitive accuracy.  Back
 
Keywords:
Astronomy & Astrophysics, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID P5176
Download:
 
Time-Efficient Analysis of Simulations of the Sun's Magnetic Field
Christopher Scarborough (Lockheed Martin Space Sciences Corporation)
Dynamics in the solar atmosphere, including solar flares, coronal mass ejections, micro-flares and different types of jets, are powered by the evolution of the sun's intense magnetic field. 3D Radiative Magnetohydrodynamics (MHD) computer simulations ...Read More
Dynamics in the solar atmosphere, including solar flares, coronal mass ejections, micro-flares and different types of jets, are powered by the evolution of the sun's intense magnetic field. 3D Radiative Magnetohydrodynamics (MHD) computer simulations have furthered our understanding of the processes involved. Detailed analysis of this evolution entails tracing magnetic field lines, an operation which is not time-efficient on a single processor. By utilizing a GPU to trace lines in parallel, such analysis is made feasible.  Back
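Tracing a field line means integrating along the unit vector B/|B| from a seed point, and since each seed is independent, thousands of lines can be traced in parallel, one per GPU thread. A minimal midpoint-rule sketch in Python (the field function here is a stand-in for interpolation into the simulation cube):

```python
def trace_field_line(field, start, step=0.1, n_steps=100):
    """Trace a field line by midpoint (RK2) steps along the unit vector
    B/|B|. 'field' maps a 3D point to the local field vector."""
    def unit(b):
        mag = (b[0] ** 2 + b[1] ** 2 + b[2] ** 2) ** 0.5
        return [c / mag for c in b]

    line = [list(start)]
    p = list(start)
    for _ in range(n_steps):
        d1 = unit(field(p))                                  # direction here
        mid = [p[k] + 0.5 * step * d1[k] for k in range(3)]  # half step
        d2 = unit(field(mid))                                # direction at midpoint
        p = [p[k] + step * d2[k] for k in range(3)]
        line.append(list(p))
    return line
```

In a uniform field the trace reduces to a straight line, which makes a convenient sanity check.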
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID P5196
Download:
 
Unified Representation for Collaborative Visualization of Planetary Terrain Data
Daniel Herman (DigitalFish, Inc.)
Current and future NASA planetary missions are generating ever-increasing volumes of terrain data from orbital and surface-based assets at vastly different resolutions. We have applied an alternative technology, subdivision surfaces, coupled with a n ...Read More
Current and future NASA planetary missions are generating ever-increasing volumes of terrain data from orbital and surface-based assets at vastly different resolutions. We have applied an alternative technology, subdivision surfaces, coupled with a novel volumetric reconstruction process, to help manage and present high-fidelity mesh representations of the disparate range of terrain data collected by rovers and satellites. Applications include terrain data visualization, autonomous navigation, and other localization and mapping problems.  Back
 
Keywords:
Astronomy & Astrophysics, Visualization - In-Situ & Scientific, GTC 2015 - ID P5307
Download:
 
Astrophysical Gamma-Ray Source Imaging with NASA's Swift Telescope Using Nvidia GPUs
Tim McMahon (Langston University)
We have implemented an extensive package of detector modeling and image reconstruction algorithms for Swift's BAT telescope on a Tesla K20 processor. Individual reconstructed images with 2 million pixels are reprocessed with a compute intensive nois ...Read More
We have implemented an extensive package of detector modeling and image reconstruction algorithms for Swift's BAT telescope on a Tesla K20 processor. Individual reconstructed images with 2 million pixels are reprocessed with a compute intensive noise reduction algorithm which has been modified to run under CUDA 6.5. Methods employed to port existing code to a GPU implementation with a minimum of code development are presented.  Back
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2015 - ID P5316
Download:
 
Exact and Approximate Methods in Stellar Dynamics
Yohai Meiron (Peking University)
Stellar systems come in many shapes and sizes. We present two new GPU-accelerated N-body codes focusing on two kinds of systems: dwarf spheroidal galaxies and globular clusters. ETICS is based on series expansion of the Poisson equation and is ideal f ...Read More
Stellar systems come in many shapes and sizes. We present two new GPU-accelerated N-body codes focusing on two kinds of systems: dwarf spheroidal galaxies and globular clusters. ETICS is based on series expansion of the Poisson equation and is ideal for diffuse objects such as dwarf galaxies. Since in globular clusters close stellar encounters and binaries play very important roles in the dynamics, a much more accurate integrator is needed. NBODY6++ is a direct-summation N-body code which can provide this kind of accuracy.  Back
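The direct-summation approach behind a code like NBODY6++ can be sketched in miniature as follows; the `accelerations` helper, the Plummer softening parameter, and the G = 1 units are illustrative assumptions, not the actual code. On a GPU, each thread would evaluate one particle's inner sum:

```python
import math

def accelerations(pos, mass, eps=1e-3):
    """Direct-summation O(N^2) gravitational accelerations (G = 1).
    On a GPU, one thread accumulates the acceleration of one particle."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps   # Plummer softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc

# Two equal masses one unit apart: accelerations are equal and opposite.
a = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
```

The all-pairs structure is what makes the method both accurate (no tree or grid approximation) and expensive, and therefore a natural GPU target.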
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID P5323
Download:
 
Maximum Likelihood Estimation on GPUs: Leveraging Dynamic Parallelism
Michele Mastropietro (Italian National Institute for Astrophysics (INAF), Rome)
The estimation of the Maximum Likelihood (MLE) is the most robust algorithm used in gamma-ray astronomy but, particularly if used in conjunction with "unbinned" analysis, it uses a huge amount of computing resources. Typically, the estimatio ...Read More
The estimation of the Maximum Likelihood (MLE) is the most robust algorithm used in gamma-ray astronomy but, particularly if used in conjunction with "unbinned" analysis, it uses a huge amount of computing resources. Typically, the estimation of the maximum is left to a single-thread minimizer, like MINUIT, running on a CPU while providing a callback function that may estimate the likelihood on the GPU. We propose an alternative to the MINUIT package that leverages Dynamic Parallelism and runs entirely on GPUs.  Back
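The division of labour the abstract describes (a scalar minimizer repeatedly calling a per-event likelihood sum, which is the part worth offloading) can be sketched like this; the exponential model, the toy golden-section minimizer, and all names are illustrative assumptions, not the MINUIT or GPU code:

```python
import math

def neg_log_likelihood(rate, samples):
    """Unbinned negative log-likelihood of an exponential model.
    The per-event sum is the O(N) part a GPU would parallelize."""
    if rate <= 0:
        return float("inf")
    return -sum(math.log(rate) - rate * x for x in samples)

def minimize_1d(f, lo, hi, iters=200):
    """Tiny golden-section search standing in for a MINUIT-style minimizer
    that treats f as an opaque callback."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

samples = [0.1, 0.5, 0.9, 1.3, 2.2]   # toy event data, mean 1.0
rate_hat = minimize_1d(lambda r: neg_log_likelihood(r, samples), 0.01, 10.0)
# For an exponential, the analytic MLE is 1 / mean(samples).
```

In the serial scheme, every minimizer iteration pays for a full pass over the events; running the minimizer itself on the device (via Dynamic Parallelism) removes the CPU-GPU round trip per iteration.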
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2015 - ID P5327
Download:
 
HTC for Gamma-Ray Astronomy on Kayla and Low-Power Platforms
Alberto Madonna (Italian National Institute for Astrophysics (INAF), Rome)
Detectors for Gamma-ray Astronomy are the prototypes for distributed experiments. Single detectors may be scattered in an area of a few square kilometres, and the capability of each unit to process, at least partially, its own data before sending them ...Read More
Detectors for Gamma-ray Astronomy are the prototypes for distributed experiments. Single detectors may be scattered in an area of a few square kilometres, and the capability of each unit to process, at least partially, its own data before sending them to the central data acquisition provides a key advantage. We aim at developing and testing algorithms and techniques to implement this kind of local data sparsification at detector level. To reach this goal, we leverage and compare the parallel capabilities of Kayla and Jetson TK1.  Back
 
Keywords:
Astronomy & Astrophysics, Embedded, GTC 2015 - ID P5328
Download:
 
Shooting for the Stars with GPUs
Hatem Ltaief (KAUST), Damien Gratadour (Université Paris Diderot & LESIA, Observatoire de Paris)
Come and learn how GPUs can help discover the most distant galaxies by performing close to real-time simulations at an unprecedented scale of the multi-object adaptive optics technique (MOAO). The European Southern Observatory (ESO) is leading the ...Read More
Come and learn how GPUs can help discover the most distant galaxies by performing close to real-time simulations at an unprecedented scale of the multi-object adaptive optics technique (MOAO). The European Southern Observatory (ESO) is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m diameter telescope, to provide Europe with the biggest eye on the universe ever built. MOAO is the most complex adaptive optics concept proposed for the E-ELT, and simulating the instrument at full scale is extremely compute-intensive. The tomographic reconstructor (TR) is one of the core components of both the design simulations and eventually system operations, and it requires the inversion of a large dense covariance matrix.  Back
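To see why the tomographic reconstructor is dominated by dense linear algebra: solving with a symmetric positive-definite covariance matrix is typically done through a Cholesky factorization rather than an explicit inverse. A minimal pure-Python sketch of that factor-then-substitute pattern (illustrative helpers and toy data, not the simulation code):

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric
    positive-definite matrix, so that a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def solve_spd(a, b):
    """Solve a x = b via forward then back substitution with L."""
    L = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

cov = [[4.0, 2.0], [2.0, 3.0]]               # toy 2x2 covariance matrix
x = solve_spd(cov, [2.0, 5.0])
```

The same O(N^3) factorization, at covariance sizes in the tens of thousands, is what GPU-accelerated dense linear algebra libraries make feasible at near real-time rates.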
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID S5122
Streaming:
 
GPU-Accelerated Imaging Processing for NASA's Solar Dynamics Observatory
Mark Cheung (Lockheed Martin Solar & Astrophysics Laboratory)
Since its launch in 2010, NASA's Solar Dynamics Observatory (SDO) has continuously monitored the Sun's changes in magnetic activity. Both the Atmospheric Imaging Assembly (AIA) and Helioseismic & Magnetic Imager (HMI) instruments onb ...Read More

Since its launch in 2010, NASA's Solar Dynamics Observatory (SDO) has continuously monitored the Sun's changes in magnetic activity. Both the Atmospheric Imaging Assembly (AIA) and Helioseismic & Magnetic Imager (HMI) instruments onboard SDO deliver 4096x4096 pixel images at a cadence of more than one image per second. Although SDO images are free from distortion by absorption and scattering in the Earth's atmosphere, images are still blurred by the intrinsic point spread functions of the telescopes. In this presentation, we show how the instrument teams have deployed CUDA-enabled GPUs to perform deconvolution of SDO images. The presentation will demonstrate how we leveraged cuFFT and Thrust to implement an efficient image processing pipeline.

  Back
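A toy 1-D analogue of the PSF deconvolution pipeline, using a naive DFT in place of cuFFT and a Wiener filter as the frequency-domain step; the PSF taps, the SNR constant, and the helper names are illustrative assumptions, not the SDO code:

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]

def wiener_deconvolve(blurred, psf, snr=100.0):
    """Frequency-domain Wiener deconvolution (1-D, circular).
    The real pipeline does the same per-frequency arithmetic with 2-D cuFFTs."""
    B, H = dft(blurred), dft(psf)
    G = [Hf.conjugate() / (abs(Hf) ** 2 + 1.0 / snr) for Hf in H]
    return [z.real for z in idft([b * g for b, g in zip(B, G)])]

# Blur an impulse with a 3-tap circular PSF, then recover it.
psf = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
blurred = [sum(signal[(k - m) % 8] * psf[m] for m in range(8)) for k in range(8)]
restored = wiener_deconvolve(blurred, psf)
```

The regularizing `1/snr` term is what keeps the filter stable where the PSF's transfer function is small, at the cost of a slightly biased restoration.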
 
Keywords:
Astronomy & Astrophysics, Video & Image Processing, GTC 2015 - ID S5209
Streaming:
Download:
 
Embedded Supercomputing: Radio Astronomy at the Limit
Simon Ratcliffe (SKA South Africa)
Radio astronomy imaging is a complex, compute- and memory-intensive problem that is dominating the cost of next-generation radio facilities. Using the MeerKAT telescope, currently under construction in South Africa, as a primer, we describe the devel ...Read More
Radio astronomy imaging is a complex, compute- and memory-intensive problem that is dominating the cost of next-generation radio facilities. Using the MeerKAT telescope, currently under construction in South Africa, as a primer, we describe the development of a highly parallel, low-power, low-cost imager using System on Chip devices. In particular, NVIDIA's TK1 and successors are considered. The talk will also briefly describe the opportunities and solutions presented by the forthcoming Square Kilometer Array, whose processing costs require game-changing technology shifts to become achievable.  Back
 
Keywords:
Astronomy & Astrophysics, Embedded, GTC 2015 - ID S5222
Streaming:
Download:
 
Taranis: Ray-Traced Radiative Transfer in Smoothed Particle Hydrodynamics
Sam Thomson (University of Edinburgh)
We introduce Taranis, a library for performing ray-traced radiation transport in smoothed particle hydrodynamics (SPH) entirely on the GPU. We discuss the design, algorithm, and key optimizations (such as ray packets) for our use-case. Taranis is mot ...Read More
We introduce Taranis, a library for performing ray-traced radiation transport in smoothed particle hydrodynamics (SPH) entirely on the GPU. We discuss the design, algorithm, and key optimizations (such as ray packets) for our use-case. Taranis is motivated by the current intractability of coupled radiation-hydrodynamics simulations. This talk focuses on Taranis' tracing component, which has been influenced by recent work in computer graphics. It outperforms a 32-core CPU code on a single GPU. Our scheme allows particles to be updated independently and requires fewer rays than a typical 'long characteristics' method. Taranis' radiation transport solver is also implemented on the GPU, and targets large-scale simulations of reionization. However, the tracing API exists as a standalone entity.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Rendering & Ray Tracing, GTC 2015 - ID S5266
Streaming:
Download:
 
Optimization of GPU-Based Signal Processing of Radio Telescopes
Vinay Deshpande (NVIDIA)
We present a summary of optimization work on GPU-based correlator pipeline code. This is an ongoing joint effort between the National Centre for Radio Astrophysics (NCRA) and NVIDIA. The central goal of the effort is to upgrade the Giant Metrewave Radio ...Read More
We present a summary of optimization work on GPU-based correlator pipeline code. This is an ongoing joint effort between the National Centre for Radio Astrophysics (NCRA) and NVIDIA. The central goal of the effort is to upgrade the Giant Metrewave Radio Telescope (GMRT) receiver with a wide-band GPU-based back-end, and to extend this design as a proposed back-end for the LOW frequency array of the SKA Telescope. We look at the various processing stages involved in the pipeline to explore optimization possibilities, with some interesting results already achieved.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID S5302
Streaming:
Download:
 
Statistics of the Universe: Exa-Calculations and Cosmology's Data Deluge
Matthew Bellis (Siena College), Deborah Bard (SLAC National Accelerator Laboratory)
Learn how to use GPUs on the desktop to study the structure and evolution of the Universe: how galaxies are pulled together by gravity, and how space expands under the influence of Dark Energy. Metrics used to describe this structure are the two- and ...Read More
Learn how to use GPUs on the desktop to study the structure and evolution of the Universe: how galaxies are pulled together by gravity, and how space expands under the influence of Dark Energy. Metrics used to describe this structure are the two- and three-point correlation functions, which quantify the clustering of galaxies. Cosmological datasets can number in the millions (and soon billions) of galaxies, making these O(N^2) and O(N^3) metrics computationally challenging. This talk will detail how we have ported solutions to the GPU. In particular we focus on the novel histogramming bottlenecks inherent in these calculations, and how they can be mitigated. Throughout we will emphasise how GPUs and heterogeneous computing can be used for everyday data analysis with large datasets.  Back
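The expensive kernel of the two-point correlation function is exactly the pair-distance histogram below: O(N^2) pairs, with many increments landing in a few bins, which is the GPU histogramming contention the talk discusses. The helper and toy data are illustrative, not the authors' code:

```python
import math

def pair_distance_histogram(points, bin_edges):
    """DD pair counts, the raw ingredient of the two-point correlation
    function. On a GPU, many threads increment shared bins, which is
    where the histogramming bottleneck appears."""
    counts = [0] * (len(bin_edges) - 1)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            for b in range(len(counts)):       # linear bin search keeps it simple
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    counts[b] += 1
                    break
    return counts

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
counts = pair_distance_histogram(pts, [0.0, 2.0, 4.0, 6.0])
```

The correlation function itself is then estimated by comparing these data-data counts against the same histogram built from random catalogs.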
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2015 - ID S5509
Streaming:
Download:
 
The Ramses Code for Numerical Astrophysics: Toward Full GPU Enabling
Claudio Gheller (ETHZ CSCS)
The evolution of the universe is an extraordinarily fascinating and, of course, complex problem. Scientists use the most advanced simulation codes to try to describe and understand the origin and the behavior of the incredible variety of objects that ...Read More
The evolution of the universe is an extraordinarily fascinating and, of course, complex problem. Scientists use the most advanced simulation codes to try to describe and understand the origin and the behavior of the incredible variety of objects that populate it: stars, galaxies, black holes. The most powerful computing systems are required to pursue such goals, and GPUs represent an outstanding opportunity. In this talk, we present one of these codes, Ramses, and the ongoing work to enable this code to efficiently exploit GPUs through the adoption of the OpenACC programming model. The most recent achievements will be shown together with some of the scientific challenges GPUs can help address.  Back
 
Keywords:
Astronomy & Astrophysics, OpenACC, Computational Physics, Supercomputing & HPC, GTC 2015 - ID S5531
Streaming:
Download:
 
Pulsar Hunting with the Square Kilometre Array
Ewan Barr (Swinburne University of Technology)
In this talk I will give an introduction to the biggest of the upcoming big data science facilities, the Square Kilometre Array radio telescope (SKA), and will look at how GPUs will enable this instrument to discover exotic rapidly spinning radio pulsar ...Read More
In this talk I will give an introduction to the biggest of the upcoming big data science facilities, the Square Kilometre Array radio telescope (SKA), and will look at how GPUs will enable this instrument to discover exotic rapidly spinning radio pulsars. Radio pulsars provide us with phenomenal tools with which we may probe the most extreme environments in the Universe. More massive than our Sun, yet spinning faster than a kitchen blender and sending jets of radio waves out from their magnetic poles, these exotic cosmic lighthouses are key to understanding gravity and allowing us to ask the question: was Einstein right? To answer this question we must use the SKA to scour the Galaxy in search of exotic pulsar binary systems. This task is extremely computationally expensive, requiring the execution of many billions of Fourier transforms. Here I will review the work being done to leverage the power of GPUs to solve the SKA's pulsar searching challenge.  Back
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics, Supercomputing & HPC, GTC 2015 - ID S5875
Streaming:
Download:
 
Computational Simulation of World's Biggest Eye on GPUs
Hatem Ltaief (Extreme Computing Research Center, KAUST)
Have you heard about the world's biggest eye? Learn how GPUs help design major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on dis ...Read More
Have you heard about the world's biggest eye? Learn how GPUs help design major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, we'll explain how the resulting dense linear algebra operations associated with an efficient task-based programming model help design the next generation of telescope instruments.  Back
 
Keywords:
Astronomy & Astrophysics, Algorithms, Performance Optimization, GTC 2016 - ID S6229
Streaming:
Download:
 
Shaping the Light with GPUs
Damien Gratadour (Universite Paris Diderot & Observatoire de Paris)
Learn how GPUs are used to shape the light on extreme diameter telescopes. By providing the means to process, in real time, large-scale images from wavefront sensors, GPUs are revolutionizing adaptive optics, an instrumental technique used to compens ...Read More
Learn how GPUs are used to shape the light on extreme diameter telescopes. By providing the means to process, in real time, large-scale images from wavefront sensors, GPUs are revolutionizing adaptive optics, an instrumental technique used to compensate fast-evolving aberrations in optical systems. We'll show how GPUs are used to power the real-time controllers of these systems to provide millions of commands per second to deformable mirrors so as to stabilize the image quality at the output of a large telescope. The first results of the Green Flash project, a large-scale European initiative aimed at prototyping real-time controllers for the European Extremely Large Telescope, will be presented and illustrated with preliminary data obtained in the lab.  Back
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, Supercomputing & HPC, GTC 2016 - ID S6236
Streaming:
Download:
 
A CUDA-Based 3D Kinetic Model for Space Plasma Physics
Shahab Fatemi (University of California, Berkeley)
We've developed the first three-dimensional, self-consistent kinetic plasma model that runs on NVIDIA GPUs using CUDA. The model self-consistently solves the motion of charged particles and their associated electromagnetic fields. We use this model to ...Read More
We've developed the first three-dimensional, self-consistent kinetic plasma model that runs on NVIDIA GPUs using CUDA. The model self-consistently solves the motion of charged particles and their associated electromagnetic fields. We use this model to explore the microphysics of plasma interactions with solar system objects, to understand fundamental kinetic processes of plasma, and to meet NASA's requirements for planetary and space exploration.  Back
 
Keywords:
Astronomy & Astrophysics, Algorithms, Computational Physics, GTC 2016 - ID S6265
Streaming:
Download:
 
Fourier Domain Pulsar Acceleration Searches on GPUs for the Square Kilometre Array
Sofia Dimoudi (University of Oxford)
We'll describe how we can accelerate one of the most demanding computational tasks of the real-time pulsar signal processing pipeline of the world's largest next generation radio telescope, the Square Kilometre Array (SKA). We'll explain the scien ...Read More
We'll describe how we can accelerate one of the most demanding computational tasks of the real-time pulsar signal processing pipeline of the world's largest next generation radio telescope, the Square Kilometre Array (SKA). We'll explain the scientific goals and importance of pulsar searches, along with the technical challenges facing pulsar signal processing on the SKA. Pulsar acceleration searches will be introduced, and an overview of a Fourier Domain method for recovering signal power from binary accelerated pulsars will be given. We'll then present our GPU implementation of this method, discuss techniques used for optimisation, show comparative computational performance results, and consider performance projections with future GPU technology.  Back
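A much-simplified stand-in for the Fourier-domain idea: detect a periodic signal as a peak in the power spectrum. A real acceleration search additionally corrects for the Doppler drift that smears a binary pulsar's power across Fourier bins; the naive DFT and toy signal here are illustrative only:

```python
import cmath
import math

def power_spectrum(x):
    """|DFT|^2 via a naive O(N^2) transform; the SKA pipeline would use
    batched cuFFTs over many dispersion trials instead."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n)
                    for k in range(n))) ** 2
            for f in range(n // 2)]

# A pure tone standing in for a pulsar's spin frequency (bin 5 of 64).
n = 64
series = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
spec = power_spectrum(series)
best_bin = spec.index(max(spec[1:]))    # skip the DC bin when peak-finding
```

For an accelerated binary, the tone's frequency changes during the observation, so the power leaks out of a single bin; the Fourier-domain method recovers it by correlating the spectrum with templates over trial accelerations.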
 
Keywords:
Astronomy & Astrophysics, Algorithms, GTC 2016 - ID S6412
Streaming:
Download:
 
Bifrost: High-Throughput CPU/GPU Pipelines Made Easy
Ben Barsdell (NVIDIA)
We'll present Bifrost, a lightweight new framework designed to ease the development and deployment of pipeline applications that demand sustained peak utilization of network, CPU, and GPU resources under soft real-time constraints. Such applications ...Read More
We'll present Bifrost, a lightweight new framework designed to ease the development and deployment of pipeline applications that demand sustained peak utilization of network, CPU, and GPU resources under soft real-time constraints. Such applications are common in experimental science and computer vision, where processing must keep up with acquisition systems to avoid data loss. Bifrost enables operations to be wrapped in a simple task container with metadata-rich inputs and outputs. By connecting tasks together, complex branching pipelines can be constructed, with asynchronous communication handled by efficient ring buffers in host or device memory. We'll demonstrate Bifrost using a high-performance radio astronomy application that has been deployed as part of the LEDA project.  Back
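The core idea (tasks connected by bounded ring buffers that provide back-pressure) can be mimicked in a few lines; the bounded `queue.Queue` stands in for Bifrost's host/device ring buffers, and everything below is an illustrative sketch, not the Bifrost API:

```python
import queue
import threading

def run_pipeline(frames):
    """Two tasks connected by a bounded queue: the bound gives the
    back-pressure a soft real-time pipeline needs when a stage lags."""
    ring = queue.Queue(maxsize=4)    # producer blocks once 4 frames are queued
    out = []

    def produce():
        for f in frames:
            ring.put(f)
        ring.put(None)               # end-of-stream marker

    def consume():
        while True:
            f = ring.get()
            if f is None:
                break
            out.append(f * 2)        # stand-in for a GPU processing stage

    t1 = threading.Thread(target=produce)
    t2 = threading.Thread(target=consume)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return out

result = run_pipeline(range(8))
```

Chaining more such tasks, each with its own ring, yields the branching pipelines the abstract describes, with each stage free to run on a different CPU core or GPU stream.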
 
Keywords:
Astronomy & Astrophysics, Tools & Libraries, Signal & Audio Processing, GTC 2016 - ID S6627
Streaming:
Download:
 
Embedded Supercomputing: Radio Astronomy at the Limit
Simon Ratcliffe (SKA South Africa)
This talk will present designs and performance results for a highly parallel Tegra X1-based compute platform being developed as part of a next generation radio telescope. The MeerKAT radio telescope is currently under construction in the semi-desert ...Read More
This talk will present designs and performance results for a highly parallel Tegra X1-based compute platform being developed as part of a next generation radio telescope. The MeerKAT radio telescope is currently under construction in the semi-desert Karoo region of Southern Africa. This talk presents ongoing work on developing novel computing technologies to deliver a large-scale computational platform within the strict confines of power, space and emission that are in force at this remote site. Using the Tegra X1 as a building block, a rugged, oil-cooled platform has been developed that will power the imager that lies at the heart of the compute challenge. This is a follow-on talk from an initial exploration presented in 2015.  Back
 
Keywords:
Astronomy & Astrophysics, Embedded, Press-Suggested Sessions: HPC & Science, GTC 2016 - ID S6692
Streaming:
Download:
Audio, Image and Video Processing
Presentation
Media
Using the GPU Direct for Video API
Thomas True (NVIDIA), Alina Alt (NVIDIA)
This tutorial will demonstrate how video I/O devices can take advantage of the GPU Direct for Video API to optimize the data transfer performance for digital video, film, broadcast, and computer vision applications. The GPU Direct ...Read More

This tutorial will demonstrate how video I/O devices can take advantage of the GPU Direct for Video API to optimize the data transfer performance for digital video, film, broadcast, and computer vision applications. The GPU Direct for Video API is a technology that permits the DMA transfer of data buffers between video I/O devices and the GPU through the use of a shared system memory buffer for immediate processing by OpenGL, DirectX, CUDA and OpenCL. This direct transfer can improve synchronization and eliminate latency between video capture, GPU processing and video output.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2049
Streaming:
Download:
 
Fast High Quality Image and Video Background Removal with CUDA
Timo Stich (NVIDIA)
A tool to efficiently and easily cut out objects from a photograph has great practical value. In this session we present how to efficiently implement such a tool with CUDA and the NPP library, based on the GrabCut approach by Rother ...Read More

A tool to efficiently and easily cut out objects from a photograph has great practical value. In this session we present how to efficiently implement such a tool with CUDA and the NPP library, based on the GrabCut approach by Rother et al. Through GPU acceleration, both runtime and accuracy are improved compared to CPU-based implementations such as the one in MS Word 2011. Further, we show how to extend our GPU implementation to enable live background removal in a webcam video stream.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2052
Streaming:
Download:
 
Cost-effective GPU Acceleration of a Video Restoration and Archiving Workflow
Klaus Gaedke (Technicolor)
The goal of this session is to present a complex GPU-accelerated video restoration and archiving workflow. The workflow consists of many different processing steps and a final review application. Fast and cost-effective processing and real-time ...Read More

The goal of this session is to present a complex GPU-accelerated video restoration and archiving workflow. The workflow consists of many different processing steps and a final review application. Fast and cost-effective processing and real-time display of the processed video material is a key requirement. It will be shown in detail how a GPU based acceleration can be achieved for many different processing steps and the review application based on the use of OpenCV, OpenCL, and OpenGL. Furthermore, an object oriented software architecture supporting the acceleration of several different processing tasks on the same graphics adapter will be presented.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2073
Streaming:
Download:
 
Multi-GPU Real-Time Ptychographic X-ray Image Reconstruction
Filipe Maia (Lawrence Berkeley National Laboratory)
Learn how a new imaging technique, combined with the computational power of GPUs and the brightness of modern X-ray synchrotrons can quickly and easily produce images with nanometer level resolution. Ptychography is a recent X-ray imaging techni ...Read More

Learn how a new imaging technique, combined with the computational power of GPUs and the brightness of modern X-ray synchrotrons can quickly and easily produce images with nanometer level resolution. Ptychography is a recent X-ray imaging technique in which overlapping regions of a sample are exposed in quick succession and the resulting scattering is used to reconstruct a high resolution image of the sample. Discover why GPUs can substitute for the lack of X-ray lenses and how they enabled a dramatic reduction in the feedback time for users of the technique from days to seconds.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2131
Streaming:
Download:
 
Rapid Training of Acoustic Models Using GPUs
Jike Chong (Carnegie Mellon University), Ian Lane (Carnegie Mellon University Co)
Learn how to realize robust and accurate speech recognition systems by training acoustic models on GPUs. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data, which can take weeks even with a large ...Read More

Learn how to realize robust and accurate speech recognition systems by training acoustic models on GPUs. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data, which can take weeks even with a large cluster of machines. To overcome this development bottleneck, we propose a new framework for rapid training of acoustic models using highly parallel GPUs. With a single NVIDIA GTX580 GPU, our proposed approach is shown to be 51x faster than a sequential CPU implementation, enabling a moderately sized acoustic model to be trained on 1000-hour speech data in just over 9 hours.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2222
Streaming:
Download:
 
Building Real-Time Professional Visualization Solutions with OpenCL
Kristof Denolf (Barco), Samuel Maroy (Barco)
Professional visualization solutions, like high-quality high-resolution medical displays or very large screens for surveillance or entertainment, benefit from GPUs' image and graphics compute capabilities to achieve real-time performance, but add ...Read More

Professional visualization solutions, like high-quality high-resolution medical displays or very large screens for surveillance or entertainment, benefit from GPUs' image and graphics compute capabilities to achieve real-time performance, but add specific constraints, like low latency, multiple HD streams and strict synchronization. This talk first motivates the industrial relevance of development in OpenCL on heterogeneous devices. It then explains the techniques currently explored to meet the specific design constraints, with a main focus on parallel data transfer and compute. The lessons learned are illustrated with a real-life example.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2252
Streaming:
Download:
 
Sensor Processing with Rugged Kepler GPUs (Presented by GE Intelligent Platforms)
Dustin Franklin (GE Intelligent Platforms)
Swimming in sensors and drowning in data? Turn the tide on high-bandwidth sensors with rugged next-generation GPUs from NVIDIA. See how we deploy NVIDIA GPUs into the most extreme of environments, providing GPGPU capabilities onboard platforms w ...Read More

Swimming in sensors and drowning in data? Turn the tide on high-bandwidth sensors with rugged next-generation GPUs from NVIDIA. See how we deploy NVIDIA GPUs into the most extreme of environments, providing GPGPU capabilities onboard platforms where SWaP and GFLOPS/watt is key. Dig into four realtime CUDA sensor processing applications - Hyperspectral Imaging, Wide-Area Surveillance, 360° Situational Awareness, and GSM cellular SIGINT. Discuss the CUDA algorithms, interconnects, and rugged platforms behind each. Learn how we utilize GPUDirect and realtime Linux for improved latency and determinism.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2253
Streaming:
Download:
 
Fast JPEG Coding on the GPU
Fyodor Serzhenko (Fastvideo), Victor Podlozhnyuk (NVIDIA)
The goal of this session is to demonstrate how high speed JPEG compression and decompression can be efficiently implemented on the GPU using CUDA. In this session we will present: detailed analysis of Baseline JPEG compression and decompression ...Read More

The goal of this session is to demonstrate how high speed JPEG compression and decompression can be efficiently implemented on the GPU using CUDA. In this session we will present: detailed analysis of Baseline JPEG compression and decompression processes and its constituent parts (such as Huffman Coding, RLE, Differential Coding, Quantization, Discrete Cosine Transform) and their suitability for the GPU architecture, analysis of achieved results and comparison with existing implementations, applications to high-speed imaging.

  Back
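The lossy heart of baseline JPEG (DCT followed by quantization against a perceptual table) can be shown on a single 8-sample row; the 2-D case applies the same transform to the rows and then the columns of each 8x8 block. The sample data and the use of only the first row of the standard luminance table are illustrative choices, not the session's code:

```python
import math

def dct_1d(block):
    """8-point DCT-II, the core transform of baseline JPEG."""
    n = len(block)
    out = []
    for u in range(n):
        c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        out.append(c * sum(block[k] * math.cos(math.pi * (2 * k + 1) * u / (2 * n))
                           for k in range(n)))
    return out

def idct_1d(coeffs):
    """Inverse of dct_1d (the two form an orthonormal pair)."""
    n = len(coeffs)
    return [sum((math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
                * coeffs[u] * math.cos(math.pi * (2 * k + 1) * u / (2 * n))
                for u in range(n))
            for k in range(n)]

quant = [16, 11, 10, 16, 24, 40, 51, 61]   # first row of the standard luma table

samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(samples)
quantized = [round(c / q) for c, q in zip(coeffs, quant)]        # the lossy step
restored = idct_1d([qv * q for qv, q in zip(quantized, quant)])
```

The quantized coefficients are mostly zeros and small integers, which is what makes the subsequent RLE and Huffman coding stages effective; those entropy-coding stages are the hardest part to parallelize on a GPU.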
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2273
Streaming:
Download:
 
Best Practices in GPU-Based Video Processing
Thomas True (NVIDIA)
The combination of the GPU's massively parallel compute engine with extremely high memory bandwidth and new programming paradigms such as CUDA and OpenCL have made the GPU well suited for image and video processing applications. This session ...Read More

The combination of the GPU's massively parallel compute engine with extremely high memory bandwidth and new programming paradigms such as CUDA and OpenCL have made the GPU well suited for image and video processing applications. This session will explore best practices and techniques for the development of efficient GPU-based video and image processing applications. Topics to be discussed include image segmentation and threading models for efficient parallelism, optimal memory usage strategies to reduce expensive data movement as well as multi-GPU considerations. Case studies and examples specific to video and image processing will be presented.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2328
Streaming:
Download:
 
GPU-Based Video Processing Round Table
Thomas True (NVIDIA), Alina Alt (NVIDIA), Eric Young (NVIDIA), Ian Williams (NVIDIA), Andrew Page (NVIDIA)
Have questions, concerns or thoughts about the direction of GPU-based video and image processing? Join NVIDIA engineers and product managers for a lively discussion of such topics as application design, multi-GPU architecture, data movement, thr ...Read More

Have questions, concerns or thoughts about the direction of GPU-based video and image processing? Join NVIDIA engineers and product managers for a lively discussion of such topics as application design, multi-GPU architecture, data movement, threading, APIs, and color management as they apply to Video and Image processing applications.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2601
Streaming:
Download:
 
Rapid Training of Acoustic Models Using GPUs
Jike Chong (Carnegie Mellon University)
Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data, which can take weeks even with a large c ...Read More
Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data, which can take weeks even with a large cluster of machines. To overcome this development bottleneck, we propose a new framework for rapid training of acoustic models using highly parallel GPUs. With a single NVIDIA GTX580 GPU, our proposed approach is shown to be 51x faster than a sequential CPU implementation, enabling a moderately sized acoustic model to be trained on 1000-hour speech data in just over 9 hours.  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2222
Download:
 
2 Million Pixel Experiment
Philipp Drieger (Noumentalia.de - Digital Arts & KU Eichstatt-Ingolstadt)
This experimental application has been created as a piece of computational art using visual computing technologies. It maps a high definition video source (1080p) into 3D space. The pixel transformation is accelerated by a CUDA kernel to achieve ...Read More

This experimental application has been created as a piece of computational art using visual computing technologies. It maps a high definition video source (1080p) into 3D space. The pixel transformation is accelerated by a CUDA kernel to achieve realtime accuracy. Beside the production of visual effects in arts this method may be utilized for video quality checking on lower pixel level.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2266
Download:
 
Speeding Up Camera Sabotage Detection on CUDA
Alptekin Temizel (Middle East Technical University)
Camera Sabotage Detection (CSD) algorithms, namely Camera Moved Detection, Camera Out of Focus Detection and Camera Covered Detection, are used to detect tampering attempts on surveillance cameras. CSD algorithms must run on a large number of cameras in real time, placing a heavy computational load on video analytics systems. In this work, the CSD algorithms are accelerated using CUDA. Overall system tests show that GPU parallelization makes the system 18 times faster than its CPU counterpart, and that up to 400 cameras can be supported in real time on a GTX 470.  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2381
Download:
 
Remote Sensing on GPU: A Case Study
Alptekin Temizel (Middle East Technical University)
Satellite images have become widely available; as a result, an increasing number of commercial applications utilize these images. Satellites provide data at different wavelengths, with higher resolution and larger data sizes than typical images. Running complex algorithms over large volumes of satellite imagery is highly time-consuming on CPUs and can be sped up using GPUs. In this work, shadow detection and vegetation detection algorithms are investigated and their GPU and CPU performance is compared. Results show that a speedup of up to 10.2 times can be achieved using the GPU.  Back
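The poster does not name the specific detectors; as a hypothetical stand-in, the widely used NDVI vegetation index illustrates the per-pixel, branch-free arithmetic that makes such algorithms good GPU candidates:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, a standard per-pixel vegetation
    measure; a hypothetical stand-in for the poster's detector, not its code."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    # Guard against division by zero on dark pixels.
    return (nir - red) / np.maximum(nir + red, 1e-9)

# Toy 2x2 scene: near-infrared and red reflectances per pixel.
nir = np.array([[0.6, 0.2], [0.5, 0.1]])
red = np.array([[0.1, 0.2], [0.1, 0.3]])
mask = ndvi(nir, red) > 0.3   # threshold flags likely vegetation pixels
```

Because every pixel is processed independently, such an index maps one-thread-per-pixel onto the GPU with no synchronization, which is consistent with the speedups the poster reports.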
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2387
Download:
 
Finite Difference-Based Sound Synthesis Using GPUs
Marc Sosnick (San Francisco State University)
Finite Difference (FD) methods can be the basis for physics-based music instrument models that generate realistic audio output. However, such methods are compute-intensive; large simulations cannot run in real time on current CPUs. In this poster, we describe the current state of our implementation of a real-time sound synthesizer using an FD-based simulation of a two-dimensional membrane executed on GPUs. We demonstrate that it is possible to use this method to create a usable real-time audio synthesizer.   Back
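As background (a minimal sketch under textbook assumptions, not the authors' synthesizer), one explicit finite-difference time step for a clamped 2D membrane is just a stencil update, which is why it maps so naturally onto GPU threads:

```python
import numpy as np

def step_membrane(u_prev, u_curr, c2):
    """One explicit finite-difference step of the 2D wave equation.

    u_prev, u_curr: membrane displacement at the previous two time steps.
    c2: (c * dt / dx)**2, the squared Courant number (<= 0.5 for 2D stability).
    """
    # Five-point discrete Laplacian of the current state.
    lap = (np.roll(u_curr, 1, 0) + np.roll(u_curr, -1, 0)
           + np.roll(u_curr, 1, 1) + np.roll(u_curr, -1, 1) - 4 * u_curr)
    u_next = 2 * u_curr - u_prev + c2 * lap
    # Clamped (fixed) membrane edges.
    u_next[0, :] = u_next[-1, :] = u_next[:, 0] = u_next[:, -1] = 0.0
    return u_next

# Excite the membrane at its center and run a few steps; a synthesizer would
# read an audio sample from a listening point on the grid at every step.
n = 32
u_prev = np.zeros((n, n))
u_curr = np.zeros((n, n))
u_curr[n // 2, n // 2] = 1.0
for _ in range(100):
    u_prev, u_curr = u_curr, step_membrane(u_prev, u_curr, 0.25)
```

For real-time audio, roughly one such grid update is needed per output sample (e.g. at 44.1 kHz), which is the throughput constraint that motivates the GPU implementation.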
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2397
Download:
 
Parallelization of Hough Transform for Circles using CUDA
Alptekin Temizel (Middle East Technical University)
Hough Transform (HT) is a well-known technique used for detection of parametric shapes in image processing. However, various optimizations are necessary in its implementation due to large memory and computational requirements. In this work, we consider the parallelization of the Hough Transform for circles. A number of different implementation approaches to the algorithm are compared in CUDA. Results show that a speedup of up to 360 times can be achieved over the CPU version, enabling real-time applications.  Back
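The poster does not include source; as an illustrative sketch (plain NumPy, not the authors' CUDA implementation), the circle-Hough voting step for a known radius looks like this — it is this per-edge-point, per-angle voting that parallelizes well on the GPU:

```python
import numpy as np

def hough_circles(edge_points, shape, radius):
    """Vote in a 2D accumulator for circle centers of a known radius.

    edge_points: iterable of (x, y) edge-pixel coordinates.
    shape: (height, width) of the accumulator (same as the image).
    """
    acc = np.zeros(shape, dtype=np.int32)
    thetas = np.linspace(0, 2 * np.pi, 360, endpoint=False)
    for x, y in edge_points:
        # Each edge point votes for every center that could have produced it.
        cx = np.rint(x - radius * np.cos(thetas)).astype(int)
        cy = np.rint(y - radius * np.sin(thetas)).astype(int)
        ok = (cx >= 0) & (cx < shape[1]) & (cy >= 0) & (cy < shape[0])
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return acc

# Synthetic check: edge points on a circle of radius 10 centered at (x=30, y=40).
ts = np.linspace(0, 2 * np.pi, 100)
pts = [(30 + 10 * np.cos(t), 40 + 10 * np.sin(t)) for t in ts]
acc = hough_circles(pts, (80, 80), 10)
peak = np.unravel_index(np.argmax(acc), acc.shape)  # (row=y, col=x)
```

A CUDA version typically assigns edge points (or angles) to threads and accumulates votes with atomic additions, which is where the memory optimizations the poster mentions come in.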
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2438
Download:
 
Accelerating an Imaging Spectroscopy Algorithm Using GPUs
Matthew Sellitto (Northeastern University)

Graphics Processing Units (GPUs) have proven to be effective at accelerating a range of scientific applications. As data needs increase, and more complex data analysis methods are used, the processing requirements for solving scientific problems also increase. The parallel processing power of GPUs can be harnessed and used alongside multi-core CPUs to address this. As an example, many problems require solving optimization problems of multiple variables across large arrays of data. By utilizing modern optimization techniques and combining them with the computational throughput of a CPU-GPU computing platform, we can greatly decrease the processing time required to solve these problems.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2455
Download:
 
CUVILib - GPU Accelerated Vision & Imaging Library
Salman Ul Haq (TunaCode)
Image Processing algorithms are used in a variety of different domains, from surveillance to medicine to industry. CUVI (CUDA Vision and Imaging Library) provides GPU-accelerated vision and imaging functionality with plug-and-play ease of use, a simple yet powerful interface, and support for both NVIDIA and AMD GPUs. With over 1000 users of the beta version, CUVI has quickly grown into a mature solution of choice for delivering real-time performance in imaging/vision applications and software frameworks.  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2462
Download:
 
Implementation of Raptor Code on GPU
Linjia Hu (Michigan Technological University)
Raptor Code is an improvement on LT-Code that performs close to the Shannon channel limit and provides linear encoding and decoding time. It has been chosen as the forward error correction (FEC) scheme in the 3GPP and DVB-H standards. We implement Raptor Codes on the GPU to process large block and symbol sizes effectively and efficiently. Our GPU decoding achieves up to a 40x speedup over sequential CPU decoding.   Back
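To illustrate the decoding the poster accelerates: LT/Raptor decoding resolves XOR equations over source symbols by "peeling" degree-1 symbols. The sketch below is a generic toy peeling decoder with hand-built encoded symbols (a real code draws index sets from a soliton degree distribution and adds a precode); it is not the authors' GPU implementation:

```python
def lt_decode(encoded, k):
    """Peeling decoder for XOR-coded (fountain) symbols.

    encoded: list of (index_set, xor_value) pairs, where xor_value is the XOR
             of the source symbols at the given indices. k: source symbol count.
    """
    out = [None] * k
    eqs = [[set(idx), val] for idx, val in encoded]
    progress = True
    while progress and None in out:
        progress = False
        for eq in eqs:
            idx, val = eq
            # Substitute every already-recovered symbol into this equation.
            for i in [j for j in idx if out[j] is not None]:
                idx.discard(i)
                val ^= out[i]
                progress = True
            if len(idx) == 1:            # degree-1: the symbol is revealed
                i = idx.pop()
                if out[i] is None:
                    out[i] = val
                    progress = True
            eq[1] = val
    return out

# Four source bytes and four received XOR combinations (a loss-free toy case).
a, b, c, d = 0x11, 0x22, 0x33, 0x44
encoded = [({0}, a), ({0, 1}, a ^ b), ({1, 2}, b ^ c), ({2, 3}, c ^ d)]
dec = lt_decode(encoded, 4)
```

A GPU decoder parallelizes the XOR row operations across symbols, which is what makes large block and symbol sizes tractable.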
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2473
Download:
 
Real-Time Wind Velocity Estimation from Aerosol Lidar Data using GPUs
Chris Mauzey (California State University, Chico)

The REAL is an atmospheric light detection and ranging (LIDAR) system. It produces near-horizontal and vertical cross-sectional images of the lower atmosphere. The images reveal the spatial distribution of atmospheric aerosol (particulate matter). By applying motion estimation algorithms to image sequences, two-dimensional vector wind fields can be determined. We will explore the use of GPU computing in the real-time computation of wind vector fields.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2501
Download:
 
GPU Based Feature Extraction Implementation
Haofeng Kou (SCU)
In this poster, we introduce an efficient parallel implementation of Mel-frequency Cepstral Coefficient (MFCC)-based feature extraction and describe the optimizations required for effective throughput on many-core Graphics Processing Units (GPUs). We demonstrate that the feature extraction process in automatic speech recognition is well suited for GPUs and that a substantial reduction in computation time can be obtained by performing feature extraction on these platforms. Using a single NVIDIA GTX460 GPU, our proposed approach is shown to be approximately 25x faster than a sequential CPU implementation, enabling feature extraction to be performed in real time.  Back
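For orientation, the standard MFCC stages that such an implementation parallelizes — windowing, power spectrum, mel filterbank, log, DCT — can be sketched for a single frame in plain NumPy (a toy version, not the poster's GPU code; all parameter values here are illustrative defaults):

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Toy single-frame MFCC: power spectrum -> mel filterbank -> log -> DCT-II."""
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    edges = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge

    logmel = np.log(fbank @ power + 1e-10)

    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct @ logmel

# One frame of a 440 Hz test tone.
coeffs = mfcc(np.sin(2 * np.pi * 440 * np.arange(16000) / 16000))
```

On the GPU, frames are independent, so thousands of frames can run this pipeline concurrently — the batching that yields the reported speedup.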
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2510
Download:
Augmented Reality & Virtual Reality
Presentation
Media
High Efficiency Near-Eye Light Field Display
Andrew Maimone (University of North Carolina at Chapel Hill)
We present a near-eye light field display design that supports accommodation and high spatial resolution while using the same bandwidth as a conventional display. A light source array reflects light in multiple directions off a high-speed binary display, creating a light field over the eye. The display bandwidth conventionally used for color gradations is instead used to create a high angular resolution binary light field; color gradations will be partially recovered when the light field is collected by the eye and focused on the retina.  Back
 
Keywords:
Augmented Reality & Virtual Reality, GTC 2015 - ID P5248
Download:
 
GPU Accelerated Cutting for Surgical Simulation Systems
Pourya Shirazian (University of Victoria)
One of the main objectives of virtual reality based surgical simulation systems is the removal of pathologic tissues. Cutting imposes many challenges in the development of a robust, interactive surgery simulation, not only because of the nonlinear material behavior exhibited by soft tissue but also due to the complexity of introducing the cutting-induced discontinuity. We propose a high performance cutting algorithm for complex tetrahedral meshes. As a proof of concept we integrated our algorithm in a craniotomy simulation.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Computational Physics, GTC 2015 - ID P5254
Download:
 
Game-based Learning and Simulation System using Web Technologies and GPU
Ibrahim Demir (University of Iowa)
We developed a web-based 3D interactive learning environment for teaching hydrological concepts. The system provides a visually striking platform with realistic terrain information, and water simulation. Students can create scenarios, control parameters, and evaluate mitigation alternatives. The system utilizes web technologies and GPU for water simulation and object collisions on the terrain. The system supports virtual reality, augmented and immersive reality modes, and enables interaction using gesture, body movement and portable devices.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Education & Training, GTC 2015 - ID P5255
Download:
 
The Future of Human Vision: Preferential Augmentation Using GPUs
Muhammad Shamim (Baylor College of Medicine)
Loss of vision can result from an enormous number of visual disorders, a small subset of which can be addressed using traditional corrective lenses, i.e. by transforming light in accordance with Snell's law of refraction. In principle, a more general class of transformations might help address a broader range of disorders. Discover how GPUs are being used in augmented reality applications to correct or alleviate vision deterioration in real-time, as well as personalize vision in novel ways.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, Medical Imaging, Video & Image Processing, GTC 2015 - ID S5182
Streaming:
Download:
 
Accelerating Computer Vision and Augmented Reality via GPGPU Computing
Jack Dashwood (Metaio)
It is no secret that augmented reality is a computationally-intensive endeavor. While human sight is taken for granted by the average person, getting a computer to "see" in a way that remotely resembles our expectations is an extremely complex challenge. Tasks such as extracting significant features from a camera feed, estimating a camera pose, and finally rendering digital content appropriately demand a huge amount of processing power from the CPU of today's mobile devices. Compounding this problem is the increasing interest in 3D object tracking and depth-sensing cameras. Metaio CEO Dr. Thomas Alt will illustrate the current "CV Bottlenecks" and how GPU-based solutions can significantly improve the increasingly important mobile computer vision and augmented reality apps coming to market.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID S5626
Streaming:
Download:
 
VR Direct: How NVIDIA Technology Is Improving The VR Experience
Nathan Reed (NVIDIA), Dario L. Sancho Pradel (Crytek)
Virtual reality is the next frontier of gaming, and NVIDIA is leading the way by introducing VR Direct, a set of hardware and software technologies we're creating to cut down graphics latency and accelerate stereo rendering performance. In this talk, we'll show how developers can use NVIDIA GPUs and VR Direct to improve the gaming experience on the Oculus Rift and other VR headsets.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Game Development, Real-Time Graphics, GTC 2015 - ID S5668
Streaming:
Download:
 
Augmented Reality with Google's Project Tango and NVIDIA Technology
Wil Braithwaite (NVIDIA)
This talk presents a system for the visualization of professional graphics, such as ray tracing, on a low-latency device, such as a head-mounted display or tablet. I will describe the issues encountered, and the algorithms used. The example I will demonstrate showcases the NVIDIA® VCA cluster for cloud-based rendering, NVENC for low-latency video encoding, and Google's Project Tango with the Tegra K1 processor for pose tracking and video decoding. The demo system presented can also serve graphics to multiple low-latency devices, such as a Virtual Reality HMD, at a rate much faster than the graphics are rendered.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Media & Entertainment, Real-Time Graphics, GTC 2015 - ID S5733
Streaming:
 
VR Everywhere: Consumer Virtual Reality for Desktop, Mobile and Web
Tony Parisi (Third Eye)
Virtual Reality has taken the computer industry by storm. Developers, artists, end users, educators, advertisers and retailers are flocking by the thousands to realize the decades-long dream of virtual reality for the masses. The combination of GPU acceleration and cheap sensors has enabled low-cost consumer-grade VR, and the rapid adoption of software development kits is paving the way for creating virtual reality apps on platforms from desktops to smartphones, and even running in your web browser using WebGL. Join VR pioneer and WebGL developer Tony Parisi as he explores this exciting frontier. This session will take a look at the latest VR hardware devices, supported operating systems and software development kits, and a wide range of applications already being deployed.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Developer - Tools & Libraries, Real-Time Graphics, GTC 2015 - ID S5737
Streaming:
Download:
Automotive
Presentation
Media
Creating Mobile Apps for the Automotive Market
Kerry Johnson (QNX Software Systems)

The growing convergence of mobile handsets and automotive platforms is creating a new market opportunity for app developers. That said, many differences exist between the smartphone and the car, and understanding them is key to unlocking the potential of this new market. To give the app developer a jump-start, this session explores how a car infotainment system is structured, UX considerations for automotive applications, design principles for taking best advantage of SoCs like Tegra 3, and key differences between mobile and automotive platforms.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3223
Streaming:
Download:
 
Augmented Reality Head-up Display for Cars
Victor Ng-Thow-Hing (Honda Research Institute USA)

The challenge of introducing augmented reality to head-up displays for automobiles requires balancing between the visual, immersive richness this medium provides with the need for the driver to stay focused on the primary task of driving. This session explores how to solve these problems by combining design methodologies with technological research. Before field testing ideas in actual cars, high fidelity prototypes with driving simulators are utilized with an actual windshield head-up display to visualize the augmented graphics. UI Composer is leveraged with proprietary software to engage designers in the prototyping process.

  Back
 
Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Instrument Clusters & Heads-Up Display (HUD), Manufacturing Technical, GTC 2013 - ID S3230
Streaming:
Download:
 
High Performance Map Rendering for In-vehicle Navigation
Don Burns (NVIDIA)

This session will provide techniques for rendering 3D maps efficiently and at high frame rates, while still preserving quality. Topics to be discussed include tile cache management, tile fetch, tile rendering techniques, layer management, and 3D object rendering.
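Of the listed topics, tile cache management is commonly implemented as an LRU cache keyed by tile coordinates; the sketch below is a generic illustration (the `TileCache` class is hypothetical, not the session's code):

```python
from collections import OrderedDict

class TileCache:
    """Minimal LRU tile cache of the kind used by map renderers.

    Keys are (zoom, x, y) tile coordinates; values are decoded tile data.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self._tiles = OrderedDict()

    def get(self, key):
        if key not in self._tiles:
            return None          # cache miss: caller schedules a tile fetch
        self._tiles.move_to_end(key)   # mark most-recently-used
        return self._tiles[key]

    def put(self, key, tile):
        self._tiles[key] = tile
        self._tiles.move_to_end(key)
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)   # evict least-recently-used

cache = TileCache(2)
cache.put((3, 1, 2), "tile A")
cache.put((3, 1, 3), "tile B")
cache.get((3, 1, 2))            # touch A so that B becomes the LRU entry
cache.put((3, 2, 2), "tile C")  # evicts B
```

Keeping recently viewed tiles resident this way avoids re-fetching and re-decoding them as the map pans, which supports the high, stable frame rates the session targets.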

  Back
 
Keywords:
Automotive, Navigation Systems, GTC 2013 - ID S3386
Streaming:
Download:
 
Optimizing Pedestrian Detection for Real-time Automotive Applications
Vladimir Glavtchev (NVIDIA)

This session will present a motion estimation approach to pedestrian and cyclist detection. Through analyzing motion across several frames, this technique accurately segments foreground objects from the background, including their positions and velocities. Foreground objects are classified as pedestrians, bicyclists or motorcyclists, or other objects on the road surface. The entire process is optimized to minimize the computation resources needed for detection and classification. The optimizations make it possible to perform the entire process on a mobile grade GPU system with a modest host processor.
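The abstract does not detail the estimator; as an illustrative sketch, classic exhaustive block matching with a sum-of-absolute-differences (SAD) criterion shows the kind of per-block motion search such a pipeline performs, and why it parallelizes well (every block is independent):

```python
import numpy as np

def block_motion(prev, curr, bs=8, search=4):
    """Exhaustive block-matching motion estimation with a SAD criterion.

    Returns, per block of the current frame, the (dy, dx) displacement of the
    best-matching block in the previous frame (the negative of object motion).
    """
    h, w = prev.shape
    flow = np.zeros((h // bs, w // bs, 2), dtype=int)
    for by in range(h // bs):
        for bx in range(w // bs):
            y0, x0 = by * bs, bx * bs
            block = curr[y0:y0 + bs, x0:x0 + bs]
            best, best_v = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if 0 <= y and y + bs <= h and 0 <= x and x + bs <= w:
                        sad = np.abs(prev[y:y + bs, x:x + bs] - block).sum()
                        if best is None or sad < best:
                            best, best_v = sad, (dy, dx)
            flow[by, bx] = best_v
    return flow

# A bright square moves 2 px to the right between two 32x32 frames.
prev = np.zeros((32, 32)); curr = np.zeros((32, 32))
prev[8:16, 8:16] = 1.0
curr[8:16, 10:18] = 1.0
flow = block_motion(prev, curr)   # the square's block reports (0, -2)
```

Thresholding the resulting vector field separates moving foreground blocks from the static background, which is the segmentation step the session builds on.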

  Back
 
Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Computer Vision, GTC 2013 - ID S3396
Streaming:
Download:
 
Speech and Vision Processing for Immersive In-Vehicle Applications
Ian Lane (Carnegie Mellon University)

AIDAS, an Intelligent Driver Assistive System being developed at Carnegie Mellon University, enables the investigation of Immersive Interaction within vehicles. The AIDAS platform enables rich, speech-centric interaction with the driver. Interactions are both context-aware, based on the location of the car and the driver's gaze direction, and natural, akin to interacting with a human assistant. This session will introduce the core speech and vision components used within AIDAS and describe the approaches used to accelerate these technologies to realize a real-time interactive system.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3403
Streaming:
Download:
 
Automotive Advanced Driver Assistance Systems: Challenges & Opportunities
Ian Riches (Strategy Analytics)

This session will examine the driving forces behind the adoption of Advanced Driver Assistance Systems (ADAS), one of the fastest growing application areas by car makers. Key battle grounds between new and existing suppliers will be examined, and forecasts presented for key systems, semiconductors and sensors. Despite the high forecast growth, challenges remain to widespread adoption across the globe. These barriers will be explained, together with recommendations for what needs to be done to overcome them.

  Back
 
Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3413
Streaming:
Download:
 
Overview of UI Composer Studio
Justin Ebert (NVIDIA)

UI Composer Studio is the ground-breaking HMI design tool used for instrument clusters and infotainment systems. Developed by NVIDIA, it is used by automakers and Tier 1 automotive suppliers to rapidly develop proofs of concept for evaluation, market research, usability testing and ultimately final production. This session covers the basics of constructing an instrument cluster and IVI using Studio's advanced authoring environment.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3419
Streaming:
Download:
 
Audi Urban Intelligent Assist: Taking Urban Mobility to the Next Level
Mario Tippelhofer (Audi)

The goal of the Audi Urban Intelligent Assist (AUIA) research initiative is to showcase different technologies and approaches to make the challenges of navigating the chaotic roadways of the world's megacities less stressful, safer and more efficient a generation from now. This is mainly achieved through advancements in predictive technology, by harnessing the power of Big Data through algorithms, real-time data, Human Machine Interfaces (HMI), advanced sensors and other innovative approaches. The AUIA project is the latest in a series of university collaborations that Audi has formed to explore the frontiers of automotive technologies and electronics.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3481
Streaming:
Download:
 
GPU Requirements for Automotive Infotainment Systems
Ron Szabo (Delphi Corporation)

This session will cover the current and future requirements for GPUs in the automotive space for infotainment systems. Four areas contribute to the exponential growth of processing power required onboard: (1) traditional feature growth; (2) the impact of mobile devices and brought in content; (3) the compounding effect of off-board services and cloud connectivity and; (4) development headroom to eventually eliminate optimization. Critical tradeoffs that Tier 1s and OEMs need to make will be discussed.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3542
Streaming:
Download:
 
From Big Data to Thin Client: The GPU as an Experiential Enabler
Christopher Nelson (RTT USA, Inc.)

We will tour the life of data from PDM to POS (Point-Of-Sale). Some stops along the way will include: Design, Engineering and Perceived Quality. With an end result of high-end visualization, a focus on new hardware from NVIDIA will take the experience to uncharted territories.

  Back
 
Keywords:
Automotive, Cloud Visualization, SIGGRAPH 2013 - ID SIG1326
Streaming:
Download:
 
UI Composer for Automotive HMIs - Part 1: What, Why, and How
Gavin Kistner (NVIDIA), Stephen Mendoza (NVIDIA)
An in-depth view of content creation using UI Composer, including the digital asset pipeline, animation, materials, state-machine development, and debugging.  Back
 
Keywords:
Automotive, Debugging Tools & Techniques, Digital Product Design & Styling, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4616
Streaming:
Download:
 
UI Composer for Automotive HMIs - Part 2: Building Content
Gavin Kistner (NVIDIA), Xavier Mendoza (NVIDIA)
A continuation of Part 1, this is a hands-on, interactive demonstration of content creation using UI Composer. The audience will be guided through the steps to build a data-driven virtual automotive gauge. In order to actively participate in this session, attendees are asked to bring their own Windows laptop with UI Composer installed. UI Composer is available for free from http://uicomposer.nvidia.com/  Back
 
Keywords:
Automotive, Debugging Tools & Techniques, Digital Product Design & Styling, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4806
Streaming:
Download:
 
Real-Time Electromagnetic Wave Propagation Using OptiX for Simulation of Car-to-Car-Communication
Manuel Schiller (Technische Universitat Munchen)
In this session we present a real-time simulation of electromagnetic wave propagation using OptiX GPU ray tracing. This simulation is used in virtual test drives to allow testing of Advanced Driver Assistance Systems which will be based on wireless Car-to-Car communication. Learn how ray tracing performance can be improved to achieve real-time simulations and how the ray tracing results are post-processed to perform the electromagnetic calculations on the GPU using the Thrust library.  Back
 
Keywords:
Automotive, Computational Physics, Rendering & Ray Tracing, GTC 2014 - ID S4359
Streaming:
Download:
 
Tegra K1 and the Automotive Industry
Gernot Ziegler (NVIDIA), Timo Stich (NVIDIA)
Discover how mobile GPUs enable modern features of car driving in a power-efficient and standardized way, by providing the fundamental building blocks of computer vision to the higher-level reasoning functions that enable the car to detect lanes, park automatically, avoid obstacles, etc. We explain the challenges of having to fit into a given time budget, and how the low-level machine vision such as corner detection, feature tracking and even more advanced functionality such as 3D surrounding reconstruction is achieved in the context of the car's systems and its outside environment.  Back
 
Keywords:
Automotive, Computer Vision, Machine Learning & Deep Learning, Mobile Applications, GTC 2014 - ID S4412
Streaming:
Download:
 
Beyond Pedestrian Detection: Deep Neural Networks Level-Up Automotive Safety
Hideki Niihara (Denso IT Laboratory, Inc.), Ikuro Sato (Denso IT Laboratory, Inc.)
People want cars that are not only cost-friendly, trouble-free, and energy-efficient, but also safe. Today's technology provides Advanced Emergency Braking Systems that can detect pedestrians and automatically brake just before a collision becomes unavoidable. We have a vision that future Advanced Driver Assistance Systems will not just detect pedestrians but recognize their behavior and understand the level of danger, so that emergency situations can be avoided. We claim deep Convolutional Neural Networks (CNNs) are the right tools for these highly non-trivial tasks, and Tegra is the best partner. We demonstrate real-time deep CNNs using Tegra.   Back
 
Keywords:
Automotive, Computer Vision, Machine Learning & Deep Learning, GTC 2014 - ID S4621
Streaming:
Download:
 
One Car Fits You: Technology and Opportunities in the Personalized Car
Ryan Middleton (Delphi)
Learn about two Delphi projects that are pushing the concept of a personalized in-vehicle experience. As drivers bring more of their personal content and personal style into the car, opportunities are emerging for car makers and platform providers to differentiate their offerings. We will explore the infotainment architecture of the future, enabling feature upgrades at the same rate as mobile devices. We will also explore how GPU technology enables "months-to-minutes" user interfaces and greater flexibility in end-user personalization.
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4659
Streaming:
 
NVIDIA Vision Toolkit for Advanced Driver Assistance Systems, Computational Photography and Beyond
Elif Albuz (NVIDIA), Frank Brill (NVIDIA)
In this session, we will present the contents of the Vision Toolkit, discuss its performance advantages, and demonstrate real-time applications enabled by this library. The Vision Toolkit is an NVIDIA product designed to enable real-life computer vision applications. It leverages state-of-the-art computer vision research and offers a variety of functions to developers, initially targeting Advanced Driver Assistance Systems (ADAS) and Augmented Reality (AR) applications. The toolkit is highly GPU-accelerated on mobile platforms, offering significant speedups and reducing the engineering effort needed to design real-time vision applications. It includes open-source samples and offers a flexible framework that enables users to extend and contribute new functionality. It will be deployed on several operating systems, including Android and Linux on ARM, to registered developers and partners through NVIDIA's web site.
 
Keywords:
Automotive, Computational Photography, Computer Vision, Mobile Summit, GTC 2014 - ID S4714
Streaming:
Download:
 
Today's LiDARs and GPUs Enable Ultra-Accurate GPS-Free Navigation with Affordable Simultaneous Localization and Mapping
Louay Eldada (Quanergy Systems, Inc.)
With recent advances in low-cost, high-performance LiDARs (laser-based Light Detection and Ranging sensors) and GPUs, ultra-accurate GPS-free navigation based on SLAM (Simultaneous Localization and Mapping) is becoming a reality. Learn how the latest 360° field-of-view, long-range 3D mapping LiDARs, capable of generating data streams at gigasample-per-second (GSPS) sampling rates, are used with 192-CUDA-core GPUs based on the Kepler architecture to run artificial intelligence software and deliver advanced vehicular safety and navigation systems capable of real-time object detection, tracking, identification, and classification, as well as offline, full-availability, jam-proof, centimeter-accurate navigation.
 
Keywords:
Automotive, Combined Simulation & Real-Time Visualization, In-Vehicle Infotainment (IVI) & Safety, Machine Learning & Deep Learning, GTC 2014 - ID S4761
Streaming:
Download:
 
Embedded Development For Tegra K1
Jesse Clayton (NVIDIA)
The Tegra K1 is a powerful SoC that will be leveraged across many industries. It is based on the same Kepler architecture as the world's fastest gaming systems and most efficient supercomputers, and brings supercomputing power to mobile and embedded platforms. Jesse Clayton from NVIDIA will walk through the embedded development process for Tegra K1. The talk will cover the platform, programming paradigm, and development tools, and provide details on the Tegra K1 architecture relevant to embedded applications.
 
Keywords:
Automotive, Defense, Computer Vision, Machine Learning & Deep Learning, GTC 2014 - ID S4938
Streaming:
 
Audi Piloted Parking on zFAS: Valet Parking for the 21st Century
Miklos Kiss (Audi Electronics Venture GmbH)
What does it mean to bring supercomputing into the car? Examples of piloted parking systems show what it means for customers as well as for developers: Audi's way into piloted driving for the 21st century.
 
Keywords:
Automotive, Video & Image Processing, GTC 2014 - ID S4961
Streaming:
 
Object Detection: GPU-Friendly Soft Cascades
Alexander Smorkalov (Itseez)
Fast on-road object detection is an important feature of advanced driver assistance systems (ADAS). We propose a CUDA implementation of a soft cascade detector that allows real-time object detection on the Tegra K1 platform, applicable to pedestrian and vehicle detection.
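The soft cascade idea mentioned in this abstract can be sketched in plain Python. This is a minimal illustration, not the poster's CUDA detector: the stage functions, weights, and rejection thresholds below are hypothetical toy values.

```python
# Soft cascade: evaluate weak stages in order, keep a running score, and
# reject a window as soon as the score drops below that stage's rejection
# threshold. Early rejection on easy negatives is what makes it fast.

def soft_cascade_score(window, stages):
    """stages: list of (weak_classifier_fn, rejection_threshold)."""
    score = 0.0
    for weak_fn, reject_at in stages:
        score += weak_fn(window)
        if score < reject_at:
            return None  # early reject: not the target object
    return score  # survived all stages: detection with this confidence

# Toy "features" are just fields of the window dict.
stages = [
    (lambda w: w["edge"], -0.5),
    (lambda w: w["texture"], 0.0),
    (lambda w: w["shape"], 0.5),
]

pedestrian = {"edge": 0.4, "texture": 0.3, "shape": 0.6}
background = {"edge": -0.9, "texture": 0.1, "shape": 0.2}

print(soft_cascade_score(pedestrian, stages))  # survives all stages
print(soft_cascade_score(background, stages))  # None: rejected at stage 1
```

A GPU version would evaluate many candidate windows in parallel, one thread (or warp) per window, with the same early-exit logic.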
 
Keywords:
Automotive, GTC 2014 - ID P4289
Download:
 
Predicting ADAS Algorithms Performances on K1 Architecture
Romain Saussard (Renault)
Computer vision algorithms are widely used in the automotive field for ADAS. Many computing architectures can be used to embed those algorithms: ARM, DSP, GPU, and heterogeneous ones like the Tegra K1. But the choice of computing architecture remains a problem for the car manufacturer. We propose a method to predict the performance of computer vision algorithms on multiple, heterogeneous architectures in order to help choose the best algorithm-architecture pairing. The approach is illustrated with a lane detection algorithm embedded on the K1.
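One first-order way to frame such predictions is a roofline-style bound: a kernel is limited either by its arithmetic or by its memory traffic. This sketch is not the poster's method; the kernel sizes and peak numbers below are illustrative, not K1 specifications.

```python
# Roofline-style prediction: execution time is bounded by whichever is
# larger, compute time (flops / peak_flops) or memory time (bytes / peak_bw).

def predict_time_s(flops, bytes_moved, peak_flops, peak_bw):
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Hypothetical lane-detection kernel on two hypothetical targets.
kernel = {"flops": 2e9, "bytes": 4e8}
targets = {
    "gpu": {"peak_flops": 300e9, "peak_bw": 15e9},  # memory-bound here
    "cpu": {"peak_flops": 20e9,  "peak_bw": 10e9},  # compute-bound here
}
for name, t in targets.items():
    s = predict_time_s(kernel["flops"], kernel["bytes"],
                       t["peak_flops"], t["peak_bw"])
    print(f"{name}: {s * 1e3:.1f} ms")
```

Comparing the predicted times across candidate architectures gives a cheap first filter before committing to a port of the full algorithm.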
 
Keywords:
Automotive, Computer Vision & Machine Vision, GTC 2015 - ID P5158
Download:
 
GPUService: GPU Acceleration of Robotic Services: Real Time 3D Point Cloud Processing
Leonardo Christino (Universidade de São Paulo)
GPU acceleration of robotic services focused on 3D point cloud processing of robotic depth sensors, approaching real-time performance for use in self-driving automobiles.
 
Keywords:
Automotive, Embedded, GTC 2015 - ID P5192
Download:
 
Vision-Based Driver Assistance: Seeing the Way Forward
Ian Riches (Strategy Analytics)
This market introduction to vision-based solutions in advanced driver assistance systems will highlight the regions, applications, and vehicle sectors that are driving the growth. Current and likely future architectures will be explored, and the implications for both traditional and non-traditional automotive suppliers will be highlighted. Finally, the role and implications of automated driving will be investigated and analyzed.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5108
Streaming:
Download:
 
Through the Eyes of a Car: Visualizing a Car's Camera System
Gernot Ziegler (NVIDIA)
Learn how the GPU's real-time graphics capabilities can be used to interactively visualize and enhance the camera system of modern cars. The GPU simplifies the design, interactive calibration, and testing of the car's computer vision systems, and even allows for creating simulated environments in which the behavior of the car's computer vision can be tested against standard safety tests or navigational street situations.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Real-Time Graphics, GTC 2015 - ID S5123
Streaming:
Download:
 
Rapidly Prototyping Automotive User Experiences at Jaguar Land Rover
Matt Jones (Jaguar Land Rover)
Learn how Jaguar Land Rover is using the power of the GPU to design, create, and test next-generation user interfaces for cars.
 
Keywords:
Automotive, Embedded, Manufacturing, Real-Time Graphics, GTC 2015 - ID S5137
Streaming:
 
Next Generation Surround-View for Cars
Miguel Sainz (NVIDIA), Timo Stich (NVIDIA)
A robust proof-of-concept Surround-Vision and Top-View system for cars uses four car-mounted cameras as inputs and the Jetson Pro platform as the computation and display unit, relying on CUDA and OpenGL for both GPGPU work and rendering of the final views. Topics covered will include the placement and calibration of the cameras, color correction, and data preprocessing. A technical deep dive will highlight common visual artefacts in Top-View visualizations and present the algorithmic building blocks to correct those errors.
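The core of a Top-View warp is mapping camera pixels through a planar homography obtained from calibration. A minimal sketch, assuming a precomputed 3x3 matrix (the `H` below is a made-up stand-in, not real calibration data):

```python
# Bird's-eye ("Top View") warping maps each ground-plane point through a
# 3x3 homography H. A real system estimates H per camera from the mounted
# camera's intrinsics and pose; here H is a hypothetical placeholder.

def apply_homography(H, x, y):
    """Project point (x, y) with homography H given as 3x3 nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w  # perspective divide

# Identity-plus-translation homography as a stand-in for calibration.
H = [[1.0, 0.0, 10.0],
     [0.0, 1.0, 20.0],
     [0.0, 0.0,  1.0]]

print(apply_homography(H, 5.0, 5.0))  # (15.0, 25.0)
```

In the real pipeline this mapping runs per output pixel (inverse warping with interpolation) on the GPU, and the four warped camera views are blended into one composite.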
 
Keywords:
Automotive, Computer Vision & Machine Vision, Real-Time Graphics, GTC 2015 - ID S5295
Streaming:
Download:
 
Pimp My Ride: How to Mod Cars with Tegra
Dave Anderson (NVIDIA)
Tapping into in-vehicle architectures for infotainment and driver information applications is a huge challenge. We will examine several production cars as examples and provide insight into how NVIDIA automotive Tegra processors can be retrofitted into these cars as a proof of concept for next-generation digital clusters and infotainment systems.
 
Keywords:
Automotive, Embedded, Video & Image Processing, GTC 2015 - ID S5396
Streaming:
 
Enabling Next-Gen Vehicle Architectures with Embedded Supercomputing
Uday Pitambare (Delphi)
The evolution of GPU-accelerated computing is enabling us to rethink vehicle architecture in ways previously believed infeasible. We will see how Delphi's signature Integrated Cockpit and Multi-Domain Controller projects now leverage parallel computing to up-integrate traditionally disparate vehicle systems. We will also discuss the advantages and challenges involved in this process.
 
Keywords:
Automotive, GTC 2015 - ID S5469
Streaming:
Download:
 
Safe and Seamless Integration of Tegra into the In-Vehicle Network
Stefaan Sonck Thiebaut (OpenSynergy)
Virtualization is playing an increasingly important role in the development of in-vehicle systems. Users of the NVIDIA Vibrante SDK/PDK can use OpenSynergy's integrated automotive solution to realize CAN communication and AUTOSAR compliance within the timing and safety constraints required by the automotive industry. In addition, learn how the solution allows controlled communication between virtualized operating systems and the vehicle networks while maintaining isolation between the two.
 
Keywords:
Automotive, GTC 2015 - ID S5532
Streaming:
Download:
 
Benchmarking Real-World In-Vehicle Applications
Michael Carstens-Behrens (mycable GmbH)
Learn how to perform a critical use case analysis to ensure your high-end embedded system provides the required application-specific performance. Typical GPU and CPU benchmarks return performance values under optimized conditions, but real-world applications, such as infotainment systems, will find the bottlenecks in your system. Find them before the project fails, or find options to transfer tasks to the GPU (e.g., using CUDA). Attendees will see how to transform a system architecture into a "system resource model", find the "critical use cases" of the application, and match them against this model. This practical approach will show how to set up benchmarks that emulate use cases under reproducible conditions, based on an example automotive infotainment system.
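A reproducible use-case benchmark can be as simple as timing each critical use case many times under identical conditions and reporting a robust statistic. This is a minimal sketch in the spirit of the talk, not its actual methodology; the workload function is hypothetical.

```python
# Minimal "critical use case" benchmark harness: run each use case
# repeatedly and report the median wall time, so results are reproducible
# and comparable against a system resource model.
import time
import statistics

def benchmark(name, use_case, repeats=5):
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        use_case()
        samples.append(time.perf_counter() - t0)
    return name, statistics.median(samples)

# Hypothetical workload standing in for e.g. "decode and render one frame".
def decode_frame():
    sum(i * i for i in range(10_000))

name, median_s = benchmark("decode_frame", decode_frame)
print(f"{name}: {median_s * 1e3:.3f} ms (median of 5 runs)")
```

On a real system the harness would pin CPU frequency and run competing use cases concurrently to expose contention, which a single optimized micro-benchmark hides.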
 
Keywords:
Automotive, Embedded, Developer - Performance Optimization, GTC 2015 - ID S5587
Streaming:
Download:
 
Self-Driving Vehicles: Changing the Mission of Human-Machine Interface
Walter Sullivan (Elektrobit)
Highly connected vehicles have clear implications for various aspects of driver-vehicle interaction. The HMI design will be influenced by the high load of information placed on the driver. How can information best be presented? How can it be selected? Is the idea of a workload manager still relevant? On the other hand, autonomous driving brings new challenges for the vigilance and distraction of the driver. How can the driver be pulled back into the loop when required? When is it required? How can drivers be informed about the limits of the machine? We will also discuss methods to "measure" HMI and driving performance in automation, such as steering-wheel reversal rate, standard deviation of lane position, speed keeping, and more.
 
Keywords:
Automotive, Augmented Reality & Virtual Reality, GTC 2015 - ID S5588
Streaming:
Download:
 
Gesture Recognition: Using a Multi Sensor Approach
Shalini Gupta (NVIDIA)
For accurate and power-efficient in-vehicle hand-gesture recognition, a novel multi-sensor system comprises a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions. The radar and depth sensors are jointly calibrated, and a deep convolutional neural network fuses data from the multiple sensors to classify the gestures. This algorithm accurately recognizes 10 different gestures acquired indoors and outdoors, in a car, during the day and at night, while consuming significantly less power than purely vision-based systems.
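The robustness argument can be illustrated with a much simpler fusion scheme than the CNN described here: score-level (late) fusion, where each sensor's classifier votes with class probabilities. The sensors, probabilities, and gesture labels below are all hypothetical.

```python
# Late fusion sketch: each sensor (radar, color, depth) produces class
# probabilities; averaging them keeps the decision stable even when one
# sensor degrades (e.g. the color camera at night). This is a simpler
# stand-in for the session's CNN-based fusion.

def fuse_and_classify(per_sensor_probs):
    n_classes = len(per_sensor_probs[0])
    fused = [sum(p[c] for p in per_sensor_probs) / len(per_sensor_probs)
             for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__), fused

radar = [0.2, 0.7, 0.1]   # radar is confident it's gesture 1
color = [0.3, 0.4, 0.3]   # dark scene: the camera is unsure
depth = [0.1, 0.6, 0.3]
label, fused = fuse_and_classify([radar, color, depth])
print(label)  # 1
```

Even with the color camera nearly uninformative, the fused decision follows the sensors that still see well, which is the practical benefit of the multi-sensor design.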
 
Keywords:
Automotive, Computer Vision & Machine Vision, GTC 2015 - ID S5599
Streaming:
Download:
 
Robust Speech Recognition for Cars
Ian Lane (Carnegie Mellon University)
One aspect of speech recognition work at Carnegie Mellon University is specifically focused on noise-robust speech recognition for automotive environments. By combining state-of-the-art methods in deep learning with GPU-accelerated embedded hardware, we are able to significantly improve the performance of speech recognition, even in challenging noise conditions.
 
Keywords:
Automotive, Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5633
Streaming:
 
ZFAS - The Brain of Piloted Driving at Audi
Matthias Rudolph (Audi AG)
During the last several years, Audi has developed, with partners, a platform that enables piloted driving and piloted parking. At CES 2015 it was shown that the system can drive piloted on the highway from Silicon Valley to Las Vegas. The computational platform, or brain, of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5637
Streaming:
 
The Fast Lane from Silicon Valley to Munich
Uwe Higgen (BMW Group)
Learn how the BMW Group Technology Office in Silicon Valley integrates with the automaker's worldwide research and development departments, with a specific focus on an active safety system running on NVIDIA hardware, recently developed for the i3. As one of the first automakers to open a research and development office in Silicon Valley, BMW has a long history of innovation in the Bay Area. Projects range from series vehicles to Formula 1, and from research to pre-development, including the iDrive interface, Apps4Automotive, the all-electric Mini E, and head-up displays.
 
Keywords:
Automotive, Embedded, Computer Vision & Machine Vision, GTC 2015 - ID S5789
Streaming:
Download:
 
Audi Piloted Driving: In the Fast Lane to the Future
Daniel Lipinski (Audi of America)
On the eve of CES 2015, Audi, ERL, and VW Group Research accomplished the most dynamic automated driving road test yet, with non-engineers behind the wheel for more than 550 miles on public freeways. With the advanced Highway Pilot technology built into a car nicknamed "Jack", Audi demonstrated how far automated driving technology has matured within the last decade. What enabled such complex technology is the massive growth in processing power, a field in which NVIDIA processors will play a central role in the future.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5870
Streaming:
Download:
 
Ubiquitous Perceptive 3D Sensing for a Smart Internet of Things
Louay Eldada (Quanergy Systems, Inc.)
Innovations in perceptive smart sensors comprising solid-state 3D LiDARs and GPUs running artificial intelligence software have reached a cost level that allows them to be deployed ubiquitously, supporting a smart Internet of Things (IoT). These smart sensors provide real-time information on billions of 'Things' and their surroundings (through 3D object detection, tracking, and classification) and, when needed, the ability to control them. The 'Things' include vehicles, infrastructure, buildings, homes, appliances, lighting controls, thermostats, medical devices, computers, and handheld devices. Growth of personal devices (phones, tablets, laptops, game consoles) is limited by the number of people in the world; the largest growth will come from connected devices in areas such as smart energy, home automation, and transportation.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID S5918
Streaming:
 
Electronics & APIs: The Aftermarket's new Bondo
John Waraniak (Specialty Equipment Market Association (SEMA)), John Ellis (Ellis & Associates)
As the automotive industry relies on electronics and software for more and more active safety capabilities, how does a software or electronics company deliver its value while ensuring that what it delivers doesn't "break" the vehicle? Drawing heavily on the Vehicle Dynamics Program, the Specialty Equipment Market Association (SEMA) has developed the Vehicle Electronics Program to ensure that the next generation of in-car electronics realizes its full potential. Learn about this new program, including the newly proposed federal motor vehicle safety standard, FMVSS 150. In addition, we'll cover the resources and opportunities available to developers for designing and customizing vehicles.
 
Keywords:
Automotive, Product Design & Styling, GTC 2015 - ID S5545
Streaming:
Download:
Best of GTC
Presentation
Media
Advanced Rendering Solutions from NVIDIA
Phillip Miller (NVIDIA)
Learn about the latest breakthroughs and offerings in NVIDIA's Advanced Rendering Solutions, which scale smoothly from local GPU rendering to remote supercomputer clusters. New capabilities and possibilities in Iray® and mental ray® will be explored and demonstrated, along with what's possible with the latest NVIDIA OptiX for accelerating custom ray tracing development. Industry trends and production examples will also be explored as advances in both interactive and production rendering continue to revolutionize workflows.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4111
Streaming:
Download:
 
How V-Ray RT and GPU Rendering are Defining a New Filmmaking Paradigm
Chris Nichols (Chaos Group), Kevin Margo (Blur Studio)
Blur Studio's CG and VFX Supervisor Kevin Margo and Chaos Group's Creative Director Christopher Nichols will discuss how they collaborated with NVIDIA on the production of Margo's short CONSTRUCT. Using GPU-accelerated V-Ray RT along with the latest hardware from NVIDIA, they were able to dramatically accelerate rendering, allowing Margo to focus on the creative process without being slowed down by the technology.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4112
Streaming:
Download:
 
See the Big Picture: Scalable Visualization Solutions for High Resolution Displays
Doug Traill (NVIDIA)
Large-format, high-resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features that make it easier to install and utilize these large-scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4113
Streaming:
Download:
 
Practical Real-Time Voxel-Based Global Illumination for Current GPUs
Alexey Panteleev (NVIDIA)
This session describes the work of making the voxel-based global illumination (GI) approach practical for use in games running on current-generation graphics hardware such as Kepler. Based upon Cyril Crassin's research, a library has been developed that allows applications to render GI effects for large and fully dynamic scenes at 30 frames per second or more, producing soft diffuse indirect lighting and blurry specular reflections, and providing emissive material support. During the session, Alexey will talk about the cone tracing GI algorithm in general and get into the details of scene representation, efficient multi-resolution voxelization, and indirect light gathering.
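The cone tracing step at the heart of this approach can be sketched abstractly: march along the cone axis, sample progressively coarser mip levels of a prefiltered voxel volume as the cone widens, and composite front to back. The 1-D "pyramid" below is a made-up stand-in for the real 3-D voxel data, not the session's implementation.

```python
# Voxel cone tracing, sketched: step along the cone, pick the mip level
# whose voxel size matches the cone diameter at that distance, and do
# front-to-back alpha compositing until opacity saturates.

def trace_cone(pyramid, cone_ratio, max_dist, step=1.0):
    """pyramid: list of (color, opacity) per mip of a prefiltered volume."""
    color, alpha, dist = 0.0, 0.0, step
    while dist < max_dist and alpha < 0.99:
        diameter = max(1.0, cone_ratio * dist)       # cone widens with distance
        mip = min(int(diameter).bit_length() - 1, len(pyramid) - 1)
        s_color, s_alpha = pyramid[mip]              # prefiltered sample
        color += (1.0 - alpha) * s_alpha * s_color   # front-to-back compositing
        alpha += (1.0 - alpha) * s_alpha
        dist += diameter                             # step size grows with cone
    return color, alpha

# Hypothetical (color, opacity) per mip level, coarser mips last.
pyramid = [(1.0, 0.1), (0.8, 0.2), (0.6, 0.4), (0.5, 0.6)]
print(trace_cone(pyramid, cone_ratio=0.5, max_dist=64.0))
```

Diffuse GI gathers several wide cones over the hemisphere per pixel, while blurry specular reflections use a single narrower cone along the reflection direction.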
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4114
Streaming:
Download:
 
Sharing Physically Based Materials between Renderers with MDL
Jan Jordan (NVIDIA), Lutz Kettner (NVIDIA)
The basics of NVIDIA's Material Definition Language (MDL) will be discussed, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically based materials can be defined, while developers will learn what's entailed in supporting MDL within their own product or renderer.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4115
Streaming:
Download:
 
Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's Latest Developer Tools Suite
Sebastien Domine (NVIDIA)
The audience will learn about the latest developer tools suite, specifically designed to unleash the power of Tegra K1 for Android application developers. The broad scope of this technical presentation spans from advanced graphics to compute and multi-core CPU tools that enable developers to take full advantage of the heterogeneous computing horsepower available. More specifically, compute developers will learn about the tools available to program CUDA on Tegra K1. Graphics developers will be introduced to the new Tegra Graphics Debugger for Tegra K1; this new mobile graphics development tool supports all the advanced features that Tegra K1 has to offer, via OpenGL ES 2.0, 3.0, and OpenGL 4.3. Finally, game developers will see how to manage their Android build configuration and debugging sessions entirely within the latest Visual Studio 2013, and profile their application to identify hot spots and corresponding call stacks with our brand-new release of Tegra System Profiler.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4116
Streaming:
Download:
 
OpenGL Scene Rendering Techniques
Christoph Kubisch (NVIDIA)
OpenGL provides new features for accelerating scenes with many objects, which are typically found in professional visualization markets. This talk will provide details on the usage of these features and their effect on real-life models. Furthermore, we will showcase how more of the work of rendering a scene can be offloaded to the GPU, such as efficient occlusion culling or matrix calculations.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4117
Streaming:
Download:
 
OpenGL Update for NVIDIA GPUs
Piers Daniell (NVIDIA), Mark Kilgard (NVIDIA)
Attend this session to get the most out of OpenGL on NVIDIA Quadro, GeForce, and Tegra GPUs. NVIDIA's OpenGL experts explain how the OpenGL standard is evolving and describe NVIDIA's latest support. See examples of the latest features for compute, tessellation, vector graphics, and modern high-performance usage, including AZDO (approximately zero driver overhead) techniques. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4121
Streaming:
Download:
 
Image and Vision Processing on Tegra
Elif Albuz (NVIDIA)
Processing live and offline camera frames, images, and video streams, and extracting semantic information from them, enables various applications on mobile and embedded platforms. Image and vision computing algorithms are inherently highly parallel, and fast processing of these algorithms enables new paradigms in embedded and mobile applications. Tegra K1 is built to address data-parallel embedded and mobile applications, with a CUDA-enabled GPU, an image signal processing engine, a NEON-enabled quad-core ARM CPU, and encode and decode accelerator hardware. Tegra software libraries wrap all this capability and make it available to developers. In this session, an overview of the software libraries and architecture relevant to image and vision computing on Tegra platforms will be presented.
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4122
Streaming:
Download:
 
NVIDIA FlameWorks - Real-time Volumetric Fire and Smoke Simulation
Simon Green (NVIDIA)

Learn how to add volumetric effects to your game engine - smoke, fire and explosions that are interactive, more realistic, and can actually render faster than traditional sprite-based techniques. Volumetrics remain one of the last big differences between real-time and offline visual effects. In this talk we will show how volumetric effects are now practical on current GPU hardware. We will describe several new simulation and rendering techniques, including new solvers, combustion models, optimized ray marching and shadows, which together can make volumetric effects a practical alternative to particle-based methods for game effects.

  Back
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4123
Streaming:
Download:
 
NVIDIA OptiX for High Performance Ray Tracing
David McAllister (NVIDIA), Damien Fagnou (MPC)

This session will cover everything developers need to get started with ray tracing in OptiX, including OptiX C and C++ APIs, the execution model, acceleration structures, programmable entry points, and best practices. We will also cover exciting customer use cases and the new OptiX Prime API that provides to-the-metal ray tracing without shading or recursion.

  Back
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4118
Streaming:
Download:
 
Delivering High-Performance Remote Graphics with NVIDIA GRID Virtual GPU
Andy Currid (NVIDIA)

Learn how to deploy and optimize high-performance remote graphics applications using NVIDIA GRID Virtual GPU. This session will include an architectural overview of GRID Virtual GPU, which provides true hardware virtualization and sharing of the GPU between multiple virtual machines, a walkthrough of Virtual GPU setup on Citrix XenServer with remote graphics, and examples of how to tune the configuration for optimum remote graphics performance.

  Back
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4119
Streaming:
Download:
 
Cloud Architectures and Game Streaming with NVIDIA GRID Technologies
Eric Young (NVIDIA), Samuel Gateau (NVIDIA)

This session will cover the technologies behind NVIDIA GRID and game streaming in the cloud. We will present NVIDIA GRID technologies and the software components of the GRID SDK used for capturing graphics and driving the hardware compression engine, enabling developers to deliver the ultimate low-latency cloud gaming experience. The second part will review our set of optimization guidelines for efficient game streaming from the cloud, improving performance and enhancing the gameplay experience. We will also present research into cloud-exclusive techniques that enable the use of Global Illumination, Multiple-Viewport Rendering, and Hybrid and Cloud rendering for advanced game engines.

  Back
 
Keywords:
Best of GTC, SIGGRAPH 2014 - ID SIG4120
Streaming:
Download:
Big Data Analytics
Presentation
Media
Accelerate Distributed Data Mining with Graphics Processing Units
Nam-Luc Tran (EURA NOVA)
Numerous distributed processing models have emerged, driven by (1) the growth in volumes of available data and (2) the need for precise and rapid analytics. The most famous representative of this category is undoubtedly MapReduce; however, other, more flexible models exist based on the DFG processing model. None of the existing frameworks, however, considers the case where the individual processing nodes are equipped with GPUs to accelerate parallel computations. In this talk, we discuss this challenge and the implications that the presence of GPUs on some of the processing nodes has on the DFG model representation of such heterogeneous jobs and on job scheduling, with big data mining as the principal use case.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4169
Streaming:
Download:
 
GPU-Accelerated Large-Scale Dense Subgraph Detection
Andy Wu (Xerox Research Center)
The large-scale dense subgraph detection problem has been an active research area for decades and has numerous applications in the web and bioinformatics domains. Numerous algorithms have therefore been designed to tackle this graph kernel. Due to computational limitations, traditional approaches are infeasible when dealing with large-scale graphs with millions or billions of vertices. In this presentation, we propose a GPU-accelerated dense subgraph detection algorithm to solve the large-scale dense subgraph detection problem. It successfully maps the irregular graph clustering problem onto the GPGPU platform, and extensive experimental results demonstrate strong scalability on GPU computing platforms.  Back
 
Keywords:
Big Data Analytics, Bioinformatics & Genomics, GTC 2014 - ID S4215
Streaming:
Download:
 
Red Fox: An Execution Environment for Relational Query Processing on GPUs
Haicheng Wu (Georgia Institute of Technology)
This session will present the Red Fox system. Attendees will leave understanding GPU performance when executing relational queries over large data sets, as typically found in data warehousing applications, as well as the automatic compilation flow of kernel fusion, which can be applied to other applications.  Back
 
Keywords:
Big Data Analytics, Developer - Programming Languages, GTC 2014 - ID S4222
Streaming:
Download:
 
Histograms in CUDA: Privatized for Fast, Level Performance
Nicholas Wilt (The CUDA Handbook)
Histograms are an important statistical tool with a wide variety of applications, especially in image processing. Naive CUDA implementations suffer from low performance on degenerate input data due to contention. This presentation will show how to use "privatized" (per-thread) histograms to balance performance of the average case against data-dependent performance of degenerate cases.  Back
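The privatization idea described above can be sketched outside CUDA as well. The following minimal Python analogue (all names are illustrative, not from the talk) gives each worker a private histogram over its chunk of the input, so degenerate data never causes contention on a shared counter; a final merge phase combines the private copies:

```python
from concurrent.futures import ThreadPoolExecutor

def privatized_histogram(data, num_workers=4, num_bins=256):
    """Each worker builds a private histogram over its chunk of the
    input, so no two workers ever contend for the same counter.
    A final reduction merges the private copies."""
    chunk = (len(data) + num_workers - 1) // num_workers

    def worker(i):
        local = [0] * num_bins          # private (per-"thread") histogram
        for x in data[i * chunk:(i + 1) * chunk]:
            local[x] += 1               # no contention: local is private
        return local

    with ThreadPoolExecutor(num_workers) as pool:
        privates = list(pool.map(worker, range(num_workers)))

    # Merge phase: sum the private histograms bin by bin.
    return [sum(col) for col in zip(*privates)]
```

On a GPU the private copies would live in shared memory or registers and the merge would be a parallel reduction; the sequential merge here only illustrates the two-phase structure.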
 
Keywords:
Big Data Analytics, Video & Image Processing, GTC 2014 - ID S4249
Streaming:
 
Packet-based Network Traffic Monitoring & Analysis with GPUs
Wenji Wu (Fermilab)
In high-speed networks, network traffic monitoring and analysis applications may require enormous raw compute power and high I/O throughputs, especially when traffic scrutiny on a per-packet basis is needed. Under those conditions, the applications face tremendous performance and scalability challenges. The GPU architecture fits well with the features of packet-based network monitoring and analysis applications. At Fermilab, we have prototyped a GPU-assisted network traffic monitoring & analysis system, which analyzes network traffic on a per-packet basis. We implemented a GPU-accelerated library for network traffic capturing, monitoring, and analysis. The library consists of various CUDA kernels, which can be combined in various ways to perform monitoring and analysis tasks. In this talk, we will describe our architectural approach in developing a generic GPU-assisted network traffic monitoring and analysis capability. Multiple examples will be given to demonstrate how to use GPUs to analyze network traffic.  Back
 
Keywords:
Big Data Analytics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4320
Streaming:
Download:
 
The Energy Case for Graph Processing on Hybrid CPU and GPU Systems
Elizeu Santos-Neto (University of British Columbia)
This work reports on a power and performance analysis of large-scale graph processing on hybrid (i.e., CPU and GPU), single-node systems. While graph processing on these systems can be accelerated by mapping the graph layout so that the algorithmic tasks exercise the processing units where they perform best, GPUs have a much higher TDP, so their impact on overall energy consumption is unclear. An evaluation on large real-world graphs, as well as on synthetic graphs as large as 1 billion vertices and 16 billion edges, shows that efficiency, in terms of both performance and power, can be achieved.  Back
 
Keywords:
Big Data Analytics, Energy Exploration, GTC 2014 - ID S4338
Streaming:
 
Real-Time Quantification Filters for Multidimensional Databases
Peter Strohm (Jedox AG)
Learn how GPUs can speed up real-time calculation of advanced multidimensional data filters required in data analytics and business intelligence applications. We present the design of a massively parallel "quantification" algorithm which, given a set of dimensional elements, returns all those elements for which ANY (or ALL) numeric cells in the respective slice of a user-defined subcube satisfy a given condition. Such filters are especially useful for the exploration of big data spaces, for zero-suppression in large views, or for top-k analyses. In addition to the main algorithmic aspects, attendees will see how our implementation solves challenges such as economic utilization of the CUDA memory hierarchy or minimization of threading conflicts in parallel hashing.  Back
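The ANY/ALL semantics described above can be stated compactly. This sequential Python sketch (hypothetical names, no relation to Jedox's implementation) shows what the massively parallel version computes: for each element, test its slice of cells against a condition under ANY or ALL quantification:

```python
def quantify(cube, elements, predicate, mode="any"):
    """Given a mapping element -> iterable of numeric cells (the slice
    of the subcube for that element), keep the elements for which ANY
    (or ALL) cells satisfy the predicate."""
    test = any if mode == "any" else all
    return [e for e in elements if test(predicate(c) for c in cube[e])]
```

The GPU version would evaluate the per-element slices in parallel; the quantifier itself is just a short-circuiting reduction over each slice.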
 
Keywords:
Big Data Analytics, Finance, GTC 2014 - ID S4395
Streaming:
Download:
 
Rhythm: Harnessing Data Parallel Hardware for Server Workloads
Sandeep Agrawal (Duke University)
We present Rhythm, a framework for high-throughput servers that exploits similarity across web service requests to improve server throughput and energy efficiency. Present work in data center efficiency primarily focuses on scale-out, with off-the-shelf hardware used for individual machines, leading to inefficient usage of energy and area. Rhythm improves upon this by harnessing data-parallel hardware to execute "cohorts" of web service requests, grouping requests together based on similar control flow and using intelligent data layout optimizations. An evaluation of the SPECWeb Banking workload for future server platforms on the GTX Titan achieves 4x the throughput (reqs/sec) of a Core i7 at efficiencies (reqs/Joule) comparable to a dual-core ARM Cortex A9.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4447
Streaming:
Download:
 
Parallel Lossless Compression Using GPUs
Evangelia Sitaridi (Columbia University)
Given the high cost of enterprise data storage, compression is becoming a major concern for the industry in the age of Big Data. Attendees can learn how to efficiently offload data compression to the GPU, leveraging its superior memory and compute resources. We focus on the DEFLATE algorithm, a combination of the LZSS and Huffman entropy coding algorithms, used in common compression formats like gzip. Both algorithms are inherently serial, and trivial parallelization methods are inefficient. We show how to parallelize these algorithms efficiently on GPUs and discuss trade-offs between compression ratio and increased parallelism to improve performance. We conclude our presentation with a head-to-head comparison against a multi-core CPU implementation, demonstrating up to half an order of magnitude performance improvement using a single Kepler GPU. This is joint work with IBM researchers Rene Mueller and Tim Kaldewey.  Back
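One common way to expose parallelism in an inherently serial format, and a simple baseline for the ratio-versus-parallelism trade-off the abstract mentions, is block splitting. This Python sketch (not the authors' GPU approach) compresses fixed-size blocks independently; smaller blocks mean more parallel work but a worse ratio, since matches cannot cross block boundaries:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block_size: int = 64 * 1024):
    """Split the input into independent blocks and DEFLATE each one
    separately. Blocks can be compressed in parallel, but matches can
    no longer cross block boundaries, so the compression ratio
    degrades as blocks shrink."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(zlib.compress, blocks))

def decompress_blocks(blocks):
    """Invert compress_blocks by decompressing and concatenating."""
    return b"".join(zlib.decompress(b) for b in blocks)
```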
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4459
Streaming:
Download:
 
GPUs and Regular Expression Matching for Big Data Analytics
Alon Shalev Housfater (IBM)
Regular expression based pattern matching is a key enabling technology for a new generation of big data analytics. We'll describe several key use cases that require high-throughput, low-latency regular expression pattern matching. A new GPU-based regular expression technology will be introduced, and its basic performance characteristics will be presented. We'll demonstrate that the GPU enables impressive performance gains in pattern matching tasks and compare its performance against latest-generation processors. Finally, we'll examine the key challenges in using such accelerators in large software products and highlight open problems in GPU implementation of pattern matching tasks.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4462
Streaming:
 
High Speed Analysis of Big Data Using NVIDIA GPUs and Hadoop
Partha Sen (Fuzzy Logix)
Performing analytics on data stored in Hadoop can be time consuming. While Hadoop is great at ingesting and storing data, getting timely insight out of the data can be difficult, which reduces effectiveness and time-to-action. Using NVIDIA GPUs to accelerate analytics on Hadoop is an optimal solution that delivers strong price-to-performance benefits. In this session, we'll demonstrate a solution that uses NVIDIA GPUs for the analysis of big data in Hadoop. The demo will show how you can leverage the Hadoop file system, its MapReduce architecture, and GPUs to run computationally intense models, bringing together both data and computational parallelism. Methods demonstrated will include classification techniques such as decision trees, logistic regression, and support vector machines, and clustering techniques like k-means, fuzzy k-means, and hierarchical k-means, on marketing, social, and digital media data.  Back
 
Keywords:
Big Data Analytics, Bioinformatics & Genomics, Finance, GTC 2014 - ID S4471
Streaming:
 
Recursive Interaction Probability: A New Paradigm in Parallel Data Processing
Richard Heyns (brytlyt)
This session will describe Recursive Interaction Probability (RIP) and why it is a pretty cool algorithm. Time will be spent on benchmark analysis against other algorithms as well as performance within an operational database. The presentation will end with how RIP was implemented on an NVIDIA Kepler K20c, the design choices made, and how these affect performance. Use cases that play to the strengths of RIP, as well as use cases that reveal its weaknesses, will also be shared.  Back
 
Keywords:
Big Data Analytics, Numerical Algorithms & Libraries, Clusters & GPU Management, GTC 2014 - ID S4483
Streaming:
 
Indexing Documents on GPU - Can You Index Web in Real Time?
Michael Frumkin (NVIDIA)
An index of web documents provides the basis for search and decision making. Traditionally, GPUs are used to run applications that have a lot of parallelism and a small degree of divergence. We show that GPUs are also able to outperform CPUs for an application that has a large degree of parallelism but medium divergence. Specifically, we concentrate on the text processing used to index web documents. We present indexing algorithms for both GPU and CPU and show that the GPU outperforms the CPU on two common workloads. We argue that a medium-sized GPU-enabled cluster would be able to index all internet documents in one day. Indexing of web documents on the GPU opens a new area for GPU computing. Companies that provide search services spend a lot of cycles on indexing; faster and more energy-efficient indexing on the GPU may provide a valuable alternative to the CPU-only clusters used today.  Back
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2014 - ID S4506
Streaming:
Download:
 
Evaluation of Parallel Hashing Techniques
Rajesh Bordawekar (IBM T. J. Watson Research Center)
This presentation will cover techniques for implementing hashing functions on the GPU. We will describe various parallel implementations of hashing techniques, e.g., cuckoo hashing, partitioned hashing, Bin-Hash, Bloom filters, etc., and then present different ways of implementing these functions on the GPU, with emphasis on data structures that exploit the GPU's data-parallel features as well as its memory constraints.  Back
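As a concrete reference point for one of the techniques listed, here is a minimal sequential cuckoo hash in Python (illustrative only; GPU versions batch the probes across many threads). Its appeal for data-parallel hardware is that a lookup touches exactly two slots, so probe counts are uniform across threads:

```python
class CuckooHash:
    """Minimal two-table cuckoo hash. Lookups probe exactly two slots;
    inserts evict and relocate existing keys, growing and rehashing
    everything if a cycle is detected."""

    def __init__(self, capacity=16):
        self.cap = capacity
        self.tables = [[None] * capacity, [None] * capacity]

    def _slots(self, key):
        # Two independent hash functions, one per table.
        return (hash(("a", key)) % self.cap, hash(("b", key)) % self.cap)

    def lookup(self, key):
        s0, s1 = self._slots(key)
        return self.tables[0][s0] == key or self.tables[1][s1] == key

    def insert(self, key, max_kicks=32):
        if self.lookup(key):
            return
        for _ in range(max_kicks):
            for t in (0, 1):
                slot = self._slots(key)[t]
                # Swap the incoming key with the slot's occupant.
                key, self.tables[t][slot] = self.tables[t][slot], key
                if key is None:
                    return
        # Cycle detected: grow and rehash (simplified recovery).
        old = [k for tab in self.tables for k in tab if k is not None] + [key]
        self.__init__(self.cap * 2)
        for k in old:
            self.insert(k)
```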
 
Keywords:
Big Data Analytics, Developer - Programming Languages, GTC 2014 - ID S4507
Streaming:
Download:
 
A High-Speed 2-Opt TSP Solver for Large Problem Sizes
Martin Burtscher (Texas State University)
Learn how to process large program inputs at shared-memory speeds on the example of a 2-opt TSP solver. Our implementation employs interesting code optimizations such as biasing results to avoid computation, inverting loops to enable coalescing and tiling, introducing non-determinism to avoid synchronization, and parallelizing each operation rather than across operations to minimize thread divergence and drastically lower the latency of result production. The final code evaluates 68.8 billion moves per second on a single Titan GPU.  Back
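The "parallelize each operation" strategy applies naturally to 2-opt, where every candidate move is an independent evaluation of a four-edge length delta rather than a full tour re-score. A sequential Python sketch of that move evaluation (illustrative, not the authors' CUDA code):

```python
import math

def tour_len(pts, tour):
    """Total length of a closed tour over 2-D points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def best_2opt_move(pts, tour):
    """Score every 2-opt move by its delta over the four edges it
    touches. Each (i, j) pair is independent, which is the unit of
    parallel work on a GPU."""
    n, best = len(tour), (0.0, None)
    for i in range(n - 1):
        for j in range(i + 2, n if i > 0 else n - 1):
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                     - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
            if delta < best[0]:
                best = (delta, (i, j))
    return best

def apply_2opt(tour, i, j):
    """Reverse the tour segment between positions i+1 and j."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
```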
 
Keywords:
Big Data Analytics, Developer - Programming Languages, Supercomputing & HPC, GTC 2014 - ID S4534
Streaming:
Download:
 
Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL
Jeremy Meredith (Oak Ridge National Laboratory)
Learn about the data-parallel programming model in EAVL and how it can be used to write efficient mesh-based algorithms for multi-core and many-core devices. EAVL, the Extreme-scale Analysis and Visualization Library, contains a flexible scientific data model and targets future high performance computing ecosystems. This talk shows how a productive programming API built upon an efficient data model can help algorithm developers achieve high performance with little code. Discussions will include examples and lessons learned.  Back
 
Keywords:
Big Data Analytics, Scientific Visualization, GTC 2014 - ID S4553
Streaming:
Download:
 
Middleware Framework Approach for BigData Analytics Using GPGPU
Ettikan Kandasamy Karuppiah (MIMOS Bhd)
Current applications of GPU processors to parallel computing tasks show excellent results in terms of speed-ups compared to CPU processors. However, no existing middleware framework enables automatic distribution of data and processing across heterogeneous computing resources for structured and unstructured BigData applications. Thus, we propose a middleware framework for 'Big Data' analytics that provides mechanisms for automatic data segmentation, distribution, execution, and information retrieval across multiple cards (CPU & GPU) and machines, a modular design for easy addition of new GPU kernels at both the analytic and processing layers, and information presentation. We show the architecture and components of the framework, such as multi-card data distribution and execution, data structures for efficient memory access, algorithms for parallel GPU computation, and results for various test configurations. Our results show that the proposed middleware framework provides an alternative, cheaper HPC solution to users.  Back
 
Keywords:
Big Data Analytics, Finance, Video & Image Processing, GTC 2014 - ID S4583
Streaming:
Download:
 
Extending Python for High-Performance Data-Parallel Programming
Siu Kwan Lam (Continuum Analytics, Inc)
Our objective is to design a high-level data-parallel language extension to Python on GPUs. This language extension cooperates with the CPython implementation and uses Python syntax for describing data-parallel computations. The combination of rich library support and language simplicity makes Python ideal for subject matter experts to rapidly develop powerful applications. Python enables fast turnaround time and flexibility for custom analytic pipelines to react to immediate demands. However, CPython has been criticized as being slow and the existence of the global interpreter lock (GIL) makes it difficult to take advantage of parallel hardware. To solve this problem, Continuum Analytics has developed LLVM based JIT compilers for CPython. Numba is the open-source JIT compiler. NumbaPro is the proprietary compiler that adds CUDA GPU support. We aim to extend and improve the current GPU support in NumbaPro to further increase the scalability and portability of Python-based GPU programming.  Back
 
Keywords:
Big Data Analytics, Large Scale Data Analytics, Defense, Developer - Programming Languages, GTC 2014 - ID S4608
Streaming:
Download:
 
High-Performance Graph Primitives on GPU: Design and Implementation of Gunrock
Yangzihao Wang (UC Davis)
Gunrock is a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. The talk will share our experience designing the framework and APIs for computing efficient graph primitives on GPUs. We will focus on two aspects: (1) details of the implementations of several graph algorithms on GPUs, and (2) how to abstract these graph algorithms using general operators and functors on GPUs to improve programmer productivity.  Back
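The operator abstraction can be illustrated with the simplest primitive it covers: a BFS built around a frontier "advance" step. This Python sketch (names are illustrative, not Gunrock's API) shows the structure that such a framework parallelizes, with the per-frontier-vertex expansion being the data-parallel step on the GPU:

```python
def bfs_frontier(adj, source):
    """Frontier-based BFS: each iteration expands the current frontier
    to unvisited neighbors (an 'advance'-style operator), recording
    the BFS depth of every reached vertex."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        nxt = []
        for u in frontier:              # parallel over the frontier on a GPU
            for v in adj.get(u, []):
                if v not in depth:      # visited check (an atomic on a GPU)
                    depth[v] = level
                    nxt.append(v)
        frontier = nxt
    return depth
```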
 
Keywords:
Big Data Analytics, Large Scale Data Analytics, Defense, GTC 2014 - ID S4609
Streaming:
Download:
 
Speeding Up GraphLab Using CUDA
Vishal Vaidyanathan (Royal Caliber)
We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible: with the latest Tesla cards, over 48GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.  Back
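For readers unfamiliar with GAS, a tiny sequential PageRank written as explicit gather/apply/scatter phases (illustrative Python, not the Royal Caliber framework) shows the decomposition that maps onto the GPU: each phase is a bulk operation over vertices or edges:

```python
def gas_pagerank(edges, n, d=0.85, iters=20):
    """One Gather-Apply-Scatter superstep per iteration: scatter
    publishes rank/out_degree along out-edges, gather sums incoming
    contributions, apply computes each vertex's damped new rank."""
    out_deg = [0] * n
    for u, v in edges:
        out_deg[u] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Scatter: each vertex publishes its contribution on out-edges.
        contrib = [rank[u] / out_deg[u] if out_deg[u] else 0.0
                   for u in range(n)]
        # Gather: sum incoming contributions per destination vertex.
        acc = [0.0] * n
        for u, v in edges:
            acc[v] += contrib[u]
        # Apply: damped update of every vertex's rank.
        rank = [(1 - d) / n + d * acc[v] for v in range(n)]
    return rank
```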
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, Large Scale Data Analytics, Defense, GTC 2014 - ID S4611
Streaming:
 
Speeding Up GraphLab Using CUDA
Vishal Vaidyanathan (Royal Caliber)
We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general purpose and extremely fast GPU based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible - with the latest Tesla cards over 48GB of GPU memory can be available within a single box. Example algorithms will include pagerank, bfs, and sssp. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.  Back
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, GTC 2014 - ID S4612
Streaming:
 
A High Level API for Fast Development of High Performance Graphic Analytics on GPUs
Zhisong Fu (SYSTAP)
The goal of this session is to demonstrate how our high-level abstraction enables developers to quickly develop high-performance graph analytics programs on GPUs, with up to 3 billion edges traversed per second on a Tesla or Kepler GPU. High-performance graph analytics are critical for a large range of application domains. The SIMT architecture of GPUs and the irregular nature of graphs make it difficult to develop efficient graph analytics programs. In this session, we present an open source library that provides a high-level abstraction for efficient graph analytics with minimal coding effort. We use several specific examples to show how to use our abstraction to implement efficient graph analytics in a matter of hours.  Back
 
Keywords:
Big Data Analytics, Large Scale Data Analytics, Defense, GTC 2014 - ID S4617
Streaming:
Download:
 
Getting Big Data Done On a GPU-Based Database
Ori Netzer (SQream Technologies)
We will provide an in-depth analysis of our in-production, GPU-based technology for Big Data analytics, highlighting how our database benefits telecom companies. We will explain the key features of our technology: our database provides close to real-time analytics and delivers up to 100x faster insights, all in a very cost-effective manner. We will elaborate on these features and more in order to provide a clear understanding of how our technology works and why it is beneficial for telecom companies.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4644
Streaming:
Download:
 
Parallel Decomposition Strategies in Modern GPU
Sean Baxter (NVIDIA)
Learn strategies to decompose algorithms into parallel and sequential phases. These strategies make algorithmic intent clear while enabling performance portability across device generations. Examples include scan, merge, sort, and join.  Back
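Scan is the canonical example of such a decomposition. A Python sketch of a three-phase tiled exclusive scan (phase boundaries marked in comments; names are illustrative) shows how the work splits into parallel and sequential phases:

```python
def tiled_exclusive_scan(xs, tile=4):
    """Decompose an exclusive prefix sum into phases: (1) each tile
    reduces to a partial sum (parallel across tiles), (2) a small scan
    over the partials (sequential), (3) each tile scans locally and
    adds its tile offset (parallel again)."""
    tiles = [xs[i:i + tile] for i in range(0, len(xs), tile)]
    partials = [sum(t) for t in tiles]            # phase 1: per-tile reduce
    offsets, run = [], 0
    for p in partials:                            # phase 2: scan the partials
        offsets.append(run)
        run += p
    out = []
    for off, t in zip(offsets, tiles):            # phase 3: local scan + offset
        acc = off
        for x in t:
            out.append(acc)
            acc += x
    return out
```

Phases 1 and 3 are embarrassingly parallel across tiles, while phase 2 touches only one value per tile, which is why this shape ports well across device generations.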
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, GTC 2014 - ID S4674
Streaming:
 
Extreme Machine Learning with GPUs
John Canny (UC Berkeley)
BIDMach is an open-source library for GPU-accelerated machine learning. BIDMach on a single GPU node exceeds the performance of all other tools (including cluster systems on hundreds of nodes) for the most common machine learning tasks. BIDMach is an easy-to-use, interactive environment similar to SciPy/Matlab, but with qualitatively higher performance. The session will discuss: Performance: BIDMach follows a "LAPACK" philosophy of building high-level algorithms on fast low-level routines (like BLAS). It exploits the unique hardware features of GPUs to provide more than order-of-magnitude gains over alternatives. Accuracy: Monte Carlo methods (MCMC) are the most general way to derive models, but are slow. We have developed a new approach to MCMC that provides two orders of magnitude speedup beyond the hardware gains. Our "cooled" MCMC is fast and improves model accuracy. Interactivity: We are developing interactive modeling/visualization capabilities in BIDMach to allow analysts to guide, correct, and improve models in real time.  Back
 
Keywords:
Big Data Analytics, Bioinformatics & Genomics, Machine Learning & Deep Learning, Scientific Visualization, GTC 2014 - ID S4811
Streaming:
Download:
 
First Glimpse into the OpenPOWER Software Stack with Big Data Workload Example (Presented by IBM)
Keith Campbell (IBM), Ken Rozendal (IBM)
The OpenPOWER Foundation (http://www.open-power.org/) is an open alliance of companies working together to expand the hardware and software ecosystem based on the POWER architecture. This collaboration across hardware and software vendors enables uni ...Read More
The OpenPOWER Foundation (http://www.open-power.org/) is an open alliance of companies working together to expand the hardware and software ecosystem based on the POWER architecture. This collaboration across hardware and software vendors enables unique innovation across the full hardware and software stack. OpenPOWER ecosystem partners and developers now have more choice, control and flexibility to optimize at any level of the technology from the processor on up for next-generation, hyperscale and cloud datacenters. Integrating support for NVIDIA GPUs on the POWER platform enables high performance enterprise and technical computing applications such as Big Data and analytics workloads. This presentation will cover the software stack and developer tools for OpenPOWER, the planned support for CUDA, and a proof of concept showing GPU acceleration. This proof of concept will be available as a demo in the IBM booth.  Back
 
Keywords:
Big Data Analytics, Debugging Tools & Techniques, Developer - Programming Languages, GTC 2014 - ID S4882
Streaming:
Download:
 
Dynamic GPU Graph Analytics
Adam McLaughlin (Georgia Institute of Technology)
Graphs that model social networks, numerical simulations, and the structure of the internet are enormous and continuously changing with time. Contemporary software packages neglect temporal variations in these networks and can only analyze them stati ...Read More
Graphs that model social networks, numerical simulations, and the structure of the internet are enormous and continuously changing with time. Contemporary software packages neglect temporal variations in these networks and can only analyze them statically. This poster presents an optimized GPU implementation of dynamic betweenness centrality, a popular analytic with applications in power grid analysis, the study of protein interactions, and community detection. By avoiding unnecessary accesses to memory, we achieve up to a 110x speedup over a CPU implementation of the algorithm and can update the analytic 45x faster on average than a static recomputation on the GPU.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4171
Download:
 
Kinetic Parameter Estimation in Metabolic Networks With GPGPU
Ali Khodayari (PSU)
In this study, the recently introduced Ensemble Modeling (EM) approach was used to construct a kinetic model of E. coli metabolism. We put forth a metabolic model composed of 34 reactions and 22 metabolites representing E. coli's core metabolism. We ...Read More
In this study, the recently introduced Ensemble Modeling (EM) approach was used to construct a kinetic model of E. coli metabolism. We put forth a metabolic model composed of 34 reactions and 22 metabolites representing E. coli's core metabolism. We developed a Newton-Raphson based estimation approach to identify the kinetic parameters of a given metabolic network. The solver is designed and implemented using CUDA, in order to accelerate the overall process. The application initially parses a large set of equations using the Boost::Spirit C++ framework, finds an analytic Jacobian J, and then iteratively updates the 'best' solution with delta by solving J.delta=-f using GMRES from CUSP. Successive updates of the parameter set, the Jacobian matrix, and the function evaluations, as well as the system solve, are all implemented on the GPU.   Back
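The iteration described above (repeatedly solving J.delta = -f and applying the update) can be sketched in a few lines. This is a serial Python/NumPy illustration of Newton-Raphson with a dense direct solve standing in for the poster's CUDA/GMRES solver; the function names are hypothetical:

```python
import numpy as np

def newton_raphson(f, jac, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson: at each step solve J(x) @ delta = -f(x),
    then update x += delta until the residual norm drops below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        delta = np.linalg.solve(jac(x), -fx)  # stand-in for GMRES from CUSP
        x = x + delta
    return x
```

In the poster's setting, f collects the network's rate equations, J is the analytic Jacobian produced by the parser, and every step runs on the GPU.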
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4258
Download:
 
Fast Vertical Data Classification Using GPUs
Arjun G. Roy (NDSU)
Massive amounts of data are being generated today. Current classification methods are accurate but extremely slow on big data. We propose a two-pronged approach: a) treat data vertically instead of the conventional horizontal treatment and ...Read More
Massive amounts of data are being generated today. Current classification methods are accurate but extremely slow on big data. We propose a two-pronged approach: a) treat data vertically instead of the conventional horizontal treatment and use our vertical-data-specific classification algorithm, and b) exploit the GPU's fast mathematical computation to process vertical data quickly, which benefits significantly from our data structure, the P-Tree. Our classification algorithm is O(k), where k is the number of attributes, and it achieves high accuracy.  Back
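As an illustration of the vertical treatment, here is a minimal Python/NumPy sketch of slicing an integer column into one boolean array per bit position, in the spirit of a P-Tree bit slice. The helper names are hypothetical and this is not the poster's actual data structure:

```python
import numpy as np

def vertical_bit_slices(column, width=8):
    """Decompose an integer column into vertical bit slices:
    slice i holds bit i of every value, packed as a boolean array."""
    col = np.asarray(column, dtype=np.uint64)
    return [((col >> i) & 1).astype(bool) for i in range(width)]

def reconstruct(slices):
    """Rebuild the original values from their bit slices."""
    total = np.zeros(len(slices[0]), dtype=np.uint64)
    for i, s in enumerate(slices):
        total += s.astype(np.uint64) << np.uint64(i)
    return total
```

Predicates then reduce to bitwise operations over whole slices (e.g. counting values with bit 7 set is just `slices[7].sum()`), which maps naturally onto GPU-parallel arithmetic.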
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4263
Download:
 
GPU Accelerated PrefixSpan Algorithm for Sequential Pattern Mining
Benuraj Sharma (Sri Sathya Sai Institute Of Higher Learning)
This poster describes the CUDA version of the PrefixSpan algorithm, implemented on NVIDIA Kepler GPUs, which extracts the inherent task parallelism and leverages the dynamic parallelism feature to implement recursion. The results show that the GPU acce ...Read More
This poster describes the CUDA version of the PrefixSpan algorithm, implemented on NVIDIA Kepler GPUs, which extracts the inherent task parallelism and leverages the dynamic parallelism feature to implement recursion. The results show that the GPU-accelerated PrefixSpan, CUDAPrefixSpan, achieves a speedup of ~5x for sequence databases of varying sizes.   Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4187
Download:
 
Red Fox: An Execution Environment for Relational Query Processing on GPUs
Haicheng Wu (Georgia Institute of Technology)
This poster presents the Red Fox system, sponsored by the NVIDIA Graduate Fellowship program. It introduces the compilation flow and performance results of executing relational queries such as TPC-H on GPUs. ...Read More
This poster presents the Red Fox system, sponsored by the NVIDIA Graduate Fellowship program. It introduces the compilation flow and performance results of executing relational queries such as TPC-H on GPUs.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4182
Download:
 
GPU Accelerated Histogram Based Analytics Engine
Jack Gerrity (Center For Advanced Public Safety, Computer Science Department at The University of Alabama)
In this poster, we show a column-based approach with the use of multiple GPUs to quickly produce ad-hoc histograms from previously compiled data. We then compare this approach's histogram building speed to Apache's Lucene based Solr. ...Read More
In this poster, we show a column-based approach with the use of multiple GPUs to quickly produce ad-hoc histograms from previously compiled data. We then compare this approach's histogram building speed to Apache's Lucene based Solr.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4230
Download:
 
Galactica - Accelerated Queries Processing
Keh Kok Yong (MIMOS Berhad)
Information is one of the most influential forces transforming the growth of business. Companies churn out a burgeoning volume of transactional data, capturing and matching trillions of bytes of information. This has caused the data to grow exponen ...Read More
Information is one of the most influential forces transforming the growth of business. Companies churn out a burgeoning volume of transactional data, capturing and matching trillions of bytes of information. This has caused data to grow exponentially. Our work progressively researches mechanisms for accelerating SQL query operations using the GPU. The proposed system is able to process volumes of data exceeding the total size of the GPU RAM. It performs fundamental SQL operations such as select, like, order by, join, sum, min and others. In addition, it works with PostgreSQL and MySQL.   Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4149
Download:
 
I/O Acceleration With GPU for I/O-bound Applications
Kento Sato (Tokyo Institute of Technology)
Many recent supercomputers have GPUs on each compute node to accelerate computation. However, not all applications can be accelerated by GPUs. For example, the performance of I/O-bound applications is limited by the underlying I/O device performan ...Read More
Many recent supercomputers have GPUs on each compute node to accelerate computation. However, not all applications can be accelerated by GPUs. For example, the performance of I/O-bound applications is limited by the underlying I/O device performance. Such I/O-bound applications require I/O bandwidth more than computational power. When such non-GPU applications run, the GPUs sit idle and resources are wasted. To accelerate I/O-bound applications, we developed a GPU-accelerated I/O interface (gmfs). Our experimental results show that gmfs can accelerate sequential reads/writes, utilizing 82% of PCIe-gen2 peak bandwidth and 50% of PCIe-gen3 peak bandwidth.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4185
Download:
 
Preliminary I/O Performance Evaluation on GPU Accelerator and External Memory
Koichi Shirahata (Tokyo Institute of Technology)
Recent supercomputers deploy not only many-core accelerators such as GPUs but also Non-Volatile Memory (NVM) such as flash memory as external memory, in order to handle large-scale data processing for a wide range of applications. However, it is not yet ...Read More
Recent supercomputers deploy not only many-core accelerators such as GPUs but also Non-Volatile Memory (NVM) such as flash memory as external memory, in order to handle large-scale data processing for a wide range of applications. However, it is not yet clear how to deploy NVM as low-cost, large-volume local disks in heterogeneous supercomputers. In order to clarify the I/O characteristics between GPU and NVM, we comparatively investigate I/O strategies on a GPU and multiple mini SATA SSDs. Our preliminary results exhibit 3.06GB/s of throughput from 8 mini SATA SSDs to the GPU by using RAID0 with an appropriate stripe size.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4251
Download:
 
Acceleration of K-Means and K-Means++ Using CUDA
Marek Fiser (Purdue University)
This research project focuses on a GPU implementation of the commonly used clustering algorithm K-means. Our implementation minimizes the overhead caused by copying data between the CPU and GPU. We were able to implement the entire algorithm on the GPU, which gr ...Read More
This research project focuses on a GPU implementation of the commonly used clustering algorithm K-means. Our implementation minimizes the overhead caused by copying data between the CPU and GPU. We were able to implement the entire algorithm on the GPU, which greatly improved performance over the CPU, reaching up to a 15x speedup. Our work also analyses an improved version of the algorithm called K-means++. This algorithm builds on the original K-means, improving it with a more careful initialization that leads to better results. We adjusted the K-means++ algorithm to work on a GPU, which led to a 9x speedup.   Back
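The "more careful initialization" of K-means++ is its seeding step: each new center is drawn with probability proportional to the squared distance to the nearest center chosen so far. A serial Python/NumPy sketch of that seeding (hypothetical function name, not the poster's GPU code):

```python
import numpy as np

def kmeans_pp_init(points, k, rng=None):
    """K-means++ seeding: pick each new center with probability
    proportional to the squared distance to the nearest chosen center."""
    rng = np.random.default_rng(rng)
    n = len(points)
    centers = [points[rng.integers(n)]]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from every point to its nearest current center.
        d2 = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(points[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)
```

The distance computations in the inner loop are exactly the part that parallelizes well on a GPU, one thread per point.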
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4264
Download:
 
Space and Speed Advantage of pTree for Big Data Processing
Mohammad Hossain (NDSU CS Department)
This poster shows the space compression and speed gains achieved while processing 'Big Data' using pTrees, which are vertical bit slices of the columns of a data set. Our experiment shows a 92% speed gain over traditional processing for data sets with sizes in the range ...Read More
This poster shows the space compression and speed gains achieved while processing 'Big Data' using pTrees, which are vertical bit slices of the columns of a data set. Our experiment shows a 92% speed gain over traditional processing for data sets with sizes in the range of a billion records.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4268
Download:
 
Towards A Hash Based GroupBy/Aggregate Algorithm for Fast Query Processing on GPU
Sina Meraji (IBM Canada Ltd.)
Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. As part of our research, we are developing a high performance GPU library for costly database opera ...Read More
Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. As part of our research, we are developing a high performance GPU library for costly database operations. Our work leverages the latest NVIDIA GPU features (i.e., Unified Virtual Addressing, Multi-Streaming) and various host-side partitioning algorithms to run database operations on large tables. The focus of this article is the prototype for GroupBy/Aggregate operations that we created to exploit GPUs. The algorithm has two main steps. In the first step, we create a hash table by doing coalesced reads from the table on which we run the GroupBy/Aggregate query. The aggregation operations occur at the same time as creating the hash table. After creating the hash table, we only need a probe phase to retrieve results from it. Our results indicate that by using GPU shared memory we can get a 28x speedup over the CPU implementation.  Back
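The two-step scheme above, building a hash table keyed on the group column while folding the aggregate into each slot, then probing it for results, can be sketched serially in Python. This is an illustrative stand-in (hypothetical name), not the poster's coalesced, shared-memory CUDA kernel:

```python
def hash_groupby_sum(keys, values):
    """Hash-based GroupBy/SUM: build a hash table keyed by the group
    column, folding the aggregate in as rows are inserted."""
    table = {}
    for k, v in zip(keys, values):       # build phase, one pass over rows
        table[k] = table.get(k, 0) + v   # aggregate while inserting
    return table                         # probe phase: look up any key
```

On the GPU, many threads insert concurrently (using atomics or per-block tables in shared memory), but the build-with-aggregation/probe structure is the same.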
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4211
Download:
 
Processing Data Streams With Hard Real-Time Constraints on CPU/GPU Systems
Uri Verner (Technion)
Growing rates of collected data present a challenge when it comes to scalable solutions for data transmission and processing. Even more challenging is the problem of real-time stream processing. In such applications, the system needs to react to the ...Read More
Growing rates of collected data present a challenge when it comes to scalable solutions for data transmission and processing. Even more challenging is the problem of real-time stream processing. In such applications, the system needs to react to the incoming data within given time bounds. This poster presents the challenges in processing multiple real-time data streams on CPU/GPU systems, and the results of our efforts for dealing with these challenges. The work addresses various issues related to single- and multi-GPU systems, including resource sharing in computation and communication under real-time constraints.  Back
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4141
Download:
 
Real-Time GPU Computation of Ballistic Thermal Signatures
Glenn Parker (Georgia Tech Research Institute)
GTRI has implemented and tested a CUDA upgrade to a ground-to-air Hardware-in-the-Loop missile simulator. By breaking a single-threaded thermal integrator loop into multiple independent kernels, a speedup of 20X is achieved for complex targets. This ...Read More
GTRI has implemented and tested a CUDA upgrade to a ground-to-air Hardware-in-the-Loop missile simulator. By breaking a single-threaded thermal integrator loop into multiple independent kernels, a speedup of 20X is achieved for complex targets. This speed increase reduces computation time from days to hours, and preliminary results show that multiple GPUs may allow additional speedup by removing stream concurrency limits.  Back
 
Keywords:
Big Data Analytics, Computational Physics, GTC 2015 - ID P5136
Download:
 
Parallel Map Projection of Vector-based Big Spatial Data
Wenpeng Feng (University of North Carolina at Charlotte)
Because of the large data volumes and complex algorithms involved in map projection, transforming geo-referenced data among various projections poses a big computational challenge, especially for big spatial data. In order to overcome this challenge, we present a ...Read More
Because of the large data volumes and complex algorithms involved in map projection, transforming geo-referenced data among various projections poses a big computational challenge, especially for big spatial data. In order to overcome this challenge, we present a cloud-based parallel computing framework for accelerating the map projection of vector-based big spatial data. GPU-enabled parallel map projection algorithms were developed on the CUDA platform for our framework.  Back
 
Keywords:
Big Data Analytics, Supercomputing & HPC, GTC 2015 - ID P5161
Download:
 
Large-Scale Pattern Recognition Using GPU-Accelerated Relational Database
Matthew England (University of Missouri, Columbia)
We are leveraging the abilities of relational databases for scalable storage and retrieval, and massively parallelized computation on the GPU to perform large-scale pattern recognition tasks. We have successfully integrated these two technologies, pr ...Read More
We are leveraging the abilities of relational databases for scalable storage and retrieval, and massively parallelized computation on the GPU, to perform large-scale pattern recognition tasks. We have successfully integrated these two technologies, providing the database with the means to do high-performance computation on massive stored datasets. Internalizing this capability within the database facilitates blending of advanced relational and spatial operations into pattern matching tasks, which is applicable in a variety of fields.  Back
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5233
Download:
 
Accelerating Topological Data Analysis Using GPUs
Ryan Hsu (Ayasdi)
Topology provides a mathematical framework for applying a complete range of statistical, geometric, and machine learning methods, revealing insights from the geometry of your data. Ayasdi utilizes topological data analysis (TDA) in its advanced analy ...Read More
Topology provides a mathematical framework for applying a complete range of statistical, geometric, and machine learning methods, revealing insights from the geometry of your data. Ayasdi utilizes topological data analysis (TDA) in its advanced analytics software to simplify the analysis of complex, multi-variate datasets. In this poster, we illustrate how GPGPUs can be leveraged to accelerate key operations in TDA by over 14X.  Back
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID P5239
Download:
 
GPU Based Data Analysis on the example of Time-of-flight Spectroscopy
Gregor Hartmann (DESY)
Free electron lasers enable the study of non-linear multi-photon processes within a single FEL shot, which is less than 50 fs long and has a repetition rate of 120Hz at LCLS. As a result, a huge amount of data is created in a very short acquisition time. ...Read More
Free electron lasers enable the study of non-linear multi-photon processes within a single FEL shot, which is less than 50 fs long and has a repetition rate of 120Hz at LCLS. As a result, a huge amount of data is created in a very short acquisition time. Analyzing this data at the single-shot level needs a lot of computing power but can be massively parallelized. In order to decrease the evaluation time, we created GPU-based evaluation software for our electron time-of-flight spectrometer setup.  Back
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5276
Download:
 
Massively Parallel Geo-Spatial Coordinates Computation with GalacticaDB
Keh Kok Yong (MIMOS Berhad)
With GPS enabled, people and things have become mobile sensors. These self-quantified technologies generate humongous amounts of raw data, which can turn into valuable information for users and businesses. GalacticaDB is a massively parallel SQL-like eng ...Read More
With GPS enabled, people and things have become mobile sensors. These self-quantified technologies generate humongous amounts of raw data, which can turn into valuable information for users and businesses. GalacticaDB is a massively parallel SQL-like engine with extended geo-spatial capabilities. It accelerates analytic computation by optimizing query processing and exploiting NVIDIA Tesla GPUs. Our results indicate that the GPU is an effective and energy-efficient co-processor for executing database query operations.  Back
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5277
Download:
 
Fuzzy String Matching of Vehicle Identification Numbers in a Highly Parallel Environment
Mason Saucier (Center For Advanced Public Safety, Computer Science Department at The University of Alabama)
This poster analyzes a GPU implementation of the Levenshtein distance function for fuzzy string matching of a Vehicle Identification Number (VIN) against a dataset of known VINs. Our solution gives a 13x-15x increase over similar CPU solutions. Our w ...Read More
This poster analyzes a GPU implementation of the Levenshtein distance function for fuzzy string matching of a Vehicle Identification Number (VIN) against a dataset of known VINs. Our solution gives a 13x-15x increase over similar CPU solutions. Our work aims to help correct human error that occurs during data entry and return meaningful information to the user, which they can then use to inform their decisions.  Back
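For reference, the Levenshtein distance the poster accelerates is the classic dynamic-programming edit distance; a serial two-row Python version (a CPU baseline sketch, not the poster's GPU kernel) looks like this:

```python
def levenshtein(a, b):
    """Classic DP Levenshtein (edit) distance between strings a and b,
    keeping only the previous row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        prev = curr
    return prev[-1]
```

GPU versions typically parallelize across the candidate set (one thread or warp per known VIN) or along the DP table's anti-diagonals.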
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5305
Download:
 
Fighting Malware with GPUs in Real Time
Libor Morkovsky (Avast s.r.o)
Today's malware ecosystem produces hundreds of thousands of distinct samples per day. To leverage similarities between samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a se ...Read More
Today's malware ecosystem produces hundreds of thousands of distinct samples per day. To leverage similarities between samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second even using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are directly used to generate rules to identify similar samples on our customers' machines.  Back
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID P5313
Download:
 
GPU Accelerated Multi-predicate Join Algorithms for Listing Cliques in Graphs
Haicheng Wu (Georgia Institute of Technology)
This poster introduces how to run general join algorithms on the GPU to solve an important graph problem: clique listing. In particular, two different join algorithms are presented for the GPU. The first is an implementation of Leapfrog-Triejoin (LFTJ), a ...Read More
This poster introduces how to run general join algorithms on the GPU to solve an important graph problem: clique listing. In particular, two different join algorithms are presented for the GPU. The first is an implementation of Leapfrog-Triejoin (LFTJ), a recently presented worst-case optimal multi-predicate join algorithm. The second is a novel approach, inspired by the first but more suitable for GPU architectures. The performance benchmarks show that both approaches are efficient on GPUs.  Back
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID P5319
Download:
 
Gunrock: A High-Performance Graph Processing Library on the GPU
Yangzihao Wang (University of California, Davis)
For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock" ...Read More
For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system, uses a high-level bulk-synchronous abstraction with traversal and computation steps, designed specifically for the GPU. It is a framework that is general, straightforward to program, and fast (on par with hardwired primitives and faster than any other programmable GPU library).  Back
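Gunrock's traversal-then-computation abstraction can be pictured with a frontier-based BFS: each bulk-synchronous step expands the whole frontier at once and filters out already-visited vertices. A serial Python sketch of that pattern (illustrative only, not Gunrock's API):

```python
def bfs_levels(adj, source):
    """Frontier-based BFS in the advance/filter style of bulk-synchronous
    graph frameworks: each iteration expands the whole frontier at once."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:          # "advance": expand every frontier vertex
            for v in adj[u]:
                if v not in depth:  # "filter": keep only unvisited vertices
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth
```

On the GPU, the advance and filter steps over a frontier are each mapped to thousands of threads, which is where load balancing over irregular vertex degrees becomes the hard part.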
 
Keywords:
Big Data Analytics, Developer - Tools & Libraries, GTC 2015 - ID P5326
Download:
 
From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA
Paul Richmond (University of Sheffield)
Complex systems are prevalent throughout various levels of biology from the molecular and cellular scales through to populations of interacting individuals. This talk discusses how formal state based representation of agents within a complex system c ...Read More
Complex systems are prevalent throughout various levels of biology from the molecular and cellular scales through to populations of interacting individuals. This talk discusses how formal state based representation of agents within a complex system can be simulated and visualized at large scales using the open source FLAME GPU framework. Methods of code generation from XML documents and use of CUDA streams for heterogeneous state execution are presented. Examples include cellular tissue modelling and large scale crowd dynamics.  Back
 
Keywords:
Big Data Analytics, Developer - Tools & Libraries, Life & Material Science, GTC 2015 - ID S5133
Streaming:
Download:
 
Coordinating More Than 3 Million CUDA Threads for Social Network Analysis
Adam McLaughlin (Georgia Institute of Technology)
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in commu ...Read More
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near linear speedup and performance exceeding tens of GTEPS when running BC on 192 GPUs.  Back
 
Keywords:
Big Data Analytics, Developer - Algorithms, Supercomputing & HPC, GTC 2015 - ID S5156
Streaming:
Download:
 
Fast Triangle Counting for Social Network Analytics on the K40
Oded Green (ArrayFire)
In this session we will explore a new approach for counting triangles in networks that partitions the work at multiple parallel granularities. This new approach is highly scalable and is appropriate for both sparse and dense networks. ...Read More
In this session we will explore a new approach for counting triangles in networks that partitions the work at multiple parallel granularities. This new approach is highly scalable and is appropriate for both sparse and dense networks.  Back
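A common baseline formulation of triangle counting, and the unit of work such approaches partition, is neighbor-set intersection per edge. A serial Python sketch (not the session's multi-granularity scheme), where `adj` maps each vertex to its neighbor set:

```python
def count_triangles(adj):
    """Triangle counting by neighbor-set intersection: for each edge
    (u, v) with u < v, count common neighbors w > v so that each
    triangle is counted exactly once."""
    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if v > u:
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count
```

The per-edge intersections are independent, so a GPU can assign them to threads, warps, or blocks depending on the sizes of the adjacency lists, which is the granularity choice the session is about.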
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5176
Streaming:
Download:
 
Big Data on a Budget: Cost Efficient Large-Scale Graph Analytics
Joe Schneible, Ph.D. (Technica Corporation)
The attendee will take away an appreciation for the nuances involved in performing large scale graph analytics on a budget. The discussion will center around utilizing graphics processing hardware in a limited memory environment. This will include in ...Read More
The attendee will take away an appreciation for the nuances involved in performing large scale graph analytics on a budget. The discussion will center around utilizing graphics processing hardware in a limited memory environment. This will include insights into data storage structures for I/O efficient processing as well as the application of the massive parallelism of the GPU to real world graph data.  Back
 
Keywords:
Big Data Analytics, Developer - Algorithms, Machine Learning & Deep Learning, GTC 2015 - ID S5200
Streaming:
Download:
 
High Performance Indexing of Large Data Sets Using GPU
Massimo Bernaschi (National Research Council of Italy)
Learn how to use multi-GPU and CUDA to speed-up text analysis, indexing and searching of textual data. We present a new framework to index large data sets of heterogeneous data. Our approach is based on a combination of HPC techniques aimed at imp ...Read More
Learn how to use multi-GPU and CUDA to speed-up text analysis, indexing and searching of textual data. We present a new framework to index large data sets of heterogeneous data. Our approach is based on a combination of HPC techniques aimed at improving the efficiency and reliability of the indexing process. The solution we propose is scalable and exploits in-memory computing to minimize I/O operations and enhance performance. Moreover, we describe the CUDA-based parallelization of the most compute-intensive tasks involved in the indexing process. The integration of the CUDA components within an architecture that is mostly Java-based led us to develop a technique for Java-CUDA interoperability that can be applied to other applications. Some visualisation results will also be presented.  Back
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5212
Streaming:
Download:
 
Maximize the Performance of your Cluster: Marrying GPUs and Dataflow Graph Processing
Nam-Luc Tran (EURA NOVA)
Get the best out of your processing cluster by equipping nodes with a GPU. Many distributed processing models have emerged these past years driven by the need of scaling out applications and by the affordability of clusters running on commodity hardw ...Read More
Get the best out of your processing cluster by equipping nodes with a GPU. Many distributed processing models have emerged these past years driven by the need of scaling out applications and by the affordability of clusters running on commodity hardware. Among these the dataflow graph processing model is the most general, representing jobs as distributed operators (nodes) connected by data channels (edges). In this talk, we explain how we have extended an existing dataflow graph processing framework to fully take into account GPU resources in the cluster. We show how this paradigm fully exploits the batch and streaming features of the GPU in a distributed job. We then finally expose our model for the scheduling on this heterogeneous processing framework.  Back
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Machine Learning & Deep Learning, GTC 2015 - ID S5215
Streaming:
Download:
 
Unleashing The Power Of GPUs Over The Web
Vishal Vaidyanathan (Royal Caliber)
GPUs have demonstrated regime-changing performance in a wide variety of applications. But there remain many engineering challenges to the adoption of GPUs in the mainstream, especially when operating at scale. We present a new infrastructure tha ...Read More
GPUs have demonstrated regime-changing performance in a wide variety of applications. But there remain many engineering challenges to the adoption of GPUs in the mainstream, especially when operating at scale. We present a new infrastructure that provides a suite of GPU-driven machine learning and graph algorithms as a web service. The effortless usability of an HTTP API unlocks the power of GPU computing with none of the attendant complexities. As examples, we will show interactive analytics on web-scale graphs and deep learning on large data sets using nothing more than a modern web browser.  Back
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Machine Learning & Deep Learning, GTC 2015 - ID S5224
Streaming:
Download:
 
Towards Fast SQL Query Processing in DB2-BLU Using GPUs
Sina Meraji (IBM)
Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. IBM DB2-BLU is an example of such databases. In order to improve the performance of query processin ...Read More
Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. IBM DB2-BLU is an example of such a database. In order to improve the performance of query processing in such databases, GPUs can be used as fast, high-bandwidth co-processors. As part of our work, we integrate NVIDIA GPUs into DB2-BLU by changing the infrastructure of DB2-BLU and developing GPU kernels. We have a hybrid design in which we use some DB2-BLU features on IBM's POWER8 processor together with NVIDIA's GPU accelerator technology for fast query processing. This work was done in collaboration with Peter Kokosielis.  Back
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5229
Streaming:
Download:
 
PG-Strom: Query Acceleration Engine of PostgreSQL Powered by GPGPU
Kohei KaiGai (NEC)
This session will introduce how we integrated GPU acceleration into the PostgreSQL database while keeping 100% compatibility with the existing application landscape. The RDBMS is a long-standing and widely used technology that remains at the core of business activities; however, growing data sizes raise performance concerns. PG-Strom is an extension of the PostgreSQL database designed to off-load several CPU-intensive query workloads (currently scan, join, and aggregation) to the GPGPU, making them up to 10x faster than the existing SQL implementation. Its characteristics fit the usual workloads of BI (business intelligence) tools in a cost-effective way, though not all workloads. The PG-Strom extension is released under the GPLv2 terms and will support PostgreSQL v9.5.  Back
 
Keywords:
Big Data Analytics, GTC 2015 - ID S5276
Streaming:
Download:
 
Recent Advances in Multi-GPU Graph Processing
Giancarlo Carbone (Sapienza University of Rome)
Learn how to use GPUs as a computing platform to solve problems with irregular memory access patterns and low arithmetic intensity. We have shown that a proper data-to-threads mapping, combined with techniques to reduce data traffic, achieves excellent performance in the traversal, via a level-synchronous Breadth-First Search (BFS), of large-scale graphs (i.e., millions of nodes and billions of edges) on multi-GPU systems. We will present our recent activities in GPU-based graph processing: a new implementation of BFS based on a 2D partitioning that exploits the atomic operations of the Kepler architecture, two solutions to the st-connectivity problem, and all-pairs shortest path. Some of these are of immediate use in the analysis of large data sets.  Back
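As an illustrative aside, the level-synchronous traversal described above can be sketched in plain Python; the per-vertex inner loop is what a multi-GPU implementation distributes across threads (a minimal sequential sketch, not the speakers' implementation):

```python
from collections import defaultdict

def bfs_levels(edges, source):
    """Level-synchronous BFS: expand one whole frontier per iteration.
    A GPU traversal parallelizes the per-vertex loop and uses atomics
    to deduplicate vertices discovered concurrently."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:                   # one "superstep" per BFS level
        depth += 1
        next_frontier = []
        for u in frontier:            # parallel across threads on a GPU
            for v in adj[u]:
                if v not in level:    # an atomic compare-and-swap on a GPU
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

For example, `bfs_levels([(0, 1), (1, 2), (0, 3), (3, 4)], 0)` assigns level 1 to vertices 1 and 3 and level 2 to vertices 2 and 4.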
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5337
Streaming:
Download:
 
GPU-Accelerated Network Centrality
Erik Saule (University of North Carolina at Charlotte, Department of Computer Science)
This session is about how to efficiently compute shortest-path-based network centrality metrics using GPUs. Performing shortest-path computation on a GPU is expensive because of the many idempotent operations (computation and memory accesses) needed to ensure the computation is correct. We will show how to interleave shortest-path computations in the context of a network centrality metric to reduce the number of memory accesses and to maximize their coalescing. We will also see how the in-memory representation of the network is key to balancing thread divergence against the number of atomic operations.  Back
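For readers unfamiliar with shortest-path-based centrality, the structure being parallelized is essentially Brandes' algorithm: one BFS plus a reverse dependency-accumulation pass per source vertex. A minimal sequential sketch (illustrative only, not the presenter's GPU kernels):

```python
from collections import defaultdict, deque

def betweenness(edges):
    """Brandes-style betweenness for an unweighted, undirected graph:
    one BFS plus a reverse dependency pass per source.  A GPU version
    batches sources and coalesces the memory accesses."""
    adj = defaultdict(list)
    nodes = set()
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        nodes.update((u, v))
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        sigma = dict.fromkeys(nodes, 0)   # shortest-path counts
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        order, queue = [], deque([s])
        while queue:                      # forward BFS phase
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
        delta = dict.fromkeys(nodes, 0.0)
        for u in reversed(order):         # reverse accumulation phase
            for v in adj[u]:
                if dist[v] == dist[u] + 1:
                    delta[u] += sigma[u] / sigma[v] * (1.0 + delta[v])
            if u != s:
                bc[u] += delta[u]
    return bc                             # undirected: each pair counted twice
```

On the path graph 0-1-2 the middle vertex accumulates a count of 2.0 (once per direction of the single 0-2 pair).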
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5425
Streaming:
Download:
 
Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Computing
Rajesh Bordawekar (IBM T. J. Watson Research Center), Ruchir Puri (IBM Research)
In this session you will learn how IBM is exploiting GPUs in its new IBM OpenPOWER platform to accelerate Big Data Analytics and Cognitive Computing solutions. The Hardware Acceleration Lab in IBM's Software Group is partnering with IBM Research to develop optimized heterogeneous computing solutions. With the creation of the OpenPOWER consortium last year, IBM has created an open ecosystem along with heterogeneous computing platforms that include NVIDIA's Tesla GPUs. GPUs are gaining traction in the enterprise as accelerators for Big Data Analytics and Cognitive Computing workloads. This session will focus on industrial case studies and exploitation of GPUs. Some early results will also be shared.  Back
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5459
Streaming:
Download:
 
Multi-Dimensional, In-GPU-Memory Databases: Streaming Conditional Calculations in Big Data Sets
Peter Strohm (Jedox AG)
Learn how in-GPU-memory databases can change the way real-world Big Data sets such as social media entries, webpage hits, or business data are analyzed. Analytical queries in databases often involve calculations over extremely large areas of aggregated values as input for further processing, such as conditional calculation (if-then-else) or top-k evaluation, and therefore often run into memory problems. We present the design of optimized condition-based processors for large data sets, combined with a floating-frame approach to stream through these data areas. Conditional calculations are especially useful for splitting large value sets into clusters for further analysis or aggregation, and we will provide examples on real-world social media data, including localized Twitter trends and Wikipedia page hits.  Back
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, GTC 2015 - ID S5481
Streaming:
Download:
 
Large-Scale Spatial Query Processing on GPU-Accelerated Big Data Systems by Extending Cloudera Impala
Jianting Zhang (The City College of New York)
Geo-referenced spatial (or geospatial) data volumes are increasing. Traditional data management techniques, such as Geographical Information Systems (GIS) and spatial databases, do not work well for big spatial data, while existing Big Data systems do not support geospatial data. In addition to our work on managing spatial data on single-node GPUs, we have integrated our data-parallel designs with an open-source big data system, Cloudera Impala, to support both efficient and scalable distributed spatial query processing in an interactive SQL environment. We present the system architecture, data-parallel designs for spatial indexing and query processing, as well as performance on real datasets for spatial joins based on point-in-polygon tests.  Back
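The point-in-polygon primitive behind such spatial joins is a simple ray-casting test, which a GPU can run for one candidate (point, polygon) pair per thread. A small illustrative sketch (ours, not the Impala integration; a real system prunes candidate pairs with a spatial index first):

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: count crossings of a horizontal ray from (x, y)
    against every polygon edge; an odd count means the point is inside."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):          # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:               # crossing lies to the right
                inside = not inside
    return inside

def spatial_join(points, polygons):
    """Brute-force join: test every point against every polygon."""
    return [(p, j) for p in points
                   for j, poly in enumerate(polygons)
                   if point_in_polygon(p[0], p[1], poly)]
```

For the square `[(0, 0), (2, 0), (2, 2), (0, 2)]`, the point `(1, 1)` tests inside and `(3, 1)` outside.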
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5489
Streaming:
 
Map-D: Hyper-Interactive GPU-Powered Visualytics for Big Data
Todd Mostak (Map-D)
As people wish to interactively explore increasingly large datasets, existing tools are unable to deliver acceptable performance. The distributed nature of systems like Spark leads to latencies detrimental to interactive data exploration, while single-node visualization solutions like Tableau and Qlikview are not powerful enough to deliver sub-second response times for even intermediate-sized datasets. In this talk, we will argue that dense GPU servers, containing 4-16 GPUs each, can provide analytics query throughput exceeding what can be achieved on even large clusters, while avoiding the latencies and complications associated with running over a network. We will look at MapD, which can query and visualize multi-billion-row datasets in milliseconds, as an example of such a system. Finally, we will show how the significantly higher performance achievable with a GPU system translates into new modes and paradigms of data analysis.  Back
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Real-Time Graphics, GTC 2015 - ID S5544
Streaming:
 
Scaling Data Visualization with GPUs and Design
Leo Meyerovich (Graphistry, Inc.)
GPUs are ushering in a new era of data visualization. Today, shoving one hundred thousand query results into a chart makes an illegible mess and kills interactivity. The good news is that infovis researchers invented smarter layouts that maximize visibility. The bad news is that these layouts and basic interactions are computationally intensive enough that analysts can no longer simply slide a slider, drag a graph cluster, etc. With the availability of GPUs, however, the rules have changed. This talk shows examples of smarter designs and how we use GPUs to turn them into interactive tools. For experts, we will discuss how running in browsers and even phones led to Graphistry's tiered GPU visualization engine approach, and touch on our use of WebGL, WebCL, and our own in-house libraries.

  Back
 
Keywords:
Big Data Analytics, Web Acceleration, Visualization - In-Situ & Scientific, GTC 2015 - ID S5589
Streaming:
Download:
 
Fighting Malware With GPUs in Real Time
Peter Kovac (Avast Software)
Dive deep into the problem of protecting electronic devices such as PCs, smartphones, and tablets against malicious software. In this talk we will show you how we handle the ever-increasing number of malware samples produced by the malware ecosystem every day. To leverage similarities between samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are directly used to generate rules that identify similar samples on our customers' machines.  Back
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5612
Streaming:
Download:
 
Single CUDA Block Implementation of Time Synchronous Viterbi Search for Speech Recognition
Nigel Cannings (Chase Information Technology Services Limited)
A time-synchronous Viterbi search algorithm for automatic speech recognition is implemented using a counter-intuitive single-CUDA-block approach. Decoding of a single utterance is carried out on a single streaming multiprocessor (SM), and multiple utterances are decoded simultaneously using CUDA streams. The single-block approach is shown to be substantially more efficient, and it enables overlapping of CPU and GPU computation by merging tens of thousands of separate CUDA kernel calls for each utterance. The proposed approach has the disadvantage of a large GPU global memory requirement because of the simultaneous decoding; however, the latest GPU cards with up to 12GB of global memory fulfill this requirement, and full utilization of the card is possible using all available SMs.  Back
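For context, time-synchronous Viterbi decoding advances all states one observation frame at a time, which is why the per-frame loop over states maps naturally onto the threads of one CUDA block. A toy sequential sketch with log-probability tables (illustrative; the session's recognizer is far larger):

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Time-synchronous Viterbi: every state is advanced one frame at a
    time, keeping only its best-scoring predecessor.  The per-frame loop
    over states is what maps onto one CUDA block on the GPU."""
    vit = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        col, bp = {}, {}
        for s in states:                  # one thread per state on the GPU
            prev, score = max(((p, vit[-1][p] + log_trans[p][s])
                               for p in states), key=lambda kv: kv[1])
            col[s] = score + log_emit[s][obs[t]]
            bp[s] = prev
        vit.append(col)
        back.append(bp)
    last = max(states, key=lambda s: vit[-1][s])
    path = [last]                         # trace the best path backwards
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

With a toy two-state model this recovers the expected state sequence; the real decoder applies the same per-frame structure to thousands of HMM states.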
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5658
Streaming:
 
POWER8 and GPUs: Helping Unfold the Intricate Loops of Genome Architecture (Presented by IBM)
Ido Machol (Baylor College of Medicine)
Develop new approaches and algorithms for high-throughput, systematic identification of chromatin loops between genomic regulatory elements, utilizing Tesla GPUs to efficiently search the space of possible chromatin interactions for true chromatin loops in parallel. This team is working with IBM POWER8 and NVIDIA Tesla GPU technologies to create customized algorithms that enable genomics scientists to see fine details of genome folding and learn more about genetic regulation. The maps of looping revealed thousands of hidden switches not known to have existed before. For genes that cause diseases or cancers, locating these switches is essential. GPUs speed up these algorithms by up to 200x, reducing the time to process a single chromosome from a week to less than a coffee break.  Back
 
Keywords:
Big Data Analytics, Developer - Algorithms, Life & Material Science, GTC 2015 - ID S5821
Streaming:
Download:
 
SenDISA: Distributed Intelligent, Video, Sensor & Actuator Analytics Platform for Smart Cities (Presented by Sensen)
Dr. Subhash Challa (Sensen Networks)
This session will introduce SenSen's proprietary Video, Sensor and Actuator Analytics Platform (SenDISA), which is used by some of the world's most prestigious and trusted organizations, including the Abu Dhabi airport, the Singapore police, Roads & Maritime Services Australia, the Westgate Bridge in Melbourne, Australia, the City of Trondheim, Norway, and the cities of Brisbane, Ipswich, and Manly, among others. We will present how our innovative algorithms, powered by the GPGPU-based SenDISA platform, enable Big Data analytic applications by fusing data from video, sensor, and IoT devices and combining them with other transaction data to deliver smart city solutions across the globe. We will provide insights into the architecture of SenDISA and the market-specific Big Data solutions serving different vertical markets.  Back
 
Keywords:
Big Data Analytics, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5869
Streaming:
Download:
 
In-Place Computing on PostgreSQL: SQL as a Shortcut of GPGPU
Kohei KaiGai (NEC)
Near-data computing is a recent technology trend. The cost of moving data is never negligible, so people are inclined to run their tasks at the location of the data (e.g., Hadoop). Our PG-Strom technology transparently off-loads some CPU-intensive SQL workloads to GPU devices using an automatic SQL-to-CUDA code generator. It enables users to describe their mathematical/statistical algorithms in SQL, then run this logic very close to the data managed by the PostgreSQL database. Usually, users have to export an entire dataset before processing what they are really interested in. Integrating GPU computing power within the SQL database eliminates these tasks and allows researchers to focus on what they really want to dive into.  Back
 
Keywords:
Big Data Analytics, GTC 2016 - ID S6118
 
Unblock Performance Limit of DNN by CUDA in R
Patric Zhao (NVIDIA)
You'll learn technical solutions to accelerate R with CUDA. DNNs have become a very popular approach in statistical analysis. Even though there are several DNN packages in R, they are rarely used for big data and deep neural networks, because the single-core performance of R is limited and the current design of R's DNN packages is not GPU-friendly. First, we'll introduce how we apply specific patterns, such as general matrix multiplication (GEMM), to DNNs in R; GEMM is a GPU-friendly pattern that can be easily accelerated by cuBLAS. Second, we'll show the tradeoff between performance and memory usage in R for DNNs. Finally, we'll package all of these CUDA approaches into an R package published to CRAN, so that anyone can install it in R quickly and get significant performance improvements from NVIDIA GPUs.  Back
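The GEMM-centric idea is that a fully connected layer applied to a whole mini-batch is a single matrix multiplication, so one fast GEMM routine (cuBLAS on the GPU) does most of the work. A tiny pure-Python sketch of the pattern (illustrative; the talk's code is in R with cuBLAS underneath):

```python
def matmul(a, b):
    """Naive GEMM; this single call is what cuBLAS replaces on the GPU."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def dense_layer(batch, weights, bias):
    """A fully connected layer over a whole mini-batch is one GEMM,
    followed by an element-wise bias add and ReLU."""
    out = matmul(batch, weights)
    return [[max(0.0, x + c) for x, c in zip(row, bias)] for row in out]
```

E.g., `dense_layer([[1.0, 2.0]], [[1.0, 0.0], [0.0, -1.0]], [0.5, 0.5])` yields `[[1.5, 0.0]]`.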
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Performance Optimization, GTC 2016 - ID S6156
Streaming:
Download:
 
CuMF: Large-Scale Matrix Factorization on Just One Machine with GPUs
Wei Tan (IBM T. J. Watson Research Center)
We present cuMF, a highly optimized matrix factorization system on GPUs. Matrix factorization (MF) is a key algorithm in recommender systems. On a single GPU, we introduce a memory-optimized alternating least squares (ALS) method; it alleviates discontiguous access and aggressively uses registers so as to reduce memory latency. On multiple GPUs, we combine data parallelism with model parallelism and introduce a topology-aware parallel reduction method, so as to scale ALS to multiple GPUs. Using only one machine with four NVIDIA GPU cards, cuMF can be 6-10 times as fast, and 33-100 times as cost-efficient, as state-of-the-art distributed CPU solutions. Moreover, cuMF can solve the largest matrix factorization problem ever reported.  Back
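To see why ALS suits GPUs, note that with one factor fixed, each row of the other factor has an independent closed-form least-squares update, so all rows can be solved in parallel. A rank-1 toy version of the alternating updates (a sketch under simplifying assumptions, not cuMF's batched solver):

```python
def als_rank1(ratings, iters=20):
    """Alternating least squares for a rank-1 factorization R ~ u v^T.
    With v fixed, each u[i] has an independent closed-form update (and
    vice versa), which is why a GPU can solve every row in parallel."""
    m, n = len(ratings), len(ratings[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        vv = sum(x * x for x in v)
        u = [sum(ratings[i][j] * v[j] for j in range(n)) / vv
             for i in range(m)]           # all u[i] independent: GPU-parallel
        uu = sum(x * x for x in u)
        v = [sum(ratings[i][j] * u[i] for i in range(m)) / uu
             for j in range(n)]           # all v[j] independent: GPU-parallel
    return u, v
```

On the exactly rank-1 matrix `[[2, 4], [1, 2]]` the reconstruction `u[i] * v[j]` matches the input after a few sweeps.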
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Performance Optimization, GTC 2016 - ID S6211
Streaming:
Download:
 
Data Analytics and Machine Learning at Your Finger Tips - No CUDA Required
Bryan Thompson (Blazegraph)
Writing fast, efficient data analytics for graph and machine learning on GPUs can be hard due to the complexities of CUDA and of achieving effective parallelism. DASL and SPARQL are high-level languages for graph and machine learning algorithms (DASL) and graph pattern matching (SPARQL) that provide speedups of up to 1,000x over native Spark and up to 300x over leading graph databases when executed on the BlazeGraph platform. These high-level languages are translated into task graphs that expose the available parallelism. The MapGraph runtime evaluates the task graphs and provides a scalable architecture on GPUs and GPU clusters. This presentation discusses the concepts behind the graph algorithms and queries, the MapGraph architecture, and how algorithms are evaluated on a GPU cluster.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Aerospace & Defense, GTC 2016 - ID S6267
Streaming:
Download:
 
Accelerating Spark Workloads Using GPUs
Rajesh Bordawekar (IBM Research)
The Apache Spark engine is increasingly used for implementing large-scale distributed analytics workloads. These workloads cover a wide array of analytics models, including predictive analytics, optimization, and graph analytics. We'll discuss opportunities for exploiting GPUs to accelerate different Spark components such as MLlib. The talk will first give an overview of the Spark programming and execution model and then describe the key issues in integrating GPUs into the Spark infrastructure. We then describe our approach for enabling Spark to use multiple GPUs in a distributed manner and provide details of accelerating key MLlib kernels without changing the source Spark program.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Algorithms, GTC 2016 - ID S6280
Streaming:
Download:
 
Anomaly Detection and Categorization Using Unsupervised Deep Learning
Stephen McGough (Durham University)
The potential information buried within datasets is immense -- though extracting this information is difficult when the data is large, noisy, unlabeled, and unstructured. We present the use of GPGPU-powered unsupervised deep learning to identify the anomalies within such datasets. Analysis of these anomalies can be performed to determine which are "pertinent" and which are "benign." Once the significance of an anomaly has been determined, this then becomes a label, which is added to the data. Repeating this process will lead to unlabeled data becoming labeled. This newly labeled data can be used to train a supervised deep learning system to identify new instances of that stereotype. We demonstrate how GPGPUs can be used to enable real-time anomaly detection and stereotyping.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6340
Streaming:
Download:
 
Visual Sensemaking with GPU-Driven Machine Learning
Stef van den Elzen (SynerScope BV)
We show how our interactive, integrated analytics solution allows a new class of users to perform machine-assisted visual sensemaking. Until now, machine learning techniques such as predictive analytics and deep learning have mostly been used as part of a complex tool chain that serves as an endpoint in the decision-making process. We combine the strengths of human decision making and GPU-driven machine learning in a multi-coordinated visual analytics solution. This enables the discovery of actionable insights by bridging the gap between data scientist and business user.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Self-Driving Cars & Automotive, GTC 2016 - ID S6356
Streaming:
 
Graph Analytics: Using GPU-Accelerated Sparse Linear Algebra Routines
Paul Fox (EM Photonics, Inc.)
Large-scale graph analytics frameworks provide a convenient and highly scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. We're developing an implementation of the high-level functions supported by these APIs in terms of linear algebra operations, which are parallel over each pair of vertices connected by an edge. This technology can reduce the number of nodes required and maps well to computational accelerators such as GPUs, thus enabling users to perform more complex analysis with less hardware at lower cost. We'll detail our latest work on this project, including challenges, the specifics of our approach, and preliminary results.  Back
 
Keywords:
Big Data Analytics, Algorithms, GTC 2016 - ID S6360
Streaming:
 
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Processing
Jose Ricardo da Silva Junior (Universidade Federal Fluminense)
Learn how to perform data analysis over software repositories on GPU architectures with the Dominoes tool. We'll give an overview and introduction of the tool and its capabilities, which provide a unified view of the computational resources. Dominoes allows anyone to explore large software repositories at any granularity (files, methods, or classes) without using any programming language. Due to its highly parallel GPU architecture, results are processed in real time. Attendees will learn the strategy Dominoes uses to process big data on the GPU.  Back
 
Keywords:
Big Data Analytics, Tools & Libraries, GTC 2016 - ID S6372
Streaming:
Download:
 
Gunrock: A Fast and Programmable Multi-GPU Graph Processing Library
Yangzihao Wang (University of California Davis)
We present Gunrock, a multi-GPU graph processing library that enables easy graph algorithm implementation and extension onto multiple GPUs, for scalable performance on large graphs with billions of edges. Attendees can learn how to 1) solve large-scale graph problems with high-performance GPU computing primitives and optimization strategies, using our high-level, data-centric abstraction that focuses on vertex or edge frontier operations, and 2) utilize multi-GPU computing power with just a few algorithm-dependent blocks, using our multi-GPU framework that handles most multi-GPU implementation details and memory allocation. We will also share experience from the library's design and implementation that helps it achieve the best performance among programmable GPU graph libraries.  Back
 
Keywords:
Big Data Analytics, Tools & Libraries, Supercomputing & HPC, GTC 2016 - ID S6374
Streaming:
Download:
 
Graph Database and Analytics in a GPU-Accelerated Cloud Offering
Brad Bebee (Blazegraph)
Blazegraph GPU provides 300X acceleration for SPARQL graph query and graph database management with acceleration for existing RDF/SPARQL and Property Graph (Tinkerpop) applications. Multi-GPU configurations can effectively manage billion+ edge graphs on single-node machines with 4 or 8 K80 GPU accelerators. This is a cost-effective way to deliver high performance for graphs, but many end-users and applications do not have existing multi-GPU systems; current cloud offerings at this scale are not generally available. Cirrascale has developed a cloud-based solution for provisioning multi-GPU Tesla systems using its switch riser technology. This session details the Blazegraph GPU cloud offering on Cirrascale, demonstrates how to quickly deploy it in the cloud, and shows graph benchmarks on cloud systems.  Back
 
Keywords:
Big Data Analytics, Data Center & Cloud Computing, Aerospace & Defense, GTC 2016 - ID S6395
Streaming:
Download:
 
Production Intelligence: GPU-Databases for Predictive Maintenance and In-Line Controlling in Automobile Manufacturing
Peter Strohm (Jedox AG)
Learn how in-GPU-memory databases optimize complex manufacturing processes by enabling real-time data input into big datasets, in-line decision making, and predictive maintenance. Manufacturing processes today provide tons of data, e.g., on the process itself, workpieces, machine sensor readings, parts delivered by external vendors, etc. In the Production Intelligence project, our goal is to turn this unspecific data into "smart data" to gain better insight into the manufacturing process, e.g., to prevent machine shutdowns or decrease the amount of junk parts. We'll present our solutions for streaming input data vectors into big datasets, analyzing incoming data in real time, and predicting production or system errors with the help of deep learning algorithms.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Computer Vision & Machine Vision, GTC 2016 - ID S6426
Streaming:
Download:
 
Accelerating Influence Spread Estimation on Social Networks in the Continuous-Time Domain
Zissis Poulos (University of Toronto, Sysomos Inc.)
This session showcases how to leverage GPUs to accelerate influence spread estimation in large social networks. Estimating the spread of an opinion or product across members of a graph-modelled social network is a hard problem requiring compute-intensive approximation algorithms. The complexity of the problem rises further in the continuous-time domain, where influence transmission rates on network edges are derived from stochastic distributions. Spread estimation algorithms that operate on stochastic transmission rates, such as naive sampling and neighbourhood size estimation, require a plethora of samples to achieve convergence. By exploiting the inherent independence across the multiple sampling iterations of these algorithms, we achieve up to an 11x improvement in run time using GPUs.  Back
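The naive sampling baseline mentioned above draws one random transmission delay per edge, computes arrival times from the seed node, and averages how many nodes are reached before a time horizon; every sample is independent, which is exactly the parallelism a GPU exploits. A toy sequential sketch (our illustration, assuming exponentially distributed delays):

```python
import random

def sample_spread(edges, seed_node, horizon, samples=2000, seed=1):
    """Naive Monte Carlo spread estimation in the continuous-time model:
    draw one random transmission delay per edge, compute arrival times
    from the seed node, and count nodes reached within the horizon."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        # one sampled world: exponential delay drawn from each edge's rate
        delay = {(u, v): rng.expovariate(rate) for u, v, rate in edges}
        arrival = {seed_node: 0.0}
        frontier = [seed_node]            # simple relaxation (tiny graphs)
        while frontier:
            u = frontier.pop()
            for (a, b), d in delay.items():
                if a == u and arrival[u] + d < arrival.get(b, float("inf")):
                    arrival[b] = arrival[u] + d
                    frontier.append(b)
        total += sum(1 for t in arrival.values() if t <= horizon)
    return total / samples
```

With a single edge `('a', 'b', 1.0)` and a generous horizon, the estimate converges to 2.0 (both nodes are reached), while a horizon of zero reaches only the seed.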
 
Keywords:
Big Data Analytics, Algorithms, GTC 2016 - ID S6471
Streaming:
Download:
 
The Promise of GPU Analytics or Why GPU is the New CPU
Todd Mostak (MapD)
We'll explain why GPU-powered in-memory databases and analytics platforms are the logical successor to CPU in-memory systems, largely due to recent increases in the onboard memory available on GPUs. With sufficient memory, GPUs possess numerous advantages over CPUs, including much greater compute and memory bandwidth and a native graphics pipeline. We'll demo how MapD is able to leverage multiple GPUs per server to extract orders-of-magnitude performance increases over CPU-based systems, bringing interactive querying and visualization to multi-billion-row datasets.  Back
 
Keywords:
Big Data Analytics, Performance Optimization, GTC 2016 - ID S6472
Streaming:
Download:
 
Data Science Applications of GPUs in the R Language
Norm Matloff (University of California, Davis)
In this presentation, you will learn about the use of GPUs in data science applications using the R language, as well as a general method, Software Alchemy, for parallelizing statistical applications. The talk will provide an overview of R libraries available for interfacing with GPUs, and discussion of issues involved in writing such libraries, before showing you how to use Software Alchemy (with or without R) to overcome GPU memory limitations in statistical applications.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6708
Streaming:
Download:
 
Sense Making in an IOT World: Sensor Data Analysis with Deep Learning (Presented by Hewlett Packard Enterprise)
Natalia Vassilieva (Hewlett Packard Enterprise)
Applications of deep learning in sensor data analysis have not been studied as extensively as in speech and vision. However, sensor data have properties similar to those of images and audio: they are multidimensional, with intrinsic dependencies and correlations, and hard to analyze with conventional approaches. Our results show that deep learning has better generalization capabilities than conventional methods on sensor data and has high potential in sensor data analytics. We also address scalability issues of the training process for the models best suited to sensor data: training these models does not scale out beyond a certain number of nodes.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, IoT, GTC 2016 - ID S6773
Streaming:
Download:
 
The OpenPOWER Foundation: Revolutionizing Data-Centric Transformation (Presented by IBM)
Sumit Gupta (IBM Power Systems)
The growth of the OpenPOWER Foundation has been phenomenal. Why, you might ask? In less than two years, OpenPOWER has grown from five members to over 180, with membership across all tiers of hardware, software, and end users themselves. The Foundation provides a compelling and rapidly growing open approach to infrastructure and software for rapidly changing workloads and evolving IT consumption models. This is a revolution that is making a profound difference in the price/performance criteria of end users, as well as accelerating compelling development for performance to drive business advantage. OpenPOWER members are co-creating their approach to technology as innovators, producers, and consumers utilizing IBM's Power Architecture.  Back
 
Keywords:
Big Data Analytics, Data Center & Cloud Computing, Supercomputing & HPC, GTC 2016 - ID S6825
Streaming:
Download:
 
Towards a High Performance Analytics and Computing Platform for Brain Research
Dirk Pleiter (Forschungszentrum Juelich)
Understanding and modeling the human brain continues to be one of the biggest challenges of research. The Human Brain Project is a European flagship, which is in the process of creating a research infrastructure that will facilitate this research. Many research topics in this field require scalable compute resources or the ability to process extreme-scale data volumes (in some cases both). Examples are approaches to simulate the network of a human brain in its full complexity and the efforts to create high-resolution brain atlases. GPUs already play an important role today in realizing the necessary computational capabilities. We'll give an overview of the efforts to build a high-performance analytics and computing platform for brain research.  Back
 
Keywords:
Big Data Analytics, Press-Suggested Sessions: HPC & Science, Supercomputing & HPC, GTC 2016 - ID S6655
Streaming:
Download:
Bioinformatics & Genomics
Presentation
Media
Algorithms and Tools for Bioinformatics on GPUs
Bertil Schmidt (Nanyang Technological University)

Learn how to use GPUs to accelerate compute- and data-intensive applications and algorithms in bioinformatics. High-throughput techniques for DNA sequencing and gene expression analysis with microarrays have led to a rapid growth in the amount of digital biological data; e.g., the NCBI Sequence Read Archive (SRA) houses raw sequence data generated by next-generation sequencing (NGS) technologies which exceeds 25 trillion base pairs. Therefore, modern bioinformatics tools need to be scalable, i.e. able to deal with an ever-growing amount of data. GPUs and CUDA provide the opportunity to significantly reduce the runtime of many biological algorithms on inexpensive hardware.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2008
Streaming:
Download:
 
SeqNFind: Application Of CUDA GPU Technologies To Sequence Alignment Techniques
D. Andrew Carr (Accelerated Technology Laboratories Inc.)

Explosive growth in the amount of genomic data has created a need for faster systems that align and compare nucleotide sequences. With the development of tools for leveraging the massively parallel architecture of NVIDIA GPUs, it is a logical next step to construct algorithms for genomic analysis on GPU clouds/clusters. Although a seemingly simple task, there are a number of challenges to deploying the current algorithms: every algorithm from Smith-Waterman to BLAST has its own unique set of barriers. Presented here are some of the lessons learned and how ongoing genomic research projects have benefited from the increased speed and accuracy.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2037
Streaming:
Download:
 
Swift: A GPU-based Smith-Waterman Sequence Alignment Program
Pankaj Gupta (St Jude Children's Research Hospital)

This session describes Swift, a GPU-based Smith-Waterman implementation for aligning short DNA sequences to large genomes. Swift has been designed to reduce computation time and lower hardware cost. Also, unlike other leading GPU-based Smith-Waterman sequence alignment programs like CUDASW++ and SWCUDA which focus on protein sequence alignment, Swift has been developed for DNA sequence alignment. Swift performs 200x faster than CUDASW++ using a test data set containing 1000 reads (100 bases each) and 1000 references (1000 bases each), and it performs 11x faster than the CPU-based implementation of Smith-Waterman using 24 million reads (100 bases each) and human chromosome 1.

  Back
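The Smith-Waterman recurrence that tools like Swift accelerate can be sketched with a minimal CPU reference (scoring only, with a linear gap penalty; an illustrative sketch, not Swift's actual implementation):

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-1):
    """Fill the local-alignment DP matrix H; the best score is its maximum.
    H[i][j] = max(0, diagonal + match/mismatch, up + gap, left + gap)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if seq1[i-1] == seq2[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))
```

The clamp to zero is what makes the alignment local: a poor prefix never drags down a good local match, which is also why DNA read aligners prefer Smith-Waterman over global alignment for noisy reads.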
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2083
Streaming:
Download:
 
CUMACH - A Fast GPU-based Genotype Imputation Tool
Agatha Hu (NVIDIA)

The goal of this session is to introduce a GPU-implemented tool in bioinformatics. Genotype imputation is a method which extrapolates genetic correlations from a densely characterized reference panel to a sparsely typed study sample. Many CPU-based tools already exist, but they are all time-consuming on large data sets. In this session, we present a GPU-based imputation tool that achieves relatively good results at high speed. The session has three main parts: 1) introduction of the background and the HMM-based algorithm, 2) GPU implementation and optimization, and 3) results.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2084
Streaming:
Download:
 
SOAP3: GPU-based Compressed Indexing and Ultra-fast Parallel Alignment of Short Reads
BingQiang Wang (BGI)

We give the first implementation of a compressed index (Burrows-Wheeler Transform) on the GPU, supporting very efficient parallel alignment of short patterns (reads) onto the human genome. The new alignment software SOAP3 is tens of times faster than existing ones and can keep up with the throughput (giga to tera bp) of next-generation DNA sequencers. It takes 2.4 seconds to perform exact matching for one million length-100 reads (tens of seconds for small-error approximate matching). Technically, we show how to minimize memory accesses to the index from individual threads and how to control the branching and divergence of the threads.

  Back
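The BWT-based exact matching described above boils down to FM-index backward search. A minimal sketch (illustrative only; SOAP3's compressed index layout and GPU memory-access scheme are not reproduced here):

```python
# Minimal FM-index backward search -- the exact-matching core that a
# BWT-based aligner like SOAP3 runs in parallel, one read per thread.

def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations ('$' terminates)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def count_occurrences(text, pattern):
    """Backward search: shrink a suffix-array interval [lo, hi) while
    consuming the pattern right-to-left; its final width is the match count."""
    L = bwt(text)
    first = sorted(L)                      # first column of the sorted rotations
    C = {c: first.index(c) for c in set(L)}  # chars in text smaller than c
    def occ(c, i):                         # occurrences of c in L[:i]
        return L[:i].count(c)
    lo, hi = 0, len(L)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

print(count_occurrences("ACGTACGT", "ACG"))  # 2 (offsets 0 and 4)
```

In a real aligner, `occ` is a precomputed rank table rather than a linear scan, and the session's key point is laying that table out so GPU threads minimize memory accesses and divergence.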
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2109
Streaming:
Download:
 
Accurate Sequence Alignment Using Distributed Filtering on GPU Clusters
Reza Farivar (University of Illinois at Urbana-Champaign), Shivaram Venkataraman (UC Berkeley)

Learn how GPUs enable new ways to rethink a complex bioinformatics problem: Accurate sequence alignment. What was once prohibitive to compute can become the basic block of novel GPU-based algorithms. Modern DNA sequencing machines generate enormous amounts of short sequences within minutes, and they should be aligned to a reference genome in real time. Most solutions only find a few locations that match a short sequence. We introduce a new technique to find all matching locations inside a reference sequence for a given number of mismatches. Our technique is based on a distributed filtering scheme and GPU based processing.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2152
Streaming:
Download:
 
Towards Computing the Cure for Cancer
Wu Feng (Virginia Tech), Heshan Lin (Virginia Tech)

Learn about how to create "designer" genomic analysis pipelines as part of the "Compute the Cure" for cancer initiative from NVIDIA Foundation. Get an overview of an open-source framework that enables the creation of customized genomic analysis pipelines. Discover how different plug-ins from the "mapping/realignment/discovery" repositories, respectively, can be composed to form a genomic analysis pipeline. Learn to use next-generation sequencing data to characterize previously undetectable genetic changes between normal and malignant cells. Find out how you can contribute to the "Compute the Cure" cause.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2156
Streaming:
Download:
 
High-Throughput Epistasis Screening Using GPUs
Mark Seligman (Insilicos LLC)

Epistasis is the interaction of two or more genes in coding for a biological property. Epistasis is believed to be an important factor in an individual's susceptibility to disease, and the search for epistasis is a major component in the development of personalized approaches to genomic medicine. Statistical tests for epistasis are typically confounded by the multiple-testing problem, that is, the aggregated loss of precision incurred through repeated hypothesis testing. One way to circumvent this problem is to simulate a false-discovery rate via resampling. We report success in using GPUs to accelerate these highly compute-intensive resampling techniques.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2337
Streaming:
Download:
 
GPGPU Accelerated Protein Similarity Measures Identifying Biological Relevant Structure
Edward Lowe (Vanderbilt University), Nils Woetzel (Vanderbilt University)

Atomic structure similarity measures for proteins help in de novo protein structure prediction. For a large set of computationally generated protein structures (~20k), all pairwise similarities have to be calculated to cluster structures. Common similarity measures are root mean square deviation (RMSD) and global distance test total score (GDT_TS). Although GDT_TS has advantages over RMSD, it is not used due to its time-consuming calculation. The aforementioned and other similarity measures are ported for parallel execution on GPGPUs to make them amenable for clustering de novo generated structural models to find the largest cluster representing the biologically relevant protein conformations.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2346
Streaming:
Download:
 
Dynamic Programming on CUDA: Finding the Most Similar DNA Sequence
Grzegorz Kokosinski (IBM Poland), Krzysztof Zarzycki (IBM Poland)

Learn a couple of techniques to speed up compute-heavy dynamic programming algorithms on the GPU. Our particular problem concerned DNA sequences: given a reference sequence, how do we find the one most similar to it in a large database? The sequences are millions of characters long, and their similarity is calculated with a (quadratic) DP algorithm, which makes the problem very tough even for GPUs. We speed up both the theoretical and practical side: we present programming techniques that enable dynamic programming to be performed at hardware speed, and improvements to the algorithm itself that drastically lower the execution time.

  Back
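One classic way to expose GPU parallelism in such quadratic DP problems, not necessarily the exact technique presented in this talk, is an anti-diagonal sweep: every cell on diagonal i + j = d depends only on diagonals d-1 and d-2, so all cells of a diagonal can be computed in parallel. A small illustration using edit distance:

```python
def edit_distance_antidiagonal(a, b):
    """Levenshtein distance computed anti-diagonal by anti-diagonal.
    On a GPU, each diagonal's cells could be one parallel kernel step,
    since they are mutually independent."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):                       # sweep diagonals i + j = d
        for i in range(max(0, d - m), min(n, d) + 1):  # independent cells
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                cost = 0 if a[i-1] == b[j-1] else 1
                D[i][j] = min(D[i-1][j-1] + cost,    # substitute/match
                              D[i-1][j] + 1,         # delete
                              D[i][j-1] + 1)         # insert
    return D[n][m]

print(edit_distance_antidiagonal("kitten", "sitting"))  # 3
```

The sweep produces exactly the same matrix as the row-by-row order; only the traversal changes, which is what makes it a pure parallelization strategy.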
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2376
Streaming:
Download:
 
The Advantage of GPU Computation for Analyzing Complex Traits
Jun Zhu (Zhejiang University)

Most important agricultural traits and human diseases are complex traits controlled by gene networks with gene-by-gene interaction (epistasis) and gene-by-environment interaction (GE). New statistical methods and software are developed for analyzing the genetic architecture of complex traits based on genome-wide association studies (GWAS). When dealing with large mapping populations and huge amounts of molecular information, GPU computation has an advantage over CPU computation. We will demonstrate the newly developed GPU-based software QTLNetwork V3.0 and GWAS-GMDR for mapping genes with epistasis and GE interaction for complex traits of humans, crops, and mice.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2516
Streaming:
Download:
 
GPU Accelerated Bioinformatics Research at BGI
BingQiang Wang (BGI)

After digitizing the DNA double helix by sequencing, computation is the key connecting raw sequences with life science discoveries. As massive data is generated, how to process, analyze, and store it efficiently becomes a major challenge. By developing GPU-accelerated bioinformatics tools and integrating them into pipelines, BGI researchers now run analysis pipelines in several hours instead of several days. These tools include the SOAP3 aligner, SNP calling, and a tool for population genomics. The speedup is generally around 10-50x compared with traditional counterparts.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2519
Streaming:
Download:
 
Acceleration of Complex Network Analysis
Athanasios Grivas (Newcastle University)
The scientific role of complex networks nowadays is of great importance. Their universal characteristics can be adopted across scientific fields such as network pharmacology. There is a need for acceleration, decreasing the execution time of the algorithms used at large scale. The breakthrough is the use of GPUs and parallel computing to accelerate the whole process. The transformation of common algorithms such as matrix multiplication to a parallel model has shown large acceleration, which is a promising point for the field of network analysis.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID P2451
Download:
 
GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics
Shuji Suzuki (Tokyo Institute of Technology)
A vast number of sensitive homology searches is required for mapping sequence data to known protein sequence databases in metagenomic analysis. However, fast search tools such as BLAT do not have enough search sensitivity for metagenomic analysis, so a sensitive and efficient homology search tool is highly desirable. We developed a GPU-optimized algorithm for performing sensitive sequence homology searches and implemented it as the GPU-Accelerated Homology Search Tool for Metagenomics (GHOSTM), which achieves faster calculation speeds and higher search accuracy than the BLAT program. Our results indicate that GHOSTM offers a potentially cost-efficient solution to the increasingly difficult computational analysis of metagenomic data.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID P2500
Download:
 
Photorealistic and Interactive Molecule Visualizer
Cyrille Favreau
IMV is an interactive molecule visualizer based on a ray-tracing engine. Targeting high-quality images and ease of interaction, IMV uses the latest GPU computing acceleration techniques, combined with natural user interfaces such as Kinect and Wiimotes.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID P3106
Download:
 
Fast GPU Applications in Bioinformatics
Fang Liu (SuperComputing Center of Chinese Academy of Sciences)
This work presents a fast algorithm to compute linkage disequilibrium (LD) on the GPU using CUDA. The traditional accumulations can be converted equivalently to bitwise operations, and thus benefit from a specially designed single instruction, '__popc', on NVIDIA GPU devices. The algorithm therefore processes 32 samples simultaneously using only a few bitwise instructions, and reduces the input data for each allele to 1/4, from an 8-bit 'char' to two bits. Experimental results show that our algorithm gains around a thousand-fold speedup over its serial counterpart on CPU using NVIDIA C2075 cards.  Back
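The popcount trick described above can be sketched in plain Python (a hypothetical simplification with one bit per sample, and a Python popcount standing in for CUDA's `__popc`; the actual algorithm packs each allele into two bits):

```python
def popcount(x: int) -> int:
    """Count set bits -- the role the __popc instruction plays on the GPU."""
    return bin(x).count("1")

def pack(alleles):
    """Pack a 0/1 allele list into an integer bitmask, one bit per sample."""
    mask = 0
    for i, a in enumerate(alleles):
        if a:
            mask |= 1 << i
    return mask

def ld_d(alleles_a, alleles_b):
    """LD coefficient D = p_AB - p_A * p_B, computed from three popcounts
    instead of a per-sample accumulation loop -- this replacement is what
    lets the GPU version handle 32 samples per instruction."""
    n = len(alleles_a)
    a, b = pack(alleles_a), pack(alleles_b)
    p_a = popcount(a) / n
    p_b = popcount(b) / n
    p_ab = popcount(a & b) / n      # AND + popcount = joint allele frequency
    return p_ab - p_a * p_b

# Perfectly correlated SNPs give maximal D for these frequencies:
print(ld_d([1, 1, 0, 0], [1, 1, 0, 0]))  # 0.25
```

On the GPU, `a` and `b` would be arrays of 32-bit words, so one `__popc(a[k] & b[k])` per word replaces 32 iterations of the serial accumulation.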
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID P3153
Download:
 
Parallel strategies for identifying genetic networks describing the biological clock using GPUs
Ahmad Al-Omari (Institute of Bioinformatics/The University of Georgia)
A graphics processing unit (GPU) offers a solution to a very important and fundamental problem of interest to many researchers, a problem that would be prohibitive to solve without the technology of GPUs. The problem is how the biological clock controls the rhythms of ~2400 genes in the genome (with 11,000 genes) of a model system, the filamentous fungus Neurospora crassa (Dong et al., 2008). Ultimately, we want to be able to predict and hence understand the dynamics of all of these genes and their products in a genetic network describing how the clock functions.  Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID P3194
Download:
 
An Ultra-Fast Computing Pipeline For Metagenome Analysis With GPUs
Shuji Suzuki (Tokyo Institute of Technology)
Metagenome analysis is useful not only for understanding symbiotic systems but also for monitoring environmental pollution. However, metagenome analysis requires sensitive sequence homology searches, which demand large computation time and are thus a bottleneck in current metagenome analysis based on data from the latest DNA sequencers, generally called next-generation sequencers. To solve the problem, we developed a large-scale computing pipeline for metagenome analysis on TSUBAME2 at the Tokyo Institute of Technology.  Back
 
Keywords:
Bioinformatics & Genomics, Supercomputing & HPC, GTC 2013 - ID P3196
Download:
 
G-MSA - a Powerful GPU-based Tool for Multiple Sequence Alignment
Wojciech Frohmberg (Poznan University of Technology)
Life sciences face a great number of problems that computer-based tools solve automatically. These methods, although accurate and effective, need to cope with the rapidly increasing size of input datasets. This also applies to the Multiple Sequence Alignment problem. G-MSA, our tool addressing this problem, is able to handle growing input instances effectively. This has been achieved by adapting an existing algorithm to a distributed environment of powerful machines equipped with multiple graphics cards. The poster outlines the mechanisms that allow G-MSA to deal with a massive number of simultaneously working threads and presents the algorithmic tricks behind the tool's accuracy.  Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID P3217
Download:
 
BCL: ChemInfo - GPU-Accelerated Cheminformatics Suite for Probe Development and Drug Discovery
Edward Lowe (Vanderbilt University)
With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, methods for Ligand-Based Computer-Aided Drug Discovery (LB-CADD) have the potential to accelerate, reduce the cost of, and increase the quality of probe development and drug discovery efforts. From a computational science and technology perspective, increased public availability of large HTS data sets stimulates the development of innovative LB-CADD tools that can then be applied in academic research. Here, we present BCL::ChemInfo, a cheminformatics framework featuring GPU acceleration, MySQL integration, and automation of model optimization. We present several current studies leveraging BCL::ChemInfo against targets implicated in cancer, malaria, and neuroscience.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID P3241
Download:
 
Acceleration of Biological Circuit Reconstruction: Biological Clock System in Neurospora Crassa
Chulwoo Lim (University of Georgia)
A fundamental and ubiquitous difficulty of systems biology is identifying relevant model parameters. A genetic network model of the biological clock of Neurospora crassa that is quantitatively consistent with the available RNA and protein profiling data was proposed. However, the oscillating nature of biological models poses additional challenges for identifying model parameters due to the high-dimensional complex search space and the computational cost of numerically solving ODEs. In this work, an evolutionary algorithm leveraging the GPU architecture is proposed. Our implementation identified promising model parameters with a speedup of two orders of magnitude versus the CPU implementation.  Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID P3245
Download:
 
CUDA-Enabled Applications for Next-generation Sequencing
Bertil Schmidt (Johannes Gutenberg University Mainz)

Next-Generation Sequencing (NGS) refers to new technologies for high-throughput DNA sequencing which produce up to billions of DNA or RNA reads in short time and at low cost. To exploit NGS, efficient parallel and scalable algorithms and tools are needed to process the massive amount of generated reads within a reasonable amount of time. This talk will present several CUDA-enabled algorithms and data structures to accelerate (i) the accurate processing of short/long read alignment to human genomes (i.e. CUSHAW and CUSHAW2) and (ii) the analysis of metagenomic data from microbial environmental sequencing studies (CRiSPy-CUDA and CRiSPy-Embed).

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3004
Streaming:
Download:
 
Ultra Fast Sequence Alignment for the DNA Assembly Problem
Michal Kierzynka (Poznan University of Technology, Poznan Supercomputing and Networking Center)

The goal of this session is to present a software tool performing pairwise sequence alignment of nucleotide sequences, as well as the GPU optimizations used to achieve top performance. The software uses the dynamic programming method to efficiently compute the exact alignment in a form that may be conveniently used in the DNA de-novo assembly problem. Its uniqueness is also due to the fact that it has been optimized for nucleotide reads coming from modern sequencers (Illumina/Solexa, Roche/454, AB/SOLiD). As a result, it is currently the fastest implementation of the Needleman-Wunsch algorithm, reaching up to 89 GCUPS on a single GPU and scaling well on multi-GPU systems. The following real-world use case will be presented: the application of the software in finding similar sequences in huge datasets coming from the next-generation Illumina sequencer.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3025
Streaming:
Download:
 
A Scalable Short-read Sequence Aligner Using a CUDA Kernel Pipeline
Richard Wilton (Johns Hopkins University -- Department of Physics and Astronomy)

The Department of Physics and Astronomy at Johns Hopkins University is currently constructing a new computer cluster to facilitate high-throughput data-intensive computation on terabyte-scale data, including the analysis of genomic sequence data. Compute nodes in the cluster contain multiple CPU cores, 100GB or more of system RAM, and one or more GPUs; a prototype node is implemented with 12 CPU cores (24 hyperthreads), 144GB of RAM, and four NVIDIA C2070s. In this session we will describe the design of a genomic sequence-alignment application that targets the cluster compute-node hardware. We will discuss the algorithms we use and how they are implemented as CUDA kernels, point out the key optimizations in the implementation, and look at the performance of the software.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3092
Streaming:
Download:
 
GWIS-GS: GPU-Accelerated Screening Platform for Second Order Genome Wide Interaction Search
Qiao Wang (National ICT Australia Victoria Lab), Adam Kowalczyk (Victorian Research Laboratory of National ICT Australia)

This talk will present a platform enabling an exhaustive analysis of all pairwise (2nd-order) interactions of Single Nucleotide Polymorphisms (SNPs) in Genome Wide Association Studies (GWAS) data. Given typical datasets of 300K SNPs and 3K samples, our GPU-accelerated solution is capable of completing the search in under 3 minutes on a single NVIDIA GTX470. The method involves construction of contingency tables for all SNP pairs, followed by a battery of conventional statistical tests such as Fisher-Exact and Variance Explained. All previous implementations described in the literature required hours, days, or even months to complete the same analysis. In addition, we will present an interface that allows users to define their own statistical tests at runtime, and describe our latest developments towards a practical 3rd-order implementation.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3169
Streaming:
Download:
 
Unveiling Cellular Mechanisms Using GPU-based Sparse Linear Algebra
Marco Maggioni (University of Illinois at Chicago)
In this session we present an innovative systems biology application of GPU computing, as an alternative to molecular dynamics simulation for studying biochemical mechanisms inside cells. For the first time we are able to apply the Chemical Master Equation (CME) stochastic framework at large scale, determining both the probabilistic steady state and the transient dynamics of biochemical reaction networks. Our GPU implementation leverages the structure of the problem to optimize the sparse linear algebra routines needed by the stochastic model. As a result, we achieve an average 15.57x speedup over the optimized Intel MKL library running on a 64-core architecture.
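The sparse linear algebra at the heart of such CME solvers reduces largely to sparse matrix-vector products. As a rough illustration of the data layout involved (not the authors' implementation), here is a plain-Python sketch of SpMV over the common CSR (compressed sparse row) format; on a GPU, one thread typically handles one row of this loop.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A stored in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 example: A = [[2, 0, 1],
#                   [0, 3, 0],
#                   [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0])  # row sums
```

Repeated products of this kind (e.g. in power or Krylov iterations) are what dominate the steady-state and transient computations the session describes.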

  Back
 
Keywords:
Bioinformatics & Genomics, Quantum Chemistry, GTC 2013 - ID S3245
Streaming:
Download:
 
Tackling Big Data in Genomics with GPU
BingQiang Wang (Beijing Genomics Institute)
GPUs can help scientists find new clues for curing cancer and other diseases in massive genomics data. Computing against and mining such huge volumes of genomic and derived data has become a major challenge. Compression reduces data volume and enables more efficient access; we have developed GPU-accelerated versions of typical compression algorithms with speedups of around 2-10x. Integrating the Hadoop framework with GPUs is also very promising for large-scale analyses over big data, such as genome-wide association studies (GWAS), making the entire analysis more balanced in terms of its computing-to-data-access ratio.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3257
Streaming:
Download:
 
Using SIMD Video Instructions to Achieve 200 GCUPs with K10 for Smith-Waterman
Erich Elsen (Royal Caliber)
Learn how to use the SIMD video instructions introduced with the Kepler architecture to accelerate Smith-Waterman and Needleman-Wunsch DNA sequence alignment. Speedups of up to 3x over scalar code are possible: the new code achieves over 80 GCUPs on a GeForce GTX 680 and close to 150 GCUPs on a Tesla K10 GPU accelerator. The specific implementation targets the case of performing many independent alignment problems of length < 1024 simultaneously; however, the techniques discussed are generally applicable to any sequence alignment problem. SIMD video instructions allow one to split a 32-bit register into two 16-bit or four 8-bit parts and operate on them independently.
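The per-byte lane splitting described above can be illustrated without GPU hardware. The sketch below emulates, in Python, a packed add over four independent 8-bit lanes of one 32-bit word (each lane wrapping modulo 256), which is the spirit of Kepler intrinsics such as `__vadd4`; the function name `vadd4` here is just an illustrative stand-in.

```python
def vadd4(a, b):
    """Emulate a packed add of four independent 8-bit lanes stored in
    one 32-bit word; each lane wraps modulo 256 and never carries into
    its neighbor, mimicking a per-byte SIMD video instruction."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane & 0xFF) << shift
    return result

# Four alignment-cell updates carried in a single 32-bit word:
packed = vadd4(0x01020304, 0x01010101)  # -> 0x02030405
```

In an alignment kernel, four small dynamic-programming cells can thus be updated with one instruction, which is where the reported speedup over scalar code comes from.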

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3279
Streaming:
Download:
 
Computing Protein Size Distributions Using Centrifugation Techniques and the Tesla K20 GPU
Robert Zigon (Beckman Coulter)
Analytical ultracentrifugation is a technique used to compute attributes of a protein such as gross shape, sample heterogeneity, and size. By applying a centrifugal force to the sample and simultaneously measuring the distribution, we can use first principles to derive relative molecule sizes. Learn how the solution to the resulting regularized least squares problem can be computed in real time with the Tesla K20.

  Back
 
Keywords:
Bioinformatics & Genomics, Quantum Chemistry, GTC 2013 - ID S3330
Streaming:
Download:
 
Implementing Modern Short Read DNA Alignment Algorithms in CUDA
Jonathan Cohen (NVIDIA)
Because of their inherently parallel and high-throughput nature, NVIDIA GPUs are a natural fit for the types of data-intensive computing required in bioinformatics applications. For many genomics applications, the primary challenge is to map highly divergent and control flow-heavy code to a SIMD architecture. By transforming complex serial flow of control into a sequence of communicating sequential processors running in parallel, we are able to achieve high throughput on very branchy code, while maintaining memory coherence and avoiding execution divergence. I will present initial results from NVIDIA's internal "nvbio" project to develop efficient computational building blocks for analysis of Next-Generation Sequencing data, with a focus on implementations of BWA and Bowtie2-type aligners.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3580
Streaming:
Download:
 
GPU Accelerated Signal Processing in Ion Proton Whole Genome Sequencer
Mohit Gupta (Life Technologies), Jakob Siegel (Life Technologies)
Learn how GPUs are enabling whole-genome sequencing by accelerating the primary data-analysis pipeline of the benchtop Ion Proton sequencer. We leverage the compute power of GPUs to process the high-throughput data generated by this sequencer in a fast, scalable and cost-effective desktop compute solution, democratizing DNA sequencing and accelerating the path towards personalized medicine. In this talk, we will discuss the implementation of data-fitting algorithms on the GPU and a streaming execution model that overlaps data transfer and kernel execution for this high-throughput system. We will also explain how we changed the algorithms to suit the GPU compute model while still maintaining the quality of the results.

  Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID S3229
Streaming:
Download:
 
Accelerating Computational Genomics and Other Best Practices Using OpenACC
Florent Lebeau (CAPS Entreprise), Stephane Chauveau (CAPS Entreprise)
 
Keywords:
Bioinformatics & Genomics, Developer - Programming Languages, GTC Webinars 2012 - ID GTCE016
Download:
 
Introduction to SeqAn, an Open-source C++ Template Library
Knut Reinert (Freie Universität Berlin)
SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA) and an FM-index, as well as algorithms for fast and accurate alignment and read mapping. Based on these data types and fast I/O routines, users can easily develop tools that are extremely efficient and easy to maintain. Besides multi-core support, the research team at Freie Universität Berlin has begun adding generic support for accelerators such as NVIDIA GPUs.

In this webinar, Knut Reinert, Professor at Freie Universität Berlin, will introduce SeqAn and string indices, explain his team's generic parallelization concept, and end with details on how the team achieved an up to 47x speedup using an FM-index on an NVIDIA Tesla K20.
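The FM-index query mentioned above is backward search over the Burrows-Wheeler transform (BWT). The following self-contained sketch (not SeqAn code; the naive `occ` scan stands in for the sampled rank structures a real index uses) shows the core idea on a tiny text:

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for tiny texts)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def count_occurrences(bw, pattern):
    """FM-index-style backward search: how often pattern occurs in the text."""
    # C[c]: number of characters in bw lexicographically smaller than c.
    alphabet = sorted(set(bw))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bw.count(c)
    occ = lambda c, i: bw[:i].count(c)  # naive rank; real indices sample this
    lo, hi = 0, len(bw)
    for c in reversed(pattern):          # extend the match one char at a time
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

bw = bwt("banana")                    # "annb$aa"
hits = count_occurrences(bw, "ana")   # "ana" occurs twice in "banana"
```

Each character of the pattern costs only two rank queries, which is why BWT indices dominate read mapping and why they parallelize well across many reads on a GPU.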

  Back
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2013 - ID GTCE059
Download:
 
Folding@home and OpenMM: Using a Cluster of 50,000 GPUs to Simulate Disease Relevant Protein Dynamics
Vijay Pande (Stanford University)
With the combined power of large-scale distributed computing resources such as Folding@home and supercomputers such as Blue Waters or Titan, one can now routinely simulate atomistic protein dynamics on the millisecond timescale. Join Professor Vijay Pande, Stanford University, as he presents efforts to push the limits of this methodology even further, to the seconds timescale for protein folding as well as to a variety of new applications in protein conformational change. The results of these simulations suggest novel targets for disease intervention (for Alzheimer's and cancer), as well as new biophysical insights into protein dynamics.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2014 - ID GTCE071
Streaming:
 
The Next Steps for Folding@home
Vijay Pande (Stanford University)
Folding@home is a large-scale volunteer distributed computing project, started on October 1, 2000. For over a decade, new types of hardware (such as GPUs, multi-core CPUs, and the PS3) and new algorithms have been pioneered to make significant advances in our ability to simulate diseases at the molecular scale. Join Professor Vijay Pande from Stanford University for a brief introduction to the goals of Folding@home, followed by its successes so far. Prof. Pande will end with a discussion of what's being done today, as well as plans for greatly enhancing what Folding@home can do through new initiatives currently under way.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2014 - ID GTCE082
Streaming:
 
Restricting the Seed-and-Extend Search Space in GPU-Based Short-Read Alignment
Richard Wilton (Johns Hopkins University)
Most research into the use of GPUs for biological sequence alignment has focused on the choice and implementation of appropriate parallel algorithms for sequence matching. This strategy has yielded a number of GPU-based implementations with speeds 5 to 10 times faster than CPU implementations with comparable sensitivity and mapping quality. We have taken a different approach to the use of GPUs by implementing a series of CUDA kernels that filter the set of reference locations at which to compute seed-and-extend alignments, thereby decreasing the amount of parallel sequence-matching computation and improving the overall throughput of the GPU/CPU pipeline. Even without extreme CUDA code optimization, we observe increased sensitivity (i.e., a larger number of reported valid mappings) with throughput as good as or better than existing GPU-based sequence aligners.  Back
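The seed-and-extend strategy underlying this work can be sketched compactly. The version below is an illustrative toy, not the authors' pipeline: an exact k-mer seed table proposes candidate reference locations, and extension keeps only locations within a mismatch budget; the filtering kernels in the talk prune this candidate set before the expensive extension step.

```python
def build_seed_index(reference, k):
    """Map every k-mer in the reference to its start positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index

def align_read(read, reference, index, k, max_mismatches):
    """Seed with the read's first k-mer, then extend each candidate
    location, keeping only hits within the mismatch budget."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTACGTTACGT"
index = build_seed_index(reference, 4)
hits = align_read("ACGTT", reference, index, k=4, max_mismatches=1)
```

Shrinking the candidate-location list is exactly what reduces the parallel sequence-matching work and raises overall GPU/CPU pipeline throughput.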
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4248
Streaming:
Download:
 
Parallel Implementation of PK-PD Parameter Estimation on GPU Using Grid Search Method
Nishant Agrawal (Tata Consultancy Services Limited), Rihab Abdulrazak (Tata Consultancy Services Limited)
The goal of this session is to showcase the performance improvements achieved in a parallel GPU implementation of PK-PD (pharmacokinetic-pharmacodynamic) parameter estimation. The grid search method is used here to estimate the initial parameters of the PK-PD model. Parallel implementation on GPUs provides much faster solutions to time-consuming problems in the pharma domain, as the discovery of new drugs has become increasingly challenging because of the sheer volume of data. Parallelizing the serial version of the application on the GPU with the device architecture in mind helps achieve high performance, i.e., reduces the overall execution time. This talk covers the stepwise approaches used to further optimize the application and to leverage the capabilities of the Tesla and Kepler hardware architectures. A substantial improvement in execution time was observed after the parallel implementation.  Back
 
Keywords:
Bioinformatics & Genomics, Clusters & GPU Management, Supercomputing & HPC, GTC 2014 - ID S4396
Streaming:
 
Hybrid Clustering Algorithms for Degenerate Primer Development on the GPU
Trevor Cickovski (Eckerd College)
Analyzing portions of a genome during Polymerase Chain Reaction (PCR) analysis requires construction of a primer sequence that is complementary to the flanking regions of a target sequence, producing multiple copies of that portion of the genome. When analyzing multiple related genomes the primer must be degenerate, containing an amount of uncertainty that we must minimize. We use graphics processing units (GPUs) to analyze the performance of a parallelized hierarchical clustering algorithm for grouping related genomes prior to degenerate primer construction, and also hybridize this algorithm with strategies from K-Means and Fuzzy C-Means. We demonstrate an order-of-magnitude improvement when running these algorithms on nearly one thousand sequences of more than seven thousand nucleotides from the human genome.  Back
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4424
Streaming:
Download:
 
GPU-Based Bayesian Phylogenetic Inference Beyond Extreme Scale
Mitchel Horton (Georgia Institute of Technology)
See how researchers can, for the first time, infer phylogenetic trees of unlimited size using the bonanza of biological sequence data available to them today. We will present a phylogenetic inference approach that combines an existing GPU-based Bayesian phylogenetic reconstruction application (BEAST/BEAGLE) with the notion of performing an independent Markov chain Monte Carlo (MCMC) run on any number of GPUs, on any number of nodes, of any size HPC GPU cluster. The approach will be shown to scale indefinitely for sufficiently large problems. In addition, we will present a new batch matrix-matrix product CUDA kernel used for the matrix exponentiation at the heart of the phylogenetic inference algorithm.  Back
 
Keywords:
Bioinformatics & Genomics, Numerical Algorithms & Libraries, Supercomputing & HPC, GTC 2014 - ID S4476
Streaming:
Download:
 
Training Random Forests on the GPU: Genomic Implications on HIV Susceptibility
Mark Seligman (Rapidics LLC)
The Random Forest (trademarked) algorithm is a powerful, versatile tool in machine learning. It consists of a training pass which builds a tree-based predictive model from a sample data set, followed by a tree-walking pass to generate predictions for new data. Recent efforts at acceleration have focused on the independence of both the construction, and walking, of distinct trees using, for example, multi-CPU and Hadoop-based approaches. Here, by contrast, we report progress in parallelizing the construction of individual trees themselves using the GPU. This enables the algorithm to treat very wide data sets, such as those common in genomic studies, in times significantly shorter than have been reported before now. This also makes practical iterative invocation and enables, for example, reweighted and variational applications of the algorithm. We demonstrate recent results on studies of HIV-susceptibility in subjects from Sub-Saharan Africa.   Back
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, Machine Learning & Deep Learning, Supercomputing & HPC, GTC 2014 - ID S4502
Streaming:
Download:
 
GPU Accelerated Genomics Data Compression
BingQiang Wang (BGI)
We review existing compression algorithms and the characteristics of genomics data formats. We then introduce a general GPU-accelerated compression framework featuring (1) adaptive tuning of the compression scheme, (2) optimized, GPU-accelerated compression algorithms, and (3) column-major storage. This approach fully exploits the similarity within individual columns of popular genomics data formats by applying an appropriate compression scheme (a combination of algorithms); the GPU is then employed to speed up compression and decompression, yielding several-fold faster bandwidth.  Back
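The column-major idea can be shown with a toy example. This sketch (an illustration of the storage layout, not BGI's framework; `zlib` stands in for the tuned per-column codecs) splits tab-separated records into columns and compresses each column separately, so that similar values line up for the codec:

```python
import zlib

def compress_columns(records):
    """Transpose tab-separated records into columns and compress each
    column on its own; values within a column tend to be similar, which
    helps the codec find redundancy."""
    columns = list(zip(*(r.split("\t") for r in records)))
    return [zlib.compress("\n".join(col).encode()) for col in columns]

def decompress_columns(blobs):
    """Invert compress_columns, reassembling the original records."""
    columns = [zlib.decompress(b).decode().split("\n") for b in blobs]
    return ["\t".join(fields) for fields in zip(*columns)]

records = [
    "chr1\t100\tACGT\tIIII",
    "chr1\t104\tACGA\tIIIH",
    "chr1\t108\tACGC\tIIII",
]
blobs = compress_columns(records)
restored = decompress_columns(blobs)  # round-trips to the original records
```

In the framework described above, each column would additionally get its own adaptively chosen algorithm, with the GPU running the compressors in parallel.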
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4526
Streaming:
Download:
 
GPU Enables Bivariate and Trivariate Routine Analysis of Case-Control GWAS
Adam Kowalczyk (National ICT Australia), Qiao Wang (National ICT Australia)
Genome-wide association studies (GWAS) examine millions of DNA loci in an attempt to associate DNA mutations with a given disease. Complex aetiologies of many common diseases are believed to involve combinations of different genes, requiring evaluation of trillions of (non-additive) combinations of loci. We have developed solutions using a single GPU to evaluate association of all bivariate features within minutes (available via a free web service). Although exhaustive trivariate analysis currently requires a GPU cluster, focused trivariate analysis can be accomplished routinely on a single GPU within hours.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4592
Streaming:
Download:
 
GPU-Accelerated Algorithms in Bioinformatics and Data Mining
Bertil Schmidt (Johannes Gutenberg University Mainz)
The development of scalable algorithms and tools is of high importance to bioinformatics and data mining. In this session, you will learn about the efficient use of CUDA to accelerate prominent algorithms in both areas. In particular, GPU acceleration of the following methods will be discussed: (1) the Smith-Waterman algorithm on Kepler (CUDASW++ 3.0) compared to an equivalent Xeon Phi implementation (SWAPHI); (2) short-read alignment (CUSHAW2-GPU and CUSHAW3); (3) clustering of protein structures; (4) alignment of time series with a Dynamic Time Warping-inspired similarity measure; and (5) an effective scalable clustering algorithm for large data sets that builds upon the concept of divide-and-conquer.  Back
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4603
Streaming:
 
Current Uses and Future Prospects of Many-Core GPUs for High-Throughput Sequencing Data Analyses
Brian Lam (Cambridge University)
High-throughput sequencing (HTS) instruments can produce enormous amounts of information about our genome in a short period of time and enable us to better understand the biology of our everyday lives. HTS also poses a substantial challenge to the IT infrastructure and human resources, where analyzing data from these instruments often involves the use of high-performance computing (HPC) clusters and expertise from interdisciplinary professionals, who are literate in both biology and computing, thus restricting the access of the technology to large and well-established laboratories only. Many-core architectures, which can be seen in many high-end computer graphics processing units, or GPUs, may provide us an answer to this challenge. Packed with thousands of cores on a physical chip, a GPU can be just as quick as a small HPC cluster in many cases. In this session, we will explore the use of GPUs in accelerating the data analysis pipeline associated with HTS and investigate its future in this area.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4623
Streaming:
Download:
 
BWT Indexing: Big Data from Next Generation Sequencing and GPU
Jeanno Cheung (HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory)
With the rapid improvement in DNA sequencing technologies, huge amounts of sequencing data can be produced in a time- and cost-efficient manner (e.g., it costs only a few thousand US dollars to produce 100 gigabases in a day). Compressed full-text indexing based on the BWT has been found to be very useful in speeding up the analysis of high-throughput sequencing data. In this talk we consider two major problems in this context, namely, alignment of sequencing data onto a reference genome (for detecting genetic variations), and indexing of sequencing data. These two problems have different applications and different technical challenges. We show how the GPU can be exploited to achieve tremendous improvement in each case. In particular, our alignment solution makes it feasible to conduct NGS analysis even in a time-critical clinical environment; for example, 30+ fold whole-genome sequencing data of a human (~100 gigabases) can be aligned and analyzed in a few hours, with sensitivity and accuracy even higher than before.  Back
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4628
Streaming:
Download:
 
Accelerating the DNA Sequencing Variant Calling Pipeline
Mauricio Carneiro (Broad Institute of MIT and Harvard)
Learn about the best-practice variant calling pipeline that drives every DNA sequencing project in the world, be it for research, industry, or diagnosing a patient in critical condition. Here we present the different approaches to optimize and accelerate key parts of this pipeline. First, we will give you an overview of the process and how researchers around the world are using DNA sequencing data to understand complex and rare variants and their associations with disease. Second, we will show you the work we have done to speed up this pipeline through the use of GPUs and other technologies. Third, we will discuss a new version of the pipeline that takes advantage of these optimizations to enable incremental analysis, that is, leveraging all historical data on every new sequencing project with minimal overhead. We close this presentation by discussing the many points that are still open for optimization and how the community can get involved.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4679
Streaming:
 
Introducing NVBIO: High Performance Primitives for Computational Genomics
Jonathan Cohen (NVIDIA), Nuno Subtil (NVIDIA)
Learn about NVIDIA's new open source CUDA/C++ library for high-performance computational genomics, NVBIO. NVBIO includes primitives for fast alignment using many variants of Smith-Waterman, text indexing via an FM-Index and related data structures, and approximate string matching with backtracking. It also provides basic services like file IO and inter-thread communication. The design of NVBIO supports pipeline parallelism, where computation is expressed as a sequence of stages with queues to communicate between stages. Using this design concept, we have engineered an implementation of the Bowtie2 aligner on top of NVBIO, which aligns short read data 2-7x faster than the original Bowtie2 running on a high-end multicore CPU at comparable quality. In this talk we will introduce the codebase and demonstrate how to use it for your own applications.  Back
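NVBIO's pipeline parallelism, computation as a sequence of stages communicating through queues, can be mimicked on the CPU with threads. The sketch below is a generic illustration of that pattern, not NVBIO's C++ API; the two toy "stages" merely stand in for real kernels such as seeding and extension.

```python
import queue
import threading

DONE = object()  # sentinel marking end-of-stream

def stage(worker, inbox, outbox):
    """Pull items from inbox, transform them, push results downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate shutdown to the next stage
            return
        outbox.put(worker(item))

# Two-stage pipeline wired together with queues.
q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(lambda r: r.lower(), q_in, q_mid)),
    threading.Thread(target=stage, args=(lambda r: r + "*", q_mid, q_out)),
]
for t in threads:
    t.start()
for read in ["ACGT", "TTAG"]:
    q_in.put(read)
q_in.put(DONE)

results = []
while True:
    item = q_out.get()
    if item is DONE:
        break
    results.append(item)
for t in threads:
    t.join()
```

Because each stage only touches its queues, stages can be developed, tested and load-balanced independently, the property that makes this design attractive for GPU pipelines.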
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4741
Streaming:
Download:
 
Solving Large Nonlinear Systems of ODEs With Hierarchical Structure Using Multi-GPGPUs and an Adaptive Runge-Kutta (ARK) Method
Ahmad Al-Omari (The University of Georgia)
The Adaptive Runge-Kutta (ARK) method on multiple general-purpose graphics processing units (GPGPUs) is used to solve large nonlinear systems of first-order ordinary differential equations, with over ~10,000 variables, describing a large genetic network for the biological clock in systems biology. To compute the trajectory of the system, a hierarchical structure of the ODEs is exploited, and an ARK solver is implemented in CUDA/C++ on GPGPUs (Kepler K20x). The result is a 75-fold speedup, relative to a comparable CPU architecture, for calculations of the 2436 independent modules within the genetic network describing clock function.  Back
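The essence of adaptive Runge-Kutta is estimating the local error and adjusting the step size from it. As a minimal sketch (step-doubling RK4 rather than the authors' solver, and scalar rather than a 10,000-variable system), this is the control loop:

```python
import math

def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(f, t0, t_end, y0, h=0.1, tol=1e-8):
    """Adaptive RK4 via step doubling: compare one full step against two
    half steps, accept when the difference is below tol, and rescale h
    from the error estimate either way."""
    t, y = t0, y0
    while t < t_end:
        h = min(h, t_end - t)
        full = rk4_step(f, t, y, h)
        half = rk4_step(f, t + h / 2, rk4_step(f, t, y, h / 2), h / 2)
        err = abs(full - half)
        if err <= tol:
            t, y = t + h, half  # accept the more accurate estimate
        h *= 0.9 * (tol / err) ** 0.2 if err > 0 else 2.0
    return y

# dy/dt = -y, y(0) = 1  ->  y(1) = exp(-1)
y1 = integrate(lambda t, y: -y, 0.0, 1.0, 1.0)
```

In the multi-GPU setting described above, the independent modules of the hierarchical network can each advance with their own adaptive steps, which is what makes the problem parallelize so well.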
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4108
Download:
 
Adaptive GenCodex : A Novel Algorithm for Compressing DNA Sequences on Multi-Cores and GPUs
Ajith Padyana (Sri Sathya Sai Institute of Higher Learning, India)
High-performance computing has become an enabling agent for many scientific domains. Sequence analysis is a core problem in bioinformatics; various strategies and implementations have been proposed to address it, but the amount of data that is generated and must be handled remains a big challenge. From the computing point of view, "storage" and "communication bandwidth" become critical issues, which can be captured together as the "I/O bottleneck". To overcome this we propose a compression technique for bio-sequences, thereby reducing the storage requirement and effectively increasing the communication bandwidth. Once this is done, we also need to look at the possibility of performing the essential analysis of bio-sequences in compressed form; unless this is achieved, the task remains half done. We present our work, which includes a compression strategy on GPUs.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4256
Download:
 
CUSHAW Software Package: Harnessing CUDA-Enabled GPUs for Next Generation Sequencing Read Alignment
Bertil Schmidt (University of Mainz, Germany)
We present CUSHAW2-GPU, which accelerates the CUSHAW2 algorithm using CUDA-enabled GPUs. By aligning both simulated and real reads to the human genome, our aligner yields comparable or better performance than BWA-SW, Bowtie2 and GEM. Furthermore, CUSHAW2-GPU with a Tesla K20c GPU achieves significant speedups over the multi-threaded CUSHAW2, BWA-SW, Bowtie2 and GEM running on the 12 cores of a high-end CPU, for both single-end and paired-end alignments. In addition, we present some features of CUSHAW3, an extension of CUSHAW2 that further improves the alignment quality of base-space reads and offers new support for color-space reads. For color-space alignment, CUSHAW3 is consistently one of the best aligners compared to SHRiMP2 and BFAST.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4117
Download:
 
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU
Jing Zhang (Virginia Tech)
BLAST, short for Basic Local Alignment Search Tool, is a fundamental algorithm in the life sciences that searches for similarities between a short query sequence and a large set of database sequences. However, with the advent of next-generation sequencing (NGS), the exponential growth of sequence databases is arguably outstripping our ability to analyze the data. Previous studies accelerating BLAST on the GPU used coarse-grained parallel approaches, which are not well adapted to the GPU architecture and cannot fully utilize the GPU's massively parallel computational capability. We propose a faster GPU-BLAST that maps the most time-consuming phases to the GPU using a fine-grained multithreaded approach.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4214
Download:
 
Modeling the Molecular Basis of Cardiac Arrhythmia
Mohsin Jafri (George Mason University)
Heart disease is the leading cause of death in the developed world and an increasing problem in developing nations. Heart failure accounts for a majority of those affected by heart disease, with a high likelihood of death, and death most often results from cardiac arrhythmia. However, the mechanisms behind the initiation of these fatal arrhythmias are as yet unknown. Using a multi-scale GPU-enabled simulation, we show how stochastic molecular events can trigger a cardiac arrhythmia, using a hierarchy of cellular and tissue models to describe the individual proteins, the heart muscle cell, and the geometry of the heart.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4259
Download:
 
Reverse Engineering of Genome-Scale Biological Networks
Raghuram Thiagarajan (University of Michigan)
The availability of genome-scale data sets in biology presents a great opportunity as well as a challenge for computational biologists. Simulation and model-based analysis of such large-scale dynamical systems poses compute-intensive problems. A reverse-en ...Read More
The availability of genome-scale data sets in biology presents a great opportunity as well as a challenge for computational biologists. Simulation and model-based analysis of such large-scale dynamical systems poses compute-intensive problems. A reverse-engineering algorithm optimized for parallel architectures has been developed to study these dynamical systems. The parallel architecture and processing power of graphics processing units (GPUs) provide a platform to carry out genome-scale simulations. We show that genome-scale networks can be inferred using this reverse-engineering algorithm in a matter of days on a single Tesla K20 GPU.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4220
Download:
 
Applying GPU Dynamic Parallelism to High-Performance Normalization of Gene Expressions
Roberto Pinto Souto (National Laboratory for Scientific Computing, LNCC/Brazil)
This work presents methods for computing a quantile normalization (Q-norm) of high-density oligonucleotide array data on GPUs. Our approach focuses on CUDA 5.5, which allows for exploiting dynamic parallelism, and also takes advantage of the expressi ...Read More
This work presents methods for computing a quantile normalization (Q-norm) of high-density oligonucleotide array data on GPUs. Our approach focuses on CUDA 5.5, which allows for exploiting dynamic parallelism, and also takes advantage of the expressive processing power offered by the GPU Kepler architecture. We believe that our contribution represents a step toward providing computational support for a generic Q-norm for large microarray data sets, as well as a low-cost, high-performance alternative to high-end workstation systems.  Back
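For reference, quantile normalization forces every sample (column) onto a common distribution: sort each column, average the values at each rank across columns, then write those averages back according to each entry's original rank. A serial sketch, assuming no tied values for simplicity (our illustration, not the authors' CUDA kernels):

```python
def quantile_normalize(columns):
    """Quantile normalization (Q-norm): each column (sample) is
    mapped onto the common distribution given by the mean, across
    samples, of the values at each rank. Ties are handled naively
    by sort order, which is enough for a sketch."""
    n = len(columns[0])
    # Mean of the k-th smallest value across all columns.
    sorted_cols = [sorted(c) for c in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns)
                  for k in range(n)]
    out = []
    for c in columns:
        # Rank of each entry within its column (argsort of argsort).
        order = sorted(range(n), key=lambda i: c[i])
        ranks = [0] * n
        for r, i in enumerate(order):
            ranks[i] = r
        out.append([rank_means[ranks[i]] for i in range(n)])
    return out
```

For two columns [2, 1, 3] and [20, 10, 30], both are mapped to [11.0, 5.5, 16.5]. The per-column sorts and rank lookups are independent, which is what makes the method a good fit for GPU parallelism.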
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4209
Download:
 
GPMoo: Genomic Selection Related Analyses
Scott Winkleblack (California Polytechnic State University - San Luis Obispo)
We explore the use of GPGPU processing to decrease the runtime of genetic selection algorithms. We present a tool, GenSel, which can be used to efficiently infer the effects of genetic markers on a desired trait or to determine the genomic estimated b ...Read More
We explore the use of GPGPU processing to decrease the runtime of genetic selection algorithms. We present a tool, GenSel, which can be used to efficiently infer the effects of genetic markers on a desired trait or to determine the genomic estimated breeding values (GEBV) of genotyped individuals. GenSel performs Bayesian inference using Gibbs sampling, a Markov chain Monte Carlo (MCMC) algorithm. Parallelizing this algorithm proves to be technically challenging because there is a loop-carried dependence between iterations of the Markov chain.  Back
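The loop-carried dependence is visible in even a toy Gibbs sampler: each conditional draw uses the most recent value of the other variable, so iterations of the chain cannot simply be distributed across threads. A minimal illustration for a standard bivariate normal (GenSel's actual model, Bayesian marker regression, is not reproduced here):

```python
import random

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation
    rho. Each update draws from a full conditional that uses the most
    recent value of the other variable -- the loop-carried dependence
    that prevents parallelizing across MCMC iterations."""
    rng = random.Random(seed)
    sd = (1 - rho * rho) ** 0.5
    x = y = 0.0
    xs = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)   # x | y ~ N(rho*y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)   # y | x ~ N(rho*x, 1 - rho^2)
        xs.append(x)
    return xs

samples = gibbs_bivariate_normal(rho=0.8, n_iter=5000)
```

Parallelism therefore has to come from inside each iteration (for example, the per-marker likelihood computations) or from running many independent chains, which is the usual GPU strategy.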
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4244
Download:
 
Accelerating Identification of Frequent K-Mers in DNA Sequences With GPU
Shuji Suzuki (Tokyo Institute of Technology)
Identifying the frequencies of k-mers (substrings of length k) in strings is an important sub-problem in many bioinformatics applications. In this research, we propose a new k-mer counting algorithm suitable for GPU computation, based on a sorting algo ...Read More
Identifying the frequencies of k-mers (substrings of length k) in strings is an important sub-problem in many bioinformatics applications. In this research, we propose a new k-mer counting algorithm suitable for GPU computation, based on a sorting algorithm. We implemented the algorithm in a CPU-GPU heterogeneous environment using CUDA 5.0 and evaluated it on real G. gallus genome sequence data. As a result, our algorithm with 12 CPU cores and 2 GPUs was 2.4 times faster than the Turtle software on 12 CPU cores.  Back
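The sort-based idea can be sketched in a few lines: pack each k-mer into an integer (2 bits per base), sort the array, and run-length count equal neighbors. This plain-Python version only illustrates the structure; in the authors' setting the sort would be a parallel GPU sort such as radix sort:

```python
def count_kmers_by_sorting(seq, k):
    """Sort-based k-mer counting: encode every k-mer as an integer,
    sort, then run-length count runs of equal values."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    kmers = []
    for i in range(len(seq) - k + 1):
        v = 0
        for ch in seq[i:i + k]:
            v = v * 4 + code[ch]   # 2 bits per base
        kmers.append(v)
    kmers.sort()

    def decode(v):
        s = ""
        for _ in range(k):
            s = "ACGT"[v % 4] + s
            v //= 4
        return s

    # Run-length count over the sorted encoding.
    counts = {}
    i = 0
    while i < len(kmers):
        j = i
        while j < len(kmers) and kmers[j] == kmers[i]:
            j += 1
        counts[decode(kmers[i])] = j - i
        i = j
    return counts
```

count_kmers_by_sorting("ACGTACGT", 3) gives {'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1}. Both the encoding loop and the run-length pass over a sorted array are data-parallel, which is why this formulation suits GPUs better than a hash-table counter.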
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4190
Download:
 
Parallel Brain Network Analysis Platform
Xiaoming Chen (Tsinghua University, Beijing, China)
In this poster, we introduce a hybrid CPU-GPU platform to accelerate the computation of the human brain connectome. The two main steps of the platform are network construction from non-invasive neuroimaging data and network analysis. Using this platfor ...Read More
In this poster, we introduce a hybrid CPU-GPU platform to accelerate the computation of the human brain connectome. The two main steps of the platform are network construction from non-invasive neuroimaging data and network analysis. Using this platform, you can calculate the correlation matrix, the small-world properties (clustering coefficient and characteristic path length), the modular structure, and the betweenness centrality. You can also perform probabilistic fiber tracking with this tool.  Back
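Among the measures listed, the clustering coefficient has a particularly compact definition: for each node, the fraction of its neighbor pairs that are themselves connected. A serial sketch over an adjacency-set graph (our illustration; the platform computes this on networks derived from the correlation matrix, and on the GPU):

```python
def clustering_coefficients(adj):
    """Per-node clustering coefficient of an undirected graph given
    as a dict of adjacency sets. Node labels are assumed orderable
    (e.g. integers) so each neighbor pair is counted once."""
    cc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[v] = 0.0          # fewer than one neighbor pair
            continue
        # Count edges among the neighbors of v.
        links = sum(1 for a in nbrs for b in nbrs
                    if a < b and b in adj[a])
        cc[v] = 2.0 * links / (k * (k - 1))
    return cc
```

The per-node computations are independent, so one thread (or thread block) per node is the natural GPU mapping.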
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4128
Download:
 
HOOMD-blue 1.0: Easy-to-use and Highly Scalable Molecular Dynamics on GPUs
From the GPU-equipped desktop computer to a supercomputer, learn how to accelerate your MD simulations with HOOMD-blue. Complex fluids, polymers, and nano-particles are only some of the possibilities. In this webinar, Joshua A. Anderson, Senio ...Read More
From the GPU-equipped desktop computer to a supercomputer, learn how to accelerate your MD simulations with HOOMD-blue. Complex fluids, polymers, and nano-particles are only some of the possibilities. In this webinar, Joshua A. Anderson, Senior Research Area Specialist, and Jens Glaser, Research Fellow, both at the University of Michigan, will show how you can combine any of the versatile features of HOOMD-blue to meet your research needs, and how you can easily exploit the highly flexible Python script interface in your workflow. The most important new feature in HOOMD-blue v1.0 is its multi-GPU capability, which scales HOOMD-blue’s remarkable single-GPU performance to clusters and supercomputers with many GPUs. We will demonstrate how HOOMD-blue scales on the latest generation of high-performance computing systems, and give practical tips for obtaining optimal performance.
 
  Back
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2014 - ID GTCE099
Streaming:
Download:
Climate, Weather, Ocean Modeling
Presentation
Media
Large-Scale CFD and a Full GPU Implementation of Weather Prediction Code on the TSUBAME Supercomputer
Takayuki Aoki
- Global Scientific Information and Computing Center (GSIC) of Tokyo Institute of Technology (Tokyo Tech)
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2010 - ID SC1014
Download:
 
GPU Considerations for Next Generation Weather Simulations
Thomas Schulthess
- Swiss National Supercomputing Centre
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2010 - ID SC1010
Download:
 
Tsubame 2.0: 2 Petaflops Performance of a GPU Stencil Application
Takayuki Aoki
- Tokyo Institute of Technology
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2011 - ID SC138
Download:
 
Successes and Challenges using GPUs for Weather and Climate Models
Mark Govett
- National Oceanic and Atmospheric Administration
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2011 - ID SC134
Download:
 
GPU-based Operational Weather Model with Horizontal 500m Resolution
Takayuki Aoki
Numerical weather prediction is one of the major applications in high performance computing and demands fast and high-precision simulation over fine-grained grids. In order to drastically shorten the runtime of a weather prediction code, we have ...Read More

Numerical weather prediction is one of the major applications in high performance computing and demands fast and high-precision simulation over fine-grained grids. In order to drastically shorten the runtime of a weather prediction code, we have rewritten the entire code from scratch in CUDA for GPU computing. The code, ASUCA, is a high-resolution meso-scale atmosphere model being developed by the Japan Meteorological Agency (JMA) for its next-generation weather forecasting service. A benchmark on 3996 GPUs of TSUBAME 2.0 achieves 145 Tflops in single precision on a 14368 × 14284 × 48 mesh. With the initial data and the boundary conditions currently used in the JMA weather forecast, we have carried out a run with a 500 m horizontal mesh (4792 × 4696 × 48), covering the whole of Japan, on 437 GPUs.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1173
Streaming:
 
Unified Modeling System for Seamless Weather and Climate Predictions of Monsoons
Subodh Kumar
We present our ongoing work to design and develop a GPU-based unified modeling system for seamless weather and climate predictions of monsoons. The system design is capable of handling the different time and spatial scales of atmospheric ...Read More

We present our ongoing work to design and develop a GPU-based unified modeling system for seamless weather and climate predictions of monsoons. The system design is capable of handling the different time and spatial scales of atmospheric phenomena that are crucial for accurate forecasting of weather and regional climates, and of monsoons in particular. Our focus is on a high-resolution model utilizing accurate approximations on the icosahedral-hexagonal grid. We also develop parameterizations of fine- and multi-scale moist convective processes, cloud microphysics and precipitation, radiative transfer, hydrology and land surface processes, and atmospheric and oceanic turbulence. Starting with the core of the LMDZ model, we are developing from scratch a parallel version appropriate for efficient computation on GPUs and CPUs. Another goal of our system design is to relieve the programmer of low-level programming details through a programming model that automatically distributes computation among all available CPUs and GPUs. We are developing a programming API to unify parallel code development on CPUs and GPUs.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1174
Streaming:
 
GPU Considerations for Next Generation Weather and Climate Simulations
Thomas Schulthess
Numerical weather prediction is among the oldest fields of computational science, and existed before the advent of electronic computing. Thanks to the performance of modern computers, the fidelity of weather simulations has reached a point where ...Read More

Numerical weather prediction is among the oldest fields of computational science, and existed before the advent of electronic computing. Thanks to the performance of modern computers, the fidelity of weather simulations has reached a point where they are indispensable in weather forecasting, and they have thus become one of the economically most impactful domains of computational science. Typically, the dynamical cores of weather simulation models are grid-based and memory-bandwidth bound, and thus perform poorly on modern x86-type processors. In this presentation, we will discuss a refactoring project of the COSMO code, which implements a regional climate model used by several weather services and academic institutions worldwide. The dynamical core has been rewritten and is easily portable to multiple architectures, including GPUs. The physics part of the code is being ported to the GPU with OpenACC directives. Preliminary performance results for production-scale problems will be presented. Other contributors to this research include Oliver Fuhrer, Swiss Federal Office of Meteorology and Climatology MeteoSwiss; Tobias Gysi and David Müller, Supercomputing Systems AG; Xavier Lapillonne, Center for Climate Systems Modeling, ETH Zurich; and William Sawyer, Ugo Varetto, and Mauro Bianco, Swiss National Supercomputing Center.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1175
Streaming:
 
GRAPES Weather Code Porting to and Optimization on the GPU Platform
Bin Zhou
In this session, we will discuss the GRAPES weather model and the basic techniques for porting it to the GPU platform, covering everything from the first steps to low-level optimization. The MPI+CUDA pattern will be discussed. Four different ...Read More

In this session, we will discuss the GRAPES weather model and the basic techniques for porting it to the GPU platform, covering everything from the first steps to low-level optimization. The MPI+CUDA pattern will be discussed. Four different modules, including GCR, Radiation, WSM6 and PBL, will be demonstrated. Performance considerations will be discussed and results shown. It is a good example of a real-life scientific application porting procedure.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1176
Streaming:
 
GPU Computing in Numerical Space Weather Modeling
Xueshang Feng
Space weather refers to conditions on the sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of space-borne and ground-based technological systems and that affect human life ...Read More

Space weather refers to conditions on the sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of space-borne and ground-based technological systems and that affect human life or health. Space weather has two focal points: scientific research and applications. In order to make real-time or faster-than-real-time numerical predictions of adverse space weather events and their influence on the geospace environment, high-performance computational models are required. The main objective of this talk is to show how programmable GPUs can be used in numerical space weather modeling and its visualization. As an example study, GPU programming is applied to our Solar-Interplanetary-CESE MHD model (SIP-CESE MHD model) and the visualization of its numerical results in a study of the solar corona. Our initial tests with available hardware show speedups of roughly 10x compared to the traditional software implementation. This work presents a novel application of GPUs to space weather studies.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1177
Streaming:
 
Real Time GPU-Based Marine Scenes Simulation
Jerome Graindorge (ALYOTECH), Julien Houssay (ALYOTECH)
Marine survey, carried out by sea or by air, is of major concern for current defense and security applications. Essential surveillance/ observation/ identification systems involve electro-optics (visible and infra-red) and radar. Optimizing thei ...Read More

Marine survey, carried out by sea or by air, is of major concern for current defense and security applications. Essential surveillance/observation/identification systems involve electro-optics (visible and infrared) and radar. Optimizing their performance requires large amounts of expensive observational data spanning the wide variability of the marine environment. Computer simulation provides a valuable, flexible, and inexpensive alternative. Since 2007, ALYOTECH, in partnership with IFREMER (the French Research Institute for Exploration of the Sea), has been developing a GPU-based real-time ocean scene simulator for visible, infrared and radar sensors, in order to meet the challenging requirements arising from marine survey issues.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID S2053
Streaming:
Download:
 
A Stencil Library for the New Dynamic Core of COSMO
Tobias Gysi (SCS), Peter Messmer (NVIDIA)
We will present a stencil library used in the heart of the COSMO numerical weather prediction model. During the talk we'll show how we implemented an abstraction that allows easy development of new stencils and solvers on top of a framework al ...Read More

We will present a stencil library used in the heart of the COSMO numerical weather prediction model. During the talk we'll show how we implemented an abstraction that allows easy development of new stencils and solvers on top of a framework allowing execution on both CPU and GPU. The library makes efficient use of GPU resources, and we will show how to structure memory accesses and computation optimally. Developers involved in porting or writing fully-featured C++ libraries for CUDA will also be interested in attending.

  Back
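The operation such a library abstracts is the application of a fixed neighbor pattern at every interior grid point, so that one stencil definition can be lowered to either CPU loops or CUDA kernels. A 5-point example in plain Python (a sketch of the concept, not the library's C++ API):

```python
def apply_5point(grid, coef_c, coef_n):
    """Apply a 5-point stencil (center plus the four axis neighbors)
    to the interior of a 2D grid; boundary cells are copied unchanged.
    Every interior point is independent, so a GPU back-end can assign
    one thread per (j, i)."""
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = (coef_c * grid[j][i]
                         + coef_n * (grid[j - 1][i] + grid[j + 1][i]
                                     + grid[j][i - 1] + grid[j][i + 1]))
    return out
```

On a 3x3 grid of ones with coef_c=1.0 and coef_n=0.25, the single interior point becomes 2.0 while the boundary stays at 1.0. A stencil library lets the user write only the inner expression; the loop nest, halo handling, and memory layout are supplied by the framework for each target architecture.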
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID S2256
Streaming:
Download:
 
CUDA/JAVA Model for Gas Line-by-Line Absorption of Atmospheric Radiation
William Godoy (NASA Langley Research Center)
The potential of graphics processing units (GPUs) to speed up the calculation of radiative energy absorption by atmospheric gases is presented. Gas absorption calculations are needed at millions of electromagnetic wavelengths to obtain an accurate depiction o ...Read More
The potential of graphics processing units (GPUs) to speed up the calculation of radiative energy absorption by atmospheric gases is presented. Gas absorption calculations are needed at millions of electromagnetic wavelengths to obtain an accurate depiction of the Earth's incoming and outgoing radiative energies. The CUDA/GPU portion obtains the gases' Voigt lineshapes, whereas the Java/CPU portion performs efficient I/O tasks on the large HITRAN database of molecular gas parameters. A modular combination of the lower-level CUDA algorithms and the higher-level Java language results in an interface accessible to end users who are not GPU experts.  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID P2485
Download:
 
Heat Transfer Ray Tracing with OptiX
Scot Halverson (University of Minnesota Duluth)
QUIC Radiant is part of a suite of GPU-assisted tools developed by our research group that aim to increase knowledge of how environment and urban form interact. Our hypothesis is that urban structures exist that can minimize energy use while also mi ...Read More
QUIC Radiant is part of a suite of GPU-assisted tools developed by our research group that aim to increase knowledge of how environment and urban form interact. Our hypothesis is that urban structures exist that can minimize energy use while also minimizing air pollution exposure. Our efforts investigate the complex interactions of various types of urban structures by developing design strategies for optimizing urban form under a variety of constraints.  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID P2495
Download:
 
Running the FIM and NIM Weather Models on GPUs
Mark Govett (NOAA Earth System Research Laboratory)
Two U.S. global-scale weather models, developed at NOAA, are running on GPUs. The FIM runs at 15 km resolution and is expected to be run by the U.S. National Weather Service in the next year. The NIM is a next-generation forecast model designed ...Read More

Two U.S. global-scale weather models, developed at NOAA, are running on GPUs. The FIM runs at 15 km resolution and is expected to be run by the U.S. National Weather Service in the next year. The NIM is a next-generation forecast model designed to run at 4 km resolution. This presentation will give an update on our efforts to parallelize and run these models on GPUs.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2012 - ID SC2018
Download:
 
Hybrid CPU-GPU Solutions for Weather and Cloud Resolving Climate Simulations
Thomas Schulthess (Swiss National Supercomputing Center)
Reliable weather prediction for the Alpine region and cloud resolving climate modeling require simulations that run at 1-2 km resolution. Additionally, since the largest possible ensembles are needed, high fidelity models have to run on the most ...Read More

Reliable weather prediction for the Alpine region and cloud-resolving climate modeling require simulations that run at 1-2 km resolution. Additionally, since the largest possible ensembles are needed, high-fidelity models have to run on the most economical resource within a given time to solution. In this presentation we will give an update on the refactoring of COSMO, a production code widely used in academia as well as at seven European weather services, and discuss performance experience on hybrid CPU-GPU systems.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2012 - ID SC2036
Download:
 
Analysis of GPU-acceleration for a Climate Modeling Application
Mohamed Wahib (Advanced Institute for Computational Science, RIKEN, JAPAN)
The use of GPUs to accelerate applications has always been a nontrivial task. SCALE3, a climate simulation model developed at AICS, RIKEN in Japan, is no exception. The features of SCALE required careful adjustments when it was ported to the Fermi architect ...Read More
The use of GPUs to accelerate applications has always been a nontrivial task. SCALE3, a climate simulation model developed at AICS, RIKEN in Japan, is no exception. The features of SCALE required careful adjustments when it was ported to the Fermi architecture. Moreover, porting SCALE to the Kepler architecture brought new adjustments driven by the features of Kepler. This poster discusses the challenges of porting SCALE3 to NVIDIA's Fermi and Kepler architectures. The changes in design choices when moving from Fermi to Kepler are also highlighted. The results show that utilizing the architectural features of Kepler can be highly effective when the nature of the application is carefully taken into account.  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID P3145
Download:
 
Modeling Vegetative Heat Transfer in Urban Environments with OptiX
Matthew Overby (University of Minnesota Duluth)
Our research group is developing QUIC Energy, a software tool that models radiative heat transfer in three dimensional urban environments. We hypothesize that trees, vegetative roofing, and other green infrastructure have the potential to reduce heat ...Read More
Our research group is developing QUIC Energy, a software tool that models radiative heat transfer in three dimensional urban environments. We hypothesize that trees, vegetative roofing, and other green infrastructure have the potential to reduce heat load in urban environments and lower power consumption required for heating and cooling buildings. Additionally, certain building materials, shapes, and urban layouts can mitigate trapped heat and air pollutants. By taking advantage of parallel computation on the GPU using NVIDIA's OptiX ray tracing engine, we are able to model urban domains upwards of five square kilometers, containing thousands of trees and buildings.  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID P3178
Download:
 
Hybrid Fortran - New Directive Based GPGPU / CPU Framework
Michel Muller (RIKEN)
One of the biggest challenges when applying GPGPU frameworks to complex codebases is the necessity for redesigning loops and data access patterns. Experiences with the physical core of the ASUCA weather prediction model have shown that using pure CUD ...Read More
One of the biggest challenges when applying GPGPU frameworks to complex codebases is the necessity for redesigning loops and data access patterns. Experiences with the physical core of the ASUCA weather prediction model have shown that using pure CUDA Fortran or OpenACC leads to a lengthy manual redesign and large execution-time overheads when the new code is executed back on the CPU. The Hybrid Fortran meta-programming framework has been designed to (a) automate this process and (b) run the user code in a CPU-optimized loop structure as well, thus enabling optimal performance on both GPU and CPU. Results from using it for the ASUCA physical core show high GPU performance, CPU performance on par with the original x86-optimized code, and reduced porting overhead.  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Developer - Tools & Libraries, GTC 2013 - ID P3199
Download:
 
Global High Resolution Estimation of Evapotranspiration - SEBS on GPU using CUDA-C
Mohammad Abouali (Computational Science Research Center - San Diego State University)
This poster introduces a new implementation of the Surface Energy Balance System (SEBS) algorithm harnessing the many cores available on graphics processing units (GPUs). It uses the Compute Unified Device Architecture C (CUDA-C) programming model. The o ...Read More
This poster introduces a new implementation of the Surface Energy Balance System (SEBS) algorithm harnessing the many cores available on graphics processing units (GPUs). It uses the Compute Unified Device Architecture C (CUDA-C) programming model. The output of the new implementation is compared to a MATLAB code that has already been fully tested in the Water Cycle Multimission Observation Strategy (WACMOS) project. The code is timed against both MATLAB and a purely high-performance C implementation of the same algorithm. The code has been tested on several different NVIDIA cards with different compute capabilities.  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID P3225
Download:
 
Numerical Ocean Modeling and Simulation with CUDA
Chris Lupo (California Polytechnic State University)
Learn how GPGPUs have been applied to the Regional Ocean Modeling System (ROMS) software package. We describe a multi-model parallelization approach that uses CUDA Fortran and the OpenACC directive-based model supported by PGI compilers. Initial ...Read More

Learn how GPGPUs have been applied to the Regional Ocean Modeling System (ROMS) software package. We describe a multi-model parallelization approach that uses CUDA Fortran and the OpenACC directive-based model supported by PGI compilers. Initial research using only CUDA Fortran on one Tesla card offers performance comparable to a 16-node CPU cluster, and a 2.5x speedup compared to an OpenMP implementation on an eight-core CPU system. We are currently targeting multiple GPU devices, and using OpenACC to parallelize more of the ROMS software, to obtain even greater performance and allow larger, higher-resolution ocean models to be simulated.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID S3082
Streaming:
Download:
 
Accelerating Shallow Water Flow and Mass Transport Using Lattice Boltzmann Methods on GPUs
Kevin Tubbs (Dell, Inc.)
A lattice Boltzmann method (LBM) for solving the shallow water equations and the advection-dispersion equation is developed and implemented on graphics processing unit (GPU)-based architectures. The proposed LBM is implemented on NVIDIA computing p ...Read More

A lattice Boltzmann method (LBM) for solving the shallow water equations and the advection-dispersion equation is developed and implemented on graphics processing unit (GPU)-based architectures. The proposed LBM is implemented on NVIDIA computing processors, with GPU computing performed using the Jacket GPU engine for MATLAB and ArrayFire. Mass transport with velocity-dependent dispersion in shallow water flow is simulated by combining the MRT-LBM and TRT-LBM models. This talk will demonstrate GPU parallel performance for modeling mass transport phenomena in shallow water flows.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Developer - Algorithms, GTC 2013 - ID S3324
Streaming:
Download:
 
Porting Marine Ecosystem Model Spin-up Using Transport Matrices to GPUs
Jaroslaw Piwonski (Institute for Computer Science and Kiel Marine Science, Centre for Interdisciplinary Marine Science, Christian-Albrechts Universitaet zu Kiel)
This session shows the necessary steps of porting an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory archi ...Read More

This session shows the necessary steps for porting an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library, which is based on the Message Passing Interface (MPI) standard. The programming languages used are C and Fortran. Special emphasis lies on using biogeochemical models written in Fortran without any modification of the original code. The port uses the GPU Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc, and a commercial CUDA Fortran compiler.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing & HPC, GTC 2013 - ID S3385
Streaming:
Download:
 
Towards GPU-accelerated Operational Weather Forecasting
Oliver Fuhrer (MeteoSwiss)
A full GPU implementation of the COSMO numerical weather prediction and regional climate model will be presented. Design criteria such as high performance, retaining maintainability, and enforcing a single source code that still compiles and runs ...Read More

A full GPU implementation of the COSMO numerical weather prediction and regional climate model will be presented. Design criteria such as high performance, retaining maintainability, and enforcing a single source code that still compiles and runs on x86-based systems led us to opt for different approaches in different parts of the model code. Performance-critical parts are implemented employing a stencil library built on top of a domain-specific embedded language (DSEL) with a CUDA back-end. Other parts were ported by restructuring the legacy Fortran code and inserting OpenACC compiler directives. The session will also highlight the integration of these different technologies.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID S3417
Streaming:
Download:
 
Running the FIM and NIM Weather Models on GPUs
Mark Govett (NOAA Earth System Research Laboratory)
Two U.S. global-scale weather models, developed at NOAA, are running on GPUs. The FIM runs at 15 km resolution and is expected to be run by the U.S. National Weather Service in the next year. The NIM is a next-generation forecast model designed ...Read More

Two U.S. global-scale weather models, developed at NOAA, are running on GPUs. The FIM runs at 15 km resolution and is expected to be run by the U.S. National Weather Service in the next year. The NIM is a next-generation forecast model designed to run at 4 km resolution. This presentation will give an update on our efforts to parallelize and run these models on GPUs.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing & HPC, GTC 2013 - ID SC2018
Streaming:
Download:
 
Accelerating NEMO with OpenACC
Maxim Milakov (NVIDIA)
Learn how OpenACC can be used to accelerate a challenging application with a large amount of code and a flat profile. NEMO is an ocean modeling code consisting of tens of thousands of lines of code and hundreds of subroutines. NEMO also has a flat ...Read More

Learn how OpenACC can be used to accelerate a challenging application with a large amount of code and a flat profile. NEMO is an ocean modeling code consisting of tens of thousands of lines of code and hundreds of subroutines. NEMO also has a flat execution profile, making it a challenge to expose opportunities for parallel acceleration. Using OpenACC directives, we show how the time-stepping loop can be migrated to the GPU to achieve substantial performance improvements on multiple problems at small and large scales.

  Back
 
Keywords:
Climate, Weather, Ocean Modeling, Developer - Programming Languages, GTC 2013 - ID S3209
Streaming:
Download:
 
Hybrid Fortran 90: High Performance, Low Friction GPGPU for Weather Prediction
Michel Muller (RIKEN)
One of the biggest challenges when applying GPGPU frameworks to complex codebases is the need to redesign loops and data access patterns. Experience with the physical core of the ASUCA weather prediction model has shown that using pure CUDA Fortran or OpenACC leads to a lengthy manual redesign and large execution-time overheads when running the new code back on the CPU. The Hybrid Fortran 90 meta-programming framework has been designed to (a) automate this process and (b) run the user code in a CPU-optimized loop structure as well, enabling optimal performance on both GPU and CPU. Results from applying it to the ASUCA physical core show high GPU performance, CPU performance on par with the original x86-optimized code, and reduced porting overhead. In this session, learn what's behind Hybrid Fortran 90 and how to use it.
 
Keywords:
Climate, Weather, Ocean Modeling, Developer - Programming Languages, GTC 2013 - ID S3326
Streaming:
Download:
 
Development, Parallelization and Performance of the NIM Next-Generation Weather Model on Titan
Mark Govett (NOAA)
The Non-hydrostatic Icosahedral Model (NIM) is a next-generation global weather model being developed at NOAA to improve 0-100 day weather predictions. Since development began in 2008, the model has been designed to run on highly parallel computer architectures such as GPUs. GPU parallelization has relied on the directive-based Fortran-to-CUDA ACCelerator (F2C-ACC) compiler developed at NOAA. Recent work has focused on parallelization of the model physics, evaluation of the OpenACC compilers, and preparation of the model to run at the full 3.5 km resolution on 5000 nodes of Titan. This talk will report on the development of the NIM model, describe our efforts to improve parallel performance on Titan, and report on our experiences using the OpenACC compilers.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2014 - ID S4157
Streaming:
 
QUIC EnvSim: Radiative Heat Transfer in Vegetative and Urban Environments with NVIDIA OptiX
Matthew Overby (University of Minnesota Duluth)
This session presents QUIC EnvSim, a scientific tool for modeling the complex interactions between the environment and urban form. The talk will focus on the simulation of radiative heat transfer in urban environments with vegetation (such as trees, parks, or green rooftops) using the GPU-accelerated NVIDIA OptiX ray tracing engine. Attend this session to learn how we utilize OptiX to efficiently and accurately simulate radiative transport in urban domains. Topics include: (1) the physical properties of surfaces and vegetation and how they interact with longwave and shortwave radiation; (2) efficient and scalable discretization of large urban domains; (3) strategies we employed for overcoming challenges such as atomic operations, multiple GPUs, and more; and (4) results that illustrate the validity, efficiency, and scalability of the system.
 
Keywords:
Climate, Weather, Ocean Modeling, Rendering & Ray Tracing, GTC 2014 - ID S4312
Streaming:
Download:
 
ASUCA on GPU: Uncompromising Hybrid Port for Physical Core of Japanese Weather Model
Michel Muller (RIKEN Advanced Institute for Computational Science)
ASUCA is the next-generation non-hydrostatic Japanese mesoscale weather prediction model, currently developed at the Japan Meteorological Agency. Following the successful GPU port of its dynamical core by Shimokawabe et al., the physical core has now been fully ported as well. To achieve a unified codebase with high usability as well as high performance on both GPU and CPU, a new directive-based open-source language extension called 'Hybrid Fortran' has been used (as introduced at GTC 2013). Using a Python-based preprocessor, it automatically creates CUDA Fortran code for the GPU and OpenMP Fortran code for the CPU, with two separate horizontal loop orders in order to preserve performance. Attendees of this session will learn how to create a hybrid codebase with high usability and high performance on both CPU and GPU, how we used a preprocessor to achieve our goals, and how to use macros for memory optimizations while following the DRY principle.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2014 - ID S4352
Streaming:
Download:
 
Weather Prediction Code Written with a High-Productivity Framework for Multi-GPU Computing
Takashi Shimokawabe (Tokyo Institute of Technology)
Numerical weather prediction is one of the major applications in high-performance computing and is being accelerated on GPU supercomputers. Obtaining good parallel efficiency on more than a thousand GPUs often requires skillful programming, for example, both MPI for inter-node communication and NVIDIA GPUDirect for intra-node communication. The Japan Meteorological Agency is developing ASUCA, a next-generation high-resolution mesoscale weather prediction code. We are implementing it on a multi-GPU platform using a high-productivity framework for mesh-based applications. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU code. The framework can also hide the complicated implementation of the efficient communications described above. In this presentation, we will show the implementation of the weather prediction code using this framework and its performance evaluation on the TSUBAME 2.5 supercomputer at Tokyo Institute of Technology.
 
Keywords:
Climate, Weather, Ocean Modeling, Computational Fluid Dynamics, Supercomputing & HPC, GTC 2014 - ID S4565
Streaming:
 
Developing a System For Real-Time Numerical Simulation During Physical Experiments in a Wave Propagation Laboratory
Darren Schmidt (National Instruments)
ETH Zurich is proposing a new concept for wave propagation laboratories in which the physical experiment is linked with a numerical simulation in real time. Adding live experimental data to a larger numerical simulation domain creates a virtual lab environment never before realized, enabling the study of frequencies inherent in important seismological and acoustic real-world scenarios. The resulting environment is made possible by a real-time computing system under development. This system must perform computations typically reserved for traditional (offline) HPC applications but produce results in a matter of microseconds. To do so, National Instruments is using the LabVIEW platform to combine NI's fastest data acquisition and FPGA hardware with NVIDIA's most powerful GPU processors to build a real-time heterogeneous simulator.
 
Keywords:
Climate, Weather, Ocean Modeling, Big Data Analytics, Numerical Algorithms & Libraries, Signal & Audio Processing, GTC 2014 - ID S4682
Streaming:
Download:
 
Delivering Performance in Scientific Simulations: Present and Future Role of GPUs in Supercomputing
Thomas Schulthess (ETH Zurich / CSCS)
GPU-based supercomputers are the most energy-efficient and among the most powerful computing systems in use today. We show, with examples from computational physics and climate simulations, how this performance is delivered today to solve real-world problems. You will see how application software has been structured to port seamlessly across hardware platforms, which aspects of current hybrid CPU-GPU platforms matter, and how such architectures should best develop so that applications continue to benefit from exponential performance increases in the future.
 
Keywords:
Climate, Weather, Ocean Modeling, Numerical Algorithms & Libraries, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4719
Streaming:
Download:
 
GPU Parallelization of Geostatistical Simulation for Mineral Reserves Quantification
Daniel Baeza (ALGES laboratory)
Geostatistical techniques are widely used for the spatial characterization of phenomena in the Earth sciences. Many of the estimation and simulation techniques proposed decades ago are still in use; however, little effort has been made to rethink their programming structure to take advantage of the languages and hardware available today. This poster shows a parallel implementation of the Turning Bands Method for conditional simulation of random fields using graphics processing units (GPUs).
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2014 - ID P4248
Download:
 
GPU Accelerated Numerical Methods for Tsunami Modeling
Rajesh Gandham (Rice University)
We present an efficient GPU-accelerated numerical method for modeling tsunami wave propagation. We use the two-dimensional shallow water equations to model tsunami waves and a high-order accurate discontinuous Galerkin method for the numerical solution of the model. We describe the inherently fine-grained parallel nature of our algorithms and their implementation on GPUs and CPUs using OCCA, a portable threading language. Kernels written in OCCA are cross-compiled with CUDA, OpenCL, or OpenMP at runtime, enabling portability of the code among several hardware architectures. We compare the performance of these kernels across different threading languages on GPUs and CPUs.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2014 - ID P4135
Download:
Cloud Visualization
Presentation
Media
MatCloud: Accelerating Matrix Math GPU Operations with SaaS
Frank Mueller, Xing Wu
We present MatCloud (www.mat-cloud.com), a cloud infrastructure and service for scientific computing using state-of-the-art GPU clusters. MatCloud is a service infrastructure exposed through a simple web terminal interface that runs Matlab-like commands and scripts. Join us to see how GPU technology can not only be applied to the cloud computing community but also boost the adoption of cloud computing through its dramatic performance gains over traditional cloud infrastructures. MatCloud is an in-progress academic project and is under active development.
 
Keywords:
Cloud Visualization, Developer - Tools & Libraries, GTC 2010 - ID S1020260
Streaming:
Download:
 
Microsoft RemoteFX - GPU Virtualization for Desktop Centralization
Tad Brockway
- Microsoft
Learn about Microsoft's upcoming GPU virtualization feature, RemoteFX, which will ship in Windows Server 2008 R2 SP1. Microsoft RemoteFX enables GPUs to be hosted in the datacenter as a service that can be shared by multiple users for streaming the real-time, complete Windows 7 desktop experience to ultra-lightweight client devices anywhere on the corporate network. With Microsoft RemoteFX, users will be able to work remotely in a Windows Aero desktop environment, watch full-motion video, enjoy Silverlight animations, and run 3D applications, all with the fidelity of local-like performance.
 
Keywords:
Cloud Visualization, Computer Graphics, GTC 2010 - ID S102243
Streaming:
Download:
 
GPU Cloud Computing Case Study in Civil Engineering using RealityServer
Tamrat Belayneh, Paul Arden
- mental images
 
Keywords:
Cloud Visualization, Supercomputing 2010 - ID SC1008
Download:
 
GPU Cloud Computing 101: Getting Started
Dale Southard
- NVIDIA
 
Keywords:
Cloud Visualization, Supercomputing 2010 - ID SC1007
Download:
 
Studiopass: Cloud Based Media Collaboration Using Tegra Portable Devices
Kevin Jackson
- ViewPartners Limited
 
Keywords:
Cloud Visualization, Embedded & Automotive, SIGGRAPH 2011 - ID SIG1123
Download:
 
GPGPU Computing with Amazon EC2
Deepak Singh
- Amazon
 
Keywords:
Cloud Visualization, Supercomputing 2011 - ID SC116
Download:
 
Graphics in the Cloud - How NVIDIA is Enabling Cloud Visualization
Will Wade (NVIDIA)
Engineers, artists, scientists, and gamers are the most demanding visual thinkers on the planet, and as such have not been willing to move their computing environments to the infamous "cloud". These remotely accessed systems are seen as slow and not up to the visual experience that users expect from these types of applications. NVIDIA aims to change that perception with the NVIDIA Virtual Graphics Platform. In this session you will hear about the technologies behind accelerating graphics in the cloud and some of the industry partnerships that are enabling it.
 
Keywords:
Cloud Visualization, GTC 2012 - ID S2254
Streaming:
Download:
 
Scalable GPU Computing Service Architecture
Henrik Hoj Madsen (LEGO), Michael Scholer (LEGO)
In this session we describe our GPU-accelerated computing service, which supports several internal business processes in a large-scale company setup. The service supports diverse computational needs such as on-demand rendering, mesh optimization, a Massively Multiplayer Online Game (MMO), product visualizations, and other demanding computational tasks. We present the architectural considerations for a service-oriented computational framework and the practical learnings and opportunities encountered while developing an enterprise system using NVIDIA technologies such as CUDA, OptiX, OpenGL, and OpenCL. Our aim is to share knowledge and present LEGO's vision for a GPU-accelerated computational platform as a business-driven technology.
 
Keywords:
Cloud Visualization, GTC 2012 - ID S2261
Streaming:
Download:
 
Delivering 3D Professional Graphics from the Cloud with Citrix XenDesktop
Derek Thorslund (Citrix Systems, Inc.)
Recent technological advances have made it practical to deliver 3D professional graphics applications from the cloud (private or public) with a high-quality user experience and at an attractive cost. Organizations can keep their intellectual property safe in the data center, since only fully rendered screen images are sent over the network. Users in remote locations no longer have to wait for large file transfers, and they can access 3D models from a wide variety of devices, including iPads and Android tablets. Learn how Citrix XenDesktop, XenServer, and Receiver technologies have made all of this a reality for many organizations today.
 
Keywords:
Cloud Visualization, GTC 2012 - ID S2413
Streaming:
Download:
 
Accelerating Cloud Graphics
Franck Diard (NVIDIA)
A new NVIDIA SDK provides access to a set of key components that allow optimal capture, compression, streaming, and low-latency display of high-performance games from the cloud. We demonstrate how all these components fit together to deliver the ultimate cloud gaming experience for the customer, and also how they help optimize the metrics that matter to cloud gaming companies.
 
Keywords:
Cloud Visualization, GTC 2012 - ID S2627
Streaming:
Download:
 
Interactive Preclinical Analytics via GPU Cloud Platform (Presented by Penguin Computing)
Matt Jacobs (Penguin Computing), David Weinstein (Numira Biosciences)
David Weinstein, CTO of Numira Biosciences, and Matt Jacobs, SVP of Corporate Development for Penguin Computing, will discuss how Penguin's on-demand GPU compute environment (POD) and Numira's specialized medical imaging services have been forged into a single, service-based offering for the pharmaceutical and bioinformatics markets. Attendees will learn more about the nature of GPU-based cloud resources and the benefits and challenges associated with bringing a commercial medical imaging service to market on such a platform.
 
Keywords:
Cloud Visualization, GTC 2012 - ID S2639
Streaming:
Download:
 
Accelerating Simulation and Analysis with Hybrid GPU Parallelization and Cloud Computing
Devin Jensen (Altair Engineering)
An innovative hybrid parallelization using multiple GPUs and MPI dramatically reduces solution time for structural analysis and sensitivity calculations. Offloading the intensive matrix computation to the GPU and using heterogeneous computing improves performance. Users also benefit from accelerated access to compute resources "in the cloud."

*Note: This session was not recorded.

 
Keywords:
Cloud Visualization, SIGGRAPH 2012 - ID SIG1225
Download:
 
Graphics in the Cloud: How NVIDIA is Enabling Cloud Visualization
Will Wade (NVIDIA)
Learn about NVIDIA Maximus 2 and the latest NVIDIA Kepler GPU architecture, which are enabling design and creative professionals to realize groundbreaking performance and productivity benefits through a complete transformation of their traditional workflows.
 
Keywords:
Cloud Visualization, SIGGRAPH 2012 - ID SIG1223
Download:
 
Accelerating Compute-Intensive Processing with Hybrid GPU Parallelization and Cloud Computing
Ravi Kunju (Altair Engineering)
In this presentation, Altair will discuss how innovative hybrid parallelization using multiple GPUs and MPI dramatically reduces runtime for certain classes of compute-intensive workloads. Offloading intensive computations to the GPU and using heterogeneous computing with optimized workload management improves performance; users also benefit from simplified, accelerated access to compute resources via cloud portals.
 
Keywords:
Cloud Visualization, Supercomputing & HPC, Supercomputing 2012 - ID SC2026
Download:
 
Visualization as a Service Performed by GPGPU Platforms Based on NVIDIA® Tesla
Sergio Augusto Gelvez Cortes (Universidad Industrial de Santander)
Scientific research demands visualisation to attain better knowledge of the phenomena studied, be they natural or technological in nature. HPC resources are essential for scientific computing, but they are not readily available at all sites. An important part of those resources, and an area of interest, is the visualisation of large scientific images; a solution for academic institutions is a display cluster made from off-the-shelf parts. Given the restrictions mentioned, it is important to share those resources when available. Thus, we propose visualisation as a service for scientific computing needs using GUANE-1, a hybrid supercomputing platform based on NVIDIA® Tesla GPUs.
 
Keywords:
Cloud Visualization, Clusters & GPU Management, GTC 2013 - ID P3253
Download:
 
Using Tesla GPUs, Reality Server and Penguin Computing's Cloud for Visualizing Product Customizations (Presented by Penguin Computing)
Arend Dittmer (Penguin Computing)
Penguin Computing's public HPC cloud, Penguin Computing on Demand (POD), provides compute power for HPC applications through NVIDIA Tesla GPUs. To make it easy to leverage NVIDIA Tesla GPU resources for rendering tasks, Penguin is hosting migenius' RealityServer. The RealityServer platform is 3D web services software that leverages NVIDIA Tesla GPUs to deliver interactive, photorealistic applications over the web, enabling product designers, architects, and consumers to easily visualise 3D scenes with remarkable realism. The session will discuss the workflow for using RealityServer on POD. Using Fluid Inc's Configure offering as an example, the session will illustrate how retailers can leverage POD, NVIDIA Tesla GPUs, and the RealityServer platform for scalable, fast-to-market, and easy-to-manage product customizations.
 
Keywords:
Cloud Visualization, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3552
Streaming:
Download:
 
Power Plays in the Digital Era
Gil Rosen (T-Labs, Deutsche Telekom)
"Power Plays in the Digital Era" is a thought-leadership presentation that aims to provide strategic clarity in an era of chaos. In this session, the real strategy of leading market players will be exposed and analyzed, providing insight into how the market is likely to develop. This information can be key for decision makers who need to place their bets today and make decisions that will affect their companies' futures.
 
Keywords:
Cloud Visualization, Media & Entertainment, Mobile Summit, Remote Graphics & Cloud-Based Graphics, GTC 2013 - ID S3593
Streaming:
Download:
 
State-of-the-Art Virtualized Graphics
Will Wade (NVIDIA), Ian Williams (NVIDIA)
As businesses look to move PCs to the cloud, users increasingly demand a better experience and support for all of their devices. NVIDIA GRID enables state-of-the-art graphics in a virtualized environment. This session will show how NVIDIA GRID enables graphics-intensive applications to run interactively in the cloud and display on any device, from phones to tablets to laptops. It also covers the technology behind GPUs in virtual environments and the graphics architecture seen by applications.
 
Keywords:
Cloud Visualization, SIGGRAPH 2013 - ID SIG1312A
Streaming:
Download:
 
Delivering 3D Graphics from the Private or Public Cloud with XenDesktop and GRID
Derek Thorslund (Citrix)
Recent technological advances have made it practical to deliver 3D professional graphics applications from the cloud (private or public) with a high-quality user experience and at an attractive cost. Organizations can keep their intellectual property safe in the data center, since only fully rendered screen images are sent over the network. Users in remote locations no longer have to wait for large file transfers and can access 3D models from a wide variety of devices, including iPads, Android tablets, and thin clients.

Join Derek Thorslund, Director of Product Management, Citrix, to learn how Citrix XenDesktop, XenServer, and Receiver technologies leverage NVIDIA GRID to make all of this a reality for many organizations today.
 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE042
Streaming:
Download:
 
Virtualizing Tough 3D Workloads with VMware Horizon View and NVIDIA Technologies
Mike Coleman (VMware)
Join Mike Coleman, Sr. Product Manager, User Experience at VMware, to understand how virtualized 3D graphics can benefit your entire user base, from knowledge workers to high-end design engineers.

NVIDIA and VMware have built a platform that allows the toughest workloads to be virtualized, while improving reliability and security. From software-based GPUs to shared graphics to dedicated virtual workstations, there is an option for every use case and budget.

Specific topics include:

• What is VMware Horizon View and what benefits does it bring to the desktop?
• How are 3D graphics implemented in VMware Horizon View, including the latest joint announcements from VMware and NVIDIA?
• What use cases can be addressed with virtualized 3D graphics?
• Which customers are using the technology today?

 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE051
Streaming:
Download:
 
NVIDIA GRID VCA: A Turnkey Appliance Delivering Remote Graphics for Design and Engineering Applications
Ankit Patel (NVIDIA)
Imagine giving your Adobe, Autodesk, or SolidWorks designers and engineers the power of a workstation delivered over your network. NVIDIA's GRID Visual Computing Appliance (VCA) is a turnkey solution that delivers remote graphics for up to 8 concurrent workstation users. In this webinar, Ankit Patel, Sr. Product Manager, will show how GRID VCA allows you to optimize your design and engineering teams, giving them the performance they need while giving you the security and manageability you require.
 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE039
Streaming:
Download:
 
GPU Accelerated XenDesktop for Designers and Engineers
Thomas Poppelgaard (Poppelgaard.com)
If you've ever wanted to virtualize your CAD or professional video graphics application and have the exact same local experience on a secure central platform, this webinar provides insight on how to get there. Join technology evangelist Thomas Poppelgaard and learn how Citrix XenDesktop®, XenApp®, and XenServer®, in combination with NVIDIA GRID, make it possible to virtualize 2D/3D applications from any device, anywhere, while keeping your data and intellectual property safe and secure.
 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE037
Streaming:
Download:
 
How NVIDIA GRID Brings Amazing Graphics to the Virtualized Experience
Will Wade (NVIDIA)
You can't buy a phone, computer, tablet, PC, or workstation today without a GPU. Why would you expect a server without graphics to successfully serve the same users? As enterprises look to move PCs to the data center, users are requiring the modern PC experience they have come to expect from their desktop. Users are not willing to go back to Windows 95, and now they don't have to. NVIDIA GRID for enterprise enables IT managers to deliver an experience equal to a local PC with all the promised benefits of a virtual desktop environment. In this webinar presented by Will Wade, Director of GRID Products, NVIDIA, you'll learn how GRID is being enabled in the most common hypervisors and about the technology behind GPUs in virtual environments.
 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE034
Streaming:
Download:
 
Revolutionize Virtual Desktops with the One Missing Piece: A Scalable GPU
Will Wade (NVIDIA)
With all PCs, tablets, phones, and even modern cars running a graphical user interface, how can we expect a virtual desktop without a graphics accelerator to compete in the minds of users? Well, now we don't have to. Just as virtualization enables sharing of other system resources, NVIDIA's new GRID vGPU technology now enables virtualized graphics acceleration. This technology allows the GRID GPU to scale across the spectrum of users in your company, giving them the experience they've come to expect from a modern desktop. Join Will Wade, Director of GRID Products, NVIDIA, as he discusses the details of the architecture and how to successfully deploy usable virtual desktops across your organization.
 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE060
Download:
 
Getting the Most out of NVIDIA GRID vGPU with Citrix XenServer
Steve Harpster (NVIDIA)

Join Steve Harpster, Solution Architect, NVIDIA, for this technical webinar and learn how to set up GRID vGPU with Citrix XenServer and Citrix XenDesktop 7.1 Tech Preview. You'll also discover how to optimize the virtual machines to get the best performance for your demanding 3D workloads, and have your questions answered by Citrix and NVIDIA experts.

  Back
 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE061
Download:
 
NVIDIA GRID VCA for SolidWorks Users
Ankit Patel (NVIDIA)

The NVIDIA GRID™ Visual Computing Appliance (VCA) is the only platform certified and supported by Dassault Systèmes to virtualize and remotely deliver SolidWorks 2014 over the network. VCA is a powerful GPU-based appliance that can be centrally located and accessed via the company network. GPU acceleration gives users working locally or remotely the same SolidWorks experience they would get from a dedicated high-performance desk-side workstation. It’s a powerful tool for small and medium-size businesses looking to provide their workforce with workstation performance anywhere, anytime - without the IT complexity of commercial virtualization solutions. Join Ankit Patel, Sr. Product Manager, NVIDIA, to learn more about GRID VCA and its benefits for the SolidWorks community.

  Back
 
Keywords:
Cloud Visualization, GTC Webinars 2013 - ID GTCE070
Download:
 
Jared Cowart (NVIDIA)

Join Jared Cowart, NVIDIA Solution Architect, for this technical webinar and learn how to set up an NVIDIA GRID™ vGPU (virtual GPU) with Citrix XenServer and Citrix XenDesktop 7.1. You'll also discover how to optimize your virtual machines to get the best performance for your demanding 3D workloads. Plus, get insight into what to consider when planning for scalability and density. Key takeaways from the webinar include:
- How to demo, pilot, and deploy GPU-accelerated virtual desktops and applications
- Tips and tricks for an amazing HDX 3D Pro demo
- Planning and scaling guidance
- How to equip your demo, lab, or hosting platform with NVIDIA graphics

  Back
 
Keywords:
Cloud Visualization, Remote Graphics & Cloud-Based Graphics, GTC Webinars 2014 - ID GTCE077
Streaming:
Download:
Clusters & GPU Management
Presentation
Media
A Large Scale Simulation of Lattice QCD with a GPU Cluster
Ting-Wai Chiu

Quantum Chromodynamics (QCD) is the quantum field theory of the strong interaction, describing the interactions of the quarks and gluons making up hadrons (e.g., proton, neutron, and pion). Most importantly, it accounts for the nuclear energy inside an atom and plays an important role in the evolution of the early universe. To solve QCD is a grand challenge among all sciences. Today the most promising approach to solving QCD nonperturbatively is to discretize the continuum space-time into a 4-dimensional lattice (i.e., lattice QCD) and to compute physical observables with Monte Carlo simulation. For lattice QCD with exact chiral symmetry, it often requires supercomputers (e.g., 10 racks of IBM BlueGene) to perform the simulations. The TWQCD Collaboration in Taiwan is the first lattice QCD group in the world to use a GPU cluster (with 120 GPUs) to perform large-scale unquenched simulations of lattice QCD with the optimal domain-wall fermions, attaining 14 Teraflops (sustained) at a price of $200,000. This has significant impact on lattice QCD, as well as on the physics of the strong interaction.

  Back
 
Keywords:
Clusters & GPU Management, High Performance Computing, GTC 2009 - ID S09461
Streaming:
Download:
 
Best Practices for Architecting and Managing High-Performance GPU Clusters
Dale Southard (NVIDIA)

An overview of designing, deploying, and managing GPU clusters for HPC. Learn to build and operate top500-class GPU computing resources that provide users with the latest CUDA features.

  Back
 
Keywords:
Clusters & GPU Management, GTC 2012 - ID S2119
Streaming:
Download:
 
Tesla Cluster Monitoring & Management APIs
Robert Alexander (NVIDIA)

Learn more about cluster management and monitoring of Tesla and Quadro products. This includes a detailed description of the NVIDIA Management Library (NVML) and user-facing third-party software. Additionally, a brief summary of our out-of-band capabilities will be provided.

  Back
 
Keywords:
Clusters & GPU Management, GTC 2012 - ID S2238
Streaming:
Download:
 
Dynamically Allocating GPGPU to Host Nodes (Servers)
Saeed Iqbal (Dell)

Learn how to remotely change the mapping of GPUs to hosts based on application needs. The audience will then be presented with example scripts and a demo illustrating how this can be implemented to improve system resource utilization.
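The remapping the session demonstrates can be reduced to a simple bookkeeping operation, sketched below. The host and GPU names are invented for illustration; a real deployment would drive the change through the chassis or fabric management interface rather than mutating a dictionary.

```python
# Hypothetical sketch of remapping GPUs to host nodes based on
# application need. Names are illustrative, not from a real system.

def move_gpu(mapping, gpu, src, dst):
    """Reassign one GPU from host `src` to host `dst` in the mapping."""
    mapping[src].remove(gpu)
    mapping[dst].append(gpu)
    return mapping

# Two hosts share a four-GPU chassis; host2 picks up a GPU for a big job.
mapping = {"host1": ["gpu0", "gpu1", "gpu2"], "host2": ["gpu3"]}
move_gpu(mapping, "gpu2", "host1", "host2")
print(mapping)  # → {'host1': ['gpu0', 'gpu1'], 'host2': ['gpu3', 'gpu2']}
```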

  Back
 
Keywords:
Clusters & GPU Management, GTC 2012 - ID S2309
Streaming:
Download:
 
Tesla Cluster Monitoring and Management APIs
Robert Alexander (NVIDIA)

Learn more about cluster management and monitoring of NVIDIA GPUs. This includes a detailed description of the NVIDIA Management Library (NVML) and user-facing third party software. Additionally, the nvidia-healthmon GPU health check tool will be covered.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing 2012 - ID SC135
 
Fine-Grained Cycle Sharing of Idle GPUs for Homology Search
Fumihiko Ino (Osaka University)
The goal of this session is to introduce a fine-grained cycle sharing (FGCS) system for homology search. A critical issue for GPU-enabled FGCS systems is to prevent the significant system slowdown that occurs during simultaneous execution of multiple kernels. There are already many screensaver-based cycle sharing systems, but they all exploit relatively long idle periods, on the order of minutes. In this session, we show how short idle periods, on the order of seconds, can be exploited without dropping the frame rate significantly. We also show several experimental results obtained in our laboratory, where host systems are ordinarily operated interactively.  Back
 
Keywords:
Clusters & GPU Management, Bioinformatics & Genomics, GTC 2013 - ID P3119
Download:
 
Multi-level Parallelization of Computations using Clusters with GPUs
Pawel Czarnul (Gdansk University of Technology)
The poster presents an approach for multi-level parallelization of computations among clusters that are equipped with both multicore CPUs and modern GPUs. A multi-level and modular scheme is presented that allows the programmer to define an application as a workflow using ready-to-use elements and constructs. The programmer just needs to code partitioners, mergers, and computational kernels, and provide input data. The application is then parallelized automatically among available CPUs and GPUs, possibly on various clusters. The results for a compute-intensive application show promising scalability.  Back
 
Keywords:
Clusters & GPU Management, Developer - Programming Languages, GTC 2013 - ID P3140
Download:
 
Node-Level Runtime System to Support Multi-tenancy in Clusters with GPUs
Michela Becchi (University of Missouri)
GPUs are increasingly becoming part of HPC clusters. As a result, widely used open-source cluster resource managers (e.g. SLURM and TORQUE) have recently been extended with GPU support capabilities. These systems, however, provide simple scheduling mechanisms that often result in resource underutilization and, thereby, in suboptimal performance. We propose a runtime system that provides abstraction and sharing of GPUs, while allowing isolation of concurrent applications. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies, and allows dynamic binding of applications to GPUs.   Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2013 - ID P3247
Download:
 
Using GPI for Low-Latency Data Transfer on a GPU Cluster
Lena Oden (Fraunhofer ITWM)
The Global address space Programming Interface (GPI) is an industry-quality programming interface for PGAS. As it has proven its superiority over MPI in many industrial and scientific applications in the CPU domain, we extend this concept to the GPU domain to provide a highly scalable, low-latency communication interface. GPI for GPUs supports the new GPUDirect RDMA technology, which allows direct data transfer between device memories. We improve the latency from 35us with MPI to 3.3us with GPI for GPUs. The bandwidth could be improved from 2.2 GB/s with MPI to up to 2.9 GB/s with GPI.  Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2013 - ID P3249
Download:
 
Pedraforca: A First ARM + GPU Cluster for HPC
Alex Ramirez (Barcelona Supercomputing Center)

The HPC community is always on the lookout for increased performance and energy efficiency. Recently, this led to a growing interest in GPU computing and in clusters built from low-power, energy-efficient parts from the embedded and mobile markets. See a first proof of concept for a hybrid compute platform that brings together an ARM multicore CPU for energy efficiency and a discrete GPU accelerator that provides the compute performance. This talk presents the architecture of the system, the system software stack, and preliminary performance and power measurements, and concludes with guidelines for future ARM+GPU platforms.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2013 - ID S3064
Streaming:
Download:
 
Acceptance Testing a GPU Based Cluster
Craig Idler (Los Alamos National Laboratory), Phil Romero (Los Alamos National Laboratory), Laura Monroe (Los Alamos National Laboratory)

Hear supercomputer acceptance testers explain how to test your new cluster so you obtain a highly performing, well-balanced system by identifying weak nodes. We will describe our experiences in testing supercomputing clusters equipped with GPUs, how they differ from CPU-only clusters, how to find tests that can discriminate performance levels, and how to segregate weak performers. We will discuss the wide variety of tests utilized and identify those most useful in determining and segregating weak-performing nodes and components. Also discussed will be experiences in tuning High Performance Linpack to obtain maximum performance.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2013 - ID S3248
Streaming:
Download:
 
Introduction to Deploying, Managing, and Using GPU Clusters
Dale Southard (NVIDIA)

Introduction to deploying, managing, and using GPU clusters. Talk will cover a combination of "lessons learned" and "new features" that are of interest to sites deploying GPU clusters for high-performance computing.

  Back
 
Keywords:
Clusters & GPU Management, GTC 2013 - ID S3249
Streaming:
Download:
 
Building Your Own GPU Research Cluster Using Open Source Software Stack
Pradeep Kumar Gupta (NVIDIA)

An overview of designing, deploying, and managing small research prototype GPU clusters for HPC. This talk will describe all the building components for a cluster and the complete software stack to run and manage it. The emphasis is on building a research prototype GPU cluster using all open-source software and minimal hardware. Learn to build and operate basic GPU computing resources that provide end users with the latest CUDA features.

  Back
 
Keywords:
Clusters & GPU Management, Developer - Tools & Libraries, GTC 2013 - ID S3516
Streaming:
Download:
 
Accelerate GPU Innovation with HP Gen8 Servers (Presented by HP)
Marc Hamilton (HP Enterprise Group), Dick Bland (Hewlett-Packard Co.), Jean-Luc Assor (Hewlett-Packard Co.)

Come to this session to learn about the latest innovations for GPU computing and visualization from HP. The new ProLiant Gen8 SL servers and workstation blades will be featured for solutions like accelerated HPC applications, cloud visualization, and virtualized desktops. Real-world customer use cases from the manufacturing/engineering and Oil&Gas segments will be highlighted. You will also learn everything you need to get started with GPU clusters in a single, easy-to-use HP GPU cluster starter kit.

  Back
 
Keywords:
Clusters & GPU Management, Cloud Visualization, GTC 2013 - ID S3536
Streaming:
Download:
 
System Design of Kepler Based HPC Solutions (Presented by Dell Inc.)
Saeed Iqbal (Dell Inc.)

The latest Kepler-based GPUs are very powerful parallel processors, capable of providing a quantum leap in performance across the broad HPC application spectrum. However, to fully realize these gains it is important to design balanced systems, and we will discuss different system-level considerations for various use cases. We will utilize HPL to analyze performance and power consumption at a system level, as well as compare to previous-generation GPUs where applicable to highlight the improvements. The goal is to provide the audience with information and best practices for designing GPU-enabled systems using Kepler GPUs, considering parameters such as power consumption, system size, and system-level features.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2013 - ID S3556A
Streaming:
Download:
 
System Design of Kepler Based HPC Solutions (Presented by Dell Inc.)
Saeed Iqbal (Dell Inc.)

The latest Kepler-based GPUs are very powerful parallel processors, capable of providing a quantum leap in performance across the broad HPC application spectrum. However, to fully realize these gains it is important to design balanced systems, and we will discuss different system-level considerations for various use cases. We will utilize HPL to analyze performance and power consumption at a system level, as well as compare to previous-generation GPUs where applicable to highlight the improvements. The goal is to provide the audience with information and best practices for designing GPU-enabled systems using Kepler GPUs, considering parameters such as power consumption, system size, and system-level features.

  Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2013 - ID S3556B
Streaming:
Download:
 
Optimizing GPU Utilization and Application Throughput in HPC Clusters (Presented by IBM)
Chris Porter (Platform Computing, an IBM Company)

Achieving real-world performance is about much more than just the raw performance of the underlying hardware. Much as a highly efficient power plant connected to a distribution network losing 70% of its power in transmission makes little sense, the same applies to HPC clusters: efficiency matters. While many factors impact efficiency, this session focuses on the critical role of scheduling and workload management in getting the most out of your GPU cluster. By "working smarter" and enabling GPU clusters with dramatically higher utilization and throughput, organizations can not only achieve savings in infrastructure and management costs, they can boost productivity as well.

  Back
 
Keywords:
Clusters & GPU Management, GTC 2013 - ID S3578
Streaming:
Download:
 
CUDA in the Cloud: Enabling HPC Workloads in OpenStack
John Paul Walters (University of Southern California Information Sciences Institute)

Learn how to deploy heterogeneous, GPU-enabled private clouds through OpenStack. In this session we describe the latest HPC features for the OpenStack cloud computing platform. These features target the OpenStack Grizzly release, the successor to OpenStack Folsom and include heterogeneity-aware scheduling, bare-metal provisioning for non-virtualizable architectures, and multi-hypervisor GPU/CUDA support based on LXC and Xen. A particular focus of this work is to enable high performance signal and image processing in the cloud. Performance results will be shown through a series of examples, demonstrating the impact of Xen vs. LXC on GPU performance for both regular and irregular computations. The session will conclude with a discussion of the next steps in HPC OpenStack development.
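Heterogeneity-aware scheduling, as described above, comes down to matching an instance's resource request against per-host capabilities. The following toy filter is in the spirit of OpenStack's filter scheduler; the host table and request keys are invented for illustration, not Grizzly's actual API.

```python
# Toy heterogeneity-aware host filter, in the spirit of OpenStack's
# filter scheduler. Host capabilities below are invented for illustration.

HOSTS = {
    "node1": {"hypervisor": "xen", "gpus": 2},
    "node2": {"hypervisor": "lxc", "gpus": 0},
    "node3": {"hypervisor": "lxc", "gpus": 4},
}

def schedule(request, hosts=HOSTS):
    """Return the hosts satisfying every constraint in the request."""
    return sorted(
        name for name, caps in hosts.items()
        if caps["gpus"] >= request.get("gpus", 0)
        and request.get("hypervisor", caps["hypervisor"]) == caps["hypervisor"]
    )

# A CUDA workload that needs an LXC host with at least one GPU:
print(schedule({"gpus": 1, "hypervisor": "lxc"}))  # → ['node3']
```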

  Back
 
Keywords:
Clusters & GPU Management, Cloud Computing, Desktop & Application Virtualization, Signal & Audio Processing, GTC 2013 - ID S3214
Streaming:
Download:
 
Best Practices for Deploying and Managing GPU Clusters
Dale Southard (NVIDIA)
 
Keywords:
Clusters & GPU Management, GTC Webinars 2012 - ID GTCE025
Streaming:
Download:
 
Bright Cluster Manager: A CUDA-ready Management Solution for GPU-based HPC
Ian Lumb (Bright Computing)

Bright Cluster Manager delivers a comprehensive and integrated CUDA-ready solution for those who seek to make optimal use of their GPU-based environments for HPC. Bright provisions, monitors and manages systems with NVIDIA GPUs within cluster-management hierarchies.

Join Ian Lumb, Bright Evangelist, and learn how Bright:

  1. Supports CUDA 5.5 from the essentials to initial experiences with the Multi-Process Service (MPS)
  2. Completely automates the installation of CUDA drivers, tools and toolkits, including recompilation of the CUDA drivers at system boot time
  3. Exposes GPU-specific metrics for monitoring and rules-based actions plus health checks based on nvidia-healthmon
  4. Supports multiple CUDA configurations - use CUDA 5.5 plus legacy versions of the CUDA toolkit across Fermi and Kepler architecture GPUs
  5. Establishes GPU-based clusters in the cloud or extends on-premise clusters into the cloud to make use of GPU
  6. Keeps pace with CUDA innovations and makes them rapidly available as seamlessly applied updates
  Back
 
Keywords:
Clusters & GPU Management, GTC Webinars 2013 - ID GTCE063
Download:
 
GASPI/GPI2 for GPUS: A PGAS Framework for Efficient Communication in GPU Systems
Lena Oden (Fraunhofer ITWM)
GPI2 for GPUs is a PGAS framework for efficient communication in heterogeneous clusters. In this session you will learn how multi-GPU programs can benefit from an RDMA-based programming model. We will introduce the industry-proven PGAS communication library GPI2 and its support for GPUs. GPUDirect RDMA technology allows real one-sided communication between multiple GPUs on different nodes; therefore, an RDMA-based programming model is well suited to this technology. Due to the very low communication overhead of one-sided operations, a latency of 3us for an inter-node data transfer can be reached. Still, GPI2 for GPUs is not only optimized for inter-node communication: intra-node communication is also optimized by combining the different GPUDirect technologies.  Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2014 - ID S4183
Streaming:
 
Tools and Tips For Managing a GPU Cluster
Adam DeConinck (NVIDIA)
Managing a multi-user heterogeneous HPC cluster can be challenging, but there are ways to make it easier. This session will cover the GPU-aware cluster software stack from the perspective of a system administrator, from driver installation through resource manager integration and centrally-managed development tools such as MPI libraries. This will include an overview of NVIDIA's tools for GPU management and monitoring, a survey of third-party tools with GPU integration, and a number of "lessons learned" from managing HPC clusters inside NVIDIA.  Back
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID S4253
Streaming:
Download:
 
GPU-Accelerated Signal Processing in OpenStack
John Paul Walters (USC Information Sciences Institute)
Learn how to deploy both Fermi and Kepler-based GPUs in an OpenStack cloud. In this session we describe the latest HPC features for the OpenStack cloud computing platform, including Kepler and Fermi GPU support, high speed networking, bare metal provisioning, and heterogeneous scheduling. The features are based on OpenStack Grizzly and Havana, with upcoming support for OpenStack Icehouse. Using examples drawn from signal and image processing, we will characterize the performance and versatility of LXC and Xen GPU support for both regular and irregular computations. We'll also characterize the performance improvements due to support for high speed networking in the OpenStack cloud. The session will conclude with a discussion of the next steps in HPC OpenStack development.  Back
 
Keywords:
Clusters & GPU Management, Desktop & Application Virtualization, Signal & Audio Processing, Supercomputing & HPC, GTC 2014 - ID S4257
Streaming:
Download:
 
How to Efficiently Virtualize Local and Remote GPUs
Pavan Balaji (Argonne National Laboratory)
In this session, you will get familiar with vACC, a virtual accelerator/GPU library that virtualizes remote and local GPUs installed across a cluster of compute nodes. The main objective is to provide efficient virtualized access to GPUs from any host in the system. GPU virtualization brings new opportunities for effective management of GPU resources by decoupling them from host applications. In addition to access to remote GPUs, the vACC framework offers power-aware physical/virtual accelerator mapping, fault tolerance with transparent migration, efficient integration with virtual machines in Cloud environments and support for both CUDA and OpenCL paradigms. vACC can enable GPU service providers to offer cost-effective, flexible and fault-tolerant access to GPUs in the Cloud. Such capabilities are crucial in facilitating the adoption of GPU-based services across academia and industry. During the session, we will demonstrate how using vACC can improve GPU access experience and maintenance cost in a local cluster or a Cloud.  Back
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2014 - ID S4321
Streaming:
 
Accurate Power and Energy Measurements on Kepler-Based Tesla GPUs
Martin Burtscher (Texas State University)
Learn how to correctly profile the power and energy consumption of your kernels using the built-in power sensor of K20 compute GPUs. The measurements do not directly follow the GPU activity but lag behind and are distorted. This can cause large inaccuracies, especially for short running kernels, when taking the power samples at face value. This session explains how to compute the true power and energy consumption and provides general guidelines on how to best profile the power draw of GPU kernels using NVIDIA's Management Library.  Back
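The core point of the session — that isolated power samples can mislead and that energy must be computed by integrating power over the kernel's run — can be illustrated with a small sketch. It assumes timestamped samples in milliwatts, the unit NVML's power query reports; the sample values below are invented, not from a real GPU.

```python
# Sketch: convert timestamped NVML-style power samples (milliwatts)
# into total energy (joules) via trapezoidal integration.
# The sample data is illustrative, not measured on real hardware.

def energy_joules(timestamps_s, power_mw):
    """Trapezoidal integration of power (mW) over time (s) -> energy (J)."""
    if len(timestamps_s) != len(power_mw) or len(timestamps_s) < 2:
        raise ValueError("need two or more matching samples")
    total_j = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        avg_w = (power_mw[i] + power_mw[i - 1]) / 2.0 / 1000.0  # mW -> W
        total_j += avg_w * dt
    return total_j

# A kernel that ramps from 50 W idle to 150 W and back over 4 seconds:
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
mw = [50_000, 100_000, 150_000, 150_000, 50_000]
print(energy_joules(ts, mw))  # → 450.0
```

Dividing the result by the elapsed time gives the average power draw, which is the quantity to report for short-running kernels rather than any single (lagged) sample.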
 
Keywords:
Clusters & GPU Management, Developer - Performance Optimization, GTC 2014 - ID S4454
Streaming:
Download:
 
Design of a Virtualization Framework to Enable GPU Sharing in Cluster Environments
Kittisak Sajjapongse (University of Missouri)
We describe the design of a runtime component that enables the effective use of GPUs in cluster environments. In particular, our system allows: (1) abstraction of GPUs from end users; (2) different GPU sharing and scheduling mechanisms; (3) virtual memory management; (4) load balancing and dynamic recovery in case of GPU failure, upgrade, and downgrade; (5) integration with existing cluster-level schedulers and resource managers for CPU clusters.  Back
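The sharing and recovery ideas in points (2) and (4) can be sketched in a few lines. This is a hypothetical illustration, not the runtime described above: jobs are dynamically bound to the least-loaded GPU, so sharing and rebinding after a failure fall out of the same policy.

```python
# Hypothetical sketch of dynamic job-to-GPU binding with sharing.
# Not the actual runtime from the session; names are invented.

class GpuPool:
    def __init__(self, num_gpus):
        self.load = {gpu: 0 for gpu in range(num_gpus)}  # jobs per GPU

    def bind(self, job):
        """Bind a job to the least-loaded GPU (a simple sharing policy)."""
        gpu = min(self.load, key=self.load.get)
        self.load[gpu] += 1
        return gpu

    def fail(self, gpu):
        """Drop a failed GPU; returns how many jobs need rebinding."""
        return self.load.pop(gpu)

pool = GpuPool(2)
print([pool.bind(j) for j in "abcd"])  # → [0, 1, 0, 1]
```

Because applications see only the pool, not physical device IDs, the same `bind` call can transparently route new work away from a GPU removed by `fail`.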
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2014 - ID S4473
Streaming:
Download:
 
Resources Affinity Can Impact Performance: How to Choose Right Affinity?
Matthieu Ospici (Bull)
In modern heterogeneous HPC architectures, several computing resources (CPUs, accelerators) and I/O resources (InfiniBand cards, PCIe links, QPI links) must be used simultaneously to make the best use of the hardware. This observation is even more true with the rise of technologies such as GPUDirect RDMA, which can perform communications directly between GPUs and InfiniBand links. In this context, resource affinity (i.e., resource selection and process placement) can have a strong impact on performance. The aim of the presentation is, first, to identify the main affinity issues that can occur in current heterogeneous architectures (which CPU core to choose when a particular GPU is used? which IB interface to choose when a GPUDirect RDMA transfer is launched?). We will show visible impacts on performance. Then, we propose solutions to handle these issues. We argue that affinity selection should be managed globally at the cluster resource manager level (with SLURM in our work), and not by the HPC programmers.  Back
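The affinity questions raised above reduce to a topology lookup: given a GPU, which cores and which HCA share its NUMA node? The sketch below uses an invented topology table; on a real system this information would come from hwloc or `nvidia-smi topo -m`, and the resource manager would apply the answer when placing processes.

```python
# Hypothetical sketch: choose CPU cores and an IB interface "close" to a GPU.
# The topology table is invented for illustration; real systems would
# discover it via hwloc or `nvidia-smi topo -m`.

TOPOLOGY = {
    # gpu_id: (numa_node, cpu_cores_on_that_node, nearest_ib_interface)
    0: (0, [0, 1, 2, 3], "mlx5_0"),
    1: (1, [4, 5, 6, 7], "mlx5_1"),
}

def affinity_for(gpu_id):
    """Return the (cores, ib_iface) a process using this GPU should prefer."""
    _, cores, ib = TOPOLOGY[gpu_id]
    return cores, ib

cores, ib = affinity_for(1)
print(cores, ib)  # cores local to GPU 1's NUMA node, and its nearest HCA
```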
 
Keywords:
Clusters & GPU Management, Supercomputing & HPC, GTC 2014 - ID S4491
Streaming:
Download:
 
OpenMPI with RDMA Support and CUDA
Rolf VandeVaart (NVIDIA)
Open MPI is an open source implementation of the Message Passing Interface (MPI) library used to support parallel applications. As GPUs are used more and more in large clusters, work has been done to make CUDA and MPI work seamlessly together. In this talk, we will cover new features added to the library to support sending and receiving GPU buffers directly.  Back
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID S4589
Streaming:
Download:
 
Citrix 3D Engineering Cloud: A Practical Approach (Presented by IBM)
Bret Bailey (IBM)
In today's fast changing business environment, companies are looking for ways to deliver better designs faster and cheaper while creating high quality products across an ecosystem of partners. To succeed, a company must transform its design processes by converting engineering silos into shared engineering clouds that improve collaboration, standardize processes and create a secure environment for sharing designs across operations and organizations including partners and suppliers. The 3D Engineering Cloud Solution is a high performance visual computing environment for organizations that have large 3D intensive graphics requirements and want to improve collaboration while protecting their assets and reducing costs. The 3D Engineering Cloud Solution is made possible due to a partnership between IBM, Citrix, and NVIDIA. This combination creates a unique 3D engineering environment in the Cloud.  Back
 
Keywords:
Clusters & GPU Management, Graphics Virtualization, Computer Aided Design, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4849
Streaming:
Download:
 
Slices: Provisioning Heterogeneous High-Performance Systems
Alexander Merritt (Georgia Institute of Technology)
We present a new abstraction for provisioning resources on high-performance heterogeneous GPGPU-based clusters: slices. Slices represent aggregated subsets of resources across a cluster for use by an application and target an environment where diverse applications co-run on shared cluster resources. Our poster presents studies examining application scalability and limitations, efficiency in mapping applications to slices using a novel GPGPU 'sensitivity' metric, and gains in multi-application throughput when mapping slices to underlying cluster resources, guided by application profiles. We evaluate behaviors of representative HPC codes: LAMMPS, NAS-LU and SHOC's S3D application kernel on clusters of 48 and 72 nodes.  Back
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID P4150
Download:
 
GPUSync: A Framework for Real-Time GPU Management
Glenn Elliott (University of North Carolina at Chapel Hill)
Traditional throughput-oriented GPGPU-based platforms are primarily designed to support a single GPGPU process at a time. This is problematic in deadline-oriented (real-time) systems when multiple processes compete for GPU resources. System-level services are necessary to schedule competing work according to priority to ensure that deadlines are met. GPUSync is a framework for implementing such schedulers in multi-GPU, multicore, real-time systems. GPUSync enables GPUs to be shared among processes in safety-oriented applications, such as advanced driver assistance systems (ADAS) and autonomous vehicles, since timing constraints can be guaranteed to be met.  Back
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID P4286
Download:
 
Dynamic Intelligent Kernel Assignment in Heterogeneous MultiGPU Systems
Joao Gazolla (Universidade Federal Fluminense)
The poster presents the initial concept of research on dynamic intelligent kernel assignment in heterogeneous multi-GPU systems: given an application using the StarPU framework, our scheduler selects custom scheduling policies and executes the kernels intelligently, handling the mapping of kernels to the corresponding devices in a seamless way and minimizing execution time.  Back
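As a hedged illustration of the general idea (not StarPU's actual API or the poster's policy), a greedy scheduler can map each kernel to the device expected to finish it earliest, given a table of expected per-device run times:

```python
def assign_kernels(kernel_costs, num_devices):
    """kernel_costs[k][d] = expected run time of kernel k on device d.
    Greedily assign each kernel to the device whose queue would finish
    it soonest. Returns (assignment list, per-device finish times)."""
    finish = [0.0] * num_devices   # accumulated busy time per device
    assignment = []
    for costs in kernel_costs:
        # pick the device with the earliest expected finish for this kernel
        d = min(range(num_devices), key=lambda i: finish[i] + costs[i])
        finish[d] += costs[d]
        assignment.append(d)
    return assignment, finish

# Three identical kernels, device 0 twice as fast as device 1:
plan, finish = assign_kernels([[2.0, 4.0], [2.0, 4.0], [2.0, 4.0]], 2)
```

The third kernel spills over to the slower device once the fast device's queue grows long enough, which is the load-balancing behavior the poster aims for.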
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID P4200
Download:
 
Runtime Visualization of Application Progress and Monitoring of GPU-enabled Parallel Environment
Pawel Czarnul (Gdansk University of Technology)
The poster presents visualization tools for applications executed in a parallel environment that is a collection of clusters with multiple GPUs and CPUs. The system allows modeling applications as acyclic workflow graphs with customizable algorithms for scheduling onto the underlying network of clusters. The poster depicts visualization tools that provide three distinct views with runtime visualization of: 1. the hardware infrastructure with clusters, nodes, and computing devices, as well as resource monitoring with presentation of computing loads, memory usage, etc.; 2. the progress of execution of particular stages of the application workflow graph; 3. the application state, which can graphically represent the progress of numerical computations or physical phenomena.  Back
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID P4250
Download:
 
Top Six Advantages of CUDA-Ready Clusters
Ian Lumb (Bright Computing)

CUDA-ready clusters enable developers to: focus on coding, not maintaining infrastructure (drivers, configs) and toolchains (compilers, libraries); routinely keep pace with innovation, from the latest GPU hardware to the CUDA toolkit itself; cross-develop with confidence and ease, maintaining and shifting between highly customized CUDA development environments; exercise their preference in programming GPUs, choosing CUDA, OpenCL, or OpenACC and combining them appropriately (with, for example, the Message Passing Interface, MPI); exploit the convergence of HPC and Big Data analytics, making simultaneous use of HPC and Hadoop services in GPU applications; and make use of private and public clouds, creating a CUDA-ready cluster in a cloud or extending an on-site CUDA infrastructure into a cloud. In this webinar, participants will learn how Bright Cluster Manager provisions, monitors, and manages CUDA-ready clusters for developer advantage. Case studies will be used to illustrate all six advantages for Bright developers. Specific attention will be given to: cross-developing under CUDA 6.0 and CUDA 6.5 with Kepler-architecture GPUs (e.g., the NVIDIA Tesla K80 GPU accelerator), and the challenges and opportunities of using private (OpenStack) and public (Amazon Web Services) clouds in GPU applications.

  Back
 
Keywords:
Clusters & GPU Management, GTC Webinars 2015 - ID GTCE107
Streaming:
Download:
Collaborative & Large Resolution Displays
Presentation
Media
Multi-Display Systems: Pushing the State of the Art
Andrew Page (NVIDIA), Kenji Kato (NASA Ames, Dell Federal), Rajeev Surati, Ph.D. (Scalable Display Technologies), Doug Traill (NVIDIA)

Join a panel of NVIDIA experts and leading companies developing multi-display systems for an interactive discussion on the current trends in scaling the resolution of display walls. Panelists will share their insights on how they are pushing the state of the art with NVIDIA's professional display technologies.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID S3052
Streaming:
Download:
 
See the Big Picture: Scalable Visualization Solutions for High Resolution Displays
Doug Traill (NVIDIA)

Large format high resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features to make it easier to install and utilize these large scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID SIG4113
Streaming:
Download:
 
Visualization Technology in Academic Domains
Howard Kaplan (University of South Florida)

This talk will focus on the creation and utilization of the University of South Florida's ultra-high resolution, stereoscopic 3D visualization display wall. In this session we will describe how universities can benefit from low-cost visualization systems, hardware and software evaluation of displays, GPU technologies, and their applications in academic settings. Current trends and future developments in GPU resources for HPC and visualization in academia will be reviewed. We will also explore hardware and software technologies that allow flexible utilization for academic and research purposes.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Manufacturing Technical, GTC 2013 - ID S3068
Streaming:
Download:
 
Using the Warp and Blend API in Distributed and Single Renderers / Update on Warping Standards
Rajeev Surati (Scalable Display Technologies), Bei Wang (Walt Disney)

The NVIDIA Warp and Blend API has enabled a slew of cost-effective scalable visualization systems. We will discuss two different applications: making a seamless edge-blended desktop with Scalable Desktop, and making 3D (both projected and stereoscopic 3D) virtual reality and simulation systems using multiple computers with Scalable Display Manager. We will give several real-life examples, including a 140-megapixel stereoscopic 3D cave and the SpaceX 16-megapixel control room display. Lastly, Bei Wang of Disney will follow up with current progress on the VESA standardization effort for warping and blending.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Architectural Mapping & Event Visualization, Combined Simulation & Real-Time Visualization, GTC 2013 - ID S3114
Streaming:
Download:
 
Ultra-high Resolution Displays and Managing Content
Andy Boud (ImmersaView), Alex Streit (ImmersaView)

This session takes an insightful look into new solutions for streaming high resolution and ultra-high resolution video over IP. How do you send, record and review ultra-high resolution data using GPU techniques? Do these software techniques offer a new approach to how we work with video? We share some of our experiences in this field.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, Scientific Visualization, GTC 2013 - ID S3161
Streaming:
Download:
 
High Performance Graphics for 4K and Ultra High Resolution Displays
Doug Traill (NVIDIA)

Large format high resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features to make it easier to install and utilize these large scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.

  Back
 
Keywords:
Collaborative & Large Resolution Displays, Large Scale Data Visualization & In-Situ Graphics, SIGGRAPH 2013 - ID SIG1307
Streaming:
Download:
 
See the Big Picture: Scalable Visualization Solutions for High Resolution Displays
Doug Traill (NVIDIA)
Large format high resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features to make it easier to install and utilize these large scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.  Back
 
Keywords:
Collaborative & Large Resolution Displays, GTC 2014 - ID SIG4113
Streaming:
Download:
 
Mid-Tier VR: Cost Reducing the Cave by Embracing the GPU
Rajeev Surati (Scalable Display Technologies), Bei Yang (Walt Disney Imagineering)
We describe how to put together VR caves that used to cost $250K for a whole lot less using NVIDIA NVAPI, and provide case studies, pictures, and diagrams of how to go about it. We believe a substantial expansion of the VR market is occurring and that these kinds of systems will become more commonplace as the market expands, both by using the Quadro cards in the system more effectively and by using the warp and blend APIs.  Back
 
Keywords:
Collaborative & Large Resolution Displays, Virtual Reality & Augmented Reality, Digital Product Design & Styling, GTC 2014 - ID S4452
Streaming:
Download:
 
Stereo3d Video Streaming for Remote Collaboration
Julien Berta (Mechdyne)
Learn how Mechdyne leverages video compression and streaming to create remote collaboration solutions, connecting CAVEs, powerwalls, and other ultra-resolution displays to enable multi-site, multi-display sharing and decision making. We will explore multiple customer use cases: immersive-to-immersive, desktop-to-immersive, immersive-to-desktop, monoscopic, and stereoscopic.  Back
 
Keywords:
Collaborative & Large Resolution Displays, Virtual Reality & Augmented Reality, Remote Graphics & Cloud-Based Graphics, Video & Image Processing, GTC 2014 - ID S4631
Streaming:
Download:
Combined Simulation & Real-Time Visualization
Presentation
Media
An Interactive Visualization System for Lava Flows Cellular Automata Simulations using CUDA
Giuseppe Filippone (University of Calabria, Italy)

The poster describes the development of an extensible system for analysis and interactive visualization of lava flow simulations. The core of the system is a CUDA-accelerated implementation of Sciara fv-2, the latest release of the Sciara Cellular Automata family. It resides on a remote multi-GPU node, which provides a multilayered GPU implementation in order to compute single or multiple simultaneous simulations. Experiment results are interactively visualized in real time by means of a 3D graphics engine implemented in C++ and VTK and integrated into a Qt GUI.

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Developer - Algorithms, GTC 2013 - ID P3114
Download:
 
Exact Soft-Body Collision Detection on the GPU
Christiaan Gribble (SURVICE Engineering)

We present a new method that accurately detects soft-body collisions, specifically the edge-edge collisions that most other methods would miss, interactively on modern GPUs. Our method guarantees that no pass-through will occur between objects by using interpolation equations to represent motion between time steps, yielding nearly exact collision times and responses. GPU acceleration via CUDA allows this method to operate at interactive rates.
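The interpolation idea can be illustrated with a toy 1D case (a simplification for exposition, not the poster's actual edge-edge formulation): positions are interpolated linearly across the time step, and the collision time, if any, falls out in closed form, so no pass-through between time steps can be missed.

```python
def collision_time_1d(p0, p1, q0, q1):
    """Two points move linearly over one time step, p from p0 to p1 and
    q from q0 to q1. Return the t in [0, 1] at which they coincide,
    or None if they never meet within the step."""
    gap = q0 - p0                        # separation at t = 0
    closing = (p1 - p0) - (q1 - q0)      # rate at which p closes the gap
    if closing == 0:
        # constant separation: colliding only if already touching
        return 0.0 if gap == 0 else None
    t = gap / closing
    return t if 0.0 <= t <= 1.0 else None

t = collision_time_1d(0.0, 2.0, 1.0, 1.0)  # p sweeps through stationary q
```

Here p travels from 0 to 2 while q sits at 1, so they meet at t = 0.5; a discrete end-of-step test (comparing only p1 and q1) would report no contact at all, which is exactly the pass-through the method prevents.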

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Computational Physics, GTC 2013 - ID P3203
Download:
 
Visual Simulation Laboratory
Christiaan Gribble (SURVICE Engineering)

We present the Visual Simulation Laboratory (VSL), an ongoing project aimed at bringing the power of GPU computing to a variety of DoD application domains. VSL is an open-source framework developed by the U.S. Army Research Laboratory and its collaborators designed to transform legacy workflows into immersive, end-to-end physics-based simulation and analysis tools. GPU computing facilitates combined simulation and visualization, enabling analysts to interact with a visual representation of not just the results, but of the computational mechanisms as well. This poster highlights VSL and demonstrates the potential of GPU computing to transform a variety of applications across the DoD.

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Real-Time Graphics, GTC 2013 - ID P3206
Download:
 
The Power of Real-time 3D Snow Simulations
Anne C Elster (Norwegian University of Science & Technology)

Learn how a simulation that combines nicely with graphics can be used as a visual test-bed for numerical algorithms, terrain interactions, road planning and more. This presentation includes the techniques and methods behind our 3D snow simulation, which calculates how 4+ million particles are affected by the wind field and terrain in real time by harnessing the compute power of modern GPUs. Our snow simulator is also being combined with ray tracing techniques for more realistic lighting and snowflake rendering, as well as the A* search algorithm, which is used to suggest how to map future roads to the terrain based on a set of criteria. We are also experimenting with adding SPH and other fluid techniques to simulate avalanches, etc. Stereoscopic output is achieved by taking advantage of the features provided by NVIDIA's Quadro cards.
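A minimal sketch of the kind of per-particle update such a simulator performs each frame (an illustrative simplification with assumed drag and time-step constants, not the authors' actual CUDA kernel): each particle is dragged toward the local wind velocity and pulled down by gravity, then advected.

```python
def step_particles(positions, velocities, wind, dt=0.016, gravity=-9.8, drag=2.0):
    """One explicit Euler step. `wind(x, y, z)` returns the local wind
    velocity; particles relax toward it (simple drag) and fall under
    gravity. dt, gravity, and drag are illustrative constants."""
    out_pos, out_vel = [], []
    for (x, y, z), (vx, vy, vz) in zip(positions, velocities):
        wx, wy, wz = wind(x, y, z)
        ax = drag * (wx - vx)                # accelerate toward wind
        ay = drag * (wy - vy) + gravity      # ... plus gravity on y
        az = drag * (wz - vz)
        vx, vy, vz = vx + ax * dt, vy + ay * dt, vz + az * dt
        out_pos.append((x + vx * dt, y + vy * dt, z + vz * dt))
        out_vel.append((vx, vy, vz))
    return out_pos, out_vel
```

On the GPU this loop body becomes one thread per particle, which is what makes 4+ million particles per frame feasible; the sequential Python version just shows the math.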

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Computational Physics, GTC 2013 - ID S3060
Streaming:
Download:
 
Part 1 - Configuring, Programming and Debugging Applications for Multiple GPUs
Tom True (NVIDIA), Alina Alt (NVIDIA)

Workstation applications today demand a tightly coupled compute-graphics pipeline where the simulation and the graphics are done interactively and in parallel. The use of multiple GPUs provides an affordable way for such applications to improve their performance and increase their usable data size by partitioning the processing and subsequent visualization among multiple GPUs. This tutorial explains the methodologies of how to program your application for a multi-GPU environment. Part 1 of this tutorial will cover GPU resource allocation and system configuration, including: what to expect when you add additional GPUs to your system; how to select, query and allocate all the necessary GPU resources; and a rudimentary introduction to the use of profiling and analysis tools. Throughout this tutorial, simple OpenGL and CUDA examples designed for a single GPU will be modified to efficiently work in a multi-GPU environment.

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Graphics Performance Optimization, Media & Entertainment, GTC 2013 - ID S3070
Streaming:
Download:
 
Part 2 - Configuring, Programming and Debugging Compute-Graphics Applications for Multi-GPUs
Shalini Venkataraman (NVIDIA), Wil Braithwaite (NVIDIA)

Workstation applications today demand a tightly coupled compute-graphics pipeline where the simulation and the graphics are done interactively and in parallel. The use of multiple GPUs provides an affordable way for such applications to improve their performance and increase their usable data size by partitioning the processing and subsequent visualization among multiple GPUs. This tutorial explains the methodologies of how to program your application for a multi-GPU environment. Part 2 of this tutorial will cover programming methodologies, including: how to structure an application to optimize compute and graphics performance and manage synchronization; how to manage data transfers across the PCIe bus; debugging and profiling; and programming considerations when scaling beyond two GPUs - multiple compute GPUs feeding to one or multiple graphics GPUs. Throughout this tutorial, simple OpenGL and CUDA examples designed for a single GPU will be modified to efficiently work in a multi-GPU environment.

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Graphics Performance Optimization, Media & Entertainment, GTC 2013 - ID S3072
Streaming:
Download:
 
Proximity Computation on Heterogeneous Computing Systems
Duksu Kim (Korea Advanced Institute of Science and Technology)

This session will introduce a novel, optimization-based workload distribution algorithm that exploits heterogeneous systems to accelerate various proximity queries. To represent the complicated performance relationships between computing resources and the different computations of proximity queries, we propose a simple model that measures the expected running time of these computations. Based on this model, we formulate an optimization problem that minimizes the largest time spent on computing resources, and propose a novel, iterative LP-based scheduling algorithm. We apply our method to various proximity queries used in five different applications that have different characteristics. Our method achieves an order-of-magnitude performance improvement by using four different GPUs and two hexa-core CPUs over using a hexa-core CPU only. In addition, we integrate our expected running time model with a work stealing method and achieve a 16% performance improvement on average over the basic work stealing method.
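The flavor of the expected-running-time model can be sketched with a toy example (an illustrative simplification, not the authors' iterative LP formulation): under a linear cost model, minimizing the largest per-resource time means assigning work in proportion to each resource's throughput.

```python
def split_work(total_items, rates):
    """Distribute `total_items` across resources with throughputs `rates`
    (items/sec) so that the slowest finish time (makespan) is minimized.
    With a linear cost model the optimum is work proportional to rate."""
    total_rate = sum(rates)
    shares = [total_items * r / total_rate for r in rates]
    counts = [int(s) for s in shares]          # round down ...
    remainder = total_items - sum(counts)
    # ... then hand leftover items to the largest fractional shares
    order = sorted(range(len(rates)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts

# e.g. a GPU three times faster than a CPU gets three quarters of the work:
counts = split_work(8, [3.0, 1.0])
```

The real system replaces this proportional rule with an LP because actual proximity-query costs are not linear in the workload, but the objective, minimizing the maximum per-resource time, is the same.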

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, GTC 2013 - ID S3166
Streaming:
Download:
 
Visual Computing Meets Computer Vision: Augmented Reality & the GPU
Trak Lord (Metaio)

The expectations we have of our mobile platforms continue to grow. For many of us, the tablet and smartphone have already become the platform of choice while we use laptops only when necessary. But we have only just started to explore the real capabilities of having ubiquitous computing at our fingertips. The combination of low-power sensor technology and high-performance computing in mobile devices will enable new ways of accessing and interacting with digital information. Trak Lord of Metaio will discuss some of the new developments in platform technology and how they can be utilized to provide the better user experiences demanded by the next generation of mobile devices, and mobile device users. Metaio has been providing augmented reality software and solutions for nearly a decade. Metaio's development tools and strong developer-centric ecosystem have made it the leading augmented reality technology company offering open and agile platforms to the ever-expanding and evolving AR market.

  Back
 
Keywords:
Combined Simulation & Real-Time Visualization, Real-Time Graphics, SIGGRAPH 2013 - ID SIG1321
Streaming:
Download:
 
zSpace: An Integrated Immersive Stereoscopic 3D System
Doug Twilleager (zSpace)
This talk will provide the f