GTC On-Demand

Acoustics & Audio Processing
Interactive 3D Audio Rendering Systems
Nicolas Tsingos
Learn how to leverage GPUs for interactive audio rendering. This session will give a short overview of the architecture of current GPUs, emphasizing some key differences between GPU and CPU programming models for audio processing. We will illustrate the benefits of GPU-accelerated audio rendering with results from 3D audio processing and sound scattering simulations. Finally, we will discuss best practices for GPU implementations as well as future opportunities for audio rendering on massively parallel architectures.

Keywords:
Acoustics & Audio Processing, Rendering & Ray Tracing, Signal & Audio Processing, GTC 2010 - ID 2042
 
Implementing CUDA Audio Networks
Giancarlo Del Sordo
Learn how to implement a commercial software library that exploits CUDA for audio applications. We focus on the overall threading architecture and the underlying math for implementing general-purpose audio processing on CUDA devices. The session covers the use of inter-process communication to make a plug-in implementation loadable in 32-bit hosts installed on 64-bit systems, distributing the GPU load across remote servers, and creating a CUDA network for high-end purposes such as a large recording facility.

Keywords:
Acoustics & Audio Processing, Signal & Audio Processing, GTC 2010 - ID S102076
 
Real-time Multichannel Audio Convolution
Jose Antonio Belloch (PhD Student)
Learn how a synthesis of 3D sound scenes can be achieved using a peer-to-peer music streaming environment and the GPU. We will discuss the technical and cost benefits of this approach, while noting that it frees the CPU for other tasks.
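The core operation behind sessions like this one, multichannel audio convolution, can be sketched on the CPU with NumPy's FFT. This is a generic illustration of fast convolution, not the presenter's GPU code, and all signal shapes here are made up:

```python
import numpy as np

def fft_convolve(signal, impulse_response):
    """Fast convolution of one channel via the FFT (O(n log n))."""
    n = len(signal) + len(impulse_response) - 1
    nfft = 1 << (n - 1).bit_length()          # round up to a power of two
    spectrum = np.fft.rfft(signal, nfft) * np.fft.rfft(impulse_response, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]

# Convolve each channel of a stereo buffer with its own impulse response.
rng = np.random.default_rng(0)
channels = rng.standard_normal((2, 1024))     # 2-channel input
irs = rng.standard_normal((2, 256))           # per-channel impulse responses
wet = np.stack([fft_convolve(c, h) for c, h in zip(channels, irs)])
assert np.allclose(wet[0], np.convolve(channels[0], irs[0]))
```

A real-time engine would run the per-channel transforms in parallel and stream blocks (overlap-add); this sketch only shows the underlying math.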

Keywords:
Acoustics & Audio Processing, Signal & Audio Processing, GTC 2010 - ID S102116
 
Exploring Recognition Network Representations for Efficient Speech Inference on the GPU
Jike Chong
We explore two contending recognition network representations for speech inference engines: the linear lexical model (LLM) and the weighted finite state transducer (WFST), on NVIDIA GTX285 and GTX480 GPUs. We demonstrate that while an inference engine using the simpler LLM representation must evaluate 22x more transitions than with the advanced WFST representation, the simple structure of the LLM representation allows 4.7-6.4x faster evaluation and 53-65x faster operand gathering for each state transition. We illustrate that the performance of a speech inference engine based on the LLM representation is competitive with the WFST representation on highly parallel GPUs.

Keywords:
Acoustics & Audio Processing, GTC 2010 - ID P10C01
 
Efficient Automatic Speech Recognition on the GPU
Jike Chong
Automatic speech recognition (ASR) technology is emerging as a critical component in data analytics for the wealth of media data being generated every day. ASR-based applications contain fine-grained concurrency that has great potential to be exploited on the GPU. However, the state-of-the-art ASR algorithm involves a highly parallel graph traversal on an irregular graph with millions of states and arcs, making efficient parallel implementations highly challenging. We present four generalizable techniques: dynamic data-gather buffers, find-unique, lock-free data structures using atomics, and hybrid global/local task queues. When used together, these techniques can effectively resolve ASR implementation challenges on an NVIDIA GPU.

Keywords:
Acoustics & Audio Processing, GTC 2010 - ID P10C02
 
HYDRA - A Hybrid CPU/GPU Speech Recognition Engine for Real-Time LVCSR
Jungsuk Kim (Carnegie Mellon Silicon Valley)
This talk presents HYDRA, a real-time LVCSR (large vocabulary speech recognition) engine that performs decoding on CPU, GPU, or hybrid CPU/GPU platforms. While prior work has demonstrated the effectiveness of many-core graphics processing units (GPUs) for high-throughput, limited-vocabulary speech recognition, they are unsuitable for recognition with large acoustic and language models due to limited GPU memory. To overcome this limitation, we have developed a novel architecture for speech recognition decoding that jointly leverages many-core GPUs and multicore CPUs to perform speech recognition even when large acoustic and language models are applied. The proposed architecture can perform speech recognition up to 5x faster than real time with a recognition vocabulary of more than one million words.

Keywords:
Acoustics & Audio Processing, GTC 2013 - ID S3406
Advanced Driver Assistance Systems (ADAS)
Real-time Traffic Sign Recognition on Mobile Processors
Victor Eruhimov (Itseez, Inc.)
There is a growing need for fast and power-efficient computer vision on embedded devices. This session will focus on computer vision capabilities on embedded platforms available to ADAS developers, covering OpenCV CUDA implementation and the new computer vision standard OpenVX. In addition, Itseez traffic sign detection will be showcased. The algorithm is capable of detecting speed limit signs for both North America and EMEA regions as well as several other signs, delivering faster than real-time performance on an embedded platform with a mobile grade GPU.

Keywords:
Advanced Driver Assistance Systems (ADAS), Automotive, Computer Vision, GTC 2013 - ID S3548
 
Virtualization of Tegra in Automotive Applications: Integration of Head-Unit and Instrument Cluster
Stefaan Sonck Thiebaut (OpenSynergy)
This talk will introduce the main challenges in the next generation of automotive infotainment applications: OEMs want to take advantage of open source solutions like Linux and Android, yet have very high requirements on safety, security, and boot times. In addition, to reduce costs, more functionality needs to be integrated on a single processor. An example of this is the integration of the head-unit and the instrument cluster as two displays of a single device. As a solution to these requirements, we describe a software architecture that uses virtualization with a micro-kernel and that is already implemented and available on NVIDIA Tegra 3. We will give a brief outlook on the next steps regarding the sharing of the GPU and hardware virtualization.

Keywords:
Advanced Driver Assistance Systems (ADAS), Automotive, In-Vehicle Infotainment (IVI) & Safety, Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3577
Aerospace & Defense
XMP: An NVIDIA CUDA-Accelerated Big Integer Library
Justin Luitjens (NVIDIA)
We'll introduce the XMP library, which provides CUDA-accelerated implementations of many large-integer arithmetic operations. These operations are generally used to implement encryption and decryption routines, including RSA, ECC, and Diffie-Hellman key exchange. We'll focus on the library's capabilities and how to use it efficiently.
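As a rough illustration of the large-integer arithmetic such a library accelerates (this is not the XMP API; it uses Python's built-in arbitrary-precision integers and toy, insecure key sizes):

```python
# Toy RSA round trip showing the large-integer operation (modular
# exponentiation) that a library like XMP accelerates on the GPU.
# The key values are tiny demo numbers, not secure parameters.
p, q = 61, 53
n = p * q                         # modulus
phi = (p - 1) * (q - 1)
e = 17                            # public exponent, coprime to phi
d = pow(e, -1, phi)               # private exponent (modular inverse, 3.8+)

message = 42
ciphertext = pow(message, e, n)   # encrypt: m^e mod n
plaintext = pow(ciphertext, d, n) # decrypt: c^d mod n
assert plaintext == message
```

Production RSA uses 2048-bit or larger moduli, which is exactly where batched GPU big-integer arithmetic pays off.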
 
Keywords:
Aerospace & Defense, Tools & Libraries, GTC 2016 - ID S6151
 
Intelligent Mobile System for Improving Spatial Design Support and Security Inside Buildings
Janusz Bedkowski (Institute of Mathematical Machines)
This talk concerns an intelligent mobile application for spatial design support and the security domain. Mobility has two aspects in our research: the first is the usage of mobile robots for 3D mapping of urban areas and for performing some specific tasks. The second is related to a novel software-as-a-service system that allows access to robotic functionalities and data over the Ethernet. Thus, we demonstrate the use of the novel NVIDIA GRID technology, which virtualizes the GPU. We introduce the Complex Shape Histogram, a core component of our artificial intelligence engine, used for classifying 3D point clouds with a Support Vector Machine. We use NVIDIA CUDA for accelerating computations.
 
Keywords:
Aerospace & Defense, Data Center & Cloud Computing, Robotics & Autonomous Machines, GTC 2016 - ID S6233
 
Big Geospatial Data + Deep Learning + High Performance Computing = Geospatial Intelligence
Bingcai Zhang (BAE Systems)
We present two algorithms that are specifically designed to accurately detect geospatial objects in geospatial images. Combining these two algorithms with deep learning algorithms, we have achieved detection accuracy of over 99% for vehicles, positional accuracy within 6 pixels, orientation accuracy of less than 10 degrees, and a false positive rate of 0.001% with 7.5cm GSD aerial images. In essence, our algorithms induce learning capability from deep learning into template image matching in geospatial intelligence. Our algorithms reduce the false positive rate by an order of magnitude over a softmax classifier. With over 99% accuracy, we believe this may be a game changer in the geospatial intelligence domain.
 
Keywords:
Aerospace & Defense, Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6260
 
GPU-Accelerated Graph Query for Cyber Applications
Jim Carbonaro (Blazegraph)
Cyberspace is a critical domain for government and commercial organizations. It is about networks, devices, and how they interact. Graphs model nodes and links and how they are connected. Defending the critical networks in cyberspace requires processing and analyzing extremely large quantities of graph data in near-real time. Key cyber analytics and data sets, ranging from topological vulnerability analysis to traffic flow analysis and network attack graphs, are graphs. This session will discuss how Blazegraph GPU meets this challenge by delivering near-real-time performance at very large data scales, using a flexible and updatable graph representation to support complex analytics, and supporting existing graph frameworks (RDF, Tinkerpop) and query languages (SPARQL).
 
Keywords:
Aerospace & Defense, Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6337
 
Deep Convolutional Neural Networks for Spoken Dialect Classification of Spectrogram Images Using DIGITS
Nigel Cannings (Intelligent Voice Limited)
Deep convolutional neural networks are designed for classification tasks involving static images. We'll outline the novel application of such networks to speech processing tasks such as the identification of a speaker's dialect. Representing speech as spectrogram images, we'll show our recent results from the NIST language recognition competition, and discuss how the network training results can be improved by manipulating the spectrogram images in ways appropriate to speech applications.
 
Keywords:
Aerospace & Defense, Deep Learning & Artificial Intelligence, Signal & Audio Processing, GTC 2016 - ID S6371
 
Real-Time Non-Rigid Image Registration Engine
Randall Miles (Propulsion Science and Technology)
Non-rigid image registration, i.e., morphing, allows a smaller footprint of seed images to be used to create a smooth and continuously changing series of images. We'll present a new high-speed toolkit for image morphing implemented using NVIDIA GPU technology. Time improvements of ~80% were seen through implementing a succession of CUDA optimizations guided by Nsight profiler results. Tests were conducted using available simulated rocket plume images to calculate run times and create performance measures.
 
Keywords:
Aerospace & Defense, Performance Optimization, Video & Image Processing, GTC 2016 - ID S6397
 
Missile Defense Radar through Real-Time Electromagnetic Simulation Injection
Ted Selig (FishEye Software, Inc.)
Radars are electromagnetic sensors that encode transmit signals, focus beams, extract targets from noise, and perceive targets and environments. These real-time systems are expensive and risky to build and operate because they are complex and difficult to test. The evolution of the GPU has the potential to disrupt this sensor industry by dramatically reducing the cost of radars, accelerating innovation, and reducing sensor maintenance. The presentation will discuss the processing techniques and data flow architecture required by these sensors. The discussion explores how GPU adoption can not only reduce the development costs and risks of sensor development for missile defense but also enable low-cost applications like the self-driving car, weather sensing, and air traffic management.
 
Keywords:
Aerospace & Defense, Embedded, Signal & Audio Processing, GTC 2016 - ID S6434
 
HD GP-GPU Systems for HPC Applications
Sergio Tafur (Naval Research Laboratory), Christopher Kung (Engility)
We'll present how we fielded a high-density (HD) GP-GPU system, currently number 227 on the Top500 list, evaluated its performance, and overcame challenges that arose during testing phases. In addition, we will touch on using Python to code for and "glue" CPUs and GP-GPUs together in such HD GP-GPU systems.
 
Keywords:
Aerospace & Defense, Algorithms, Supercomputing & HPC, GTC 2016 - ID S6641
 
Image Registration for Real-Time Database Extraction
Randall Miles (Propulsion Science and Technology)
We present GPU-enabled, real-time, high-fidelity image interpolation software implementing a non-rigid image registration method, i.e., morphing. Morphing is a mathematical method that modifies input images to create a smoothly evolving image set, with minimal image degradation. Morphing eliminates jitter of extracted database images and can also decrease the database size. Tests using simulated thermal images (128x256 pixels) of high-speed jet flow show image extraction speeds of over 500Hz (~80x over serial code). Application to HWIL and scene simulation can provide accurate target inputs with a much smaller database footprint.
 
Keywords:
Aerospace & Defense, Video & Image Processing, GTC 2016 - ID P6186
 
Agile Condor: Scalable High Performance Embedded Computing Architecture
Mark Barnell (Air Force Research Laboratory), Christopher Capraro (SRC)
The Air Force Research Laboratory Information Directorate Advanced Computing and Communications Division is developing a new computing architecture using GPUs, designed to provide a high-performance embedded computing (HPEC) pod solution to meet operational and tactical real-time processing needs for intelligence, surveillance, and reconnaissance (ISR) missions. This newly designed system, Agile Condor, is a scalable HPEC system based on open industry standards that will increase, far beyond the current state of the art, the computational capability available within the restrictive size, weight, and power constraints of unmanned aircraft systems' external "pod" payloads.
 
Keywords:
Aerospace & Defense, Embedded, GTC 2016 - ID P6292
Algorithms
Parallel Low Rank LU and Cholesky Refactorization
Lung-Sheng Chien (NVIDIA)
Attendees can learn how to use a low-rank update in a linear solver during a nonlinear process, for example in linear programming, structural mechanics, and circuit simulation. A GPU-friendly version is proposed, which is mainly based on BLAS2 operations. Compared to traditional approaches, with BLAS2 operations we can hide instruction latency well and achieve the full bandwidth of a many-core processor. In this talk, we describe the basic idea of the low-rank update and show up to 5x speedup from complexity analysis.
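A minimal sketch of why low-rank updates pay off, using the classic rank-1 Sherman-Morrison identity in NumPy; this is a generic textbook illustration (with made-up data), not the speaker's BLAS2 algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)   # well-conditioned system
b = rng.standard_normal(5)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

A_inv = np.linalg.inv(A)    # stand-in for an already-computed factorization

# Sherman-Morrison: solve (A + u v^T) x = b by reusing A's factorization,
# with only matrix-vector (BLAS2-style) work instead of a full refactorization.
Ainv_b = A_inv @ b
Ainv_u = A_inv @ u
x = Ainv_b - Ainv_u * (v @ Ainv_b) / (1.0 + v @ Ainv_u)

assert np.allclose((A + np.outer(u, v)) @ x, b)
```

The same idea generalizes to rank-k updates (Woodbury identity), which is the regime where a BLAS2-based GPU formulation becomes attractive.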
 
Keywords:
Algorithms, Computer-Aided Engineering, GTC 2016 - ID S6129
 
Optimizing Instruction-Bound Kernels in Dissipative Particle Dynamics
Yu-Hang Tang (Division of Applied Mathematics, Brown University)
In this talk, we report algorithmic and instruction-level optimizations used in uDeviceX, a CUDA particle simulator for biomedical microfluidic devices. First, an FMA-intense random number generator (RNG) was proposed by exploiting the chaotic logistic map. This RNG can take advantage of the higher FP-to-integer instruction throughput ratio of CUDA GPUs to generate a large number of high-quality random streams in situ. Second, warp votes and shared memory were used to consolidate workload from diverging warps. Last, inline PTX was used to emulate 24-bit integer arithmetic with floating-point counterparts in order to increase throughput. An implementation using C++ templates ensures that no type-casting overhead is triggered and also guards the technique from unintentional usage.
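The logistic-map RNG idea can be sketched in a few lines; this is a scalar Python illustration of the recurrence only (the uDeviceX version runs many such streams in parallel on the GPU, and the seed here is arbitrary):

```python
def logistic_stream(seed, n, r=4.0):
    """Generate n pseudo-random floats in (0, 1) from the chaotic
    logistic map x_{k+1} = r * x_k * (1 - x_k). With r = 4 the map is
    chaotic, which is what makes it usable as a cheap in-situ RNG; each
    update costs a single fused multiply-add per stream on the GPU."""
    x = seed
    out = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out

stream = logistic_stream(0.3141592, 1000)
assert all(0.0 < v < 1.0 for v in stream)
assert abs(sum(stream) / len(stream) - 0.5) < 0.1   # roughly balanced
```

Raw logistic-map output is not uniformly distributed (it follows an arcsine density), so a production RNG would add a whitening step.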
 
Keywords:
Algorithms, Computational Chemistry, Performance Optimization, GTC 2016 - ID S6140
 
Effective Evaluation of Betweenness Centrality on Multi-GPU Systems
Massimo Bernaschi (National Research Council of Italy)
Learn how to use (multi-)GPU and CUDA to speed up the process of ranking the importance of each node in a large-scale network. You will see how to solve an extraordinary challenge, the exact computation of betweenness centrality, by using relatively simple algorithms, like breadth-first search, as building blocks that have been highly tuned for latest-generation GPU cards. Our approach is fully scalable and overcomes the limitation on the size of the graph that can be studied on a single GPU. We'll present results obtained on both synthetic and real-world graphs.
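The building-block structure the abstract describes, one BFS plus an accumulation sweep per source, is the essence of Brandes' algorithm for exact betweenness centrality; a small CPU reference in Python (not the authors' multi-GPU code) might look like:

```python
from collections import deque

def betweenness(adj):
    """Exact betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph given as {node: [neighbors]}.
    One BFS plus a dependency-accumulation sweep per source:
    exactly the building block a GPU version parallelizes."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}; sigma[s] = 1    # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:                                  # BFS from s
            v = queue.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                     # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    for v in bc:
        bc[v] /= 2.0                                  # undirected: halve
    return bc

# Path graph 0-1-2: only the middle node lies on a shortest path.
path = {0: [1], 1: [0, 2], 2: [1]}
assert betweenness(path) == {0: 0.0, 1: 1.0, 2: 0.0}
```

The outer loop over sources and the inner frontier expansion are both embarrassingly parallel, which is what makes multi-GPU scaling natural.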
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID S6157
 
Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures
Adam McLaughlin (Georgia Institute of Technology)
Contemporary microprocessors use relaxed memory consistency models to allow for aggressive optimizations in hardware. This enhancement in performance comes at the cost of design complexity and verification effort. In particular, verifying an execution of a program against its system's memory consistency model is an NP-complete problem. This session improves upon existing work by introducing an algorithm that not only reduces the time complexity of the verification process, but also facilitates the development of parallel algorithms for solving these problems. For large tests of interest, our GPU implementation achieves an average application speedup of 26x over existing techniques in use at NVIDIA.
 
Keywords:
Algorithms, Big Data Analytics, Tools & Libraries, GTC 2016 - ID S6180
 
Not Just a Universal Crutch: Other Useful Things to Do with atomicCAS
Elmar Westphal (Forschungszentrum Julich GmbH)
There is more to atomicCAS than the double-precision atomicAdd loop from the programming guide, something different from the universal atomic operation loop it represents. We'll show how to build shared-memory-based hash function loops to solve different counting and grouping problems at warp and block level. Variations of this loop can be used to count unique elements in a block, find threads sharing common data elements, or speed up histogram building for large numbers of bins. With atomic operations on shared memory now implemented natively on Maxwell, these functions can be significantly faster than algorithms optimized for other architectures.
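The compare-and-swap hash loop described here can be emulated in Python to show the control flow; `atomic_cas` below is a single-threaded mock of CUDA's atomicCAS, and the table size is arbitrary:

```python
def atomic_cas(table, slot, compare, val):
    """Mock of CUDA's atomicCAS on a hash-table slot: if the slot holds
    `compare`, store `val`; either way, return the old value."""
    old = table[slot]
    if old == compare:
        table[slot] = val
    return old

EMPTY = None

def count_unique(keys, table_size=64):
    """Count distinct keys with an open-addressed CAS hash table,
    mirroring the warp/block-level counting pattern: each thread tries
    to claim a slot, and only the claimer counts the key as new."""
    table = [EMPTY] * table_size
    unique = 0
    for key in keys:
        slot = hash(key) % table_size
        while True:
            old = atomic_cas(table, slot, EMPTY, key)
            if old == EMPTY:           # claimed an empty slot: new key
                unique += 1
                break
            if old == key:             # key already inserted by someone
                break
            slot = (slot + 1) % table_size    # linear probing on collision
    return unique

assert count_unique([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]) == 7
```

On the GPU, many threads run this loop concurrently and the atomicCAS return value arbitrates who "owns" each slot; the sequential mock preserves that logic but not the concurrency.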
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID S6220
 
Hierarchical Computations on Manycore Architectures
Hatem Ltaief (Extreme Computing Research Center, KAUST)
Learn about a new hierarchical matrix structure for fast linear algebra computations on GPUs! Recursivity, tree traversal, hierarchical data layout, and batched kernel executions are some of the ingredients of a new HPC recipe for computing challenging linear algebra operations and solving large scientific problems (e.g., spatial statistics) on GPUs. By exploiting low-rank matrix representations, the original dense matrix of the problem can be approximated, which saves memory footprint and reduces algorithmic complexity while still maintaining adequate solution accuracy. In addition, the talk showcases a new high-performance hierarchical symmetric eigensolver and SVD, juicing the horsepower out of multiple GPUs to the fullest.
 
Keywords:
Algorithms, Performance Optimization, Tools & Libraries, GTC 2016 - ID S6230
 
GPU Accelerated Markov Decision Process in Crowd Simulation
Benjamin Hernandez (Oak Ridge National Laboratory), Sergio Ruiz (Technologico de Monterrey)
Markov decision processes have been used in real-world path planning, where environment information is incomplete or dynamic. The problem with the MDP formalism is that its state space grows exponentially with the number of domain variables, and its inference methods grow with the number of available actions. To overcome this issue, we formulate an MDP solver in terms of matrix multiplications, based on the value iteration algorithm; thus we can take advantage of GPUs to interactively produce obstacle-free paths in the form of an optimal policy. We'll present a performance analysis of our technique using Jetson TK1, CPU, and GPU platforms. Our algorithm achieves a 90x speed-up on GPUs and a 30x speed-up on the Jetson TK1 in contrast with its multi-threaded CPU version.
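Value iteration really does reduce to batched matrix products, which is what makes it GPU-friendly; a toy NumPy sketch on an invented 3-state, 2-action MDP (all transition and reward values are illustrative, not from the talk):

```python
import numpy as np

# Toy MDP: 3 states, 2 actions. P[a] is the transition matrix for action a;
# R[a] holds the immediate rewards. All numbers are illustrative only.
P = np.array([[[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]],
              [[0.2, 0.8, 0.0],
               [0.0, 0.2, 0.8],
               [0.0, 0.0, 1.0]]])
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.5, 1.0]])
gamma = 0.95   # discount factor

V = np.zeros(3)
for _ in range(500):
    # One value-iteration sweep as a batched matrix-vector product:
    # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
    Q = R + gamma * P @ V
    V_new = Q.max(axis=0)        # Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)        # optimal action per state
```

Each sweep is one batched GEMV plus an elementwise max, both of which map directly onto GPU kernels; state 2 here is absorbing under action 1, so its value converges to 1/(1 - gamma) = 20.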
 
Keywords:
Algorithms, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6268
 
XMP Library Internals: Modular Multiplication on Kepler and Maxwell
Niall Emmart (University of Massachusetts)
We'll present an overview of the internals of the XMP multiple-precision library, take a detailed look at the low-level algorithms used for modular squaring and modular multiplication on Kepler, and present novel algorithms for Maxwell. Modular multiplication is a performance-critical primitive widely used in cryptographic algorithms, from prime testing and factorization to public key/private key algorithms such as RSA, Diffie-Hellman, and digital signatures.
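Modular multiplication in such libraries is typically built on Montgomery reduction; a small-number Python sketch of the standard REDC identity (illustrative only; XMP's limb-based CUDA implementation is far more involved):

```python
def montgomery_setup(n, bits=64):
    """Precompute constants for Montgomery arithmetic modulo an odd n."""
    R = 1 << bits
    n_prime = (-pow(n, -1, R)) % R     # n * n' == -1 (mod R)
    return R, n_prime

def mont_mul(a, b, n, R, n_prime, bits=64):
    """Montgomery product: a * b * R^-1 mod n, with no division by n.
    The reduction uses only masks, multiplies, and a shift, which is
    why it maps so well onto fixed-width GPU limb arithmetic."""
    t = a * b
    m = ((t & (R - 1)) * n_prime) & (R - 1)   # m = t * n' mod R
    u = (t + m * n) >> bits                   # t + m*n is divisible by R
    return u - n if u >= n else u

n = 0xFFFFFFFB                       # a small odd modulus
R, n_prime = montgomery_setup(n)
a, b = 123456789, 987654321
a_bar = (a * R) % n                  # convert to Montgomery form
b_bar = (b * R) % n
prod_bar = mont_mul(a_bar, b_bar, n, R, n_prime)
prod = mont_mul(prod_bar, 1, n, R, n_prime)   # convert back
assert prod == (a * b) % n
```

Keeping operands in Montgomery form lets a whole modular exponentiation run with this multiply alone, converting back only once at the end.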
 
Keywords:
Algorithms, Tools & Libraries, GTC 2016 - ID S6349
 
Simulating a Quantum Annealer with GPU-Based Monte Carlo Algorithms
James King (D-Wave Systems)
Learn how the world's most powerful quantum computers are simulated and benchmarked using GPU-based Monte Carlo algorithms. We'll introduce D-Wave's quantum annealing platform, describe several Monte Carlo algorithms for its simulation, and compare CPU- and GPU-based implementations of these algorithms. In particular, we'll focus on considerations of memory layout and fast mathematical functions to maximize speed. Finally, we'll present benchmarking results, including CPU-based algorithms, GPU-based algorithms, and D-Wave's latest-generation quantum annealers.
 
Keywords:
Algorithms, Computational Physics, Supercomputing & HPC, GTC 2016 - ID S6380
 
GPU Acceleration of Cholesky's Factorization in CHOLMOD: Batching, Hybrid and Multi-GPU
Steven Rennich (NVIDIA)
Sparse matrix factorization is a fundamental tool in scientific computing and has been shown to be well accelerated using GPUs. Yet applying the full capability of the GPU to the factorization operation remains a challenge. This talk covers the latest GPU optimizations that have been applied to the Cholesky factorization algorithm within the well-known SuiteSparse/CHOLMOD linear solver. These optimizations include new NVIDIA CUDA versions of BLAS and LAPACK routines to accelerate operations on batches of small, non-uniformly sized matrices, hybrid computing enhancements, support for multi-GPU acceleration, and further avoidance of PCIe communication through refinements to the sub-tree algorithm.
 
Keywords:
Algorithms, Performance Optimization, Tools & Libraries, GTC 2016 - ID S6387
 
Fast Detection of Neighboring Vectors
Krzysztof Kaczmarski (Warsaw University of Technology, Faculty of Mathematics and Information Science)
We'll present several methods for detecting pairs of vectors that are at Hamming distance 1. This problem is an important part of cell graph construction in motion planning in a space with obstacles. We'll begin with a naive quadratic-time solution that simply compares pairs of vectors, proceed through dedicated search trees, and move towards an optimal linear algorithm. Sequential linear-time algorithms for the problem were already known, but due to high constants hidden in the complexity function, they proved not very efficient for real-life data. Our GPU-based massively parallel solution promises acceptable execution times, opening dynamic cell graph construction to real-time applications like robotics and optimal path searching.
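One standard way to reach linear time for this problem is wildcard bucketing: hash each vector once per position with that position masked, so two vectors share a bucket exactly when they agree everywhere else. A Python sketch of that idea (a generic technique, not necessarily the authors' method):

```python
from collections import defaultdict

def hamming_one_pairs(vectors):
    """Find all pairs of equal-length vectors at Hamming distance
    exactly 1. Each vector is hashed once per position with that
    position wildcarded, so candidates collide in a bucket only if
    they differ at most at the wildcarded position: O(n * d) keys."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        for pos in range(len(vec)):
            key = (pos, vec[:pos] + ('*',) + vec[pos + 1:])
            buckets[key].append(idx)
    pairs = set()
    for (pos, _), members in buckets.items():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                a, b = vectors[members[i]], vectors[members[j]]
                if a[pos] != b[pos]:          # distance exactly 1, not 0
                    pairs.add((members[i], members[j]))
    return pairs

# Cells as bit tuples; pairs at distance 1 become cell-graph edges.
cells = [(0, 0, 1), (0, 1, 1), (1, 1, 1), (0, 0, 1)]
assert hamming_one_pairs(cells) == {(0, 1), (1, 2), (1, 3)}
```

Bucket construction is independent per vector and per position, which is what makes the scheme attractive for a massively parallel GPU implementation.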
 
Keywords:
Algorithms, Tools & Libraries, Robotics & Autonomous Machines, GTC 2016 - ID S6402
Streaming:
 
Accelerating Approximate Weighted Matching on GPUs
Antonino Tumeo (Pacific Northwest National Laboratory)
Matching is a fundamental graph problem with numerous applications in science and engineering. This talk discusses the efficient implementation of half-approximate weighted matching on GPUs. We start by describing the Suitor algorithm, currently considered the best algorithm for this problem, and identifying its key implementation challenges. In its basic formulation, the Suitor algorithm appears poorly suited to GPUs, due to its irregular memory accesses and use of locks. We proceed by introducing four variants of the algorithm that progressively address these challenges by exploiting Kepler's hardware features. We demonstrate that the final implementation outperforms the previous best matching algorithms for GPUs, as well as the Suitor algorithm on CPUs, by several times.
 
Keywords:
Algorithms, Big Data Analytics, Aerospace & Defense, GTC 2016 - ID S6423
Streaming:
Download:
 
Exploring Scalable Implementations of Triangle Enumeration in Graphs of Diverse Densities: Apache Spark vs. GPUs
Michela Taufer (University of Delaware), Travis Johnston (University of Delaware)
We'll present graphs as powerful tools for analyzing complex relationships between entities. Many structures commonly found in computer science, like social networks, computer networks, and the world wide web, can be modeled as graphs. Since many real graphs are very large and complex, the associated analysis algorithms must be very efficient and highly parallel. We present two implementations of a key graph-based analysis, triangle enumeration, for two different parallel paradigms: GPU programming and Apache Spark. We'll reveal how the performance of the two implementations changes as the characteristics of the graph change.
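For context, a minimal sequential sketch of the analysis in question: counting triangles by intersecting the neighbor sets of each edge's endpoints (both the GPU and Spark versions parallelize a variant of this core operation; the code below is our simplification, not the authors' implementation).

```python
def count_triangles(adj):
    """Count triangles by intersecting the neighbor sets of each edge (u, v),
    keeping only common neighbors w > v so each triangle is counted once."""
    nbr = {u: set(vs) for u, vs in adj.items()}
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in nbr[u] & nbr[v] if w > v)
    return total
```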
 
Keywords:
Algorithms, Tools & Libraries, Big Data Analytics, GTC 2016 - ID S6424
Streaming:
Download:
 
GPU-Oriented Sparse Multifrontal QR Method
Wissam Sid-Lakhdar (Texas A&M University)
We'll present a sparse direct method: a multifrontal QR factorization designed specifically for GPU accelerators. Our approach relies on a bucket scheduler that exploits irregular parallelism at both a coarse grain, among a set of fronts with different characteristics, and a fine grain, through the staircase shape of these fronts. The scheduler then relies on dense GPU kernels whose design and implementation target recent GPU architectures.
 
Keywords:
Algorithms, Performance Optimization, Tools & Libraries, GTC 2016 - ID S6439
Streaming:
 
Quotient Filters: Approximate Membership Queries on the GPU
Afton Geil (UC Davis)
Most GPU data structures must be rebuilt (often on the CPU) any time they are modified. We'll examine the challenges of building and maintaining mutable data structures on the GPU, and present our solution for one particular data structure: the quotient filter. A quotient filter is used for performing fast database queries, similar to a Bloom filter. We describe our search for an efficient parallelization of construction, insertion, and query operations on the quotient filter data structure. We show that this data structure can outperform a Bloom filter for database lookups and insertions, while also providing much greater flexibility.
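To convey the quotienting idea only: a toy sketch that splits each hash into a quotient (bucket index) and a remainder (the stored fingerprint). A real quotient filter packs remainders into a single flat array with three metadata bits per slot; the dict-of-sets below is our simplification and not the authors' data structure.

```python
class ToyQuotientFilter:
    """Illustrative only: stores the remainder of each item's hash in a bucket
    chosen by the quotient (low q bits). Membership answers may be false
    positives, as with a Bloom filter, but never false negatives."""
    def __init__(self, q=8, rbits=8):
        self.q, self.rbits = q, rbits
        self.buckets = {}

    def _split(self, item):
        h = hash(item) & ((1 << (self.q + self.rbits)) - 1)
        return h & ((1 << self.q) - 1), h >> self.q   # (quotient, remainder)

    def insert(self, item):
        quo, rem = self._split(item)
        self.buckets.setdefault(quo, set()).add(rem)

    def may_contain(self, item):
        quo, rem = self._split(item)
        return rem in self.buckets.get(quo, ())
```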
 
Keywords:
Algorithms, Big Data Analytics, GTC 2016 - ID S6464
Streaming:
Download:
 
Testing Chordal Graphs with CUDA?
Agnieszka Lupinska (Jagiellonian University)
We'll present a CUDA implementation of an algorithm to test the chordality of graphs, which uses parallel partition refinement with pivots. A graph is chordal if each cycle of length greater than three has a chord, that is, an edge between two non-adjacent vertices on the cycle. In total, the algorithm takes O(N) time on an N-thread grid and performs O(N+M) work for graphs of N vertices and M edges. We'll compare the performance test results achieved by the CUDA implementation on an NVIDIA GeForce GTX TITAN X against a sequential implementation on a four-core (eight-thread) CPU. We'll present test results for cliques, sparse graphs, dense graphs, and random chordal graphs.
 
Keywords:
Algorithms, Big Data Analytics, GTC 2016 - ID S6489
Streaming:
Download:
 
High-Performance Batched Computations for GPUs: Approaches and Applications
Stanimire Tomov (UTK), Azzam Haidar (UTK)
Learn techniques for efficient batched computations on GPUs, where small, independent computations must be grouped and executed together to obtain high performance. These problems occur very frequently in scientific applications like machine learning, data mining, dense and sparse solvers, high-order FEM, astrophysics, and more. We will consider the development of batched computations for these applications, stressing innovative GPU techniques and algorithms for uniform as well as variable-size batches, tensor contractions, batched BLAS, and more. Batched computations can fill up the GPU with work and remove scheduling overheads and costly CPU-GPU communications, often accelerating the computation by an order of magnitude compared to non-batched approaches.
 
Keywords:
Algorithms, Tools & Libraries, Performance Optimization, GTC 2016 - ID S6509
Streaming:
Download:
 
GPU Optimization of the Kripke Neutral-Particle Transport Mini-App
David Appelhans (IBM)
For Sierra, a pre-exascale CORAL supercomputer arriving at Lawrence Livermore National Lab in 2017, neutral-particle transport codes will be a primary application, and ensuring peak performance of these applications on this system (multiple IBM POWER9 CPUs plus multiple Volta GPUs per node) is important. In preparation, transport mini-apps like Kripke are being optimized on today's hybrid CPU-GPU clusters using different programming models. This talk discusses performance issues encountered by Kripke on these systems and their solutions. Specifically, we will focus on: a) a novel implementation of the sweep algorithm; b) techniques useful for modeling physical problems whose memory footprint exceeds the aggregate GPU memory; and c) porting Kripke using OpenMP 4.
 
Keywords:
Algorithms, Computational Physics, Supercomputing & HPC, GTC 2016 - ID S6513
Streaming:
Download:
 
GPU Multisplit
Saman Ashkiani (University of California, Davis)
Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous buckets, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on GPUs, programmers often use a sort instead. However, sort does more work than necessary to implement multisplit, and is thus inefficient. In this work, we provide a parallel model and multiple implementations for the multisplit problem, with a focus on a small number of buckets. In our implementations, we exploit the computational hierarchy of the GPU to perform most of the work locally, with minimal use of global operations. We use warp-synchronous programming models as well as hierarchical reordering of input elements to achieve better performance.
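The primitive itself is easy to state sequentially: count per-bucket sizes, take an exclusive prefix sum for the bucket offsets, then stably scatter. The sketch below shows that structure (a GPU multisplit computes the same counts and scans hierarchically; this is not the authors' implementation).

```python
def multisplit(items, num_buckets, bucket_of):
    """Stable multisplit: permute items so each bucket's elements are
    contiguous, via a counting pass and an exclusive prefix sum."""
    counts = [0] * num_buckets
    for x in items:
        counts[bucket_of(x)] += 1
    # Exclusive prefix sum gives each bucket's starting offset.
    offsets, run = [], 0
    for c in counts:
        offsets.append(run)
        run += c
    out = [None] * len(items)
    for x in items:                      # stable scatter
        b = bucket_of(x)
        out[offsets[b]] = x
        offsets[b] += 1
    return out
```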
 
Keywords:
Algorithms, GTC 2016 - ID S6517
Streaming:
Download:
 
Portable Performance for Monte Carlo Simulations of Photon Migration in 3D Turbid Media for Single and Multiple GPUs
Leiming Yu (Northeastern University), Fanny Nina Paravecino (Northeastern University)
We present a parallel Monte Carlo (MCX) algorithm accelerated by GPUs for modeling time-resolved photon migration in 3D turbid media. We'll present optimizations that benefit execution on a single GPU as well as on multiple GPUs. By leveraging persistent threads, our single-GPU implementation provides a high-performance parallel simulation of MCX when run on an NVIDIA GPU, and is automatically tuned for different GPU architectures. We achieved improvements of over 25% on Kepler and 12% on Maxwell architectures compared to a heuristic approach. In addition, we propose a linear programming approach based on predictive modeling to optimize MCX execution on multiple devices.
 
Keywords:
Algorithms, Performance Optimization, Rendering & Ray Tracing, GTC 2016 - ID S6635
Streaming:
Download:
 
Training Recurrent Neural Networks in FP16
Erich Elsen (Baidu USA, Inc.)
Reducing training time allows us to learn from our experiments more quickly and make new innovations based on what we've learned. Using fewer than the standard 32 bits to represent a number can help reduce training times. We'll talk about how to use 16-bit floating point, because it is starting to have wide hardware support with the release of Pascal. Unfortunately, naively converting all datatypes from 32 to 16 bits doesn't work, as training stability and accuracy are compromised. We'll discuss the reasons for these difficulties and their solutions. Finally, we'll show performance and scalability improvements due to using reduced precision.
 
Keywords:
Algorithms, Deep Learning & Artificial Intelligence, Performance Optimization, GTC 2016 - ID S6661
Streaming:
Download:
 
Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions
Guy Steele (Oracle Labs)
We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model for machine learning, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses. From this table, complete partial sums are computed on the fly during a binary search. Measurements using an NVIDIA TITAN Black GPU show that for a sufficiently large number of clusters or topics (K > 200), this technique alone more than doubles the speed of a latent Dirichlet allocation (LDA) application already highly tuned for GPU execution.
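For reference, the standard approach the butterfly-patterned table improves on: materialize the complete table of partial sums, then draw by binary-searching a uniform variate. This baseline (our sketch, not the talk's technique) is what the on-the-fly computation replaces.

```python
import bisect
import random

def make_sampler(weights, rng=random.Random(42)):
    """Draw indices with probability proportional to weights by precomputing
    the full partial-sum table and binary-searching a uniform variate."""
    partial, run = [], 0.0
    for w in weights:
        run += w
        partial.append(run)
    total = partial[-1]

    def draw():
        u = rng.random() * total
        return bisect.bisect_right(partial, u)   # first index with partial > u
    return draw
```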
 
Keywords:
Algorithms, Performance Optimization, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6665
Streaming:
Download:
 
Fast Splittable Pseudorandom Number Generators
Guy Steele (Oracle Labs)
We describe two new classes of algorithms for a "splittable" pseudorandom number generator (PRNG) that is quite fast: either 9 or 11 64-bit arithmetic/logical operations per 64 bits generated. A splittable PRNG provides a "split" operation that creates a new PRNG that is computationally and statistically independent of its creator and may therefore be used in parallel. Splittable PRNG objects make it easy to organize the use of pseudorandom numbers in multithreaded programs where the number of threads may vary dynamically, but also have sufficient speed and quality to be useful when the number of threads is fixed. The generator is faster than MRG32k3a and of higher quality than XORWOW. No locking or synchronization is required, and the algorithm is quite suitable for SIMD or GPU implementation.
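A sketch in the spirit of the published SplitMix64 design (the talk's two algorithm classes may differ in detail): a seed advanced by an odd "gamma" increment, outputs produced by bit-mixing, and split() deriving a child's seed and gamma from the parent's own stream.

```python
MASK = (1 << 64) - 1
GOLDEN_GAMMA = 0x9E3779B97F4A7C15

def _mix64(z):
    """Bit-mixing finalizer used by SplitMix64."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
    return z ^ (z >> 31)

class SplittablePRNG:
    """Splittable generator sketch: split() consumes two outputs of the parent
    to seed a child whose stream is computationally independent."""
    def __init__(self, seed=0, gamma=GOLDEN_GAMMA):
        self.seed, self.gamma = seed & MASK, gamma | 1   # gamma must be odd

    def next64(self):
        self.seed = (self.seed + self.gamma) & MASK
        return _mix64(self.seed)

    def split(self):
        return SplittablePRNG(self.next64(), self.next64())
```

No locking is needed because each generator object owns its entire state, matching the lock-free property described above.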
 
Keywords:
Algorithms, Tools & Libraries, Performance Optimization, GTC 2016 - ID S6666
Streaming:
Download:
 
GPU Accelerated Streaming Algorithms for Halo Finders
Nikita Ivkin (Johns Hopkins University)
In this work, we show the connection between two problems: halo finding and heavy hitters. Finding haloes, dense clumps of matter, in the output of cosmological simulations is crucial for verifying theoretical models against observation. Current algorithms require loading the full dataset into memory, making the computation infeasible on a desktop machine. We reduce the halo-finding problem to the problem of finding the most frequent items (heavy hitters) in streaming data, and apply two algorithms: Pick-and-Drop and Count Sketch. These algorithms can find the top 1,000 largest haloes with logarithmic memory usage, but their time performance is poor. GPU acceleration makes it possible to make several passes in reasonable time, thus helping to find more haloes in the future.
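A minimal version of one of the two streaming algorithms named above, Count Sketch: d hash rows, each adding the item with a random sign, with the frequency estimate taken as the median across rows. This is our textbook-style sketch, not the poster's tuned implementation.

```python
import statistics

class CountSketch:
    """Minimal Count Sketch: each item hashes to one counter per row and is
    added with a pseudorandom sign; estimate() returns the median across rows
    of sign * counter, which bounds the error from colliding items."""
    def __init__(self, depth=5, width=256):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, row, item):
        col = hash((row, item)) % self.width
        sign = 1 if hash((row, "sign", item)) % 2 else -1
        return col, sign

    def add(self, item, count=1):
        for r in range(self.depth):
            col, sign = self._hash(r, item)
            self.table[r][col] += sign * count

    def estimate(self, item):
        vals = []
        for r in range(self.depth):
            col, sign = self._hash(r, item)
            vals.append(sign * self.table[r][col])
        return statistics.median(vals)
```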
 
Keywords:
Algorithms, Astronomy & Astrophysics, GTC 2016 - ID S6671
Streaming:
 
ABCD Algorithm for Tridiagonal Solver
Erh-Chung Chen (National Tsing Hua University)
We study and implement the Augmented Block Cimmino Distributed (ABCD) algorithm on the GPU. Because of the special structure of tridiagonal matrices, we investigate a boundary padding technique to eliminate execution branches on the GPU for better performance. In addition, our implementation incorporates various performance optimization techniques, such as memory coalescing, to further enhance performance.
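For readers unfamiliar with tridiagonal solvers, the classic sequential baseline is the Thomas algorithm, an O(n) forward elimination plus back substitution; its serial data dependence is exactly why distributed schemes like ABCD are of interest on GPUs. This reference sketch is not the ABCD algorithm itself.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system: a = sub-diagonal (a[0] unused),
    b = main diagonal, c = super-diagonal (c[-1] unused), d = right-hand side."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```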
 
Keywords:
Algorithms, Supercomputing & HPC, GTC 2016 - ID P6120
Download:
 
Non-Local Lattice Encoding for Bit-Vectorized Cellular Automata GPU Implementations
Jeffrey Kelling (Helmholtz-Zentrum Dresden-Rossendorf)
In many areas, from physics to economics to the social sciences, there are problems that can be mapped to stochastic cellular automata (SCA). In combination with machine learning techniques, cellular automata with learned rules can be used to efficiently predict real-world systems. In physics, they are used to study atomistically the size and shape evolution of micro- and nanostructures, providing insights into processes of self-organization crucial to today's nanotechnology. We present an extremely efficient SCA implementation of a surface growth model using bit-vectorization enhanced by non-local encoding on the GPU. The employed technique and non-local encoding can be transferred to other applications.
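The bit-vectorization idea can be shown with a deterministic elementary automaton (the poster's model is stochastic and GPU-mapped; this is only the core trick): storing a whole row of cells as the bits of one integer lets shifts and XOR update every cell simultaneously.

```python
def rule90_step(state, nbits):
    """One step of elementary cellular automaton Rule 90, where every cell
    becomes the XOR of its two neighbors. With the row packed into an integer,
    a single shift-and-XOR expression updates all cells at once."""
    mask = (1 << nbits) - 1
    return ((state << 1) ^ (state >> 1)) & mask
```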
 
Keywords:
Algorithms, Computational Physics, GTC 2016 - ID P6124
Download:
 
Fully Parallelized Lossless LZW Decompression for CUDA® Enabled GPUs
Koji Nakano (Hiroshima University)
LZW is a popular lossless compression method used in the UNIX file compression utility "compress" and in the GIF/TIFF image formats. However, it is very hard to parallelize, because it builds its dictionary sequentially while reading the input data one symbol at a time. The main contribution of this work is a fully parallelized LZW decompression, which assigns each thread to an input compressed code and converts it into the corresponding original input string. We have implemented our fully parallelized LZW decompression using CUDA. The experimental results show that our CUDA implementation on a GeForce GTX 980 can attain a 40 times speedup over a sequential implementation on an Intel Core i7-4790. We also show that our LZW decompression is useful for big data and deep learning applications.
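The sequential baseline being parallelized looks like this (our sketch of the standard algorithm, not the authors' CUDA code); note how the dictionary grows one entry per code, which is the serial dependence the poster's per-code parallel decoder must break.

```python
def lzw_compress(data: bytes):
    """Sequential LZW encoder: grow a dictionary of byte strings, emit codes."""
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    """Sequential LZW decoder. The 'code == len(table)' branch handles the
    classic KwKwK corner case, where a code is used before it is fully known."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]
        else:
            raise ValueError("bad LZW code")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)
```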
 
Keywords:
Algorithms, Video & Image Processing, GTC 2016 - ID P6128
Download:
 
Fast Sparse Matrix Vector Multiplication with Highly-Compressed Sparse Format
Yusuke Nagasaka (Tokyo Institute of Technology)
We show the acceleration of sparse matrix-vector multiplication (SpMV) on the GPU by greatly reducing memory traffic. SpMV is a dominant kernel in many sparse algorithms. Its performance is limited by memory bandwidth, and the low locality of memory accesses to the input vector causes further degradation. We propose a new sparse matrix format that alleviates these memory bottlenecks through adaptive multi-level blocking techniques and compression of the matrix indices. Performance evaluations of SpMV on 40 matrix datasets show speedups of up to 2.91x, and 1.81x on average, compared to NVIDIA's cuSPARSE library. We also find that the memory traffic in SpMV can be estimated, and that SpMV performance strongly depends on it.
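For reference, SpMV over the common CSR layout looks like this (a minimal sketch; the poster's contribution is a new compressed format, not this baseline). The value and index arrays read here are exactly the memory traffic the proposed format compresses.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x with A in CSR form: row_ptr[i]:row_ptr[i+1] slices out the
    nonzeros of row i; col_idx gives each nonzero's column."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```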
 
Keywords:
Algorithms, Supercomputing & HPC, GTC 2016 - ID P6132
Download:
 
High Performance Hierarchical Matrix-Vector Multiplication using Hardware Accelerators
Hatem Ltaief (KAUST)
We present a high-performance hierarchical matrix-vector multiplication using hardware accelerators. By properly mapping the tree structures to the GPU and overlapping the phases of the computation using streams, we greatly outperform CPU implementations and achieve up to 80% of the sustained bandwidth of the GPU.
 
Keywords:
Algorithms, Supercomputing & HPC, GTC 2016 - ID P6140
Download:
 
GPU-Accelerated Isosurface Extraction
Marcin Adamski (Poznan Supercomputing and Networking Center), Michal Kierzynka (Poznan Supercomputing and Networking Center)
Algorithms for isosurface extraction from volumetric data have become crucial in the petroleum industry, medicine, and many other fields in recent years. They are computationally intensive, especially for large, high-resolution domains. Our GPU implementation of the Marching Tetrahedra algorithm is not only immensely fast but also allows the domain to be split across multiple GPUs. Processing of large domains is now a matter of seconds, and for smaller domains the algorithm computes the isosurface in milliseconds, with the resulting model visualized in real time.
 
Keywords:
Algorithms, Medical Imaging, GTC 2016 - ID P6141
Download:
 
Fourier Domain Pulsar Acceleration Searches on GPUs for the Square Kilometre Array.
Sofia Dimoudi (University of Oxford)
We describe work done at the Oxford e-Research Centre (OeRC) at Oxford University toward accelerating one of the most demanding computational tasks of the real-time pulsar signal processing pipeline of the world's largest next-generation radio telescope, the Square Kilometre Array (SKA). We introduce the problem of pulsar acceleration searches and a Fourier-domain computational method for detecting signals from accelerated pulsars. A GPU implementation and optimization results are presented in the context of the SKA timing requirements. This work is done as part of Astro-Accelerate, a real-time time-domain data processing library currently under development at the OeRC.
 
Keywords:
Algorithms, Astronomy & Astrophysics, GTC 2016 - ID P6227
Download:
 
A Highly Parallel Implementation of the Faddeev-Leverrier Algorithm
Rahul Chandrashekhar (Trinity College, Hartford - CT)
We present an accelerated implementation of the Faddeev-Leverrier algorithm (FLA) for solving the eigenvalue problem. The problem, being recursive in nature, cannot be directly extended to a parallel implementation. Instead, a hybrid model is implemented to harness the combined computing power of the CPU and GPU more effectively.
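The recursion in question is short enough to show directly (our sketch of the standard recurrence, not the poster's hybrid implementation): each step depends on the previous trace, which is the serial dependence the hybrid CPU/GPU model works around.

```python
def faddeev_leverrier(A):
    """Coefficients [1, c1, ..., cn] of det(lambda*I - A) via the
    Faddeev-Leverrier recurrence: M_1 = A, c_k = -tr(M_k)/k,
    M_{k+1} = A (M_k + c_k I). Inherently sequential in k."""
    n = len(A)

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    M = [row[:] for row in A]
    coeffs = [1.0]
    for k in range(1, n + 1):
        c = -sum(M[i][i] for i in range(n)) / k
        coeffs.append(c)
        if k < n:
            for i in range(n):
                M[i][i] += c           # M_k + c_k I
            M = matmul(A, M)           # A (M_k + c_k I)
    return coeffs
```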
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID P6230
Download:
 
Fast Parallel Bulk Insertion in GPU MOLAP Databases
Steffen Wittmer (Jedox AG)
This work focuses on input processing of big data streams in a GPU-accelerated in-memory OLAP (MOLAP) database by Jedox. We present a solution that supports fast insertion of high data volumes by avoiding the compute-expensive task of multidimensional sorting during the actual insertion phase. The main processing step achieves a significant speedup over the existing CPU-only version.
 
Keywords:
Algorithms, Big Data Analytics, GTC 2016 - ID P6256
Download:
 
cusFFT: A High-Performance Sparse Fast Fourier Transform Algorithm on GPUs
Cheng Wang (University of Houston)
The Fast Fourier Transform (FFT) is one of the most important numerical tools, widely used in many scientific and engineering applications. The algorithm performs O(N log N) operations on N input data points even when only a small number k of the output coefficients are large and the remaining N-k are zero or negligibly small. The algorithm is clearly inefficient when N input points lead to only k << N non-zero coefficients in the transformed domain. The sparse FFT (sFFT) algorithm provides a solution to this problem. In this poster, we present a parallel sFFT algorithm on GPUs using CUDA. Our CUDA-based sFFT, namely cusFFT, performs over 10x faster than the state-of-the-art cuFFT library on GPUs and over 28x faster than the parallel FFTW on multicore CPUs.
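For contrast with the sparse approach, here is the dense O(N log N) baseline: a minimal recursive radix-2 Cooley-Tukey FFT (our sketch, unrelated to the cusFFT implementation). It computes all N outputs regardless of how many are negligible, which is exactly the waste sFFT avoids.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t              # butterfly: top half
        out[k + n // 2] = even[k] - t     # butterfly: bottom half
    return out
```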
 
Keywords:
Algorithms, GTC 2016 - ID P6261
Download:
 
Radar Signal Processing on GPUs and Performance Comparison with Vector Processors
Peter Joseph Basil Morris (Defense Research & Development Organisation)
We investigate the computing capabilities of GPUs for radar signal processing applications through the realization of a radar signal processor on a GPU, leveraging the inherent parallelism of radar signal processing algorithms and the extensive computing capability of the GPU.
 
Keywords:
Algorithms, Signal & Audio Processing, GTC 2016 - ID P6264
Download:
 
One Kernel To Rule Them All. Performance-Portable FMM for CPUs and GPUs
Ivo Kabadshow (Juelich Supercomputing Centre)
We focus on a performance-portable C++ implementation of a scientific algorithm, using a single code base that runs on both CPUs and GPUs. For that purpose, we present our core algorithm, the fast multipole method, embedded in a stack of abstraction layers that allows us to achieve portability without maintaining separate kernels for each architecture. In addition, we'll review common implementation pitfalls that may help other developers aiming at a unified code base; memory allocation, memory access, and the abstraction of SIMT for complex user-defined data structures are investigated in particular. Finally, we present results comparing performance on a CPU and a GPU.
 
Keywords:
Algorithms, Supercomputing & HPC, GTC 2016 - ID P6265
Download:
 
A Parallel Floyd-Warshall Algorithm on GPU
Roussian Gaioso (Universidade Federal de Sao Carlos)
We propose a new parallel algorithm for solving the all-pairs shortest paths (APSP) problem. The algorithm is based on Floyd-Warshall and therefore inherits some of its advantages, such as predictable performance regardless of the underlying graph structure. It was efficiently implemented on a machine with a many-core GPU, which is less expensive than a cluster of computers. The tests were performed on a Tesla C2075 graphics card. The implementation was able to identify the shortest paths among all pairs of vertices of randomly generated graphs (each containing a maximum of 8,192 vertices) in less than 15 seconds, which represents a speedup of 150x over the sequential Floyd-Warshall algorithm.
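The sequential algorithm being parallelized is compact (our sketch of the textbook version, not the poster's GPU code): the outer k-loop must run in order, but every (i, j) update within one k iteration is independent, which is the parallelism a GPU exploits.

```python
def floyd_warshall(dist):
    """All-pairs shortest paths in place on a dense distance matrix, with
    float('inf') marking missing edges."""
    n = len(dist)
    for k in range(n):                       # sequential outer loop
        for i in range(n):                   # independent work per k
            for j in range(n):
                alt = dist[i][k] + dist[k][j]
                if alt < dist[i][j]:
                    dist[i][j] = alt
    return dist
```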
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID P6272
Download:
 
Parallelization of Graph Algorithms on GPU Using CUDA®
Chetan Pise (Yeshwantrao Chavan College of Engineering Nagpur, India)
Graphs play a very important role in science and technology, for example in finding shortest distances. Large graphs are common in scientific and engineering applications, involving operations on millions of vertices and edges. For faster execution of such operations, parallel computation is essential. GPUs offer high computational power at a low price, and CUDA is becoming a popular programming approach for GPGPUs. A multithreaded CUDA device runs many threads in parallel on the GPU. We demonstrate a comparison between serial and parallel implementations of the BFS and Dijkstra algorithms.
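Of the two algorithms compared, BFS is the one with the most natural GPU formulation: level-synchronous frontier expansion, where every vertex in the current frontier can be processed by a separate thread. A sequential sketch of that formulation (ours, not the poster's CUDA code):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: expand the whole frontier at once; returns a
    dict mapping each reachable vertex to its distance from source."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        nxt = []
        for u in frontier:              # each u is independent work
            for v in adj[u]:
                if v not in level:
                    level[v] = depth
                    nxt.append(v)
        frontier = nxt
    return level
```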
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID P6285
Download:
 
Evolutionary Methodology Framework for GPUs
Mihaly Retek (Corvinus University of Budapest)
In evolutionary methods, many processes of the same type can be executed in parallel, each connected to different source and target datasets. For this reason, these methods are well suited to SIMD architectures. This poster shows an evolutionary framework in which evolutionary algorithms can be developed for GPUs and CPUs. The "Implemented Method" section of this poster is the foundation of this methodology and allows for the creation of more advanced forecasting.
 
Keywords:
Algorithms, Deep Learning & Artificial Intelligence, GTC 2016 - ID P6286
Download:
 
A Rasterization Based Line Segment Intersection Algorithm for Urban Mobility Simulations
Benjamin Hernandez (Oak Ridge National Laboratory)
Road network data is an important component used to model city mobility. However, when using volunteered geographic information, such as OpenStreetMap, road intersections are often incomplete or invalid. A line segment intersection algorithm can correct this issue, but the naive algorithm has O(N^2) complexity, and one of the best solutions, the Bentley-Ottmann algorithm, O(N log N). We propose a GPGPU alternative that uses OpenGL 4 rasterization, per-pixel linked lists, and almost-zero-driver-overhead functions. Results show our method offers a speedup of 87x over these algorithms.
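The per-pair test at the heart of the naive O(N^2) approach is the standard orientation predicate (our sketch for context; the poster replaces the pairwise search with rasterization, not this test):

```python
def orient(p, q, r):
    """Sign of the cross product (q-p) x (r-p): 1 = left turn, -1 = right
    turn, 0 = collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_intersect(a, b, c, d):
    """Proper-intersection test for segments ab and cd: each segment must
    straddle the line through the other. (Collinear overlaps need extra
    handling, omitted here.)"""
    return (orient(a, b, c) != orient(a, b, d) and
            orient(c, d, a) != orient(c, d, b))
```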
 
Keywords:
Algorithms, Performance Optimization, GTC 2016 - ID P6345
 
GPU-Accelerated Molecular Dynamics Simulations for Systems with Lennard-Jones Type Potential
Jose Maria Zamora (Lufac Computacion S.A. de C.V.)
This work shows an implementation of a basic algorithm to study molecular systems with interactions of Lennard-Jones type potential. We present a parallelization strategy using CUDA to accelerate the computations on a GPU. Reviewing the results of simulations with a large number of particles (about 1 million), we observe different states of equilibrium depending on the initial arrangement of particles. The cause of these trajectories is that the initial arrangements have different values of total energy due to pressure differences (a microscopic system variable) depending on the initial geometric configuration of particles in the cubic simulation box. These differences are accentuated when the number of particles exceeds 10^5.
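For context, the 12-6 Lennard-Jones interaction named in the abstract has a standard pairwise energy and force; a minimal serial Python sketch of the O(N^2) evaluation that such codes accelerate (illustrative only; the defaults are reduced units, not values from the poster):

```python
import numpy as np

def lj_energy_forces(pos, epsilon=1.0, sigma=1.0):
    """Total 12-6 Lennard-Jones energy and per-particle forces.
    pos: (N, 3) array of positions (no cutoff, no periodic box)."""
    n = len(pos)
    forces = np.zeros_like(pos)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r_vec = pos[i] - pos[j]
            r2 = float(np.dot(r_vec, r_vec))
            inv_r6 = (sigma * sigma / r2) ** 3      # (sigma/r)^6
            energy += 4.0 * epsilon * (inv_r6 * inv_r6 - inv_r6)
            # Force on i from j: -dU/dr along r_vec; Newton's third law for j.
            f = 24.0 * epsilon * (2.0 * inv_r6 * inv_r6 - inv_r6) / r2 * r_vec
            forces[i] += f
            forces[j] -= f
    return energy, forces
```

At the potential minimum r = 2^(1/6) sigma the pair energy is -epsilon and the force vanishes, a convenient sanity check.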
 
Keywords:
Algorithms, Computational Physics, GTC 2016 - ID P6351
 
GPU-Accelerated Batch-ACPF Solution for N-1 Static Security Analysis
Gan Zhou (Southeast University)
GPUs have been applied successfully in many scientific computing realms and have great potential in power system applications. N-1 static security analysis (SSA) appears to be a candidate application, in which massive numbers of alternating current power flow (ACPF) problems need to be solved. However, when applying existing GPU-accelerated algorithms to the N-1 SSA problem, the degree of parallelism is limited because existing research has been devoted to accelerating the solution of a single ACPF. This paper proposes a GPU-accelerated solution that creates an additional layer of parallelism among batch ACPFs and consequently achieves a much higher level of parallelism. In comparison to its CPU counterpart on a Xeon E5-2620, the GPU framework solving SSA on a Tesla K20C achieves up to a 57.6X speedup.
 
Keywords:
Algorithms, Other, GTC 2016 - ID P6109
Download:
 
CUDA Accelerated Cross Validated Best Subset Selection with XLSTAT
Arnaud Belletoile (Addinsoft)
Our implementation of a cross-validated best subset selection in linear regressions is presented. This algorithm is the latest GPU-enabled feature made available in our statistical solution XLSTAT. It is based on the binary tree regressions first proposed by Furnival & Wilson and is implemented through a QR factorization and subsequent updates of the R matrix using the cuSolver library. The last step of our model selection is done by a leave-one-out cross-validation test.
 
Keywords:
Algorithms, Other, GTC 2016 - ID P6194
Download:
 
GPU Parallelization of a Distance Field Solver
Anup Shrestha (Boise State University)
Propagating interfaces occur in a wide variety of fields, including fluid mechanics and computer graphics. The distance field from an interface can be calculated by solving the Eikonal equation at each node using the Fast Sweeping Method (FSM) [Zhao, 2004]. However, parallelization of FSM is not straightforward. We propose a parallel algorithm using Cuthill-McKee ordering that is suitable for massively threaded architectures. Here, we implement and compare different parallel algorithms for FSM using CUDA, OpenACC, and MPI. The maximum performance is achieved using CUDA and the parallel algorithm of Detrixhe et al., whereas a comparable speedup was achieved using OpenACC with a few directives, substantially shortening the development cycle.
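As a reference for what FSM computes, here is a minimal serial Python sketch of Zhao's fast sweeping method for |grad u| = 1 on a 2D grid (the poster's contribution is the parallel ordering, which this serial sketch does not show):

```python
import numpy as np

def fast_sweep_distance(mask, h=1.0, sweeps=4):
    """Distance field u with |grad u| = 1 via fast sweeping on a 2D grid.
    mask: boolean array, True at interface nodes (where u = 0)."""
    ny, nx = mask.shape
    big = 1e10
    u = np.where(mask, 0.0, big)
    orders = [(range(ny), range(nx)),
              (range(ny), range(nx - 1, -1, -1)),
              (range(ny - 1, -1, -1), range(nx)),
              (range(ny - 1, -1, -1), range(nx - 1, -1, -1))]
    for _ in range(sweeps):
        for ys, xs in orders:          # Gauss-Seidel sweeps in 4 orderings
            for i in ys:
                for j in xs:
                    if mask[i, j]:
                        continue
                    a = min(u[i - 1, j] if i > 0 else big,
                            u[i + 1, j] if i < ny - 1 else big)
                    b = min(u[i, j - 1] if j > 0 else big,
                            u[i, j + 1] if j < nx - 1 else big)
                    # Godunov upwind update for the Eikonal equation.
                    if abs(a - b) >= h:
                        ubar = min(a, b) + h
                    else:
                        ubar = 0.5 * (a + b + np.sqrt(2 * h * h - (a - b) ** 2))
                    u[i, j] = min(u[i, j], ubar)
    return u
```

The data dependence between neighboring nodes inside each sweep is what makes a direct GPU port non-trivial and motivates reorderings such as Cuthill-McKee.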
 
Keywords:
Algorithms, Other, GTC 2016 - ID P6257
Download:
 
GPU-Accelerated Neighborhood Operators for Permutation-Based Problems
Victor Machado (Fluminense Federal University)
This poster presents an efficient GPU implementation of four neighborhood operators that are commonly applied in the local search of many metaheuristics for different permutation-based problems, such as the Traveling Salesman Problem and the Single Row Facility Layout Problem. Although many optimization problems have been solved through GPU parallelization in the last few years, the authors are not aware of a thorough analysis of the neighborhood moves. Therefore, we evaluate the neighborhood operators themselves rather than analyzing a specific metaheuristic. The parallel approach achieved good results compared to the CPU version, reaching speedups ranging from 14x to 68x.
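To make the notion of a neighborhood operator concrete, here is a minimal serial Python sketch of one such move (the pairwise "swap") together with a best-improvement local-search step; this is illustrative only, not one of the poster's four GPU operators:

```python
def swap_neighborhood(tour):
    """All tours reachable from `tour` by exchanging two positions."""
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 1, n):
            cand = list(tour)
            cand[i], cand[j] = cand[j], cand[i]
            yield cand

def tour_length(tour, dist):
    """Cyclic tour length under a distance matrix."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

def best_improvement_step(tour, dist):
    """One local-search step: scan the full neighborhood, keep the best move.
    Each candidate evaluation is independent, which is what makes the
    neighborhood scan attractive to parallelize on a GPU."""
    best, best_len = list(tour), tour_length(tour, dist)
    for cand in swap_neighborhood(tour):
        length = tour_length(cand, dist)
        if length < best_len:
            best, best_len = cand, length
    return best, best_len
```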
 
Keywords:
Algorithms, Other, GTC 2016 - ID P6273
Download:
Application Design & Porting Techniques
Presentation
Media
An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm
Martin Burtscher (Texas State University)
This session presents a complete CUDA implementation of the irregular Barnes-Hut n-body algorithm. This algorithm repeatedly builds and traverses unbalanced trees, making it difficult to map to GPUs. We explain in detail how our code exploits the architectural features of GPUs, including lockstep operation and thread divergence, both of which are commonly viewed as hurdles to achieving high performance, especially for irregular codes. On a five million body simulation running on a Tesla C2050, our CUDA implementation is 30 times faster than a parallel pthreads version running on a high-end 6-core Xeon.
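For readers unfamiliar with the algorithm, a compact serial Python sketch of the Barnes-Hut idea in 2D (quadtree build plus the theta opening criterion, assuming distinct body positions); the session's CUDA implementation is of course far more involved:

```python
import math

class Node:
    """Square quadtree cell centered at (cx, cy) with half-width `half`."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0     # mass-weighted position sums
        self.body = None            # single resident body of a leaf
        self.kids = None            # four children once subdivided

    def insert(self, x, y, m):
        self.mass += m
        self.mx += m * x
        self.my += m * y
        if self.kids is None:
            if self.body is None:   # empty leaf: store the body here
                self.body = (x, y, m)
                return
            # Occupied leaf: subdivide and push the resident body down.
            self.kids = [None] * 4
            bx, by, bm = self.body
            self.body = None
            self._push(bx, by, bm)
        self._push(x, y, m)

    def _push(self, x, y, m):
        q = (x >= self.cx) + 2 * (y >= self.cy)
        h = self.half / 2
        if self.kids[q] is None:
            self.kids[q] = Node(self.cx + (h if x >= self.cx else -h),
                                self.cy + (h if y >= self.cy else -h), h)
        self.kids[q].insert(x, y, m)

def accel(node, x, y, theta=0.5, eps=1e-9):
    """Acceleration at (x, y): treat a cell as one body unless it is too close
    relative to its size (cell_size / distance >= theta), then open it."""
    if node is None or node.mass == 0.0:
        return 0.0, 0.0
    dx = node.mx / node.mass - x
    dy = node.my / node.mass - y
    r = math.sqrt(dx * dx + dy * dy) + eps
    if node.kids is None or (2 * node.half) / r < theta:
        if r < 1e-6:                # skip self-interaction at a leaf
            return 0.0, 0.0
        f = node.mass / r ** 3      # G = 1 in these units
        return f * dx, f * dy
    ax = ay = 0.0
    for kid in node.kids:
        gx, gy = accel(kid, x, y, theta, eps)
        ax += gx
        ay += gy
    return ax, ay
```

With theta = 0 every cell is opened down to the leaves, so the result matches a direct O(N^2) summation; larger theta trades accuracy for fewer interactions.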
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2111
Streaming:
Download:
 
GPU Task-Parallelism: Primitives and Applications
Stanley Tzeng (University of California, Davis), Anjul Patney (University of California, Davis)
We explore how a task-parallel model can be implemented on the GPU and address concerns and programming techniques for doing so. We discuss the primitives for building a task-parallel system on the GPU. This includes novel ideas for mapping tasking systems onto the GPU including task granularity, load balancing, memory management, and dependency resolution. We also present several applications which demonstrate how a task-parallel model is more suitable than the regular data parallel model. These applications include a Reyes renderer, tiled deferred lighting renderer, and a video encoding demo.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2138
Streaming:
Download:
 
Large-Scale Reservoir Simulation on GPU
Song Yu (Chemical & Petroleum Department, University of Calgary)
We develop a highly parallel GPU-based GMRES solver and several preconditioners, and couple them with our in-house reservoir simulator to speed up large-scale reservoir simulation with over one million grid blocks. For the preconditioners, we develop highly parallelized ILU(k), ILUT, block ILU(k), and block ILUT, with matrix partitioning by METIS on the GPU. The excellent speedup and accurate results demonstrate the promise of GPUs for parallel reservoir simulation.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2190
Streaming:
Download:
 
Levenberg-Marquardt Using Block Sparse Matrices on CUDA
Tetsuo Tawara (Koozyt, Inc.)
This session describes the experiences of constructing GPU based matrix-vector functions for block sparse matrices having multiple block sizes and a domain-specific numerical Jacobian generation function. The bundle adjustment algorithm is an optimization procedure which attempts to refine the relative camera pose, and 3D structure location variables, estimated from multiple sets of images. The Conjugate Gradient algorithm is used to solve the normal equations which appear in the inner loop to the non-linear least squares problem.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2231
Streaming:
Download:
 
LAtoolbox: A Multi-platform Sparse Linear Algebra Toolbox
Dimitar Lukarski (Karlsruhe Institute of Technology (KIT)), Jan-Philipp Weiss (Karlsruhe Institute of Technology)
Find out about an easy way for building sparse linear solvers for GPUs and multi-/many-core platforms. Based on data abstraction and virtualization of the hardware, the LAtoolbox supports several platforms such as GPUs, multi-core CPUs, and accelerators. The various backends (CUDA, OpenCL, OpenMP, ...) utilize optimized and platform-specific routines and allow seamless integration of GPUs into scientific applications. By means of unified interfaces across all platforms the library enables you to build generic linear solvers and preconditioners on a single code base without specific information of your hardware. We demonstrate portability and flexibility of our open-source approach on heterogeneous platforms.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2291
Streaming:
Download:
 
Using GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation
Thomas Benson (Georgia Tech Research Institute)
This presentation describes our development of a GPU-accelerated backpropagation implementation for Synthetic Aperture Sonar systems that supports multiple nodes via MPI and multi-GPU nodes. This implementation can form a complex-valued gigapixel image in one hour on a single C2050. We further scale this implementation to the Keeneland system where we can form the same gigapixel image in 21 seconds on 48 nodes with 144 C2070 Tesla GPUs. Our talk will discuss the details of our implementation, including our optimizations and scaling results for various node and GPU configurations, as well as the applicability to other domains, including Synthetic Aperture Radar.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID S2316
Streaming:
Download:
 
Debugging Floating Point Implementations on GPUs
Miriam Leeser (Northeastern University)
To debug GPU code it is important to understand the differences between CPU and GPU implementations. These differences arise due to floating point (FP) effects and casting from floating point to fixed point. FP differences arise from the lack of associativity of FP arithmetic, differences in instruction implementation, and choices made by the compiler. We analyzed medical image reconstruction code for breast imaging and showed that GPU and CPU code could be made to produce identical results. We also analyze the performance implications of choosing different implementation options on the GPU and CPU to make the codes match.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID P2179
Download:
 
KILO Transactional Memory for GPU
Wilson Wai Lun Fung (University of British Columbia)
GPUs are designed to efficiently execute thousands of concurrent threads on multiple SIMT cores to hide long-latency operations. Currently, threads in different CUDA blocks can only communicate via global memory accesses, and programmers have to consider data races. Although fine-grained locks can be constructed using 32-/64-bit word atomic operations in recent GPUs, operations involving multiple locks can deadlock. We propose to solve these problems by extending GPUs to support transactional memory. Some of the major challenges are to support thousands of concurrent transactions, to commit non-conflicting transactions in parallel, and to integrate with stack-based SIMT execution.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID P2180
Download:
 
CUDA-Based GPU Computing Framework for GNU Octave
John Melonakos (AccelerEyes)
This poster presents the design of a CUDA-based GPU parallel processing framework for GNU Octave. Octave is a high-level interpreted language, primarily intended for numerical computations. As an open-source alternative to MATLAB, GNU Octave is widely used in academic and research institutes. The GPU framework allows Octave users to accelerate software written in Octave's high-level M language on GPUs with minimal code modifications. To our knowledge, this is the first attempt to build a GPU framework for Octave, in contrast to previous attempts to provide GPU variants for a set of Octave functions.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID P2213
Download:
 
Dirk Pleiter (Juelich Supercomputing Centre)
The NVIDIA Application Lab at Jülich, established by JSC and NVIDIA in June 2012, aims at enabling scientific applications for GPU-based architectures. Selected applications and their performance characteristics will be presented. Strategies for multi-GPU parallelization (necessary to meet computing demands) will be discussed.
 
Keywords:
Application Design & Porting Techniques, Supercomputing 2012 - ID SC2007
Download:
Architectural Mapping & Event Visualization
Presentation
Media
Real-time Lighting and Rendering for Architectural Visualization
Rodrigo Lopez (Neoscape), Matt Richardson (Neoscape)
When creating computer-generated photorealistic imagery, a great deal of care is taken with materials and lighting. Having these aspects look as real as possible is essential to the overall quality and final look and feel of the image. In the past, without real-time solutions, several iterations of test renders were needed to dial in the desired settings, which was time consuming for the artist and tied up valuable resources during rendering. With an NVIDIA Maximus system along with real-time render solutions, such as V-Ray RT and iray, this process has become greatly accelerated, giving the artist improved flexibility and more responsive interaction when fine-tuning these settings.
 
Keywords:
Architectural Mapping & Event Visualization, Manufacturing, GTC 2013 - ID S3551
Streaming:
Download:
Astronomy & Astrophysics
Presentation
Media
Gravitational N-body Simulations: How Massive Black Holes Interact with Stellar Systems
Alessandra Mastrobuono Battisti (Sapienza University of Rome), Roberto Capuzzo-Dolcetta (Sapienza University of Rome)
Astrophysics is a field where supercomputing is a must to obtain new scientific results. In particular, the study of the interaction between massive black holes and surrounding stars is a hot topic, which requires heavy computation to obtain a good representation of what happens in the inner regions of galaxies. We present the results obtained with our high-precision N-body code, NBSymple, which exploits the joint power of a multi-core CPU system together with high-performance NVIDIA Tesla C1060 GPUs. The code is available at: astrowww.phys.uniroma1.it/dolcetta/nbsymple.html
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2010 - ID S102000
Streaming:
Download:
 
GRASSY: Leveraging GPU Texture Units for Asteroseismic Data Analysis
Matt Sinclair
Learn how to use the hidden computation capability of GPU texture units for general-purpose computation. We describe GRASSY, a system for stellar spectral synthesis where the core problem is interpolation between pre-computed intensity values. We map these pre-computed tables to the GPU's texture memory. Interpolation then becomes a texture lookup where the hardware automatically performs the interpolation, albeit at very low precision. Our mathematical framework reasons about the impact of this precision, and our performance results show 500X speedups. This work generalizes the GPU texture units as computation engines and opens up new problems for GPU acceleration.
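What the texture hardware computes per fetch is ordinary bilinear filtering; a minimal software analogue in Python for reference (purely illustrative, not the GRASSY implementation, and without the hardware's reduced-precision weights):

```python
import numpy as np

def bilinear_lookup(table, u, v):
    """Software analogue of a 2D texture fetch with bilinear filtering.
    table: 2D array of samples; (u, v) are fractional (row, column) indices."""
    i0, j0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - i0, v - j0
    i1 = min(i0 + 1, table.shape[0] - 1)    # clamp at the table edge
    j1 = min(j0 + 1, table.shape[1] - 1)
    # Weighted average of the four surrounding samples.
    return ((1 - du) * (1 - dv) * table[i0, j0]
            + du * (1 - dv) * table[i1, j0]
            + (1 - du) * dv * table[i0, j1]
            + du * dv * table[i1, j1])
```

On the GPU this entire computation is a single texture fetch, which is what makes the lookup essentially free.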
 
Keywords:
Astronomy & Astrophysics, High Performance Computing, GTC 2010 - ID S10044
Download:
 
CU-LSP: GPU-based Spectral Analysis of Unevenly Sampled Data
Richard Townsend (University of Wisconsin-Madison)
Standard FFT algorithms cannot be applied to spectral analysis of unevenly sampled data. Alternative approaches scale as O(N^2), making them an ideal target for harnessing the raw computing power of GPUs. To this end, I have developed CU-LSP, a CUDA spectral analysis code based on the Lomb-Scargle periodogram. Preliminary benchmarking indicates impressive speed-ups, on the order of 400 relative to a single core of a modern CPU. An initial application of CU-LSP will be the analysis of time-series data from planet-search and asteroseismology satellites.
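The underlying O(N^2) computation can be sketched directly from the Lomb-Scargle definition; a serial Python version for reference (illustrative only; CU-LSP itself is a CUDA code and its normalization may differ):

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """O(N^2)-style Lomb-Scargle periodogram of unevenly sampled y(t),
    normalized by the sample variance."""
    y = y - y.mean()
    var = y.var()
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Time offset tau that decouples the sine and cosine terms.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[k] = 0.5 / var * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power
```

Each frequency is independent of the others, which is why the loop over `freqs` maps so naturally onto GPU threads.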
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, Signal & Audio Processing, GTC 2010 - ID S102082
Streaming:
Download:
 
Cosmology Powered by GPUs Redux
Dominique Aubert (Strasbourg University)
Cosmological simulations aim at reproducing the physical processes which have occurred on the largest scales of the Universe since the Big Bang by means of numerical calculations on supercomputers. Using CUDA, I have implemented standard cosmological techniques on GPU architectures (a PM N-body solver, hydrodynamics, and moment-based radiative transfer) and designed them to run on supercomputing facilities by means of mixed MPI+CUDA programming. These applications are able to run on 100 or more graphics devices with typical 50x accelerations over scalar versions and with a communication overhead limited to 15%. This allows us to explore physical regimes which were out of reach of previous simulations.
 
Keywords:
Astronomy & Astrophysics, GTC 2010 - ID S102099
Streaming:
Download:
 
Binary Black Holes Simulations using CUDA
Abdul Mroue (CITA, University of Toronto)
Get the latest information on how to evolve binary black hole simulations on GPUs.
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, Physics Simulation, GTC 2010 - ID S102108
Streaming:
Download:
 
Using GPUs to Track Changes in the Sun
Mark Cheung (Lockheed Martin Solar & Astrophysics Laboratory)
Learn how GPU computing is enabling astrophysicists to study our closest star. NASA's recently launched Solar Dynamics Observatory is continuously streaming full-disk images of the Sun at visible, UV, and EUV wavelengths. This presentation will discuss ways that GPU computing is helping scientists cope with the analysis of the immense data volumes as well as with numerical modeling of the Sun.
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, Computer Vision & Machine Vision, Physics Simulation, GTC 2010 - ID S102178
Streaming:
Download:
 
Multiparticle Simulation
Alice Quillen
A diverse array of science, engineering, and computer graphics applications involve simulations of large numbers of particles. These involve computation of interactions between many particles, potentially mediated by a spatial data structure such as a grid. Improvements in computation efficiency can be achieved by sorting particles to determine which particles are involved in interactions or undergo close approaches. Nearest neighbor or collision pair groupings can be used to reduce the total number of computation steps by reducing the number of queries for collisions or can speed up and improve accuracy of simulations via a multiple timestep integrator. Identification of nearest neighbor and collision partner groupings is a task that can be efficiently implemented in parallel on the GPU reducing the number of interactions that must be computed. A broad class of problems known as Particle-In-Cell (PIC) code advect particles through cells of a surrounding grid. During this roundtable we will discuss strategies for increasing the efficiency of multiparticle simulations as a general problem as well as challenges for multiparticle simulation in specific settings such as astrophysics, SPH, PIC, and granular flows.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09056
Streaming:
Download:
 
Astrophysical Fluid Simulation Using Adaptive Meshes
Peng Wang (NVIDIA)
Adaptive mesh fluid simulations play a crucial role in many areas of astrophysical research, including the formation and explosion of stars, jets from black holes, etc. A parallel adaptive mesh multi-physics fluid code, Enzo, has been widely used in the astrophysical community in recent years. In this talk I will describe a CUDA implementation of the finite volume fluid solver used in Enzo. The GPU version shows significant speed-up compared to the CPU version.
 
Keywords:
Astronomy & Astrophysics, High Performance Computing, GTC 2009 - ID S09062
Streaming:
Download:
 
Diesel-Powered GPU Computing: Enabling a Real-Time Radio Telescope in the Australian Outback
Richard Edgar
The Murchison Widefield Array (MWA) is a next-generation radio telescope currently under construction in the remote Western Australia Outback. The raw data rate is 5 to 20 GiB/sec, precluding offline processing. Since the computing budget for calibration and imaging is 20 TFLOP/sec, a real-time high-performance computer is required on-site. We describe a scalable heterogeneous computing pipeline implementation, exploiting both the high computing density and FLOP-per-watt ratio of modern GPUs. The architecture is highly parallel within and across nodes, with all major processing elements performed by the GPUs. Necessary scatter-gather operations along the pipeline are loosely synchronized and implemented in MPI. Our initial port to NVIDIA hardware shows a typical 10x improvement over the reference CPU implementation, with some portions showing even more substantial gains. The MWA will be a frontier scientific instrument and a demonstrator for planned peta- and exascale facilities.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09065
Streaming:
Download:
 
Computational Fluid Dynamics (CFD) for the GPU
The field of computational fluid dynamics (CFD) has far-reaching applications and displays a consistent need for larger and faster simulations. At EM Photonics we have been studying this field and its computational needs for two years. We have identified the GPU as a strong performer in the CFD field and as such have implemented solvers that harness the power of GPUs in the application of CFD formulations. We will present some background on these innovations in this summary discussion.
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, Physics Simulation, GTC 2009 - ID S09074
Streaming:
Download:
 
Visualizing the Universe: Raycasting Astrophysical Simulation Data
Ralf Kaehler
We use GPU-assisted raycasting to render large, three-dimensional, time-dependent astrophysical AMR data sets at interactive frame rates on standard desktop computers. Our approach allows us to embed unstructured point datasets, like stars or galaxy splats, into the rendering of gaseous interstellar or intergalactic material. The approach supports a combined color-mapping of several input data fields and allows for a very flexible adaptation to the special requirements of different types of simulations. Its interactivity makes it a useful tool for data analysis as well as for fast generation of high-quality animations from astrophysical datasets. We will show various resulting animations ranging from large scale structure formation in the early universe, to the evolution of the first stellar objects and the cosmological reionization era. Finally, we will give an overview of lessons learned and opportunities for future work.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09112
Download:
 
Applications of Graphics Processing Units to the Binary Black Hole Evolutions
John Silberholz
We apply general-purpose computation on GPUs to obtain sizable speedups over a CPU in post-Newtonian evolutions of a binary black hole system. We discuss effective techniques for optimizing our GPU code on the CUDA architecture and present results demonstrating the speedups obtained. We also describe an MPI-based approach for scaling a large number of binary black hole simulations over multiple GPUs. This approach will allow us to complete the largest scientific GPU calculation to date using the NCSA Lincoln cluster.
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09402
Download:
 
Directing Experiments in the International Space Station With GPU-Assisted Image Analysis
Peter Lu
We implement image correlation, a fundamental component of many real-time imaging and tracking systems, on a graphics processing unit (GPU) using NVIDIA's CUDA. We use our code to analyze images of liquid-gas phase separation in a model colloid-polymer system, photographed in the absence of gravity aboard the International Space Station (ISS). Our GPU code is 4000 times faster than simple MATLAB code performing the same calculation on a central processing unit (CPU), 130 times faster than simple C code, and 30 times faster than optimized C++ code using single-instruction, multiple-data (SIMD) extensions. The speed increases from these parallel algorithms enable us to analyze images downlinked from the ISS in a rapid fashion and send feedback to astronauts on orbit while the experiments are still being run.
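As a rough illustration of image correlation as it is used for registration and tracking (not the authors' CUDA implementation), an FFT-based sketch in Python that recovers the integer shift between two images:

```python
import numpy as np

def register_shift(a, b):
    """Estimate the integer (dy, dx) with b ~= np.roll(a, (dy, dx), axis=(0, 1)),
    via FFT-based circular cross-correlation of the two equal-shape images."""
    corr = np.real(np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))))
    idx = np.unravel_index(np.argmax(corr), corr.shape)
    # Fold indices past the midpoint back to negative shifts.
    return tuple(int(i) if i <= n // 2 else int(i - n)
                 for i, n in zip(idx, corr.shape))
```

The element-wise spectrum product and the 2D FFTs are both embarrassingly parallel, which is why this computation maps so well onto GPUs.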
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09437
Download:
 
Binary Black Holes using GPUs
Frank Herrmann
We perform ensemble studies of binary black hole inspirals. The binary black hole problem is of great interest for the cosmological community (merger of galaxies with BHs at the center) as well as the gravitational wave community (where the merg ...Read More

We perform ensemble studies of binary black hole inspirals. The binary black hole problem is of great interest to the cosmological community (mergers of galaxies with BHs at their centers) as well as the gravitational wave community (where the merger of BHs is the most important signal source). The full binary black hole merger problem is computationally very demanding, and even with advanced numerical techniques ensemble studies are currently not possible. Using a standard approximate solution to Einstein's equations (the post-Newtonian equations), one can accurately model the inspiral until shortly before merger, when the approximation techniques break down. Utilizing this approximation, we study the 7-dimensional parameter space of the BH merger problem using a Monte-Carlo approach, which extends very naturally to GPUs.
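As a toy illustration of why this workload maps so well to GPUs: every Monte-Carlo sample is an independent evaluation over randomly drawn binary parameters, one sample per thread. The sketch below uses the closed-form leading-order (Peters 1964) coalescence time for a circular orbit rather than the authors' full post-Newtonian integration; all parameter ranges are illustrative:

```python
import numpy as np

G, c = 6.674e-11, 2.998e8   # SI units
M_sun = 1.989e30

rng = np.random.default_rng(1)
n = 100_000  # one binary per GPU thread in a CUDA port

# Monte-Carlo draw over part of the parameter space (masses and initial
# separation here; the full 7-D problem also samples the spin vectors).
m1 = rng.uniform(5, 50, n) * M_sun
m2 = rng.uniform(5, 50, n) * M_sun
a0 = rng.uniform(1e8, 1e9, n)        # initial separation [m]

# Leading-order gravitational-wave coalescence time (Peters 1964),
# a stand-in for integrating the post-Newtonian equations per sample.
t_coal = 5 * c**5 * a0**4 / (256 * G**3 * m1 * m2 * (m1 + m2))
```

Each array element is computed with no data dependence on the others, which is exactly the embarrassingly parallel structure a GPU exploits.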

  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09441
Download:
 
Numerical Cosmology Powered by GPUs
Dominique Aubert

By definition, cosmology cannot rely on lab experiments to reproduce the phenomena observed in the sky and test its theories. For this very reason, numerical simulation is widely used within this community to understand the formation of astrophysical objects and to place constraints on the physical ingredients that lead to the Universe as it is currently observed. Since 2001, I have been personally involved in investigating these questions through the intensive use of numerical simulations that reproduce the evolution of the Universe from the Big Bang to our epoch. During the last two years, I have been exploring the new possibilities offered by GPUs to accelerate these calculations, mostly using CUDA. So far, three applications have benefited from this work; using 8800 GTX and Tesla C1060 devices, we found speedup factors ranging from 20 to 80 compared to the CPU versions: (1) a cosmological N-body integrator, CUDAPM, which follows the evolution of millions of particles interacting through gravitation in an expanding Universe, modelling the rise of large-scale structures; (2) a non-linear full multigrid solver for the Poisson equation of modified Newtonian gravity (CUDAMOND); and (3) a cosmological radiative transfer code, CUDATON, which models the propagation of ionising radiation and its effect on the gas that filled the early Universe; this application is multi-GPU and currently runs on 192 devices at the CCRT supercomputing centre. Most of the techniques used in these applications are fairly standard and not specific to astrophysics and cosmology, so my experience of porting them to GPUs as a physicist is likely to benefit a broad audience of numerical scientists.

  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09442
Download:
 
Black Holes in Galactic Nuclei Simulated with Large GPU Clusters in CAS
Rainer Spurzem
- National Astronomical Observatories, Chinese Academy of Sciences
Many, if not all, galaxies harbour supermassive black holes. If galaxies merge, which is quite common in the process of hierarchical structure formation in the universe, their black holes sink to the centre of the merger remnant and form a tight binary. Depending on initial conditions and time, supermassive black hole binaries are prominent gravitational wave sources if they ultimately come close together and coalesce. We model such systems as gravitating N-body systems (stars) with two or more massive bodies (black holes), including where necessary relativistic corrections to the classical Newtonian gravitational forces (Kupi et al. 2006, Berentzen et al. 2009).  Back
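The computational core of such simulations is an all-pairs gravitational force sum, which is what GPU direct-summation codes accelerate. A minimal Newtonian sketch (no relativistic correction terms; N-body units with G = 1; the softening parameter is illustrative):

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """Pairwise Newtonian accelerations, O(N^2): the kernel that
    direct-summation N-body codes offload to the GPU."""
    # dr[i, j] = pos[j] - pos[i] for every pair (via broadcasting)
    dr = pos[None, :, :] - pos[:, None, :]
    r2 = (dr ** 2).sum(axis=-1) + eps ** 2   # softened squared distance
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)            # no self-interaction
    # a_i = sum_j m_j * dr_ij / |r_ij|^3   (G = 1 in N-body units)
    return (dr * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)
```

On a GPU each thread computes one particle's sum over all others; the structure above is otherwise identical.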
 
Keywords:
Astronomy & Astrophysics, GTC 2010 - ID P10B01
Download:
 
GAMER: GPU-accelerated Adaptive-Mesh-Refinement Code for Astrophysics
Hsi-Yu Schive
- Physics Dept., NTU
 
Keywords:
Astronomy & Astrophysics, GTC Taiwan 2011 - ID GTCT1105
Download:
 
Scalable Frameworks and Algorithms for Terascale Radio Astronomy Images
Christopher Fluke (Swinburne University of Technology - Centre for Astrophysics and Supercomputing)

Learn how the oldest science is using the newest processors to solve a critical problem: how to accomplish traditional image analysis and visualization tasks when the images are terabytes in size? Simple, standard operations such as displaying 2-d slices, evaluating image statistics, and applying histogram equalization become manifestly challenging when images dramatically exceed single-node memory capacity. We will explain how our hybrid CPU-GPU cluster framework - which can volume render a 200GB image at >50fps! - will support traditional radio astronomy tasks for the colossal images that the Square Kilometre Array and its precursor, the Australian SKA Pathfinder, will generate.

  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2022
Streaming:
Download:
 
GPU Acceleration of Dense Stellar Clusters Simulation
Bharath Pattabiraman (Northwestern University), Stefan Umbreit (Northwestern University)

Computing the interactions between stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. This paper presents the parallelization of a Monte Carlo algorithm for simulating stellar cluster evolution using programmable Graphics Processing Units. The kernels of this algorithm exhibit high levels of data-dependent decision making and unavoidable non-contiguous memory accesses. However, we adopt various parallelization strategies and utilize the high computing power of the GPU to obtain substantial near-linear speedups which cannot be easily achieved on a CPU-based system. This acceleration allows us to explore physical regimes that were previously out of reach of simulations.

  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2087
Streaming:
Download:
 
Signal Processing on GPUs for Radio Telescopes
John Romein (ASTRON)

In this talk, we will present GPU implementations of four highly compute-intensive algorithms used by radio telescopes.

  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2124
Streaming:
Download:
 
GPUs for Radio Imaging
Vamsi Krishna Veligatla (University Of Groningen)

With the advent of a new breed of telescopes like the Low Frequency Array (LOFAR), which rely on software to process the large data-sets they generate, the software must run as fast as possible in order to process those data-sets in a reasonable time. In this session we describe how we have used the computing power of GPUs to improve the performance of standard radio imaging techniques, and how this computational power enables a new generation of radio imaging algorithms.

  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2187
Streaming:
Download:
 
Accelerating Radio Astronomy Cross-Correlation Beyond 1 Tflops Using Fermi
Michael Clark (NVIDIA)

Radio astronomy is a signal processing application that requires extreme supercomputing. While today's radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase to 1 Exaflops. The most compute intensive part of this problem is the so-called cross-correlation algorithm, which is a linear-algebra problem. In this session we demonstrate that the Fermi architecture is ideally suited to this problem, and through exploiting the Fermi memory hierarchy it is possible to achieve close to 80% of peak performance in a real application.
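The recast of cross-correlation as linear algebra fits in a few lines: the visibility matrix is a time-averaged outer product of the antenna voltage vector, i.e. a GEMM-like rank-k update. A NumPy sketch with illustrative sizes (the session's kernel is a hand-tuned CUDA implementation, not this library call):

```python
import numpy as np

rng = np.random.default_rng(0)
n_ant, n_samp = 8, 1024   # illustrative antenna/sample counts

# Complex voltage samples, one row per antenna.
x = (rng.standard_normal((n_ant, n_samp))
     + 1j * rng.standard_normal((n_ant, n_samp)))

# Cross-correlation = time-averaged outer product V = X X^H / T,
# the matrix-multiply-like structure that maps onto the GPU's
# memory hierarchy so efficiently.
V = x @ x.conj().T / n_samp
```

V is Hermitian, so only the upper triangle need be computed and stored; exploiting that symmetry is part of reaching a high fraction of peak.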

  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID S2347
Streaming:
Download:
 
Adaptive Beam-forming for Radio Astronomy on GPUS
Vamsi Krishna Veligatla (University Of Groningen)
With the advent of a new breed of telescopes like the Low Frequency Array (LOFAR), which rely on software to process the large data-sets they generate, the software must run as fast as possible in order to process those data-sets in a reasonable time. In this session we describe how we have used the computing power of GPUs to improve the performance of standard radio imaging techniques, and how this computational power enables a new generation of radio imaging algorithms.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2191
Download:
 
Accelerating Real-Time Processing of the ATST Adaptive Optics System
Vivek Venugopal (United Technologies Research Center)
The real-time processing of the four-meter Advanced Technology Solar Telescope (ATST) adaptive optics (AO) system, with approximately 1750 sub-apertures and 1900 actuators, requires massive parallel processing. This parallelism is harnessed with hardware accelerators such as Graphics Processing Units (GPUs). We investigate a hybrid data-processing architecture for the Shack-Hartmann correlation and wavefront reconstruction using FPGAs and GPUs. The ATST AO algorithm is implemented and benchmarked on the FPGA-GPU system and compared with the existing legacy Digital Signal Processor (DSP) based hardware system.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2446
Download:
 
Cosmological Calculations on the GPU
Deborah Bard (SLAC National Accelerator Laboratory)
Cosmological measurements often involve the calculation of non-trivial quantities over increasingly large datasets. The next generation of survey telescopes will yield information for billions of galaxies. The scale of the datasets, and the type of calculations involved, make them ideal candidates for the GPU. We present two cosmological measurements, and describe the implementation and the improvements found with the GPU.   Back
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2509
Download:
 
Fast Cross-Matching of Astronomical Catalogs on GPUs
Matthias Lee (Johns Hopkins University)
We present a method of cross-matching objects of large astronomical catalogs, over 150 million objects, in under 4 minutes. We utilize up to six NVIDIA C2050 GPUs and achieve an over 40x speedup versus conventional methods.   Back
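For illustration, the inner loop of a catalog cross-match is a nearest-neighbor search under an angular-separation cut, which parallelizes over object pairs. A brute-force NumPy sketch (production codes, including GPU ones, add spatial indexing such as zones or HEALPix on top; all names and radii here are illustrative):

```python
import numpy as np

def crossmatch(ra1, dec1, ra2, dec2, radius_deg):
    """For each object in catalog 1, index of the nearest catalog-2
    object within radius_deg, or -1 if none. Coordinates in degrees."""
    def unit(ra, dec):
        # Unit vectors on the celestial sphere
        ra, dec = np.radians(ra), np.radians(dec)
        return np.stack([np.cos(dec) * np.cos(ra),
                         np.cos(dec) * np.sin(ra),
                         np.sin(dec)], axis=-1)
    u1, u2 = unit(ra1, dec1), unit(ra2, dec2)
    cosd = np.clip(u1 @ u2.T, -1.0, 1.0)        # pairwise cos(separation)
    best = cosd.argmax(axis=1)                  # closest = largest cosine
    sep = np.degrees(np.arccos(cosd[np.arange(len(ra1)), best]))
    return np.where(sep <= radius_deg, best, -1)
```

The pairwise dot-product matrix is the part that a GPU evaluates over millions of candidates at once.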
 
Keywords:
Astronomy & Astrophysics, GTC 2012 - ID P2524
Download:
 
"Big Data" Astronomical Data Analysis and Visualization
Amr Hassan (Swinburne University of Technology)

I will present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of "big data" astronomical data cubes. Using a cluster of 96 GPUs, we demonstrate, for a 0.5 TB image: volume rendering at 10 fps; computation of basic statistics in 1.7 s; and evaluation of the median in 45 s. The framework is one of the first solutions to the image analysis and visualization requirements of next-generation telescopes, including the forthcoming SKA pathfinder telescopes.

  Back
 
Keywords:
Astronomy & Astrophysics, Supercomputing 2012 - ID SB001
Download:
 
Parallel Simulation of the Galaxy with Dark Matter using GPUs/CPUs
Pawel Czarnul (Gdansk University of Technology)
The poster presents a parallel simulation of the galaxy with dark matter. First, one of several models of dark matter distribution is assumed and, based on the known laws, simulation of the galaxy proceeds in successive time steps. Computations have been parallelized using both CPUs and GPUs, and execution times are presented for particular devices. Furthermore, a visualization of the simulation is provided which gives a view of the universe from any desired angle.  Back
 
Keywords:
Astronomy & Astrophysics, Scientific Visualization, GTC 2013 - ID P3141
Download:
 
Acceleration of a 3D WENO Scheme for Large-Scale Cosmological Simulations on GPU
Long Wang (Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences)
We present our implementation of a 3D 5th-order finite-difference WENO scheme in double precision on CPU/GPU clusters, targeting large-scale cosmological hydrodynamic flow simulations involving both shocks and complicated smooth solution structures. At the MPI level of parallelization, we subdivide the domain cubically. On each process, we then port the WENO computation to the GPU. To work within the memory limitations of GPUs, we performed a series of optimizations. Our tests on Fermi and Kepler GPUs indicate that the GPU version achieves a 12-19x speedup, and the computation part is about 19-36 times faster than the serial Fortran code. Finally, we discuss some future work.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, GTC 2013 - ID P3157
Download:
 
GPU-enabled Precision Measurements of the Structure of the Universe
Deborah Bard (SLAC National Accelerator Laboratory)
Future astronomical surveys will characterize tens of billions of galaxies. Calculating cosmological observables, such as correlation functions, over such vast datasets poses a significant computational challenge. Such calculations are ideally suited to parallelization. This poster describes the implementation of the full two-point correlation function on the GPU, and demonstrates the improvement in accuracy compared to current fast approximation methods. We take advantage of the scaling capabilities of GPUs by showing how systematic errors can only be fully explored using the compute power of many GPUs.   Back
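The two-point correlation function is typically estimated from binned pair counts; the O(N^2) distance histogram is the part that parallelizes naturally on GPUs. A small sketch using the standard Landy-Szalay estimator (tiny illustrative catalogs, not the survey data discussed here):

```python
import numpy as np

def pair_counts(a, b, bins):
    """Histogram of pairwise separations: the O(N^2) kernel that maps
    naturally onto GPU thread blocks."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.histogram(d.ravel(), bins=bins)[0]

rng = np.random.default_rng(0)
data = rng.random((200, 3))     # "galaxy" positions (toy)
rand = rng.random((400, 3))     # random comparison catalog
bins = np.linspace(0.05, 0.5, 10)

# Normalized data-data, data-random, random-random pair counts
dd = pair_counts(data, data, bins) / (len(data) * (len(data) - 1))
dr = pair_counts(data, rand, bins) / (len(data) * len(rand))
rr = pair_counts(rand, rand, bins) / (len(rand) * (len(rand) - 1))

# Landy-Szalay estimator of the two-point correlation function;
# roughly zero here since the toy "data" is itself random.
xi = (dd - 2 * dr + rr) / rr
```

On the GPU, each thread block accumulates a partial histogram over a tile of the pair matrix, which is why many GPUs can be combined for the full survey-scale calculation.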
 
Keywords:
Astronomy & Astrophysics, GTC 2013 - ID P3164
Download:
 
Simulating Black Holes with CUDA
Adam Lewis (Canadian Institute for Theoretical Astrophysics (CITA))
This decade will see the first detections of gravitational waves: ripples in spacetime produced most strongly by collisions of dense objects like black holes. It will then be possible to study such events through careful comparison of their gravitational radiation against predictions generated through simulations. These simulations are computationally very expensive, requiring tens of thousands of floating-point operations per grid point per time step. Using NVIDIA's CUDA framework, we have developed techniques to automatically port our black hole code to GPUs. We have also manually optimized certain key routines, which have sped up by 10-50 times as a result.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID P3230
Download:
 
Black Holes and Star Clusters in Galactic Nuclei simulated with more than 100k GPU cores
Rainer Spurzem (National Astronomical Observatories, Chinese Academy of Sciences)
100k GPU core benchmark simulations of galactic nuclei and star clusters with high precision direct N-body; on the path to million cores and Exascale...  Back
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2013 - ID P3242
Download:
 
GPU Accelerated Simulations and Real-time Control of the E-ELT Adaptive Optics Systems
Damien Gratadour (LESIA - Observatoire de Paris)
Adaptive Optics (AO) is an instrumental technique for the correction of dynamically evolving aberrations in optical systems, used on astronomical telescopes to compensate, in real-time, for atmospheric turbulence. Our team has developed a simulation code based on YoGA, an original binding between Yorick, an interpreted programming language, and CUDA. Using this code, speedups of 10x are obtained as compared to currently available CPU codes. We will present the various features of the code and its performance for various system dimensions and GPUs. Additionally, we will present profiles of a GPU-based AO real-time controller simulator demonstrating performance compatible with real-time operations.   Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID P3213
Download:
 
The Telescope Array Fluorescence Detector Simulation on GPUs
Tareq AbuZayyad (University of Utah)

The Telescope Array Cosmic Ray Detector, located in the western Utah desert, is used for the observation of ultra-high energy cosmic rays. The simulation of a fluorescence detector's response to cosmic-ray-initiated air showers presents many opportunities for parallelization. In this presentation we report on the Monte Carlo program used for the simulation of the Telescope Array fluorescence detector located at the Middle Drum site. The program makes extensive use of GPU acceleration to achieve a 50x speedup compared to running on a single CPU core. All of the physics simulation, from shower development, light production and propagation with atmospheric attenuation, to the realistic detector optics and electronics, is done on the GPU. A detailed description of the code implementation is given, and results on the accuracy and performance of the simulation are presented as well.

  Back
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2013 - ID S3189
Streaming:
Download:
 
Powering Real-time Radio Astronomy Signal Processing with GPUs
Harshavardhan Reddy Suda (GMRT Observatory, National Centre for Radio Astrophysics, TIFR, Pune, India), Pradeep Kumar Gupta (NVIDIA)

The goal of this session is to demonstrate the power of GPUs in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this exciting new application of GPUs. Modern radio astronomy telescopes are multiple-antenna instruments where the wideband data from each antenna needs to be processed in real-time to implement digital receiver systems such as correlators and beamformers. We will demonstrate how such compute- and I/O-intensive algorithms can be implemented on a distributed GPGPU system in a fully real-time realisation. Hybrid computing techniques, such as CUDA on the GPU and OpenMP and MPI to synchronise the distributed host machines and handle the large I/O between them, are key elements of such designs. Optimised implementation of signal processing algorithms such as FFT and MAC on GPUs, as well as the use of streams to overlap computation and I/O on the GPU, will be addressed in detail. All these concepts will be illustrated with the example of the prototype GPGPU correlator and beamformer that we have developed for the GMRT, a 30-antenna radio telescope with 400 MHz bandwidth, dual-polarised signals from each antenna, arriving at a sustained input data rate of 24 GBytes/sec.

  Back
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, GTC 2013 - ID S3225
Streaming:
Download:
 
Signal Processing on GPUs for Radio Telescopes
John Romein (ASTRON Netherlands Institute for Radio Astronomy)

This talk will present research on accelerator-based computing for radio telescopes, showing GPU implementations of a dozen (signal-processing) algorithms used by radio telescopes, e.g., filtering, correlating, beam forming, dedispersion, and peak detection. Glued together, these computational kernels form several processing pipelines. Each pipeline implements an observation mode, as used by the LOFAR radio telescope. The implemented pipelines create sky images, search for pulsars, observe known pulsars, and detect ultra-high-energy particles; they were first implemented on a Blue Gene/P and then ported to GPUs. This talk will briefly explain these algorithms and processing pipelines, and show performance results, multi-GPU scaling results, and the impact on energy efficiency. The research is relevant to current radio telescopes like LOFAR, and to the future SKA telescope, which needs exascale computing power.
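Of the kernels listed, dedispersion is easy to sketch: each frequency channel is shifted by its cold-plasma dispersion delay before summing. A minimal incoherent-dedispersion sketch in NumPy (array names and sizes are illustrative; real pipelines stream this on the GPU):

```python
import numpy as np

K_DM = 4.149e3  # dispersion constant [s MHz^2 pc^-1 cm^3]

def dedisperse(dyn, freqs_mhz, dt_s, dm):
    """Incoherent dedispersion of a dynamic spectrum dyn[channel, time]:
    shift each channel by its dispersion delay relative to the highest
    frequency, then sum over channels."""
    f_ref = freqs_mhz.max()
    delays = K_DM * dm * (freqs_mhz ** -2 - f_ref ** -2)   # seconds
    shifts = np.round(delays / dt_s).astype(int)
    out = np.zeros(dyn.shape[1])
    for ch, s in enumerate(shifts):
        out += np.roll(dyn[ch], -s)   # advance the delayed channels
    return out
```

Because every (channel, trial-DM) combination is independent, pulsar searches evaluate many dispersion measures in parallel, which is where the GPU throughput pays off.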

  Back
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, GTC 2013 - ID S2124
Streaming:
Download:
 
ENZO Hydrodynamics and Magnetohydrodynamics Solvers on GPU
Peng Wang (NVIDIA)

Learn about porting the ENZO solvers to the GPU. ENZO is a block-structured adaptive mesh refinement (AMR) astrophysical fluid dynamics code used for simulating cosmological structure formation. It is one of the most commonly used community codes in astrophysics. We have ported the PPM hydrodynamics and magnetohydrodynamics solvers to the GPU and integrated the GPU solvers fully into the AMR framework. This talk will describe the porting strategy and performance results.

  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2013 - ID S3401
Streaming:
Download:
 
Accelerating Radio Astronomy Cross-correlation Using the Kepler Architecture
Ben Barsdell (Harvard University)

Radio astronomy is a real-time signal processing application that requires extreme supercomputing. While today's radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase into the Exaflops regime, driven by the Hydrogen Epoch of Reionization Array (HERA) and the Square Kilometer Array (SKA). The most compute-intensive part of this problem is the so-called cross-correlation algorithm, which can be recast as a linear-algebra problem similar in spirit to DGEMM. In this session we describe the cross-correlation engine that powers the pathfinder LEDA radio telescope and has been (re)optimized for the Kepler GK110 architecture to achieve over 2.5 Tflops in sustained performance. This level of efficiency is critical to meeting the strict power and space constraints imposed by the instrument's remote location.

  Back
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2013 - ID S3497
Streaming:
Download:
 
Cosmology on the GPU
Claudio Gheller (ETH CSCS)

Numerical simulations represent one of the most effective tools to study and to solve astrophysical problems. Thanks to the enormous technological progress of recent years, the available supercomputers now allow us to study the details of complex processes, like galaxy formation or the evolution of the large-scale structure of the universe. Sophisticated numerical codes can exploit the most advanced HPC architectures to simulate such phenomena and to process and visualize their results. Enzo, Ramses and Splotch are prime examples of such codes. Work is ongoing to enable these codes on GPUs using the CUDA and OpenACC programming models. The refactoring work accomplished so far is presented, together with recent tests and results.

  Back
 
Keywords:
Astronomy & Astrophysics, Supercomputing & HPC, GTC 2013 - ID S3555
Streaming:
Download:
 
Follow the Light: Plasma Physics on 18,000 GPUs
Richard Pausch (Helmholtz-Zentrum Dresden - Rossendorf), Guido Juckeland (ZIH, Technical University Dresden)
We show that with today's largest supercomputers it is possible to follow the trajectories of billions of particles, computing a unique fingerprint of their dynamics. Using 18,000 GPUs, we computed a 'sky map' of the radiation emitted by individual electrons in a large-scale, turbulent plasma, providing unique insight into the relation between plasma dynamics and observable radiation spectra.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4139
Streaming:
Download:
 
Real-Time Imaging in Radio-Astronomy: A Fully GPU-Based Imager
Sanjay Bhatnagar (National Radio Astronomy Observatory), Pradeep Kumar Gupta (NVIDIA)

We are implementing a fully GPU-based imager for radio interferometric imaging, targeting high-sensitivity near-real-time imaging. Modern interferometric radio telescopes generate many terabytes of data per observation, which need to be imaged in near-real time; imaging software running on conventional computers currently takes many orders of magnitude longer. In this presentation, we will briefly describe the algorithms and describe in more detail their adaptation for GPUs in particular and for heterogeneous computing in general. We will discuss the resulting run-time performance on the GPU using real data from existing radio telescopes. Tests with our current implementation show a speed-up of up to 100x compared to the CPU implementation in the critical parts of processing, enabling us to reduce the memory footprint by replacing compute-and-cache with on-demand computing on the GPU. For scientific use cases requiring high-resolution, high-sensitivity imaging, such a GPU-based imager is an enabling technology.

  Back
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics, GTC 2014 - ID S4223
Streaming:
Download:
 
High Resolution Astrophysical Fluid Dynamics Simulations on a GPU Cluster
Pierre Kestener (CEA)
A wide range of major astrophysical problems can be investigated by means of computational fluid dynamics methods, and performing numerical simulations of Magneto-Hydrodynamics (MHD) flows using realistic setup parameters can be very challenging. We will first report on the technical expertise gained in developing the Ramses-GPU code, designed for efficient use of large clusters of GPUs in solving MHD flows. We will then illustrate how challenging state-of-the-art highly resolved simulations requiring hundreds of GPUs can provide new insights into real applications: (1) the study of the Magneto-Rotational Instability, and (2) high Mach number MHD turbulent flows.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, Supercomputing & HPC, GTC 2014 - ID S4274
Streaming:
Download:
 
Conquering the Titan Supercomputer: A Star-by-Star Simulation of the Milky Way Galaxy
Evghenii Gaburov (SURFsara), Jeroen Bedorf (Leiden Observatory)
In this session we demonstrate how we leverage the massive parallelism of thousands of GPUs inside the Titan supercomputer to simulate the past and future of the Milky Way Galaxy on a star-by-star basis in less than 10 days. The audience will learn what it takes to parallelize an advanced hierarchical GPU tree-code to run efficiently on the Titan supercomputer. A gravitational N-body problem is by definition an all-to-all problem, and it is of utmost importance for scalability to hide data communication behind computation. This turned out to be a major challenge on the Titan supercomputer because Bonsai's GPU kernels are ~3x faster on Kepler than on Fermi, which reduced compute time and as a result hampered scalability. We solved this by redesigning the communication strategy to take full advantage of each of the 16 CPU cores while the GPUs were busy computing gravitational forces. This allowed Bonsai to scale to more than 8192 GPUs.  Back
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4347
Streaming:
Download:
 
Driving the Next Generation of Extremely Large Telescopes Using Adaptive Optics with GPUs
Damien Gratadour (LESIA - Observatoire de Paris)
The European Southern Observatory is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m diameter telescope, to provide Europe with the biggest eye on the Universe ever built, with a first light foreseen in 2022. The E-E ...Read More
The European Southern Observatory is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m diameter telescope, to provide Europe with the biggest eye on the Universe ever built, with first light foreseen in 2022. The E-ELT will be the first telescope to depend entirely, for routine operations, on adaptive optics (AO), an instrumental technique for the correction of dynamically evolving aberrations in an optical system, used on astronomical telescopes to compensate, in real-time, for the effect of atmospheric turbulence. In this session, we will show how GPUs can provide the throughput required to both simulate at high framerate and drive in real-time these AO systems, which provide tens of thousands of degrees of freedom actuated several hundred times per second.   Back
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Supercomputing & HPC, GTC 2014 - ID S4357
Streaming:
 
RAMSES on the GPU: An OpenACC-Based Approach
Claudio Gheller (ETH-CSCS)
We present the work accomplished to enable the numerical codes "RAMSES" to the GPU, in order to efficiently exploit hybrid accelerated HPC architectures. RAMSES is a code designed for the study astrophysical problems on different scales (e. ...Read More
We present the work accomplished to port the numerical code "RAMSES" to the GPU, in order to efficiently exploit hybrid accelerated HPC architectures. RAMSES is a code designed for the study of astrophysical problems on different scales (e.g. star formation, galaxy dynamics, large scale structure of the universe), treating at the same time various components (dark energy, dark matter, baryonic matter, photons) and including a variety of physical processes (gravity, magneto-hydrodynamics, chemical reactions, star formation, supernova and AGN feedback, etc.). It is implemented in Fortran 90 and adopts the OpenACC paradigm to offload some of the most computationally demanding algorithms to the GPU. Two different strategies have been pursued for code refactoring, in order to explore complementary solutions and select the most effective approach. The resulting algorithms are presented together with the results of tests, benchmarks and scientific use cases.  Back
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4365
Streaming:
Download:
 
Black Holes on the GPU: Experiences with Accelerated Relativity
Adam Lewis (University of Toronto/ CITA)
New "telescopes" that directly observe the spacetime fluctuations from black holes will come online within the next few years, but the data they generate will be meaningless unless compared against banks of known signals. Creating these ban ...Read More
New "telescopes" that directly observe the spacetime fluctuations from black holes will come online within the next few years, but the data they generate will be meaningless unless compared against banks of known signals. Creating these banks requires black hole mergers of many different masses, spins, and orbital eccentricities to be simulated. This is not yet feasible, since even a single simulation may take several months. GPU acceleration offers a theoretical speedup of 50X, but until now has been too laborious to attempt. This is no longer the case: using a combination of hand-coding in CUDA, calls to CUBLAS and cuSPARSE, and our own automatic porting routine "CodeWriter," we have successfully accelerated the C++-based "Spectral Einstein Code". I will discuss our porting strategy, the challenges we encountered, and the new science made possible by the GPU. This talk should be of particular interest to scientists working on GPU ports of their own codes.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Developer - Programming Languages, Supercomputing & HPC, GTC 2014 - ID S4423
Streaming:
 
COBALT: Creating a High-Throughput, Real-Time Production System Using CUDA, MPI and OpenMP
Wouter Klijn (ASTRON), Jan David Mol (ASTRON)
We present our experiences in designing, building and deploying a massively parallel processing system for the LOFAR radio telescope using off-the-shelf hardware and software. After numerous hurdles, we created a high-throughput system, based on CUDA ...Read More
We present our experiences in designing, building and deploying a massively parallel processing system for the LOFAR radio telescope using off-the-shelf hardware and software. After numerous hurdles, we created a high-throughput system based on CUDA, MPI and OpenMP, running on multi-GPU, multi-socket servers connected by InfiniBand. Each of these techniques is well established in its own niche. However, due to conflicting memory models, incompatible requirements and abstractions, the otherwise orthogonal techniques do not cooperate well within the same application. Using the project's time line as a guide, we will answer the following questions: (1) What problems appear when combining these techniques? (2) How did we adjust both the hardware and the software to meet our requirements? (3) How did we robustly develop and deploy to both development boxes and a production cluster? And, most importantly, (4) how does the system perform?   Back
 
Keywords:
Astronomy & Astrophysics, Developer - Programming Languages, Signal & Audio Processing, Supercomputing & HPC, GTC 2014 - ID S4441
Streaming:
Download:
 
Fire and Ice: How Temperature Affects GPU Performance
Danny Price (Harvard-Smithsonian Center for Astrophysics)
Is it worth cooling your GPUs, or should you run them hot? In this session, we discuss how operating temperature affects the computational performance of GPUs. Temperature-dependent leakage current effects contribute significantly to power dissipatio ...Read More
Is it worth cooling your GPUs, or should you run them hot? In this session, we discuss how operating temperature affects the computational performance of GPUs. Temperature-dependent leakage current effects contribute significantly to power dissipation in nanometer-scale circuits; within GPUs this corresponds to decreased performance per watt. We use the CUDA-based xGPU code for radio astronomy to benchmark Fermi and Kepler GPUs while controlling the GPU die temperature, voltage, and clock speed. We report on trends and relate these measurements to physical leakage current mechanisms.  Back
 
Keywords:
Astronomy & Astrophysics, Clusters & GPU Management, GTC 2014 - ID S4484
Streaming:
Download:
 
Petascale Cross-Correlation: Extreme Signal-Processing Meets HPC
Ben Barsdell (Harvard University)
How do you cross-correlate 10,000 signals 100 million times per second? This is an example of the type of compute-bound problem facing modern radio astronomy, which, paralleling the paradigm shift in computing architectures, has transitioned from mon ...Read More
How do you cross-correlate 10,000 signals 100 million times per second? This is an example of the type of compute-bound problem facing modern radio astronomy, which, paralleling the paradigm shift in computing architectures, has transitioned from monolithic single-dish telescopes to massive arrays of smaller antennas. In this session we will describe how general-purpose HPC installations can be used to achieve scaling of a cross-correlation pipeline to petascale with all the flexibility of a purely-software implementation. Optimisations we will discuss include tuning of the GPU cross-correlation kernel, maximising concurrency between compute and network operations, and minimising bandwidth bottlenecks in a streaming application. GPUs are already powering the world's biggest radio telescope arrays, and this work paves the way for entirely off-the-shelf correlators for the future exascale-generation of instruments.  Back
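The correlation step described above reduces, at its core, to a conjugate multiply-accumulate over all antenna pairs. A minimal pure-Python sketch of that inner kernel (the toy data, function name and dict-based output are illustrative assumptions, not the production pipeline's design):

```python
import cmath

def correlate(samples):
    """Cross-correlate one chunk of complex voltage samples.

    samples: per-antenna lists of complex samples for a single
    frequency channel. Returns a dict mapping each baseline (i, j)
    to the accumulated visibility sum_t V_i(t) * conj(V_j(t)).
    """
    n_ant = len(samples)
    n_time = len(samples[0])
    vis = {}
    for i in range(n_ant):
        for j in range(i, n_ant):            # upper triangle of baselines
            acc = 0.0 + 0.0j
            for t in range(n_time):
                acc += samples[i][t] * samples[j][t].conjugate()
            vis[(i, j)] = acc
    return vis

# Toy data: two antennas see the same tone, the second with a phase lag,
# so the cross-correlation recovers that lag as a visibility phase.
tone = [cmath.exp(1j * 0.1 * t) for t in range(8)]
lagged = [s * cmath.exp(-1j * 0.5) for s in tone]
vis = correlate([tone, lagged])
```

The O(N^2) loop over baselines, repeated per channel and time chunk, is what makes the problem compute-bound and a natural fit for GPU kernels.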
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, Supercomputing & HPC, GTC 2014 - ID S4511
Streaming:
Download:
 
Real-Time RFI Rejection Techniques for the GMRT Using GPUs
Rohini Joshi (Drexel University)
Radio frequency interference (RFI) is the primary enemy of sensitive multi element radio instruments like the Giant Metrewave Radio Telescope (GMRT, India). Signals from radio receivers are corrupted with RFI from power lines, satellite signals, ...Read More

Radio frequency interference (RFI) is the primary enemy of sensitive multi-element radio instruments like the Giant Metrewave Radio Telescope (GMRT, India). Signals from radio receivers are corrupted with RFI from power lines, satellite signals, etc. Appearing as spikes and bursts in raw voltage data, RFI shows up statistically as outliers from a Gaussian distribution. We present an approach to tackle the problem of RFI, in real-time, using a robust scale estimator, the Median Absolute Deviation (MAD). Given the large data rate from each of the 30 antennas, sampled at 16 ns, it is necessary for the filter to work well within real-time limits. To accomplish this, the algorithm has been ported to GPUs to work within the GMRT pipeline. Presently, the RFI rejection pipeline runs in real-time on 0.3-0.7 sec long data chunks. The GMRT will soon be upgraded to work at 10 times the current data rate. We are now working on improving the algorithm further so as to have the RFI rejection pipeline ready for the upgraded GMRT.
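A MAD-based flagger of the kind the abstract describes is easy to illustrate. A minimal pure-Python sketch (the threshold and the toy data are illustrative assumptions, not the GMRT pipeline's actual parameters):

```python
from statistics import median

def mad_flag(samples, threshold=4.0):
    """Flag samples more than `threshold` robust sigmas from the median.

    MAD = median(|x - median(x)|); for Gaussian data, sigma ~= 1.4826 * MAD.
    Because the median ignores extreme values, the scale estimate is
    insensitive to the very outliers being hunted.
    """
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    sigma = 1.4826 * mad
    return [abs(x - med) > threshold * sigma for x in samples]

# A quiet baseband stream with two RFI spikes injected at indices 5 and 8.
data = [0.1, -0.2, 0.05, 0.0, -0.1, 9.0, 0.15, -0.05, -8.5, 0.1]
flags = mad_flag(data)
```

A mean/standard-deviation cut would fail here, since the spikes themselves inflate the standard deviation; the MAD's robustness is exactly why it suits real-time RFI rejection.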

  Back
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics, Signal & Audio Processing, GTC 2014 - ID S4538
Streaming:
Download:
 
GPUs In High Energy Physics: Reconstruction of Particle Trajectories
Akitaka Ariga (University of Bern, Switzerland)
The history of particle physics is a history of particle detectors, namely developments of new detectors and data analysis tools. For recent experiments, the size of data coming from particle detectors is huge and therefore a reconstruction of partic ...Read More
The history of particle physics is a history of particle detectors, namely developments of new detectors and data analysis tools. For recent experiments, the volume of data coming from particle detectors is huge, and a GPU-based reconstruction of particle trajectories is therefore worth implementing. LHEP Bern pioneered the use of GPUs in this field. Here, we show some applications of GPUs to the reconstruction of particle trajectories. This work is partially related to the talk S4372 - Does Antimatter Fall On The Earth? Measurement Of Antimatter Annihilation with GPU, and more generally to high energy physics.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4228
Download:
 
Cosmology With the 3-Point Correlation Function on the GPU
Deborah Bard (SLAC National Accelerator Laboratory)
Information about the period immediately after the Big Bang is lost in most metrics used to study the large-scale structure of the Universe. However, the cosmological three-point correlation function (3ptCF) applied to galaxy positions can provide in ...Read More
Information about the period immediately after the Big Bang is lost in most metrics used to study the large-scale structure of the Universe. However, the cosmological three-point correlation function (3ptCF) applied to galaxy positions can provide information about this early time. The 3ptCF scales with the cube of the number of galaxies. Approximation functions can speed this up, but can introduce systematic errors that will be unacceptable in the coming era of large astronomical datasets. Previous work (Bard et al., 2013) has established that the full calculation of the 2-point correlation function on the GPU reduces computation time by up to a factor of 140 compared to the CPU. In this work we consider the implementation of the full 3ptCF on the GPU, which presents very different challenges both cosmologically and computationally.   Back
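The cubic cost mentioned above comes from looping over every galaxy triplet. A brute-force sketch of that counting core (the binning scheme and toy coordinates are illustrative; real estimators also count triplets against random catalogs for normalization):

```python
from itertools import combinations
from math import dist

def triplet_hist(points, bin_width=1.0, n_bins=4):
    """Histogram triangle side lengths over all O(N^3) galaxy triplets.

    Each triplet contributes its three pairwise separations, binned by
    distance; the raw counts feed a 3-point correlation estimator once
    normalized against random catalogs.
    """
    hist = [0] * n_bins
    for a, b, c in combinations(points, 3):
        for s in (dist(a, b), dist(b, c), dist(a, c)):
            idx = int(s / bin_width)
            if idx < n_bins:
                hist[idx] += 1
    return hist

# 4 points -> C(4, 3) = 4 triplets -> 12 pair separations in total.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
h = triplet_hist(pts)
```

With N galaxies the triple loop runs C(N, 3) times, which is why a million-galaxy catalog makes the full calculation infeasible without massive parallelism.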
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4236
Download:
 
Interactive Visualization of Astrophysical Data
Frederick Bogert (University of California, Santa Cruz)
The general purpose of the project is to provide a volume rendering suite that utilizes graphics cards to interactively visualize large astrophysical data sets. We are working with open source packages PyCuda and PyOpenGL to build inter operations be ...Read More
The general purpose of the project is to provide a volume rendering suite that utilizes graphics cards to interactively visualize large astrophysical data sets. We are working with the open source packages PyCUDA and PyOpenGL to build interoperation between CUDA and the yt-project, which has been optimized to handle various sets of astrophysical data. The result is a robust tool that provides researchers with an interactive visualization of their data.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4201
Download:
 
Using GPUs to Analyze Solar Spectral Observations and Synthesized 3D Radiative-MHD Simulations
Juan Martinez-Sykora (Bay Area Environmental Research Institute)
Solar-physics observations and 3D radiative-MHD simulations of the Sun provide an enormous amount of data that makes difficult to analyze. NASA's recently launched Interface Region Imaging Spectrograph (IRIS) provides a very large 4D dataset (2D sp ...Read More
Solar-physics observations and 3D radiative-MHD simulations of the Sun produce an enormous amount of data that is difficult to analyze. NASA's recently launched Interface Region Imaging Spectrograph (IRIS) provides a very large 4D dataset (2D space, time and spectra) and enables us to study in great detail the dynamics of one of the most intriguing layers of the Sun, the chromosphere. Moreover, state-of-the-art 3D radiative MHD simulations are needed to interpret these observations. This poster will describe tools that use GPU computing to help scientists analyze the immense observational and numerical-modeling data volumes of the Sun, as well as to compare the two by creating synthetic observables from the simulations on GPUs.   Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4138
Download:
 
Streaming Multiframe Deconvolution of Atmospherically Distorted Images on GPUs
Matthias Lee (Johns Hopkins University)
We present an easily extensible, open source, GPU-accelerated tool for testing, comparing and experimenting with multiple approaches to multiframe deconvolution. Currently we provide options for a Gaussian, Richardson-Lucy and damped Richardson-Lucy ...Read More
We present an easily extensible, open source, GPU-accelerated tool for testing, comparing and experimenting with multiple approaches to multiframe deconvolution. Currently we provide options for a Gaussian, Richardson-Lucy and damped Richardson-Lucy approach as well as Wavelet filtering and Robust Statistics weighting. Our tool yields an over 20x speedup over the CPU implementation, allowing for interactive experimentation of parameters.   Back
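One of the options listed above, Richardson-Lucy deconvolution, can be sketched in 1-D pure Python (the toy signal, PSF and iteration count are illustrative; the actual tool works on 2-D frames on the GPU):

```python
def conv_same(x, k):
    """'Same'-size 1-D convolution with zero padding (odd-length kernel)."""
    half = len(k) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j, kj in enumerate(k):
            idx = i + j - half
            if 0 <= idx < len(x):
                s += x[idx] * kj
        out.append(s)
    return out

def richardson_lucy(observed, psf, n_iter=100):
    """Iteratively sharpen `observed` given the blur kernel `psf`.

    Each iteration multiplies the estimate by the re-blurred ratio
    observed / (estimate conv psf), which preserves positivity.
    """
    psf_mirror = psf[::-1]
    est = [1.0] * len(observed)
    for _ in range(n_iter):
        blurred = conv_same(est, psf)
        ratio = [o / max(b, 1e-12) for o, b in zip(observed, blurred)]
        corr = conv_same(ratio, psf_mirror)
        est = [e * c for e, c in zip(est, corr)]
    return est

# A point source blurred by a 3-tap PSF, then deconvolved: the restored
# peak returns to index 3, close to its true height of 4.
psf = [0.25, 0.5, 0.25]
truth = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
blurred = conv_same(truth, psf)        # [0, 0, 1, 2, 1, 0, 0]
restored = richardson_lucy(blurred, psf)
```

Since every pixel's update is independent, both convolutions map directly onto GPU kernels, which is where the reported speedup comes from.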
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4285
Download:
 
News From Black Holes in Galactic Nuclei Simulated With Large GPU Clusters
Rainer Spurzem (National Astronomical Observatories, Chinese Academy of Sciences)
We present direct astrophysical N-body simulations with up to a few million bodies using our parallel MPI/CUDA code on large GPU clusters in China, Ukraine and Germany, with different kinds of GPU hardware and in one case a first preliminary test wit ...Read More
We present direct astrophysical N-body simulations with up to a few million bodies using our parallel MPI/CUDA code on large GPU clusters in China, Ukraine and Germany, with different kinds of GPU hardware and, in one case, a first preliminary test with Intel PHI. Our clusters are directly linked under the Chinese Academy of Sciences special GPU cluster program, in cooperation with ICCS (International Center for Computational Science). We reach about half of the peak Kepler K20 GPU performance for our production-ready phiGPU code, in a real application scenario with individual hierarchically blocked time-steps, high-order (4th, 6th and 8th) Hermite integration schemes, and a real core-halo density structure of the modeled stellar systems. The code is mainly used to simulate star clusters and galactic nuclei with supermassive black holes, in which correlations between distant particles (two-body relaxation) cannot be neglected.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4270
Download:
 
Lunar-Forming Giant Impact Model Utilizing GPUs
Travis Salzillo (Tarleton State University)
Recent giant impact models focus on producing a circumplanetary disk of the proper composition around Earth and defer to earlier works for the accretion of this disk into the Moon. The discontinuity between creating the circumplanetary disk and accre ...Read More
Recent giant impact models focus on producing a circumplanetary disk of the proper composition around Earth and defer to earlier works for the accretion of this disk into the Moon. The discontinuity between creating the circumplanetary disk and accretion of the Moon is unnatural and lacks simplicity. Here we return to first principles and produce a highly parallelizable model that readily produces stable Earth-Moon systems from a single, continuous simulation. The resultant systems possess an iron-deficient, heterogeneously mixed Moon and accurate axial tilt of the Earth. This project was made financially feasible by the utilization of modern GPUs.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4139
Download:
 
Large-Scale Global MHD Simulation for Solar Wind-Magnetosphere Interaction on TSUBAME 2.5
Un-Hong Wong (Tokyo Institute of Technology)
Investigations of the space plasma environment are necessary to space exploration. MHD simulation has been a powerful tool to modeling space plasmas, but it is computationally expensive. In this poster, large-scale global MHD simulations of solar win ...Read More
Investigations of the space plasma environment are necessary for space exploration. MHD simulation has been a powerful tool for modeling space plasmas, but it is computationally expensive. In this poster, large-scale global MHD simulations of the solar wind interacting with a planet's magnetosphere are presented. Simulation results for a 1350 x 900 x 900 domain of the space plasma environment around a planet were produced by our GPU-accelerated MHD simulation code, running on the GPU-rich supercomputer TSUBAME 2.5 using 324 K20x (Kepler) GPUs. A performance test shows 7.8 TFLOPS for our simulation code. Simulation results of the solar wind interacting with the Earth's magnetic field, and with dipole magnetic fields with a non-vertical magnetic pole, are presented.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2014 - ID P4125
Download:
 
Acceleration of the Longwave Rapid Radiative Transfer Module Using GPGPU
Pragati Dharmale (SNHU, NH)
This poster presents Weather Research and Forecast (WRF) model, a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research communities. WRF offers multiple physics options, ...Read More
This poster presents the Weather Research and Forecast (WRF) model, a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research communities. WRF offers multiple physics options, one of which is the Long-Wave Rapid Radiative Transfer Model (RRTM). Even with the advent of large-scale parallelism in weather models, much of the performance increase has come from increasing processor speed rather than increased parallelism. We present an alternative method of scaling model performance.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2015 - ID P5144
Download:
 
Galaxy Classification with Deep Convolutional Neural Networks
Honghui Shi (University of Illinois, Urbana-Champaign)
There are more than 170 billion galaxies in the observable universe, and we humans have captured image data covering more than a quarter of the whole sky with our powerful telescopes and ambitious sky surveys like SDSS. The vast amount of information ...Read More
There are more than 170 billion galaxies in the observable universe, and we humans have captured image data covering more than a quarter of the whole sky with our powerful telescopes and ambitious sky surveys like SDSS. The vast amount of information is not meant for humans to process, and CPUs and traditional algorithms both hit their bottlenecks in processing it. With the help of recent deep learning technologies and powerful implementations on NVIDIA's GPUs, the developed models can classify galaxies with competitive accuracy.  Back
 
Keywords:
Astronomy & Astrophysics, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID P5176
Download:
 
Time-Efficient Analysis of Simulations of the Sun's Magnetic Field
Christopher Scarborough (Lockheed Martin Space Sciences Corporation)
Dynamics in the solar atmosphere, including solar flares, coronal mass ejections, micro-flares and different types of jets, are powered by the evolution of the sun's intense magnetic field. 3D Radiative Magnetohydrodnamics (MHD) computer simulations ...Read More
Dynamics in the solar atmosphere, including solar flares, coronal mass ejections, micro-flares and different types of jets, are powered by the evolution of the sun's intense magnetic field. 3D Radiative Magnetohydrodynamics (MHD) computer simulations have furthered our understanding of the processes involved. Detailed analysis of this evolution entails tracing magnetic field lines, an operation which is not time-efficient on a single processor. By utilizing a GPU to trace lines in parallel, such analysis becomes feasible.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID P5196
Download:
 
Unified Representation for Collaborative Visualization of Planetary Terrain Data
Daniel Herman (DigitalFish, Inc.)
Current and future NASA planetary missions are generating ever-increasing volumes of terrain data from orbital and surface-based assets at vastly different resolutions. We have applied an alternative technology, subdivision surfaces, coupled with a n ...Read More
Current and future NASA planetary missions are generating ever-increasing volumes of terrain data from orbital and surface-based assets at vastly different resolutions. We have applied an alternative technology, subdivision surfaces, coupled with a novel volumetric reconstruction process, to help manage and present high-fidelity mesh representations of the disparate range of terrain data collected by rovers and satellites. Applications include terrain data visualization, autonomous navigation, and other localization and mapping problems.  Back
 
Keywords:
Astronomy & Astrophysics, Visualization - In-Situ & Scientific, GTC 2015 - ID P5307
Download:
 
Astrophysical Gamma-Ray Source Imaging with NASA's Swift Telescope Using Nvidia GPUs
Tim McMahon (Langston University)
We have implemented an extensive package of detector modeling and image reconstruction algorithms for Swift's BAT telescope on a Tesla K20 processor. Individual reconstructed images with 2 million pixels are reprocessed with a compute intensive nois ...Read More
We have implemented an extensive package of detector modeling and image reconstruction algorithms for Swift's BAT telescope on a Tesla K20 processor. Individual reconstructed images with 2 million pixels are reprocessed with a compute intensive noise reduction algorithm which has been modified to run under CUDA 6.5. Methods employed to port existing code to a GPU implementation with a minimum of code development are presented.  Back
 
Keywords:
Astronomy & Astrophysics, Developer - Algorithms, GTC 2015 - ID P5316
Download:
 
Exact and Approximate Methods in Stellar Dynamics
Yohai Meiron (Peking University)
Stellar systems come in many shapes and sizes. We present two new GPU-accelerated N-body codes focusing on two kind of systems: dwarf spheroidal galaxies and globular clusters. ETICS is based on series expansion of the Poisson equation and is ideal f ...Read More
Stellar systems come in many shapes and sizes. We present two new GPU-accelerated N-body codes focusing on two kinds of systems: dwarf spheroidal galaxies and globular clusters. ETICS is based on a series expansion of the Poisson equation and is ideal for diffuse objects such as dwarf galaxies. Since close stellar encounters and binaries play very important roles in the dynamics of globular clusters, a much more accurate integrator is needed there. NBODY6++ is a direct-summation N-body code which can provide this kind of accuracy.  Back
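The direct-summation approach used by codes like NBODY6++ boils down, at its core, to an O(N^2) pairwise force loop. A minimal softened sketch in pure Python (G = 1 units and the Plummer softening are illustrative choices, not NBODY6++'s actual scheme):

```python
def accelerations(pos, mass, eps=1e-3):
    """O(N^2) direct-summation gravitational accelerations (G = 1).

    eps is a Plummer softening length that keeps close encounters
    finite; production codes like NBODY6++ instead handle them with
    explicit regularization, which is what buys their accuracy.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc

# Two unit masses one unit apart pull on each other with |a| ~= 1
# (G * m / r^2 in G = 1 units, up to softening).
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
acc = accelerations(pos, [1.0, 1.0])
```

The inner loop over j is identical for every i, which is why direct summation maps so naturally onto one GPU thread per particle.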
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID P5323
Download:
 
Maximum Likelihood Estimation on GPUs: Leveraging Dynamic Parallelism
Michele Mastropietro (Italian National Institute for Astrophysics (INAF), Rome)
The estimation of the Maximum Likelihood (MLE) is the most robust algorithm used in gamma-ray astronomy but, particularly if used in conduction with "unbinned" analysis, it uses a huge amount of computing resources. Typically, the estimatio ...Read More
Maximum Likelihood Estimation (MLE) is the most robust algorithm used in gamma-ray astronomy but, particularly if used in conjunction with "unbinned" analysis, it consumes a huge amount of computing resources. Typically, the estimation of the maximum is left to a single-threaded minimizer, like MINUIT, running on a CPU, while a callback function may estimate the likelihood on the GPU. We propose an alternative to the MINUIT package that leverages Dynamic Parallelism and runs entirely on GPUs.  Back
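The division of labor described above (a scalar minimizer steering a per-event likelihood sum) can be sketched serially. A toy unbinned fit of an exponential decay rate, with a golden-section search standing in for MINUIT (the model, data and bracket are illustrative assumptions):

```python
import math
import random

def neg_log_likelihood(rate, events):
    """Unbinned negative log-likelihood of an exponential decay model.

    On a GPU this per-event sum is the natural data-parallel kernel;
    the outer 1-D minimization below plays the role of the minimizer.
    """
    return -sum(math.log(rate) - rate * t for t in events)

def fit_rate(events, lo=0.01, hi=20.0, n_iter=60):
    """Golden-section search for the rate minimizing the NLL."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(n_iter):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if neg_log_likelihood(c, events) < neg_log_likelihood(d, events):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# Simulated decay times with true rate 2.0; the exponential MLE has the
# closed form 1 / mean(t), so the fit can be checked against it.
random.seed(1)
events = [random.expovariate(2.0) for _ in range(5000)]
rate_hat = fit_rate(events)
```

Note that the minimizer itself is inherently serial; only the likelihood sum parallelizes, which is the imbalance the Dynamic Parallelism approach above is meant to remove.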
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2015 - ID P5327
Download:
 
HTC for Gamma-Ray Astronomy on Kayla and Low-Power Platforms
Alberto Madonna (Italian National Institute for Astrophysics (INAF), Rome)
Detectors for Gamma-ray Astronomy are the prototypes for distributed experiments. Single detectors may be scattered in an area of few square kilometres, and the capability of each unit to process, at least partially, its own data before sending them ...Read More
Detectors for gamma-ray astronomy are the prototypes of distributed experiments. Single detectors may be scattered over an area of a few square kilometres, and the capability of each unit to process, at least partially, its own data before sending it to the central data acquisition provides a key advantage. We aim at developing and testing algorithms and techniques to implement this kind of local data sparsification at the detector level. To reach this goal, we leverage and compare the parallel capabilities of Kayla and Jetson TK1.  Back
 
Keywords:
Astronomy & Astrophysics, Embedded, GTC 2015 - ID P5328
Download:
 
Shooting for the Stars with GPUs
Hatem Ltaief (KAUST), Damien Gratadour (Université Paris Diderot & LESIA, Observatoire de Paris)
Come and learn how GPUs can help discovering the most distant galaxies by performing close to real-time simulations at an unprecedented scale of the multi-object adaptive optics technique (MOAO). The European Southern Observatory (ESO) is leading the ...Read More
Come and learn how GPUs can help discover the most distant galaxies by performing close to real-time simulations, at an unprecedented scale, of the multi-object adaptive optics (MOAO) technique. The European Southern Observatory (ESO) is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m diameter telescope, to provide Europe with the biggest eye on the universe ever built. MOAO is the most complex adaptive optics concept proposed for the E-ELT, and simulating the instrument at full scale is extremely compute-intensive. The tomographic reconstructor (TR) is one of the core components of both the design simulations and, eventually, system operations, and it requires the inversion of a large dense covariance matrix.  Back
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID S5122
Streaming:
 
GPU-Accelerated Imaging Processing for NASA's Solar Dynamics Observatory
Mark Cheung (Lockheed Martin Solar & Astrophysics Laboratory)
Since its launch in 2010, NASA's Solar Dynamics Observatory (SDO) has continuously monitored the Sun's changes in magnetic activity. Both the Atmospheric Imaging Assembly (AIA) and Helioseismic & Magnetic Imager (HMI) instruments onb ...Read More

Since its launch in 2010, NASA's Solar Dynamics Observatory (SDO) has continuously monitored the Sun's changes in magnetic activity. Both the Atmospheric Imaging Assembly (AIA) and Helioseismic & Magnetic Imager (HMI) instruments onboard SDO deliver 4096x4096 pixel images at a cadence of more than one image per second. Although SDO images are free from distortion by absorption and scattering in the Earth's atmosphere, images are still blurred by the intrinsic point spread functions of the telescopes. In this presentation, we show how the instrument teams have deployed CUDA-enabled GPUs to perform deconvolution of SDO images. The presentation will demonstrate how we leveraged cuFFT and Thrust to implement an efficient image processing pipeline.

  Back
 
Keywords:
Astronomy & Astrophysics, Video & Image Processing, GTC 2015 - ID S5209
Streaming:
Download:
 
Embedded Supercomputing: Radio Astronomy at the Limit
Simon Ratcliffe (SKA South Africa)
Radio astronomy imaging is a complex, compute and memory intensive problem, that is dominating the cost of next generation radio facilities. Using the MeerKAT telescope, currently under construction in South Africa, as a primer, we describe the devel ...Read More
Radio astronomy imaging is a complex, compute- and memory-intensive problem that dominates the cost of next-generation radio facilities. Using the MeerKAT telescope, currently under construction in South Africa, as a primer, we describe the development of a highly parallel, low-power, low-cost imager using System on Chip devices, in particular NVIDIA's TK1 and its successors. The talk will also briefly describe the opportunities and solutions presented by the forthcoming Square Kilometre Array, whose processing costs require game-changing technology shifts to become achievable.  Back
 
Keywords:
Astronomy & Astrophysics, Embedded, GTC 2015 - ID S5222
Streaming:
Download:
 
Taranis: Ray-Traced Radiative Transfer in Smoothed Particle Hydrodynamics
Sam Thomson (University of Edinburgh)
We introduce Taranis, a library for performing ray-traced radiation transport in smoothed particle hydrodynamics (SPH) entirely on the GPU. We discuss the design, algorithm, and key optimizations (such as ray packets) for our use-case. Taranis is mot ...Read More
We introduce Taranis, a library for performing ray-traced radiation transport in smoothed particle hydrodynamics (SPH) entirely on the GPU. We discuss the design, algorithm, and key optimizations (such as ray packets) for our use-case. Taranis is motivated by the current intractability of coupled radiation-hydrodynamics simulations. This talk focuses on Taranis' tracing component, which has been influenced by recent work in computer graphics. It outperforms a 32-core CPU code on a single GPU. Our scheme allows particles to be updated independently and requires fewer rays than a typical 'long characteristics' method. Taranis' radiation transport solver is also implemented on the GPU, and targets large-scale simulations of reionization. However, the tracing API exists as a standalone entity.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Rendering & Ray Tracing, GTC 2015 - ID S5266
Streaming:
Download:
 
Optimization of GPU-Based Signal Processing of Radio Telescopes
Vinay Deshpande (NVIDIA)
We present a summary of optimization work on a GPU-based correlator pipeline code, an ongoing joint effort between the National Centre for Radio Astrophysics (NCRA) and NVIDIA. The central goal of the effort is to upgrade the Giant Metrewave Radio Telescope (GMRT) receiver with a wide-band GPU-based back-end, and to extend this design as a proposal for the back-end of the low-frequency array of the SKA telescope. We examine the various processing stages in the pipeline to explore optimization possibilities, with some interesting results already achieved.
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID S5302
Streaming:
Download:
 
Statistics of the Universe: Exa-Calculations and Cosmology's Data Deluge
Matthew Bellis (Siena College), Deborah Bard (SLAC National Accelerator Laboratory)
Learn how to use GPUs on the desktop to study the structure and evolution of the Universe: how galaxies are pulled together by gravity, and how space expands under the influence of dark energy. Metrics used to describe this structure are the two- and three-point correlation functions, which quantify the clustering of galaxies. Cosmological datasets can number in the millions (and soon billions) of galaxies, making these O(N^2) and O(N^3) metrics computationally challenging. This talk will detail how we have ported solutions to the GPU; in particular, we focus on the histogramming bottlenecks inherent in these calculations and how they can be mitigated. Throughout, we will emphasise how GPUs and heterogeneous computing can be used for everyday data analysis with large datasets.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2015 - ID S5509
Streaming:
Download:
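The O(N^2) pair-counting with histogramming that this session describes can be sketched in a few lines of NumPy. This is an illustrative brute-force CPU sketch of the DD pair counts used in two-point correlation estimators, not the presenters' GPU code; the function name and binning are ours:

```python
import numpy as np

def pair_separation_histogram(pos, bins):
    """Histogram all pairwise separations among N points (brute-force O(N^2)).

    pos  : (N, 3) array of galaxy positions
    bins : monotonically increasing bin edges for separation
    Returns the DD pair counts used in two-point correlation estimators.
    """
    n = len(pos)
    counts = np.zeros(len(bins) - 1, dtype=np.int64)
    for i in range(n - 1):
        # distances from point i to all later points (each pair counted once)
        d = np.linalg.norm(pos[i + 1:] - pos[i], axis=1)
        h, _ = np.histogram(d, bins=bins)
        counts += h
    return counts

rng = np.random.default_rng(0)
pts = rng.random((200, 3))
dd = pair_separation_histogram(pts, bins=np.linspace(0.0, 2.0, 11))
# every separation in the unit cube is below sqrt(3) < 2, so all
# N*(N-1)/2 pairs land in some bin
assert dd.sum() == 200 * 199 // 2
```

On a GPU, the histogramming step is the bottleneck the talk refers to: many threads increment the same bins concurrently, which is typically mitigated with per-block shared-memory histograms that are merged afterwards.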
 
The Ramses Code for Numerical Astrophysics: Toward Full GPU Enabling
Claudio Gheller (ETHZ CSCS)
The evolution of the universe is an extraordinarily fascinating and, of course, complex problem. Scientists use the most advanced simulation codes to try to describe and understand the origin and behavior of the incredible variety of objects that populate it: stars, galaxies, black holes. The most powerful computing systems are required to pursue such goals, and GPUs represent an outstanding opportunity. In this talk, we present one of these codes, Ramses, and the ongoing work to enable it to efficiently exploit GPUs through the adoption of the OpenACC programming model. The most recent achievements will be shown, together with some of the scientific challenges GPUs can help address.
 
Keywords:
Astronomy & Astrophysics, OpenACC, Computational Physics, Supercomputing & HPC, GTC 2015 - ID S5531
Streaming:
Download:
 
Pulsar Hunting with the Square Kilometre Array
Ewan Barr (Swinburne University of Technology)
In this talk I will give an introduction to the biggest of the upcoming big data science facilities, the Square Kilometre Array radio telescope (SKA), and will look at how GPUs will enable this instrument to discover exotic, rapidly spinning radio pulsars. Radio pulsars provide us with phenomenal tools with which we may probe the most extreme environments in the Universe. More massive than our Sun, yet spinning faster than a kitchen blender and sending jets of radio waves out from their magnetic poles, these exotic cosmic lighthouses are key to understanding gravity and allow us to ask the question: was Einstein right? To answer this question we must use the SKA to scour the Galaxy in search of exotic pulsar binary systems. This task is extremely computationally expensive, requiring the execution of many billions of Fourier transforms. Here I will review the work being done to leverage the power of GPUs to solve the SKA's pulsar searching challenge.
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics, Supercomputing & HPC, GTC 2015 - ID S5875
Streaming:
Download:
 
Computational Simulation of World's Biggest Eye on GPUs
Hatem Ltaief (Extreme Computing Research Center, KAUST), Damien Gratadour (Universite Paris Diderot & Observatoire de Paris)
Have you heard about the world's biggest eye? Learn how GPUs help design major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, we'll explain how the resulting dense linear algebra operations, combined with an efficient task-based programming model, help design the next generation of telescope instruments.
 
Keywords:
Astronomy & Astrophysics, Algorithms, Performance Optimization, GTC 2016 - ID S6229
Streaming:
Download:
 
Shaping the Light with GPUs
Damien Gratadour (Universite Paris Diderot & Observatoire de Paris)
Learn how GPUs are used to shape the light on extreme-diameter telescopes. By providing the means to process, in real time, large-scale images from wavefront sensors, GPUs are revolutionizing adaptive optics, an instrumental technique used to compensate for fast-evolving aberrations in optical systems. We'll show how GPUs power the real-time controllers of these systems, providing millions of commands per second to deformable mirrors so as to stabilize the image quality at the output of a large telescope. The first results of the Green Flash project, a large-scale European initiative aimed at prototyping real-time controllers for the European Extremely Large Telescope, will be presented and illustrated with preliminary data obtained in the lab.
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, Supercomputing & HPC, GTC 2016 - ID S6236
Streaming:
Download:
 
A CUDA-Based 3D Kinetic Model for Space Plasma Physics
Shahab Fatemi (University of California, Berkeley), Andrew R. Poppe (University of California, Berkeley)
We've developed the first three-dimensional, self-consistent kinetic plasma model that runs on NVIDIA GPUs using CUDA. The model self-consistently solves the motion of charged particles and their associated electromagnetic fields. We use this model to explore the microphysics of plasma interactions with solar system objects, to understand fundamental kinetic processes of plasma, and to meet NASA's requirements for planetary and space exploration.
 
Keywords:
Astronomy & Astrophysics, Algorithms, Computational Physics, GTC 2016 - ID S6265
Streaming:
Download:
 
Fourier Domain Pulsar Acceleration Searches on GPUs for the Square Kilometre Array
Sofia Dimoudi (University of Oxford)
We'll describe how we can accelerate one of the most demanding computational tasks of the real-time pulsar signal processing pipeline of the world's largest next-generation radio telescope, the Square Kilometre Array (SKA). We'll explain the scientific goals and importance of pulsar searches, along with the technical challenges facing pulsar signal processing on the SKA. Pulsar acceleration searches will be introduced, and an overview of a Fourier-domain method for recovering signal power from binary accelerated pulsars will be given. We'll then present our GPU implementation of this method, discuss techniques used for optimisation, show comparative computational performance results, and consider performance projections for future GPU technology.
 
Keywords:
Astronomy & Astrophysics, Algorithms, GTC 2016 - ID S6412
Streaming:
Download:
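The starting point of any Fourier-domain pulsar search is finding periodic signal power as a peak in the FFT spectrum of a dedispersed time series. The following is a minimal NumPy sketch of that zeroth step only (no acceleration correction or harmonic summing, which are the session's actual subject); the function name and test signal are our own illustration:

```python
import numpy as np

def strongest_frequency(timeseries, dt):
    """Return the frequency of the strongest bin in the real-FFT power
    spectrum (ignoring DC) -- the first step of an FFT-based pulsar search."""
    spectrum = np.fft.rfft(timeseries)
    power = np.abs(spectrum) ** 2
    power[0] = 0.0  # suppress the DC bin
    freqs = np.fft.rfftfreq(len(timeseries), d=dt)
    return freqs[np.argmax(power)]

# synthetic "pulsar": a 25 Hz sinusoid buried in noise, sampled at 1 kHz
dt = 1e-3
t = np.arange(4096) * dt
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * 25.0 * t) + 0.5 * rng.standard_normal(t.size)
f = strongest_frequency(signal, dt)
assert abs(f - 25.0) < 0.5  # recovered to within one frequency bin
```

In a binary system the apparent spin frequency drifts during the observation, smearing this peak across bins; the Fourier-domain acceleration method described above recovers that power by correlating the spectrum with templates for a range of trial accelerations.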
 
Bifrost: High-Throughput CPU/GPU Pipelines Made Easy
Ben Barsdell (NVIDIA)
We'll present Bifrost, a lightweight new framework designed to ease the development and deployment of pipeline applications that demand sustained peak utilization of network, CPU, and GPU resources under soft real-time constraints. Such applications are common in experimental science and computer vision, where processing must keep up with acquisition systems to avoid data loss. Bifrost enables operations to be wrapped in a simple task container with metadata-rich inputs and outputs. By connecting tasks together, complex branching pipelines can be constructed, with asynchronous communication handled by efficient ring buffers in host or device memory. We'll demonstrate Bifrost using a high-performance radio astronomy application that has been deployed as part of the LEDA project.
 
Keywords:
Astronomy & Astrophysics, Tools & Libraries, Signal & Audio Processing, GTC 2016 - ID S6627
Streaming:
Download:
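The ring buffers mentioned above decouple acquisition from processing: a fast producer must never stall on a slow consumer. A minimal pure-Python sketch of that behavior (this is not the Bifrost API; the class and capacity are our own illustration):

```python
from collections import deque

class RingBuffer:
    """Minimal fixed-capacity ring buffer: the writer overwrites the oldest
    frame when the buffer is full, so a slow consumer drops data instead of
    stalling the acquisition side."""
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def write(self, frame):
        self.frames.append(frame)  # silently evicts the oldest when full

    def read(self):
        return self.frames.popleft() if self.frames else None

# producer generates 10 frames into a 4-slot ring; the consumer reads late
ring = RingBuffer(capacity=4)
for i in range(10):
    ring.write(i)
survivors = [ring.read() for _ in range(4)]
assert survivors == [6, 7, 8, 9]  # only the newest 4 frames remain
```

A production framework adds what this sketch omits: multiple readers at independent positions, sequence metadata, and buffers allocated in pinned host or GPU device memory so DMA transfers can proceed asynchronously.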
 
Embedded Supercomputing: Radio Astronomy at the Limit
Simon Ratcliffe (SKA South Africa)
This talk will present designs and performance results for a highly parallel Tegra X1-based compute platform being developed as part of a next-generation radio telescope. The MeerKAT radio telescope is currently under construction in the semi-desert Karoo region of Southern Africa. This talk presents the ongoing work to develop novel computing technologies that deliver a large-scale computational platform within the strict confines of power, space, and emission in force at this remote site. Using the Tegra X1 as a building block, a rugged, oil-cooled platform has been developed that will power the imager at the heart of the compute challenge. This is a follow-on talk from an initial exploration presented in 2015.
 
Keywords:
Astronomy & Astrophysics, Embedded, Press-Suggested Sessions: HPC & Science, GTC 2016 - ID S6692
Streaming:
Download:
 
Photometry of Fractal Meshes for Applications to Large-Scale Rough Planetary Surfaces
Antonio Gracia Berna (University of Bern)
The photometry measured by spacecraft during space missions provides important information about planetary surface composition and properties, such as roughness, which influences the photometry. The model by B. Hapke has been one of the most widely used models for fitting photometric data, but it presents drawbacks. We present a GPU-accelerated technique that simulates the photometry produced on large-scale rough surfaces as the interaction of millions of light rays. Reflectance values measured in the laboratory from real samples are used in the simulation. To prove the validity of the approach, a comparison with the Hapke model is proposed. This is a first step toward relating real laboratory measurements to the photometry of solar system surfaces observed by past and future missions.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2016 - ID P6134
Download:
 
N-Body Simulation of Binary Star Mass Transfer Using NVIDIA GPUs
Baylor Fain (Tarleton State University), Taylor Hutyra (Tarleton State University), Edward Smith (Tarleton State University)
Over 70% of the stars in our galaxy are in binary systems. Because of their interaction, the masses of these stars can be found using Newton's and Kepler's laws, which allows astronomers to use these systems to study the properties and processes of stars and galaxies. Among the many types of binary stars observed, contact systems are the most interesting because they exhibit mass transfer, changing the functionality of both stars. But due to the lack of precise observational data and the large time scale of this process, there is limited understanding of the mass transfer. In this work, a model was made to give astronomers a method for gaining deeper knowledge and visual intuition of how the mass transfer between binary stars takes place.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2016 - ID P6197
Download:
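The core of any such N-body model is the pairwise gravitational force evaluation, which parallelizes naturally on GPUs. A minimal NumPy sketch of that kernel under stated assumptions (G = 1, Plummer softening; this is our illustration, not the poster's code):

```python
import numpy as np

def gravity_accel(pos, mass, soft=1e-3):
    """Pairwise gravitational accelerations (G = 1, Plummer softening)."""
    d = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]   # (N, N, 3) separations
    r2 = (d ** 2).sum(axis=2) + soft ** 2
    np.fill_diagonal(r2, np.inf)                        # no self-force
    inv_r3 = r2 ** -1.5
    return (d * (mass[np.newaxis, :, np.newaxis]
                 * inv_r3[:, :, np.newaxis])).sum(axis=1)

# two equal masses on the x-axis attract each other symmetrically
pos = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
mass = np.array([1.0, 1.0])
a = gravity_accel(pos, mass)
assert a[0][0] > 0 and a[1][0] < 0   # each accelerates toward the other
assert np.allclose(a[0], -a[1])      # consistent with Newton's third law
```

On a GPU the same all-pairs sum is typically tiled through shared memory, one body per thread, which is what makes million-particle binary-star runs tractable.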
 
Angular Momentum of Late Lunar Forming Impacts Using NVIDIA GPUs
Jonathan Petz (Tarleton State University), William Sumpter (Tarleton State University), Ty Turner (Tarleton State University)
Our Moon is no ordinary satellite! It is too large to be a captured asteroid. Could it be a twin planet, formed alongside Earth as our solar system was being created? Or perhaps a captured rocky planet, forced to light our night and give lovers inspiration? Though this is romantic, the true answer is thought to be much more violent: we believe the Moon was born from a violent encounter between two young proto-planets. This giant impact hypothesis (GIH) is the main theory for the formation of our Moon, but it has been questioned recently because simulations of the GIH leave the Earth-Moon system with excess angular momentum. In this work, we show how to remove the excess angular momentum from giant impact simulations while preserving the desired results from previous giant impact studies.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2016 - ID P6200
Download:
 
Data Reduction for Cherenkov Gamma-Ray Astronomy on Jetson TK1
Alberto Madonna (Italian National Institute for Astrophysics (INAF))
A mini-array of ASTRI SST-2M Cherenkov telescopes will soon be deployed at a remote site, far from human activity, to achieve optimal observation conditions for gamma-ray astronomy. In such a scenario, the capability of each telescope to process its own data before sending it to a central acquisition system provides a key advantage. We implemented the complete analysis chain required by a single telescope on a Jetson TK1 development board, exceeding the required real-time processing speed by more than a factor of two while staying within a very small power budget.
 
Keywords:
Astronomy & Astrophysics, Embedded, GTC 2016 - ID P6233
Download:
 
Non-Uniform Diffusion of the Solar Surface Magnetic Field: Code Acceleration Using OpenACC for both GPUs and x86
Ronald Caplan (Predictive Science Inc.)
We show the results of implementing OpenACC in a non-uniform diffusion time-integration Fortran code. The code's application is to smooth observation-based radial magnetic field maps of the solar surface for use as inner boundary conditions of global magnetohydrodynamic simulations of the corona and heliosphere. The code uses an RKL2 super-time-stepping algorithm to allow time steps that far exceed the standard explicit stability limit. The algorithm remains explicit, making the code a prime target for OpenACC acceleration. The OpenACC implementation is discussed and speedup results are shown. The newly released OpenACC x86 feature in the PGI compiler is also tested and shown to produce multicore CPU code from the OpenACC directives that can outperform our OpenMP implementation.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2016 - ID P6259
Download:
 
Implementation of a Real-Time Polyphase Filter in Radio Astronomy
Karel Adamek (University of Oxford)
We present our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and, as such, a well-established algorithm. We have implemented it on three generations of NVIDIA GPUs (Fermi, Kepler, Maxwell), as well as on the Intel Xeon CPU and Xeon Phi (Knights Corner) platforms. All of our implementations aim to exploit the potential for data reuse that the algorithm offers; our GPU implementations explore two different methods for achieving this, the first using the L1/texture cache and the second using shared memory. We present our results in terms of the sample rate that can be processed per second.
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, GTC 2016 - ID P6281
Download:
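The polyphase filterbank structure the poster refers to is: weight a block of samples by a prototype low-pass filter split into branches, sum the branches, then FFT. A minimal single-spectrum NumPy sketch of that structure (our illustration with an arbitrary windowed-sinc prototype, not the poster's optimized GPU code):

```python
import numpy as np

def pfb_channelize(x, nchan, ntaps):
    """One polyphase filterbank output spectrum: weight nchan*ntaps samples
    by a windowed-sinc prototype filter, sum the ntaps branches, then FFT."""
    assert len(x) >= nchan * ntaps
    n = np.arange(nchan * ntaps)
    # prototype low-pass filter: sinc with cutoff 1/nchan, Hann-windowed
    proto = np.sinc((n - nchan * ntaps / 2) / nchan) * np.hanning(nchan * ntaps)
    blocks = (x[:nchan * ntaps] * proto).reshape(ntaps, nchan)
    return np.fft.fft(blocks.sum(axis=0))

# a complex tone centred on channel 3 of 16 should dominate that channel
nchan, ntaps = 16, 4
t = np.arange(nchan * ntaps)
tone = np.exp(2j * np.pi * 3 * t / nchan)
spec = pfb_channelize(tone, nchan, ntaps)
assert np.argmax(np.abs(spec)) == 3
```

The data reuse the poster exploits is visible here: each input sample contributes to ntaps successive output spectra, so caching samples in shared memory or the L1/texture cache avoids re-reading them from global memory.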
Audio, Image and Video Processing
Presentation
Media
Using the GPU Direct for Video API
Thomas True (NVIDIA), Alina Alt (NVIDIA)
This tutorial will demonstrate how video I/O devices can take advantage of the GPU Direct for Video API to optimize data transfer performance for digital video, film, and broadcast applications, as well as computer vision applications. The GPU Direct for Video API is a technology that permits DMA transfer of data buffers between video I/O devices and the GPU through a shared system memory buffer, for immediate processing by OpenGL, DirectX, CUDA, and OpenCL. This direct transfer can improve synchronization and eliminate latency between video capture, GPU processing, and video output.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2049
Streaming:
Download:
 
Fast High Quality Image and Video Background Removal with CUDA
Timo Stich (NVIDIA)
A tool to efficiently and easily cut objects out of a photograph has great practical value. In this session we present how to efficiently implement such a tool with CUDA and the NPP library, based on the GrabCut approach by Rother et al. Through GPU acceleration, both runtime and accuracy are improved compared to CPU-based implementations such as the one in MS Word 2011. Further, we show how to extend our GPU implementation to enable live background removal in a webcam video stream.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2052
Streaming:
Download:
 
Cost-effective GPU Acceleration of a Video Restoration and Archiving Workflow
Klaus Gaedke (Technicolor)
The goal of this session is to present a complex GPU-accelerated video restoration and archiving workflow. The workflow consists of many different processing steps and a final review application; fast, cost-effective processing and real-time display of the processed video material is a key requirement. It will be shown in detail how GPU-based acceleration can be achieved for many different processing steps, and for the review application, using OpenCV, OpenCL, and OpenGL. Furthermore, an object-oriented software architecture supporting the acceleration of several different processing tasks on the same graphics adapter will be presented.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2073
Streaming:
Download:
 
Multi-GPU Real-Time Ptychographic X-ray Image Reconstruction
Filipe Maia (Lawrence Berkeley National Laboratory)
Learn how a new imaging technique, combined with the computational power of GPUs and the brightness of modern X-ray synchrotrons, can quickly and easily produce images with nanometer-level resolution. Ptychography is a recent X-ray imaging technique in which overlapping regions of a sample are exposed in quick succession and the resulting scattering is used to reconstruct a high-resolution image of the sample. Discover why GPUs can substitute for the lack of X-ray lenses and how they enabled a dramatic reduction in the feedback time for users of the technique, from days to seconds.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2131
Streaming:
Download:
 
Rapid Training of Acoustic Models Using GPUs
Jike Chong (Carnegie Mellon University), Ian Lane (Carnegie Mellon University Co)
Learn how to realize robust and accurate speech recognition systems by training acoustic models on GPUs. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data, which can take weeks even with a large cluster of machines. To overcome this development bottleneck, we propose a new framework for rapid training of acoustic models using highly parallel GPUs. With a single NVIDIA GTX 580 GPU, our proposed approach is shown to be 51x faster than a sequential CPU implementation, enabling a moderately sized acoustic model to be trained on 1000 hours of speech data in just over 9 hours.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2222
Streaming:
Download:
 
Building Real-Time Professional Visualization Solutions with OpenCL
Kristof Denolf (Barco), Samuel Maroy (Barco)
Professional visualization solutions, like high-quality, high-resolution medical displays or very large screens for surveillance or entertainment, benefit from GPUs' image and graphics compute capabilities to achieve real-time performance, but add specific constraints, like low latency, multiple HD streams, and strict synchronization. This talk first motivates the industrial relevance of development in OpenCL on heterogeneous devices. It then explains the techniques currently explored to meet the specific design constraints, with a main focus on parallel data transfer and compute. The lessons learned are illustrated with a real-life example.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2252
Streaming:
Download:
 
Sensor Processing with Rugged Kepler GPUs (Presented by GE Intelligent Platforms)
Dustin Franklin (GE Intelligent Platforms)
Swimming in sensors and drowning in data? Turn the tide on high-bandwidth sensors with rugged next-generation GPUs from NVIDIA. See how we deploy NVIDIA GPUs into the most extreme environments, providing GPGPU capabilities onboard platforms where SWaP and GFLOPS/watt are key. Dig into four real-time CUDA sensor processing applications: Hyperspectral Imaging, Wide-Area Surveillance, 360° Situational Awareness, and GSM cellular SIGINT. Discuss the CUDA algorithms, interconnects, and rugged platforms behind each, and learn how we utilize GPUDirect and real-time Linux for improved latency and determinism.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2253
Streaming:
Download:
 
Fast JPEG Coding on the GPU
Fyodor Serzhenko (Fastvideo), Victor Podlozhnyuk (NVIDIA)
The goal of this session is to demonstrate how high-speed JPEG compression and decompression can be efficiently implemented on the GPU using CUDA. In this session we will present: a detailed analysis of the baseline JPEG compression and decompression processes and their constituent parts (such as Huffman coding, RLE, differential coding, quantization, and the discrete cosine transform) and their suitability for the GPU architecture; an analysis of achieved results and a comparison with existing implementations; and applications to high-speed imaging.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2273
Streaming:
Download:
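Of the JPEG stages listed above, the 8x8 discrete cosine transform and quantization are the naturally data-parallel ones (Huffman coding is the hard, serial part). A minimal NumPy sketch of the transform stage under our own assumptions (orthonormal DCT matrix, an illustrative uniform quantizer rather than a real JPEG quantization table):

```python
import numpy as np

def dct2_8x8(block):
    """2-D DCT-II of an 8x8 block via the orthonormal DCT matrix,
    the transform stage of baseline JPEG."""
    k, n = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    c = np.sqrt(2.0 / 8) * np.cos(np.pi * (2 * n + 1) * k / 16)
    c[0, :] = np.sqrt(1.0 / 8)
    return c @ block @ c.T

# a flat block compacts all energy into the DC coefficient, which is
# why quantizing the remaining coefficients discards little information
flat = np.full((8, 8), 128.0)
assert np.allclose(dct2_8x8(flat - 128.0), 0.0)  # level shift as in JPEG

ramp = np.outer(np.ones(8), np.arange(8, dtype=float))
coef = dct2_8x8(ramp)
assert np.isclose(coef[0, 0], ramp.mean() * 8)            # DC carries the mean
assert np.isclose((coef ** 2).sum(), (ramp ** 2).sum())   # orthonormal
quantized = np.round(coef / 16.0)  # illustrative uniform quantization step
```

On the GPU, each 8x8 block maps to a thread block and the thousands of blocks in a frame transform independently, which is the parallelism the session exploits.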
 
Best Practices in GPU-Based Video Processing
Thomas True (NVIDIA)
The combination of the GPU's massively parallel compute engine with extremely high memory bandwidth and new programming paradigms such as CUDA and OpenCL has made the GPU well suited for image and video processing applications. This session will explore best practices and techniques for the development of efficient GPU-based video and image processing applications. Topics to be discussed include image segmentation and threading models for efficient parallelism, optimal memory usage strategies to reduce expensive data movement, as well as multi-GPU considerations. Case studies and examples specific to video and image processing will be presented.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2328
Streaming:
Download:
 
GPU-Based Video Processing Round Table
Thomas True (NVIDIA), Alina Alt (NVIDIA), Eric Young (NVIDIA), Ian Williams (NVIDIA), Andrew Page (NVIDIA)
Have questions, concerns, or thoughts about the direction of GPU-based video and image processing? Join NVIDIA engineers and product managers for a lively discussion of topics such as application design, multi-GPU architecture, data movement, threading, APIs, and color management as they apply to video and image processing applications.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID S2601
Streaming:
Download:
 
 
2 Million Pixel Experiment
Philipp Drieger (Noumentalia.de - Digital Arts & KU Eichstatt-Ingolstadt)
This experimental application was created as a piece of computational art using visual computing technologies. It maps a high-definition video source (1080p) into 3D space. The pixel transformation is accelerated by a CUDA kernel to achieve real-time performance. Besides the production of visual effects in the arts, this method may be used for video quality checking at the pixel level.
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2266
Download:
 
Speeding Up Camera Sabotage Detection on CUDA
Alptekin Temizel (Middle East Technical University)
Camera Sabotage Detection (CSD) algorithms, namely Camera Moved Detection, Camera Out of Focus Detection and Camera Covered Detection, are used to detect tampering attempts on surveillance cameras. CSD algorithms are required to be run on a high number of cameras in real-time, bringing high computational load to the video analytics systems. In this work, the CSD algorithms are accelerated by using CUDA. The overall system test results show that parallelization in GPU makes the system 18 times faster than its CPU counterpart and up to 400 cameras can be supported in real time on a GTX 470.  Back
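As an illustration of the kind of per-frame test such a system runs (the poster does not publish its exact method; the sharpness measure, threshold, and baseline scheme below are generic assumptions), a Camera Out of Focus check can be sketched as:

```python
import numpy as np

def focus_measure(gray):
    """Variance of a 3x3 Laplacian response over the image interior.

    Low values indicate little high-frequency detail, i.e. possible defocus.
    """
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    return lap[1:-1, 1:-1].var()  # skip wrapped border rows/columns

def out_of_focus(frame, baseline, ratio=0.3):
    """Flag the camera as defocused when sharpness drops well below a
    baseline learned from normal operation (ratio is an assumed threshold)."""
    return focus_measure(frame) < ratio * baseline

# Synthetic check: a detailed frame vs. a flat (featureless) one.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))        # high-frequency content
blurred = np.full((64, 64), 0.5)    # no detail at all
baseline = focus_measure(sharp)
```

Each of the three CSD sub-detectors follows the same shape: compute a cheap per-frame statistic, compare against a running baseline, and it is exactly this per-pixel statistic computation that parallelizes well on the GPU.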
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2381
Download:
 
Remote Sensing on GPU: A Case Study
Alptekin Temizel (Middle East Technical University)
Satellite images have become widely available; as a result, there are an increasing number of commercial applications utilizing these images. Satellites provide data in different wavelengths, and their images have higher resolution and larger data size compared to typical images. Running complex algorithms on satellite images for large data volumes is highly time-consuming using CPUs and can be sped up using GPUs. In this paper, the performance of shadow detection and vegetation detection algorithms is investigated and compared on GPU and CPU. Results show that a speed-up of up to 10.2 times could be achieved using the GPU.  Back
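The abstract does not give the authors' vegetation detection method; a common, easily parallelized baseline for multispectral imagery is the NDVI, sketched here with illustrative band values and threshold:

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red).

    Values near +1 indicate dense vegetation; bare soil and water are lower.
    eps guards against division by zero on dark pixels.
    """
    return (nir - red) / (nir + red + eps)

def vegetation_mask(nir, red, threshold=0.3):
    """Per-pixel vegetation map; the threshold is scene-dependent."""
    return ndvi(nir, red) > threshold

# Tiny 2x2 example: left column vegetated (NIR >> red), right column not.
nir = np.array([[0.8, 0.2], [0.7, 0.1]])
red = np.array([[0.1, 0.2], [0.2, 0.1]])
mask = vegetation_mask(nir, red)
```

The computation is purely per-pixel, which is why this class of algorithm maps so directly onto the GPU for large satellite scenes.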
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2387
Download:
 
Finite Difference-Based Sound Synthesis Using GPUs
Marc Sosnick (San Francisco State University)
Finite Difference (FD) methods can be the basis for physics-based music instrument models that generate realistic audio output. However, such methods are compute-intensive; large simulations cannot run in real time on current CPUs. In this poster, we describe the current state of our implementation of a real-time sound synthesizer using an FD-based simulation of a two-dimensional membrane executed on GPUs. We demonstrate that it is possible to use this method to create a usable real-time audio synthesizer.   Back
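The core computation of such a synthesizer is the explicit finite-difference update of the 2D wave equation on the membrane grid; a minimal NumPy sketch (grid size, Courant number, and excitation are illustrative assumptions, not the presenters' parameters):

```python
import numpy as np

def fd_membrane_step(u_prev, u_curr, c=0.5):
    """One explicit finite-difference step of the 2D wave equation.

    u_prev, u_curr: membrane displacement at time steps n-1 and n.
    c: Courant number (c <= 1/sqrt(2) for stability on a 2D grid).
    Boundary rows/columns are held at zero (clamped edge).
    """
    lap = (np.roll(u_curr, 1, 0) + np.roll(u_curr, -1, 0) +
           np.roll(u_curr, 1, 1) + np.roll(u_curr, -1, 1) - 4 * u_curr)
    u_next = 2 * u_curr - u_prev + (c ** 2) * lap
    u_next[0, :] = u_next[-1, :] = 0.0
    u_next[:, 0] = u_next[:, -1] = 0.0
    return u_next

# Strike the membrane at its center and run a few steps.
n = 32
u0 = np.zeros((n, n))
u1 = np.zeros((n, n))
u1[n // 2, n // 2] = 1.0
for _ in range(100):
    u0, u1 = u1, fd_membrane_step(u0, u1)
```

Every grid point updates independently from its neighbors' previous values, which is what makes the method well suited to one-thread-per-point GPU execution; audio output would be read from a listening point on the grid each step.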
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2397
Download:
 
Parallelization of Hough Transform for Circles using CUDA
Alptekin Temizel (Middle East Technical University)
Hough Transform (HT) is a well-known technique for detecting parametric shapes in image processing. However, various optimizations are necessary in its implementation due to its large memory and computational requirements. In this paper, we consider the parallelization of the Hough Transform for circles. A number of different implementation approaches to the algorithm are compared in CUDA. Results show that a speed-up of up to 360 times could be achieved compared to the CPU version, enabling real-time applications.  Back
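The voting stage being parallelized can be sketched as follows for a single, fixed radius (the angular sampling and the synthetic test are illustrative assumptions):

```python
import numpy as np

def hough_circle_votes(edge_points, radius, shape, n_angles=64):
    """Accumulate Hough votes for circle centers at a fixed radius.

    edge_points: (N, 2) integer array of (row, col) edge coordinates.
    Each edge point votes for every center lying `radius` away from it;
    the accumulator peak marks the most likely circle center.
    """
    acc = np.zeros(shape, dtype=np.int32)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    for r, c in edge_points:
        rr = np.round(r - radius * np.sin(angles)).astype(int)
        cc = np.round(c - radius * np.cos(angles)).astype(int)
        ok = (rr >= 0) & (rr < shape[0]) & (cc >= 0) & (cc < shape[1])
        np.add.at(acc, (rr[ok], cc[ok]), 1)  # unbuffered accumulation
    return acc

# Synthetic edge map: points on a circle of radius 10 centered at (32, 32).
phis = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = np.stack([32 + 10 * np.sin(phis), 32 + 10 * np.cos(phis)], axis=1)
acc = hough_circle_votes(np.round(pts).astype(int), 10, (64, 64))
center = np.unravel_index(acc.argmax(), acc.shape)
```

On the GPU, the interesting design choice is exactly this accumulation: concurrent votes into the same accumulator bin require atomic adds or per-block partial accumulators, which is where the compared CUDA implementation approaches differ.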
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2438
Download:
 
Accelerating an Imaging Spectroscopy Algorithm Using GPUs
Matthew Sellitto (Northeastern University)
Graphics Processing Units (GPUs) have proven to be effective at accelerating a range of scientific applications. As data needs increase, and more complex data analysis methods are used, the processing requirements for solving scientific problems also increase. The parallel processing power of GPUs can be harnessed and used alongside multi-core CPUs to address this. As an example, many problems require solving optimization problems of multiple variables across large arrays of data. By utilizing modern optimization techniques and combining them with the computational throughput of a CPU-GPU computing platform, we can greatly decrease the processing time required to solve these problems.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2455
Download:
 
CUVILib - GPU Accelerated Vision & Imaging Library
Salman Ul Haq (TunaCode)
Image processing algorithms are used in a variety of different domains, from surveillance to medicine to industry. CUVI (CUDA Vision and Imaging Library) provides GPU-accelerated vision and imaging functionality with plug-and-play ease of use, a simple yet powerful interface, and support for both NVIDIA and AMD GPUs. With over 1000 users of the beta version, CUVI has quickly grown into a mature solution of choice for delivering real-time performance in imaging/vision applications and software frameworks.  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2462
Download:
 
Implementation of Raptor Code on GPU
Linjia Hu (Michigan Technological University)
Raptor Code comes as an improvement to LT-Code; it performs as close as possible to Shannon's channel limit and provides linear encoding and decoding time. It has been chosen as the forward error correction (FEC) scheme in the 3GPP and DVB-H standards. We implement Raptor Codes on the GPU for the purpose of processing large block sizes and symbol sizes effectively and efficiently. Our GPU decoding achieves up to a 40x speedup over the sequential CPU decoding.   Back
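The inner LT stage of such a code produces each encoded symbol as an XOR of randomly chosen source symbols; a toy encoder (the uniform degree distribution and the seeding scheme are illustrative assumptions, not the standardized Raptor parameters):

```python
import random

def lt_encode_symbol(source, seed):
    """Produce one LT-encoded symbol: XOR of a random subset of source symbols.

    source: list of equal-length byte strings (the source block).
    seed: identifies the symbol; a decoder with the same seed re-derives
    the same subset, so only the XOR result needs to be transmitted.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, len(source))           # toy degree distribution
    picks = rng.sample(range(len(source)), degree)  # which symbols to combine
    out = bytearray(len(source[0]))
    for i in picks:
        for j, b in enumerate(source[i]):
            out[j] ^= b
    return picks, bytes(out)

block = [b"abcd", b"efgh", b"ijkl", b"mnop"]
picks, sym = lt_encode_symbol(block, seed=7)
```

Since every encoded symbol is an independent XOR over the block, large block and symbol sizes map naturally onto the GPU: one thread (or warp) per encoded symbol, with the XORs vectorized across the symbol bytes.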
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2473
Download:
 
Real-Time Wind Velocity Estimation from Aerosol Lidar Data using GPUs
Chris Mauzey (California State University, Chico)
The REAL is an atmospheric light detection and ranging (LIDAR) system. It produces near-horizontal and vertical cross-sectional images of the lower atmosphere. The images reveal the spatial distribution of atmospheric aerosol (particulate matter). By applying motion estimation algorithms to image sequences, two-dimensional vector wind fields can be determined. We will explore the use of GPU computing in the real-time computation of wind vector fields.

  Back
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2501
Download:
 
GPU Based Feature Extraction Implementation
Haofeng Kou (SCU)
In this poster, we introduce an efficient parallel implementation of Mel-Frequency Cepstral Coefficient (MFCC)-based feature extraction and describe the optimizations required for effective throughput on many-core Graphics Processing Units (GPUs). We demonstrate that the feature extraction process in automatic speech recognition is well suited for GPUs and that a substantial reduction in computation time can be obtained by performing feature extraction on these platforms. Using a single NVIDIA GTX460 GPU, our proposed approach is shown to be approximately 25x faster than a sequential CPU implementation, enabling feature extraction to be performed in real time.  Back
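The per-frame pipeline being parallelized — windowed power spectrum, triangular mel filterbank, log, DCT-II — can be sketched as (filterbank sizes and defaults are illustrative assumptions, not the poster's configuration):

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_mels=26, n_ceps=13):
    """MFCC for a single audio frame: power spectrum -> mel filterbank
    -> log -> DCT-II. Sizes are common defaults, chosen for illustration."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2

    # Triangular mel filterbank between 0 Hz and Nyquist.
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, len(spec)))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    logmel = np.log(fbank @ spec + 1e-10)

    # DCT-II decorrelates the log filterbank energies into cepstra.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct @ logmel

coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * np.arange(512) / 16000))
```

Every stage is either an FFT or a dense matrix product applied independently per frame, which is why batching thousands of frames onto the GPU yields the reported throughput gain.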
 
Keywords:
Audio, Image and Video Processing, GTC 2012 - ID P2510
Download:
Augmented Reality & Virtual Reality
Presentation
Media
High Efficiency Near-Eye Light Field Display
Andrew Maimone (University of North Carolina at Chapel Hill)
We present a near-eye light field display design that supports accommodation and high spatial resolution while using the same bandwidth as a conventional display. A light source array reflects light in multiple directions off a high-speed binary display, creating a light field over the eye. The display bandwidth conventionally used for color gradations is instead used to create a high angular resolution binary light field; color gradations will be partially recovered when the light field is collected by the eye and focused on the retina.  Back
 
Keywords:
Augmented Reality & Virtual Reality, GTC 2015 - ID P5248
Download:
 
GPU Accelerated Cutting for Surgical Simulation Systems
Pourya Shirazian (University of Victoria)
One of the main objectives of virtual reality based surgical simulation systems is the removal of pathologic tissues. Cutting imposes many challenges in the development of a robust, interactive surgery simulation, not only because of the nonlinear material behavior exhibited by soft tissue but also due to the complexity of introducing the cutting-induced discontinuity. We propose a high performance cutting algorithm for complex tetrahedral meshes. As a proof of concept we integrated our algorithm in a craniotomy simulation.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Computational Physics, GTC 2015 - ID P5254
Download:
 
Game-based Learning and Simulation System using Web Technologies and GPU
Ibrahim Demir (University of Iowa)
We developed a web-based 3D interactive learning environment for teaching hydrological concepts. The system provides a visually striking platform with realistic terrain information, and water simulation. Students can create scenarios, control parameters, and evaluate mitigation alternatives. The system utilizes web technologies and GPU for water simulation and object collisions on the terrain. The system supports virtual reality, augmented and immersive reality modes, and enables interaction using gesture, body movement and portable devices.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Education & Training, GTC 2015 - ID P5255
Download:
 
The Future of Human Vision: Preferential Augmentation Using GPUs
Muhammad Shamim (Baylor College of Medicine)
Loss of vision can result from an enormous number of visual disorders, a small subset of which can be addressed using traditional corrective lenses, i.e. by transforming light in accordance with Snell's law of refraction. In principle, a more general class of transformations might help address a broader range of disorders. Discover how GPUs are being used in augmented reality applications to correct or alleviate vision deterioration in real-time, as well as personalize vision in novel ways.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, Medical Imaging, Video & Image Processing, GTC 2015 - ID S5182
Streaming:
Download:
 
Accelerating Computer Vision and Augmented Reality via GPGPU Computing
Jack Dashwood (Metaio)
It is no secret that augmented reality is a computationally-intensive endeavor. While human sight is taken for granted by the average person, getting a computer to "see" in a way that remotely resembles our expectations is an extremely complex challenge. Tasks such as extracting significant features from a camera feed, estimating a camera pose, and finally rendering digital content appropriately demand a huge amount of processing power from the CPU of today's mobile devices. Compounding this problem is the increasing interest in 3D object tracking and depth-sensing cameras. Metaio CEO Dr. Thomas Alt will illustrate the current "CV Bottlenecks" and how GPU-based solutions can significantly improve the increasingly important mobile computer vision and augmented reality apps coming to market.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID S5626
Streaming:
Download:
 
VR Direct: How NVIDIA Technology Is Improving The VR Experience
Nathan Reed (NVIDIA), Dario L. Sancho Pradel (Crytek)
Virtual reality is the next frontier of gaming, and NVIDIA is leading the way by introducing VR Direct, a set of hardware and software technologies we're creating to cut down graphics latency and accelerate stereo rendering performance. In this talk, we'll show how developers can use NVIDIA GPUs and VR Direct to improve the gaming experience on the Oculus Rift and other VR headsets.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Game Development, Real-Time Graphics, GTC 2015 - ID S5668
Streaming:
Download:
 
Augmented Reality with Google's Project Tango and NVIDIA Technology
Wil Braithwaite (NVIDIA)
This talk presents a system for the visualization of professional graphics, such as ray tracing, on a low-latency device, such as a head-mounted display or tablet. I will describe the issues encountered, and the algorithms used. The example I will demonstrate showcases the NVIDIA® VCA cluster for cloud-based rendering, NVENC for low-latency video encoding, and Google's Project Tango with the Tegra K1 processor for pose tracking and video decoding. The demo system presented can also serve graphics to multiple low-latency devices, such as a Virtual Reality HMD, at a rate much faster than the graphics are rendered.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Media & Entertainment, Real-Time Graphics, GTC 2015 - ID S5733
Streaming:
 
VR Everywhere: Consumer Virtual Reality for Desktop, Mobile and Web
Tony Parisi (Third Eye)
Virtual Reality has taken the computer industry by storm. Developers, artists, end users, educators, advertisers and retailers are flocking by the thousands to realize the decades-long dream of virtual reality for the masses. The combination of GPU acceleration and cheap sensors has enabled low-cost consumer-grade VR, and the rapid adoption of software development kits is paving the way for creating virtual reality apps on platforms from desktops to smartphones, and even running in your web browser using WebGL. Join VR pioneer and WebGL developer Tony Parisi as he explores this exciting frontier. This session will take a look at the latest VR hardware devices, supported operating systems and software development kits, and a wide range of applications already being deployed.  Back
 
Keywords:
Augmented Reality & Virtual Reality, Developer - Tools & Libraries, Real-Time Graphics, GTC 2015 - ID S5737
Streaming:
Download:
Automotive
Presentation
Media
Creating Mobile Apps for the Automotive Market
Kerry Johnson (QNX Software Systems)
The growing convergence of mobile handsets and automotive platforms is creating a new market opportunity for app developers. That said, many differences exist between the smartphone and the car, and understanding them is key to unlocking the potential of this new market. To give the app developer a jump-start, this session explores how a car infotainment system is structured, UX considerations for automotive applications, design principles for taking best advantage of SoCs like Tegra 3, and key differences between mobile and automotive platforms.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3223
Streaming:
Download:
 
Augmented Reality Head-up Display for Cars
Victor Ng-Thow-Hing (Honda Research Institute USA)
The challenge of introducing augmented reality to head-up displays for automobiles requires balancing between the visual, immersive richness this medium provides with the need for the driver to stay focused on the primary task of driving. This session explores how to solve these problems by combining design methodologies with technological research. Before field testing ideas in actual cars, high fidelity prototypes with driving simulators are utilized with an actual windshield head-up display to visualize the augmented graphics. UI Composer is leveraged with proprietary software to engage designers in the prototyping process.

  Back
 
Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Instrument Clusters & Heads-Up Display (HUD), Manufacturing Technical, GTC 2013 - ID S3230
Streaming:
Download:
 
High Performance Map Rendering for In-vehicle Navigation
Don Burns (NVIDIA)
This session will provide techniques for rendering 3D maps efficiently and at high frame rates, while still preserving quality. Topics to be discussed include tile cache management, tile fetch, tile rendering techniques, layer management, and 3D object rendering.

  Back
 
Keywords:
Automotive, Navigation Systems, GTC 2013 - ID S3386
Streaming:
Download:
 
Optimizing Pedestrian Detection for Real-time Automotive Applications
Vladimir Glavtchev (NVIDIA)
This session will present a motion estimation approach to pedestrian and cyclist detection. Through analyzing motion across several frames, this technique accurately segments foreground objects from the background, including their positions and velocities. Foreground objects are classified as pedestrians, bicyclists or motorcyclists, or other objects on the road surface. The entire process is optimized to minimize the computation resources needed for detection and classification. The optimizations make it possible to perform the entire process on a mobile grade GPU system with a modest host processor.

  Back
 
Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Computer Vision, GTC 2013 - ID S3396
Streaming:
Download:
 
Speech and Vision Processing for Immersive In-Vehicle Applications
Ian Lane (Carnegie Mellon University)
AIDAS, an Intelligent Driver Assistive System being developed at Carnegie Mellon University, enables the investigation of Immersive Interaction within vehicles. The AIDAS platform enables rich, speech-centric interaction with the driver. Interactions are both context-aware, based on the location of the car and the driver's gaze direction, and natural, akin to interacting with a human assistant. This session will introduce the core speech and vision components used within AIDAS and describe the approaches used to accelerate these technologies to realize a real-time interactive system.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3403
Streaming:
Download:
 
Automotive Advanced Driver Assistance Systems: Challenges & Opportunities
Ian Riches (Strategy Analytics)
This session will examine the driving forces behind the adoption of Advanced Driver Assistance Systems (ADAS), one of the fastest growing application areas by car makers. Key battle grounds between new and existing suppliers will be examined, and forecasts presented for key systems, semiconductors and sensors. Despite the high forecast growth, challenges remain to widespread adoption across the globe. These barriers will be explained, together with recommendations for what needs to be done to overcome them.

  Back
 
Keywords:
Automotive, Advanced Driver Assistance Systems (ADAS), Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3413
Streaming:
Download:
 
Overview of UI Composer Studio
Justin Ebert (NVIDIA)
UI Composer Studio is the ground-breaking HMI design tool used for instrument clusters and infotainment systems. Developed by NVIDIA, it is used by automakers and Tier 1 automotive suppliers to rapidly develop proofs of concept for evaluation, market research, usability testing and ultimately final production. This session covers the basics of constructing an instrument cluster and IVI using Studio's advanced authoring environment.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, Instrument Clusters & Heads-Up Display (HUD), GTC 2013 - ID S3419
Streaming:
Download:
 
Audi Urban Intelligent Assist: Taking Urban Mobility to the Next Level
Mario Tippelhofer (Audi)
The goal of the Audi Urban Intelligent Assist (AUIA) research initiative is to showcase different technologies and approaches to make the challenges of navigating the chaotic roadways of the world's megacities less stressful, safer and more efficient a generation from now. This is mainly achieved through advancements in predictive technology, by harnessing the power of Big Data through algorithms, real-time data, Human Machine Interfaces (HMI), advanced sensors and other innovative approaches. The AUIA project is the latest in a series of university collaborations that Audi has formed to explore the frontiers of automotive technologies and electronics.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3481
Streaming:
Download:
 
GPU Requirements for Automotive Infotainment Systems
Ron Szabo (Delphi Corporation)
This session will cover the current and future requirements for GPUs in the automotive space for infotainment systems. Four areas contribute to the exponential growth of processing power required onboard: (1) traditional feature growth; (2) the impact of mobile devices and brought-in content; (3) the compounding effect of off-board services and cloud connectivity; and (4) development headroom to eventually eliminate optimization. Critical tradeoffs that Tier 1s and OEMs need to make will be discussed.

  Back
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2013 - ID S3542
Streaming:
Download:
 
From Big Data to Thin Client: The GPU as an Experiential Enabler
Christopher Nelson (RTT USA, Inc.)
We will tour the life of data from PDM to POS (Point-Of-Sale). Some stops along the way will include: Design, Engineering and Perceived Quality. With an end result of high-end visualization, a focus on new hardware from NVIDIA will take the experience to uncharted territories.

  Back
 
Keywords:
Automotive, Cloud Visualization, SIGGRAPH 2013 - ID SIG1326
Streaming:
Download:
 
UI Composer for Automotive HMIs - Part 1: What, Why, and How
Gavin Kistner (NVIDIA), Stephen Mendoza (NVIDIA)
An in-depth view into content creation using UI Composer, including the digital asset pipeline, animation, materials, development of state machines, and debugging.  Back
 
Keywords:
Automotive, Debugging Tools & Techniques, Digital Product Design & Styling, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4616
Streaming:
Download:
 
UI Composer for Automotive HMIs - Part 2: Building Content
Gavin Kistner (NVIDIA), Xavier Mendoza (NVIDIA)
A continuation of Part 1, this is a hands-on, interactive demonstration of content creation using UI Composer. The audience will be guided through the steps to build a data-driven virtual automotive gauge. In order to actively participate in this session, attendees are asked to bring their own Windows laptop with UI Composer installed. UI Composer is available for free from http://uicomposer.nvidia.com/  Back
 
Keywords:
Automotive, Debugging Tools & Techniques, Digital Product Design & Styling, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4806
Streaming:
Download:
 
Real-Time Electromagnetic Wave Propagation Using OptiX for Simulation of Car-to-Car-Communication
Manuel Schiller (Technische Universitat Munchen)
In this session we present a real-time simulation of electromagnetic wave propagation using OptiX GPU ray tracing. This simulation is used in virtual test drives to allow testing of Advanced Driver Assistance Systems that will be based on wireless car-to-car communication. Learn how ray tracing performance can be improved to achieve real-time simulation and how the ray tracing results are post-processed to perform the electromagnetic calculations on the GPU using the Thrust library.  Back
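The per-ray electromagnetic post-processing accumulates losses along each traced path; the simplest such term is Friis free-space path loss, shown here as a generic illustration (this is a textbook formula, not the presenters' propagation model):

```python
import math

def friis_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB (Friis): 20*log10(4*pi*d*f/c).

    Per-ray losses like this, plus reflection and diffraction terms,
    are what a post-processing stage would sum along each traced path.
    """
    c = 299_792_458.0  # speed of light, m/s
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

# 100 m line-of-sight path at the 5.9 GHz band used for car-to-car radio.
loss = friis_path_loss_db(100.0, 5.9e9)
```

Because the loss of each ray is independent of every other ray, the reduction over thousands of paths is a natural fit for Thrust-style parallel transforms and reductions on the GPU.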
 
Keywords:
Automotive, Computational Physics, Rendering & Ray Tracing, GTC 2014 - ID S4359
Streaming:
Download:
 
Tegra K1 and the Automotive Industry
Gernot Ziegler (NVIDIA), Timo Stich (NVIDIA)
Discover how mobile GPUs enable modern features of car driving in a power-efficient and standardized way, by providing the fundamental building blocks of computer vision to the higher-level reasoning functions that enable the car to detect lanes, park automatically, avoid obstacles, etc. We explain the challenges of having to fit into a given time budget, and how the low-level machine vision such as corner detection, feature tracking and even more advanced functionality such as 3D surrounding reconstruction is achieved in the context of the car's systems and its outside environment.  Back
 
Keywords:
Automotive, Computer Vision, Machine Learning & Deep Learning, Mobile Applications, GTC 2014 - ID S4412
Streaming:
Download:
 
Beyond Pedestrian Detection: Deep Neural Networks Level-Up Automotive Safety
Hideki Niihara (Denso IT Laboratory, Inc.), Ikuro Sato (Denso IT Laboratory, Inc.)
People want cars that are not only cost-friendly, trouble-free and energy-efficient, but also safe. Today's technology provides Advanced Emergency Braking Systems that can detect pedestrians and automatically brake just before a collision becomes unavoidable. Our vision is that future Advanced Driver Assistance Systems will not just detect pedestrians but recognize their behavior and understand the level of danger, in order to avoid emergency situations. We claim deep Convolutional Neural Networks (CNN) are the right tools for these highly non-trivial tasks, and Tegra is the best partner. We demonstrate a real-time deep CNN using Tegra.   Back
 
Keywords:
Automotive, Computer Vision, Machine Learning & Deep Learning, GTC 2014 - ID S4621
Streaming:
Download:
 
One Car Fits You: Technology and Opportunities in the Personalized Car
Ryan Middleton (Delphi)
Learn about two Delphi projects that are pushing the concept of a personalized in-vehicle experience. As drivers bring more of their personal content and personal style into the car, opportunities are emerging for car makers and platform providers to differentiate their offerings. We will explore the infotainment architecture of the future - enabling feature upgrades at the same rate as mobile devices. We will also explore how GPU technology enables "months-to-minutes" user interfaces, and greater flexibility in end-user personalization.
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4659
Streaming:
 
NVIDIA Vision Toolkit for Advanced Driver Assistance Systems, Computational Photography and Beyond
Elif Albuz (NVIDIA), Frank Brill (NVIDIA)
In this session, we will present the contents of the Vision Toolkit, discuss its performance advantages, and demonstrate real-time applications enabled by this library. The Vision Toolkit is a product of NVIDIA, designed to enable real-life computer vision applications. It leverages state-of-the-art computer vision research and offers a variety of functions to its developers, initially targeting Advanced Driver Assistance Systems (ADAS) and Augmented Reality (AR) applications. The toolkit is highly GPU accelerated on mobile platforms, offering significant speedup and reducing the engineering effort needed to design real-time vision applications. The toolkit includes open source samples and offers a flexible framework that enables users to extend and contribute new functionality. It will be deployed on different operating systems, including Android and Linux on ARM, to registered developers and partners through NVIDIA's web site.
 
Keywords:
Automotive, Computational Photography, Computer Vision, Mobile Summit, GTC 2014 - ID S4714
Streaming:
Download:
 
Today's LiDARs and GPUs Enable Ultra-Accurate GPS-Free Navigation with Affordable Simultaneous Localization and Mapping
Louay Eldada (Quanergy Systems, Inc.)
With recent advances in low-cost high-performance LiDARs (laser-based Light Detection and Ranging sensors) and GPUs, ultra-accurate GPS-free navigation based on SLAM (Simultaneous Localization and Mapping) is becoming a reality. Learn how the latest 360° field of view long-range 3D mapping LiDARs capable of generating data streams at gigasample-per-second (GSPS) sampling rates are used with 192 CUDA core GPUs based on the Kepler architecture to run artificial intelligence software and deliver advanced vehicular safety and navigation systems capable of real-time object detection, tracking, identification and classification, as well as offline full-availability jam-proof centimeter-accurate navigation.
 
Keywords:
Automotive, Combined Simulation & Real-Time Visualization, In-Vehicle Infotainment (IVI) & Safety, Machine Learning & Deep Learning, GTC 2014 - ID S4761
Streaming:
Download:
 
Embedded Development For Tegra K1
Jesse Clayton (NVIDIA)
The Tegra K1 is a powerful SoC that will be leveraged across many industries. It is based on the same Kepler architecture as the world's fastest gaming systems and most efficient supercomputers, and brings supercomputing power to mobile and embedded devices. Jesse Clayton from NVIDIA will walk through the embedded development process for Tegra K1. The talk will cover the platform, programming paradigm, and development tools, and provide details on the Tegra K1 architecture relevant to embedded applications.
 
Keywords:
Automotive, Defense, Computer Vision, Machine Learning & Deep Learning, GTC 2014 - ID S4938
Streaming:
 
Audi Piloted Parking on zFAS: Valet Parking for the 21st Century
Miklos Kiss (Audi Electronics Venture GmbH)
What does it mean to bring supercomputing into the car? Examples of piloted parking systems show what that means for customers as well as for developers. Audi's way into piloted driving for the 21st century.
 
Keywords:
Automotive, Video & Image Processing, GTC 2014 - ID S4961
Streaming:
 
Object Detection: GPU-Friendly Soft Cascades
Alexander Smorkalov (Itseez)
Fast on-road object detection is an important feature of advanced driver assistance systems (ADAS). We propose a CUDA implementation of a soft cascade detector that allows real-time object detection on the Tegra K1 platform, applicable to pedestrian and vehicle detection.
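The core idea of a soft cascade can be sketched in a few lines: each weak classifier adds to a running score, and a detection window is rejected as soon as the score drops below that stage's threshold, so most background windows exit early. This is a generic illustration with made-up stage scores and thresholds, not Itseez's CUDA implementation.

```python
def soft_cascade_score(stage_scores, rejection_thresholds):
    """Evaluate a soft cascade on one detection window.

    stage_scores: per-stage weak-classifier responses for this window.
    rejection_thresholds: running-score threshold after each stage.
    Returns (accepted, final_score, stages_evaluated).
    """
    score = 0.0
    for i, (s, thr) in enumerate(zip(stage_scores, rejection_thresholds)):
        score += s
        if score < thr:          # early rejection: most windows exit here
            return False, score, i + 1
    return True, score, len(stage_scores)

# A pedestrian-like window survives all stages...
ok, score, n = soft_cascade_score([0.9, 0.8, 0.7], [-0.2, 0.0, 0.5])
print(ok, n)          # True 3
# ...while a background window is rejected after the first stage.
ok, score, n = soft_cascade_score([-0.5, 0.8, 0.7], [-0.2, 0.0, 0.5])
print(ok, n)          # False 1
```

The early-exit structure is what makes the cascade GPU-friendly: thousands of windows can be scored in parallel, with most terminating after a handful of stages.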
 
Keywords:
Automotive, GTC 2014 - ID P4289
Download:
 
Predicting ADAS Algorithms Performances on K1 Architecture
Romain Saussard (Renault)
Computer vision algorithms are widely used in the automotive field for ADAS. Many computing architectures can be used to embed those algorithms: ARM, DSP, GPU, and heterogeneous ones like the K1. But the choice of computing architecture remains a problem for the car manufacturer. We propose a method to predict the performance of computer vision algorithms on multiple, heterogeneous architectures in order to help choose the best algorithm/architecture pairing. The approach is illustrated with a lane detection algorithm embedded on the K1.
 
Keywords:
Automotive, Computer Vision & Machine Vision, GTC 2015 - ID P5158
Download:
 
GPUService: GPU Acceleration of Robotic Services: Real Time 3D Point Cloud Processing
Leonardo Christino (Universidade de São Paulo)
GPU acceleration of robotic services, focused on processing 3D point clouds from robotic depth sensors in near real time for use in self-driving automobiles.
 
Keywords:
Automotive, Embedded, GTC 2015 - ID P5192
Download:
 
Vision-Based Driver Assistance: Seeing the Way Forward
Ian Riches (Strategy Analytics)
This market introduction to vision-based solutions in advanced driver assistance systems will highlight the regions, applications and vehicle sectors that are driving growth. Current and likely future architectures will be explored, and the implications for both traditional and non-traditional automotive suppliers will be highlighted. Finally, the role and implications of automated driving will be investigated and analyzed.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5108
Streaming:
Download:
 
Through the Eyes of a Car: Visualizing a Car's Camera System
Gernot Ziegler (NVIDIA)
Learn how the GPU's real-time graphics capabilities can be used to interactively visualize and enhance the camera system of modern cars. The GPU simplifies design, interactive calibration and testing of the car's computer vision systems, and even allows for creating simulated environments where the behavior of the car's computer vision can be tested to pass standard safety tests or navigational street situations.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Real-Time Graphics, GTC 2015 - ID S5123
Streaming:
Download:
 
Rapidly Prototyping Automotive User Experiences at Jaguar Land Rover
Matt Jones (Jaguar Land Rover)
Learn how Jaguar Land Rover is using the power of the GPU to design, create and test next-generation user interfaces for cars.
 
Keywords:
Automotive, Embedded, Manufacturing, Real-Time Graphics, GTC 2015 - ID S5137
Streaming:
 
Next Generation Surround-View for Cars
Miguel Sainz (NVIDIA), Timo Stich (NVIDIA)
A robust proof-of-concept Surround-Vision and Top-View system for cars includes four car-mounted cameras as inputs and the Jetson Pro platform as the computation and display unit, relying on CUDA and OpenGL for both GPGPU and rendering of the final views. Topics covered will include the placement and calibration of the cameras, color correction and data preprocessing. A technical deep dive into common pitfalls will highlight typical visual artefacts in Top-View visualizations, and will present the algorithmic building blocks to correct those errors.
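At the heart of a Top-View system is a ground-plane homography per camera: each image is warped onto a common bird's-eye plane, and the four warped views are stitched together. A minimal NumPy sketch of mapping points through a homography follows; the matrix here is a made-up example, a real one would come from the camera calibration the session describes.

```python
import numpy as np

def warp_points(H, pts):
    """Map 2D points through a 3x3 homography (projective transform)."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]              # back to Cartesian

# Toy homography: scale x by 2, shift y by 1 (a real one comes from calibration).
H = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
print(warp_points(H, [[1.0, 2.0]]))   # [[2. 3.]]
```

In the full pipeline this per-point mapping is applied densely (one texture lookup per output pixel), which is exactly the kind of work the GPU handles well.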
 
Keywords:
Automotive, Computer Vision & Machine Vision, Real-Time Graphics, GTC 2015 - ID S5295
Streaming:
Download:
 
Pimp My Ride: How to Mod Cars with Tegra
Dave Anderson (NVIDIA)
Tapping into in-vehicle architectures for infotainment and driver information applications is a huge challenge. We will examine several production cars as examples and provide insight into how NVIDIA automotive Tegra processors can be retrofitted into these cars as a proof-of-concept for next-generation digital clusters and infotainment systems.
 
Keywords:
Automotive, Embedded, Video & Image Processing, GTC 2015 - ID S5396
Streaming:
 
Enabling Next-Gen Vehicle Architectures with Embedded Supercomputing
Uday Pitambare (Delphi)
The evolution of GPU-accelerated computing is enabling us to rethink vehicle architecture in ways previously thought infeasible. We will see how Delphi's signature Integrated Cockpit and Multi-domain Controller projects now leverage parallel computing to up-integrate traditionally disparate vehicle systems. We will also discuss the advantages and challenges involved in this process.
 
Keywords:
Automotive, GTC 2015 - ID S5469
Streaming:
Download:
 
Safe and Seamless Integration of Tegra into the In-Vehicle Network
Stefaan Sonck Thiebaut (OpenSynergy)
Virtualization is playing a more important role in the development of in-vehicle systems. Users of the NVIDIA Vibrante SDK/PDK can use OpenSynergy's integrated automotive solution to realize CAN communication and AUTOSAR compliance within the timing and safety constraints required by the automotive industry. In addition, learn how the solution allows controlled communication between virtualized operating systems and the vehicle networks while maintaining the isolation between both.
 
Keywords:
Automotive, GTC 2015 - ID S5532
Streaming:
Download:
 
Benchmarking Real-World In-Vehicle Applications
Michael Carstens-Behrens (mycable GmbH)
Learn how to perform a critical use case analysis to ensure your high-end embedded system provides the required application-specific performance. Typical GPU and CPU benchmarks return performance values under optimized conditions, but real-world applications, such as infotainment systems, will find the bottlenecks in your system. Find them before the project fails, or find options to transfer tasks to the GPU (e.g. using CUDA). Attendees will see how to transform a system architecture into a "System Resource Model", find the "Critical Use Cases" of the application, and match them with this model. This practical approach will show how to set up benchmarks that emulate use cases in parallel under reproducible conditions, based on an example automotive infotainment system.
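The benchmarking approach above can be illustrated with a tiny harness: run a named use case repeatedly under fixed conditions and report the worst case as well as the best, since a real-time infotainment system is judged by its slowest run, not its average. This is a generic sketch with a stand-in workload, not mycable's tooling.

```python
import time

def benchmark(name, workload, repeats=5):
    """Run one critical use case repeatedly; report best/worst wall-clock time."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return {"use_case": name, "best_s": min(times), "worst_s": max(times)}

# Stand-in for a critical use case, e.g. decoding one video frame.
result = benchmark("decode_frame", lambda: sum(i * i for i in range(10_000)))
print(result["worst_s"] >= result["best_s"])   # True
```

In the session's terms, each such measurement would be matched against the budget the "System Resource Model" assigns to that use case.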
 
Keywords:
Automotive, Embedded, Developer - Performance Optimization, GTC 2015 - ID S5587
Streaming:
Download:
 
Self-Driving Vehicles: Changing the Mission of Human-Machine Interface
Walter Sullivan (Elektrobit)
Highly connected vehicles have clear implications on various aspects of the driver-vehicle interaction. The HMI design will be influenced by the high load of information that will be put on the driver. How can information best be presented? How can it be selected? Is the idea of a work load manager still relevant? On the other hand, autonomous driving brings new challenges for the vigilance and distraction of the driver. How can the driver be pulled back into the loop when required? When is it required? How can drivers be informed about the limits of the machine? We will also discuss methods on how to "measure" HMI and driving performance in automation, such as steering wheel reversal rate, standard deviation lane position, speed keeping and more.
 
Keywords:
Automotive, Augmented Reality & Virtual Reality, GTC 2015 - ID S5588
Streaming:
Download:
 
Gesture Recognition: Using a Multi Sensor Approach
Shalini Gupta (NVIDIA)
For accurate and power-efficient in-vehicle hand-gesture recognition, a novel multi-sensor system comprises a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions. The radar and depth sensors are jointly calibrated, and a deep convolutional neural network fuses data from the multiple sensors to classify the gestures. This algorithm accurately recognizes 10 different gestures acquired indoors and outdoors in a car, during the day and at night, while consuming significantly less power than purely vision-based systems.
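The fusion step can be illustrated with a late-fusion toy: each sensor branch produces class scores, and the fused prediction averages the per-sensor softmax probabilities. This is a generic sketch with made-up scores, not the presenters' actual network, which fuses the sensor data inside the CNN.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D score vector."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def fuse_predictions(per_sensor_logits):
    """Average softmax probabilities across sensor branches (late fusion)."""
    probs = np.mean([softmax(l) for l in per_sensor_logits], axis=0)
    return int(np.argmax(probs)), probs

# Radar branch is unsure; depth branch strongly favors gesture 2: fusion picks 2.
label, probs = fuse_predictions([np.array([0.1, 0.2, 0.1]),
                                 np.array([0.0, 0.0, 3.0])])
print(label)   # 2
```

Averaging probabilities lets a confident sensor dominate when another is degraded, e.g. the camera at night, which is the motivation for the multi-sensor design.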
 
Keywords:
Automotive, Computer Vision & Machine Vision, GTC 2015 - ID S5599
Streaming:
Download:
 
Robust Speech Recognition for Cars
Ian Lane (Carnegie Mellon University)
One aspect of speech recognition work at Carnegie Mellon University is specifically focused on noise-robust speech recognition for automotive environments. By combining state-of-the-art methods in deep learning with GPU-accelerated embedded hardware, we are able to significantly improve the performance of speech recognition, even in challenging noise conditions.
 
Keywords:
Automotive, Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5633
Streaming:
 
ZFAS - The Brain of Piloted Driving at Audi
Matthias Rudolph (Audi AG)
During the last several years, Audi has developed with partners a platform that enables piloted driving and piloted parking. At CES 2015 it was shown that the system can drive piloted on the highway from Silicon Valley to Las Vegas. The computational platform or brain of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and the motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5637
Streaming:
 
The Fast Lane from Silicon Valley to Munich
Uwe Higgen (BMW Group)
Learn how the BMW Group Technology Office in Silicon Valley integrates with the automaker's world-wide research and development departments, with a specific focus on an active safety system running on NVIDIA hardware, recently developed for the i3. As one of the first automakers to open a research and development office in Silicon Valley, BMW has a long history of innovation in the Bay Area. Projects range from series vehicles to Formula 1, and from research to pre-development including the iDrive interface, Apps4Automotive, the all-electric Mini-E, and Head-Up Displays.
 
Keywords:
Automotive, Embedded, Computer Vision & Machine Vision, GTC 2015 - ID S5789
Streaming:
Download:
 
Audi Piloted Driving: In the Fast Lane to the Future
Daniel Lipinski (Audi of America)
On the eve of CES 2015, Audi, ERL and VW Group Research accomplished the most dynamic automated driving road test yet, with non-engineers behind the wheel for more than 550 miles on public freeways. With the advanced Highway Pilot technology built into a car nicknamed "Jack", Audi demonstrated how far automated driving technology has matured within the last decade. What enabled such complex technology is the massive growth in processing power, a field in which NVIDIA processors will play a central role in the future.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5870
Streaming:
Download:
 
Ubiquitous Perceptive 3D Sensing for a Smart Internet of Things
Louay Eldada (Quanergy Systems, Inc.)
Innovations in perceptive smart sensors comprising solid state 3D LiDARs and GPUs with artificial intelligence software have reached a cost level that allows them to be deployed ubiquitously, supporting a smart Internet of Things (IoT). These smart sensors provide real-time information on billions of 'Things' and their surroundings (through 3D object detection, tracking, and classification) and, when needed, the ability to control them. The 'Things' include vehicles, infrastructure, buildings, homes, appliances, light controls, thermostats, medical devices, computers, and handheld devices. Growth of personal devices (phones, tablets, laptops, game consoles) is limited by the number of people in the world. The largest growth will come from connected devices in areas such as smart energy, home automation, and transportation.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID S5918
Streaming:
 
Electronics & APIs: The Aftermarket's new Bondo
John Waraniak (Specialty Equipment Market Association (SEMA)), John Ellis (Ellis & Associates)
As the automotive industry relies on electronics and software for more and more active safety capabilities, how does a software or electronics company deliver their exciting value while ensuring that what they deliver doesn't "break" the vehicle? Drawing heavily on the Vehicle Dynamics Program, the Specialty Equipment Market Association ("SEMA") has developed the Vehicle Electronics Program to ensure that the next generation of in-car electronics realizes its full potential. Learn about this new program including the new proposed federal motor vehicle standard, FMVSS 150. In addition, we'll cover the resources and opportunities available to developers for designing and customizing vehicles.
 
Keywords:
Automotive, Product Design & Styling, GTC 2015 - ID S5545
Streaming:
Download:
 
Artificial Intelligence is Accelerating the Race to Self Driving Cars
Danny Shapiro (Sr. Director, Automotive, NVIDIA)
Overview of AI in self-driving cars.
 
Keywords:
Automotive, GTC Washington D.C. 2016 - ID DCS16149
Download:
 
The Future of Autonomous Vehicles in a Nation of Autos
Bruce Daley (Principal Analyst, Tractica)
Looking beyond the current work being done in the field, this presentation examines how autonomous vehicles are most likely to change the future. U.S. culture is built around the automobile. Songs are sung about it. Driving is an important part of parenting. Getting a license is a significant rite of passage. How will customs change when cars drive themselves? What will be the demand for consumer and commercial vehicles in the years ahead? Will the role of government change as a consequence? Is the human-driven car destined to suffer the same fate as the horse? All these questions and more will be examined during the course of this presentation.
 
Keywords:
Automotive, GTC Washington D.C. 2016 - ID DCS16181
Download:
 
Anders Eugensson (Director Government Affairs, Volvo Cars)
Transportation is the backbone of modern society. With transportation, however, come a number of challenges: congestion, lack of space, air pollution and traffic casualties are all global issues that must be addressed. New technologies linked to autonomous vehicles have the prospect of changing the future of mobility and will offer many opportunities. Urban citizens will be able to save time and stay connected while mobile, and there are opportunities to save fuel and reshape cities while creating a road transportation system with no crashes and no casualties. Since 2014, Volvo has been working on its DriveMe project, preparing the launch of self-driving autonomous vehicles to be sold to customers in the early 2020s.
 
Keywords:
Automotive, GTC Washington D.C. 2016 - ID DCS16177
Download:
 
How Data and AI will Transform the Nation's Roads
Chris Gerdes (Chief Innovation Officer, United States Department of Transport)
Automated vehicles offer an unparalleled opportunity to eliminate the 94% of vehicle crashes attributable to human choice or error and dramatically reduce the 35,092 fatalities that occur annually in the United States. Replacing human drivers with automation, however, is no simple task and requires real-world testing to ensure that the automated vehicles can handle the range of conditions that human drivers navigate routinely. Furthermore, the public rightfully expects that such testing must itself be safe. To enable the safe testing and deployment of automated vehicles, the United States Department of Transportation recently released guidance for developers, including a 15 Point Safety Assessment that should be performed prior to testing. The guidance is not prescriptive and enables developers to take a variety of approaches to address areas such as the vehicle's operational design domain, fall-back behavior, human-machine interface and ethical considerations. Thus approaches that hard-code specific behaviors and those that learn from data are both possible under the guidance. Data-driven approaches in particular are appealing because of their ability to leverage the large amount of data that can be easily generated by an automated vehicle. But they raise other questions of performance guarantees in the event that a situation the vehicle encounters is different from those used in the training set. Such issues are not unique to automated vehicles but rather represent a broader issue at the center of AI and regulation, as highlighted in the recent report "Preparing for the Future of Artificial Intelligence" by the National Science and Technology Council. This talk discusses some of the benefits and challenges of data-driven approaches and how some level of data sharing across the automated vehicle ecosystem can advance development, public acceptance and safety.
The talk concludes with a look at the government's role in this rapidly developing area and the opportunities for developers to weigh in on the guidance both now and as it develops in the future.
 
Keywords:
Automotive, GTC Washington D.C. 2016 - ID DCS16175
Download:
 
Elif Albuz (Vision Software Manager, NVIDIA)
We'll introduce the NVIDIA VisionWorks™ toolkit, a software development package for computer vision (CV) and image processing. VisionWorks originated with the Khronos OpenVX standard and extends beyond it. The VisionWorks library is optimized for CUDA-capable GPUs and SoCs, enabling computer vision applications on a scalable and flexible platform. VisionWorks implements a thread-safe API and framework for seamlessly adding user-defined primitives. The talk will give an overview of the VisionWorks toolkit, its API and framework, and computer vision pipeline samples exercising its API.
 
Keywords:
Automotive, Robotics & Autonomous Machines, GTC Washington D.C. 2016 - ID DCS16156
Download:
Best of GTC Talks
Presentation
Media
Advanced Rendering Solutions from NVIDIA
Phillip Miller (NVIDIA)
Learn about the latest breakthroughs and offerings in NVIDIA's Advanced Rendering Solutions, which scale smoothly from local GPU rendering to remote supercomputer clusters. New capabilities and possibilities in Iray® and mental ray® will be explored and demonstrated, along with what's possible with the latest in NVIDIA OptiX for accelerating custom ray tracing development. Industry trends and production examples will also be explored as advances in both interactive and production rendering continue to revolutionize workflows.
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4111
Streaming:
Download:
 
How V-Ray RT and GPU Rendering are Defining a New Filmmaking Paradigm
Chris Nichols (Chaos Group), Kevin Margo (Blur Studio)
Blur Studio's CG and VFX Supervisor Kevin Margo and Chaos Group's Creative Director Christopher Nichols will discuss how they collaborated with NVIDIA in the production of Margo's short CONSTRUCT. Using GPU-accelerated V-Ray RT, along with the latest hardware from NVIDIA, they were able to dramatically accelerate rendering, allowing Margo to focus on the creative process without being slowed down by the technology.
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4112
Streaming:
Download:
 
See the Big Picture: Scalable Visualization Solutions for High Resolution Displays
Doug Traill (NVIDIA)
Large-format, high-resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features that make it easier to install and utilize these large-scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4113
Streaming:
Download:
 
Practical Real-Time Voxel-Based Global Illumination for Current GPUs
Alexey Panteleev (NVIDIA)

This session describes the work on making the voxel-based global illumination (GI) approach practical for games running on current-generation graphics hardware such as Kepler. Based on Cyril Crassin's research, a library has been developed that allows applications to render GI effects for large and fully dynamic scenes at 30 frames per second or more, producing soft diffuse indirect lighting and blurry specular reflections, and providing emissive material support. During the session, Alexey will talk about the cone tracing GI algorithm in general and get into the details of scene representation, efficient multi-resolution voxelization, and indirect light gathering.
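As background for the cone tracing algorithm covered in the talk, the core front-to-back accumulation step can be sketched in a few lines of Python; `sample_volume` here is a hypothetical stand-in for fetching a pre-filtered mip level of the voxel grid, not the library's actual API:

```python
def trace_cone(sample_volume, origin, direction, max_dist, aperture):
    """Front-to-back accumulation along one cone, as in voxel cone tracing.

    sample_volume(pos, radius) must return (color, opacity) pre-filtered
    to the given footprint radius (i.e., a mip level of the voxel grid).
    """
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    t = 0.01                                 # small offset avoids self-sampling
    while t < max_dist and alpha < 0.99:     # stop once nearly opaque
        radius = max(t * aperture, 0.01)     # cone footprint grows with distance
        pos = [o + t * d for o, d in zip(origin, direction)]
        c, a = sample_volume(pos, radius)
        # Front-to-back compositing: closer samples occlude farther ones.
        weight = (1.0 - alpha) * a
        color = [col + weight * ci for col, ci in zip(color, c)]
        alpha += weight
        t += radius                          # step size proportional to footprint
    return color, alpha
```

The early-out on accumulated opacity is what keeps diffuse cone gathering cheap enough for 30 fps scenes.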

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4114
Streaming:
Download:
 
Sharing Physically Based Materials between Renderers with MDL
Jan Jordan (NVIDIA), Lutz Kettner (NVIDIA)

The basics of NVIDIA's Material Definition Language (MDL) will be discussed, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically based materials can be defined, while developers will learn what's entailed in supporting MDL within their own product or renderer.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4115
Streaming:
Download:
 
Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's Latest Developer Tools Suite
Sebastien Domine (NVIDIA)

The audience will learn about the latest developer tools suite specifically designed to unleash the power of Tegra K1 for Android application developers. The broad scope of this technical presentation spans from advanced graphics to compute and multi-core CPU tools that enable developers to take full advantage of the heterogeneous computing horsepower available. More specifically, compute developers will learn about the tools available to program CUDA on Tegra K1. Graphics developers will be introduced to the new Tegra Graphics Debugger for Tegra K1. This new mobile graphics development tool supports all the advanced features that Tegra K1 has to offer, via OpenGL ES 2.0, 3.0, and OpenGL 4.3. Finally, game developers will see how to manage their Android build configuration and debugging sessions entirely within the latest Visual Studio 2013, and how to profile their applications to identify hot spots and corresponding call stacks with our brand-new release of Tegra System Profiler.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4116
Streaming:
Download:
 
OpenGL Scene Rendering Techniques
Christoph Kubisch (NVIDIA)

OpenGL provides new features for accelerating scenes with many objects, which are typically found in professional visualization markets. This talk will provide details on the usage of these features and their effect on real-life models. Furthermore, we will showcase how more of the work of rendering a scene, such as efficient occlusion culling or matrix calculations, can be offloaded to the GPU.
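One of the features behind this kind of GPU-driven submission is OpenGL 4.3's `glMultiDrawElementsIndirect`, which reads tightly packed draw records straight from a GPU buffer; a minimal Python sketch of that record layout (the talk itself may cover additional techniques):

```python
import struct

# Packed layout of one DrawElementsIndirectCommand record (OpenGL 4.3):
# five little-endian 32-bit unsigned ints, in this fixed order:
# count, instanceCount, firstIndex, baseVertex, baseInstance.
FMT = "<5I"

def pack_draw(count, instance_count, first_index, base_vertex, base_instance):
    """Build one 20-byte draw record for a GL_DRAW_INDIRECT_BUFFER."""
    return struct.pack(FMT, count, instance_count, first_index,
                       base_vertex, base_instance)

# Two objects; a GPU culling pass would instead write these records itself,
# zeroing instanceCount to skip a culled draw without any CPU round trip.
buf = pack_draw(36, 1, 0, 0, 0) + pack_draw(36, 0, 36, 8, 1)
assert struct.calcsize(FMT) == 20  # the spec mandates tight 20-byte packing
```

Because the records live in GPU memory, an occlusion-culling compute shader can rewrite them in place and the CPU issues a single indirect draw call for the whole scene.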

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4117
Streaming:
Download:
 
OpenGL Update for NVIDIA GPUs
Piers Daniell (NVIDIA), Mark Kilgard (NVIDIA)

Attend this session to get the most out of OpenGL on NVIDIA Quadro, GeForce, and Tegra GPUs. NVIDIA's OpenGL experts explain how the OpenGL standard is evolving and detail NVIDIA's latest support. See examples of the latest features for compute, tessellation, vector graphics, and modern high-performance usage, including AZDO (approximately zero driver overhead) techniques. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4121
Streaming:
Download:
 
Image and Vision Processing on Tegra
Elif Albuz (NVIDIA)

Processing live and offline camera frames, images, and video streams, and extracting semantic information from them, enables a variety of applications on mobile and embedded platforms. Image and vision computing algorithms are inherently highly parallel, and fast processing of these algorithms enables new paradigms in embedded and mobile applications. Tegra K1 is built to address data-parallel embedded and mobile applications, with a CUDA-enabled GPU, an image signal processing engine, a NEON-enabled quad-core ARM CPU, and dedicated encode and decode accelerator hardware. Tegra software libraries wrap all of this capability and expose it to developers. In this session, we will present an overview of the software libraries and architecture relevant to image and vision computing on Tegra platforms.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4122
Streaming:
Download:
 
NVIDIA FlameWorks - Real-time Volumetric Fire and Smoke Simulation
Simon Green (NVIDIA)

Learn how to add volumetric effects to your game engine - smoke, fire and explosions that are interactive, more realistic, and can actually render faster than traditional sprite-based techniques. Volumetrics remain one of the last big differences between real-time and offline visual effects. In this talk we will show how volumetric effects are now practical on current GPU hardware. We will describe several new simulation and rendering techniques, including new solvers, combustion models, optimized ray marching and shadows, which together can make volumetric effects a practical alternative to particle-based methods for game effects.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4123
Streaming:
Download:
 
NVIDIA OptiX for High Performance Ray Tracing
David McAllister (NVIDIA), Damien Fagnou (MPC)

This session will cover everything developers need to get started with ray tracing in OptiX, including OptiX C and C++ APIs, the execution model, acceleration structures, programmable entry points, and best practices. We will also cover exciting customer use cases and the new OptiX Prime API that provides to-the-metal ray tracing without shading or recursion.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4118
Streaming:
Download:
 
Delivering High-Performance Remote Graphics with NVIDIA GRID Virtual GPU
Andy Currid (NVIDIA)

Learn how to deploy and optimize high-performance remote graphics applications using NVIDIA GRID Virtual GPU. This session will include an architectural overview of GRID Virtual GPU, which provides true hardware virtualization and sharing of the GPU between multiple virtual machines, a walkthrough of Virtual GPU setup on Citrix XenServer with remote graphics, and examples of how to tune the configuration for optimum remote graphics performance.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4119
Streaming:
Download:
 
Cloud Architectures and Game Streaming with NVIDIA GRID Technologies
Eric Young (NVIDIA), Samuel Gateau (NVIDIA)

This session will cover the technologies behind NVIDIA GRID and game streaming in the cloud. We will present NVIDIA GRID technologies and the software components of the GRID SDK used for capturing graphics and driving the hardware compression engine, enabling developers to deliver the ultimate low-latency cloud gaming experience. The second part will review our set of optimization guidelines for efficient game streaming from the cloud, improving performance and enhancing the gameplay experience. We will also present research in cloud-exclusive techniques that enable the use of global illumination, multiple-viewport rendering, and hybrid and cloud rendering for advanced game engines.

  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2014 - ID SIG4120
Streaming:
Download:
 
Rendering Faster and Better with VRWorks
Cem Cebenoyan (NVIDIA)
This talk will introduce developers to NVIDIA VRWorks, an SDK for VR game, engine, and headset developers that cuts latency and accelerates stereo rendering performance on NVIDIA GPUs. We'll explain the features of this SDK, including VR SLI, multi-resolution shading, context priorities, and direct mode. We'll discuss the motivation for these features, how they work, and how developers can use VRWorks in their renderers to improve the VR experience on Oculus Rift, HTC Vive, and other VR headsets.   Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1601
Streaming:
Download:
 
Best Practices in GPU-Based Video Processing
Thomas TRUE (NVIDIA)
We'll explore best practices and techniques for the development of efficient GPU-based video and image-processing applications. Topics to be discussed include threading models for efficient parallelism, CPU affinity to optimize system memory and GPU locality, image segmentation for overlapped asynchronous transfers, optimal memory usage strategies to reduce expensive data movement, and image format considerations to reduce and eliminate data conversions. Single- and multi-GPU systems for uncompressed real-time 4K video capture, processing, display, and play-out will be considered. Takeaways should prove applicable to developers of video broadcast and digital post-production systems, as well as to developers of large-scale visualization systems that require video ingest.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1602
Streaming:
Download:
 
See the Big Picture: How to Build Large Display Walls Using NVIDIA DesignWorks APIs and Tools
Doug Traill (NVIDIA)
The need to drive multiple displays, be it for digital signage, a corporate conference room, or even an immersive VR room, is becoming more common. We'll provide an overview of the display management tools and APIs that are part of NVIDIA's DesignWorks SDK. Attendees will learn about NVIDIA Mosaic, display setup and management using NVAPI and NVWMI, synchronization methods, and the warp and blend APIs.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1603
Streaming:
Download:
 
Textures: Achieving an Infinite Resolution Image
Alexander Reshetov (NVIDIA)
We propose a new texture sampling approach that preserves crisp silhouette edges when magnifying for close-up viewing, and benefits from image pre-filtering when minifying for viewing at farther distances. During a pre-processing step, we extract curved silhouette edges from the underlying images. These edges are used to adjust the texture coordinates of the requested samples during magnification. The original image is then sampled only once, with the modified coordinates. The new technique provides a resolution-independent image representation capable of billions of texels per second on a mid-range graphics card.  Back
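For context, the single sample the abstract mentions is an ordinary bilinear lookup; a minimal Python sketch of that baseline step (the silhouette-based coordinate adjustment itself is the paper's contribution and is not reproduced here):

```python
def bilinear(img, u, v):
    """Baseline bilinear texture lookup; img[row][col] holds scalar texels,
    (u, v) are normalized coordinates in [0, 1].

    The proposed method runs before this step: it nudges (u, v) away from a
    stored silhouette curve so that this single lookup lands on the correct
    side of the edge, keeping the edge crisp under magnification.
    """
    h, w = len(img), len(img[0])
    x = min(max(u * (w - 1), 0.0), w - 1)   # clamp to the texture border
    y = min(max(v * (h - 1), 0.0), h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend the four surrounding texels by their fractional distances.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy
```

Sampling only once with adjusted coordinates is what keeps the technique as cheap as a conventional texture fetch.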
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1604
Streaming:
Download:
 
Reflectance Capture by Parametric Texture Synthesis
Jaakko Lehtinen (NVIDIA)
We've developed an algorithm that's able to capture a spatially varying reflectance model suitable for real-time rendering (normal map, diffuse map, gloss maps) from a single cell phone photo. We build on a statistical descriptor of natural textures based on deep convolutional neural networks, and combine it with a renderer through a non-linear optimizer that "fuzzily" searches for the maps that give the best reproduction when fed to a shader. This is joint work with Aalto University in Helsinki, and will be published in the SIGGRAPH 2016 technical papers program.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1605
Streaming:
 
Using MDL to Share Physically Based Materials
Lutz Kettner (NVIDIA), Jan Jordan (NVIDIA)
The basics of NVIDIA's Material Definition Language (MDL) will be discussed, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. Users will learn how physically based materials can be defined, while developers will learn what's entailed in supporting MDL within their own product or renderer.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1606
Streaming:
Download:
 
Advances in NVIDIA's OptiX
Steven Parker (NVIDIA)
Learn about the NVIDIA OptiX ray tracing engine, a sophisticated library for performing GPU ray tracing. We'll provide an overview of the OptiX ray tracing pipeline and the programmable components that allow for the implementation of many algorithms and applications. OptiX can be used in many domains, ranging from rendering to acoustic modeling to scientific visualization. Several case studies will be presented describing the benefits of integrating this solution into third-party applications.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1607
Streaming:
 
NVIDIA OpenGL in 2016
Mark Kilgard (NVIDIA), Jeffrey Kiel (NVIDIA)
Attend this session to get the most out of OpenGL on NVIDIA Quadro, GeForce, and Tegra GPUs. Hear straight from an OpenGL expert at NVIDIA how the OpenGL standard continues to evolve with NVIDIA's support. See examples of the latest features for virtual reality, vector graphics, interoperability with Vulkan, and modern high-performance usage--including the latest features of NVIDIA's Pascal GPU generation. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1609
Streaming:
Download:
 
How to Render AEC Interiors for 2D and VR in Minutes
Pascal Gautron (NVIDIA)
When full photorealism is simply not fast enough, Iray Interactive renders images of CAD-grade models in a matter of seconds or minutes with ray-tracing quality. A set of render modes adapted to numerous use cases will be demonstrated, such as interactive design and interior layout. We'll explore different Iray Interactive implementations, such as in SOLIDWORKS Visualize for instant preview and material tuning, or 3DVIA HomeByMe, which enables fully detailed interior designs to be rendered within minutes on the AWS cloud. A beta version of Iray VR for Iray Interactive will also be showcased, illustrating the potential of highly reduced render times for creating enriched Iray-quality VR experiences.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1610
Streaming:
Download:
 
Programming for High Dynamic Range Rendering and Display on NVIDIA GPUs
Thomas TRUE (NVIDIA)
We'll provide an introduction to high dynamic range (HDR) and describe application programming techniques for HDR rendering and display on NVIDIA GPUs. Concepts to be discussed include color spaces, expanding chromaticity versus luminance, and scene- and display-referred imaging. For application developers, takeaways will include methods to query and set GPU and display capabilities for HDR, as well as OpenGL and DirectX programming to render and display HDR imagery.  Back
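One concrete building block of HDR display programming is the SMPTE ST 2084 "PQ" transfer function used by HDR10 displays; a Python sketch, with linear luminance normalized so 1.0 corresponds to 10,000 nits (the talk may of course cover other encodings as well):

```python
# SMPTE ST 2084 (PQ) opto-electrical transfer function constants,
# written as the exact rationals the standard defines.
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875

def pq_encode(y):
    """Map linear luminance (1.0 = 10,000 nits) to a PQ code value in [0, 1]."""
    yp = max(y, 0.0) ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2

# SDR reference white (100 nits = 0.01) lands at roughly code value 0.508,
# which is why PQ leaves so much code range for highlights.
```

An application would apply this curve (or let the GPU's output pipeline apply it) after composing its scene-referred HDR frame.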
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1611
Streaming:
Download:
 
Vulkan and the Khronos API Ecosystem
Neil Trevett (NVIDIA)
Discover how over 100 companies cooperate at the Khronos Group to create open, royalty-free standards that enable developers to access the power of the GPU to accelerate demanding graphics and compute applications. This session includes the very latest roadmap and ecosystem updates for the newly announced Vulkan and SPIR-V, with details about NVIDIA's Vulkan rollout across its product range.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1612
Streaming:
 
Vulkan and NVIDIA: A Deep Dive
Tristan Lorach (NVIDIA), Jeffrey Kiel (NVIDIA)
NVIDIA is bringing the power of Vulkan to a range of platforms to extend the choice of APIs for developers. This rapid-fire session will cover the essentials of NVIDIA's Vulkan rollout across its product range, with insights to help you judge whether Vulkan is right for your next development project. You will also get a sneak peek at what's in store for Vulkan support in Nsight Visual Studio Edition!  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1613
Streaming:
Download:
 
VR: You Are Here
David Luebke (NVIDIA)
In this "state of the union" survey, we will review the technology, the components, and the challenges of virtual reality. We'll describe how GPUs fit into these challenges, and lay out NVIDIA Research's vision for the future of VR.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1614
Streaming:
 
Overcoming Challenges for Virtual and Augmented Reality Display
David Luebke (NVIDIA)
We'll describe work by NVIDIA Research and our partners on challenges common to all wearable VR and AR displays: (1) FOCUS: how do we put a display as close to the eye as a pair of eyeglasses, where we cannot bring it into focus? (2) FIELD OF VIEW: how do we fill the user's entire vision with displayed content? (3) RESOLUTION: how do we fill that wide field of view with enough pixels? A "brute force" display would require 10,000 x 8,000 pixels per eye! (4) BULK: displays should be vanishingly unobtrusive, as light and forgettable as a pair of sunglasses, but the laws of optics dictate that most VR displays today are bulky boxes bigger than ski goggles. I will describe several "computational display" prototypes which sidestep these challenges by co-designing the optics, display, and rendering algorithm.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1615
Streaming:
 
Next-Gen Material Editing with Substance Designer Native MDL Visual UI and NVIDIA Iray
Sebastien Deguy (Allegorithmic), Jerome Derel (Allegorithmic)
Allegorithmic has been integrating the Iray render engine to combine its expertise in procedural texture rendering (aka substances) with multi-layered MDL physically based materials and Iray, a GPU-accelerated unbiased ray tracer. With the latest Substance Designer release, you can now natively create your MDL and substance materials from scratch through the node-tree editor. Materials can then be exported to your preferred MDL-capable 3D software (Iray plugins for Maya, 3ds Max, Rhino), enabling infinite capabilities for material rendering. The MDL editor addition to Allegorithmic Substance Designer will help solve artists' and developers' PBR material challenges, from creation and editing to final-frame rendering for artistic shots. During the session, some actual industrial use cases will be showcased, including some of the work achieved with the Hyundai Genesis G380 interior and exterior design.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1616
Streaming:
 
Advanced Rendering Solutions from NVIDIA
Phil Miller (NVIDIA)
Come learn of NVIDIA's latest rendering technologies powering the most popular 3D tools in the entertainment and design markets. The underlying offerings will be explained for those looking to add GPU acceleration and/or rendering to their own solutions, along with what cutting-edge solutions are accessible to 3D artists and designers.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1617
Streaming:
 
Machine Learning and Making of Things
Mike Haley (Autodesk)
We live in a world where everything around us is designed by someone. The pace of innovation is escalating, and with new methods of manufacturing, such as 3D printing, the demands placed on designers and design technology are increasing. What if there were a better way to organize all of this information and allow ideas and creations to emerge more organically? We will explore how the design software of the future will help designers rise to the challenge through the application of machine learning to 3D data. We introduce a geometric shape analysis and machine learning technology we call the Design Graph. By learning from millions of 3D models and then assembling a knowledge graph, it is able to react to a constantly evolving world, guiding the designs of the future.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1663
Streaming:
Download:
 
Rendering Sparse Volumes with NVIDIA GVDB in DesignWorks
Rama Hoetzlein (NVIDIA)
We introduce GVDB Sparse Volumes as a new offering in NVIDIA DesignWorks, focused on high-quality ray tracing of sparse volumetric data for motion pictures. Based on the VDB topology of Museth, with a novel GPU-based data structure and API, GVDB is designed for efficient compute and ray tracing on a sparse hierarchy of grids. Ray tracing on the GPU is accelerated with indexed memory pooling, 3D texture atlas storage, and a new hierarchical traversal algorithm. GVDB integrates with NVIDIA OptiX, and is developed as an open source library as part of DesignWorks.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1664
Streaming:
Download:
 
Massive Time-lapse Point Cloud Rendering in Virtual Reality
Markus Schuetz (NVIDIA)
We'll present a system that allows us to render and play through time-slices of large point cloud scans while meeting the high performance and quality requirements of virtual reality systems. Our viewer is capable of rendering the currently available time-slices of a building site: 200 time-slices captured daily by drones using photogrammetry, consisting of roughly 40 million points each, as well as 10 high-density laser scans with roughly 800 million points each. The viewer is also built to handle the additional and larger scans that will be produced by ongoing scan operations in the future. We will discuss the challenges of rendering point clouds and the methods used to meet the increased performance and quality requirements of VR.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1665
Streaming:
Download:
 
Mars 2030
Julian Reyes (Fusion Media Network), David Flamburis (Consultant), Justin Sonnekalb (Consultant)
Mars 2030 is an interactive virtual reality project that offers a breathtaking look into the life of an astronaut hard at work studying and exploring the Martian landscape. Produced in conjunction with NASA and Fusion Media Network (a joint venture between ABC and Disney), Mars 2030 aims to be the most photorealistic and scientifically accurate depiction of the Red Planet to date. We'll expound on the project's scope and technical capacities, in addition to showcasing a full VR demo of the game itself. Those in attendance will be among the first to glimpse the results of this exciting and wholly unprecedented multimedia collaboration.   Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1669
Streaming:
 
Rendering Highly Specular Materials
Anton Kaplanyan (NVIDIA)
High-frequency illumination effects, such as sparkling and highly glossy highlights on curved surfaces, are challenging to render in a stable manner. In this talk, we will discuss two methods for rendering glints and filtering highly glossy highlights. We provide practical solutions applicable for real-time rendering. Our real-time methods are GPU-friendly, temporally stable, and compatible with deferred shading, normal maps, as well as with filtering methods for normal maps.  Back
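A classic example of the normal-map-aware highlight filtering this talk belongs to is Toksvig's approximation (shown here for illustration; not necessarily the speakers' exact method), which lowers the specular exponent wherever the filtered normal shortens:

```python
import math

def toksvig_gloss(avg_normal, spec_power):
    """Anti-alias a Blinn-Phong specular exponent from a filtered normal map.

    avg_normal: component-wise average of the unit normals inside a pixel
    footprint; its length drops below 1 where the normals disagree, which
    signals high normal variance and hence a highlight that should blur.
    """
    na = math.sqrt(sum(c * c for c in avg_normal))
    ft = na / (na + spec_power * (1.0 - na))   # Toksvig factor in (0, 1]
    return ft * spec_power

# A flat region (|N| = 1) keeps its full exponent; a bumpy region gets a
# much lower exponent, trading sparkle for a temporally stable highlight.
```

Because the factor is derived purely from the filtered normal length, the correction works with mipmapped normal maps and deferred shading alike.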
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1666
Streaming:
Download:
 
Bringing Pascal to Professionals
Allen Bourgoyne (NVIDIA)
Designs are becoming more complex. Media is becoming richer with higher fidelity, combining greater resolutions and complex visual effects. Scientific visualization and compute problems are larger than ever. VR is changing all facets of entertainment, design, engineering, architecture, and medicine. Customers want to experience ideas, validate designs, rehearse procedures, and visualize problems interacting with them naturally and at scale.  Back
 
Keywords:
Best of GTC Talks, SIGGRAPH 2016 - ID SIG1667
Streaming:
Best of GTC Theater
Presentation
Media
Visualization Applications on NVIDIA DGX-1
Charlie Boyle (NVIDIA)
An introduction to the NVIDIA DGX-1 System including discussion of containerized applications for professional visualization & deep learning.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1618
Streaming:
 
VR--Not Just for Games!
Simon Jones (Epic), Solomon Rogers (Rewind.io)
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1620
Streaming:
 
Exclusively Using NVIDIA GPUs and Redshift 3D to Deliver the Next Wave of Original Content
Yurie Rocha (Guru Studios)
GPU rendering of final frames is beginning to make its way into mainstream production. Yurie will discuss the immediate benefits of building a GPU-exclusive pipeline, and how Guru Studio is choosing NVIDIA's technology to eventually achieve near rea ...Read More
GPU rendering of final frames is beginning to make its way into mainstream production. Yurie will discuss the immediate benefits of building a GPU-exclusive pipeline, and how Guru Studio is choosing NVIDIA's technology to eventually achieve near real-time ray tracing. With the appetite for original content at an all-time high, Guru is re-imagining how it will meet the demands of broadcasters and push the quality of its work.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1621
Streaming:
 
Independence Day: Resurgence--Killer Queen
Matt Aitken (Weta Digital)
We'll discuss Weta Digital's creation of the Alien Queen in this year's Independence Day: Resurgence. We will focus on the big showdown with the Queen at Area 51. We will also cover some of the unique FX simulation work, new innovations with the ...Read More
We'll discuss Weta Digital's creation of the Alien Queen in this year's Independence Day: Resurgence. We will focus on the big showdown with the Queen at Area 51. We will also cover some of the unique FX simulation work, new innovations with Weta's skydome lighting setup, as well as some of the techniques that allowed Weta to move to a large number of all-CG shots.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1622
Streaming:
 
Building VR Funhouse with UE4
Victoria Rege (NVIDIA)
VR Funhouse is a midway carnival game built to bring a new level of immersion to VR by enhancing what you see, hear, and touch through a combination of great graphics, fully interactive audio, and simulated physics. Built in UE4, it incorporates several ...Read More
VR Funhouse is a midway carnival game built to bring a new level of immersion to VR by enhancing what you see, hear, and touch through a combination of great graphics, fully interactive audio, and simulated physics. Built in UE4, it incorporates several graphics technologies to simulate realistic hair, destruction, fire, and more. Launched in July 2016, Lightspeed Studios open-sourced the blueprints and assets so that VR artists and content creators can take advantage of cutting-edge VR development. This talk will walk through the game from the producer's perspective and share how to build your own carnival game.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1623
Streaming:
Download:
 
VR Multi GPU Acceleration Featuring Autodesk VRED
Paul Schmucker (Autodesk), Tobias France (Hyundai)
Hyundai Design Research & Autodesk VR Team presents a virtual design review of the Hyundai N Vision 2025 Gran Turismo. Tobias France, Hyundai Designer, and Paul Schmucker, Autodesk Automotive SME, will demonstrate their multi-user car design revi ...Read More
Hyundai Design Research & Autodesk VR Team presents a virtual design review of the Hyundai N Vision 2025 Gran Turismo. Tobias France, Hyundai Designer, and Paul Schmucker, Autodesk Automotive SME, will demonstrate their multi-user car design review utilizing the HTC VIVE and Autodesk VRED Pro software, powered by NVIDIA Quadro graphics.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1624
Streaming:
 
Vulkan on NVIDIA: The Essentials
Tristan Lorach (NVIDIA)
NVIDIA is bringing the power of Vulkan to a range of platforms to extend the choice of APIs for developers. This rapid-fire session will cover the essentials of NVIDIA's Vulkan rollout across its product range, with insights to help you judge wheth ...Read More
NVIDIA is bringing the power of Vulkan to a range of platforms to extend the choice of APIs for developers. This rapid-fire session will cover the essentials of NVIDIA's Vulkan rollout across its product range, with insights to help you judge whether Vulkan is right for your next development project.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1625
Streaming:
Download:
 
NVIDIA Mental Ray and Iray Plug-ins: New Rendering Solutions
Phil Miller (NVIDIA)
Come learn of NVIDIA's latest rendering product offerings for use in Maya, 3ds Max, Cinema 4D, and Rhino. Topics will include GPU production rendering, lighting simulation, VR production, cluster rendering, and options for outfitting a studio. ...Read More
Come learn of NVIDIA's latest rendering product offerings for use in Maya, 3ds Max, Cinema 4D, and Rhino. Topics will include GPU production rendering, lighting simulation, VR production, cluster rendering, and options for outfitting a studio.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1626
Streaming:
Download:
 
Video Processing and Deep Learning and the Importance of the GPU
Juan Carlos Riveiro (Vilynx)
Deep Learning and Machine Learning have enabled many new applications in image processing over the last few years. However, these technologies have not yet been widely used for practical video applications due to the heavy processing requirements and ...Read More
Deep learning and machine learning have enabled many new applications in image processing over the last few years. However, these technologies have not yet been widely used for practical video applications due to the heavy processing requirements and advanced capabilities needed. Vilynx has developed an advanced video solution that overcomes these hurdles, leveraging ML/DL technologies to provide next-generation products for top publishers and the YouTube market. Through the use of audience data, social network data, video contextual data, and video processing algorithms, Vilynx is able to a) automatically detect the best moments of any video, b) auto-tag the clips, and c) relate these tags to traffic and audience analytics, so that they match what people are looking for and saying about a particular topic. This allows content creators to effortlessly connect their videos with the right audience and amplify their message. During the tech talk we will cover how the Vilynx stack combines machine learning, video processing, and deep learning to enable a new video discovery and sharing experience while showcasing the importance of GPU computing in optimizing the deployment of the technology.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1627
Streaming:
Download:
 
Look Development in Real Time
Jean-Daniel Nahmias (Pixar), David Pesare (Pixar)
Pixar's next-generation look development tool, Flow, allows artists to quickly develop and visualize complex shader networks in order to create rich and compelling materials for film assets. Flow interactively displays images using RTP, our real tim ...Read More
Pixar's next-generation look development tool, Flow, allows artists to quickly develop and visualize complex shader networks in order to create rich and compelling materials for film assets. Flow interactively displays images using RTP, our real time GPU ray tracer built on top of NVIDIA's OptiX toolkit and supporting our Universal Scene Description (USD). This enables us to match Pixar's RenderMan output by sharing our studio's lights and surfaces.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1628
Streaming:
 
Production-Quality, Final-Frame Rendering on the GPU
Robert Slater (Redshift)
We'll discuss the latest features of Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception of GPU final-frame rendering. A few examples of customer work will be demonstrated. This talk will be ...Read More
We'll discuss the latest features of Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception of GPU final-frame rendering. A few examples of customer work will be demonstrated. This talk will be of interest both to industry professionals who want to learn more about GPU-accelerated production-quality rendering and to software developers interested in GPU-accelerated rendering.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1629
Streaming:
Download:
 
NVIDIA Iray: Changing the Face of Architecture and Design
Scott DeWoody (Gensler)
NVIDIA's Iray technology was a game changer in the design process of its new corporate campus. Gensler teamed up with developers at NVIDIA to help integrate this technology into the process to accurately simulate how the design of the campus would l ...Read More
NVIDIA's Iray technology was a game changer in the design process of its new corporate campus. Gensler teamed up with developers at NVIDIA to help integrate this technology into the process to accurately simulate how the design of the campus would look in the real world. This process ended up helping everyone understand how light and materials were going to act in the 500,000-square-foot space. Being able to accurately compute how the massive amount of daylight coming into the space would react to changes in the design was incredible feedback for the designers. The data that Iray visualized helped with almost every design decision from start to finish.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1630
Streaming:
Download:
 
MDL Materials to GLSL Shaders: Theory and Practice
Andreas Mank (ESI Group)
Learn how you can map arbitrarily complex materials described with NVIDIA's Material Definition Language (MDL) onto sets of material-specific GLSL shaders using the MDL SDK. We use a skeleton of a general purpose main shader per stage, where a coupl ...Read More
Learn how you can map arbitrarily complex materials described with NVIDIA's Material Definition Language (MDL) onto sets of material-specific GLSL shaders using the MDL SDK. We use a skeleton of a general purpose main shader per stage, where a couple of pre-defined evaluation and sample functions are called. The body of those functions is composed by some code-snippets selected by the material analyzer. This approach has been adopted by ESI into their new rendering framework to showcase the power and flexibility of MDL. A demo will show the implementation results with focus on material re-use and sharing.  Back
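The snippet-composition idea described above can be illustrated with a toy sketch (all names, snippets, and the GLSL bodies here are hypothetical illustrations, not the MDL SDK's actual output): a fixed main-shader skeleton per stage, whose pre-defined evaluation-function body is filled with a code snippet selected by a material analyzer.

```python
# Hypothetical illustration of composing a material-specific GLSL shader
# from a fixed skeleton plus analyzer-selected snippets. The real ESI/MDL
# pipeline derives the snippet bodies from MDL via the MDL SDK.
SKELETON = """\
vec3 evaluate_material(vec3 n, vec3 l, vec3 v) {
%(body)s
}
void main() { /* shades the fragment via evaluate_material(...) */ }
"""

# Snippet library keyed by the material class the analyzer reports.
SNIPPETS = {
    "diffuse": "    return base_color * max(dot(n, l), 0.0);",
    "mirror":  "    return base_color;  // mirror lobe handled elsewhere",
}

def compose_shader(material_kind):
    """Splice the snippet for this material class into the main skeleton."""
    return SKELETON % {"body": SNIPPETS[material_kind]}
```

The benefit of this scheme, as the abstract notes, is re-use: one skeleton serves every material, and only the small snippet bodies vary per material class.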
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1631
Streaming:
Download:
 
Cutting Edge Tools and Techniques for Real-Time Rendering with NVIDIA GameWorks
David Coombes (NVIDIA)
The GameWorks program gets the best tools and techniques into the hands of game developers everywhere. Increasingly these tools are for film, VR, simulation, and other demanding applications as well as AAA games. Attend this talk to gain a broad overvi ...Read More
The GameWorks program gets the best tools and techniques into the hands of game developers everywhere. Increasingly these tools are for film, VR, simulation, and other demanding applications as well as AAA games. Attend this talk to gain a broad overview of our technologies and how you can use them in your project. Highlights include real-time volumetric lighting, voxel-based ambient occlusion, and voxel-based physics simulation, as well as tools like our class-leading graphics debugger and native Android development tools.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1632
Streaming:
Download:
 
Give Life to your 3D Art with MDL and NVIDIA Iray? in Substance Painter
Manuel Kraemer (NVIDIA), Jeremie Noguer (Allegorithmic)
Allegorithmic and NVIDIA will show how combining Substance, the worldwide reference for procedural textures; MDL, the new standard for defining multi-layer materials; and NVIDIA Iray, a GPU-accelerated unbiased ray tracer, helps solve artists' and develop ...Read More
Allegorithmic and NVIDIA will show how combining Substance, the worldwide reference for procedural textures; MDL, the new standard for defining multi-layer materials; and NVIDIA Iray, a GPU-accelerated unbiased ray tracer, helps solve artists' and developers' PBR material challenges, from editing to final-frame rendering of artistic shots. After explaining MDL basics and the associated material workflow in Substance Designer, we will showcase the latest edition of Substance Painter, the market's most innovative real-time 3D painting software. Now embedding Iray as an alternate viewport, Substance Painter fully leverages the power of MDL and Substance and natively enhances your art with the most advanced rendering quality at minimal compute time thanks to GPU acceleration.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1633
Streaming:
 
Leveraging Microsoft Azure's GPU N-Series for Compute Workloads and Visualization
Karan Batta (Microsoft)
This talk will cover the recently announced state-of-the-art GPU visualization and compute infrastructure in Microsoft's Azure cloud, and how you can leverage it for rendering, encoding, visualization, and dynamic creation of ass ...Read More
This talk will cover the recently announced state-of-the-art GPU visualization and compute infrastructure in Microsoft's Azure cloud, and how you can leverage it for rendering, encoding, visualization, and dynamic creation of assets. This session is aimed at anyone who would like to learn more about how to utilize Azure for their production pipelines.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1635
Streaming:
Download:
 
WetBrush: GPU-Based 3D Painting Simulation at the Bristle Level
Zhili Chen (Adobe), Chris Hebert (NVIDIA)
We built a real-time oil painting system that simulates the physical interactions among brush, paint, and canvas at the bristle level entirely using CUDA. To simulate sub-pixel paint details given the limited computational resource, we propose to def ...Read More
We built a real-time oil painting system that simulates the physical interactions among brush, paint, and canvas at the bristle level entirely using CUDA. To simulate sub-pixel paint details given the limited computational resource, we propose to define paint liquid in a hybrid fashion: the liquid close to the brush is modeled by particles, and the liquid away from the brush is modeled by a density field. Based on this representation, we develop a variety of techniques to ensure the performance and robustness of our simulator under large time steps, including brush and particle simulations in non-inertial frames, a fixed-point method for accelerating Jacobi iterations, and a new Eulerian-Lagrangian approach for simulating detailed liquid effects.  Back
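The hybrid liquid representation described above can be sketched in a few lines (a toy illustration with hypothetical names, not the CUDA implementation): liquid elements within a radius of the brush are kept as Lagrangian particles for sub-pixel detail, while everything farther away is splatted into an Eulerian density grid.

```python
import math

def split_liquid(elements, brush_pos, radius, grid_res, domain=1.0):
    """Partition paint liquid into near-brush particles and a far-field density grid.

    elements:  list of (x, y) liquid sample positions in [0, domain)^2
    brush_pos: (x, y) brush position
    radius:    promotion radius around the brush
    grid_res:  resolution of the square density grid
    """
    particles = []
    cell = domain / grid_res
    density = [[0.0] * grid_res for _ in range(grid_res)]
    for (x, y) in elements:
        if math.hypot(x - brush_pos[0], y - brush_pos[1]) <= radius:
            # Close to the brush: keep full particle (Lagrangian) detail.
            particles.append((x, y))
        else:
            # Far from the brush: accumulate into the density (Eulerian) field.
            i = min(int(x / cell), grid_res - 1)
            j = min(int(y / cell), grid_res - 1)
            density[j][i] += 1.0
    return particles, density
```

In the real system this partition is re-evaluated as the brush moves, which is what motivates the non-inertial-frame simulation and the Eulerian-Lagrangian coupling the abstract mentions.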
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1636
Streaming:
 
Visualization Applications on NVIDIA DGX-1
Charlie Boyle (NVIDIA)
An introduction to the NVIDIA DGX-1 System including discussion of containerized applications for professional visualization & deep learning. ...Read More
An introduction to the NVIDIA DGX-1 System including discussion of containerized applications for professional visualization & deep learning.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1637
Streaming:
 
Learning Representations for Automatic Colorization
Gustav Larsson (University of Chicago)
We developed a fully automatic image colorization system. Our approach leverages recent advances in deep networks, exploiting both low-level and semantic representations during colorization. As many scene elements naturally appear according to multim ...Read More
We developed a fully automatic image colorization system. Our approach leverages recent advances in deep networks, exploiting both low-level and semantic representations during colorization. As many scene elements naturally appear according to multimodal color distributions, we train our model to predict per-pixel color histograms. This intermediate output can be used to automatically generate a color image, or further manipulated prior to image formation; our experiments consider both scenarios. On both fully and partially automatic colorization tasks, our system significantly outperforms all existing methods.  Back
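Because the network predicts a per-pixel color histogram rather than a single color, image formation requires collapsing each histogram to one value. A minimal sketch of the two natural choices (hypothetical bin layout; the paper's actual color space and bins may differ):

```python
def histogram_to_color(hist, bin_centers, how="expectation"):
    """Collapse a predicted per-pixel color histogram to a single color value.

    hist:        non-negative weight per bin (need not be normalized)
    bin_centers: representative color value of each bin
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    if how == "expectation":
        # Expected value under the normalized histogram: smooth, but can
        # average a multimodal prediction into a desaturated in-between color.
        return sum(w * c for w, c in zip(hist, bin_centers)) / total
    # Mode: commit to the most likely bin, preserving vivid multimodal choices.
    return bin_centers[max(range(len(hist)), key=lambda i: hist[i])]
```

Keeping the full histogram as an intermediate output is what enables the "further manipulated prior to image formation" scenario the abstract mentions.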
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1639
Streaming:
Download:
 
Audi's Drive for "The Best Car Configurator on the Internet"
Thomas Orenz (Audi, AG), Francois de Bodinat (ZeroLight)
After delivering the pinnacle of commercial VR earlier this year, Audi and strategic partner ZeroLight unveil the next generation in online retail. Hailed by Jalopnik as "the best car configurator on the internet" and winner of the Techies 2016 Clou ...Read More
After delivering the pinnacle of commercial VR earlier this year, Audi and strategic partner ZeroLight unveil the next generation in online retail. Hailed by Jalopnik as "the best car configurator on the internet" and winner of the Techies 2016 Cloud Technology Award, Audi's new 3D web solution utilizes revolutionary techniques to deliver a self-repairing, extremely stable, and responsive cloud configurator. Born as a response to changing consumer behavior, with 96% of research conducted online, Audi will address the importance of introducing a highly advanced, immersive, and engaging proposition for Audi customers to ensure they receive the premium-quality experience synonymous with the four rings. The discussion will include insight into the development challenges faced, and how the solution combines with the automated omnichannel, forming a cohesive, interactive customer journey.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1640
Streaming:
Download:
 
Face2Face: Real-time Face Capture and Reenactment
Justus Thies (University of Erlangen-Nuremberg), Matthias Niessner (Stanford University)
We present a novel approach for real-time facial reenactment of a monocular target-video sequence, where the goal is to animate the facial expressions of the target video with a source actor and re-render the manipulated output video in a photo-reali ...Read More
We present a novel approach for real-time facial reenactment of a monocular target-video sequence, where the goal is to animate the facial expressions of the target video with a source actor and re-render the manipulated output video in a photo-realistic fashion.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1641
Streaming:
Download:
 
Introducing NVIDIA GVDB Sparse Volumes
Rama Hoetzlein (NVIDIA), Ken Museth (Dreamworks Animation & SpaceX)
We introduce GVDB Sparse Volumes as a new offering with NVIDIA DesignWorks to focus on high quality raytracing of sparse volumetric data for motion pictures. Based on the VDB topology of Museth, with a novel GPU-based data structure and API, GVDB is ...Read More
We introduce GVDB Sparse Volumes as a new offering with NVIDIA DesignWorks to focus on high quality raytracing of sparse volumetric data for motion pictures. Based on the VDB topology of Museth, with a novel GPU-based data structure and API, GVDB is designed for efficient compute and raytracing on a sparse hierarchy of grids. Raytracing on the GPU is accelerated with indexed memory pooling, 3D texture atlas storage and a new hierarchical traversal algorithm. GVDB integrates with NVIDIA OptiX, and is developed as an open source library as a part of DesignWorks.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1643
Streaming:
Download:
 
Light Field Rendering and Streaming for VR & AR
Jules Urbach (Otoy, Inc.)
We will discuss OTOY's cutting edge light field rendering toolset and platform. OTOY's light field rendering technology allows for immersive experiences on mobile HMDs and next gen displays, ideal for VR and AR. OTOY is actively developing a ground ...Read More
We will discuss OTOY's cutting edge light field rendering toolset and platform. OTOY's light field rendering technology allows for immersive experiences on mobile HMDs and next gen displays, ideal for VR and AR. OTOY is actively developing a groundbreaking light field rendering pipeline, including the world's first portable 360 LightStage capture system and a cloud-based graphics platform for creating and streaming light field media for virtual reality and emerging holographic displays.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1644
Streaming:
 
Large Scale Video Processing for VR
Daniel Kopeinigg (Jaunt VR)
Jaunt VR has developed a GPU based large scale video processing platform to combine multiple HD camera streams in radial configuration into seamlessly stitched stereoscopic spherical panoramas. The approach uses complex computational photography algo ...Read More
Jaunt VR has developed a GPU based large scale video processing platform to combine multiple HD camera streams in radial configuration into seamlessly stitched stereoscopic spherical panoramas. The approach uses complex computational photography algorithms that require sharded processing of the data across hundreds of cloud based GPU instances.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1645
Streaming:
 
Digital Actors at MPC: Bridging the Uncanny Valley with GPU Technology
Damien Fagnou (MPC)
Discover the next generation of GPU-enabled facial rigs for digital actors at MPC. Through a mixed approach of linear deformers and non-linear analysis, MPC aims to improve the performance and appearance of its digital actors and improve upon the sta ...Read More
Discover the next generation of GPU-enabled facial rigs for digital actors at MPC. Through a mixed approach of linear deformers and non-linear analysis, MPC aims to improve the performance and appearance of its digital actors and improve upon the state of the art in the visual effects industry. You'll learn from industry experts how MPC is using the latest Fabric Engine technology to ease the transition to GPUs, enabling fast drawing of characters and fast parallel computation of deformers on CUDA.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1646
Streaming:
 
Look Development in Real Time
Jean-Daniel Nahmias (Pixar), David Pesare (Pixar)
Pixar's next-generation look development tool, Flow, allows artists to quickly develop and visualize complex shader networks in order to create rich and compelling materials for film assets. Flow interactively displays images using RTP, our real tim ...Read More
Pixar's next-generation look development tool, Flow, allows artists to quickly develop and visualize complex shader networks in order to create rich and compelling materials for film assets. Flow interactively displays images using RTP, our real time GPU ray tracer built on top of NVIDIA's OptiX toolkit and supporting our Universal Scene Description (USD). This enables us to match Pixar's RenderMan output by sharing our studio's lights and surfaces.   Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1647
Streaming:
 
Virtual Reality Rendering Features of NVIDIA GPUs
Mark Kilgard (NVIDIA)
Come and learn about the virtual reality (VR) rendering features of NVIDIA's latest GeForce, Quadro, and Tegra GPUs. Geared for a general audience, this talk visually explains the VR rendering process and how NVIDIA GPUs with support for Simultaneous ...Read More
Come and learn about the virtual reality (VR) rendering features of NVIDIA's latest GeForce, Quadro, and Tegra GPUs. Geared for a general audience, this talk visually explains the VR rendering process and how NVIDIA GPUs with support for Simultaneous Multi-Projection can render VR more efficiently and at higher quality than conventional VR rendering techniques.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1648
Streaming:
Download:
 
Giant VR - A Sundance Movie
Milica Zec (Giant VR), Winslow Porter (Giant VR)
Trapped in an active war-zone, two parents struggle to distract their young daughter by inventing a fantastical tale. Inspired by real events, this immersive virtual-reality experience, which mixes both game engine and live-action video, transports t ...Read More
Trapped in an active war-zone, two parents struggle to distract their young daughter by inventing a fantastical tale. Inspired by real events, this immersive virtual-reality experience, which mixes both game engine and live-action video, transports the viewer into the family's makeshift basement shelter. Giant had its world premiere at 2016 Sundance Film Festival New Frontier and its European premiere at Cannes Film Festival where it garnered strong emotional responses from both the general public and the press. Come learn about the making of this cinematic virtual reality experience from its creators Milica Zec and Winslow Porter.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1649
Streaming:
Download:
 
Visualization Applications on NVIDIA DGX-1
Deepti Jain (NVIDIA)
An introduction to the NVIDIA DGX-1 System including discussion of containerized applications for professional visualization & deep learning. ...Read More
An introduction to the NVIDIA DGX-1 System including discussion of containerized applications for professional visualization & deep learning.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1650
Streaming:
 
Independence Day: Resurgence - Killer Queen
Matt Aitken (Weta Digital)
We'll discuss Weta Digital's creation of the Alien Queen in this year's Independence Day: Resurgence. We will focus on the big showdown with the Queen at Area 51. We will also cover some of the unique FX simulation work, new innovations with the ...Read More
We'll discuss Weta Digital's creation of the Alien Queen in this year's Independence Day: Resurgence. We will focus on the big showdown with the Queen at Area 51. We will also cover some of the unique FX simulation work, new innovations with Weta's skydome lighting setup, as well as some of the techniques that allowed Weta to move to a large number of all-CG shots.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1651
Streaming:
 
NUKE Studio for Film Pipelines
Juan Salazar (The Foundry)
In this demo, NUKE STUDIO Product Manager Juan Salazar looks at how NUKE STUDIO fits into a film pipeline by assuming the roles of a supervisor, a 2D lead and a NUKE artist. Follow the entire collaborative process, from the supervisor setting up proj ...Read More
In this demo, NUKE STUDIO Product Manager Juan Salazar looks at how NUKE STUDIO fits into a film pipeline by assuming the roles of a supervisor, a 2D lead, and a NUKE artist. Follow the entire collaborative process, from the supervisor setting up projects, ingesting media, annotating shots, and exporting assets; to the lead visualizing sequences with timeline effects and quick compositing; to the artist creating the final composite using the advanced features in NUKE 10; and finally to review and finishing back in NUKE STUDIO's timeline. http://www.nvidia.com/object/siggraph2016.html#utm_source=shorturl&utm_medium=referral&utm_campaign=siggraph2016  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1652
Streaming:
 
Look Development in Real Time
Jean-Daniel Nahmias (Pixar), Davide Pesare (Pixar)
Pixar's next-generation look development tool, Flow, allows artists to quickly develop and visualize complex shader networks in order to create rich and compelling materials for film assets. Flow interactively displays images using RTP, our real tim ...Read More
Pixar's next-generation look development tool, Flow, allows artists to quickly develop and visualize complex shader networks in order to create rich and compelling materials for film assets. Flow interactively displays images using RTP, our real time GPU ray tracer built on top of NVIDIA's OptiX toolkit and supporting our Universal Scene Description (USD). This enables us to match Pixar's RenderMan output by sharing our studio's lights and surfaces.   Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1653
Streaming:
 
Processing VR Video in the Cloud
Sean Safreed (Pixvana)
At Pixvana, we are designing a platform for XR storytelling that enables new experiences for augmented or virtual reality systems. The media processing system for our new platform is based in the cloud, allowing us to create accelerated processes for ...Read More
At Pixvana, we are designing a platform for XR storytelling that enables new experiences for augmented or virtual reality systems. The media processing system for our new platform is based in the cloud, allowing us to create accelerated processes for delivering high quality VR video. We will share insights on building around GPU-accelerated cloud infrastructure for both batch and interactive systems along with details about our cloud processing system for video transformation and encoding that dramatically improves the quality of VR video streaming.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1654
Streaming:
Download:
 
VR Multi GPU Acceleration Featuring Autodesk VRED
Paul Schmucker (Autodesk), Tobias France (Hyundai)
Hyundai Design Research & Autodesk VR Team present a virtual design review of the Hyundai N Vision 2025 Gran Turismo. Tobias France, Hyundai Designer, and Paul Schmucker, Autodesk Automotive SME, will demonstrate their multi-user car design revie ...Read More
Hyundai Design Research & Autodesk VR Team present a virtual design review of the Hyundai N Vision 2025 Gran Turismo. Tobias France, Hyundai Designer, and Paul Schmucker, Autodesk Automotive SME, will demonstrate their multi-user car design review utilizing the HTC VIVE and Autodesk VRED Pro software, powered by NVIDIA Quadro graphics.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1655
Streaming:
Download:
 
Rendering Faster and Better with VRWorks on Pascal
Ryan Prescott (NVIDIA)
This talk will introduce developers to NVIDIA's VRWorks, an SDK for VR game, engine, and headset developers that cuts latency and accelerates stereo rendering performance on NVIDIA GPUs. We'll explain the features of this SDK, including VR SLI, multi ...Read More
This talk will introduce developers to NVIDIA's VRWorks, an SDK for VR game, engine, and headset developers that cuts latency and accelerates stereo rendering performance on NVIDIA GPUs. We'll explain the features of this SDK, including VR SLI, multi-resolution shading, context priorities, and direct mode. We'll discuss the motivation for these features, how they work, and how developers can use VRWorks in their renderers to improve the VR experience on Oculus Rift, HTC Vive, and other VR headsets.  Back
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1656
Streaming:
Download:
 
NV Research: The Magic Behind GameWorks' Hybrid Frustum Traced Shadows (HFTS)
Chris Wyman (NVIDIA)
Games commonly use filtered shadow maps to shadow their worlds, but these introduce blocky aliasing artifacts that can cause distracting shadow popping and flickering. At NVIDIA Research, we developed a fast algorithm using "irregular z-buff ...Read More
Games commonly use filtered shadow maps to shadow their worlds, but these introduce blocky aliasing artifacts that can cause distracting shadow popping and flickering. At NVIDIA Research, we developed a fast algorithm using "irregular z-buffers" that still leverages decades of shadow-map technology but avoids this aliasing to provide ray-traced-quality shadows. Working with others at NVIDIA, we combined this work with our proven PCSS technology to significantly increase shadow quality for today's games. We will discuss some of the technical innovations behind this work, which is now available in NVIDIA GameWorks and has shipped in Ubisoft's Tom Clancy's The Division.  Back
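The key data structure behind this approach can be sketched simply (a toy illustration with hypothetical names, far from the GPU implementation): instead of storing one depth per shadow-map texel, an irregular z-buffer stores, per light-space texel, the list of screen-space receiver samples that project into it, so occluding geometry can be tested exactly against the pixels that need shadowing, with no resampling aliasing.

```python
from collections import defaultdict

def build_irregular_zbuffer(receivers, project, res):
    """Bucket eye-visible receiver samples by the light-space texel they fall in.

    receivers: list of sample points visible from the eye
    project:   maps a point to (u, v, depth) in light space, with u, v in [0, 1)
    res:       light-space grid resolution
    """
    buckets = defaultdict(list)
    for p in receivers:
        u, v, depth = project(p)
        texel = (int(u * res), int(v * res))
        buckets[texel].append((p, depth))
    return buckets

def shadow_test(buckets, occluder_texel, occluder_depth):
    """Return exactly the receivers lying behind an occluder in this texel."""
    return [p for (p, d) in buckets.get(occluder_texel, []) if d > occluder_depth]
```

Because shadow tests run against the exact receiver positions rather than a resampled depth grid, the hard-shadow result is alias-free, which is what lets it be blended with PCSS-style filtering for soft shadows.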
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1657
Streaming:
Download:
 
Bringing Pascal to Professionals
Allen Bourgoyne (NVIDIA)
Designs are becoming more complex. Media is becoming richer and higher fidelity, combining greater resolutions with complex visual effects. Scientific visualization and compute problems are larger than ever. VR is changing all facets of entertainment, design, engineering, architecture, and medicine. Customers want to experience ideas, validate designs, rehearse procedures, and visualize problems, interacting with them naturally and at scale.
 
Keywords:
Best of GTC Theater, SIGGRAPH 2016 - ID SIG1658
Streaming:
Download:
Big Data Analytics
Presentation
Media
Accelerate Distributed Data Mining with Graphics Processing Units
Nam-Luc Tran (EURA NOVA)
Numerous distributed processing models have emerged, driven by (1) the growth in volumes of available data and (2) the need for precise and rapid analytics. The most famous representative of this category is undoubtedly MapReduce; however, other, more flexible models exist based on the dataflow graph (DFG) processing model. None of the existing frameworks, however, has considered the case where the individual processing nodes are equipped with GPUs to accelerate parallel computations. In this talk, we discuss this challenge and the implications of the presence of GPUs on some of the processing nodes for the DFG representation of such heterogeneous jobs and for their scheduling, with big data mining as the principal use case.
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4169
Streaming:
Download:
 
GPU-Accelerated Large-Scale Dense Subgraph Detection
Andy Wu (Xerox Research Center)
The large-scale dense subgraph detection problem has been an active research area for decades and has numerous applications in the web and bioinformatics domains; accordingly, many algorithms have been designed to tackle this graph kernel. Due to their computational limitations, traditional approaches are infeasible on large-scale graphs with millions or billions of vertices. In this presentation, we propose a GPU-accelerated dense subgraph detection algorithm that solves the large-scale problem. It successfully maps the irregular graph clustering problem onto the GPGPU platform, and extensive experimental results demonstrate its strong scalability on GPU computing platforms.
 
Keywords:
Big Data Analytics, Bioinformatics & Genomics, GTC 2014 - ID S4215
Streaming:
Download:
 
Red Fox: An Execution Environment for Relational Query Processing on GPUs
Haicheng Wu (Georgia Institute of Technology)
This session will present the Red Fox system. Attendees will leave understanding GPU performance when executing relational queries over large data sets, as typically found in data warehousing applications, and the automatic compilation flow of kernel fusion, which can be applied to other applications.
 
Keywords:
Big Data Analytics, Developer - Programming Languages, GTC 2014 - ID S4222
Streaming:
Download:
 
Histograms in CUDA: Privatized for Fast, Level Performance
Nicholas Wilt (The CUDA Handbook)
Histograms are an important statistical tool with a wide variety of applications, especially in image processing. Naive CUDA implementations suffer from low performance on degenerate input data due to contention. This presentation will show how to use "privatized" (per-thread) histograms to balance average-case performance against the data-dependent performance of degenerate cases.
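The privatization idea can be sketched in plain Python as a serial stand-in for the CUDA technique; the thread count and interleaved partitioning below are illustrative assumptions, not the presenter's implementation:

```python
def histogram_privatized(data, num_bins, num_threads=4):
    """Simulate per-thread private histograms merged by a final reduction.

    On a GPU each thread (or warp) owns a private copy in registers or
    shared memory, so degenerate inputs (all values in one bin) no longer
    serialize on a single atomic counter; merging happens once at the end.
    """
    # Partition the input across "threads" (interleaved, like a grid-stride loop).
    chunks = [data[i::num_threads] for i in range(num_threads)]
    # Phase 1: each thread fills its private histogram with no sharing.
    privates = []
    for chunk in chunks:
        h = [0] * num_bins
        for x in chunk:
            h[x] += 1          # no atomic needed: the histogram is private
        privates.append(h)
    # Phase 2: reduce the private copies into one result, bin by bin.
    return [sum(col) for col in zip(*privates)]
```

Even when every input value lands in the same bin, each simulated thread only touches its own counters, which is the contention the talk's technique avoids.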
 
Keywords:
Big Data Analytics, Video & Image Processing, GTC 2014 - ID S4249
Streaming:
 
Packet-based Network Traffic Monitoring & Analysis with GPUs
Wenji Wu (Fermilab)
In high-speed networks, network traffic monitoring and analysis applications may require enormous raw compute power and high I/O throughputs, especially when traffic scrutiny on a per-packet basis is needed. Under those conditions, the applications face tremendous performance and scalability challenges. The GPU architecture fits well with the features of packet-based network monitoring and analysis applications. At Fermilab, we have prototyped a GPU-assisted network traffic monitoring and analysis system, which analyzes network traffic on a per-packet basis. We implemented a GPU-accelerated library for network traffic capturing, monitoring, and analysis. The library consists of various CUDA kernels, which can be combined in various ways to perform monitoring and analysis tasks. In this talk, we will describe our architectural approach in developing a generic GPU-assisted network traffic monitoring and analysis capability. Multiple examples will be given to demonstrate how to use GPUs to analyze network traffic.
 
Keywords:
Big Data Analytics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing & HPC, GTC 2014 - ID S4320
Streaming:
Download:
 
The Energy Case for Graph Processing on Hybrid CPU and GPU Systems
Elizeu Santos-Neto (University of British Columbia)
This work reports on a power and performance analysis of large-scale graph processing on hybrid (i.e., CPU and GPU), single-node systems. On these systems, graph processing can be accelerated by mapping the graph layout so that the algorithmic tasks exercise the processing units where each performs best; GPUs, however, have a much higher TDP, so their impact on overall energy consumption is unclear. An evaluation on large real-world graphs, as well as on synthetic graphs as large as 1 billion vertices and 16 billion edges, shows that efficiency, in terms of both performance and power, can be achieved.
 
Keywords:
Big Data Analytics, Energy Exploration, GTC 2014 - ID S4338
Streaming:
 
Real-Time Quantification Filters for Multidimensional Databases
Peter Strohm (Jedox AG)
Learn how GPUs can speed up real-time calculation of advanced multidimensional data filters required in data analytics and business intelligence applications. We present the design of a massively parallel "quantification" algorithm which, given a set of dimensional elements, returns all those elements for which ANY (or ALL) numeric cells in the respective slice of a user-defined subcube satisfy a given condition. Such filters are especially useful for the exploration of big data spaces, for zero-suppression in large views, or for top-k analyses. In addition to the main algorithmic aspects, attendees will see how our implementation solves challenges such as economic utilization of the CUDA memory hierarchy or minimization of threading conflicts in parallel hashing.
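A minimal dense-array sketch of such an ANY/ALL quantification filter, using NumPy in place of the compressed OLAP cubes the talk targets (the function name and signature are hypothetical):

```python
import numpy as np

def quantify(cube, axis, condition, mode="any"):
    """Return the indices of elements along `axis` whose slice satisfies
    `condition` for ANY (or ALL) of its numeric cells.

    `cube` is a dense NumPy array standing in for the multidimensional
    database subcube; the real system evaluates this massively in parallel.
    """
    mask = condition(cube)                               # boolean cube
    reduce_axes = tuple(a for a in range(cube.ndim) if a != axis)
    if mode == "any":
        keep = mask.any(axis=reduce_axes)                # ANY cell in the slice
    else:
        keep = mask.all(axis=reduce_axes)                # ALL cells in the slice
    return np.nonzero(keep)[0]
```

For zero-suppression, for example, `mode="all"` with `condition=lambda c: c == 0` would select the slices that are entirely zero.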
 
Keywords:
Big Data Analytics, Finance, GTC 2014 - ID S4395
Streaming:
Download:
 
Rhythm: Harnessing Data Parallel Hardware for Server Workloads
Sandeep Agrawal (Duke University)
We present Rhythm, a framework for high throughput servers that exploits similarity across web service requests to improve server throughput and energy efficiency. Present work in data center efficiency primarily focuses on scale-out, with off-the-shelf hardware used for individual machines, leading to inefficient usage of energy and area. Rhythm improves upon this by harnessing data parallel hardware to execute "cohorts" of web service requests, grouping requests together based on similar control flow and using intelligent data layout optimizations. An evaluation of the SPECWeb Banking workload for future server platforms on the GTX Titan achieves 4x the throughput (reqs/sec) of a Core i7 at efficiencies (reqs/Joule) comparable to a dual-core ARM Cortex-A9.
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4447
Streaming:
Download:
 
Parallel Lossless Compression Using GPUs
Evangelia Sitaridi (Columbia University)
Given the high cost of enterprise data storage, compression is becoming a major concern for the industry in the age of Big Data. Attendees can learn how to efficiently offload data compression to the GPU, leveraging its superior memory and compute resources. We focus on the DEFLATE algorithm, a combination of the LZSS and Huffman entropy coding algorithms used in common compression formats like gzip. Both algorithms are inherently serial, and trivial parallelization methods are inefficient. We show how to parallelize these algorithms efficiently on GPUs and discuss trade-offs between compression ratio and increased parallelism to improve performance. We conclude our presentation with a head-to-head comparison against a multi-core CPU implementation, demonstrating up to half an order of magnitude performance improvement using a single Kepler GPU. This is joint work with IBM researchers Rene Mueller and Tim Kaldewey.
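As context for why trivial parallelization is inefficient, a serial reference LZSS (the LZ stage of DEFLATE) can be sketched as follows; the window and match-length limits are illustrative, not gzip's exact parameters. Note how each match decision depends on all earlier positions, which is the serial dependence the talk works around:

```python
def lzss_compress(data, window=4096, min_len=3, max_len=18):
    """Greedy LZSS: emit literals or (distance, length) back-references."""
    i, out = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the sliding window for the longest match at position i.
        for j in range(max(0, i - window), i):
            l = 0
            while (l < max_len and i + l < len(data)
                   and data[j + l] == data[i + l]):
                l += 1
            if l > best_len:
                best_len, best_dist = l, i - j
        if best_len >= min_len:
            out.append((best_dist, best_len))   # back-reference token
            i += best_len
        else:
            out.append(data[i])                 # literal token
            i += 1
    return out

def lzss_decompress(tokens):
    buf = []
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            # Copy byte-by-byte so overlapping references (dist < length) work.
            for _ in range(length):
                buf.append(buf[-dist])
        else:
            buf.append(t)
    return buf
```

In real DEFLATE the token stream would then be Huffman-coded, the second serial stage the abstract mentions.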
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4459
Streaming:
Download:
 
GPUs and Regular Expression Matching for Big Data Analytics
Alon Shalev Housfater (IBM)
Regular-expression-based pattern matching is a key enabling technology for a new generation of big data analytics. We'll describe several key use cases that require high-throughput, low-latency regular expression pattern matching. A new GPU-based regular expression technology will be introduced and its basic performance characteristics presented. We'll demonstrate that the GPU enables impressive performance gains in pattern matching tasks and compare its performance against latest-generation processors. Finally, we'll examine the key challenges in using such accelerators in large software products and highlight open problems in GPU implementations of pattern matching.
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4462
Streaming:
 
High Speed Analysis of Big Data Using NVIDIA GPUs and Hadoop
Partha Sen (Fuzzy Logix)
Performing analytics on data stored in Hadoop can be time consuming. While Hadoop is great at ingesting and storing data, getting timely insight out of the data can be difficult, which reduces effectiveness and time-to-action. Using NVIDIA GPUs to accelerate analytics on Hadoop is an optimal solution that drives high price-to-performance benefits. In this session, we'll demonstrate a solution using NVIDIA GPUs for the analysis of big data in Hadoop. The demo will show how you can leverage the Hadoop file system, its MapReduce architecture, and GPUs to run computationally intense models, bringing together both data and computational parallelism. Methods demonstrated will include classification techniques such as decision trees, logistic regression, and support vector machines, and clustering techniques such as k-means, fuzzy k-means, and hierarchical k-means on marketing, social, and digital media data.
 
Keywords:
Big Data Analytics, Bioinformatics & Genomics, Finance, GTC 2014 - ID S4471
Streaming:
 
Recursive Interaction Probability: A New Paradigm in Parallel Data Processing
Richard Heyns (brytlyt)
This session will describe Recursive Interaction Probability (RIP) and why it is a pretty cool algorithm. Time will be spent on benchmark analysis against other algorithms as well as performance within an operational database. The presentation will end with how RIP was implemented on an NVIDIA Kepler K20c, the design choices made, and how these affect performance. Use cases that play to the strengths of RIP, as well as use cases that reveal its weaknesses, will also be shared.
 
Keywords:
Big Data Analytics, Numerical Algorithms & Libraries, Clusters & GPU Management, GTC 2014 - ID S4483
Streaming:
 
Indexing Documents on GPU - Can You Index Web in Real Time?
Michael Frumkin (NVIDIA)
An index of web documents provides a base for search and decision making. Traditionally, GPUs are used to run applications with a lot of parallelism and a small degree of divergence. We show that GPUs are also able to outperform CPUs for an application that has a large degree of parallelism but medium divergence. Specifically, we concentrate on the text processing used to index web documents. We present indexing algorithms for both GPU and CPU and show that the GPU outperforms the CPU on two common workloads. We argue that a medium-sized GPU-enabled cluster would be able to index all internet documents in one day. Indexing web documents on the GPU opens a new area for GPU computing: companies that provide search services spend many cycles on indexing, and faster, more energy-efficient indexing on the GPU may provide a valuable alternative to the CPU-only clusters used today.
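For readers unfamiliar with the workload, a toy inverted index with AND queries might look like this; the tokenization and API are simplified assumptions, not the talk's implementation:

```python
import re
from collections import defaultdict

def build_index(docs):
    """Build an inverted index mapping term -> sorted list of doc ids.

    `docs` maps a document id to its text. Tokenization here is a trivial
    lowercase split; a production indexer adds stemming, term positions,
    and posting-list compression -- the text processing the talk accelerates.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

def search(index, *terms):
    """AND query: documents containing every term (posting-list intersection)."""
    postings = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []
```

Each document can be tokenized independently, which is the large degree of parallelism the abstract refers to; the medium divergence comes from variable document and term lengths.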
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2014 - ID S4506
Streaming:
Download:
 
Evaluation of Parallel Hashing Techniques
Rajesh Bordawekar (IBM T. J. Watson Research Center)
This presentation will cover techniques for implementing hashing functions on the GPU. We will describe various parallel implementations of hashing techniques, e.g., cuckoo hashing, partitioned hashing, Bin-Hash, and Bloom filters, and then present different ways of implementing these functions on the GPU, with emphasis on data structures that exploit the GPU's data-parallel features as well as its memory constraints.
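As one example from the list above, a two-table cuckoo hash can be sketched as follows; the table size, hash choice, and give-up policy are simplifications of what a GPU implementation would use:

```python
class CuckooHash:
    """Two-table cuckoo hashing: each key has exactly two candidate slots,
    so a lookup probes at most two locations -- a bounded, branch-light
    access pattern that maps well onto data-parallel hardware.
    """
    def __init__(self, size=64):
        self.size = size
        self.t = [[None] * size, [None] * size]

    def _slot(self, which, key):
        # Two hash functions derived from one, for illustration only.
        return hash((which, key)) % self.size

    def insert(self, key, value, max_kicks=32):
        item, which = (key, value), 0
        for _ in range(max_kicks):
            i = self._slot(which, item[0])
            occupant = self.t[which][i]
            if occupant is None or occupant[0] == item[0]:
                self.t[which][i] = item
                return True
            # Evict the resident item and push it toward its other table.
            self.t[which][i], item = item, occupant
            which ^= 1
        return False    # probable cycle: a real implementation would rehash

    def get(self, key):
        for which in (0, 1):
            e = self.t[which][self._slot(which, key)]
            if e is not None and e[0] == key:
                return e[1]
        return None
```

The constant worst-case probe count per lookup is what makes cuckoo-style schemes attractive on SIMT hardware, where divergent probe chains would stall a whole warp.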
 
Keywords:
Big Data Analytics, Developer - Programming Languages, GTC 2014 - ID S4507
Streaming:
Download:
 
A High-Speed 2-Opt TSP Solver for Large Problem Sizes
Martin Burtscher (Texas State University)
Learn how to process large program inputs at shared-memory speeds using the example of a 2-opt TSP solver. Our implementation employs interesting code optimizations such as biasing results to avoid computation, inverting loops to enable coalescing and tiling, introducing non-determinism to avoid synchronization, and parallelizing within each operation rather than across operations to minimize thread divergence and drastically lower the latency of result production. The final code evaluates 68.8 billion moves per second on a single Titan GPU.
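The core 2-opt move being evaluated can be sketched serially; each candidate move is scored from only four edge lengths, which is what makes evaluating billions of moves per second in parallel feasible (this sketch is not the authors' optimized code):

```python
import math

def tour_length(pts, tour):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(pts, tour):
    """Apply improving 2-opt moves until none remains.

    A 2-opt move removes edges (i,i+1) and (j,j+1) and reconnects the tour
    by reversing the segment between them; only four edge lengths change,
    so each candidate move is scored in O(1).
    """
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # Skip j = n-1 when i = 0: those edges share the vertex tour[0].
            for j in range(i + 2, n - (1 if i == 0 else 0)):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                         - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
                if delta < -1e-12:          # negative delta shortens the tour
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

The GPU version parallelizes the scoring of all (i, j) pairs within one sweep rather than running sweeps concurrently, which is the "parallelize within each operation" point in the abstract.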
 
Keywords:
Big Data Analytics, Developer - Programming Languages, Supercomputing & HPC, GTC 2014 - ID S4534
Streaming:
Download:
 
Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL
Jeremy Meredith (Oak Ridge National Laboratory)
Learn about the data-parallel programming model in EAVL and how it can be used to write efficient mesh-based algorithms for multi-core and many-core devices. EAVL, the Extreme-scale Analysis and Visualization Library, contains a flexible scientific data model and targets future high performance computing ecosystems. This talk shows how a productive programming API built upon an efficient data model can help algorithm developers achieve high performance with little code. Discussions will include examples and lessons learned.
 
Keywords:
Big Data Analytics, Scientific Visualization, GTC 2014 - ID S4553
Streaming:
Download:
 
Middleware Framework Approach for BigData Analytics Using GPGPU
Ettikan Kandasamy Karuppiah (MIMOS Bhd)
Current applications of GPU processors to parallel computing tasks show excellent results in terms of speed-ups compared to CPU processors. However, no existing middleware framework enables automatic distribution of data and processing across heterogeneous computing resources for structured and unstructured BigData applications. Thus, we propose a middleware framework for 'Big Data' analytics that provides mechanisms for automatic data segmentation, distribution, execution, and information retrieval across multiple cards (CPU and GPU) and machines, a modular design for easy addition of new GPU kernels at both the analytic and processing layers, and information presentation. We show the architecture and components of the framework, such as multi-card data distribution and execution, data structures for efficient memory access, and algorithms for parallel GPU computation, along with results for various test configurations. Our results show that the proposed middleware framework provides an alternative, cheaper HPC solution to users.
 
Keywords:
Big Data Analytics, Finance, Video & Image Processing, GTC 2014 - ID S4583
Streaming:
Download:
 
Extending Python for High-Performance Data-Parallel Programming
Siu Kwan Lam (Continuum Analytics, Inc)
Our objective is to design a high-level data-parallel language extension to Python on GPUs. This language extension cooperates with the CPython implementation and uses Python syntax for describing data-parallel computations. The combination of rich library support and language simplicity makes Python ideal for subject matter experts to rapidly develop powerful applications. Python enables fast turnaround time and the flexibility for custom analytic pipelines to react to immediate demands. However, CPython has been criticized as slow, and the existence of the global interpreter lock (GIL) makes it difficult to take advantage of parallel hardware. To solve this problem, Continuum Analytics has developed LLVM-based JIT compilers for CPython: Numba is the open-source JIT compiler, and NumbaPro is the proprietary compiler that adds CUDA GPU support. We aim to extend and improve the current GPU support in NumbaPro to further increase the scalability and portability of Python-based GPU programming.
 
Keywords:
Big Data Analytics, Large Scale Data Analytics, Defense, Developer - Programming Languages, GTC 2014 - ID S4608
Streaming:
Download:
 
High-Performance Graph Primitives on GPU: Design and Implementation of Gunrock
Yangzihao Wang (UC Davis)
Gunrock is a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. The talk will share experience on how to design the framework and APIs for computing efficient graph primitives on GPUs. We will focus on two aspects: (1) details of the implementations of several graph algorithms on GPUs, and (2) how to abstract these graph algorithms using general operators and functors on GPUs to improve programmer productivity.
 
Keywords:
Big Data Analytics, Large Scale Data Analytics, Defense, GTC 2014 - ID S4609
Streaming:
Download:
 
Speeding Up GraphLab Using CUDA
Vishal Vaidyanathan (Royal Caliber)
We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible: with the latest Tesla cards, over 48GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.
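The GAS decomposition can be illustrated with a serial PageRank sketch; dangling-vertex handling and convergence tests are omitted, and a GPU framework would run each phase as a data-parallel kernel over edges or vertices:

```python
def gas_pagerank(edges, n, iters=20, d=0.85):
    """PageRank expressed in Gather-Apply-Scatter form.

    gather:  each vertex sums rank/out-degree contributions from in-neighbors;
    apply:   each vertex updates its rank from the gathered sum;
    scatter: the updated state is made visible along out-edges (implicit
             here, since the next gather simply re-reads `rank`).
    """
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Gather phase: edge-parallel accumulation into each destination.
        acc = [0.0] * n
        for src, dst in edges:
            acc[dst] += rank[src] / out_deg[src]
        # Apply phase: vertex-parallel rank update.
        rank = [(1 - d) / n + d * a for a in acc]
    return rank
```

Expressing the algorithm as these three bulk phases is what lets one framework run many different graph algorithms: only the gather, apply, and scatter functions change.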
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, Large Scale Data Analytics, Defense, GTC 2014 - ID S4611
Streaming:
 
Speeding Up GraphLab Using CUDA
Vishal Vaidyanathan (Royal Caliber)
We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible: with the latest Tesla cards, over 48GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, GTC 2014 - ID S4612
Streaming:
 
A High Level API for Fast Development of High Performance Graph Analytics on GPUs
Zhisong Fu (SYSTAP)
The goal of this session is to demonstrate how our high-level abstraction enables developers to quickly develop high-performance graph analytics programs on GPUs, with up to 3 billion edges traversed per second on a Tesla or Kepler GPU. High-performance graph analytics is critical for a large range of application domains. The SIMT architecture of GPUs and the irregular nature of graphs make it difficult to develop efficient graph analytics programs. In this session, we present an open source library that provides a high-level abstraction for efficient graph analytics with minimal coding effort. We use several specific examples to show how to use our abstraction to implement efficient graph analytics in a matter of hours.
 
Keywords:
Big Data Analytics, Large Scale Data Analytics, Defense, GTC 2014 - ID S4617
Streaming:
Download:
 
Getting Big Data Done On a GPU-Based Database
Ori Netzer (SQream Technologies)
We will provide an in-depth analysis of our in-production, GPU-based technology for Big Data analytics, highlighting how our database benefits telecom companies. We will explain the key features of our technology: our database provides close to real-time analytics and delivers up to 100x faster insights, all in a very cost-effective manner. We will elaborate on these features and more in order to provide a clear understanding of how our technology works and why it is beneficial for telecom companies.
 
Keywords:
Big Data Analytics, GTC 2014 - ID S4644
Streaming:
Download:
 
Parallel Decomposition Strategies in Modern GPU
Sean Baxter (NVIDIA)
Learn strategies to decompose algorithms into parallel and sequential phases. These strategies make algorithmic intent clear while enabling performance portability across device generations. Examples include scan, merge, sort, and join.
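The scan example illustrates the decomposition: two independent (parallel) phases bridged by a small scan of block totals. A Python sketch with an illustrative block size, not the presenter's CUDA code:

```python
def blocked_exclusive_scan(data, block=4):
    """Exclusive prefix sum decomposed the way GPU implementations are:

    1. each block scans its chunk independently (parallel phase);
    2. the per-block totals are themselves scanned (small sequential step);
    3. each block uniformly adds its offset (parallel phase).
    """
    chunks = [data[i:i + block] for i in range(0, len(data), block)]
    # Phase 1: local exclusive scan of every chunk (independent work).
    local, totals = [], []
    for c in chunks:
        s, run = [], 0
        for x in c:
            s.append(run)
            run += x
        local.append(s)
        totals.append(run)
    # Phase 2: scan the per-block totals to get each block's global offset.
    offsets, run = [], 0
    for t in totals:
        offsets.append(run)
        run += t
    # Phase 3: uniform add of the block offset (independent work again).
    return [v + off for s, off in zip(local, offsets) for v in s]
```

On a GPU, phases 1 and 3 each map to one thread block per chunk; the same split-scan-fixup pattern underlies merge, sort, and join decompositions as well.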
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, GTC 2014 - ID S4674
Streaming:
 
Extreme Machine Learning with GPUs
John Canny (UC Berkeley)
BIDMach is an open-source library for GPU-accelerated machine learning. BIDMach on a single GPU node exceeds the performance of all other tools (including cluster systems on hundreds of nodes) for the most common machine learning tasks. BIDMach is an easy-to-use, interactive environment similar to SciPy/Matlab, but with qualitatively higher performance. The session will discuss: Performance: BIDMach follows a "LAPACK" philosophy of building high-level algorithms on fast low-level routines (like BLAS), exploiting the unique hardware features of GPUs to provide more than order-of-magnitude gains over alternatives. Accuracy: Monte Carlo methods (MCMC) are the most general way to derive models, but are slow; we have developed a new approach to MCMC that provides two orders of magnitude speedup beyond the hardware gains, and our "cooled" MCMC is fast and improves model accuracy. Interactivity: We are developing interactive modeling and visualization capabilities in BIDMach to allow analysts to guide, correct, and improve models in real time.
 
Keywords:
Big Data Analytics, Bioinformatics & Genomics, Machine Learning & Deep Learning, Scientific Visualization, GTC 2014 - ID S4811
Streaming:
Download:
 
First Glimpse into the OpenPOWER Software Stack with Big Data Workload Example (Presented by IBM)
Keith Campbell (IBM), Ken Rozendal (IBM)
The OpenPOWER Foundation (http://www.open-power.org/) is an open alliance of companies working together to expand the hardware and software ecosystem based on the POWER architecture. This collaboration across hardware and software vendors enables unique innovation across the full hardware and software stack. OpenPOWER ecosystem partners and developers now have more choice, control and flexibility to optimize at any level of the technology from the processor on up for next-generation, hyperscale and cloud datacenters. Integrating support for NVIDIA GPUs on the POWER platform enables high performance enterprise and technical computing applications such as Big Data and analytics workloads. This presentation will cover the software stack and developer tools for OpenPOWER, the planned support for CUDA, and a proof of concept showing GPU acceleration. This proof of concept will be available as a demo in the IBM booth.
 
Keywords:
Big Data Analytics, Debugging Tools & Techniques, Developer - Programming Languages, GTC 2014 - ID S4882
Streaming:
Download:
 
Dynamic GPU Graph Analytics
Adam McLaughlin (Georgia Institute of Technology)
Graphs that model social networks, numerical simulations, and the structure of the internet are enormous and continuously changing with time. Contemporary software packages neglect temporal variations in these networks and can only analyze them statically. This poster presents an optimized GPU implementation of dynamic betweenness centrality, a popular analytic with applications in power grid analysis, the study of protein interactions, and community detection. By avoiding unnecessary accesses to memory, we achieve up to a 110x speedup over a CPU implementation of the algorithm and can update the analytic 45x faster on average than a static recomputation on the GPU.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4171
Download:
 
Kinetic Parameter Estimation in Metabolic Networks With GPGPU
Ali Khodayari (PSU)
In this study, the recently introduced Ensemble Modeling (EM) approach was used to construct a kinetic model of E. coli metabolism. We put forth a metabolic model composed of 34 reactions and 22 metabolites representing E. coli's core metabolism. We developed a Newton-Raphson-based estimation approach to identify the kinetic parameters of a given metabolic network. The solver is designed and implemented in CUDA in order to accelerate the overall process. The application initially parses a large set of equations using the Boost::Spirit C++ framework, finds an analytic Jacobian J, and then iteratively refines the 'best' solution by solving J.delta = -f for the update delta using GMRES from CUSP. The successive updates of the parameter set, the Jacobian matrix, and the function evaluations, as well as the system solver, are all implemented on the GPU.
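As a CPU-side illustration of the J.delta = -f update described above, here is a minimal pure-Python Newton-Raphson sketch. The two-equation residual system is hypothetical (a stand-in for the poster's 34-reaction model), and a dense Cramer's-rule solve stands in for CUSP's GMRES:

```python
def f(x, y):
    # Hypothetical steady-state residuals of a toy two-variable system
    # (illustrative only; not the poster's E. coli model).
    return (x * x + y - 3.0, x + y * y - 5.0)

def jacobian(x, y):
    # Analytic Jacobian of f, derived symbolically as in the poster.
    return ((2.0 * x, 1.0), (1.0, 2.0 * y))

def newton(x, y, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        if abs(f1) + abs(f2) < tol:
            break
        (a, b), (c, d) = jacobian(x, y)
        det = a * d - b * c
        # Solve J . delta = -f; the poster uses GMRES from CUSP instead,
        # but Cramer's rule suffices for this 2x2 sketch.
        dx = (-f1 * d - b * -f2) / det
        dy = (a * -f2 - -f1 * c) / det
        x, y = x + dx, y + dy
    return x, y

x_star, y_star = newton(1.0, 1.0)
```

On the GPU, each ensemble member's parameter update runs this same loop with the Jacobian assembly and linear solve parallelized.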
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4258
Download:
 
Fast Vertical Data Classification Using GPUs
Arjun G. Roy (NDSU)
Massive amounts of data are being generated in recent times. Current classification methods are quite accurate but extremely slow on big data. We propose a two-pronged approach: a) treat data vertically instead of the conventional horizontal treatment and use our vertical-data-specific classification algorithm, and b) exploit the GPU's fast mathematical computation to process vertical data quickly, which benefits significantly from our data structure called the P-Tree. Our classification algorithm is O(k), where k is the number of attributes, and achieves high accuracy.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4263
Download:
 
GPU Accelerated PrefixSpan Algorithm for Sequential Pattern Mining
Benuraj Sharma (Sri Sathya Sai Institute Of Higher Learning)
This poster describes a CUDA version of the PrefixSpan algorithm, implemented on NVIDIA Kepler GPUs, that extracts the inherent task parallelism and leverages the dynamic parallelism feature to implement recursion. The results show that the GPU-accelerated PrefixSpan, CUDAPrefixSpan, achieves a speedup of ~5x on sequence databases of varying sizes.
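For reference, the recursive projection step that CUDAPrefixSpan maps onto dynamic parallelism can be sketched sequentially in Python. This is a minimal single-item-per-element variant for illustration, not the poster's CUDA code:

```python
def prefixspan(db, min_support):
    """Return frequent sequential patterns as {pattern tuple: support}."""
    results = {}

    def mine(prefix, projected):
        # Count each item's support in the projected postfixes.
        counts = {}
        for postfix in projected:
            for item in set(postfix):
                counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project the database on the extended prefix -- the recursion
            # that the poster maps onto CUDA dynamic parallelism.
            new_projected = [postfix[postfix.index(item) + 1:]
                             for postfix in projected if item in postfix]
            mine(pattern, new_projected)

    mine((), db)
    return results

patterns = prefixspan([["a", "b", "c"], ["a", "c"], ["b", "c"]], min_support=2)
```

Each recursive call is independent, which is what makes a kernel-per-projection GPU mapping natural.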
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4187
Download:
 
Red Fox: An Execution Environment for Relational Query Processing on GPUs
Haicheng Wu (Georgia Institute of Technology)
This poster presents the Red Fox system, sponsored by the NVIDIA Graduate Fellowship program. It introduces the compilation flow and performance results of executing relational queries such as TPC-H on GPUs.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4182
Download:
 
GPU Accelerated Histogram Based Analytics Engine
Jack Gerrity (Center For Advanced Public Safety, Computer Science Department at The University of Alabama)
In this poster, we show a column-based approach that uses multiple GPUs to quickly produce ad-hoc histograms from previously compiled data. We then compare this approach's histogram-building speed to Apache's Lucene-based Solr.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4230
Download:
 
Galactica - Accelerated Queries Processing
Keh Kok Yong (MIMOS Berhad)
Information is one of the most influential forces transforming the growth of business. Companies churn out a burgeoning volume of transactional data, capturing and matching trillions of bytes of information, causing data to grow exponentially. Our work progressively researches mechanisms for accelerating SQL query operations using the GPU. The proposed system is able to process volumes of data exceeding the total size of GPU RAM. It performs fundamental SQL operations such as select, like, order by, join, sum, min, and others, and it works with PostgreSQL and MySQL.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4149
Download:
 
I/O Acceleration With GPU for I/O-bound Applications
Kento Sato (Tokyo Institute of Technology)
Many recent supercomputers have GPUs on each compute node to accelerate computation. However, not all applications can be accelerated by GPUs. For example, the performance of I/O-bound applications is limited by the performance of the underlying I/O devices. Such I/O-bound applications require I/O bandwidth rather than computational power, so when we execute them, the GPUs sit idle and the resources are wasted. To accelerate I/O-bound applications, we developed a GPU-accelerated I/O interface (gmfs). Our experimental results show that gmfs accelerates sequential reads and writes, utilizing 82% of PCIe Gen2 peak bandwidth and 50% of PCIe Gen3 peak bandwidth.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4185
Download:
 
Preliminary I/O Performance Evaluation on GPU Accelerator and External Memory
Koichi Shirahata (Tokyo Institute of Technology)
Recent supercomputers deploy not only many-core accelerators such as GPUs but also non-volatile memory (NVM), such as flash memory, as external memory in order to handle large-scale data processing for a wide range of applications. However, it is not yet clear how to construct NVM-based local disks with large capacity at low cost for heterogeneous supercomputers. In order to clarify the I/O characteristics between GPU and NVM, we comparatively investigate I/O strategies on a GPU and multiple mini-SATA SSDs. Our preliminary results exhibit 3.06 GB/s of throughput from 8 mini-SATA SSDs to the GPU using RAID0 with an appropriate stripe size.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4251
Download:
 
Acceleration of K-Means and K-Means++ Using CUDA
Marek Fiser (Purdue University)
This research project focuses on a GPU implementation of the commonly used clustering algorithm K-means. Our implementation minimizes the overhead caused by copying data between CPU and GPU. We were able to implement the entire algorithm on the GPU, which greatly improved performance over the CPU, reaching up to a 15x speedup. Our work also analyzes an improved version of the algorithm called K-means++, which builds on the original K-means with a more careful initialization that leads to better clustering. We adapted K-means++ to run on the GPU as well, which led to a 9x speedup.
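The "more careful initialization" of K-means++ can be sketched in a few lines of Python: each new center is drawn with probability proportional to its squared distance from the nearest already-chosen center. This is a sequential 1-D illustration, not the poster's GPU code:

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """K-means++ seeding over 1-D points (illustrative)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance of each point to its nearest chosen center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        # Sample the next center with probability proportional to d2.
        r = rng.uniform(0, sum(d2))
        cumulative = 0.0
        for p, w in zip(points, d2):
            cumulative += w
            if cumulative >= r:
                centers.append(p)
                break
    return centers

centers = kmeans_pp_init([0.0, 0.1, 0.2, 10.0, 10.1, 20.0], 3)
```

On the GPU, the distance computation (the inner `d2` loop) is the data-parallel part; the weighted sampling reduces to a prefix sum.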
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4264
Download:
 
Space and Speed Advantage of pTree for Big Data Processing
Mohammad Hossain (NDSU CS Department)
This poster shows the space compression and speed gains achieved when processing 'Big Data' using pTrees, vertical bit slices of the columns of a data set. Our experiments show a 92% speed gain over traditional processing on data sets on the order of a billion records.
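A toy Python sketch of the vertical bit-slice idea: slice s packs bit s of every value in a column, so a predicate on one bit becomes a single popcount. This is illustrative only; the actual pTree structure is more elaborate and adds compression:

```python
def to_ptree_slices(column, bits=8):
    """Vertical decomposition: slice s holds, packed into one int,
    bit s of every value in the column (row r -> bit r of the slice)."""
    slices = []
    for s in range(bits):
        packed = 0
        for r, v in enumerate(column):
            packed |= ((v >> s) & 1) << r
        slices.append(packed)
    return slices

def count_bit_set(slices, s):
    """How many rows have bit s set: one popcount on a single slice."""
    return bin(slices[s]).count("1")

slices = to_ptree_slices([5, 3, 7], bits=3)
```

For the column [5, 3, 7] the three slices are 0b111, 0b110, 0b101; counting rows with bit 2 set touches only the third slice, never the raw rows.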
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4268
Download:
 
Towards A Hash Based GroupBy/Aggregate Algorithm for Fast Query Processing on GPU
Sina Meraji (IBM Canada Ltd.)
Column-store in-memory databases have received a lot of attention because of their fast query-processing response times on modern multi-core machines. As part of our research, we are developing a high-performance GPU library for costly database operations. Our work leverages the latest NVIDIA GPU features (i.e., Unified Virtual Addressing and multi-streaming) and various host-side partitioning algorithms to run database operations on large tables. The focus of this article is the prototype for GroupBy/Aggregate operations that we created to exploit GPUs. The algorithm has two main steps. In the first step, we create a hash table by doing coalesced reads from the table on which we run the GroupBy/Aggregate query; the aggregation operations occur while the hash table is being built. After creating the hash table, we only need a probe phase to retrieve the results from it. Our results indicate that by using GPU shared memory we can get a 28x speedup over the CPU implementation.
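The two-step scheme (build the hash table while folding in the aggregates, then probe it for results) can be sketched sequentially in Python. This is a CPU stand-in for the shared-memory CUDA kernel the abstract describes:

```python
def groupby_aggregate(keys, values):
    """Single-pass hash aggregation: each value is folded into its
    group's running aggregates at insertion time, as in the abstract."""
    table = {}
    for key, value in zip(keys, values):
        if key not in table:
            table[key] = {"count": 0, "sum": 0}
        entry = table[key]          # aggregation happens during the build
        entry["count"] += 1
        entry["sum"] += value
    # Probe phase: read the finished aggregates back out of the table.
    return {key: (e["count"], e["sum"]) for key, e in table.items()}

groups = groupby_aggregate(["a", "b", "a", "b", "a"], [1, 2, 3, 4, 5])
```

On the GPU the insert-and-aggregate step uses atomics on a shared-memory hash table; the single-threaded dict above only mirrors the logic.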
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4211
Download:
 
Processing Data Streams With Hard Real-Time Constraints on CPU/GPU Systems
Uri Verner (Technion)
Growing rates of collected data present a challenge when it comes to scalable solutions for data transmission and processing. Even more challenging is the problem of real-time stream processing. In such applications, the system needs to react to the incoming data within given time bounds. This poster presents the challenges in processing multiple real-time data streams on CPU/GPU systems, and the results of our efforts for dealing with these challenges. The work addresses various issues related to single- and multi-GPU systems, including resource sharing in computation and communication under real-time constraints.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4141
Download:
 
Real-Time GPU Computation of Ballistic Thermal Signatures
Glenn Parker (Georgia Tech Research Institute)
GTRI has implemented and tested a CUDA upgrade to a ground-to-air Hardware-in-the-Loop missile simulator. By breaking a single-threaded thermal integrator loop into multiple independent kernels, a speedup of 20X is achieved for complex targets. This speed increase reduces computation time from days to hours, and preliminary results show that multiple GPUs may allow additional speedup by removing stream concurrency limits.
 
Keywords:
Big Data Analytics, Computational Physics, GTC 2015 - ID P5136
Download:
 
Parallel Map Projection of Vector-based Big Spatial Data
Wenpeng Feng (University of North Carolina at Charlotte)
Because of the large data volumes and the complexity of map-projection algorithms, transforming geo-referenced data among various projections poses a significant computational challenge, especially for big spatial data. To overcome this challenge, we present a cloud-based parallel computing framework for accelerating the map projection of vector-based big spatial data. GPU-enabled parallel map-projection algorithms were developed on the CUDA platform for our framework.
 
Keywords:
Big Data Analytics, Supercomputing & HPC, GTC 2015 - ID P5161
Download:
 
Large-Scale Pattern Recognition Using GPU-Accelerated Relational Database
Matthew England (University of Missouri, Columbia)
We are leveraging the abilities of relational databases for scalable storage and retrieval, and massively parallelized computation on the GPU, to perform large-scale pattern recognition tasks. We have successfully integrated these two technologies, providing the database with the means to do high-performance computation on massive stored datasets. Internalizing this capability within the database facilitates blending advanced relational and spatial operations into pattern-matching tasks, which is applicable in a variety of fields.
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5233
Download:
 
Accelerating Topological Data Analysis Using GPUs
Ryan Hsu (Ayasdi)
Topology provides a mathematical framework for applying a complete range of statistical, geometric, and machine learning methods, revealing insights from the geometry of your data. Ayasdi utilizes topological data analysis (TDA) in its advanced analytics software to simplify the analysis of complex, multivariate datasets. In this poster, we illustrate how GPGPUs can be leveraged to accelerate key operations in TDA by over 14x.
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID P5239
Download:
 
GPU Based Data Analysis on the example of Time-of-flight Spectroscopy
Gregor Hartmann (DESY)
Free electron lasers enable the study of non-linear multi-photon processes within one single FEL shot, which is less than 50 fs long and has a repetition rate of 120 Hz at LCLS. As a result, a huge amount of data is created in a very short acquisition time. Analyzing this data at the single-shot level needs a lot of computing power but can be massively parallelized. In order to decrease the evaluation time, we created GPU-based evaluation software for our electron time-of-flight spectrometer setup.
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5276
Download:
 
Massively Parallel Geo-Spatial Coordinates Computation with GalacticaDB
Keh Kok Yong (MIMOS Berhad)
With GPS enabled, people and things have become mobile sensors. These self-quantified technologies generate humongous amounts of raw data, which can become valuable information for users and businesses. We present GalacticaDB, a massively parallel SQL-like engine with extended geo-spatial capabilities. It accelerates analytic computation by optimizing query processing and exploiting NVIDIA Tesla GPUs. Our results indicate that the GPU is an effective and energy-efficient co-processor for executing database query operations.
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5277
Download:
 
Fuzzy String Matching of Vehicle Identification Numbers in a Highly Parallel Environment
Mason Saucier (Center For Advanced Public Safety, Computer Science Department at The University of Alabama)
This poster analyzes a GPU implementation of the Levenshtein distance function for fuzzy matching of a Vehicle Identification Number (VIN) against a dataset of known VINs. Our solution gives a 13x-15x speedup over similar CPU solutions. Our work aims to help correct human error that occurs during data entry and return meaningful information to the user, which they can then use to inform their decisions.
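The Levenshtein distance at the core of this matcher is the classic dynamic program over edit operations. A minimal sequential Python version (not the poster's GPU kernel, which evaluates many candidate VINs in parallel):

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))           # distance from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,      # deletion
                            curr[j - 1] + 1,  # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]
```

A fuzzy VIN match then reduces to computing this distance against every known VIN and returning the entries below a small threshold; that embarrassingly parallel outer loop is what the GPU accelerates.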
 
Keywords:
Big Data Analytics, GTC 2015 - ID P5305
Download:
 
Fighting Malware with GPUs in Real Time
Libor Morkovsky (Avast s.r.o)
Today's malware ecosystem produces hundreds of thousands of distinct samples per day. To leverage similarities between the samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are directly used to generate rules that identify similar samples on our customers' machines.
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID P5313
Download:
 
GPU Accelerated Multi-predicate Join Algorithms for Listing Cliques in Graphs
Haicheng Wu (Georgia Institute of Technology)
This poster introduces how to run general join algorithms on the GPU to solve an important graph problem: clique listing. In particular, two different join algorithms are presented for the GPU. The first is an implementation of Leapfrog-Triejoin (LFTJ), a recently presented worst-case optimal multi-predicate join algorithm. The second is a novel approach, inspired by the first but more suitable for GPU architectures. The performance benchmarks show that both approaches are efficient on GPUs.
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID P5319
Download:
 
Gunrock: A High-Performance Graph Processing Library on the GPU
Yangzihao Wang (University of California, Davis)
For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system, uses a high-level bulk-synchronous abstraction with traversal and computation steps, designed specifically for the GPU. It is a framework that is general, straightforward to program, and fast (on par with hardwired primitives and faster than any other programmable GPU library).
 
Keywords:
Big Data Analytics, Developer - Tools & Libraries, GTC 2015 - ID P5326
Download:
 
From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA
Paul Richmond (University of Sheffield)
Complex systems are prevalent throughout various levels of biology from the molecular and cellular scales through to populations of interacting individuals. This talk discusses how formal state based representation of agents within a complex system can be simulated and visualized at large scales using the open source FLAME GPU framework. Methods of code generation from XML documents and use of CUDA streams for heterogeneous state execution are presented. Examples include cellular tissue modelling and large scale crowd dynamics.
 
Keywords:
Big Data Analytics, Developer - Tools & Libraries, Life & Material Science, GTC 2015 - ID S5133
Streaming:
Download:
 
Coordinating More Than 3 Million CUDA Threads for Social Network Analysis
Adam McLaughlin (Georgia Institute of Technology)
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near linear speedup and performance exceeding tens of GTEPS when running BC on 192 GPUs.
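For readers unfamiliar with the analytic itself, betweenness centrality is usually computed with Brandes' algorithm: one BFS per source followed by a dependency accumulation in reverse BFS order. A sequential Python sketch for unweighted graphs (raw per-direction counts; the GPU work parallelizes the frontier expansion and runs many sources concurrently):

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm on an unweighted adjacency-list graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}     # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:                     # BFS phase
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # dependency accumulation
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

bc = betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]})
```

For undirected graphs each pair is counted from both endpoints, so scores are conventionally halved.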
 
Keywords:
Big Data Analytics, Developer - Algorithms, Supercomputing & HPC, GTC 2015 - ID S5156
Streaming:
Download:
 
Fast Triangle Counting for Social Network Analytics on the K40
Oded Green (ArrayFire)
In this session we will explore a new approach for counting triangles in networks that partitions the work at multiple parallel granularities. This new approach is highly scalable and is appropriate for both sparse and dense networks.
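A common baseline that such partitioned approaches build on is per-edge neighbor-list intersection, counting each triangle once via an ordered-triple rule. A minimal Python sketch (illustrative, not the session's algorithm):

```python
def count_triangles(adj):
    """Count triangles in an undirected adjacency-list graph, accepting
    only ordered triples u < v < w so each triangle is counted once."""
    count = 0
    for u, neighbors in adj.items():
        for v in neighbors:
            if v <= u:
                continue
            # Intersect N(u) and N(v), keeping only closing vertices w > v.
            common = set(adj[u]) & set(adj[v])
            count += sum(1 for w in common if w > v)
    return count

# Complete graph K4: every triple of its 4 vertices is a triangle.
k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
triangles = count_triangles(k4)
```

The multi-granularity idea in the session assigns these per-edge intersections to threads, warps, or blocks depending on neighbor-list length.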
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5176
Streaming:
Download:
 
Big Data on a Budget: Cost Efficient Large-Scale Graph Analytics
Joe Schneible, Ph.D. (Technica Corporation)
The attendee will take away an appreciation for the nuances involved in performing large-scale graph analytics on a budget. The discussion will center on utilizing graphics processing hardware in a limited-memory environment, including insights into data storage structures for I/O-efficient processing as well as the application of the massive parallelism of the GPU to real-world graph data.
 
Keywords:
Big Data Analytics, Developer - Algorithms, Machine Learning & Deep Learning, GTC 2015 - ID S5200
Streaming:
Download:
 
High Performance Indexing of Large Data Sets Using GPU
Massimo Bernaschi (National Research Council of Italy)
Learn how to use multi-GPU systems and CUDA to speed up text analysis, indexing, and searching of textual data. We present a new framework to index large data sets of heterogeneous data. Our approach is based on a combination of HPC techniques aimed at improving the efficiency and reliability of the indexing process. The solution we propose is scalable and exploits in-memory computing to minimize I/O operations and enhance performance. Moreover, we describe the CUDA-based parallelization of the most compute-intensive tasks involved in the indexing process. The integration of the CUDA components within an architecture that is mostly Java-based led us to develop a technique for Java-CUDA interoperability that can be applied to other applications. Some visualization results will also be presented.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5212
Streaming:
Download:
 
Maximize the Performance of your Cluster: Marrying GPUs and Dataflow Graph Processing
Nam-Luc Tran (EURA NOVA)
Get the best out of your processing cluster by equipping nodes with a GPU. Many distributed processing models have emerged in recent years, driven by the need to scale out applications and by the affordability of clusters running on commodity hardware. Among these, the dataflow graph processing model is the most general, representing jobs as distributed operators (nodes) connected by data channels (edges). In this talk, we explain how we have extended an existing dataflow graph processing framework to fully take GPU resources in the cluster into account. We show how this paradigm fully exploits the batch and streaming features of the GPU in a distributed job. Finally, we present our model for scheduling on this heterogeneous processing framework.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Machine Learning & Deep Learning, GTC 2015 - ID S5215
Streaming:
Download:
 
Unleashing The Power Of GPUs Over The Web
Vishal Vaidyanathan (Royal Caliber)
GPUs have demonstrated regime-changing performance in a wide variety of applications. But there remain many engineering challenges to the adoption of GPUs in the mainstream, especially when operating at scale. We present a new infrastructure that provides a suite of GPU-driven machine learning and graph algorithms as a web service. The effortless usability of an HTTP API unlocks the power of GPU computing with none of the attendant complexities. As examples, we will show interactive analytics on web-scale graphs and deep learning on large data sets using nothing more than a modern web browser.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Machine Learning & Deep Learning, GTC 2015 - ID S5224
Streaming:
Download:
 
Towards Fast SQL Query Processing in DB2-BLU Using GPUs
Sina Meraji (IBM)
Column-store in-memory databases have received a lot of attention because of their fast query-processing response times on modern multi-core machines; IBM DB2-BLU is an example of such a database. In order to improve the performance of query processing in such databases, GPUs can be used as fast, high-bandwidth co-processors. As part of our work, we integrate NVIDIA GPUs into DB2-BLU by changing the infrastructure of DB2-BLU and developing GPU kernels. We have a hybrid design in which we use some of DB2-BLU's features on IBM's POWER8 processor together with NVIDIA's GPU accelerator technology for fast query processing. This work was done in collaboration with Peter Kokosielis.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5229
Streaming:
Download:
 
PG-Strom: Query Acceleration Engine of PostgreSQL Powered by GPGPU
Kohei KaiGai (NEC)
This session will introduce how we integrated GPU acceleration into the PostgreSQL database while keeping 100% compatibility with the application landscape. The RDBMS is a long-standing and widely used technology that remains at the core of business activities; however, growing data sizes raise performance concerns. PG-Strom is an extension of the PostgreSQL database designed to off-load several CPU-intensive query workloads (scan, join, and aggregation, at present) to the GPGPU, making them up to 10x faster than the existing SQL implementation. Its characteristics fit the usual workloads of BI (business intelligence) tools in a cost-effective way, but not all workloads. The PG-Strom extension is released under the GPLv2 terms and will be supported by PostgreSQL v9.5.
 
Keywords:
Big Data Analytics, GTC 2015 - ID S5276
Streaming:
Download:
 
Recent Advances in Multi-GPU Graph Processing
Giancarlo Carbone (Sapienza University of Rome)
Learn how to use GPUs as a computing platform to solve problems with irregular memory access patterns and low arithmetic intensity. We have shown that a proper data-to-thread mapping and a combination of techniques for reducing data traffic allow excellent performance in the traversal, via a level-synchronous Breadth-First Search (BFS), of large-scale graphs (i.e., millions of nodes and billions of edges) on multi-GPU systems. We present our recent activities in GPU-based graph processing: a new implementation of BFS based on a 2D partitioning that exploits the atomic operations of the Kepler architecture, two solutions to the st-connectivity problem, and all-pairs shortest path. Some of these can be of immediate use in the analysis of large data sets.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5337
Streaming:
Download:
 
GPU-Accelerated Network Centrality
Erik Saule (University of North Carolina at Charlotte, Department of Computer Science)
This session is about how to efficiently compute shortest-path-based network centrality metrics using the GPU. Performing shortest-path computation on a GPU is an expensive operation because of the many idempotent operations (computation and memory accesses) that must be performed to ensure the computation is correct. We will show how to interleave shortest-path computations in the context of network centrality metrics to reduce the number of memory accesses and to maximize their coalescing. We will also see how the representation of the network in memory is key to balancing thread divergence and the number of atomic operations.
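Shortest-path-based centrality metrics of this kind typically follow the shape of Brandes' algorithm: a BFS per source, then a backward dependency-accumulation sweep. The sequential sketch below is only meant to make the memory-access pattern being optimized concrete; function and variable names are illustrative, and this is not the presenters' GPU code.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality (unweighted): one BFS per source to
    count shortest paths, then a reverse sweep accumulating dependencies."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        preds = {v: [] for v in adj}  # shortest-path predecessors
        order = []
        q = deque([s])
        while q:
            u = q.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    q.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # reverse sweep: accumulate each vertex's dependency on s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # farthest vertices first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

The scattered updates to `sigma`, `preds`, and `delta` are exactly the memory accesses whose interleaving and coalescing the session discusses.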
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5425
Streaming:
Download:
 
Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Computing
Rajesh Bordawekar (IBM T. J. Watson Research Center), Ruchir Puri (IBM Research)
In this session you will learn how IBM is exploiting GPUs in its new IBM OpenPOWER platform to accelerate Big Data Analytics and Cognitive Computing solutions. The Hardware Acceleration Lab in IBM's Software Group is partnering with IBM Research to develop optimized heterogeneous computing solutions. With the creation of the OpenPOWER consortium last year, IBM has created an open ecosystem along with heterogeneous computing platforms that include NVIDIA's Tesla GPUs. GPUs are gaining traction in the enterprise as accelerators for Big Data Analytics and Cognitive Computing workloads. This session will focus on industrial case studies and the exploitation of GPUs. Some early results will also be shared.
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5459
Streaming:
Download:
 
Multi-Dimensional, In-GPU-Memory Databases: Streaming Conditional Calculations in Big Data Sets
Peter Strohm (Jedox AG)
Learn how in-GPU-memory databases can change the way real-world Big Data sets such as social media entries, webpage hits, or business data are analyzed. Analytical queries in databases often involve calculations over extremely large areas of aggregated values as input for further processing such as conditional calculation (if-then-else) or top-k evaluation, and therefore often run into memory problems. We present the design of optimized condition-based processors for large data sets, combined with a floating-frame approach to stream through these data areas. Conditional calculations are especially useful for splitting large value sets into clusters for further analysis or aggregation, and we will provide examples on real-world social media data, including localized Twitter trends and Wikipedia page hits.
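The floating-frame idea can be illustrated with a toy sketch (the frame size and threshold rule here are invented for the example): the data area is streamed in fixed-size frames, and the if-then-else rule feeds running aggregates, so the full area is never materialized in memory.

```python
def conditional_rollup(values, threshold, frame_size=1024):
    """Stream a large value area in fixed-size frames and evaluate an
    if-then-else rule per element, keeping only running aggregates."""
    picked_sum = 0.0
    picked_count = 0
    for start in range(0, len(values), frame_size):
        frame = values[start:start + frame_size]
        for v in frame:            # on the GPU: one thread per element
            if v > threshold:      # the conditional ("if")
                picked_sum += v    # the "then" branch feeds an aggregate
                picked_count += 1
    return picked_sum, picked_count
```

In the GPU setting each frame is processed with one thread per element, and only the small per-frame aggregates flow back to the host.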
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, GTC 2015 - ID S5481
Streaming:
Download:
 
Large-Scale Spatial Query Processing on GPU-Accelerated Big Data Systems by Extending Cloudera Impala
Jianting Zhang (The City College of New York)
Geo-referenced spatial (or geospatial) data volumes are increasing. Traditional data management techniques, such as Geographical Information Systems (GIS) and spatial databases, do not work well for big spatial data, while existing Big Data systems do not support geospatial data. In addition to our work on managing spatial data on single-node GPUs, we have integrated our parallel designs with Cloudera Impala, an open-source big data system, to support efficient and scalable distributed spatial query processing in an interactive SQL environment. We present the system architecture, data-parallel designs for spatial indexing and query processing, as well as performance on real datasets for spatial joins based on point-in-polygon tests.
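The core predicate of such a spatial join, the point-in-polygon test, can be sketched with the standard even-odd ray-casting rule; this is a generic illustration, not the presenters' data-parallel design.

```python
def point_in_polygon(px, py, poly):
    """Even-odd ray casting: shoot a ray to the right from (px, py) and
    count edge crossings; an odd count means the point is inside."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge spans the ray's y level
            # x-coordinate where the edge crosses that level
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside
```

In a data-parallel join, each (point, candidate polygon) pair surviving the spatial index becomes an independent evaluation of this predicate, one per GPU thread.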
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5489
Streaming:
 
Map-D: Hyper-Interactive GPU-Powered Visualytics for Big Data
Todd Mostak (Map-D)
As people wish to interactively explore increasingly larger datasets, existing tools are unable to deliver acceptable performance. The distributed nature of systems like Spark leads to latencies detrimental to interactive data exploration, while single-node visualization solutions like Tableau and Qlikview are not powerful enough to deliver sub-second response times for even intermediate-sized datasets. In this talk, we will argue that dense GPU servers, containing 4-16 GPUs each, can provide analytics query throughput exceeding what can be achieved on even large clusters, while avoiding the latencies and complications associated with running over a network. We will look at MapD, which can query and visualize multi-billion row datasets in milliseconds, as an example of such a system. Finally, we will show how the significantly higher performance achievable with a GPU system translates into new modes and paradigms of data analysis.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Real-Time Graphics, GTC 2015 - ID S5544
Streaming:
 
Scaling Data Visualization with GPUs and Design
Leo Meyerovich (Graphistry, Inc.)
GPUs are ushering in a new era of data visualization. Today, shoving one hundred thousand query results into a chart makes an illegible mess and kills interactivity. The good news is that infovis researchers have invented smarter layouts that maximize visibility. The bad news is that these layouts and basic interactions are computationally intensive enough that analysts can no longer simply slide a slider, drag a graph cluster, etc. With the availability of GPUs, however, the rules have changed. This talk shows examples of smarter designs and how we use GPUs to turn them into interactive tools. For experts, we will discuss how running in browsers and even phones led to Graphistry's tiered GPU visualization engine approach, and touch on our use of WebGL, WebCL, and our own in-house libraries.
 
Keywords:
Big Data Analytics, Web Acceleration, Visualization - In-Situ & Scientific, GTC 2015 - ID S5589
Streaming:
Download:
 
Fighting Malware With GPUs in Real Time
Peter Kovac (Avast Software)
Dive deep into the problem of protecting electronic devices such as PCs, smartphones, and tablets against malicious software. In this talk we will show you how we handle the ever-increasing number of malware samples produced by the malware ecosystem every day. To leverage similarities between samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are directly used to generate rules that identify similar samples on our customers' machines.
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5612
Streaming:
Download:
 
Single CUDA Block Implementation of Time Synchronous Viterbi Search for Speech Recognition
Nigel Cannings (Chase Information Technology Services Limited)
The time-synchronous Viterbi search algorithm for automatic speech recognition is implemented using a counter-intuitive single-CUDA-block approach. Decoding of a single utterance is carried out on a single streaming multiprocessor (SM), and multiple utterances are decoded simultaneously using CUDA streams. The single-CUDA-block approach is shown to be substantially more efficient and enables overlapping of CPU and GPU computation by merging tens of thousands of separate CUDA kernel calls for each utterance. The proposed approach has the disadvantage of a large GPU global memory requirement because of the simultaneous decoding feature. However, the latest GPU cards with up to 12GB of global memory fulfill this requirement, and full utilization of the GPU card is possible using all available SMs.
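For reference, time-synchronous Viterbi search itself looks like the following log-space sketch (a generic textbook version with illustrative names, far smaller than a real ASR decoder): at every frame, each state keeps only its best-scoring predecessor, and that per-frame synchronization point is what the single-CUDA-block scheme exploits.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Time-synchronous Viterbi: advance all states one observation frame
    at a time, tracking the best predecessor of each state per frame."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            best_prev, best_score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1])
            V[t][s] = best_score + math.log(emit_p[s][obs[t]])
            back[t][s] = best_prev
    # trace back the highest-scoring path
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

In the single-block scheme, the inner loop over states runs across the threads of one block on one SM, so the per-frame synchronization is a cheap block-level barrier rather than a kernel launch.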
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5658
Streaming:
 
POWER8 and GPUs: Helping Unfold the Intricate Loops of Genome Architecture (Presented by IBM)
Ido Machol (Baylor College of Medicine)
Develop new approaches and algorithms for high-throughput, systematic identification of chromatin loops between genomic regulatory elements, utilizing Tesla GPUs to efficiently search, in parallel, the space of possible chromatin interactions for true chromatin loops. This team is working with IBM POWER8 and NVIDIA Tesla GPU technologies to create customized algorithms that enable genomics scientists to see fine details of genome folding and learn more about genetic regulation. The maps of looping revealed thousands of hidden switches not known to have existed before. For genes that cause diseases or cancers, locating these switches is essential. GPUs speed up these algorithms by up to 200x, reducing the cycle time to process a single chromosome from a week-long process to less than a coffee break.
 
Keywords:
Big Data Analytics, Developer - Algorithms, Life & Material Science, GTC 2015 - ID S5821
Streaming:
Download:
 
SenDISA: Distributed Intelligent, Video, Sensor & Actuator Analytics Platform for Smart Cities (Presented by Sensen)
Dr. Subhash Challa (Sensen Networks)
This session will introduce SenSen's proprietary Video, Sensor and Actuator Analytics Platform (SenDISA), which is used by some of the world's most prestigious and trusted organizations, including the Abu Dhabi airport; the Singapore police; Roads & Maritime Services, Australia; the Westgate Bridge, Melbourne, Australia; the City of Trondheim, Norway; and the cities of Brisbane, Ipswich, and Manly. We will present how our innovative algorithms, powered by the GPGPU-based SenDISA platform, enable Big Data analytic applications by fusing data from video, sensor, and IoT devices and combining them with other transaction data to deliver smart city solutions across the globe. We will provide insights into the architecture of SenDISA and the market-specific Big Data solutions serving different market verticals.
 
Keywords:
Big Data Analytics, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5869
Streaming:
Download:
 
In-Place Computing on PostgreSQL: SQL as a Shortcut of GPGPU
Kohei KaiGai (NEC)
Near-data computing is a recent technology trend. The cost of data transfer is never negligible, so people are inclined to run their tasks at the location of the data (e.g., Hadoop). Our PG-Strom technology transparently off-loads some CPU-intensive SQL workloads to GPU devices using an automatic SQL-to-CUDA code generator. It enables users to describe their mathematical/statistical algorithms in SQL, then run this logic very close to the data managed by the PostgreSQL database. Usually, users have to export an entire dataset before processing what they really want to process. Integration of GPU computing power within the SQL database eliminates the need for these tasks and allows researchers to focus on what they really want to dive into.
 
Keywords:
Big Data Analytics, GTC 2016 - ID S6118
 
Attack Graphs: Visualizing 200M Alerts a Day with GPU Clouds and JavaScript
Leo Meyerovich (Graphistry, Inc.), Joshua Patterson (Accenture), Michael Wendt (Accenture)
Enterprises "assume breach": someone, somewhere, already compromised them. Analysts sift through a GB/min (or more!) of attack logs from hundreds of thousands of systems. For every identified incident, they then map out the entire breach by backtracking through months of alerts. This talk shares how Graphistry and Accenture tackled the visual analytics problem: how do we explore big graphs? We'll drill into two of our GPU technologies for visualizing graphs: [1] StreamGL, our distributed real-time renderer for delivering buttery interactions, smart designs, and responsive analytics to standard web devices; [2] Node-OpenCL and our CLJS client, open-source JavaScript libraries for server-side GPU scripting.
 
Keywords:
Big Data Analytics, Aerospace & Defense, Large Scale and Multi-Display Visualization, GTC 2016 - ID S6114
Streaming:
Download:
 
Unblock Performance Limit of DNN by CUDA in R
Patric Zhao (NVIDIA)
You'll learn technical solutions to accelerate R with CUDA. DNNs have become a very popular approach in statistical analysis. Even though there are several DNN packages in R, they are rarely used for big data and deep neural networks because the single-core performance of R is limited and the current design of DNN packages in R is not GPU-friendly. First, we'll introduce how we apply specific patterns, such as general matrix multiplication (GEMM), to DNNs in R; GEMM is a GPU-friendly pattern that can easily be accelerated by cuBLAS. Second, we'll show the tradeoff between performance and memory usage in R for DNNs. Finally, we'll package all of these CUDA approaches into an R package and publish it to CRAN so that anyone can install it in R quickly and get a significant performance improvement from NVIDIA GPUs.
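The GEMM pattern mentioned above can be illustrated with a minimal forward pass in NumPy (illustrative names only; in the talk's setting each matrix product would be dispatched to cuBLAS instead of the CPU BLAS NumPy uses):

```python
import numpy as np

def dense_forward(X, weights, biases):
    """Forward pass of a fully connected net expressed purely as GEMMs,
    the GPU-friendly pattern that maps directly onto cuBLAS."""
    A = X
    for W, b in zip(weights, biases):
        Z = A @ W + b             # the GEMM: (batch, in) x (in, out)
        A = np.maximum(Z, 0.0)    # elementwise ReLU after every layer
    return A
```

Batching many samples into one GEMM per layer is what turns DNN inference and training into the dense linear algebra a GPU is built for.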
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Performance Optimization, GTC 2016 - ID S6156
Streaming:
Download:
 
CuMF: Large-Scale Matrix Factorization on Just One Machine with GPUs
Wei Tan (IBM T. J. Watson Research Center)
We present cuMF, a highly optimized matrix factorization system on GPUs. Matrix factorization (MF) is a key algorithm in recommender systems. On a single GPU, we introduce a memory-optimized alternating least squares (ALS) method; it alleviates discontiguous memory access and aggressively uses registers, so as to reduce memory latency. On multiple GPUs, we combine data parallelism with model parallelism, and introduce a topology-aware parallel reduction method, so as to scale ALS to multiple GPUs. Using only one machine with four NVIDIA GPU cards, cuMF can be 6-10 times as fast, and 33-100 times as cost-efficient, as state-of-the-art distributed CPU solutions. Moreover, cuMF can solve the largest matrix factorization problem ever reported.
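A minimal, sequential sketch of the ALS iteration that cuMF optimizes (illustrative NumPy code, not cuMF's implementation): each user row and each item row is the solution of a small ridge-regression system, and every row solve is independent of the others, which is where the GPU parallelism comes from.

```python
import numpy as np

def als(R, mask, k=2, reg=0.01, iters=30, seed=0):
    """Alternating least squares for R ~ X @ Y.T over observed entries
    (mask): alternately solve a regularized normal equation per row."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    X = rng.standard_normal((m, k)) * 0.1
    Y = rng.standard_normal((n, k)) * 0.1
    I = reg * np.eye(k)
    for _ in range(iters):
        for u in range(m):                  # each user row: independent solve
            idx = np.flatnonzero(mask[u])
            Yu = Y[idx]
            X[u] = np.linalg.solve(Yu.T @ Yu + I, Yu.T @ R[u, idx])
        for i in range(n):                  # each item row: independent solve
            idx = np.flatnonzero(mask[:, i])
            Xi = X[idx]
            Y[i] = np.linalg.solve(Xi.T @ Xi + I, Xi.T @ R[idx, i])
    return X, Y
```

The gathers `Y[idx]` and `R[u, idx]` are the discontiguous accesses the abstract mentions; cuMF's single-GPU contribution is reorganizing exactly this memory traffic.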
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Performance Optimization, GTC 2016 - ID S6211
Streaming:
Download:
 
Data Analytics and Machine Learning at Your Finger Tips - No CUDA Required
Bryan Thompson (Blazegraph), James Lewis (Blazegraph)
Writing fast, efficient data analytics for graph and machine learning on GPUs can be hard due to the complexities of CUDA and of achieving effective parallelism. DASL and SPARQL are high-level languages for graph and machine learning algorithms (DASL) and graph pattern matching (SPARQL) that provide speedups of up to 1,000x over native Spark and up to 300x over leading graph databases when executed on the Blazegraph platform. These high-level languages are translated into task graphs that expose the available parallelism. The MapGraph runtime evaluates the task graphs and provides a scalable architecture on GPUs and GPU clusters. This presentation discusses the concepts for graph algorithms and queries, the MapGraph architecture, and how algorithms are evaluated on a GPU cluster.
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Aerospace & Defense, GTC 2016 - ID S6267
Streaming:
Download:
 
Accelerating Spark Workloads Using GPUs
Rajesh Bordawekar (IBM Research)
The Apache Spark engine is being increasingly used to implement large-scale distributed analytics workloads. These workloads cover a wide array of analytics models, including predictive analytics, optimization, and graph analytics. We'll discuss opportunities for exploiting GPUs to accelerate different Spark components such as MLlib. The talk will first give an overview of the Spark programming and execution model and then describe the key issues in integrating GPUs into the Spark infrastructure. We then describe our approach for enabling Spark to use multiple GPUs in a distributed manner and provide details of accelerating key MLlib kernels without changing the source Spark program.
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Algorithms, GTC 2016 - ID S6280
Streaming:
Download:
 
Anomaly Detection and Categorization Using Unsupervised Deep Learning
Stephen McGough (Durham University)
The potential information buried within datasets is immense, though extracting this information is difficult when the data is large, noisy, unlabeled, and unstructured. We present the use of GPGPU-powered unsupervised deep learning to identify the anomalies within such datasets. Analysis of these anomalies can be performed to determine which are "pertinent" and which are "benign." Once the significance of an anomaly has been determined, it becomes a label, which is added to the data. Repeating this process leads to unlabeled data becoming labeled. This newly labeled data can be used to train a supervised deep learning system to identify new instances of that stereotype. We demonstrate how GPGPUs can be used to enable real-time anomaly detection and stereotyping.
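The talk uses deep networks; as a minimal stand-in, a linear autoencoder (PCA via SVD) already shows the reconstruction-error idea behind unsupervised anomaly scoring: rows that reconstruct poorly from the learned low-dimensional representation are flagged as anomalies. Names below are illustrative.

```python
import numpy as np

def anomaly_scores(X, n_components=1):
    """Unsupervised anomaly scoring with a linear autoencoder (PCA):
    project rows onto the top principal components, reconstruct them,
    and score each row by its reconstruction error."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                # shared encoder/decoder weights
    recon = (Xc @ V) @ V.T + mu            # encode, then decode
    return np.linalg.norm(X - recon, axis=1)
```

A deep autoencoder replaces the linear projection with stacked nonlinear layers, but the scoring rule, reconstruction error per sample, is the same, and scoring all samples is embarrassingly parallel on a GPU.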
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6340
Streaming:
Download:
 
Visual Sensemaking with GPU-Driven Machine Learning
Stef van den Elzen (SynerScope BV)
We show how our interactive, integrated analytics solution allows a new class of users to perform machine-assisted visual sensemaking. Until now, machine learning techniques such as predictive analytics and deep learning have mostly been used as part of a complex tool chain that serves as an endpoint in the decision-making process. We combine the strengths of human decision making and GPU-driven machine learning in a multi-coordinated visual analytics solution. This enables the discovery of actionable insights by bridging the gap between data scientist and business user.
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Self-Driving Cars & Automotive, GTC 2016 - ID S6356
Streaming:
 
Graph Analytics: Using GPU-Accelerated Sparse Linear Algebra Routines
Paul Fox (EM Photonics, Inc.)
Large-scale graph analytics frameworks provide a convenient and highly scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. We're developing an implementation of the high-level functions supported by these APIs in terms of linear algebra operations, which will be parallel over each pair of vertices connected by an edge. This technology can reduce the number of nodes required and map well to computational accelerators such as GPUs, thus enabling users to perform more complex analysis with less hardware at lower cost. We'll detail our latest work on this project, including challenges, specifics of our approach, and preliminary results.
 
Keywords:
Big Data Analytics, Algorithms, GTC 2016 - ID S6360
Streaming:
 
Dominoes: Exploratory Data Analysis of Software Repositories Through GPU Processing
Jose Ricardo da Silva Junior (Universidade Federal Fluminense), Esteban Clua (Universidade Federal Fluminense)
Learn how to perform data analysis over software repositories on GPU architectures with the Dominoes tool. We'll give an overview and introduction of the tool and its capabilities; it provides a unified view of the computational resources. Dominoes allows anyone to explore large software repositories at any grain (files, methods, or classes) without using any programming language. Due to its highly parallel GPU architecture, results are processed in real time. Attendees will learn the strategy used by Dominoes to allow big data to be processed on the GPU.
 
Keywords:
Big Data Analytics, Tools & Libraries, GTC 2016 - ID S6372
Streaming:
Download:
 
Gunrock: A Fast and Programmable Multi-GPU Graph Processing Library
Yangzihao Wang (University of California Davis), Yuechao Pan (University of California Davis)
We present Gunrock, a multi-GPU graph processing library that enables easy graph algorithm implementation and extension onto multiple GPUs for scalable performance on large graphs with billions of edges. Attendees can learn how to 1) solve large-scale graph problems with high-performance GPU computing primitives and optimization strategies, using our high-level data-centric abstraction that focuses on vertex or edge frontier operations, and 2) utilize multi-GPU computing power with just a few algorithm-dependent blocks, using our multi-GPU framework that handles most multi-GPU implementation details and memory allocation. We will also share experience on the library's design and implementation that helps it achieve the best performance among programmable GPU graph libraries.
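The frontier-centric abstraction can be caricatured in a few lines: an "advance" step expands the current frontier to its neighbors, and a "filter" step removes already-visited vertices. The Python sketch below illustrates the abstraction only; it is not Gunrock's API.

```python
def advance(adj, frontier, visited):
    """Advance + filter: expand every frontier vertex to its neighbors,
    then drop vertices that were already visited."""
    out = set()
    for u in frontier:
        out.update(adj[u])
    return out - visited

def bfs_order(adj, source):
    """A BFS is just repeated advance/filter until the frontier empties."""
    visited = {source}
    frontier = {source}
    order = [sorted(frontier)]
    while frontier:
        frontier = advance(adj, frontier, visited)
        visited |= frontier
        if frontier:
            order.append(sorted(frontier))
    return order
```

Many graph algorithms (BFS, SSSP, PageRank-style sweeps) reduce to different payloads attached to the same advance/filter loop, which is what makes the abstraction programmable.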
 
Keywords:
Big Data Analytics, Tools & Libraries, Supercomputing & HPC, GTC 2016 - ID S6374
Streaming:
Download:
 
Graph Database and Analytics in a GPU-Accelerated Cloud Offering
Brad Bebee (Blazegraph), Dave Driggers (Cirrascale Corporation)
Blazegraph GPU provides 300X acceleration for SPARQL graph query and graph database management with acceleration for existing RDF/SPARQL and Property Graph (Tinkerpop) applications. Multi-GPU configurations can effectively manage billion+ edge graphs on single-node machines with 4 or 8 K80 GPU accelerators. This is a cost-effective way to deliver high performance for graphs, but many end-users and applications do not have existing multi-GPU systems; current cloud offerings at this scale are not generally available. Cirrascale has developed a cloud-based solution for provisioning multi-GPU Tesla systems using its switch riser technology. This session details the Blazegraph GPU cloud offering on Cirrascale, demonstrates how to quickly deploy it in the cloud, and shows graph benchmarks on cloud systems.
 
Keywords:
Big Data Analytics, Data Center & Cloud Computing, Aerospace & Defense, GTC 2016 - ID S6395
Streaming:
Download:
 
Production Intelligence: GPU-Databases for Predictive Maintenance and In-Line Controlling in Automobile Manufacturing
Peter Strohm (Jedox AG)
Learn how in-GPU-memory databases optimize complex manufacturing processes by enabling real-time data input into big datasets, in-line decision making, and predictive maintenance. In general, manufacturing processes today provide tons of data, e.g., on the process itself, workpieces, machine sensor data, parts delivered by external vendors, etc. In the Production Intelligence project, our goal is to turn this unspecific data into "smart data" to gain better insight into the manufacturing process, e.g., to prevent machine shutdowns or decrease the amount of junk parts. We'll present our solutions for streaming input data vectors into big datasets, analyzing incoming data in real time, and predicting production or system errors with the help of deep learning algorithms.
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, Computer Vision & Machine Vision, GTC 2016 - ID S6426
Streaming:
Download:
 
Accelerating Influence Spread Estimation on Social Networks in the Continuous-Time Domain
Zissis Poulos (University of Toronto, Sysomos Inc.)
This session showcases how to leverage GPUs to accelerate influence spread estimation in large social networks. Estimating the spread of an opinion or product across members of a graph-modelled social network is a hard problem requiring compute-intensive approximation algorithms. The complexity of the problem rises further in the continuous-time domain, where influence transmission rates on network edges are derived from stochastic distributions. Spread estimation algorithms that operate on stochastic transmission rates, such as naive sampling and neighbourhood size estimation, require a plethora of samples to achieve convergence. By exploiting the inherent independence across multiple sampling iterations of these algorithms, we achieve up to 11x improvement in run time using GPUs.
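Naive sampling in the continuous-time model can be sketched as follows (illustrative code, not the presenters': exponential transmission delays on edges, a Dijkstra pass from the seed set per sample, and a count of nodes reached within the time horizon). Each sample is independent, which is precisely the parallelism the GPU implementation exploits.

```python
import heapq
import random

def estimate_spread(edge_rates, seeds, horizon, samples=500, rng_seed=0):
    """Monte Carlo influence spread: per sample, draw an exponential delay
    for each edge relaxation, find earliest arrival times from the seeds,
    and count nodes reached within the horizon. Returns the mean count."""
    rng = random.Random(rng_seed)
    adj = {}
    for (u, v), rate in edge_rates.items():
        adj.setdefault(u, []).append((v, rate))
    total = 0
    for _ in range(samples):
        dist = {s: 0.0 for s in seeds}       # earliest infection times
        pq = [(0.0, s) for s in seeds]
        heapq.heapify(pq)
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue                     # stale queue entry
            for v, rate in adj.get(u, []):
                nd = d + rng.expovariate(rate)  # sampled transmission delay
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
        total += sum(1 for t in dist.values() if t <= horizon)
    return total / samples
```

On a GPU, one thread (or block) per sample evaluates the inner loop, and the final average is a single reduction over the per-sample counts.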
 
Keywords:
Big Data Analytics, Algorithms, GTC 2016 - ID S6471
Streaming:
Download:
 
The Promise of GPU Analytics or Why GPU is the New CPU
Todd Mostak (MapD)
We'll explain why GPU-powered in-memory databases and analytics platforms are the logical successor to CPU in-memory systems, largely due to recent increases in onboard memory available on GPUs. With sufficient memory, GPUs possess numerous advantages over CPUs, including much greater compute and memory bandwidth and a native graphics pipeline. We'll demo how MapD is able to leverage multiple GPUs per server to extract orders-of-magnitude performance increases over CPU-based systems, bringing interactive querying and visualization to multi-billion row datasets.
 
Keywords:
Big Data Analytics, Performance Optimization, GTC 2016 - ID S6472
Streaming:
Download:
 
Data Science Applications of GPUs in the R Language
Norm Matloff (University of California, Davis)
In this presentation, you will learn about the use of GPUs in data science applications using the R language, as well as a general method, Software Alchemy, for parallelizing statistical applications. The talk will provide an overview of R libraries available for interfacing with GPUs, and discussion of issues involved in writing such libraries, before showing you how to use Software Alchemy (with or without R) to overcome GPU memory limitations in statistical applications.
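Software Alchemy itself is simple to sketch (hypothetical helper shown in Python rather than R; equal-weight averaging, which suits mean-like estimators): break the data into independent chunks, run the estimator on each chunk, and average the chunk results.

```python
def software_alchemy(data, estimator, n_chunks):
    """Software Alchemy sketch: partition the data into independent chunks,
    apply the estimator to each chunk (each chunk small enough to fit in
    GPU memory, and embarrassingly parallel), then average the results."""
    n = len(data)
    bounds = [round(i * n / n_chunks) for i in range(n_chunks + 1)]
    results = [estimator(data[bounds[i]:bounds[i + 1]])
               for i in range(n_chunks)]
    return sum(results) / n_chunks
```

Because each chunk fits in device memory on its own, the method sidesteps GPU memory limits while keeping the statistical estimate asymptotically equivalent to the full-data one.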
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, GTC 2016 - ID S6708
Streaming:
Download:
 
Sense Making in an IOT World: Sensor Data Analysis with Deep Learning (Presented by Hewlett Packard Enterprise)
Natalia Vassilieva (Hewlett Packard Enterprise)
Applications of deep learning in sensor data analysis have not been studied as extensively as in speech and vision. However, sensor data have properties similar to those of images and audio: multidimensional, with intrinsic dependencies and correlatio ...Read More
Applications of deep learning in sensor data analysis have not been studied as extensively as in speech and vision. However, sensor data have properties similar to those of images and audio: multidimensional, with intrinsic dependencies and correlations in the data, and hard to analyze with conventional approaches. Our results prove that deep learning has better generalization capabilities compared to conventional methods on sensor data and has high potential in sensor data analytics. We also address scalability issues of the training process for models best suited for sensor data. The training of these models does not scale out beyond a certain number of nodes.  Back
 
Keywords:
Big Data Analytics, Deep Learning & Artificial Intelligence, IoT, GTC 2016 - ID S6773
Streaming:
Download:
 
The OpenPOWER Foundation: Revolutionizing Data-Centric Transformation (Presented by IBM)
Sumit Gupta (IBM Power Systems)
The growth of the OpenPOWER Foundation has been phenomenal. Why, you might ask? In less than two years, OpenPOWER has grown from five members to over 180, with membership across all tiers of hardware, software, and end users themselves. The Foundatio ...Read More
The growth of the OpenPOWER Foundation has been phenomenal. Why, you might ask? In less than two years, OpenPOWER has grown from five members to over 180, with membership across all tiers of hardware, software, and end users themselves. The Foundation provides a compelling and rapidly growing open approach to infrastructure and software for rapidly changing workloads and evolving IT consumption models. This is a revolution that is making a profound difference in the price/performance criteria of end users, as well as accelerating compelling development for performance to drive business advantage. OpenPOWER members are co-creating their approach to technology, as innovators, producers, and consumers, utilizing IBM's Power Architecture.  Back
 
Keywords:
Big Data Analytics, Data Center & Cloud Computing, Supercomputing & HPC, GTC 2016 - ID S6825
Streaming:
Download:
 
Towards a High Performance Analytics and Computing Platform for Brain Research
Dirk Pleiter (Forschungszentrum Juelich)
Understanding and modeling the human brain continues to be one of the biggest challenges of research. The Human Brain Project is a European flagship, which is in the process of creating a research infrastructure that will facilitate this research. Ma ...Read More
Understanding and modeling the human brain continues to be one of the biggest challenges of research. The Human Brain Project is a European flagship, which is in the process of creating a research infrastructure that will facilitate this research. Many research topics in this field require scalable compute resources or the ability to process extreme-scale data volumes (in some cases both). Examples are approaches to simulate the network of a human brain in its full complexity and efforts to create high-resolution brain atlases. GPUs already play an important role today in realizing the necessary computational capabilities. We'll give an overview of the efforts to build a high-performance analytics and computing platform for brain research.  Back
 
Keywords:
Big Data Analytics, Press-Suggested Sessions: HPC & Science, Supercomputing & HPC, GTC 2016 - ID S6655
Streaming:
Download:
Bioinformatics & Genomics
Presentation
Media
Algorithms and Tools for Bioinformatics on GPUs
Bertil Schmidt (Nanyang Technological University)
Learn how to use GPUs to accelerate compute- and data-intensive applications and algorithms in Bioinformatics. High-throughput techniques for DNA sequencing and gene expression analysis with microarrays have led to a rapid growth in the amount of d ...Read More

Learn how to use GPUs to accelerate compute- and data-intensive applications and algorithms in Bioinformatics. High-throughput techniques for DNA sequencing and gene expression analysis with microarrays have led to a rapid growth in the amount of digital biological data; e.g. the NCBI Sequence Read Archive (SRA) houses raw sequence data generated by next-generation sequencing (NGS) technologies which exceeds 25 trillion base pairs. Therefore, modern bioinformatics tools need to be scalable, i.e. they need to deal with an ever-growing amount of data. GPUs and CUDA provide the opportunity to significantly reduce the runtime of many biological algorithms on inexpensive hardware.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2008
Streaming:
Download:
 
SeqNFind: Application Of CUDA GPU Technologies To Sequence Alignment Techniques
D. Andrew Carr (Accelerated Technology Laboratories Inc.)
Explosive growth in the amount of genomic data has created a need for faster systems that align and compare nucleotide sequences. With the development of tools for leveraging the massively parallel architecture of NVIDIA GPUs it is a logical nex ...Read More

Explosive growth in the amount of genomic data has created a need for faster systems that align and compare nucleotide sequences. With the development of tools for leveraging the massively parallel architecture of NVIDIA GPUs, it is a logical next step to construct algorithms for genomic analysis on GPU clouds/clusters. Although a seemingly simple task, there are a number of challenges to deploying the current algorithms. Every algorithm from Smith-Waterman to BLAST has its own unique set of barriers. Presented here are some of the lessons learned and how ongoing genomic research projects have benefited from the increased speed and accuracy.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2037
Streaming:
Download:
 
Swift: A GPU-based Smith-Waterman Sequence Alignment Program
Pankaj Gupta (St Jude Children's Research Hospital)
This session describes Swift, a GPU-based Smith-Waterman implementation for aligning short DNA sequences to large genomes. Swift has been designed to reduce computation time and lower hardware cost. Also, unlike other leading GPU-based Smith-Wat ...Read More

This session describes Swift, a GPU-based Smith-Waterman implementation for aligning short DNA sequences to large genomes. Swift has been designed to reduce computation time and lower hardware cost. Also, unlike other leading GPU-based Smith-Waterman sequence alignment programs like CUDASW++ and SWCUDA which focus on protein sequence alignment, Swift has been developed for DNA sequence alignment. Swift performs 200x faster than CUDASW++ using a test data set containing 1000 reads (100 bases each) and 1000 references (1000 bases each), and it performs 11x faster than the CPU-based implementation of Smith-Waterman using 24 million reads (100 bases each) and human chromosome 1.

  Back
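For orientation, the Smith-Waterman recurrence that tools like Swift accelerate can be written in a few lines of plain Python. This is a CPU reference for the score only, with illustrative scoring parameters rather than Swift's actual ones:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between sequences a and b.

    Classic O(len(a) * len(b)) dynamic program, computed row by
    row; the score matrix never drops below zero, which is what
    makes the alignment local rather than global.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

print(smith_waterman("ACGTTG", "CGTT"))  # 8: exact "CGTT" hit, 4 matches x 2
```

GPU implementations typically parallelize the anti-diagonals of this score matrix, since every cell on one anti-diagonal depends only on the previous two anti-diagonals and can be computed by an independent thread.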
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2083
Streaming:
Download:
 
CUMACH - A Fast GPU-based Genotype Imputation Tool
Agatha Hu (NVIDIA)
The goal of this session is to introduce a GPU-implemented tool in bioinformatics. Genotype imputation is a method that extrapolates genetic correlations from a densely characterized reference panel to a sparsely typed study sample. Many CPU-b ...Read More

The goal of this session is to introduce a GPU-implemented tool in bioinformatics. Genotype imputation is a method that extrapolates genetic correlations from a densely characterized reference panel to a sparsely typed study sample. Many CPU-based tools already exist, but they are all time-consuming on large data sets. In this session, we present a GPU-based imputation tool that achieves relatively good results at high speed. The session has three main parts: 1) the background and the HMM-based algorithm, 2) GPU implementation and optimization, and 3) results.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2084
Streaming:
Download:
 
SOAP3: GPU-based Compressed Indexing and Ultra-fast Parallel Alignment of Short Reads
BingQiang Wang (BGI)
We give the first implementation of a compressed index (Burrows-Wheeler Transform) on the GPU, supporting very efficient parallel alignment of short patterns (reads) onto the human genome. The new alignment software SOAP3 is tens of times ...Read More

We give the first implementation of a compressed index (Burrows-Wheeler Transform) on the GPU, supporting very efficient parallel alignment of short patterns (reads) onto the human genome. The new alignment software SOAP3 is tens of times faster than existing ones and can keep up with the throughput (giga to tera bp) of next-generation DNA sequencers. It takes 2.4 seconds to perform exact matching for one million length-100 reads (tens of seconds for small-error approximate matching). Technically, we show how to minimize memory accesses to the index from individual threads and how to control the branching and divergence of the threads.

  Back
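The core of a BWT-based aligner is backward search over the compressed index. A small pure-Python sketch follows; it uses naive O(n) rank queries via string counting instead of the sampled occurrence tables a real implementation like SOAP3 would use, and the example text is made up:

```python
def bwt_index(text):
    """Build the BWT of text plus the C[] table for backward search."""
    text += "$"  # sentinel, lexicographically smallest
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in text strictly smaller than c
    C, total = {}, 0
    for c in sorted(set(bwt)):
        C[c] = total
        total += bwt.count(c)
    return bwt, C

def count_occurrences(bwt, C, pattern):
    """Count occurrences of pattern via LF-mapping backward search."""
    lo, hi = 0, len(bwt)  # current suffix-array interval
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + bwt[:lo].count(c)  # Occ(c, lo), naive rank
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo

bwt, C = bwt_index("GATTACATTAC")
print(count_occurrences(bwt, C, "TTAC"))  # 2
```

Each read is matched by its own backward search, so in the GPU setting one thread (or a small thread group) handles one read; the memory-access minimization the abstract mentions concerns exactly the Occ lookups this sketch does naively.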
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2109
Streaming:
Download:
 
Accurate Sequence Alignment Using Distributed Filtering on GPU Clusters
Reza Farivar (University of Illinois at Urbana-Champaign), Shivaram Venkataraman (UC Berkeley)
Learn how GPUs enable new ways to rethink a complex bioinformatics problem: accurate sequence alignment. What was once prohibitive to compute can become a basic building block of novel GPU-based algorithms. Modern DNA sequencing machines generate enorm ...Read More

Learn how GPUs enable new ways to rethink a complex bioinformatics problem: accurate sequence alignment. What was once prohibitive to compute can become a basic building block of novel GPU-based algorithms. Modern DNA sequencing machines generate enormous amounts of short sequences within minutes, and these must be aligned to a reference genome in real time. Most solutions only find a few locations that match a short sequence. We introduce a new technique to find all matching locations inside a reference sequence for a given number of mismatches. Our technique is based on a distributed filtering scheme and GPU-based processing.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2152
Streaming:
Download:
 
Towards Computing the Cure for Cancer
Wu Feng (Virginia Tech), Heshan Lin (Virginia Tech)
Learn about how to create "designer" genomic analysis pipelines as part of the "Compute the Cure" for cancer initiative from NVIDIA Foundation. Get an overview of an open-source framework that enables the creation of customiz ...Read More

Learn about how to create "designer" genomic analysis pipelines as part of the "Compute the Cure" for cancer initiative from NVIDIA Foundation. Get an overview of an open-source framework that enables the creation of customized genomic analysis pipelines. Discover how different plug-ins from the "mapping/realignment/discovery" repositories, respectively, can be composed to form a genomic analysis pipeline. Learn to use next-generation sequencing data to characterize previously undetectable genetic changes between normal and malignant cells. Find out how you can contribute to the "Compute the Cure" cause.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2156
Streaming:
Download:
 
High-Throughput Epistasis Screening Using GPUs
Mark Seligman (Insilicos LLC)
Epistasis is the interaction of two or more genes in coding for a biological property. Epistasis is believed to be an important factor in an individual's susceptibility to disease, and the search for epistasis is a major component in the dev ...Read More

Epistasis is the interaction of two or more genes in coding for a biological property. Epistasis is believed to be an important factor in an individual's susceptibility to disease, and the search for epistasis is a major component in the development of personalized approaches to genomic medicine. Statistical tests for epistasis are typically confounded by the multiple-testing problem, that is, the aggregated loss of precision incurred through repeated hypothesis testing. One way to circumvent this problem is to simulate a false-discovery rate via resampling. We report success in using GPUs to accelerate these highly compute-intensive resampling techniques.

  Back
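The resampling described here reduces, at its core, to a permutation loop. Below is a toy, stdlib-only sketch of an empirical p-value by phenotype-label permutation; the statistic, data, and names are invented for illustration, and the GPU win comes from running the independent permutations in parallel:

```python
import random

def permutation_pvalue(genotype, phenotype, statistic, n_perm=1000,
                       rng=random.Random(0)):
    """Empirical p-value via phenotype-label permutation.

    Shuffling the labels and recomputing the statistic samples the
    null distribution directly, sidestepping analytic corrections;
    the permutations are independent, which maps naturally to GPUs.
    """
    observed = statistic(genotype, phenotype)
    labels = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if statistic(genotype, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one avoids p = 0

# Toy statistic: absolute difference in mean genotype (0/1/2 allele
# counts) between cases (label 1) and controls (label 0).
def mean_diff(g, labels):
    case = [x for x, y in zip(g, labels) if y]
    ctrl = [x for x, y in zip(g, labels) if not y]
    return abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))

geno = [2, 2, 1, 2, 0, 0, 1, 0]
pheno = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_pvalue(geno, pheno, mean_diff))
```

A false-discovery rate over many SNPs is obtained the same way, by recording the maximum statistic across SNPs per permutation rather than a single statistic.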
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2337
Streaming:
Download:
 
GPGPU Accelerated Protein Similarity Measures Identifying Biological Relevant Structure
Edward Lowe (Vanderbilt University), Nils Woetzel (Vanderbilt University)
Atomic structure similarity measures for proteins help in de novo protein structure prediction. For a large set of computationally generated protein structures (~20k) all pairwise similarities have to be calculated to cluster structures. Common ...Read More

Atomic structure similarity measures for proteins help in de novo protein structure prediction. For a large set of computationally generated protein structures (~20k), all pairwise similarities have to be calculated to cluster structures. Common similarity measures are root mean square deviation (RMSD) and global distance test total score (GDT_TS). Although GDT_TS has advantages over RMSD, it is not used due to its time-consuming calculation. The aforementioned and other similarity measures are ported for parallel execution on GPGPUs to make them amenable for clustering de novo generated structural models to find the largest cluster representing the biologically relevant protein conformations.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2346
Streaming:
Download:
 
Dynamic Programming on CUDA: Finding the Most Similar DNA Sequence
Grzegorz Kokosinski (IBM Poland), Krzysztof Zarzycki (IBM Poland)
Learn a couple of techniques to speed up compute-heavy Dynamic Programming algorithms on the GPU. Our particular problem regarded DNA sequences: given a reference sequence, how to find the one most similar to it among a large database? The seque ...Read More

Learn a couple of techniques to speed up compute-heavy Dynamic Programming algorithms on the GPU. Our particular problem concerned DNA sequences: given a reference sequence, how to find the one most similar to it among a large database? The sequences are millions of characters long, and their similarity is calculated with a (quadratic) DP algorithm, which makes the problem very tough even for GPUs. We speed up both the theoretical and practical side: we present programming techniques that enable Dynamic Programming to be performed at the hardware speed, and improvements to the algorithm itself that drastically lower the execution time.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2376
Streaming:
Download:
 
The Advantage of GPU Computation for Analyzing Complex Traits
Jun Zhu (Zhejiang University)
Most important agricultural traits and human diseases are complex traits controlled by gene networks with gene-by-gene interaction (epistasis) and gene-by-environment interaction (GE). New statistical methods and software are developed for an ...Read More

Most important agricultural traits and human diseases are complex traits controlled by gene networks with gene-by-gene interaction (epistasis) and gene-by-environment interaction (GE). New statistical methods and software are developed for analyzing the genetic architecture of complex traits based on genome-wide association studies (GWAS). When dealing with large mapping populations and huge amounts of molecular information, GPU computation has an advantage over CPU computation. We will demonstrate the newly developed GPU-based software QTLNetwork V3.0 and GWAS-GMDR for mapping genes with epistasis and GE interaction for complex traits of humans, crops, and mice.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2516
Streaming:
Download:
 
GPU Accelerated Bioinformatics Research at BGI
BingQiang Wang (BGI)
After digitizing the DNA double helix by sequencing, computation is the key connecting raw sequences with life science discoveries. As massive data is generated, how to process, analyze, and store it efficiently turns out ...Read More

After digitizing the DNA double helix by sequencing, computation is the key connecting raw sequences with life science discoveries. As massive data is generated, how to process, analyze, and store it efficiently turns out to be a major challenge. By developing GPU-accelerated bioinformatics tools and integrating them into pipelines, BGI researchers now run analysis pipelines in several hours instead of several days. These tools include the SOAP3 aligner, SNP calling, and a tool for population genomics. The speedup is generally around 10-50x compared with traditional counterparts.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID S2519
Streaming:
Download:
 
Acceleration of Complex Network Analysis
Athanasios Grivas (Newcastle University)
Complex networks now play an important role across the sciences. Their universal characteristics can be applied in fields such as network pharmacology. There is a need for acceleration that decreases the execution time of the ...Read More
Complex networks now play an important role across the sciences. Their universal characteristics can be applied in fields such as network pharmacology. There is a need for acceleration that decreases the execution time of the algorithms used by a large factor. The breakthrough is the use of GPUs and parallel computing to accelerate the whole process. Transforming common algorithms such as matrix multiplication into a parallel model has shown large acceleration, which is promising for the field of network analysis.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID P2451
Download:
 
GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics
Shuji Suzuki (Tokyo Institute of Technology)
A vast number of sensitive homology searches is required for mapping sequence data to known protein sequence databases in metagenomic analysis. However, fast search tools such as BLAT do not have enough search sensitivity for metagenomic analysis. Th ...Read More
A vast number of sensitive homology searches is required for mapping sequence data to known protein sequence databases in metagenomic analysis. However, fast search tools such as BLAT do not have enough search sensitivity for metagenomic analysis. Thus, a sensitive and efficient homology search tool is needed. We developed a GPU-optimized algorithm for performing sensitive sequence homology searches and implemented it as the GPU-Accelerated Homology Search Tool for Metagenomics (GHOSTM), which achieves faster calculation speeds and higher search accuracy than the BLAT program. Our results indicate that GHOSTM offers a potentially cost-efficient solution to the increasingly difficult computational analysis of metagenomic data.   Back
 
Keywords:
Bioinformatics & Genomics, GTC 2012 - ID P2500
Download:
 
Photorealistic and Interactive Molecule Visualizer
Cyrille Favreau
IMV is an interactive molecule visualizer based on a ray-tracing engine. Targeting high-quality images and ease of interaction, IMV uses the latest GPU computing acceleration techniques, combined with natural user interfaces such as Kinect and Wiimot ...Read More
IMV is an interactive molecule visualizer based on a ray-tracing engine. Targeting high-quality images and ease of interaction, IMV uses the latest GPU computing acceleration techniques, combined with natural user interfaces such as Kinect and Wiimotes.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID P3106
Download:
 
Fast GPU Applications in Bioinformatics
Fang Liu (SuperComputing Center of Chinese Academy of Sciences)
This work presents a fast algorithm to compute linkage disequilibrium (LD) on GPU using CUDA. The traditional accumulations can be converted to bitwise operations equivalently, thus can benefit from a specially designed single instruction '__popc' ...Read More
This work presents a fast algorithm to compute linkage disequilibrium (LD) on the GPU using CUDA. The traditional accumulations can be converted equivalently to bitwise operations, and thus can benefit from a specially designed single instruction, '__popc', on NVIDIA GPU devices. The algorithm therefore processes 32 samples simultaneously using only a few bitwise instructions, and reduces the input data of each allele to 1/4, from an 8-bit 'char' to two bits. Experimental results show that our algorithm can gain around a thousand-fold speedup over its serial counterpart on the CPU using NVIDIA C2075 cards.  Back
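The bit-packing trick translates directly into plain Python, with an arbitrary-precision int standing in for the packed 32-bit words and `bin(v).count("1")` standing in for `__popc`. The sample data and helper names below are invented, and the r-squared formula is the standard haplotype-based LD measure:

```python
def pack_binary_alleles(alleles):
    """Pack a list of 0/1 allele calls into one int bitmask."""
    mask = 0
    for i, a in enumerate(alleles):
        if a:
            mask |= 1 << i
    return mask

def ld_r2(x, y, n):
    """Squared correlation r^2 between two packed binary allele vectors.

    All the frequency counts come from popcounts of bitwise ops,
    mirroring the __popc trick: one AND plus one popcount covers
    32 (or 64) samples per machine instruction on the GPU.
    """
    popcount = lambda v: bin(v).count("1")
    pa, pb = popcount(x) / n, popcount(y) / n
    pab = popcount(x & y) / n          # both alleles present
    d = pab - pa * pb                  # LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom else 0.0

x = pack_binary_alleles([1, 0, 1, 1, 0, 0, 1, 0])
y = pack_binary_alleles([1, 0, 1, 0, 0, 1, 1, 0])
print(ld_r2(x, y, 8))  # 0.25
```

The two-bit encoding the abstract mentions would use a second bitmask per site (e.g. heterozygous vs. homozygous), but the popcount pattern is the same.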
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID P3153
Download:
 
Parallel strategies for identifying genetic networks describing the biological clock using GPUs
Ahmad Al-Omari (Institute of Bioinformatics/The University of Georgia)
A graphics processing unit (GPU) offers a solution to a very important and fundamental problem of interest to many researchers, a problem that would be prohibitive to solve without the technology of GPUs. The problem is how the biological clock contr ...Read More
A graphics processing unit (GPU) offers a solution to a very important and fundamental problem of interest to many researchers, a problem that would be prohibitive to solve without the technology of GPUs. The problem is how the biological clock controls the rhythms of ~2400 genes in the genome (with 11,000 genes) of a model system, the filamentous fungus Neurospora crassa (Dong et al., 2008). Ultimately, we want to be able to predict and hence understand the dynamics of all of these genes and their products in a genetic network describing how the clock functions.   Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID P3194
Download:
 
An Ultra-Fast Computing Pipeline For Metagenome Analysis With GPUs
Shuji Suzuki (Tokyo Institute of Technology)
Metagenome analysis is useful not only for understanding symbiotic systems but also for monitoring environmental pollution. However, metagenome analysis requires sensitive sequence homology searches, which demand large computation time and are thus a bott ...Read More
Metagenome analysis is useful not only for understanding symbiotic systems but also for monitoring environmental pollution. However, metagenome analysis requires sensitive sequence homology searches, which demand large computation time and are thus a bottleneck in current metagenome analysis based on data from the latest DNA sequencers, generally called next-generation sequencers. To solve this problem, we developed a large-scale computing pipeline for metagenome analysis on TSUBAME2 at the Tokyo Institute of Technology.   Back
 
Keywords:
Bioinformatics & Genomics, Supercomputing & HPC, GTC 2013 - ID P3196
Download:
 
G-MSA - a Powerful GPU-based Tool for Multiple Sequence Alignment
Wojciech Frohmberg (Poznan University of Technology)
The life sciences face a great number of problems that computer-based tools solve automatically. These methods, although accurate and effective, need to face the rapidly increasing size of input datasets. This also applies to the Multiple Sequence ...Read More
The life sciences face a great number of problems that computer-based tools solve automatically. These methods, although accurate and effective, need to face the rapidly increasing size of input datasets. This also applies to the Multiple Sequence Alignment problem. G-MSA, our tool addressing this problem, is able to handle growing input instances effectively. This has been achieved by adapting an existing algorithm to a distributed environment of powerful machines equipped with multiple graphics cards. The poster outlines the mechanisms that allow G-MSA to deal with a massive number of simultaneously working threads and presents the algorithmic tricks behind the tool's accuracy.   Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID P3217
Download:
 
BCL: ChemInfo - GPU-Accelerated Cheminformatics Suite for Probe Development and Drug Discovery
Edward Lowe (Vanderbilt University)
With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, methods for Ligand-Based Computer-Aided Drug Discovery (LB-CADD) have the potential to accelerate, reduce the cost of, and increase the quality of probe deve ...Read More
With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, methods for Ligand-Based Computer-Aided Drug Discovery (LB-CADD) have the potential to accelerate, reduce the cost of, and increase the quality of probe development and drug discovery efforts. From a computational science and technology perspective, the increased public availability of large HTS data sets stimulates the development of innovative LB-CADD tools that should then be applied in academic research. Here, we present BCL::ChemInfo, a cheminformatics framework featuring GPU acceleration, MySQL integration, and automation of model optimization. We present several current studies leveraging BCL::ChemInfo against targets indicated in cancer, malaria, and neuroscience.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID P3241
Download:
 
Acceleration of Biological Circuit Reconstruction: Biological Clock System in Neurospora Crassa
Chulwoo Lim (University of Georgia)
A fundamental and ubiquitous difficulty of systems biology is identifying relevant model parameters. A genetic network model of the biological clock of Neurospora crassa that is quantitatively consistent with the available RNA and protein profiling d ...Read More
A fundamental and ubiquitous difficulty of systems biology is identifying relevant model parameters. A genetic network model of the biological clock of Neurospora crassa that is quantitatively consistent with the available RNA and protein profiling data was proposed. However, the oscillating nature of biological models poses a greater challenge for identifying model parameters due to the high-dimensional complex search space and the computational cost of numerically solving ODEs. In this work, an evolutionary algorithm leveraging the GPU architecture is proposed. Our implementation identified promising model parameters with a speedup of two orders of magnitude versus a CPU implementation.  Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID P3245
Download:
 
CUDA-Enabled Applications for Next-generation Sequencing
Bertil Schmidt (Johannes Gutenberg University Mainz)
Next-Generation Sequencing (NGS) refers to new technologies for high-throughput DNA sequencing which produce up to billions of DNA or RNA reads in short time and at low cost. To exploit NGS, efficient parallel and scalable algorithms and tools a ...Read More

Next-Generation Sequencing (NGS) refers to new technologies for high-throughput DNA sequencing which produce up to billions of DNA or RNA reads in short time and at low cost. To exploit NGS, efficient parallel and scalable algorithms and tools are needed to process the massive amount of generated reads within a reasonable amount of time. This talk will present several CUDA-enabled algorithms and data structures to accelerate (i) the accurate processing of short/long read alignment to human genomes (i.e. CUSHAW and CUSHAW2) and (ii) the analysis of metagenomic data from microbial environmental sequencing studies (CRiSPy-CUDA and CRiSPy-Embed).

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3004
Streaming:
Download:
 
Ultra Fast Sequence Alignment for the DNA Assembly Problem
Michal Kierzynka (Poznan University of Technology, Poznan Supercomputing and Networking Center)
The goal of this session is to present a software tool performing pairwise sequence alignment of nucleotide sequences as well as GPU optimizations used to achieve the top performance. The software uses the dynamic programming method to efficient ...Read More

The goal of this session is to present a software tool performing pairwise sequence alignment of nucleotide sequences, as well as the GPU optimizations used to achieve top performance. The software uses the dynamic programming method to efficiently compute the exact alignment in a form that may be conveniently used in the DNA de-novo assembly problem. Its uniqueness is also due to the fact that it has been optimized for nucleotide reads coming from modern sequencers (Illumina/Solexa, Roche/454, AB/SOLiD). As a result, it is currently the fastest implementation of the Needleman-Wunsch algorithm, reaching up to 89 GCUPS on a single GPU and scaling well on multi-GPU systems. The following real-world use case will be presented: the application of the software in finding similar sequences in huge datasets coming from the next-generation Illumina sequencer.

  Back
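The Needleman-Wunsch recurrence itself is compact. Here is a plain-Python, score-only version for orientation; the parameters are illustrative, and the session's GPU kernels, anti-diagonal scheduling, and GCUPS-level optimizations are not reproduced:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global-alignment score (Needleman-Wunsch), row by row.

    Unlike Smith-Waterman there is no clamp at zero, and the first
    row and column carry accumulating gap penalties, so the whole
    of both sequences must be aligned end to end.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * gap]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

print(needleman_wunsch("GATTACA", "GCATGCU"))  # 0 (the classic textbook pair)
```

Throughput figures like GCUPS (giga cell updates per second) count how many cells of this matrix are filled per second; the inner `max` is exactly one cell update.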
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3025
Streaming:
Download:
 
A Scalable Short-read Sequence Aligner Using a CUDA Kernel Pipeline
Richard Wilton (Johns Hopkins University -- Department of Physics and Astronomy)
The Department of Physics and Astronomy at Johns Hopkins University is currently constructing a new computer cluster to facilitate high-throughput data-intensive computation on terabyte-scale data, including the analysis of genomic sequence data ...Read More

The Department of Physics and Astronomy at Johns Hopkins University is currently constructing a new computer cluster to facilitate high-throughput data-intensive computation on terabyte-scale data, including the analysis of genomic sequence data. Compute nodes in the cluster contain multiple CPU cores, 100GB or more of system RAM, and one or more GPUs; a prototype node is implemented with 12 CPU cores (24 hyperthreads), 144GB of RAM, and four NVIDIA C2070s. In this session we will describe the design of a genomic sequence-alignment application that targets the cluster compute-node hardware. We will discuss the algorithms we use and how they are implemented as CUDA kernels, point out the key optimizations in the implementation, and look at the performance of the software.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3092
Streaming:
Download:
 
GWIS-GS: GPU-Accelerated Screening Platform for Second Order Genome Wide Interaction Search
Qiao Wang (National ICT Australia Victoria Lab), Adam Kowalczyk (Victorian Research Laboratory of National ICT Australia)
This talk will show a developed platform enabling an exhaustive analysis of all pairwise (2nd order) interactions of Single Nucleotide Polymorphisms (SNPs) in Genome Wide Association Studies (GWAS) data. Given typical datasets of 300K SNPs and 3 ...Read More

This talk will show a developed platform enabling an exhaustive analysis of all pairwise (2nd order) interactions of Single Nucleotide Polymorphisms (SNPs) in Genome Wide Association Studies (GWAS) data. Given typical datasets of 300K SNPs and 3K samples, our GPU-accelerated solution is capable of completing the search in under 3 minutes on a single NVIDIA GTX470. The method involves construction of contingency tables for all SNP pairs followed by a battery of conventional statistical tests such as Fisher-Exact and Variance Explained. All previous implementations described in the literature required hours, days or even months to complete the same analysis. In addition, we will present an interface that allows users to define their own statistical tests at runtime and describe our latest developments towards a practical 3rd order implementation.

  Back
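The per-table statistic is simple enough to sketch. Below is a stdlib-only, two-sided Fisher exact test for a 2x2 table; real GWIS tables are genotype-by-genotype contingency tables batched by the millions on the GPU, so this is for orientation only and the example counts are made up:

```python
from math import comb

def fisher_exact_p(table):
    """Two-sided Fisher exact p-value for a 2x2 contingency table.

    Sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_of(x):  # probability of the table whose top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_of(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    return sum(p_of(x) for x in range(lo, hi + 1)
               if p_of(x) <= p_obs + 1e-12)

print(round(fisher_exact_p([[1, 9], [11, 3]]), 4))  # 0.0028
```

Because each SNP pair's table and test are independent of every other pair's, the exhaustive 2nd-order scan is embarrassingly parallel, which is what makes the GPU timing quoted above possible.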
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3169
Streaming:
Download:
 
Unveiling Cellular Mechanisms Using GPU-based Sparse Linear Algebra
Marco Maggioni (University of Illinois at Chicago)
In this session we present an innovative systems biology application of GPU computing, as an alternative to molecular dynamics simulation for studying biochemical mechanisms inside the cells. For the first time we are able to apply the Chemical M ...Read More

In this session we present an innovative systems biology application of GPU computing, as an alternative to molecular dynamics simulation for studying biochemical mechanisms inside cells. For the first time we are able to apply the Chemical Master Equation (CME) stochastic framework at large scale, determining both the probabilistic steady state and the transient dynamics of biochemical reaction networks. Our GPU implementation leverages the structure of the problem to optimize the sparse linear algebra routines needed by the stochastic model. As a result, we achieve an average 15.57x speedup over the optimized Intel MKL library running on a 64-core architecture.

  Back
 
Keywords:
Bioinformatics & Genomics, Quantum Chemistry, GTC 2013 - ID S3245
Streaming:
Download:
 
Tackling Big Data in Genomics with GPU
BingQiang Wang (Beijing Genomics Institute)
GPUs can help scientists find new clues for curing cancer and other diseases in massive genomics data. Computing and mining over huge volumes of genomic and derived data has become a major challenge. Compression reduces data volume and enables ...Read More

GPUs can help scientists find new clues for curing cancer and other diseases in massive genomics data. Computing and mining over huge volumes of genomic and derived data has become a major challenge. Compression reduces data volume and enables more efficient access; GPU-accelerated versions of typical compression algorithms achieve speedups of around 2-10x. Integrating the Hadoop framework with GPUs is also very promising for large-scale analysis over big data such as genome-wide association studies (GWAS), making the entire analysis more balanced in terms of its computing-to-data-access ratio.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3257
Streaming:
Download:
 
Using SIMD Video Instructions to Achieve 200 GCUPs with K10 for Smith-Waterman
Erich Elsen (Royal Caliber)
Learn how to use the SIMD video instructions introduced with the Kepler architecture to accelerate Smith-Waterman and Needleman-Wunsch DNA sequence alignment. Speedups of up to 3x over scalar code are possible - new code achieves over 80 GCUPs o ...Read More

Learn how to use the SIMD video instructions introduced with the Kepler architecture to accelerate Smith-Waterman and Needleman-Wunsch DNA sequence alignment. Speedups of up to 3x over scalar code are possible: the new code achieves over 80 GCUPs on a GeForce GTX 680 and close to 150 GCUPs on a Tesla K10 GPU accelerator. The specific implementation targets the case of performing many independent alignment problems of length < 1024 simultaneously; however, the techniques discussed are generally applicable to any sequence alignment problem. SIMD video instructions allow one to split a 32-bit register into two 16-bit or four 8-bit parts and operate on them independently.
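The register-splitting idea in the last sentence can be emulated in scalar code with the classic SWAR trick. A minimal sketch in plain Python (illustrative only, not the Kepler instruction itself; `vadd4` is named after the CUDA intrinsic it mimics):

```python
# Emulate a 4-way 8-bit SIMD add inside a 32-bit word (like Kepler's vadd4).
# Each byte lane wraps modulo 256 independently; carries must not cross lanes.

def pack4(bytes4):
    """Pack four 8-bit values (lane 0 = least significant) into one 32-bit word."""
    b0, b1, b2, b3 = bytes4
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)

def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def vadd4(a, b):
    """Lane-wise 8-bit addition of two packed 32-bit words.

    SWAR trick: add the low 7 bits of each lane (which can never carry
    into the next lane), then fix up the top bit of each lane with XOR.
    """
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)   # no cross-lane carry possible
    return low ^ ((a ^ b) & 0x80808080)         # restore the per-lane top bits

x = pack4([250, 3, 100, 0])
y = pack4([10, 4, 100, 255])
lanes = unpack4(vadd4(x, y))  # each lane wraps mod 256
```

Packing four cells of the alignment score matrix per word is what yields the quoted ~3x over scalar code.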

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3279
Streaming:
Download:
 
Computing Protein Size Distributions Using Centrifugation Techniques and the Tesla K20 GPU
Robert Zigon (Beckman Coulter)
Analytical Ultracentrifugation is a technique used to compute attributes of a protein like gross shape, sample heterogeneity or size. By applying a centrifugal force to the sample and simultaneously measuring the distribution, we can use first p ...Read More

Analytical Ultracentrifugation is a technique used to compute attributes of a protein like gross shape, sample heterogeneity or size. By applying a centrifugal force to the sample and simultaneously measuring the distribution, we can use first principles to derive the relative molecule sizes. Learn how the solution to the resulting regularized least squares problem can be computed in real time with the Tesla K20.

  Back
 
Keywords:
Bioinformatics & Genomics, Quantum Chemistry, GTC 2013 - ID S3330
Streaming:
Download:
 
Implementing Modern Short Read DNA Alignment Algorithms in CUDA
Jonathan Cohen (NVIDIA)
Because of their inherently parallel and high-throughput nature, NVIDIA GPUs are a natural fit for the types of data-intensive computing required in bioinformatics applications. For many genomics applications, the primary challenge is to map hig ...Read More

Because of their inherently parallel and high-throughput nature, NVIDIA GPUs are a natural fit for the types of data-intensive computing required in bioinformatics applications. For many genomics applications, the primary challenge is to map highly divergent and control-flow-heavy code to a SIMD architecture. By transforming complex serial flow of control into a sequence of communicating sequential processors running in parallel, we are able to achieve high throughput on very branchy code, while maintaining memory coherence and avoiding execution divergence. I will present initial results from NVIDIA's internal "nvbio" project to develop efficient computational building blocks for analysis of Next-Generation Sequencing data, with a focus on implementations of BWA and Bowtie2-type aligners.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2013 - ID S3580
Streaming:
Download:
 
GPU Accelerated Signal Processing in Ion Proton Whole Genome Sequencer
Mohit Gupta (Life Technologies), Jakob Siegel (Life Technologies)
Learn how GPUs are enabling whole-genome sequencing by accelerating the primary data analysis pipeline of the benchtop Ion Proton sequencer. Leveraging the compute power of GPUs to process the high-throughput data generated by this sequencer for a fast, ...Read More

Learn how GPUs are enabling whole-genome sequencing by accelerating the primary data analysis pipeline of the benchtop Ion Proton sequencer. We leverage the compute power of GPUs to process the high-throughput data generated by this sequencer in a fast, scalable and cost-effective desktop compute solution, helping to democratize DNA sequencing and accelerate the path towards personalized medicine. In this talk, the implementation of data-fitting algorithms on the GPU and a streaming execution model to overlap data transfer and kernel execution for this high-throughput system will be discussed. We will also explain how the algorithms were changed to suit the GPU compute model while still maintaining the quality of the results.

  Back
 
Keywords:
Bioinformatics & Genomics, Developer - Algorithms, GTC 2013 - ID S3229
Streaming:
Download:
 
Accelerating Computational Genomics and Other Best Practices Using OpenACC
Florent Lebeau (CAPS Entreprise), Stephane Chauveau (CAPS Entreprise)
 
Keywords:
Bioinformatics & Genomics, Developer - Programming Languages, GTC Webinars 2012 - ID GTCE016
Download:
 
Introduction to SeqAn, an Open-source C++ Template Library
Knut Reinert (Freie Universität Berlin)
SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix ...Read More

SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA) or an FM-index, as well as algorithms for fast and accurate alignment and read mapping. Based on those data types and fast I/O routines, users can easily develop tools that are extremely efficient and easy to maintain. Besides multi-core, the research team at Freie Universität Berlin has started adding generic support for accelerators such as NVIDIA GPUs.

In this webinar, Knut Reinert, Professor, Freie Universität Berlin will introduce SeqAn and string indices, then explain his team’s generic parallelization concept and end with details on how they achieved a speedup of up to 47x using an FM-index on an NVIDIA Tesla K20.
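The backward search that makes FM-index queries fast is short enough to sketch. A toy plain-Python illustration (not SeqAn's C++ implementation; `occ` is recomputed per query here rather than stored in rank tables as a real FM-index would):

```python
def bwt(text):
    """Burrows-Wheeler transform of text (must end with the sentinel '$')."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def fm_count(bwt_str, pattern):
    """Count occurrences of pattern in the original text via backward search.

    C[c] = number of characters in the text smaller than c;
    occ(c, i) = occurrences of c in bwt_str[:i].
    """
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):          # extend the match one char at a time
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

The GPU win comes from running many such backward searches (one per read) in parallel against a shared index.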

  Back
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2013 - ID GTCE059
Download:
 
Folding@home and OpenMM: Using a Cluster of 50,000 GPUs to Simulate Disease Relevant Protein Dynamics
Vijay Pande (Stanford University)
With the combined power of large-scale distributed computing resources such as Folding@home or supercomputers such as Blue Waters or Titan, one can now routinely simulate atomistic protein dynamics on the milliseconds timescale. Join Profes ...Read More

With the combined power of large-scale distributed computing resources such as Folding@home or supercomputers such as Blue Waters or Titan, one can now routinely simulate atomistic protein dynamics on the milliseconds timescale. Join Professor Vijay Pande, Stanford University, as he presents efforts to push the limits of this methodology even further to the seconds timescale for protein folding, as well as to a variety of new applications in protein conformational change. The results of these simulations suggest novel targets for disease intervention (for Alzheimer's and cancer), as well as new biophysical insights into protein dynamics.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2014 - ID GTCE071
Streaming:
 
The Next Steps for Folding@home
Vijay Pande (Stanford University)
Folding@home is a large-scale volunteer distributed computing project, started on October 1, 2000. For over a decade, new types of hardware (such as GPUs, multi-core CPUs, and PS3) and algorithms have been pioneered in order to make significant ...Read More

Folding@home is a large-scale volunteer distributed computing project, started on October 1, 2000. For over a decade, new types of hardware (such as GPUs, multi-core CPUs, and the PS3) and algorithms have been pioneered in order to make significant advances in our ability to simulate diseases at the molecular scale. Join Professor Vijay Pande from Stanford University for a brief introduction to the goals of Folding@home, followed by its successes so far. Prof. Pande will end with a discussion of what's being done today, as well as plans for greatly enhancing what Folding@home can do through new initiatives currently under way.

  Back
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2014 - ID GTCE082
Streaming:
 
Restricting the Seed-and-Extend Search Space in GPU-Based Short-Read Alignment
Richard Wilton (Johns Hopkins University)
Most research into the use of GPUs for biological sequence alignment has focused on the choice and implementation of appropriate parallel algorithms for sequence matching. This strategy has yielded a number of GPU-based implementations with speeds 5 ...Read More
Most research into the use of GPUs for biological sequence alignment has focused on the choice and implementation of appropriate parallel algorithms for sequence matching. This strategy has yielded a number of GPU-based implementations with speeds 5 to 10 times faster than CPU implementations with comparable sensitivity and mapping quality. We have taken a different approach to the use of GPUs by implementing a series of CUDA kernels that filter the set of reference locations at which to compute seed-and-extend alignments, thereby decreasing the amount of parallel sequence-matching computation and improving the overall throughput of the GPU/CPU pipeline. Even without extreme CUDA code optimization, we observe increased sensitivity (i.e., a larger number of reported valid mappings) with throughput as good as or better than existing GPU-based sequence aligners.  Back
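The filtering idea can be sketched with a toy seed index. A hypothetical plain-Python illustration (names like `min_seeds` are illustrative, not from the aligner described above): hash the read's k-mers, vote on candidate alignment diagonals, and pass only well-supported reference offsets on to the expensive extend step.

```python
from collections import defaultdict

def build_index(ref, k):
    """Map each k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def filter_candidates(read, ref, k=4, min_seeds=2):
    """Vote on alignment diagonals (ref position - read position) and keep
    only offsets supported by at least min_seeds seed hits, so the costly
    extend step runs on far fewer reference locations."""
    index = build_index(ref, k)
    votes = defaultdict(int)
    for q in range(len(read) - k + 1):
        for r in index.get(read[q:q + k], []):
            votes[r - q] += 1
    return sorted(d for d, v in votes.items() if v >= min_seeds)
```

Spurious single-seed hits are discarded before any dynamic-programming extension is attempted, which is the throughput lever the abstract describes.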
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4248
Streaming:
Download:
 
Parallel Implementation of PK-PD Parameter Estimation on GPU Using Grid Search Method
Nishant Agrawal (Tata Consultancy Services Limited), Rihab Abdulrazak (Tata Consultancy Services Limited)
The goal of this session is to showcase the performance improvements achieved during parallel implementation of PK-PD (Pharmacokinetic-Pharmacodynamic) parameter estimation on the GPU. The Grid Search method is being used here for PK-PD model initial parame ...Read More
The goal of this session is to showcase the performance improvements achieved during parallel implementation of PK-PD (Pharmacokinetic-Pharmacodynamic) parameter estimation on the GPU. The Grid Search method is used here for estimating the initial parameters of the PK-PD model. Parallel implementation on GPUs provides much faster solutions to time-consuming problems in the pharma domain, as discovery of new drugs has become increasingly challenging because of the sheer volume of data. Parallelizing the serial version of the application on the GPU, keeping device architectural aspects in mind, helps in achieving high performance, i.e. in reducing the overall execution time. This talk is about the stepwise approaches used to optimize the application further and to leverage Tesla and Kepler hardware architecture capabilities for high performance. A substantial improvement in execution time was observed after the parallel implementation.  Back
 
Keywords:
Bioinformatics & Genomics, Clusters & GPU Management, Supercomputing & HPC, GTC 2014 - ID S4396
Streaming:
 
Hybrid Clustering Algorithms for Degenerate Primer Development on the GPU
Trevor Cickovski (Eckerd College)
Analyzing portions of a genome during Polymerase Chain Reaction (PCR) analysis requires construction of a primer sequence that is complementary to the flanking regions of a target sequence, producing multiple copies of that portion of the genome. When ...Read More
Analyzing portions of a genome during Polymerase Chain Reaction (PCR) analysis requires construction of a primer sequence that is complementary to the flanking regions of a target sequence, producing multiple copies of that portion of the genome. When analyzing multiple related genomes the primer must be degenerate, containing an amount of uncertainty that we must minimize. We use graphics processing units (GPUs) to analyze the performance of a parallelized hierarchical clustering algorithm for grouping related genomes prior to degenerate primer construction, and also hybridize this algorithm with strategies from K-Means and Fuzzy C-Means. We demonstrate an order-of-magnitude improvement when running these algorithms on nearly one thousand sequences of more than seven thousand nucleotides from the human genome.   Back
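The "uncertainty we must minimize" has a standard quantitative form. A small plain-Python sketch (background illustration, not the clustering code above): a primer's degeneracy is the product of per-position IUPAC ambiguity counts, and clustering related genomes before building the consensus keeps that product small.

```python
# Number of concrete bases each IUPAC ambiguity code stands for.
IUPAC = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def degeneracy(primer):
    """Number of concrete sequences a degenerate primer represents."""
    d = 1
    for base in primer:
        d *= IUPAC[base]
    return d

def consensus_code(bases):
    """IUPAC code covering the set of bases observed at one position."""
    table = {frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G",
             frozenset("T"): "T", frozenset("AG"): "R", frozenset("CT"): "Y",
             frozenset("CG"): "S", frozenset("AT"): "W", frozenset("GT"): "K",
             frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
             frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N"}
    return table[frozenset(bases)]
```

Grouping similar sequences first means fewer distinct bases per column, hence lower degeneracy per cluster-specific primer.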
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4424
Streaming:
Download:
 
GPU-Based Bayesian Phylogenetic Inference Beyond Extreme Scale
Mitchel Horton (Georgia Institute of Technology)
See how researchers can, for the first time, infer phylogenetic trees of unlimited size using the bonanza of biological sequence data available to them today. We will present a phylogenetic inference approach that combines an existing GPU-based Bay ...Read More
See how researchers can, for the first time, infer phylogenetic trees of unlimited size using the bonanza of biological sequence data available to them today. We will present a phylogenetic inference approach that combines an existing GPU-based Bayesian phylogenetic reconstruction application (BEAST/BEAGLE) with the notion of performing an independent Markov chain Monte Carlo (MCMC) run on any number of GPUs, on any number of nodes, of an HPC GPU cluster of any size. The approach will be shown to scale indefinitely for sufficiently large problems. In addition, we will present a new batch matrix-matrix product CUDA kernel used for the matrix exponentiation at the heart of the phylogenetic inference algorithm.  Back
 
Keywords:
Bioinformatics & Genomics, Numerical Algorithms & Libraries, Supercomputing & HPC, GTC 2014 - ID S4476
Streaming:
Download:
 
Training Random Forests on the GPU: Genomic Implications on HIV Susceptibility
Mark Seligman (Rapidics LLC)
The Random Forest (trademarked) algorithm is a powerful, versatile tool in machine learning. It consists of a training pass which builds a tree-based predictive model from a sample data set, followed by a tree-walking pass to generate predictions fo ...Read More
The Random Forest (trademarked) algorithm is a powerful, versatile tool in machine learning. It consists of a training pass which builds a tree-based predictive model from a sample data set, followed by a tree-walking pass to generate predictions for new data. Recent efforts at acceleration have focused on the independence of both the construction, and walking, of distinct trees using, for example, multi-CPU and Hadoop-based approaches. Here, by contrast, we report progress in parallelizing the construction of individual trees themselves using the GPU. This enables the algorithm to treat very wide data sets, such as those common in genomic studies, in times significantly shorter than have been reported before now. This also makes practical iterative invocation and enables, for example, reweighted and variational applications of the algorithm. We demonstrate recent results on studies of HIV-susceptibility in subjects from Sub-Saharan Africa.   Back
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, Machine Learning & Deep Learning, Supercomputing & HPC, GTC 2014 - ID S4502
Streaming:
Download:
 
GPU Accelerated Genomics Data Compression
BingQiang Wang (BGI)
A review of existing compression algorithms is given, along with characteristics of genomics data formats. Then a general GPU-accelerated compression framework is introduced, featuring 1) adaptive compression scheme tuning, 2) optimized, GPU-accelerated co ...Read More
A review of existing compression algorithms is given, along with characteristics of genomics data formats. Then a general GPU-accelerated compression framework is introduced, featuring 1) adaptive compression-scheme tuning, 2) optimized, GPU-accelerated compression algorithms, and 3) column-major storage. This approach fully exploits the similarity within individual columns of popular genomics data formats by applying an appropriate compression scheme (a combination of algorithms) to each column; the GPU is then employed to speed up compression and decompression, yielding several-fold higher bandwidth.  Back
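The column-major idea is easy to demonstrate end to end. A toy plain-Python sketch (illustrative, not the framework above): transpose row-oriented records into columns and run-length encode each column, which pays off because columns of genomics records (e.g. chromosome names, quality values) are far more self-similar than whole rows.

```python
def rle(column):
    """Run-length encode one column of values as [value, count] pairs."""
    out = []
    for v in column:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

def compress_columns(records):
    """Transpose row-oriented records into columns, then RLE each column.

    Column-major storage groups similar values together, so a simple
    per-column scheme compresses far better than row-by-row encoding.
    """
    columns = list(zip(*records))
    return [rle(col) for col in columns]
```

A real framework would pick a different codec per column (the "adaptive scheme tuning" above); RLE stands in for the simplest such choice.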
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4526
Streaming:
Download:
 
GPU Enables Bivariate and Trivariate Routine Analysis of Case-Control GWAS
Adam Kowalczyk (National ICT Australia), Qiao Wang (National ICT Australia)
Genome-wide association studies (GWAS) examine millions of DNA loci in an attempt to associate DNA mutations with a given disease. Complex aetiologies of many common diseases are believed to involve combinations of different genes, requiring evaluati ...Read More
Genome-wide association studies (GWAS) examine millions of DNA loci in an attempt to associate DNA mutations with a given disease. Complex aetiologies of many common diseases are believed to involve combinations of different genes, requiring evaluation of trillions of (non-additive) combinations of loci. We have developed solutions using a single GPU to evaluate association of all bivariate features within minutes (available via a free web service). Although exhaustive trivariate analysis currently requires a GPU cluster, focused trivariate analysis can be accomplished routinely on a single GPU within hours.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4592
Streaming:
Download:
 
GPU-Accelerated Algorithms in Bioinformatics and Data Mining
Bertil Schmidt (Johannes Gutenberg University Mainz)
The development of scalable algorithms and tools is of high importance to bioinformatics and data mining. In this session, you will learn about the efficient usage of CUDA to accelerate prominent algorithms in both areas. In particular, GPU-accelera ...Read More
The development of scalable algorithms and tools is of high importance to bioinformatics and data mining. In this session, you will learn about the efficient usage of CUDA to accelerate prominent algorithms in both areas. In particular, GPU-acceleration of the following methods will be discussed: (1) Smith-Waterman algorithm on Kepler (CUDASW++ 3.0) compared to an equivalent Xeon Phi implementation (SWAPHI); (2) Short-read alignment (CUSHAW2-GPU and CUSHAW3); (3) Clustering of protein structures; (4) Alignment of time series with a Dynamic Time Warp inspired similarity measure; and (5) an effective scalable clustering algorithm for large data sets that builds upon the concept of divide-and-conquer.   Back
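As background for point (1), the Smith-Waterman scoring recurrence fits in a few lines. A plain-Python reference sketch (linear-gap scoring with illustrative default penalties, not the CUDASW++ kernel):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Classic O(len(a) * len(b)) dynamic program with linear gap penalty;
    cells are clamped at 0 so an alignment can restart anywhere.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0,                # restart the local alignment
                         prev[j - 1] + s,  # diagonal: match/mismatch
                         prev[j] + gap,    # gap in b
                         cur[j - 1] + gap) # gap in a
            best = max(best, cur[j])
        prev = cur
    return best
```

GPU versions parallelize over anti-diagonals of this matrix or, as in CUDASW++ 3.0, over many independent query/subject pairs at once.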
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4603
Streaming:
 
Current Uses and Future Prospects of Many-Core GPUs for High-Throughput Sequencing Data Analyses
Brian Lam (Cambridge University)
High-throughput sequencing (HTS) instruments can produce enormous amounts of information about our genome in a short period of time and enable us to better understand the biology of our everyday lives. HTS also poses a substantial challenge to the IT ...Read More
High-throughput sequencing (HTS) instruments can produce enormous amounts of information about our genome in a short period of time and enable us to better understand the biology of our everyday lives. HTS also poses a substantial challenge to IT infrastructure and human resources: analyzing data from these instruments often involves the use of high-performance computing (HPC) clusters and expertise from interdisciplinary professionals who are literate in both biology and computing, restricting access to the technology to large, well-established laboratories. Many-core architectures, as seen in many high-end graphics processing units, or GPUs, may provide an answer to this challenge. Packed with thousands of cores on a physical chip, a GPU can be just as quick as a small HPC cluster in many cases. In this session, we will explore the use of GPUs in accelerating the data analysis pipeline associated with HTS and investigate its future in this area.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4623
Streaming:
Download:
 
BWT Indexing: Big Data from Next Generation Sequencing and GPU
Jeanno Cheung (HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory)
With the rapid improvement in DNA sequencing technologies, huge amounts of sequencing data can be produced in a time- and cost-efficient manner (e.g., it costs only a few thousand US dollars to produce 100 Gigabases in a day). Compressed full-text ind ...Read More
With the rapid improvement in DNA sequencing technologies, huge amounts of sequencing data can be produced in a time- and cost-efficient manner (e.g., it costs only a few thousand US dollars to produce 100 Gigabases in a day). Compressed full-text indexing based on the BWT has been found to be very useful in speeding up the analysis of high-throughput sequencing data. In this talk we consider two major problems in this context, namely, alignment of sequencing data onto a reference genome (for genetic variation detection), and indexing of sequencing data. These two problems have different applications and different technical challenges. We show how the GPU can be exploited to achieve tremendous improvement in each case. In particular, our alignment solution makes it feasible to conduct NGS analysis even in the time-critical clinical environment; for example, 30+ fold whole genome sequencing data of a human (~100 Gigabases) can be aligned and analyzed in a few hours, with sensitivity and accuracy even higher than before.  Back
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics, GTC 2014 - ID S4628
Streaming:
Download:
 
Accelerating the DNA Sequencing Variant Calling Pipeline
Mauricio Carneiro (Broad Institute of MIT and Harvard)
Learn about the best-practice variant calling pipeline that drives every DNA sequencing project in the world, be it for research, industry, or diagnosing a patient in critical condition. Here we present the different approaches to optimize and acceler ...Read More
Learn about the best-practice variant calling pipeline that drives every DNA sequencing project in the world, be it for research, industry, or diagnosing a patient in critical condition. Here we present the different approaches to optimize and accelerate key parts of this pipeline. First we will give you an overview of the process and how researchers around the world are using DNA sequencing data to understand complex and rare variants and their associations with disease. Second we will show you the work we have done to speed up this pipeline through the use of GPUs and other technologies. Third we will discuss a new version of the pipeline that takes advantage of the optimizations to enable incremental analysis, that is, leveraging all historical data on every new sequencing project with minimal overhead. We close this presentation by discussing the many points that are still open for optimization and how the community can get involved.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4679
Streaming:
 
Introducing NVBIO: High Performance Primitives for Computational Genomics
Jonathan Cohen (NVIDIA), Nuno Subtil (NVIDIA)
Learn about NVIDIA's new open source CUDA/C++ library for high-performance computational genomics, NVBIO. NVBIO includes primitives for fast alignment using many variants of Smith-Waterman, text indexing via an FM-Index and related data structures, ...Read More
Learn about NVIDIA's new open source CUDA/C++ library for high-performance computational genomics, NVBIO. NVBIO includes primitives for fast alignment using many variants of Smith-Waterman, text indexing via an FM-Index and related data structures, and approximate string matching with backtracking. It also provides basic services like file IO and inter-thread communication. The design of NVBIO supports pipeline parallelism, where computation is expressed as a sequence of stages with queues to communicate between stages. Using this design concept, we have engineered an implementation of the Bowtie2 aligner on top of NVBIO, which aligns short read data 2-7x faster than the original Bowtie2 running on a high-end multicore CPU at comparable quality. In this talk we will introduce the codebase and demonstrate how to use it for your own applications.  Back
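The stage-and-queue design described here can be mimicked on the CPU with ordinary threads. A minimal plain-Python sketch (an analogy to NVBIO's pipeline model, not its CUDA implementation): each stage is a thread that pulls from an input queue and pushes to an output queue, so all stages run concurrently on different items.

```python
import queue
import threading

def pipeline(items, stages):
    """Run stage functions as concurrent threads linked by FIFO queues.

    Each stage consumes from qs[i] and produces into qs[i + 1]; a DONE
    sentinel propagates through the pipeline to shut every stage down.
    """
    qs = [queue.Queue() for _ in range(len(stages) + 1)]
    DONE = object()

    def worker(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is DONE:
                q_out.put(DONE)   # forward shutdown to the next stage
                return
            q_out.put(fn(item))

    threads = [threading.Thread(target=worker, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        qs[0].put(item)
    qs[0].put(DONE)

    results = []
    while True:
        out = qs[-1].get()
        if out is DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

With one worker per stage and FIFO queues, item order is preserved while stage N works on item k as stage N+1 works on item k-1.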
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4741
Streaming:
Download:
 
Solving Large Nonlinear Systems of ODEs With Hierarchical Structure Using Multi-GPGPUs and an Adaptive Runge-Kutta (ARK) Method
Ahmad Al-Omari (The University of Georgia)
The Adaptive Runge-Kutta (ARK) method on multiple general-purpose graphics processing units (GPGPUs) is used for solving large nonlinear systems of first-order ordinary differential equations with over ~10,000 variables describing a large genetic network ...Read More
The Adaptive Runge-Kutta (ARK) method on multiple general-purpose graphics processing units (GPGPUs) is used for solving large nonlinear systems of first-order ordinary differential equations with over ~10,000 variables, describing a large genetic network in systems biology for the biological clock. To carry out the computation of the trajectory of the system, a hierarchical structure of the ODEs is exploited, and an ARK solver is implemented in CUDA/C++ on GPGPUs (Kepler K20X). The result is a 75-fold speedup for calculations of 2436 independent modules within the genetic network describing clock function, relative to a comparable CPU architecture.   Back
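The adaptive part of an ARK solver is the step-size control. A minimal plain-Python sketch (step-doubling error control around classical RK4; the tolerance and growth factors are illustrative, not the poster's scheme):

```python
import math

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def adaptive_rk4(f, t0, y0, t_end, h=0.1, tol=1e-8):
    """Step-doubling error control: compare one step of size h against two
    steps of size h/2; shrink h when the difference exceeds tol, grow it
    when the step is much more accurate than needed."""
    t, y = t0, y0
    while t < t_end:
        h = min(h, t_end - t)
        one = rk4_step(f, t, y, h)
        two = rk4_step(f, t + h / 2, rk4_step(f, t, y, h / 2), h / 2)
        err = abs(one - two)
        if err <= tol:
            t, y = t + h, two      # accept the more accurate result
            if err < tol / 10:
                h *= 1.5           # step was very accurate: grow h
        else:
            h /= 2                 # reject and retry with a smaller step
    return y

# Solve dy/dt = -y on [0, 1]; the exact answer is exp(-1).
y1 = adaptive_rk4(lambda t, y: -y, 0.0, 1.0, 1.0)
err_vs_exact = abs(y1 - math.exp(-1.0))
```

On the GPU, the poster's hierarchical structure lets each of the 2436 independent modules run this accept/reject loop concurrently.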
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4108
Download:
 
Adaptive GenCodex : A Novel Algorithm for Compressing DNA Sequences on Multi-Cores and GPUs
Ajith Padyana (Sri Sathya Sai Institute of Higher Learning, India)
High Performance Computing has become an enabling agent for many scientific domains of investigation. Sequence analysis is a core problem in Bio-Informatics. Various strategies and implementations have been proposed and used to meet the ...Read More
High Performance Computing has become an enabling agent for many scientific domains of investigation. Sequence analysis is a core problem in Bio-Informatics. Various strategies and implementations have been proposed and used to this end, but the amount of data that is generated and needs to be handled remains a big challenge. In this context, from the point of view of computing, "storage" and "communication bandwidth" become critical issues; taken together, these can be termed the "I/O Bottleneck". To overcome this, we propose a compression technique for bio-sequences, thereby reducing the storage requirement and effectively increasing the communication bandwidth. Once this problem is overcome, we need to look at the possibility of performing the essential analysis of bio-sequences in compressed form; unless this is achieved, the task remains half done. We intend to make an oral presentation of our work, which includes a compression strategy on GPUs.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4256
Download:
 
CUSHAW Software Package: Harnessing CUDA-Enabled GPUs for Next Generation Sequencing Read Alignment
Bertil Schmidt (University of Mainz, Germany)
We present CUSHAW2-GPU to accelerate the CUSHAW2 algorithm using CUDA-enabled GPUs. By aligning both simulated and real reads to the human genome, our aligner yields comparable or better performance compared to BWA-SW, Bowtie2 and GEM. Furthermore, C ...Read More
We present CUSHAW2-GPU to accelerate the CUSHAW2 algorithm using CUDA-enabled GPUs. By aligning both simulated and real reads to the human genome, our aligner yields comparable or better performance compared to BWA-SW, Bowtie2 and GEM. Furthermore, CUSHAW2-GPU with a Tesla K20c GPU achieves significant speedups over the multi-threaded CUSHAW2, BWA-SW, Bowtie2 and GEM running on the 12 cores of a high-end CPU, for both single-end and paired-end alignments. In addition, we present some features of CUSHAW3, an extension of CUSHAW2 that further improves the alignment quality of base-space reads and offers new support for color-space reads. For color-space alignment, CUSHAW3 is consistently one of the best aligners compared to SHRiMP2 and BFAST.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4117
Download:
 
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU
Jing Zhang (Virginia Tech)
BLAST, short for Basic Local Alignment Search Tool, is a fundamental algorithm in the life sciences that searches for similarities between a short query sequence and a large set of database sequences. However, with the advent of next-generation seque ...Read More
BLAST, short for Basic Local Alignment Search Tool, is a fundamental algorithm in the life sciences that searches for similarities between a short query sequence and a large set of database sequences. However, with the advent of next-generation sequencing (NGS), the exponential growth of sequence databases is arguably outstripping our ability to analyze the data. Previous studies accelerating BLAST on the GPU used coarse-grained parallel approaches, which are not adapted to the GPU architecture and cannot fully utilize the massively parallel computational capability of the GPU. We propose a faster GPU-BLAST, mapping the most time-consuming phases to the GPU using a fine-grained multithreaded approach.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4214
Download:
 
Modeling the Molecular Basis of Cardiac Arrhythmia
Mohsin Jafri (George Mason University)
Heart disease is the leading cause of death in the developed world and an increasing problem in developing nations. Heart failure accounts for a majority of those affected by heart disease with a high likelihood of death. Death most often results fr ...Read More
Heart disease is the leading cause of death in the developed world and an increasing problem in developing nations. Heart failure accounts for a majority of those affected by heart disease, with a high likelihood of death. Death most often results from cardiac arrhythmia; however, the mechanisms behind the initiation of these fatal arrhythmias are as yet unknown. Using a multi-scale GPU-enabled simulation, we show how stochastic molecular events can trigger a cardiac arrhythmia, using a hierarchy of cellular and tissue models to describe the individual proteins, the heart muscle cell and the geometry of the heart.  Back
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4259
Download:
 
Reverse Engineering of Genome-Scale Biological Networks
Raghuram Thiagarajan (University of Michigan)
The availability of genome-scale data sets in biology presents a great opportunity, as well as a challenge, for computational biologists. Simulation and model-based analysis of such large-scale dynamical systems pose compute-intensive problems. A reverse-engineering algorithm optimized for parallel architectures has been developed to study these dynamical systems. The parallel architecture and processing power of graphics processing units (GPUs) provide a platform for carrying out genome-scale simulations. We show that genome-scale networks can be inferred with this reverse-engineering algorithm in a matter of days on a single Tesla K20 GPU.
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4220
Download:
 
Applying GPU Dynamic Parallelism to High-Performance Normalization of Gene Expressions
Roberto Pinto Souto (National Laboratory for Scientific Computing, LNCC/Brazil)
This work presents methods for computing a quantile normalization (Q-norm) of high-density oligonucleotide array data on GPUs. Our approach focuses on CUDA 5.5, which allows for exploiting dynamic parallelism, and also takes advantage of the considerable processing power offered by the Kepler GPU architecture. We believe that our contribution represents a step forward in providing computational support for a generic Q-norm on large microarray data sets, as well as a low-cost, high-performance alternative to high-end workstation systems.
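The Q-norm procedure itself is simple to state: sort each array, average across arrays at each rank to form a reference distribution, then map every value back to the reference value at its rank. A minimal serial sketch in Python (the function name and positional tie-breaking are illustrative; the poster's CUDA implementation is not shown here):

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per array).

    Ties are broken by position here for simplicity; real microarray
    pipelines usually average the reference values over tied ranks.
    """
    n = len(columns[0])
    # 1. Sort each column independently.
    sorted_cols = [sorted(c) for c in columns]
    # 2. The reference distribution is the mean across columns at each rank.
    ref = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    # 3. Replace each value by the reference value at its rank.
    out = []
    for c in columns:
        order = sorted(range(n), key=lambda i: c[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = ref[rank]
        out.append(normalized)
    return out
```

On a GPU, steps 1 and 3 are per-array sorts and scatters that parallelize across arrays (and, with dynamic parallelism, can be launched from device code), while step 2 is a reduction across arrays.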
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4209
Download:
 
GPMoo: Genomic Selection Related Analyses
Scott Winkleblack (California Polytechnic State University - San Luis Obispo)
We explore the use of GPGPU processing to decrease the runtime of genetic selection algorithms. We present a tool, GenSel, which can be used to efficiently infer the effects of genetic markers on a desired trait or to determine the genomic estimated breeding values (GEBV) of genotyped individuals. GenSel performs Bayesian inference using Gibbs sampling, a Markov chain Monte Carlo (MCMC) algorithm. Parallelizing this algorithm proves technically challenging because there is a loop-carried dependence between successive iterations of the Markov chain.
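The dependence in question is easy to see in a toy Gibbs sampler: each draw conditions on the value just produced, so iteration t cannot begin until iteration t-1 has finished. A sketch in plain Python (a bivariate-normal example for illustration, not GenSel's actual model):

```python
import random

def gibbs_bivariate_normal(n_samples, rho=0.5, seed=0):
    """Gibbs sampling for a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5  # conditional standard deviation
    x = y = 0.0
    chain = []
    for _ in range(n_samples):
        # Each conditional draw uses the most recent value of the other
        # variable -- this is the loop-carried dependence that blocks a
        # naive "one thread per iteration" parallelization.
        x = rng.gauss(rho * y, sd)  # draw x | y
        y = rng.gauss(rho * x, sd)  # draw y | x, with the x just drawn
        chain.append((x, y))
    return chain
```

GPU implementations therefore parallelize *within* an iteration (for example, over markers or individuals) rather than across iterations of the chain.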
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4244
Download:
 
Accelerating Identification of Frequent K-Mers in DNA Sequences With GPU
Shuji Suzuki (Tokyo Institute of Technology)
Identifying the frequencies of k-mers (substrings of length k) in strings is an important sub-problem in many bioinformatics applications. In this research, we propose a new k-mer counting algorithm, based on sorting, that is well suited to GPU computation. We implemented the algorithm in a CPU-GPU heterogeneous environment using CUDA 5.0 and evaluated it on real G. gallus genome sequence data. As a result, our algorithm running on 12 CPU cores and 2 GPUs was 2.4 times faster than the Turtle software on 12 CPU cores.
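Sorting-based k-mer counting maps well to GPUs because sorting and run-length counting are both well-studied parallel primitives. A serial Python sketch of the idea (illustrative only; the authors' CUDA implementation is not shown in the poster text):

```python
def count_kmers_by_sorting(seq, k):
    """Count k-mer frequencies by sorting, then measuring run lengths.

    After sorting, identical k-mers are adjacent, so counting reduces to
    run-length encoding -- on a GPU both steps map to standard parallel
    primitives (e.g., radix sort and reduce-by-key).
    """
    kmers = sorted(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {}
    run_start = 0
    for i in range(1, len(kmers) + 1):
        # A run ends where the k-mer changes (or at the end of the list).
        if i == len(kmers) or kmers[i] != kmers[run_start]:
            counts[kmers[run_start]] = i - run_start
            run_start = i
    return counts
```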
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4190
Download:
 
Parallel Brain Network Analysis Platform
Xiaoming Chen (Tsinghua University, Beijing, China)
In this poster, we introduce a hybrid CPU-GPU platform to accelerate the computation of the human brain connectome. The two main steps of the platform are network construction from non-invasive neuroimaging data and network analysis. Using this platform, you can calculate the correlation matrix, the small-world properties (cluster coefficient and characteristic path length), the modular structure, and the betweenness centrality. The tool also supports probabilistic fiber tracking.
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID P4128
Download:
 
HOOMD-blue 1.0: Easy-to-use and Highly Scalable Molecular Dynamics on GPUs
From the GPU-equipped desktop computer to a supercomputer, learn how to accelerate your MD simulations with HOOMD-blue. Complex fluids, polymers, and nano-particles are only some of the possibilities. In this webinar, Joshua A. Anderson, Senior Research Area Specialist, and Jens Glaser, Research Fellow, at the University of Michigan, will show how you can combine any of the versatile features of HOOMD-blue to meet your research needs, and how you can easily exploit the highly flexible Python script interface in your workflow. The most important new feature in HOOMD-blue v1.0 is its multi-GPU capability, which scales HOOMD-blue's remarkable single-GPU performance to clusters and supercomputers with many GPUs. We will demonstrate how HOOMD-blue scales on the latest generation of high-performance computing systems, and give practical tips for obtaining optimal performance.
 
Keywords:
Bioinformatics & Genomics, GTC Webinars 2014 - ID GTCE099
Streaming:
Download:
Climate, Weather, Ocean Modeling
Presentation
Media
Large-Scale CFD and a Full GPU Implementation of Weather Prediction Code on the TSUBAME Supercomputer
Takayuki Aoki
- Global Scientific Information and Computing Center (GSIC) of Tokyo Institute of Technology (Tokyo Tech)
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2010 - ID SC1014
Download:
 
GPU Considerations for Next Generation Weather Simulations
Thomas Schulthess
- Swiss National Supercomputing Centre
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2010 - ID SC1010
Download:
 
Tsubame 2.0: 2 Petaflops Performance of a GPU Stencil Application
Takayuki Aoki
- Tokyo Institute of Technology
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2011 - ID SC138
Download:
 
Successes and Challenges using GPUs for Weather and Climate Models
Mark Govett
- National Oceanic and Atmospheric Administration
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2011 - ID SC134
Download:
 
GPU-based Operational Weather Model with Horizontal 500m Resolution
Takayuki Aoki
Numerical weather prediction is one of the major applications in high-performance computing and demands fast, high-precision simulation on fine-grained grids. In order to drastically shorten the runtime of a weather prediction code, we have rewritten the entire code from scratch in CUDA for GPU computing. The code, ASUCA, is a high-resolution meso-scale atmosphere model being developed by the Japan Meteorological Agency (JMA) for its next-generation weather forecasting service. A benchmark on 3996 GPUs of TSUBAME 2.0 achieves extremely high performance of 145 Tflops in single precision on a 14368 × 14284 × 48 mesh. With the initial data and boundary conditions currently used in the JMA weather forecast, we have carried out a run with a 500 m horizontal mesh (4792 × 4696 × 48) covering the whole of Japan on 437 GPUs.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1173
Streaming:
 
Unified Modeling System for Seamless Weather and Climate Predictions of Monsoons
Subodh Kumar
We present our ongoing work to design and develop a GPU-based unified modeling system for seamless weather and climate predictions of monsoons. The system design is capable of handling the different time and spatial scales of atmospheric phenomena that are crucial for accurate forecasting of weather and regional climates, and of monsoons in particular. Our focus is on a high-resolution model utilizing accurate approximations on the icosahedral-hexagonal grid. We are also developing parameterizations of fine- and multi-scale moist convective processes, cloud microphysics and precipitation, radiative transfer, hydrology and land surface processes, and atmospheric and oceanic turbulence. Starting with the core of the LMDZ model, we are developing from scratch a parallel version suited to efficient computation on GPUs and CPUs. Another goal of our system design is to relieve the programmer of low-level programming details through a programming model that automatically distributes computation among all available CPUs and GPUs. To this end, we are developing a programming API to unify parallel code development on CPUs and GPUs.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1174
Streaming:
 
GPU Considerations for Next Generation Weather and Climate Simulations
Thomas Schulthess
Numerical weather prediction is among the oldest fields of computational science, and existed before the advent of electronic computing. Thanks to the performance of modern computers, the fidelity of weather simulations has reached a point where they are indispensable in weather forecasting, and they have thus become one of the most economically impactful domains of computational science. Typically, the dynamical cores of weather models are grid-based and memory-bandwidth bound, and thus perform poorly on modern x86-type processors. In this presentation, we will discuss a refactoring project for the COSMO code, which implements a regional climate model used by several weather services and academic institutions worldwide. The dynamical core has been rewritten and is easily portable to multiple architectures, including GPUs. The physics part of the code is being ported to GPUs with OpenACC directives. Preliminary performance results for production-scale problems will be presented. Other contributors to this research include Oliver Fuhrer, Swiss Federal Office of Meteorology and Climatology MeteoSwiss; Tobias Gysi and David Müller, Supercomputing Systems AG; Xavier Lapillonne, Center for Climate Systems Modeling, ETH Zurich; and William Sawyer, Ugo Varetto, and Mauro Bianco, Swiss National Supercomputing Center.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1175
Streaming:
 
GRAPES Weather Code Porting to and Optimization on the GPU Platform
Bin Zhou
In this session, we will discuss the GRAPES weather model and the basic techniques for porting it to the GPU platform, covering the process from initial porting through low-level optimization. The MPI+CUDA pattern will be discussed, and four different modules, including GCR, Radiation, WSM6, and PBL, will be demonstrated. Performance considerations will be discussed and results shown. The session serves as a good example of a real-life porting procedure for a scientific application.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1176
Streaming:
 
GPU Computing in Numerical Space Weather Modeling
Xueshang Feng
Space weather refers to conditions on the sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of space-borne and ground-based technological systems and that can affect human life or health. Space weather has two focal points: scientific research and applications. In order to make real-time, or faster than real-time, numerical predictions of adverse space weather events and their influence on the geospace environment, high-performance computational models are required. The main objective of this talk is to show how programmable GPUs can be used in numerical space weather modeling and its visualization. As an example, GPU programming is applied to our Solar-Interplanetary-CESE MHD (SIP-CESE MHD) model and to the visualization of its numerical results in a study of the solar corona. Our initial tests with available hardware show speedups of roughly 10x compared to the traditional software implementation. This work presents a novel application of GPUs to space weather studies.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC Asia 2011 - ID GTCA1177
Streaming:
 
Real Time GPU-Based Marine Scenes Simulation
Jerome Graindorge (ALYOTECH), Julien Houssay (ALYOTECH)
Marine survey, carried out by sea or by air, is of major concern for current defense and security applications. Essential surveillance, observation, and identification systems involve electro-optics (visible and infrared) and radar. Optimizing their performance requires large amounts of expensive observational data spanning the wide variability of the marine environment. Computer simulation provides a valuable, flexible, and inexpensive alternative. Since 2007, ALYOTECH, in partnership with IFREMER (the French Research Institute for Exploration of the Sea), has been developing a GPU-based real-time ocean scene simulator for visible, infrared, and radar sensors, in order to meet the challenging requirements arising from marine survey issues.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID S2053
Streaming:
Download:
 
A Stencil Library for the New Dynamic Core of COSMO
Tobias Gysi (SCS), Peter Messmer (NVIDIA)
We will present a stencil library used at the heart of the COSMO numerical weather prediction model. During the talk, we'll show how we implemented an abstraction that allows easy development of new stencils and solvers on top of a framework supporting execution on both CPU and GPU. The library makes efficient use of GPU resources, and we will show how to structure memory accesses and computation optimally. Developers involved in porting or writing fully featured C++ libraries for CUDA will also be interested in attending.
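For readers unfamiliar with the pattern, a stencil updates each grid point from a fixed neighborhood of its old values; the library's abstraction separates the traversal (which differs between CPU and GPU back-ends) from the per-point update the user writes. A minimal Python illustration of that split (not the library's actual C++ API):

```python
def apply_stencil(grid, stencil):
    """Apply a point-update function over the interior of a 2D grid.

    In a stencil library, this traversal is what the framework generates
    per back-end (nested loops on CPU, one thread per point on GPU); the
    user supplies only the update function.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = stencil(grid, i, j)
    return out

def laplacian(g, i, j):
    # 5-point Laplacian, typical of dynamical-core stencils.
    return g[i - 1][j] + g[i + 1][j] + g[i][j - 1] + g[i][j + 1] - 4.0 * g[i][j]
```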
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID S2256
Streaming:
Download:
 
CUDA/JAVA Model for Gas Line-by-Line Absorption of Atmospheric Radiation
William Godoy (NASA Langley Research Center)
The potential of graphics processing units (GPUs) to speed up the calculation of radiative energy absorption by atmospheric gases is presented. Gas absorption must be calculated at millions of electromagnetic wavelengths to obtain an accurate depiction of the Earth's incoming and outgoing radiative energies. The CUDA/GPU portion computes the gases' Voigt lineshapes, whereas the Java/CPU portion performs efficient I/O on the large HITRAN database of molecular gas parameters. A modular combination of the lower-level CUDA algorithms and the higher-level Java language results in an interface accessible to end users who are not GPU experts.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID P2485
Download:
 
Heat Transfer Ray Tracing with OptiX
Scot Halverson (University of Minnesota Duluth)
QUIC Radiant is part of a suite of GPU-assisted tools developed by our research group that aim to increase knowledge of how the environment and urban form interact. Our hypothesis is that urban structures exist that can minimize energy use while also minimizing air pollution exposure. Our efforts investigate the complex interactions of various types of urban structures by developing design strategies for optimizing urban form under a variety of constraints.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2012 - ID P2495
Download:
 
Running the FIM and NIM Weather Models on GPUs
Mark Govett (NOAA Earth System Research Laboratory)
Two U.S. global-scale weather models, developed at NOAA, are running on GPUs. The FIM runs at 15 km resolution and is expected to be run by the U.S. National Weather Service in the next year. The NIM is a next-generation forecast model designed to run at 4 km resolution. This presentation will give an update on our efforts to parallelize and run these models on GPUs.
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2012 - ID SC2018
Download:
 
Hybrid CPU-GPU Solutions for Weather and Cloud Resolving Climate Simulations
Thomas Schulthess (Swiss National Supercomputing Center)
Reliable weather prediction for the Alpine region and cloud-resolving climate modeling require simulations that run at 1-2 km resolution. Additionally, since the largest possible ensembles are needed, high-fidelity models have to run on the most economical resource that meets a given time to solution. In this presentation, we will give an update on the refactoring of COSMO, a widely used production code in academia as well as at seven European weather services, and discuss performance experience on hybrid CPU-GPU systems.
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing 2012 - ID SC2036
Download:
 
Analysis of GPU-acceleration for a Climate Modeling Application
Mohamed Wahib (Advanced Institute for Computational Science, RIKEN, JAPAN)
The use of GPUs to accelerate applications has always been a nontrivial task, and SCALE3, a climate simulation model developed at AICS, RIKEN in Japan, is no exception. The features of SCALE required careful adjustments when it was ported to the Fermi architecture, and porting SCALE to the Kepler architecture brought in new adjustments driven by Kepler's features. This poster discusses the challenges of porting SCALE3 to NVIDIA's Fermi and Kepler architectures and highlights the changes in design choices when moving from Fermi to Kepler. The results show that exploiting Kepler's architectural features can be highly effective when the nature of the application is carefully taken into account.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID P3145
Download:
 
Modeling Vegetative Heat Transfer in Urban Environments with OptiX
Matthew Overby (University of Minnesota Duluth)
Our research group is developing QUIC Energy, a software tool that models radiative heat transfer in three-dimensional urban environments. We hypothesize that trees, vegetative roofing, and other green infrastructure have the potential to reduce heat load in urban environments and lower the power consumption required for heating and cooling buildings. Additionally, certain building materials, shapes, and urban layouts can mitigate trapped heat and air pollutants. By taking advantage of parallel computation on the GPU using NVIDIA's OptiX ray tracing engine, we are able to model urban domains upwards of five square kilometers, containing thousands of trees and buildings.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID P3178
Download:
 
Hybrid Fortran - New Directive Based GPGPU / CPU Framework
Michel Muller (RIKEN)
One of the biggest challenges when applying GPGPU frameworks to complex codebases is the necessity of redesigning loops and data access patterns. Experience with the physical core of the ASUCA weather prediction model has shown that using pure CUDA Fortran or OpenACC leads to a lengthy manual redesign and large execution-time overheads when the new code is run back on the CPU. The Hybrid Fortran meta-programming framework has been designed to (a) automate this process and (b) run the user code in a CPU-optimized loop structure as well, thus enabling optimal performance on both GPU and CPU. Results from using it on the ASUCA physical core show high GPU performance, CPU performance on par with the original x86-optimized code, and reduced porting overhead.
 
Keywords:
Climate, Weather, Ocean Modeling, Developer - Tools & Libraries, GTC 2013 - ID P3199
Download:
 
Global High Resolution Estimation of Evapotranspiration - SEBS on GPU using CUDA-C
Mohammad Abouali (Computational Science Research Center - San Diego State University)
This poster introduces a new implementation of the Surface Energy Balance System (SEBS) algorithm that harnesses the many cores available on graphics processing units (GPUs), using the Compute Unified Device Architecture C (CUDA-C) programming model. The output of the new implementation is compared to a MATLAB code that has already been fully tested in the Water Cycle Multimission Observation Strategy (WACMOS) project. The code is timed against both the MATLAB and a high-performance C implementation of the same algorithm, and has been tested on several different NVIDIA cards with different compute capabilities.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID P3225
Download:
 
Numerical Ocean Modeling and Simulation with CUDA
Chris Lupo (California Polytechnic State University)
Learn how GPGPUs have been targeted in the Regional Ocean Modeling System (ROMS) software package. We describe a multi-model parallelization approach that uses the CUDA Fortran and OpenACC directive-based models supported by PGI compilers. Initial research using only CUDA Fortran on one Tesla card shows performance comparable to a 16-node CPU cluster, and a 2.5x speedup compared to an OpenMP implementation on an eight-core CPU system. We are currently targeting multiple GPU devices and using OpenACC to parallelize more of the ROMS software, to obtain even greater performance and allow larger, higher-resolution ocean models to be simulated.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID S3082
Streaming:
Download:
 
Accelerating Shallow Water Flow and Mass Transport Using Lattice Boltzmann Methods on GPUs
Kevin Tubbs (Dell, Inc.)
A lattice Boltzmann method (LBM) for solving the shallow water equations and the advection-dispersion equation is developed and implemented on graphics processing unit (GPU)-based architectures. The proposed LBM is implemented on NVIDIA computing processors, with GPU computing performed using the Jacket GPU engine for MATLAB and ArrayFire. Mass transport with velocity-dependent dispersion in shallow water flow is simulated by combining the MRT-LBM and TRT-LBM models. This talk will demonstrate the parallel performance of GPUs for modeling mass transport phenomena in shallow water flows.
 
Keywords:
Climate, Weather, Ocean Modeling, Developer - Algorithms, GTC 2013 - ID S3324
Streaming:
Download:
 
Porting Marine Ecosystem Model Spin-up Using Transport Matrices to GPUs
Jaroslaw Piwonski (Institute for Computer Science and Kiel Marine Science, Centre for Interdisciplinary Marine Science, Christian-Albrechts Universitaet zu Kiel)
This session shows the steps necessary to port an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library, which is based on the Message Passing Interface (MPI) standard. The programming languages used are C and Fortran. A special emphasis lies on using biogeochemical models written in Fortran without any modifications to the original code. The port uses the Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc, and a commercial CUDA Fortran compiler.
 
Keywords:
Climate, Weather, Ocean Modeling, Supercomputing & HPC, GTC 2013 - ID S3385
Streaming:
Download:
 
Towards GPU-accelerated Operational Weather Forecasting
Oliver Fuhrer (MeteoSwiss)
A full GPU implementation of the COSMO numerical weather prediction and regional climate model will be presented. Design criteria such as high performance, maintainability, and a single source code that still compiles and runs on x86-based systems led us to opt for different approaches in different parts of the model code. Performance-critical parts are implemented using a stencil library built on top of a domain-specific embedded language (DSEL) with a CUDA back-end. Other parts were ported by restructuring the legacy Fortran code and inserting OpenACC compiler directives. The session will also highlight the integration of these different technologies.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID S3417
Streaming:
Download:
 
Running the FIM and NIM Weather Models on GPUs
Mark Govett (NOAA Earth System Research Laboratory)
Two U.S. global-scale weather models, developed at NOAA, are running on GPUs. The F