GTC On-Demand

GTC On-Demand Featured Talks

GPU computing is a transformational force in high-performance computing, enabling developers, engineers, programmers, and researchers across a myriad of industry verticals, as well as academia, to accelerate research and mission-critical applications. See our featured sessions highlighting some of our best talks, or delve headlong into the many other keynotes, technical sessions, presentations, research posters, webinars, and tutorials we make available to you at any time on GTC On-Demand.

Astronomy & Astrophysics
Follow the Light: Plasma Physics on 18,000 GPUs
Richard Pausch (Helmholtz-Zentrum Dresden - Rossendorf), Guido Juckeland (ZIH, Technical University Dresden)
We show that with today's largest supercomputers it is possible to follow the trajectories of billions of particles, computing a unique fingerprint of their dynamics. With the use of 18,000 GPUs, we computed a 'sky map' of the radiation emitted by individual electrons in a large-scale, turbulent plasma, providing unique insight into the relation between the plasma dynamics and observable radiation spectra.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Supercomputing, GTC 2014 - ID S4139
Streaming:
Download:
 
Real-Time Imaging in Radio-Astronomy: A Fully GPU-Based Imager
Sanjay Bhatnagar (National Radio Astronomy Observatory), Pradeep Kumar Gupta (NVIDIA)
We are implementing a fully GPU-based imager for radio interferometric imaging, targeting high-sensitivity, near-real-time imaging. Modern interferometric radio telescopes generate many terabytes of data per observation, which need to be imaged in near-real time. Imaging software running on conventional computers currently takes many orders of magnitude longer. In this presentation, we will briefly describe the algorithms and then describe in more detail their adaptation for GPUs in particular and for heterogeneous computing in general. We will discuss the resulting run-time performance on the GPU using real data from existing radio telescopes. Tests with our current implementation show a speed-up of up to 100x over the CPU implementation in the critical parts of processing, enabling us to reduce the memory footprint by replacing compute-and-cache with on-demand computing on the GPU. For scientific use cases requiring high-resolution, high-sensitivity imaging, such a GPU-based imager is an enabling technology.
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4223
Streaming:
Download:
 
High Resolution Astrophysical Fluid Dynamics Simulations on a GPU Cluster
Pierre Kestener (CEA)
A wide range of major astrophysical problems can be investigated by means of computational fluid dynamics methods, and performing numerical simulations of Magneto-Hydrodynamics (MHD) flows with realistic setup parameters can be very challenging. We will first report on the technical expertise gained in developing the code Ramses-GPU, designed for the efficient use of large clusters of GPUs in solving MHD flows. We will then illustrate how challenging, state-of-the-art, highly resolved simulations requiring hundreds of GPUs can provide new insights into real applications: (1) the study of the Magneto-Rotational Instability and (2) high-Mach-number MHD turbulent flows.
 
Keywords:
Astronomy & Astrophysics, Computational Fluid Dynamics, Supercomputing, GTC 2014 - ID S4274
Streaming:
Download:
 
Conquering the Titan Supercomputer: A Star-by-Star Simulation of the Milky Way Galaxy
Evghenii Gaburov (SURFsara), Jeroen Bedorf (Leiden Observatory)
In this session we demonstrate how we leverage the massive parallelism of thousands of GPUs inside the Titan supercomputer to simulate the past and future of the Milky Way Galaxy on a star-by-star basis in less than 10 days. The audience will learn what it takes to parallelize an advanced hierarchical GPU tree-code to run efficiently on the Titan supercomputer. A gravitational N-body problem is by definition an all-to-all problem, and it is of utmost importance for scalability to hide data communication behind computation. This turned out to be a major challenge on the Titan supercomputer because Bonsai's GPU kernels are ~3x faster on Kepler than on Fermi, which reduced compute time and as a result hampered scalability. We solved this by redesigning the communication strategy to take full advantage of each of the 16 CPU cores while the GPUs were busy computing gravitational forces. This allowed Bonsai to scale to more than 8192 GPUs.
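The communication-hiding idea at the heart of this abstract is easy to sketch. Below is a minimal, self-contained CUDA example (not Bonsai's actual code; the kernel and buffer names are invented for illustration) in which one stream computes naive all-pairs gravity on local particles while a second stream copies remote particle data, so the transfer cost disappears behind compute:

// Minimal sketch of hiding data movement behind computation with CUDA
// streams: while one stream runs a compute kernel, another stream copies
// the next batch of "remote" particle data from pinned host memory.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void computeForces(const float4 *pos, float4 *acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float ax = 0.f, ay = 0.f, az = 0.f;
    for (int j = 0; j < n; ++j) {                  // naive all-pairs gravity
        float dx = pos[j].x - pos[i].x;
        float dy = pos[j].y - pos[i].y;
        float dz = pos[j].z - pos[i].z;
        float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;  // softened distance
        float inv = rsqrtf(r2);
        float s = pos[j].w * inv * inv * inv;      // mass / r^3
        ax += s*dx; ay += s*dy; az += s*dz;
    }
    acc[i] = make_float4(ax, ay, az, 0.f);
}

int main() {
    const int n = 1 << 14;
    float4 *hRemote, *dLocal, *dRemote, *dAcc;
    cudaMallocHost((void**)&hRemote, n * sizeof(float4)); // pinned => async copy
    cudaMalloc((void**)&dLocal,  n * sizeof(float4));
    cudaMalloc((void**)&dRemote, n * sizeof(float4));
    cudaMalloc((void**)&dAcc,    n * sizeof(float4));
    cudaMemset(dLocal, 0, n * sizeof(float4));            // placeholder data

    cudaStream_t compute, transfer;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&transfer);

    // The kernel on local particles overlaps with the copy of remote ones.
    computeForces<<<(n + 255) / 256, 256, 0, compute>>>(dLocal, dAcc, n);
    cudaMemcpyAsync(dRemote, hRemote, n * sizeof(float4),
                    cudaMemcpyHostToDevice, transfer);

    cudaStreamSynchronize(transfer);   // remote data ready
    cudaStreamSynchronize(compute);    // local forces done
    printf("overlap complete\n");
    return 0;
}

In the real application the incoming copy would be fed by MPI messages assembled on the CPU cores mentioned above.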
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing, GTC 2014 - ID S4347
Streaming:
Download:
 
Driving the Next Generation of Extremely Large Telescopes Using Adaptive Optics with GPUs
Damien Gratadour (LESIA - Observatoire de Paris)
The European Southern Observatory is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m-diameter telescope, to provide Europe with the biggest eye on the Universe ever built, with first light foreseen in 2022. The E-ELT will be the first telescope that entirely depends, for routine operations, on adaptive optics (AO), an instrumental technique for the correction of dynamically evolving aberrations in an optical system, used on astronomical telescopes to compensate in real time for the effect of atmospheric turbulence. In this session, we will show how GPUs can provide the throughput required both to simulate at high frame rate and to drive in real time these AO systems, which provide tens of thousands of degrees of freedom actuated several hundred times per second.
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4357
Streaming:
 
RAMSES on the GPU: An OpenACC-Based Approach
Claudio Gheller (ETH-CSCS)
We present the work accomplished to enable the numerical code RAMSES on the GPU, in order to efficiently exploit hybrid accelerated HPC architectures. RAMSES is a code designed for the study of astrophysical problems on different scales (e.g. star formation, galaxy dynamics, the large-scale structure of the universe), treating various components at the same time (dark energy, dark matter, baryonic matter, photons) and including a variety of physical processes (gravity, magneto-hydrodynamics, chemical reactions, star formation, supernova and AGN feedback, etc.). It is implemented in Fortran 90 and adopts the OpenACC paradigm to offload some of the most computationally demanding algorithms to the GPU. Two different strategies have been pursued for code refactoring, in order to explore complementary solutions and select the most effective approach. The resulting algorithms are presented together with the results of tests, benchmarks, and scientific use cases.
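RAMSES itself is Fortran 90, but the OpenACC offload pattern the abstract describes looks essentially the same in C++. A minimal sketch (a toy loop, not a RAMSES algorithm) of marking a hot loop for the GPU with a directive; with an OpenACC compiler such as nvc++ -acc the loop becomes a GPU kernel, and without one the pragma is simply ignored:

// Illustrative OpenACC offload in C++ (RAMSES uses the same directives
// in Fortran). The data clauses control host/device transfers.
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> rho(n, 1.0f), p(n, 0.0f);
    const float gamma = 5.0f / 3.0f;

    float *r = rho.data(), *pr = p.data();
    // Offload a toy equation-of-state update to the accelerator.
    #pragma acc parallel loop copyin(r[0:n]) copyout(pr[0:n])
    for (int i = 0; i < n; ++i)
        pr[i] = (gamma - 1.0f) * r[i] * 0.5f;   // toy internal-energy term

    printf("p[0] = %f\n", p[0]);
    return 0;
}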
 
Keywords:
Astronomy & Astrophysics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing, GTC 2014 - ID S4365
Streaming:
Download:
 
Black Holes on the GPU: Experiences with Accelerated Relativity
Adam Lewis (University of Toronto/ CITA)
New "telescopes" that directly observe the spacetime fluctuations from black holes will come online within the next few years, but the data they generate will be meaningless unless compared against banks of known signals. Creating these ban ...Read More
New "telescopes" that directly observe the spacetime fluctuations from black holes will come online within the next few years, but the data they generate will be meaningless unless compared against banks of known signals. Creating these banks requires black hole mergers of many different masses, spins, and orbital eccentricities to be simulated. This is not yet feasible, since even a single simulation may take several months. GPU acceleration offers a theoretical speedup of 50X, but until now has been too laborious to attempt. This is no longer the case: using a combination of hand-coding in CUDA, calls to CUBLAS and cuSPARSE, and our own automatic porting routine "CodeWriter," we have successfully accelerated the C++-based "Spectral Einstein Code". I will discuss our porting strategy, the challenges we encountered, and the new science made possible by the GPU. This talk should be of particular interest to scientists working on GPU ports of their own codes.  Back
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4423
Streaming:
 
COBALT: Creating a High-Throughput, Real-Time Production System Using CUDA, MPI and OpenMP
Wouter Klijn (ASTRON), Jan David Mol (ASTRON)
We present our experiences in designing, building, and deploying a massively parallel processing system for the LOFAR radio telescope using off-the-shelf hardware and software. After numerous hurdles, we created a high-throughput system based on CUDA, MPI, and OpenMP, running on multi-GPU, multi-socket servers and InfiniBand. These techniques have established niches. However, due to conflicting memory models, incompatible requirements, and differing abstractions, the otherwise orthogonal techniques do not cooperate well within the same application. Using the project's timeline as a guide, we will answer the following questions: (1) What problems appear when combining these techniques? (2) How did we adjust both the hardware and the software to meet our requirements? (3) How did we robustly develop and deploy to both development boxes and a production cluster? And, most importantly, (4) how does the system perform?
 
Keywords:
Astronomy & Astrophysics, Programming Languages & Compilers, Signal & Audio Processing, Supercomputing, GTC 2014 - ID S4441
Streaming:
Download:
 
Fire and Ice: How Temperature Affects GPU Performance
Danny Price (Harvard-Smithsonian Center for Astrophysics)
Is it worth cooling your GPUs, or should you run them hot? In this session, we discuss how operating temperature affects the computational performance of GPUs. Temperature-dependent leakage current effects contribute significantly to power dissipation in nanometer-scale circuits; within GPUs this corresponds to decreased performance per watt. We use the CUDA-based xGPU code for radio astronomy to benchmark Fermi and Kepler GPUs while controlling the GPU die temperature, voltage, and clock speed. We report on trends and relate these measurements to physical leakage current mechanisms.
 
Keywords:
Astronomy & Astrophysics, Clusters & GPU Management, GTC 2014 - ID S4484
Streaming:
Download:
 
Petascale Cross-Correlation: Extreme Signal-Processing Meets HPC
Ben Barsdell (Harvard University)
How do you cross-correlate 10,000 signals 100 million times per second? This is an example of the type of compute-bound problem facing modern radio astronomy, which, paralleling the paradigm shift in computing architectures, has transitioned from monolithic single-dish telescopes to massive arrays of smaller antennas. In this session we will describe how general-purpose HPC installations can be used to achieve scaling of a cross-correlation pipeline to petascale with all the flexibility of a purely-software implementation. Optimisations we will discuss include tuning of the GPU cross-correlation kernel, maximising concurrency between compute and network operations, and minimising bandwidth bottlenecks in a streaming application. GPUs are already powering the world's biggest radio telescope arrays, and this work paves the way for entirely off-the-shelf correlators for the future exascale-generation of instruments.
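The core arithmetic of such a correlator (the "X-engine") is straightforward to write down, even though production codes tile it heavily in registers and shared memory. A minimal CUDA sketch, not the xGPU code itself, with invented names and a naive time loop:

// For every antenna pair (i, j), accumulate sum_t s_i(t) * conj(s_j(t)).
#include <cuda_runtime.h>

__global__ void xengine(const float2 *sig,   // [nTime][nAnt] complex samples
                        float2 *vis,         // [nAnt][nAnt] visibilities
                        int nAnt, int nTime) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // antenna i
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // antenna j
    if (i >= nAnt || j > i) return;                 // lower triangle only
    float re = 0.f, im = 0.f;
    for (int t = 0; t < nTime; ++t) {
        float2 a = sig[t * nAnt + i];
        float2 b = sig[t * nAnt + j];
        re += a.x * b.x + a.y * b.y;   // real part of a * conj(b)
        im += a.y * b.x - a.x * b.y;   // imag part of a * conj(b)
    }
    vis[i * nAnt + j] = make_float2(re, im);
}

int main() {
    const int nAnt = 64, nTime = 1024;
    float2 *d_sig, *d_vis;
    cudaMalloc((void**)&d_sig, (size_t)nAnt * nTime * sizeof(float2));
    cudaMemset(d_sig, 0, (size_t)nAnt * nTime * sizeof(float2)); // placeholder
    cudaMalloc((void**)&d_vis, (size_t)nAnt * nAnt * sizeof(float2));
    dim3 block(16, 16), grid((nAnt + 15) / 16, (nAnt + 15) / 16);
    xengine<<<grid, block>>>(d_sig, d_vis, nAnt, nTime);
    cudaDeviceSynchronize();
    return 0;
}

The optimisations named in the abstract amount to restructuring exactly this loop: tiling antennas so samples are reused from fast memory, and streaming new time blocks in while the previous block is being correlated.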
 
Keywords:
Astronomy & Astrophysics, Signal & Audio Processing, Supercomputing, GTC 2014 - ID S4511
Streaming:
Download:
 
Real-Time RFI Rejection Techniques for the GMRT Using GPUs
Rohini Joshi (Drexel University)
Radio frequency interference (RFI) is the primary enemy of sensitive multi-element radio instruments like the Giant Metrewave Radio Telescope (GMRT, India). Signals from radio receivers are corrupted with RFI from power lines, satellite signals, etc. Seen in the form of spikes and bursts in raw voltage data, RFI is statistically seen as outliers in a Gaussian distribution. We present an approach to tackle the problem of RFI, in real time, using a robust scale estimator such as the Median Absolute Deviation (MAD). Given the large data rate from each of the 30 antennas, sampled at 16 ns, it is necessary for the filter to work well within real-time limits. To accomplish this, the algorithm has been ported to GPUs to work within the GMRT pipeline. Presently, the RFI rejection pipeline runs in real time on 0.3-0.7 second long data chunks. The GMRT will soon be upgraded to work at 10 times the current data rate. We are now working on improving the algorithm further so as to have the RFI rejection pipeline ready for the upgraded GMRT.
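The MAD filter itself is simple to express on the GPU. A minimal Thrust-based sketch (illustrative only, not the GMRT pipeline; the threshold k and the toy samples are invented): compute the median from a sorted copy, then the median of absolute deviations, then flag everything farther than k*MAD from the median as RFI.

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/transform.h>
#include <math.h>
#include <vector>
#include <cstdio>

struct AbsDev {              // |x - median|
    float med;
    AbsDev(float m) : med(m) {}
    __host__ __device__ float operator()(float x) const { return fabsf(x - med); }
};

struct FlagRFI {             // true if sample lies outside median +/- thresh
    float med, thresh;
    FlagRFI(float m, float t) : med(m), thresh(t) {}
    __host__ __device__ bool operator()(float x) const { return fabsf(x - med) > thresh; }
};

int main() {
    std::vector<float> h = {0.1f, -0.2f, 0.05f, 9.0f, 0.0f, -0.1f}; // toy voltages
    thrust::device_vector<float> d(h.begin(), h.end());

    thrust::device_vector<float> tmp = d;
    thrust::sort(tmp.begin(), tmp.end());
    float median = tmp[tmp.size() / 2];          // upper median; fine for a sketch

    thrust::transform(d.begin(), d.end(), tmp.begin(), AbsDev(median));
    thrust::sort(tmp.begin(), tmp.end());
    float mad = tmp[tmp.size() / 2];

    const float k = 3.0f;                        // flagging threshold (invented)
    thrust::device_vector<bool> rfi(d.size());
    thrust::transform(d.begin(), d.end(), rfi.begin(), FlagRFI(median, k * mad));
    printf("median = %f, MAD = %f\n", median, mad);
    return 0;
}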
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics & Data Algorithms, Signal & Audio Processing, GTC 2014 - ID S4538
Streaming:
Automotive
UI Composer for Automotive HMIs - Part 1: What, Why, and How
Gavin Kistner (NVIDIA), Stephen Mendoza (NVIDIA)
An in-depth view of content creation using UI Composer, including the digital asset pipeline, animation, materials, development of state machines, and debugging.
 
Keywords:
Automotive, Debugging Tools & Techniques, Digital Product Design & Styling, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4616
Streaming:
Download:
 
UI Composer for Automotive HMIs - Part 2: Building Content
Gavin Kistner (NVIDIA), Xavier Mendoza (NVIDIA)
A continuation of Part 1, this is a hands-on, interactive demonstration of content creation using UI Composer. The audience will be guided through the steps to build a data-driven virtual automotive gauge. In order to actively participate in this session, attendees are asked to bring their own Windows laptop with UI Composer installed. UI Composer is available for free from http://uicomposer.nvidia.com/
 
Keywords:
Automotive, Debugging Tools & Techniques, Digital Product Design & Styling, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4806
Streaming:
Download:
 
Real-Time Electromagnetic Wave Propagation Using OptiX for Simulation of Car-to-Car-Communication
Manuel Schiller (Technische Universität München)
In this session we present a real-time simulation of electromagnetic wave propagation using OptiX GPU ray tracing. This simulation is used in virtual test drives to allow testing of Advanced Driver Assistance Systems that will be based on wireless Car-to-Car communication. Learn how ray tracing performance can be improved to achieve real-time simulation, and how the ray tracing results are post-processed to perform the electromagnetic calculations on the GPU using the Thrust library.
 
Keywords:
Automotive, Computational Physics, Ray Tracing, GTC 2014 - ID S4359
Streaming:
Download:
 
Tegra K1 and the Automotive Industry
Gernot Ziegler (NVIDIA), Timo Stich (NVIDIA)
Discover how mobile GPUs enable modern driver-assistance features in a power-efficient and standardized way, by providing the fundamental building blocks of computer vision to the higher-level reasoning functions that enable the car to detect lanes, park automatically, avoid obstacles, etc. We explain the challenges of having to fit into a given time budget, and how low-level machine vision such as corner detection and feature tracking, and even more advanced functionality such as 3D surrounding reconstruction, is achieved in the context of the car's systems and its outside environment.
 
Keywords:
Automotive, Computer Vision, Machine Learning & AI, Mobile Applications, GTC 2014 - ID S4412
Streaming:
Download:
 
Beyond Pedestrian Detection: Deep Neural Networks Level-Up Automotive Safety
Hideki Niihara (Denso IT Laboratory, Inc.), Ikuro Sato (Denso IT Laboratory, Inc.)
People want cars that are not only cost-friendly, trouble-free, and energy-efficient, but also safe. Today's technology provides Advanced Emergency Braking Systems that can detect pedestrians and automatically brake just before a collision is unavoidable. We have a vision that future Advanced Driver Assistance Systems will not just detect pedestrians but recognize their state and understand the level of danger, to avoid emergency situations. We claim deep Convolutional Neural Networks (CNN) are the right tools for these highly non-trivial tasks, and Tegra is the best partner. We demonstrate a real-time deep CNN running on Tegra.
 
Keywords:
Automotive, Computer Vision, Machine Learning & AI, GTC 2014 - ID S4621
Streaming:
Download:
 
One Car Fits You: Technology and Opportunities in the Personalized Car
Ryan Middleton (Delphi)
Learn about two Delphi projects that are pushing the concept of a personalized in-vehicle experience. As drivers bring more of their personal content and personal style into the car, opportunities are emerging for car makers and platform providers to differentiate their offerings. We will explore the infotainment architecture of the future - enabling feature upgrades at the same rate as mobile devices. We will also explore how GPU technology enables "months-to-minutes" user interfaces, and greater flexibility in end-user personalization.
 
Keywords:
Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4659
Streaming:
 
NVIDIA Vision Toolkit for Advanced Driver Assistance Systems, Computational Photography and Beyond
Elif Albuz (NVIDIA), Frank Brill (NVIDIA)
In this session, we will present the contents of the Vision Toolkit, discuss performance advantages, and demonstrate real-time applications enabled by this library. The Vision Toolkit is a product of NVIDIA, designed to enable real-life Computer Vision applications. It leverages state-of-the-art Computer Vision research and offers a variety of functions to its developers, initially targeting Advanced Driver Assistance Systems (ADAS) and Augmented Reality (AR) applications. The toolkit will be highly GPU-accelerated on mobile platforms, offering significant speedup and reducing the engineering effort to design real-time vision applications. The toolkit includes open source samples and offers a flexible framework that enables users to extend and contribute new functionality. It will be deployed on different operating systems, including Android and Linux on ARM, to registered developers and partners through NVIDIA's web site.
 
Keywords:
Automotive, Computational Photography, Computer Vision, Mobile Summit, GTC 2014 - ID S4714
Streaming:
Download:
 
Today's LiDARs and GPUs Enable Ultra-Accurate GPS-Free Navigation with Affordable Simultaneous Localization and Mapping
Louay Eldada (Quanergy Systems, Inc.)
With recent advances in low-cost, high-performance LiDARs (laser-based Light Detection and Ranging sensors) and GPUs, ultra-accurate GPS-free navigation based on SLAM (Simultaneous Localization and Mapping) is becoming a reality. Learn how the latest 360° field-of-view, long-range 3D mapping LiDARs capable of generating data streams at gigasample-per-second (GSPS) sampling rates are used with 192-CUDA-core GPUs based on the Kepler architecture to run artificial intelligence software and deliver advanced vehicular safety and navigation systems capable of real-time object detection, tracking, identification, and classification, as well as offline full-availability, jam-proof, centimeter-accurate navigation.
 
Keywords:
Automotive, Combined Simulation & Real-Time Visualization, In-Vehicle Infotainment (IVI) & Safety, Machine Learning & AI, GTC 2014 - ID S4761
Streaming:
Download:
 
Embedded Development For Tegra K1
Jesse Clayton (NVIDIA)
The Tegra K1 is a powerful SoC that will be leveraged across many industries. It is based on the same Kepler architecture as the world's fastest gaming systems and most efficient supercomputers, and brings supercomputing power to mobile and embedded systems. Jesse Clayton from NVIDIA will articulate the embedded development process for Tegra K1. The talk will cover the platform, programming paradigm, and development tools, and provide details on the Tegra K1 architecture relevant to embedded applications.
 
Keywords:
Automotive, Defense, Computer Vision, Machine Learning & AI, GTC 2014 - ID S4938
Streaming:
 
Audi Piloted Parking on zFAS: Valet Parking for the 21st Century
Miklos Kiss (Audi Electronics Venture GmbH)
What does it mean to bring supercomputing into the car? Examples of piloted parking systems show what that means for customers as well as for developers: Audi's way into piloted driving for the 21st century.
 
Keywords:
Automotive, Video & Image Processing, GTC 2014 - ID S4961
Streaming:
Big Data Analytics & Data Algorithms
Accelerate Distributed Data Mining with Graphics Processing Units
Nam-Luc Tran (EURA NOVA)
Numerous distributed processing models have emerged, driven by (1) the growth in volumes of available data and (2) the need for precise and rapid analytics. The most famous representative of this category is undoubtedly MapReduce; however, other, more flexible models exist based on the DFG processing model. None of the existing frameworks, however, has considered the case where the individual processing nodes are equipped with GPUs to accelerate parallel computations. In this talk, we discuss this challenge and the implications of the presence of GPUs on some of the processing nodes for the DFG model representation of such heterogeneous jobs and for the scheduling of the jobs, with big data mining as the principal use case.
 
Keywords:
Big Data Analytics & Data Algorithms, GTC 2014 - ID S4169
Streaming:
Download:
 
GPU-Accelerated Large-Scale Dense Subgraph Detection
Andy Wu (Xerox Research Center)
The large-scale dense subgraph detection problem has been an active research area for decades and has numerous applications in the web and bioinformatics domains. Numerous algorithms have therefore been designed to tackle this graph kernel. Due to computational limitations, traditional approaches are infeasible when dealing with large-scale graphs with millions or billions of vertices. In this presentation, we propose a GPU-accelerated dense subgraph detection algorithm to solve the large-scale problem. It successfully maps the irregular graph clustering problem onto the GPGPU platform, and extensive experimental results demonstrate strong scalability on GPU computing platforms.
 
Keywords:
Big Data Analytics & Data Algorithms, Bioinformatics & Genomics, GTC 2014 - ID S4215
Streaming:
Download:
 
Red Fox: An Execution Environment for Relational Query Processing on GPUs
Haicheng Wu (Georgia Institute of Technology)
This session will present the Red Fox system. Attendees will leave understanding GPU performance when executing relational queries over large data sets, as typically found in data warehousing applications, and the automatic compilation flow of kernel fusion, which can be applied to other applications.
 
Keywords:
Big Data Analytics & Data Algorithms, Programming Languages & Compilers, GTC 2014 - ID S4222
Streaming:
Download:
 
Histograms in CUDA: Privatized for Fast, Level Performance
Nicholas Wilt (The CUDA Handbook)
Histograms are an important statistical tool with a wide variety of applications, especially in image processing. Naive CUDA implementations suffer from low performance on degenerate input data due to contention. This presentation will show how to use "privatized" (per-thread) histograms to balance performance of the average case against data-dependent performance of degenerate cases.
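The privatization idea can be sketched in a few lines. The variant below privatizes per thread block in shared memory (a common simplification of the per-thread scheme the talk describes), so a degenerate all-identical input contends on fast on-chip atomics rather than on a single set of global bins:

// Each block builds its own histogram in shared memory, then merges it
// into the global histogram once, drastically reducing global contention.
#include <cuda_runtime.h>
#include <cstdio>

#define BINS 256

__global__ void histPrivatized(const unsigned char *data, int n, unsigned int *hist) {
    __shared__ unsigned int local[BINS];
    for (int b = threadIdx.x; b < BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local[data[i]], 1u);          // contention stays on-chip
    __syncthreads();

    for (int b = threadIdx.x; b < BINS; b += blockDim.x)
        atomicAdd(&hist[b], local[b]);           // one global merge per block
}

int main() {
    const int n = 1 << 20;
    unsigned char *d_data; unsigned int *d_hist;
    cudaMalloc((void**)&d_data, n);
    cudaMemset(d_data, 7, n);                    // worst case: all-identical input
    cudaMalloc((void**)&d_hist, BINS * sizeof(unsigned int));
    cudaMemset(d_hist, 0, BINS * sizeof(unsigned int));

    histPrivatized<<<64, 256>>>(d_data, n, d_hist);

    unsigned int h[BINS];
    cudaMemcpy(h, d_hist, sizeof(h), cudaMemcpyDeviceToHost);
    printf("bin 7 = %u (expect %d)\n", h[7], n);
    return 0;
}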
 
Keywords:
Big Data Analytics & Data Algorithms, Video & Image Processing, GTC 2014 - ID S4249
Streaming:
 
Packet-based Network Traffic Monitoring & Analysis with GPUs
Wenji Wu (Fermilab)
In high-speed networks, network traffic monitoring and analysis applications may require enormous raw compute power and high I/O throughputs, especially when traffic scrutiny on a per-packet basis is needed. Under those conditions, the applications face tremendous performance and scalability challenges. The GPU architecture fits well with the features of packet-based network monitoring and analysis applications. At Fermilab, we have prototyped a GPU-assisted network traffic monitoring & analysis system, which analyzes network traffic on a per-packet basis. We implemented a GPU-accelerated library for network traffic capturing, monitoring, and analysis. The library consists of various CUDA kernels, which can be combined in various ways to perform monitoring and analysis tasks. In this talk, we will describe our architectural approach in developing a generic GPU-assisted network traffic monitoring and analysis capability. Multiple examples will be given to demonstrate how to use GPUs to analyze network traffic.
 
Keywords:
Big Data Analytics & Data Algorithms, Numerical Algorithms & Libraries, Computational Physics, Supercomputing, GTC 2014 - ID S4320
Streaming:
Download:
 
The Energy Case for Graph Processing on Hybrid CPU and GPU Systems
Elizeu Santos-Neto (University of British Columbia)
This work reports on a power and performance analysis of large-scale graph processing on hybrid (i.e., CPU and GPU), single-node systems. On these systems, graph processing can be accelerated by mapping the graph layout so that the algorithmic tasks exercise the processing units where they perform best; however, GPUs have a much higher TDP, so their impact on overall energy consumption is unclear. An evaluation on large real-world graphs, as well as on synthetic graphs as large as 1 billion vertices and 16 billion edges, shows that efficiency in terms of both performance and power can be achieved.
 
Keywords:
Big Data Analytics & Data Algorithms, Energy Exploration, GTC 2014 - ID S4338
Streaming:
 
Real-Time Quantification Filters for Multidimensional Databases
Peter Strohm (Jedox AG)
Learn how GPUs can speed up real-time calculation of advanced multidimensional data filters required in data analytics and business intelligence applications. We present the design of a massively parallel "quantification" algorithm which, given a set of dimensional elements, returns all those elements for which ANY (or ALL) numeric cells in the respective slice of a user-defined subcube satisfy a given condition. Such filters are especially useful for the exploration of big data spaces, for zero-suppression in large views, or for top-k analyses. In addition to the main algorithmic aspects, attendees will see how our implementation solves challenges such as economic utilization of the CUDA memory hierarchy or minimization of threading conflicts in parallel hashing.
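The ANY/ALL reduction pattern at the core of such a filter can be sketched as one thread block per dimensional element (illustrative only; the production algorithm works on compressed multidimensional cubes, and all names here are invented):

// One block scans one element's slice of cells; a shared flag records
// whether ANY cell exceeds the threshold. Inverting the test gives ALL.
#include <cuda_runtime.h>

__global__ void anyFilter(const float *cells,   // [nElems][cellsPerElem]
                          int cellsPerElem, float threshold, int *keep) {
    __shared__ int found;
    if (threadIdx.x == 0) found = 0;
    __syncthreads();
    const float *slice = cells + (size_t)blockIdx.x * cellsPerElem;
    // "!found" is a best-effort early-out; correctness comes from the atomic.
    for (int c = threadIdx.x; c < cellsPerElem && !found; c += blockDim.x)
        if (slice[c] > threshold) atomicExch(&found, 1);
    __syncthreads();
    if (threadIdx.x == 0) keep[blockIdx.x] = found;
}

int main() {
    const int nElems = 1024, cpe = 512;
    float *d_cells; int *d_keep;
    cudaMalloc((void**)&d_cells, (size_t)nElems * cpe * sizeof(float));
    cudaMemset(d_cells, 0, (size_t)nElems * cpe * sizeof(float)); // placeholder
    cudaMalloc((void**)&d_keep, nElems * sizeof(int));
    anyFilter<<<nElems, 256>>>(d_cells, cpe, 0.5f, d_keep);
    cudaDeviceSynchronize();
    return 0;
}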
 
Keywords:
Big Data Analytics & Data Algorithms, Finance, GTC 2014 - ID S4395
Streaming:
Download:
 
Rhythm: Harnessing Data Parallel Hardware for Server Workloads
Sandeep Agrawal (Duke University)
We present Rhythm, a framework for high-throughput servers that exploits similarity across web service requests to improve server throughput and energy efficiency. Present work in data center efficiency primarily focuses on scale-out, with off-the-shelf hardware used for individual machines, leading to inefficient usage of energy and area. Rhythm improves upon this by harnessing data-parallel hardware to execute "cohorts" of web service requests, grouping requests together based on similar control flow and using intelligent data layout optimizations. An evaluation of the SPECweb Banking workload for future server platforms on the GTX Titan achieves 4x the throughput (reqs/sec) of a Core i7 at efficiencies (reqs/Joule) comparable to a dual-core ARM Cortex-A9.
 
Keywords:
Big Data Analytics & Data Algorithms, GTC 2014 - ID S4447
Streaming:
Download:
 
Parallel Lossless Compression Using GPUs
Evangelia Sitaridi (Columbia University)
Given the high cost of enterprise data storage, compression is becoming a major concern for the industry in the age of Big Data. Attendees can learn how to efficiently offload data compression to the GPU, leveraging its superior memory and compute resources. We focus on the DEFLATE algorithm, a combination of the LZSS and Huffman entropy coding algorithms, used in common compression formats like gzip. Both algorithms are inherently serial, and trivial parallelization methods are inefficient. We show how to parallelize these algorithms efficiently on GPUs and discuss trade-offs between compression ratio and increased parallelism to improve performance. We conclude our presentation with a head-to-head comparison to a multi-core CPU implementation, demonstrating up to half an order of magnitude performance improvement using a single Kepler GPU. This is joint work with IBM researchers Rene Mueller and Tim Kaldewey.
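One way to see the ratio-versus-parallelism trade-off the abstract mentions is chunked compression: independently compressed chunks expose parallel work units but forfeit matches that cross chunk boundaries. A host-side sketch using zlib's DEFLATE per chunk (OpenMP stands in for the GPU's parallelism here; this is not the authors' GPU implementation), built with -lz and -fopenmp:

// Split the input into chunks and DEFLATE each independently. Smaller
// chunks mean more parallelism but a worse overall compression ratio.
#include <zlib.h>
#include <vector>
#include <cstdio>

int main() {
    const size_t total = 1 << 24, chunk = 1 << 20;
    std::vector<unsigned char> input(total, 'a');          // toy, compressible data
    const size_t n = total / chunk;
    std::vector<std::vector<unsigned char>> out(n);

    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        uLongf cap = compressBound(chunk);
        out[i].resize(cap);
        compress2(out[i].data(), &cap, input.data() + i * chunk, chunk,
                  Z_BEST_SPEED);                            // DEFLATE per chunk
        out[i].resize(cap);                                 // shrink to actual size
    }

    size_t sum = 0;
    for (auto &c : out) sum += c.size();
    printf("ratio = %.1f:1 over %zu chunks\n", (double)total / sum, n);
    return 0;
}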
 
Keywords:
Big Data Analytics & Data Algorithms, GTC 2014 - ID S4459
Streaming:
Download:
 
GPUs and Regular Expression Matching for Big Data Analytics
Alon Shalev Housfater (IBM)
Regular expression based pattern matching is a key enabling technology for a new generation of big data analytics. We'll describe several key use cases that require high-throughput, low-latency regular expression pattern matching. A new GPU-based regular expression technology will be introduced, and its basic performance characteristics will be presented. We'll demonstrate that the GPU enables impressive performance gains in pattern matching tasks and compare its performance against latest-generation processors. Finally, we'll examine the key challenges in using such accelerators in large software products and highlight open problems in GPU implementation of pattern matching tasks.
 
Keywords:
Big Data Analytics & Data Algorithms, GTC 2014 - ID S4462
Streaming:
 
High Speed Analysis of Big Data Using NVIDIA GPUs and Hadoop
Partha Sen (Fuzzy Logix)
Performing analytics on data stored in Hadoop can be time-consuming. While Hadoop is great at ingesting and storing data, getting timely insight out of the data can be difficult, which reduces effectiveness and time-to-action. The use of NVIDIA GPUs to accelerate analytics on Hadoop is an optimal solution that drives high price-to-performance benefits. In this session, we'll demonstrate a solution using NVIDIA GPUs for the analysis of big data in Hadoop. The demo will show how you can leverage the Hadoop file system, its MapReduce architecture, and GPUs to run computationally intense models, bringing together both data and computational parallelism. Methods demonstrated will include classification techniques such as decision trees, logistic regression, and support vector machines, and clustering techniques like k-means, fuzzy k-means, and hierarchical k-means on marketing, social, and digital media data.
 
Keywords:
Big Data Analytics & Data Algorithms, Bioinformatics & Genomics, Finance, GTC 2014 - ID S4471
Streaming:
 
Recursive Interaction Probability: A New Paradigm in Parallel Data Processing
Richard Heyns (brytlyt)
This session will describe Recursive Interaction Probability (RIP) and why it is a pretty cool algorithm. Time will be spent on benchmark analysis against other algorithms as well as performance within an operational database. The presentation will end with how RIP was implemented on an NVIDIA Kepler K20c, the design choices, and how these affect performance. Use cases that play to the strengths of RIP, as well as use cases that reveal its weaknesses, will also be shared.
 
Keywords:
Big Data Analytics & Data Algorithms, Numerical Algorithms & Libraries, Clusters & GPU Management, GTC 2014 - ID S4483
Streaming:
 
Indexing Documents on GPU - Can You Index Web in Real Time?
Michael Frumkin (NVIDIA)
An index of web documents provides a base for search and decision making. Traditionally, GPUs are used to run applications having a lot of parallelism and a small degree of divergence. We show that GPUs are also able to outperform CPUs for an application that has a large degree of parallelism but medium divergence. Specifically, we concentrate on the text processing used to index web documents. We present indexing algorithms for both GPU and CPU and show that the GPU outperforms the CPU on two common workloads. We argue that a medium-sized GPU-enabled cluster would be able to index all internet documents in one day. Indexing of web documents on the GPU opens a new area for GPU computing. Companies that provide search services spend a lot of cycles on indexing. Faster and more energy-efficient indexing on the GPU may provide a valuable alternative to the CPU-only clusters used today.
 
Keywords:
Big Data Analytics & Data Algorithms, Machine Learning & AI, GTC 2014 - ID S4506
Streaming:
Download:
 
Evaluation of Parallel Hashing Techniques
Rajesh Bordawekar (IBM T. J. Watson Research Center)
This presentation will cover techniques for implementing hashing functions on the GPU. We will describe various parallel implementations of hashing techniques (e.g., cuckoo hashing, partitioned hashing, Bin-Hash, bloom filters) and then present different ways of implementing these functions on the GPU, with emphasis on data structures that exploit the GPU's data-parallel features as well as its memory constraints.
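A flavor of the material: the simplest of these schemes, a linear-probing insert in which atomicCAS claims an empty slot so many threads can insert concurrently, can be sketched as follows (illustrative only; cuckoo and Bin-Hash variants change the probe sequence but keep the same atomic claim step):

// Open-addressing GPU hash table insert: hash the key, then probe
// linearly, using atomicCAS to claim the first empty slot.
#include <cuda_runtime.h>

#define EMPTY 0xffffffffu

__global__ void insertKeys(const unsigned int *keys, int n,
                           unsigned int *table, unsigned int capacity) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int k = keys[i];
    unsigned int slot = (k * 2654435761u) % capacity;      // multiplicative hash
    while (true) {
        unsigned int prev = atomicCAS(&table[slot], EMPTY, k);
        if (prev == EMPTY || prev == k) return;            // claimed, or duplicate
        slot = (slot + 1) % capacity;                      // linear probe
    }
}

int main() {
    const unsigned int cap = 1 << 20; const int n = 1 << 19;  // 50% load factor
    unsigned int *d_keys, *d_table;
    cudaMalloc((void**)&d_keys, n * sizeof(unsigned int));
    cudaMemset(d_keys, 0, n * sizeof(unsigned int));       // placeholder keys
    cudaMalloc((void**)&d_table, cap * sizeof(unsigned int));
    cudaMemset(d_table, 0xff, cap * sizeof(unsigned int)); // every slot EMPTY
    insertKeys<<<(n + 255) / 256, 256>>>(d_keys, n, d_table, cap);
    cudaDeviceSynchronize();
    return 0;
}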
 
Keywords:
Big Data Analytics & Data Algorithms, Programming Languages & Compilers, GTC 2014 - ID S4507
Streaming:
Download:
 
A High-Speed 2-Opt TSP Solver for Large Problem Sizes
Martin Burtscher (Texas State University)
Learn how to process large program inputs at shared-memory speeds on the example of a 2-opt TSP solver. Our implementation employs interesting code optimizations such as biasing results to avoid computation, inverting loops to enable coalescing and tiling, introducing non-determinism to avoid synchronization, and parallelizing each operation rather than across operations to minimize thread divergence and drastically lower the latency of result production. The final code evaluates 68.8 billion moves per second on a single Titan GPU.
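The per-move parallelization can be sketched directly: one thread evaluates the tour-length change of one candidate edge pair, and a min-reduction then selects the move to apply. A minimal CUDA sketch (not the authors' optimized kernel; it omits the biasing and tiling tricks listed above):

// Each thread scores one 2-opt move: replace edges (i,i+1) and (j,j+1)
// with (i,j) and (i+1,j+1). A negative delta means a shorter tour.
#include <cuda_runtime.h>
#include <math.h>

__device__ float dist(float2 a, float2 b) {
    return sqrtf((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}

__global__ void eval2opt(const float2 *city, int n, float *delta) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || j >= n) return;
    // Two non-adjacent edges (i,i+1) and (j,j+1) with i < j are required.
    bool valid = (j >= i + 2) && (i + 1 < n) && !(i == 0 && j == n - 1);
    if (!valid) { delta[i * n + j] = 0.f; return; }        // neutral entry
    float2 a = city[i], b = city[i + 1];
    float2 c = city[j], d = city[(j + 1) % n];
    delta[i * n + j] = dist(a, c) + dist(b, d) - dist(a, b) - dist(c, d);
}

int main() {
    const int n = 512;
    float2 *d_city; float *d_delta;
    cudaMalloc((void**)&d_city, n * sizeof(float2));
    cudaMemset(d_city, 0, n * sizeof(float2));             // placeholder coords
    cudaMalloc((void**)&d_delta, n * n * sizeof(float));
    dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
    eval2opt<<<grid, block>>>(d_city, n, d_delta);
    cudaDeviceSynchronize();
    return 0;
}

A min-reduction over delta (e.g., thrust::min_element) then picks the best improving move before the next sweep.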
 
Keywords:
Big Data Analytics & Data Algorithms, Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4534
Streaming:
Download:
 
Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL
Jeremy Meredith (Oak Ridge National Laboratory)
Learn about the data-parallel programming model in EAVL and how it can be used to write efficient mesh-based algorithms for multi-core and many-core devices. EAVL, the Extreme-scale Analysis and Visualization Library, contains a flexible scientific data model and targets future high performance computing ecosystems. This talk shows how a productive programming API built upon an efficient data model can help algorithm developers achieve high performance with little code. Discussions will include examples and lessons learned.
 
Keywords:
Big Data Analytics & Data Algorithms, Scientific Visualization, GTC 2014 - ID S4553
Streaming:
Download:
 
Middleware Framework Approach for BigData Analytics Using GPGPU
Ettikan Kandasamy Karuppiah (MIMOS Bhd)
The current application of GPU processors to parallel computing tasks shows excellent results in terms of speed-ups compared to CPU processors. However, no existing middleware framework enables automatic distribution of data and processing across heterogeneous computing resources for structured and unstructured Big Data applications. Thus, we propose a middleware framework for Big Data analytics that provides mechanisms for automatic data segmentation, distribution, execution, and information retrieval across multiple cards (CPU & GPU) and machines; a modular design for easy addition of new GPU kernels at both the analytic and processing layers; and information presentation. The architecture and components of the framework, such as multi-card data distribution and execution, data structures for efficient memory access, and algorithms for parallel GPU computation, are shown together with results for various test configurations. Our results show the proposed middleware framework provides an alternative, cheaper HPC solution to users.
 
Keywords:
Big Data Analytics & Data Algorithms, Finance, Video & Image Processing, GTC 2014 - ID S4583
Streaming:
Download:
 
Extending Python for High-Performance Data-Parallel Programming
Siu Kwan Lam (Continuum Analytics, Inc)
Our objective is to design a high-level data-parallel language extension to Python on GPUs. This language extension cooperates with the CPython implementation and uses Python syntax for describing data-parallel computations. The combination of rich library support and language simplicity makes Python ideal for subject matter experts to rapidly develop powerful applications. Python enables fast turnaround time and flexibility for custom analytic pipelines to react to immediate demands. However, CPython has been criticized as being slow, and the existence of the global interpreter lock (GIL) makes it difficult to take advantage of parallel hardware. To solve this problem, Continuum Analytics has developed LLVM-based JIT compilers for CPython. Numba is the open-source JIT compiler; NumbaPro is the proprietary compiler that adds CUDA GPU support. We aim to extend and improve the current GPU support in NumbaPro to further increase the scalability and portability of Python-based GPU programming.
 
Keywords:
Big Data Analytics & Data Algorithms, Large Scale Data Analytics, Defense, Programming Languages & Compilers, GTC 2014 - ID S4608
Streaming:
Download:
 
High-Performance Graph Primitives on GPU: Design and Implementation of Gunrock
Yangzihao Wang (UC Davis)
Gunrock is a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. The talk will share experience on how to design the framework and APIs for computing efficient graph primitives on GPUs. We will focus on two aspects: (1) details of the implementations of several graph algorithms on GPUs; and (2) how to abstract these graph algorithms using general operators and functors on GPUs to improve programmer productivity.
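The operator-plus-functor style the talk describes can be sketched generically (a simplified illustration, not Gunrock's actual API): a reusable "advance" kernel walks the edges of the current frontier and defers the per-algorithm decision to a device functor, here BFS labeling.

// A generic frontier-advance kernel over a CSR graph. The functor decides
// whether a traversed edge adds its destination to the next frontier.
#include <cuda_runtime.h>

struct BFSFunctor {
    int *label;                       // -1 means unvisited
    __device__ bool apply(int src, int dst, int iter) {
        // Claim unvisited neighbors; true means dst joins the frontier.
        return atomicCAS(&label[dst], -1, iter) == -1;
    }
};

template <typename Functor>
__global__ void advance(const int *rowPtr, const int *colIdx,   // CSR graph
                        const int *frontier, int frontierSize,
                        int *nextFrontier, int *nextSize,
                        Functor f, int iter) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= frontierSize) return;
    int src = frontier[t];
    for (int e = rowPtr[src]; e < rowPtr[src + 1]; ++e) {
        int dst = colIdx[e];
        if (f.apply(src, dst, iter)) {
            int pos = atomicAdd(nextSize, 1);     // compact the next frontier
            nextFrontier[pos] = dst;
        }
    }
}

A host loop would relaunch the kernel with increasing iter, swapping the frontier buffers, until nextSize stays zero; swapping in a different functor yields a different primitive over the same operator.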
 
Keywords:
Big Data Analytics & Data Algorithms, Large Scale Data Analytics, Defense, GTC 2014 - ID S4609
Streaming:
Download:
 
Speeding Up GraphLab Using CUDA
Vishal Vaidyanathan (Royal Caliber)
We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible - with the latest Tesla cards, over 48GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.
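The GAS decomposition is compact enough to sketch for PageRank (a generic illustration, not GraphLab's or the presenters' code; the CSR arrays over incoming edges are assumed inputs):

// Gather-Apply for PageRank: gather sums contributions over in-edges,
// apply writes the damped update. Scatter is implicit: the new rank is
// simply re-read by neighbors in the next iteration.
#include <cuda_runtime.h>

__global__ void gatherApply(const int *inPtr, const int *inIdx,  // in-edge CSR
                            const int *outDeg, const float *rankIn,
                            float *rankOut, int nVerts, float damp) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nVerts) return;
    float sum = 0.f;
    for (int e = inPtr[v]; e < inPtr[v + 1]; ++e) {    // GATHER over in-edges
        int u = inIdx[e];
        sum += rankIn[u] / outDeg[u];
    }
    rankOut[v] = (1.f - damp) / nVerts + damp * sum;   // APPLY
}

The host iterates, swapping rankIn and rankOut until the ranks converge; BFS and SSSP fit the same gather/apply slots with different edge and vertex functions.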
 
Keywords:
Big Data Analytics & Data Algorithms, Performance Optimization, Large Scale Data Analytics, Defense, GTC 2014 - ID S4611
Streaming:
 
Speeding Up GraphLab Using CUDA
Vishal Vaidyanathan (Royal Caliber)
We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible - with the latest Tesla cards, over 48GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.
 
Keywords:
Big Data Analytics & Data Algorithms, Performance Optimization, GTC 2014 - ID S4612
Streaming:
 
A High Level API for Fast Development of High Performance Graph Analytics on GPUs
Zhisong Fu (SYSTAP)
The goal of this session is to demonstrate how our high-level abstraction enables developers to quickly develop high-performance graph analytics programs on GPUs, with up to 3 billion edges traversed per second on a Tesla or Kepler GPU. High-performance graph analytics are critical for a large range of application domains. The SIMT architecture of GPUs and the irregular nature of graphs make it difficult to develop efficient graph analytics programs. In this session, we present an open source library that provides a high-level abstraction for efficient graph analytics with minimal coding effort. We use several specific examples to show how to use our abstraction to implement efficient graph analytics in a matter of hours.
 
Keywords:
Big Data Analytics & Data Algorithms, Large Scale Data Analytics, Defense, GTC 2014 - ID S4617
Streaming:
Download:
 
Getting Big Data Done On a GPU-Based Database
Ori Netzer (SQream Technologies)
We will provide an in-depth analysis of our in-production, GPU-based technology for Big Data analytics, highlighting how our database benefits telecom companies. We will explain the key features of our technology: our database provides close to real-time analytics and up to 100X faster insights, all in a very cost-effective manner. We will elaborate on these features and more in order to provide a clear understanding of how our technology works and why it is beneficial for telecom companies.
 
Keywords:
Big Data Analytics & Data Algorithms, GTC 2014 - ID S4644
Streaming:
Download:
 
Parallel Decomposition Strategies in Modern GPU
Sean Baxter (NVIDIA)
Learn strategies to decompose algorithms into parallel and sequential phases. These strategies make algorithmic intent clear while enabling performance portability across device generations. Examples include scan, merge, sort, and join.
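Scan is the canonical example of this split: implementations typically decompose it into a data-parallel per-tile phase plus a small spine pass that looks sequential. A minimal Thrust usage sketch (the decomposition happens inside the library call):

// Inclusive scan of eight ones; the library internally tiles the input,
// scans tiles in parallel, and stitches tile totals together.
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <cstdio>

int main() {
    thrust::device_vector<int> v(8, 1);                  // 1 1 1 1 1 1 1 1
    thrust::inclusive_scan(v.begin(), v.end(), v.begin());
    int last = v.back();                                 // running total = 8
    printf("inclusive scan tail = %d\n", last);
    return 0;
}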
 
Keywords:
Big Data Analytics & Data Algorithms, Performance Optimization, GTC 2014 - ID S4674
Streaming:
 
Extreme Machine Learning with GPUs
John Canny (UC Berkeley)
BIDMach is an open-source library for GPU-accelerated machine learning. BIDMach on a single GPU node exceeds the performance of all other tools (including cluster systems on hundreds of nodes) for the most common machine learning tasks. BIDMach is an easy-to-use, interactive environment similar to SciPy/Matlab, but with qualitatively higher performance. The session will discuss: Performance: BIDMach follows a "LAPACK" philosophy of building high-level algorithms on fast low-level routines (like BLAS); it exploits the unique hardware features of GPUs to provide more than order-of-magnitude gains over alternatives. Accuracy: Monte Carlo methods (MCMC) are the most general way to derive models, but are slow; we have developed a new approach to MCMC that provides two orders of magnitude speedup beyond the hardware gains, and our "cooled" MCMC is fast and improves model accuracy. Interactivity: we are developing interactive modeling/visualization capabilities in BIDMach to allow analysts to guide, correct, and improve models in real time.
 
Keywords:
Big Data Analytics & Data Algorithms, Bioinformatics & Genomics, Machine Learning & AI, Scientific Visualization, GTC 2014 - ID S4811
Streaming:
Download:
 
First Glimpse into the OpenPOWER Software Stack with Big Data Workload Example (Presented by IBM)
Keith Campbell (IBM), Ken Rozendal (IBM)
The OpenPOWER Foundation (http://www.open-power.org/) is an open alliance of companies working together to expand the hardware and software ecosystem based on the POWER architecture. This collaboration across hardware and software vendors enables unique innovation across the full hardware and software stack. OpenPOWER ecosystem partners and developers now have more choice, control, and flexibility to optimize at any level of the technology, from the processor on up, for next-generation, hyperscale, and cloud datacenters. Integrating support for NVIDIA GPUs on the POWER platform enables high-performance enterprise and technical computing applications such as Big Data and analytics workloads. This presentation will cover the software stack and developer tools for OpenPOWER, the planned support for CUDA, and a proof of concept showing GPU acceleration. This proof of concept will be available as a demo in the IBM booth.
 
Keywords:
Big Data Analytics & Data Algorithms, Debugging Tools & Techniques, Programming Languages & Compilers, GTC 2014 - ID S4882
Streaming:
Download:
Bioinformatics & Genomics
Restricting the Seed-and-Extend Search Space in GPU-Based Short-Read Alignment
Richard Wilton (Johns Hopkins University)
Most research into the use of GPUs for biological sequence alignment has focused on the choice and implementation of appropriate parallel algorithms for sequence matching. This strategy has yielded a number of GPU-based implementations with speeds 5 to 10 times faster than CPU implementations with comparable sensitivity and mapping quality. We have taken a different approach to the use of GPUs by implementing a series of CUDA kernels that filter the set of reference locations at which to compute seed-and-extend alignments, thereby decreasing the amount of parallel sequence-matching computation and improving the overall throughput of the GPU/CPU pipeline. Even without extreme CUDA code optimization, we observe increased sensitivity (i.e., a larger number of reported valid mappings) with throughput as good as or better than existing GPU-based sequence aligners.
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4248
Streaming:
Download:
 
Parallel Implementation of PK-PD Parameter Estimation on GPU Using Grid Search Method
Nishant Agrawal (Tata Consultancy Services Limited), Rihab Abdulrazak (Tata Consultancy Services Limited)
The goal of this session is to showcase the performance improvements achieved by a parallel implementation of PK-PD (pharmacokinetic-pharmacodynamic) parameter estimation on the GPU. The grid search method is used here to estimate the initial parameters of the PK-PD model. Parallel implementation on GPUs provides much faster solutions to time-consuming problems in the pharmaceutical domain, where the discovery of new drugs has become increasingly challenging because of the sheer volume of data. Parallelizing the serial version of the application on the GPU with the device architecture in mind helps achieve high performance, i.e., it reduces the overall execution time. This talk covers the stepwise approaches used to further optimize the application and to leverage the capabilities of the Tesla and Kepler hardware architectures for high performance. A substantial improvement in execution time was observed after the parallel implementation.
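The general pattern behind GPU grid search is one thread per parameter combination, each evaluating the model against observed data. A minimal sketch with a hypothetical one-compartment model C(t) = (dose/V)·exp(-k·t) and made-up sample values (illustrative only, not the presenters' code):

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <math.h>

    #define NK 64        // grid points for elimination rate k
    #define NV 64        // grid points for volume of distribution V
    __constant__ float t_obs[4] = {1.0f, 2.0f, 4.0f, 8.0f};    // hours
    __constant__ float c_obs[4] = {4.4f, 3.9f, 3.0f, 1.8f};    // conc.

    // One thread evaluates one (k, V) combination and records its
    // sum-of-squares error against the observations.
    __global__ void gridSearch(float dose, float* sse) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= NK * NV) return;
        float k = 0.01f + 0.01f * (i % NK);   // hypothetical grid spacing
        float V = 5.0f + 0.5f * (i / NK);
        float err = 0.0f;
        for (int s = 0; s < 4; ++s) {
            float pred = dose / V * expf(-k * t_obs[s]);
            float d = pred - c_obs[s];
            err += d * d;
        }
        sse[i] = err;      // host reduces to pick the starting estimate
    }

    int main() {
        float* d_sse;
        cudaMalloc(&d_sse, sizeof(float) * NK * NV);
        gridSearch<<<(NK * NV + 255) / 256, 256>>>(100.0f, d_sse);
        float h_sse[NK * NV];
        cudaMemcpy(h_sse, d_sse, sizeof(h_sse), cudaMemcpyDeviceToHost);
        int best = 0;
        for (int i = 1; i < NK * NV; ++i)
            if (h_sse[i] < h_sse[best]) best = i;
        printf("best grid index %d, SSE %.3f\n", best, h_sse[best]);
        cudaFree(d_sse);
        return 0;
    }

Because every grid point is independent, the search saturates the device with no inter-thread communication, which is what makes this method such a natural first GPU port.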
 
Keywords:
Bioinformatics & Genomics, Clusters & GPU Management, Supercomputing, GTC 2014 - ID S4396
Streaming:
 
Hybrid Clustering Algorithms for Degenerate Primer Development on the GPU
Trevor Cickovski (Eckerd College)
Analyzing portions of a genome during Polymerase Chain Reaction (PCR) analysis requires construction of a primer sequence that is complementary to the flanking regions of a target sequence, producing multiple copies of that portion of the genome. When analyzing multiple related genomes, the primer must be degenerate, containing an amount of uncertainty that we must minimize. We use graphics processing units (GPUs) to analyze the performance of a parallelized hierarchical clustering algorithm for grouping related genomes prior to degenerate primer construction, and also hybridize this algorithm with strategies from K-Means and Fuzzy C-Means. We demonstrate an order of magnitude improvement when running these algorithms on nearly one thousand sequences of more than seven thousand nucleotides from the human genome.
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4424
Streaming:
Download:
 
GPU-Based Bayesian Phylogenetic Inference Beyond Extreme Scale
Mitchel Horton (Georgia Institute of Technology)
See how researchers can, for the first time, infer phylogenetic trees of unlimited size using the bonanza of biological sequence data available to them today. We will present a phylogenetic inference approach that combines an existing GPU-based Bayesian phylogenetic reconstruction application (BEAST/BEAGLE) with the notion of performing an independent Markov chain Monte Carlo (MCMC) run on any number of GPUs, on any number of nodes, of an HPC GPU cluster of any size. The approach will be shown to scale indefinitely for sufficiently large problems. In addition, we will present a new batch matrix-matrix product CUDA kernel used for the matrix exponentiation at the heart of the phylogenetic inference algorithm.
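For context, cuBLAS exposes this batched pattern through cublasSgemmBatched, which multiplies many small matrices in one launch, the shape of work inside a scaling-and-squaring matrix exponential. A sketch with hypothetical 4x4 matrices (not the presenters' custom kernel):

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <vector>
    #include <cstdio>

    int main() {
        const int n = 4, batch = 256;          // 256 small 4x4 matrices
        size_t bytes = sizeof(float) * n * n;
        std::vector<float*> hA(batch), hC(batch);
        std::vector<float> init(n * n, 0.5f);
        for (int b = 0; b < batch; ++b) {
            cudaMalloc(&hA[b], bytes);
            cudaMalloc(&hC[b], bytes);
            cudaMemcpy(hA[b], init.data(), bytes, cudaMemcpyHostToDevice);
        }
        // cuBLAS expects device-resident arrays of per-matrix pointers.
        float **dA, **dC;
        cudaMalloc(&dA, batch * sizeof(float*));
        cudaMalloc(&dC, batch * sizeof(float*));
        cudaMemcpy(dA, hA.data(), batch * sizeof(float*), cudaMemcpyHostToDevice);
        cudaMemcpy(dC, hC.data(), batch * sizeof(float*), cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;
        // One launch squares every matrix in the batch: C_b = A_b * A_b,
        // the inner step of scaling-and-squaring.
        cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                           &alpha, (const float**)dA, n,
                           (const float**)dA, n, &beta, dC, n, batch);
        cudaDeviceSynchronize();
        printf("squared %d matrices in one call\n", batch);
        cublasDestroy(handle);     // per-matrix buffer cleanup omitted
        return 0;
    }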
 
Keywords:
Bioinformatics & Genomics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4476
Streaming:
Download:
 
Training Random Forests on the GPU: Genomic Implications on HIV Susceptibility
Mark Seligman (Rapidics LLC)
The Random Forest (trademarked) algorithm is a powerful, versatile tool in machine learning. It consists of a training pass, which builds a tree-based predictive model from a sample data set, followed by a tree-walking pass to generate predictions for new data. Recent acceleration efforts have focused on the independence of both the construction and the walking of distinct trees using, for example, multi-CPU and Hadoop-based approaches. Here, by contrast, we report progress in parallelizing the construction of individual trees themselves on the GPU. This enables the algorithm to treat very wide data sets, such as those common in genomic studies, in significantly shorter times than previously reported. It also makes iterative invocation practical, enabling, for example, reweighted and variational applications of the algorithm. We demonstrate recent results on studies of HIV susceptibility in subjects from Sub-Saharan Africa.
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics & Data Algorithms, Machine Learning & AI, Supercomputing, GTC 2014 - ID S4502
Streaming:
Download:
 
GPU Accelerated Genomics Data Compression
BingQiang Wang (BGI)
We review existing compression algorithms along with the characteristics of common genomics data formats. We then introduce a general GPU-accelerated compression framework featuring (1) adaptive compression scheme tuning, (2) optimized, GPU-accelerated compression algorithms, and (3) column-major storage. This approach fully exploits the similarity within individual columns of popular genomics data formats by selecting an appropriate compression scheme (a combination of algorithms) for each; the GPU is then employed to speed up compression and decompression, yielding several-fold higher bandwidth.
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4526
Streaming:
Download:
 
GPU Enables Bivariate and Trivariate Routine Analysis of Case-Control GWAS
Adam Kowalczyk (National ICT Australia), Qiao Wang (National ICT Australia)
Genome-wide association studies (GWAS) examine millions of DNA loci in an attempt to associate DNA mutations with a given disease. Complex aetiologies of many common diseases are believed to involve combinations of different genes, requiring evaluation of trillions of (non-additive) combinations of loci. We have developed solutions using a single GPU to evaluate association of all bivariate features within minutes (available via a free web service). Although exhaustive trivariate analysis currently requires a GPU cluster, focused trivariate analysis can be accomplished routinely on a single GPU within hours.
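The exhaustive bivariate sweep maps naturally onto one GPU thread per pair of loci. The sketch below uses bit-packed genotypes and the hardware population count, with a toy co-occurrence count standing in for the real association statistic (all sizes and the scoring function are hypothetical):

    #include <cuda_runtime.h>
    #include <cstdio>

    #define NSNP 1024     // loci (toy size)
    #define NWORDS 8      // 32 subjects per word, 256 subjects total

    // One thread scores one pair (i, j): here it just counts subjects
    // carrying the minor allele at both loci via bitwise AND + popcount.
    __global__ void pairScore(const unsigned* geno, float* score) {
        long long p = blockIdx.x * (long long)blockDim.x + threadIdx.x;
        long long npairs = (long long)NSNP * (NSNP - 1) / 2;
        if (p >= npairs) return;
        // Unrank the flat pair index p into loci (i, j) with i < j.
        int i = 0;
        long long rem = p;
        while (rem >= NSNP - 1 - i) { rem -= NSNP - 1 - i; ++i; }
        int j = i + 1 + (int)rem;
        int both = 0;
        for (int w = 0; w < NWORDS; ++w)
            both += __popc(geno[i * NWORDS + w] & geno[j * NWORDS + w]);
        score[p] = (float)both;
    }

    int main() {
        static unsigned h[NSNP * NWORDS];
        for (int i = 0; i < NSNP * NWORDS; ++i) h[i] = 0x01010101u * (i & 7);
        unsigned* dg; float* ds;
        long long npairs = (long long)NSNP * (NSNP - 1) / 2;
        cudaMalloc(&dg, sizeof(h));
        cudaMalloc(&ds, npairs * sizeof(float));
        cudaMemcpy(dg, h, sizeof(h), cudaMemcpyHostToDevice);
        pairScore<<<(unsigned)((npairs + 255) / 256), 256>>>(dg, ds);
        cudaDeviceSynchronize();
        printf("scored %lld locus pairs\n", npairs);
        cudaFree(dg); cudaFree(ds);
        return 0;
    }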
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4592
Streaming:
Download:
 
GPU-Accelerated Algorithms in Bioinformatics and Data Mining
Bertil Schmidt (Johannes Gutenberg University Mainz)
The development of scalable algorithms and tools is of high importance to bioinformatics and data mining. In this session, you will learn about the efficient usage of CUDA to accelerate prominent algorithms in both areas. In particular, GPU acceleration of the following methods will be discussed: (1) the Smith-Waterman algorithm on Kepler (CUDASW++ 3.0) compared to an equivalent Xeon Phi implementation (SWAPHI); (2) short read alignment (CUSHAW2-GPU and CUSHAW3); (3) clustering of protein structures; (4) alignment of time series with a Dynamic Time Warping-inspired similarity measure; and (5) an effective scalable clustering algorithm for large data sets that builds upon the concept of divide-and-conquer.
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4603
Streaming:
 
Current Uses and Future Prospects of Many-Core GPUs for High-Throughput Sequencing Data Analyses
Brian Lam (Cambridge University)
High-throughput sequencing (HTS) instruments can produce enormous amounts of information about our genome in a short period of time and enable us to better understand the biology of our everyday lives. HTS also poses a substantial challenge to IT infrastructure and human resources: analyzing data from these instruments often involves the use of high-performance computing (HPC) clusters and expertise from interdisciplinary professionals who are literate in both biology and computing, restricting access to the technology to large, well-established laboratories. Many-core architectures, as found in high-end graphics processing units (GPUs), may provide an answer to this challenge. Packed with thousands of cores on a physical chip, a GPU can be just as quick as a small HPC cluster in many cases. In this session, we will explore the use of GPUs in accelerating the data analysis pipeline associated with HTS and investigate its future in this area.
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4623
Streaming:
Download:
 
BWT Indexing: Big Data from Next Generation Sequencing and GPU
Jeanno Cheung (HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory)
With the rapid improvement in DNA sequencing technologies, huge amounts of sequencing data can be produced in a time- and cost-efficient manner (e.g., it costs only a few thousand US dollars to produce 100 gigabases in a day). Compressed full-text indexing based on the BWT has been found to be very useful in speeding up the analysis of high-throughput sequencing data. In this talk we consider two major problems in this context: alignment of sequencing data onto a reference genome (for detecting genetic variations), and indexing of sequencing data. These two problems have different applications and different technical challenges. We show how the GPU can be exploited to achieve tremendous improvement in each case. In particular, our alignment solution makes it feasible to conduct NGS analysis even in the time-critical clinical environment; for example, 30+ fold whole genome sequencing data of a human (~100 gigabases) can be aligned and analyzed in a few hours, with sensitivity and accuracy even higher than before.
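The core of BWT-based alignment is FM-index backward search, which narrows a suffix-array interval one pattern character at a time using only a count table C and an occurrence function Occ. A minimal CPU-side sketch of that logic, the same loop a GPU aligner evaluates for millions of reads in parallel (illustrative only, not the speakers' code):

    #include <algorithm>
    #include <cstdio>
    #include <cstring>
    #include <string>
    #include <vector>

    // Naive BWT of s (must end in a unique sentinel such as '$').
    static std::string bwt(const std::string& s) {
        int n = (int)s.size();
        std::vector<int> sa(n);
        for (int i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&](int a, int b) {
            return s.substr(a) < s.substr(b);   // toy suffix sort
        });
        std::string b(n, ' ');
        for (int i = 0; i < n; ++i) b[i] = s[(sa[i] + n - 1) % n];
        return b;
    }

    // Occ(c, i): occurrences of c in bwt[0, i). (Real indexes keep
    // sampled checkpoints instead of rescanning.)
    static int occ(const std::string& b, char c, int i) {
        return (int)std::count(b.begin(), b.begin() + i, c);
    }

    int main() {
        std::string text = "ACAACG$";
        std::string b = bwt(text);
        std::string sorted = text;
        std::sort(sorted.begin(), sorted.end());
        auto C = [&](char c) {      // characters lexicographically < c
            return (int)(std::lower_bound(sorted.begin(), sorted.end(), c)
                         - sorted.begin());
        };
        const char* pat = "AC";
        int lo = 0, hi = (int)text.size();
        // Backward search: extend the pattern right to left, shrinking
        // the interval [lo, hi) of matching suffixes at each step.
        for (int k = (int)strlen(pat) - 1; k >= 0 && lo < hi; --k) {
            lo = C(pat[k]) + occ(b, pat[k], lo);
            hi = C(pat[k]) + occ(b, pat[k], hi);
        }
        printf("'%s' occurs %d time(s)\n", pat, hi - lo);   // prints 2
        return 0;
    }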
 
Keywords:
Bioinformatics & Genomics, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4628
Streaming:
Download:
 
Accelerating the DNA Sequencing Variant Calling Pipeline
Mauricio Carneiro (Broad Institute of MIT and Harvard)
Learn about the best-practice variant calling pipeline that drives every DNA sequencing project in the world, be it for research, for industry, or to diagnose a patient in critical condition. Here we present different approaches to optimize and accelerate key parts of this pipeline. First, we will give you an overview of the process and how researchers around the world are using DNA sequencing data to understand complex and rare variants and their associations with disease. Second, we will show you the work we have done to speed up this pipeline through the use of GPUs and other technologies. Third, we will discuss a new version of the pipeline that takes advantage of the optimizations to enable incremental analysis, that is, leveraging all historical data on every new sequencing project with minimal overhead. We close by discussing the many points that are still open for optimization and how the community can get involved.
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4679
Streaming:
 
Introducing NVBIO: High Performance Primitives for Computational Genomics
Jonathan Cohen (NVIDIA), Nuno Subtil (NVIDIA)
Learn about NVIDIA's new open source CUDA/C++ library for high-performance computational genomics, NVBIO. NVBIO includes primitives for fast alignment using many variants of Smith-Waterman, text indexing via an FM-Index and related data structures, and approximate string matching with backtracking. It also provides basic services like file I/O and inter-thread communication. The design of NVBIO supports pipeline parallelism, where computation is expressed as a sequence of stages with queues to communicate between stages. Using this design concept, we have engineered an implementation of the Bowtie2 aligner on top of NVBIO, which aligns short read data 2-7x faster than the original Bowtie2 running on a high-end multicore CPU, at comparable quality. In this talk we will introduce the codebase and demonstrate how to use it for your own applications.
 
Keywords:
Bioinformatics & Genomics, GTC 2014 - ID S4741
Streaming:
Download:
Climate, Weather, Ocean Modeling
Presentation
Media
Development, Parallelization and Performance of the NIM Next-Generation Weather Model on Titan
Mark Govett (NOAA)
The Non-hydrostatic Icosahedral Model (NIM) is a next-generation global weather model being developed at NOAA to improve 0-100 day weather predictions. Since development began in 2008, the model has been designed to run on highly parallel computer architectures such as GPUs. GPU parallelization has relied on the directive-based Fortran-to-CUDA ACCelerator (F2C-ACC) compiler developed at NOAA. Recent work has focused on parallelization of the model physics, evaluating the OpenACC compilers, and preparing the model to run at the full 3.5 km resolution on 5,000 nodes of Titan. This talk will report on the development of the NIM model, describe our efforts to improve parallel performance on Titan, and report on our experiences using the OpenACC compilers.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2014 - ID S4157
Streaming:
 
QUIC EnvSim: Radiative Heat Transfer in Vegetative and Urban Environments with NVIDIA OptiX
Matthew Overby (University of Minnesota Duluth)
This session presents QUIC EnvSim, a scientific tool for modeling the complex interactions between the environment and urban form. The talk will focus on the simulation of radiative heat transfer in urban environments with vegetation (such as trees, parks, or green rooftops) using the GPU-accelerated NVIDIA OptiX ray tracing engine. Attend this session to learn how we utilize OptiX to efficiently and accurately simulate radiative transport in urban domains. Topics include: (1) the physical properties of surfaces and vegetation and how they interact with longwave and shortwave radiation; (2) efficient and scalable discretization of large urban domains; (3) strategies we employed for overcoming challenges such as atomic operations, multiple GPUs, and more; and (4) results that illustrate the validity, efficiency, and scalability of the system.
 
Keywords:
Climate, Weather, Ocean Modeling, Ray Tracing, GTC 2014 - ID S4312
Streaming:
Download:
 
ASUCA on GPU: Uncompromising Hybrid Port for Physical Core of Japanese Weather Model
Michel Muller (RIKEN Advanced Institute for Computational Science)
ASUCA is the next-generation non-hydrostatic Japanese mesoscale weather prediction model, currently developed at the Japan Meteorological Agency. Following the successful GPU port of its dynamical core by Shimokawabe et al., the physical core has now been fully ported as well. In order to achieve a unified codebase with high usability as well as high performance on both GPU and CPU, a new directive-based open source language extension called 'Hybrid Fortran' has been used (as introduced at GTC 2013). Using a Python-based preprocessor, it automatically creates CUDA Fortran code for the GPU and OpenMP Fortran code for the CPU, with two separate horizontal loop orders in order to preserve performance. Attendees of this session will learn how to create a hybrid codebase with high usability as well as high performance on both CPU and GPU, how we used a preprocessor to achieve our goals, and how to use macros for memory optimizations while following the DRY principle.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2014 - ID S4352
Streaming:
Download:
 
Weather Prediction Code Written by a High-Productivity Framework for Multi-GPU Computing
Takashi Shimokawabe (Tokyo Institute of Technology)
Numerical weather prediction is one of the major applications in high-performance computing and is accelerated on GPU supercomputers. Obtaining good parallel efficiency on more than a thousand GPUs often requires skillful programming, for example, combining MPI for inter-node communication with NVIDIA GPUDirect for intra-node communication. The Japan Meteorological Agency is developing a next-generation high-resolution mesoscale weather prediction code, ASUCA. We are implementing it on a multi-GPU platform by using a high-productivity framework for mesh-based applications. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU codes. The framework can also hide the complicated implementation of the efficient communications described above. In this presentation, we will show the implementation of the weather prediction code using this framework and its performance evaluation on the TSUBAME 2.5 supercomputer at the Tokyo Institute of Technology.
 
Keywords:
Climate, Weather, Ocean Modeling, Computational Fluid Dynamics, Supercomputing, GTC 2014 - ID S4565
Streaming:
 
Developing a System For Real-Time Numerical Simulation During Physical Experiments in a Wave Propagation Laboratory
Darren Schmidt (National Instruments)
ETH Zurich is proposing a new concept for wave propagation laboratories in which the physical experiment is linked with a numerical simulation in real time. Adding live experimental data to a larger numerical simulation domain creates a virtual lab environment never before realized, enabling the study of frequencies inherent in important seismological and acoustic real-world scenarios. The resulting environment is made possible by a real-time computing system under development. This system must perform computations typically reserved for traditional (offline) HPC applications but produce results in a matter of microseconds. To do so, National Instruments is using the LabVIEW platform to leverage NI's fastest data acquisition and FPGA hardware with NVIDIA's most powerful GPU processors to build a real-time heterogeneous simulator.
 
Keywords:
Climate, Weather, Ocean Modeling, Big Data Analytics & Data Algorithms, Numerical Algorithms & Libraries, Signal & Audio Processing, GTC 2014 - ID S4682
Streaming:
Download:
 
Delivering Performance in Scientific Simulations: Present and Future Role of GPUs in Supercomputing
Thomas Schulthess (ETH Zurich / CSCS)
GPU-based supercomputers are the most energy efficient and among the most powerful computing systems in use today. We show, with examples from computational physics and climate simulations, how this performance is delivered today to solve real-world problems. You will see how application software has been structured to port seamlessly across hardware platforms, what aspects of current hybrid CPU-GPU platforms matter, and how such architectures should best develop so that applications continue to benefit from exponential performance increases in the future.
 
Keywords:
Climate, Weather, Ocean Modeling, Numerical Algorithms & Libraries, Computational Physics, Supercomputing, GTC 2014 - ID S4719
Streaming:
Download:
Clusters & GPU Management
Presentation
Media
GASPI/GPI2 for GPUs: A PGAS Framework for Efficient Communication in GPU Systems
Lena Oden (Fraunhofer ITWM)
GPI2 for GPUs is a PGAS framework for efficient communication in heterogeneous clusters. In this session you will learn how multi-GPU programs can benefit from an RDMA-based programming model. We will introduce the industry-proven PGAS communication library GPI2 and its support for GPUs. GPUDirect RDMA technology allows truly one-sided communication between multiple GPUs on different nodes, so an RDMA-based programming model is a natural fit for this technology. Thanks to the very low communication overhead of one-sided operations, a latency of 3 microseconds can be reached for an inter-node data transfer. GPI2 for GPUs is not only optimized for inter-node communication; intra-node communication is also optimized by combining the different GPUDirect technologies.
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2014 - ID S4183
Streaming:
 
Tools and Tips For Managing a GPU Cluster
Adam DeConinck (NVIDIA)
Managing a multi-user heterogeneous HPC cluster can be challenging, but there are ways to make it easier. This session will cover the GPU-aware cluster software stack from the perspective of a system administrator, from driver installation through resource manager integration and centrally managed development tools such as MPI libraries. This will include an overview of NVIDIA's tools for GPU management and monitoring, a survey of third-party tools with GPU integration, and a number of "lessons learned" from managing HPC clusters inside NVIDIA.
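Much of the monitoring discussed here builds on NVML, the library underneath nvidia-smi. A minimal health sweep might look like this sketch (link with -lnvidia-ml; an illustration, not NVIDIA's tooling itself):

    #include <nvml.h>
    #include <cstdio>

    int main() {
        if (nvmlInit() != NVML_SUCCESS) return 1;
        unsigned int count = 0;
        nvmlDeviceGetCount(&count);
        for (unsigned int i = 0; i < count; ++i) {
            nvmlDevice_t dev;
            char name[64];
            unsigned int temp = 0;
            nvmlUtilization_t util = {0, 0};
            nvmlDeviceGetHandleByIndex(i, &dev);
            nvmlDeviceGetName(dev, name, sizeof(name));
            nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);
            nvmlDeviceGetUtilizationRates(dev, &util);
            // One line per device: the raw data that health checks and
            // resource-manager prologs are typically built from.
            printf("GPU %u: %s  %u C  util %u%%\n",
                   i, name, temp, util.gpu);
        }
        nvmlShutdown();
        return 0;
    }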
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID S4253
Streaming:
Download:
 
GPU-Accelerated Signal Processing in OpenStack
John Paul Walters (USC Information Sciences Institute)
Learn how to deploy both Fermi and Kepler-based GPUs in an OpenStack cloud. In this session we describe the latest HPC features for the OpenStack cloud computing platform, including Kepler and Fermi GPU support, high speed networking, bare metal provisioning, and heterogeneous scheduling. The features are based on OpenStack Grizzly and Havana, with upcoming support for OpenStack Icehouse. Using examples drawn from signal and image processing, we will characterize the performance and versatility of LXC and Xen GPU support for both regular and irregular computations. We'll also characterize the performance improvements due to support for high speed networking in the OpenStack cloud. The session will conclude with a discussion of the next steps in HPC OpenStack development.
 
Keywords:
Clusters & GPU Management, Desktop & Application Virtualization, Signal & Audio Processing, Supercomputing, GTC 2014 - ID S4257
Streaming:
Download:
 
How to Efficiently Virtualize Local and Remote GPUs
Pavan Balaji (Argonne National Laboratory)
In this session, you will get familiar with vACC, a virtual accelerator/GPU library that virtualizes remote and local GPUs installed across a cluster of compute nodes. The main objective is to provide efficient virtualized access to GPUs from any host in the system. GPU virtualization brings new opportunities for effective management of GPU resources by decoupling them from host applications. In addition to access to remote GPUs, the vACC framework offers power-aware physical/virtual accelerator mapping, fault tolerance with transparent migration, efficient integration with virtual machines in Cloud environments and support for both CUDA and OpenCL paradigms. vACC can enable GPU service providers to offer cost-effective, flexible and fault-tolerant access to GPUs in the Cloud. Such capabilities are crucial in facilitating the adoption of GPU-based services across academia and industry. During the session, we will demonstrate how using vACC can improve GPU access experience and maintenance cost in a local cluster or a Cloud.
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2014 - ID S4321
Streaming:
 
Accurate Power and Energy Measurements on Kepler-Based Tesla GPUs
Martin Burtscher (Texas State University)
Learn how to correctly profile the power and energy consumption of your kernels using the built-in power sensor of K20 compute GPUs. The measurements do not directly follow the GPU activity but lag behind and are distorted. This can cause large inaccuracies, especially for short running kernels, when taking the power samples at face value. This session explains how to compute the true power and energy consumption and provides general guidelines on how to best profile the power draw of GPU kernels using NVIDIA's Management Library.
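For context, raw sampling through NVML looks like the sketch below. As the session explains, on K20 these samples lag and smooth the true GPU activity, so integrating them at face value misstates the energy of short kernels; the talk's correction is applied to exactly this kind of trace (sample period and window here are arbitrary choices):

    #include <nvml.h>
    #include <cstdio>
    #include <unistd.h>

    int main() {
        nvmlInit();
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);
        double joules = 0.0;
        const double dt = 0.01;               // 10 ms sample period
        for (int s = 0; s < 500; ++s) {       // ~5 s measurement window
            unsigned int mw = 0;              // board power in milliwatts
            nvmlDeviceGetPowerUsage(dev, &mw);
            joules += (mw / 1000.0) * dt;     // rectangle-rule integration
            usleep(10000);
        }
        printf("apparent energy: %.1f J over ~5 s\n", joules);
        nvmlShutdown();
        return 0;
    }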
 
Keywords:
Clusters & GPU Management, Performance Optimization, GTC 2014 - ID S4454
Streaming:
Download:
 
Design of a Virtualization Framework to Enable GPU Sharing in Cluster Environments
Kittisak Sajjapongse (University of Missouri)
We describe the design of a runtime component that enables the effective use of GPUs in cluster environments. In particular, our system allows: (1) abstraction of GPUs from end-users; (2) different GPU sharing and scheduling mechanisms; (3) virtual memory management; (4) load balancing and dynamic recovery in case of GPU failure, upgrade, and downgrade; and (5) integration with existing cluster-level schedulers and resource managers for CPU clusters.
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2014 - ID S4473
Streaming:
Download:
 
Resource Affinity Can Impact Performance: How to Choose the Right Affinity?
Matthieu Ospici (Bull)
In modern heterogeneous HPC architectures, several computing resources (CPUs, accelerators) and I/O resources (InfiniBand cards, PCIe links, QPI links) must be used simultaneously to get the best out of the hardware. This observation is even more true with the rise of technologies such as GPUDirect RDMA, which performs communications directly between GPUs and InfiniBand links. In this context, resource affinity (i.e., resource selection and process placement) can have a strong impact on performance. The aim of this presentation is, first, to identify the main affinity issues that can occur in current heterogeneous architectures (e.g., which CPU core to choose when a particular GPU is used, or which IB interface to choose when a GPUDirect RDMA transfer is launched). We will show the visible impact on performance, and then propose solutions to handle these issues. We believe that affinity selection should be managed globally at the cluster resource manager level (with SLURM in our work), and not by HPC programmers.
 
Keywords:
Clusters & GPU Management, Supercomputing, GTC 2014 - ID S4491
Streaming:
Download:
 
OpenMPI with RDMA Support and CUDA
Rolf VandeVaart (NVIDIA)
Open MPI is an open source implementation of the Message Passing Interface (MPI) library used to support parallel applications. With GPUs being used more and more in large clusters, work has been done to make CUDA and MPI work seamlessly together. In this talk, we will cover new features added to the library to support sending and receiving GPU buffers directly.
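With a CUDA-aware Open MPI build, a device pointer can be handed straight to MPI calls and the library stages the transfer (or uses GPUDirect) internally. A minimal two-rank sketch, assuming such a build (not code from the talk):

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        const int n = 1 << 20;
        float* dbuf;                          // device memory, not host
        cudaMalloc(&dbuf, n * sizeof(float));
        if (rank == 0) {
            cudaMemset(dbuf, 0, n * sizeof(float));
            MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d floats into device memory\n", n);
        }
        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }

Compiled with mpicxx and launched with mpirun -np 2, rank 0's device buffer lands in rank 1's device memory with no explicit cudaMemcpy in user code.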
 
Keywords:
Clusters & GPU Management, GTC 2014 - ID S4589
Streaming:
Download:
 
Citrix 3D Engineering Cloud: A Practical Approach (Presented by IBM)
Bret Bailey (IBM)
In today's fast changing business environment, companies are looking for ways to deliver better designs faster and cheaper while creating high quality products across an ecosystem of partners. To succeed, a company must transform its design processes by converting engineering silos into shared engineering clouds that improve collaboration, standardize processes and create a secure environment for sharing designs across operations and organizations including partners and suppliers. The 3D Engineering Cloud Solution is a high performance visual computing environment for organizations that have large 3D intensive graphics requirements and want to improve collaboration while protecting their assets and reducing costs. The 3D Engineering Cloud Solution is made possible due to a partnership between IBM, Citrix, and NVIDIA. This combination creates a unique 3D engineering environment in the Cloud.
 
Keywords:
Clusters & GPU Management, Graphics Virtualization, Computer Aided Design, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4849
Streaming:
Download:
Collaborative & Large Resolution Displays
Presentation
Media
See the Big Picture: Scalable Visualization Solutions for High Resolution Displays
Doug Traill (NVIDIA)
Large-format, high-resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features that make it easier to install and utilize these large-scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.
 
Keywords:
Collaborative & Large Resolution Displays, GTC 2014 - ID S4671
Streaming:
Download:
 
Mid-Tier VR: Cost-Reducing the CAVE by Embracing the GPU
Rajeev Surati (Scalable Display Technologies), Bei Yang (Walt Disney Imagineering)
We describe how to put together VR CAVEs that used to cost $250K for a whole lot less using NVIDIA NVAPI, and provide case studies, pictures, and diagrams showing how to go about it. We believe that a substantial expansion of the VR market is occurring and that these kinds of systems will become more commonplace as the market expands, both by more effectively using the Quadro cards in the system and by use of the warp and blend APIs.
 
Keywords:
Collaborative & Large Resolution Displays, Virtual & Augmented Reality, Digital Product Design & Styling, GTC 2014 - ID S4452
Streaming:
Download:
 
Stereo3D Video Streaming for Remote Collaboration
Julien Berta (Mechdyne)
Learn how Mechdyne leverages video compression and streaming to create remote collaboration solutions, connecting CAVEs, Powerwalls, and other ultra-resolution displays to enable multi-site, multi-display sharing and decision making. We will explore multiple customer use cases: immersive-to-immersive, desktop-to-immersive, immersive-to-desktop, monoscopic, and stereoscopic.
 
Keywords:
Collaborative & Large Resolution Displays, Virtual & Augmented Reality, Remote Graphics & Cloud-Based Graphics, Video & Image Processing, GTC 2014 - ID S4631
Streaming:
Download:
Combined Simulation & Real-Time Visualization
Presentation
Media
Live, Interactive, In-Situ, In-GPU Visualization of Plasma Simulations Running on GPU Supercomputers
Richard Pausch (Helmholtz-Zentrum Dresden - Rossendorf), Guido Juckeland (ZIH, Technical University Dresden)
With GPUs, large-scale plasma simulations can run at frames-per-second speeds. We present interactive, in-GPU rendering of large-scale particle-in-cell simulations running on GPU clusters. The user can choose which data is visualized and change the direction of view while the simulation is running. A remote visualization client can connect to the running simulation, allowing for live visualization even when bandwidth is limited.
 
Keywords:
Combined Simulation & Real-Time Visualization, Large Scale Data Visualization & In-Situ Graphics, Remote Graphics & Cloud-Based Graphics, Scientific Visualization, GTC 2014 - ID S4140
Streaming:
Download:
 
Interactive Sandbox: Modelling and Visualization of Natural Phenomena on Hand-Made Landscapes
Maxim Rud (Tomsk Polytechnic University)
Create your own world and use the power of GPU programming to visualize it. Build a unique landscape with your hands with the help of a device called the "Interactive Sandbox," and study natural phenomena that are modeled in real time and realistically visualized, such as volcanic eruptions, floods, and changing weather and seasons. You will learn about using GPUs to increase the performance of modeling and visualization, find out how to implement real-time simulation of fluid flow over varying bottom topography, and discover an efficient and fast method for filtering Microsoft Kinect data.
 
Keywords:
Combined Simulation & Real-Time Visualization, Virtual & Augmented Reality, Computational Fluid Dynamics, Real-Time Graphics Applications, GTC 2014 - ID S4269
Streaming:
Download:
 
Real-Time Physically-Based Deformable Object Simulation Using OpenGL 4.x and GLSL
Lazaro E. Lesmes Leon (Center of Research in Mathematics (CIMAT))
In this presentation, some recent features of OpenGL for GPGPU programming are presented through the implementation of a real-time, physically-based deformable object simulation application. Atomic operations and barrier objects are employed to manage thread execution and obtain correct simulation results. The performance achieved for the numerical simulation is more than ten times that of a CPU implementation. This work considers solid objects represented as tetrahedral meshes, and results for huge meshes running at interactive rates will be shown.
 
Keywords:
Combined Simulation & Real-Time Visualization, Real-Time Graphics Applications, GTC 2014 - ID S4276
Streaming:
Computational Fluid Dynamics
Presentation
Media
Fast Fixed-Radius Nearest Neighbor Search on the GPU: Interactive Million-Particle Fluids
Rama Hoetzlein (NVIDIA)
Nearest neighbor search is the key to efficient simulation of many discrete physical models. This talk focuses on a novel, efficient fixed-radius NNS that introduces counting sort accelerated with atomic GPU operations, requiring only two kernel calls. As a sample application, fluid simulations based on smoothed particle hydrodynamics (SPH) make use of NNS to determine interacting fluid particles. The counting-sort NNS method achieves a performance gain of 3-5x over the previous radix-sort NNS, which allows for interactive SPH fluids of 4 million particles at 4 fps on current hardware. The technique presented is generic and easily adapted to other domains, such as molecular interactions or point cloud reconstructions.
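A simplified variant of the counting-sort binning looks like the sketch below: one kernel histograms particles into cells with atomics, a prefix scan turns counts into start offsets, and a second kernel scatters particle indices so each cell's particles sit contiguously, ready for fixed-radius queries over neighboring cells (1-D cells and toy data for brevity; this is an illustration, not the talk's fused implementation):

    #include <cuda_runtime.h>
    #include <thrust/device_ptr.h>
    #include <thrust/scan.h>
    #include <cstdio>

    #define N 4096       // particles
    #define CELLS 256    // 1-D cell grid for brevity (real code: 3-D)

    // Kernel 1: histogram particles into cells with atomics.
    __global__ void count(const int* cell, int* cnt) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < N) atomicAdd(&cnt[cell[i]], 1);
    }

    // Kernel 2: scatter each particle to its slot. Afterwards a
    // fixed-radius query only scans a few adjacent cells.
    __global__ void scatter(const int* cell, int* offset, int* sorted) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < N) sorted[atomicAdd(&offset[cell[i]], 1)] = i;
    }

    int main() {
        int *cell, *cnt, *sorted;
        cudaMalloc(&cell, N * sizeof(int));
        cudaMalloc(&cnt, CELLS * sizeof(int));
        cudaMalloc(&sorted, N * sizeof(int));
        static int h[N];                      // toy cell assignment
        for (int i = 0; i < N; ++i) h[i] = i % CELLS;
        cudaMemcpy(cell, h, sizeof(h), cudaMemcpyHostToDevice);
        cudaMemset(cnt, 0, CELLS * sizeof(int));

        count<<<(N + 255) / 256, 256>>>(cell, cnt);
        // Exclusive scan of counts yields each cell's start offset
        // (the scan reuses cnt in place; scatter then consumes it).
        thrust::device_ptr<int> p(cnt);
        thrust::exclusive_scan(p, p + CELLS, p);
        scatter<<<(N + 255) / 256, 256>>>(cell, cnt, sorted);
        cudaDeviceSynchronize();
        printf("binned %d particles into %d cells\n", N, CELLS);
        return 0;
    }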
 
Keywords:
Computational Fluid Dynamics, Numerical Algorithms & Libraries, Performance Optimization, Molecular Dynamics, GTC 2014 - ID S4117
Streaming:
Download:
 
PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms
Peter Vincent (Imperial College London)
Discover how GPUs are being used to accelerate high-fidelity computational fluid dynamics (CFD) simulations on unstructured grids. In this talk I will (i) introduce the flux reconstruction approach to high-order methods, a discretization that is particularly well-suited to many-core architectures; (ii) introduce our massively parallel implementation, PyFR, which through a combination of symbolic manipulation and run-time code generation is able to easily target NVIDIA GPU hardware; and (iii) showcase some of the high-fidelity, unsteady simulations undertaken using PyFR on both desktop and HPC systems.
 
Keywords:
Computational Fluid Dynamics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4250
Streaming:
Download:
 
Quickly Applying GPU Acceleration to Barracuda: An MP-PIC CAE Software
Andrew Larson (CPFD Software)
Learn about the challenges and possibilities of applying CUDA to a Multi-Phase Particle-In-Cell code base through (1) An applied approach to parallelizing Barracuda VR, a CAE MP-PIC code, (2) Achieved speed-ups of operation types specific to MP-PIC codes (in double-precision), (3) Focused discussion on the crux of MP-PIC, i.e. mapping Lagrangian data to the Eulerian grid and (4) Demonstrated speed-up and future expectations.
 
Keywords:
Computational Fluid Dynamics, GTC 2014 - ID S4417
Streaming:
Download:
 
Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs
Christopher Stone (Computational Science and Engineering, LLC)
Explore the latest techniques for accelerating combustion simulations with finite-rate chemical kinetics using GPUs. In this session we will compare the performance of different numerical methods for solving stiff and non-stiff ODEs and discuss the compromises that must be made between parallel throughput and numerical efficiency. Learn techniques used to (1) manage variable integration costs across the concurrent ODEs and (2) reduce thread divergence caused by non-linear iterative solvers.
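The underlying pattern is one thread per cell, each integrating its own chemistry ODE system; cells that need different step counts are what cause the divergence discussed above. A toy sketch with a single-species linear reaction and fixed-step RK4 (real combustion kernels use adaptive stiff integrators; every value here is made up):

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <math.h>

    #define NCELLS 8192

    __device__ float rhs(float y, float k) { return -k * y; }  // dy/dt

    // One thread advances one cell. In production codes the per-cell
    // step count varies with stiffness, so cells are often sorted or
    // binned by cost to keep warps convergent.
    __global__ void integrate(float* y, const float* k,
                              int nsteps, float h) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= NCELLS) return;
        float yi = y[i], ki = k[i];
        for (int s = 0; s < nsteps; ++s) {    // classic RK4 step
            float k1 = rhs(yi, ki);
            float k2 = rhs(yi + 0.5f * h * k1, ki);
            float k3 = rhs(yi + 0.5f * h * k2, ki);
            float k4 = rhs(yi + h * k3, ki);
            yi += h / 6.0f * (k1 + 2.0f * k2 + 2.0f * k3 + k4);
        }
        y[i] = yi;
    }

    int main() {
        float *y, *k;
        cudaMalloc(&y, NCELLS * sizeof(float));
        cudaMalloc(&k, NCELLS * sizeof(float));
        static float hy[NCELLS], hk[NCELLS];
        for (int i = 0; i < NCELLS; ++i) { hy[i] = 1.0f; hk[i] = 0.5f; }
        cudaMemcpy(y, hy, sizeof(hy), cudaMemcpyHostToDevice);
        cudaMemcpy(k, hk, sizeof(hk), cudaMemcpyHostToDevice);
        integrate<<<(NCELLS + 255) / 256, 256>>>(y, k, 100, 0.01f);
        cudaMemcpy(hy, y, sizeof(hy), cudaMemcpyDeviceToHost);
        printf("y[0] after 1 time unit: %f (exact %f)\n",
               hy[0], expf(-0.5f));
        return 0;
    }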
 
Keywords:
Computational Fluid Dynamics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing, GTC 2014 - ID S4418
Streaming:
Download:
 
Harnessing GPUs to Overcome Conventional Fluid-Particle Interaction Simulation Limitations
Adam Sierakowski (The Johns Hopkins University)
Are you interested in decreasing the runtime of your 24-hour flow simulation to nine minutes? This is the story of how GPUs achieved a 150x speedup and made Physalis into a viable computational tool for investigating the behavior of large fluid-particle systems. The Physalis method is the only known means of applying near-perfect boundary conditions to spherical particles in a coarse Cartesian finite-difference flow solver, but it suffers from a debilitating computational requirement. GPU technology enables us to overcome this limitation so we can investigate the underlying physics behind natural phenomena like dust storms and energy-generation technologies such as fluidized bed reactors. We will discuss concepts in the design of a GPU finite-difference incompressible Navier-Stokes flow solver, introduce the algorithm behind the Physalis method, and evaluate the current and future capabilities of this GPU fluid-particle interaction code.
 
Keywords:
Computational Fluid Dynamics, Numerical Algorithms & Libraries, GTC 2014 - ID S4544
Streaming:
Download:
 
Unstructured Grid CFD Kernels for Gas Turbine Design
Tobias Brandvik (University of Cambridge)
Learn about a new approach to developing large-scale Computational Fluid Dynamics (CFD) software for parallel processors such as GPUs. The session focuses on two topics: (1) the use of automatic source code generation for CFD kernels on unstructured grids to achieve close to optimal performance while maintaining code readability, and (2) case studies of advanced gas turbine simulations on clusters with 100s of GPUs.
 
Keywords:
Computational Fluid Dynamics, Computer Aided Design, Supercomputing, GTC 2014 - ID S4594
Streaming:
 
GPU Acceleration of CFD in Industrial Applications Based on OpenFOAM
Bjoern Landmann (FluiDyna GmbH)
CFD calculations in an industrial context prioritize fast turn-around times, a requirement that can be addressed by porting parts of the CFD calculation to the GPU, leading to a hybrid CPU/GPU approach. In a first step, the GPU library Culises was developed, allowing the GPU-based solution of large-scale linear systems of equations that are in turn set up by MPI-parallelized CFD codes (e.g., OpenFOAM) on the CPU. In this session we address a second step, which consists of porting the construction of the linear system to the GPU as well, while pre- and post-processing remain on the CPU. Aiming for industrial applications in the automotive sector, the approach is aligned with the simpleFOAM solver of OpenFOAM. As the setup of the linear system consumes up to 40-50% of the computational time in typical automotive cases, this approach can further increase the acceleration of CFD computations.
 
Keywords:
Computational Fluid Dynamics, Automotive, Manufacturing, GTC 2014 - ID S4598
Streaming:
 
PyFR: Technical Challenges of Bringing Next Generation Computational Fluid Dynamics to GPU Platforms
Freddie Witherden (Imperial College London)
Learn how to develop efficient highly-scalable GPU codes faster through use of the Python programming language. In this talk I will describe our accelerated massively parallel computational fluid dynamics (CFD) code, PyFR, and outline some of the techniques employed to reduce development time and enhance performance. Specifically, it will be shown how even complex algorithms - such as those employed for performing CFD on unstructured grids - can be constructed in terms of efficient matrix-matrix multiplications. Moreover, general advice will be given on how best to integrate CUDA and MPI. Furthermore, I will demonstrate how Python can be used both to simplify development and bring techniques such as run-time kernel generation to the mainstream. Examples of these techniques, as utilized in PyFR, will be given throughout.
 
Keywords:
Computational Fluid Dynamics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4649
Streaming:
Download:
 
Acceleration of Multi-Grid Linear Solver Inside ANSYS FLUENT Using AmgX
Sunil Sathe (ANSYS Inc)
The solution of the linear equation systems arising from the discretization of flow equations can be a major time-consuming portion of a flow simulation. In the ANSYS FLUENT flow solver, especially when using the coupled solver, the linear solver takes a major chunk of the simulation time. In order to improve performance, and to let users take advantage of available GPU hardware, we provide a mechanism in ANSYS FLUENT to offload the linear solver onto a GPU using NVIDIA's multi-grid AmgX solver. In this talk we present a top-level view of the architectural design of integrating the AmgX solver into ANSYS FLUENT. We also present some preliminary performance results obtained from our first offering of AmgX inside ANSYS FLUENT release 15.0.
 
Keywords:
Computational Fluid Dynamics, GTC 2014 - ID S4672
Streaming:
Download:
 
Simulation Really Does Imitate Life: Modeling a Human Heart Valve and Other FSI Applications with GPU Technology
Wayne Mindle (CertaSIM, LLC)
Fluid-structure interaction (FSI) is one of the most challenging areas for numerical simulation. Modeling fluid flow is complicated enough by itself, but adding the interaction with a deformable structure makes it even more challenging. One particular method, smoothed particle hydrodynamics (SPH), is especially suited to GPU processing: it is a particle-based Lagrangian continuum method that can run completely on the GPU. Improvements to the classic SPH solver have led to an extremely accurate and robust solver that can better capture the pressure field for violent water impacts. To solve complicated FSI problems, an equally robust and accurate finite element solver needs to be part of the coupled solution. One particular application is modeling a real human heart valve, something that has not been done until now. Results using the latest NVIDIA GPU, the K40, will be shown for the heart valve model along with other FSI applications.
 
Keywords:
Computational Fluid Dynamics, Computational Structural Mechanics, GTC 2014 - ID S4762
Streaming:
Download:
 
Challenges Simulating Real Fuel Combustion Kinetics: The Role of GPUs
Matthew McNenly (Lawrence Livermore National Laboratory)
There is a growing need in internal combustion (IC) engine design to resolve the complicated combustion kinetics in simulations. Without more predictive simulation tools in the design cycle, the cost of development will consume new concepts as it becomes harder to meet the performance and emission targets of the future. The combustion kinetics of real transportation fuels involve thousands of components, each of which can react through thousands of intermediate species and tens of thousands of reaction paths. GPUs show promise in delivering more physical accuracy (per dollar) to the IC engine design process. Specifically, GPU acceleration of nearly a factor of ten is demonstrated for the integration of multiple chemical source terms in a reacting fluid dynamics simulation. This speedup is achieved by reorganizing the thermodynamics and chemical reaction functions and by updating the sparse matrix functions using NVIDIA's latest GLU library.
 
Keywords:
Computational Fluid Dynamics, Automotive, GTC 2014 - ID S4881
Streaming:
Download:
Computational Physics
Presentation
Media
Efficient Computation of Radial Distribution Function on GPUs: Algorithm Design and Optimization
Yicheng Tu (University of South Florida), Anand Kumar (University of South Florida)
The radial distribution function (RDF) is a fundamental tool in the validation and analysis of particle simulation data. Computing the RDF is very expensive: it may take days or even months to process a moderately sized data set (millions of points) on a CPU. We present an efficient technique to compute the RDF on GPUs that takes advantage of shared memory, registers, and special instructions. Recent GPU architectures support the shuffle instruction, which can be used to share data between threads via registers. We exploit these features of the new architecture to improve the performance of the RDF algorithm. Further, we present the benefits of using different GPU optimization techniques to improve performance. The effect of algorithm behavior on the speedup is also presented in detail with the help of examples.
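The shuffle idea can be seen in a reduced setting: each thread counts the neighbors of one particle in a single distance shell, and per-thread counts are combined within a warp entirely through registers before one atomic update per warp reaches global memory. A toy sketch using the current __shfl_down_sync spelling of the intrinsic (a full RDF repeats this per histogram bin):

    #include <cuda_runtime.h>
    #include <cstdio>

    #define N 1024        // particles (toy size)
    #define RCUT 0.25f    // one RDF shell, for illustration

    __global__ void shellCount(const float3* pos, unsigned int* total) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned int mine = 0;
        float3 p = pos[i];
        for (int j = 0; j < N; ++j) {         // brute-force pair loop
            float dx = p.x - pos[j].x, dy = p.y - pos[j].y,
                  dz = p.z - pos[j].z;
            float r2 = dx * dx + dy * dy + dz * dz;
            if (j != i && r2 < RCUT * RCUT) ++mine;
        }
        // Warp reduction via shuffle: data moves between threads in
        // registers, no shared memory or extra atomics needed.
        for (int off = 16; off > 0; off >>= 1)
            mine += __shfl_down_sync(0xffffffffu, mine, off);
        if ((threadIdx.x & 31) == 0) atomicAdd(total, mine);
    }

    int main() {
        static float3 h[N];
        for (int i = 0; i < N; ++i)           // particles on a lattice
            h[i] = make_float3((i % 16) / 16.0f,
                               ((i / 16) % 16) / 16.0f,
                               (i / 256) / 4.0f);
        float3* d; unsigned int* t;
        cudaMalloc(&d, sizeof(h));
        cudaMalloc(&t, sizeof(unsigned int));
        cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
        cudaMemset(t, 0, sizeof(unsigned int));
        shellCount<<<N / 256, 256>>>(d, t);
        unsigned int pairs;
        cudaMemcpy(&pairs, t, sizeof(pairs), cudaMemcpyDeviceToHost);
        printf("pairs within %.2f: %u\n", RCUT, pairs / 2);
        return 0;
    }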
 
Keywords:
Computational Physics, Big Data Analytics & Data Algorithms, Molecular Dynamics, GTC 2014 - ID S4149
Streaming:
 
Solution of Discrete Ordinates Transport Equations on GPU
Peng Wang (NVIDIA)
Learn how to port SNAP, a multi-dimensional, multi-group discrete ordinates neutron transport code, to GPU clusters. We will show that the GPU is a good fit for this class of applications, enabling both faster throughput at small scale and better scalability at large scale. The porting strategy and a performance model on the GPU will be described.
 
Keywords:
Computational Physics, GTC 2014 - ID S4164
Streaming:
 
Breaking Through Serial Barriers: Scalable Hard Particle Monte Carlo Simulations with HOOMD-Blue
Joshua Anderson (University of Michigan)
Learn how to scale Metropolis Monte Carlo simulations of hard particles to many GPUs. Prior codes run only in serial on the CPU, limiting researchers' ability to study complex systems. We implement Monte Carlo for arbitrary hard shapes in HOOMD-blue, a GPU-accelerated particle simulation tool, to enable million-particle simulations in a field where thousands of particles is the norm. In this talk, we present the basic parallel algorithms, optimizations that maximize GPU performance, and communication patterns for scaling to multiple GPUs. Research applications include finding densest packings, self-assembly studies, and other uses in materials design, biological aggregation, and operations research.
 
Keywords:
Computational Physics, GTC 2014 - ID S4166
Streaming:
Download:
 
GPU Neutron Transport: Simulating Nuclear Reactions One Neutron at a Time
Tony Scudiero (NVIDIA)
Monte Carlo neutron transport is an approach to simulating radiation transport and nuclear reaction physics by simulating the individual lifespans of many millions of unbound neutrons. OpenMC is a recently developed Monte Carlo neutron transport application intended to allow future reactor designers to leverage extremely low-level simulation of new reactors years before they are built. The presenter, Tony Scudiero, has adapted OpenMC from its original incarnation as 27k lines of single-threaded Fortran 90 to a parallel CUDA C/C++ implementation optimized for the GPU. This talk covers the computational considerations of Monte Carlo neutron transport, the design and process of porting OpenMC to CUDA, and the results and lessons learned along the way. Along with OpenMC, its miniapp benchmark XSBench will be discussed.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4170
Streaming:
Download:
 
Increasing Mass Spectrometer Sensitivity 20x with Real-Time GPU Processing
Evan Hauck (LECO Corporation)
The monitoring of waste water for dioxins is important because these compounds are extremely toxic. One way to detect dioxins is with an analytical instrument called a gas chromatograph/mass spectrometer. This session summarizes our research aimed at increasing the sensitivity of a commercially available time-of-flight mass spectrometer without sacrificing resolution, mass range, or acquisition rate. In brief, we configured the mass spectrometer to pulse ions into the flight tube 20 times faster than originally designed, causing more ions to strike the detector per unit time and increasing sensitivity. However, because lighter, faster ions from one pulse overtake heavier ions from a previous pulse, the resulting mass spectra are severely intertwined, or multiplexed. Our work included developing a demultiplexing algorithm, which computes the theoretical source spectrum from the multiplexed data. Because the instrument generates 1.2 GB/s, we designed and coded all algorithms for execution on a GTX Titan.
 
Keywords:
Computational Physics, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4245
Streaming:
Download:
 
Massively Parallel Earthquake Simulations on GPUs
Prasenjit Sengupta (Optimal Synthesis)
This presentation describes an R&D effort sponsored by NASA's Earth Sciences Division that involves a GPU implementation of a topologically realistic numerical simulation of earthquakes occurring on the fault systems of California. The computationally intensive modules include (i) generation of a large-scale stress influence matrix (Green's functions) from fault element data, and (ii) calculation of stress from the strain vector using a large-scale matrix-vector multiply during the rupture propagation phase. Identification of the computational bottlenecks, the CUDA code implementation, and the code optimizations that led to a 45x speedup over a multi-core CPU implementation for a 30,000-year earthquake simulation will be discussed.
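As a point of reference, the stress-update step described in (ii) maps naturally onto a single dense matrix-vector product, which cuBLAS already provides. A minimal sketch, with all names invented:

    #include <cublas_v2.h>

    // stress = G * slip, where G is the precomputed Green's-function
    // influence matrix (column-major, nElem x nElem, device-resident).
    void stressUpdate(cublasHandle_t h, const float *dG,
                      const float *dSlip, float *dStress, int nElem)
    {
        const float one = 1.0f, zero = 0.0f;
        cublasSgemv(h, CUBLAS_OP_N, nElem, nElem,
                    &one, dG, nElem, dSlip, 1, &zero, dStress, 1);
    }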
 
Keywords:
Computational Physics, GTC 2014 - ID S4258
Streaming:
Download:
 
Optimization of a CUDA-based Monte Carlo Code for Radiation Therapy
Nick Henderson (Stanford University, Institute for Computational and Mathematical Engineering)
Learn about optimization efforts in G4CU, a CUDA Monte Carlo code for radiation therapy. G4CU is based on the core algorithm and physics processes in Geant4, a toolkit for simulating particles traveling through and interacting with matter. The techniques covered include the use of texture references for look-up tables, device configuration for different simulation components, and scheduling of work for different particle types.
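Texture-backed look-up tables of the kind mentioned here follow a standard CUDA pattern. The sketch below is illustrative only (it uses the newer texture-object API rather than the texture references of the talk): a 1D table is bound to a texture so that reads get cached, hardware-interpolated access.

    #include <cuda_runtime.h>

    // Build a 1D float table with free linear interpolation on lookup.
    // (cudaMemcpyToArray is deprecated in newer toolkits; cudaMemcpy2DToArray
    // is the modern route.)
    cudaTextureObject_t makeLut(const float *hTable, int n)
    {
        cudaArray_t arr;
        cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
        cudaMallocArray(&arr, &desc, n);
        cudaMemcpyToArray(arr, 0, 0, hTable, n * sizeof(float),
                          cudaMemcpyHostToDevice);

        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeArray;
        res.res.array.array = arr;

        cudaTextureDesc tex = {};
        tex.addressMode[0] = cudaAddressModeClamp;
        tex.filterMode = cudaFilterModeLinear;   // interpolate between entries
        tex.readMode = cudaReadModeElementType;
        tex.normalizedCoords = 1;                // index with x in [0,1)

        cudaTextureObject_t lut;
        cudaCreateTextureObject(&lut, &res, &tex, nullptr);
        return lut;
    }

    __global__ void lookup(cudaTextureObject_t lut, const float *x,
                           float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = tex1D<float>(lut, x[i]);  // cached, interpolated
    }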
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Medical Imaging & Visualization, GTC 2014 - ID S4259
Streaming:
Download:
 
Crash, Boom, Bang! Leveraging Game Physics and Graphics APIs for Scientific Computing
Peter Messmer (NVIDIA)
In this talk, you will learn how to use the game and visualization wizard's tool chest to accelerate your scientific computing applications. NVIDIA's game physics engine PhysX and the ray tracing framework OptiX offer a wealth of functionality often needed in scientific computing applications. However, due to their different target audiences, these frameworks are generally not well known to the scientific computing community. High-frequency electromagnetic simulations, particle simulations in complex geometries, and discrete element simulations are all examples of applications that could immediately benefit from these frameworks. Based on examples, we will cover the basic concepts of these frameworks, introduce their strengths and their approximations, and show how to take advantage of them from within a scientific application.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4260
Streaming:
Download:
 
Subdivide, Preprocess and Conquer: Micromagnetism FEM/BEM-Simulations on Single-Node/Multi-GPU Systems
Elmar Westphal (Forschungszentrum Julich GmbH)
See how subdividing and preprocessing the static parts of your simulation system beyond the obvious can significantly increase performance. As an example we use our micromagnetism simulator TetraMag, whose linear equation solvers and field calculation parts rely heavily on sparse matrix-vector multiplications. The matrices involved in large-scale simulations often outgrow the memory capacity of a single GPU. In our case, these matrices are constant over a program run, which can mean millions of iterations. This talk will show how analyzing, reordering, and splitting our original matrices in a checkerboard style enables us to reduce expensive data transfers between GPUs and helps reduce transfer overhead through fine-grained streaming.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, GTC 2014 - ID S4283
Streaming:
Download:
 
Lanczos Algorithm Using CUDA for Lattice QCD
Hyung-Jin Kim (Brookhaven National Laboratory)
Computing a set of eigenvalues and eigenvectors of the matrix to be inverted is key to accelerating the inversion, and the Lanczos algorithm is one of the best-known methods for this problem. The routine is heavily dominated by data access, however, so it can become a bottleneck in the overall sequence. Although the algorithm's flops-to-bytes ratio is unfavorable, the GPU still holds a memory bandwidth advantage over the CPU. We are implementing the Lanczos algorithm in CUDA and will show preliminary performance results on multi-GPU clusters.
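For readers unfamiliar with the method, one Lanczos step is just a matrix-vector product plus a handful of BLAS-1 operations, which is why memory bandwidth dominates. A hedged sketch with cuBLAS (all names invented; applyA stands in for the application's matrix-vector product):

    #include <cublas_v2.h>

    // One step of the Lanczos three-term recurrence on device vectors.
    void lanczosStep(cublasHandle_t h, int n,
                     void (*applyA)(const float *in, float *out),
                     const float *vPrev, const float *vCur, float *w,
                     float beta, float *alphaOut, float *betaOut)
    {
        applyA(vCur, w);                              // w = A * v_j
        float alpha;
        cublasSdot(h, n, vCur, 1, w, 1, &alpha);      // alpha_j = v_j . w
        float na = -alpha, nb = -beta;
        cublasSaxpy(h, n, &na, vCur, 1, w, 1);        // w -= alpha_j * v_j
        cublasSaxpy(h, n, &nb, vPrev, 1, w, 1);       // w -= beta_j * v_{j-1}
        float newBeta;
        cublasSnrm2(h, n, w, 1, &newBeta);            // beta_{j+1} = ||w||
        float inv = 1.0f / newBeta;
        cublasSscal(h, n, &inv, w, 1);                // v_{j+1} = w / beta
        *alphaOut = alpha; *betaOut = newBeta;
    }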
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4311
Streaming:
 
PRNGCL: OpenCL Library of Pseudo-Random Number Generators for Monte Carlo Simulations
Vadim Demchik (Dnepropetrovsk National University)
Learn how to easily construct Monte Carlo procedures on GPUs with PRNGCL, a new open-source OpenCL library of pseudo-random number generators (PRNGs). We will introduce our OpenCL implementation of the most popular uniform PRNGs and briefly discuss general techniques for PRN generation on GPUs. A performance comparison of existing PRNG libraries with PRNGCL will be provided, along with examples of applying PRNGCL to high-energy physics lattice simulations.
 
Keywords:
Computational Physics, GTC 2014 - ID S4313
Streaming:
Download:
 
Hierarchical Algorithms on Heterogeneous Architectures: Adaptive Multigrid Solvers for LQCD on GPUs
M Clark (NVIDIA)
Graphics Processing Units (GPUs) are an increasingly popular platform upon which to deploy lattice quantum chromodynamics calculations. While there has been much progress to date in developing solver algorithms that improve strong scaling on such platforms, there has been less focus on deploying 'mathematically optimal' algorithms. A good example is the family of hierarchical solvers such as adaptive multigrid, which are known to solve the Dirac linear system with optimal O(N) complexity. We describe progress to date in deploying adaptive multigrid solvers on NVIDIA GPU architectures and discuss in general the suitability of heterogeneous architectures for hierarchical algorithms.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, GTC 2014 - ID S4327
Streaming:
Download:
 
Does Antimatter Fall On The Earth? Measurement Of Antimatter Annihilation with GPUs
Akitaka Ariga (University of Bern)
One of the most important unanswered questions in physics is: does antimatter fall in the same way as matter? At the European Organization for Nuclear Research (CERN, Geneva), the AEgIS experiment is underway to measure the gravitational force on antimatter, and it must reach nanometric precision in determining the free fall of antimatter. In particular, the 3D reconstruction of particle tracks produced in matter-antimatter annihilations requires enormous computing resources: some 30 TB of tomographic images must be processed per day. In this talk, the application of GPUs to the 3D tracking of particles in photo-emulsion detectors will be reported.
 
Keywords:
Computational Physics, Astronomy & Astrophysics, GTC 2014 - ID S4372
Streaming:
 
GPU-Based Lattice QCD Simulations as Thermometer for Heavy-Ion Collisions
Mathias Wagner (Bielefeld University & Indiana University)
See how advances in GPU computing enable us to simulate quantum chromodynamics and learn about fundamental properties of strongly interacting matter, i.e., quarks and gluons, at finite temperature. With the advances in hardware and algorithms, these simulations have reached a level that allows quantitative comparison with experimental data from heavy-ion colliders. Discover how the Kepler architecture helps us boost the performance of the simulations and reach a new level of precision. I will discuss selected optimizations for the Kepler K20 cards and modifications that prepare the code for the Titan supercomputer, and compare the pros and cons of our in-house code against available libraries such as QUDA.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4453
Streaming:
Download:
 
GooFit: Massively Parallel Likelihood Fitting Using GPUs
Rolf Andreassen (University of Cincinnati)
We present the GooFit maximum likelihood fitting framework, which has been developed to run effectively on general-purpose graphics processing units (GPUs) to enable next-generation experimental high energy physics (HEP) research. Most analyses of data from HEP experiments use maximum likelihood fits, and some of today's analyses require fits that take more than 24 hours on traditional multi-core systems. The next generation of experiments will require two orders of magnitude more computing power for analyses that are sensitive to New Physics. Our GooFit framework, which has been demonstrated to run on NVIDIA GPU devices ranging from high-end Teslas to laptop GeForce GTs, uses CUDA and the Thrust library to massively parallelize the per-event probability calculation. For realistic physics fits we achieve speedups of several hundred relative to executing the same algorithm on a single CPU.
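The per-event parallelization pattern is easy to picture: each thread evaluates -log p(x_i | theta) for one event and the results are summed on the device. A toy sketch with Thrust, using a Gaussian stand-in rather than GooFit's actual PDF machinery (all names are invented):

    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <cmath>

    struct NegLogPdf {
        float mu, sigma;                        // current fit parameters
        __host__ __device__ float operator()(float x) const {
            float z = (x - mu) / sigma;         // toy Gaussian model per event
            return 0.5f * z * z + logf(sigma) + 0.91893853f; // + 0.5*log(2*pi)
        }
    };

    // Negative log-likelihood summed over all events on the device.
    float nll(const thrust::device_vector<float> &events,
              float mu, float sigma)
    {
        return thrust::transform_reduce(events.begin(), events.end(),
                                        NegLogPdf{mu, sigma}, 0.0f,
                                        thrust::plus<float>());
    }

A fit driver (MINUIT or similar) would call nll repeatedly with updated parameters; only the reduction runs on the GPU.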
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4488
Streaming:
Download:
 
Moving Biophysics to the GPU Cloud for Studying Energy-Transfer in Photosynthesis
Tobias Kramer (Department of Physics, Humboldt-University Berlin, Germany)
We discuss the CUDA and OpenCL implementation of the hierarchical equations of motion (GPU-HEOM) method for tracking quantum-mechanical effects in photosynthesis. The hierarchy of coupled equations yields the time evolution of the density matrix of a photosynthetic network and is efficiently mapped to the GPU architecture by assigning one thread to each hierarchy member, while storing time-independent information in constant memory. This makes the GPU architecture the optimal choice compared to conventional pthread-based parallelization schemes, which suffer from higher thread latency, and allows one to connect theoretical simulations directly with experimental images of the energy flow in photosynthesis. It addresses the outstanding questions in the field: why is transport in photosynthesis so efficient, and how can artificial devices be designed? The ready-to-run GPU-HEOM tool is installed on the publicly accessible nanoHUB platform, where users share data and sessions while performing computations on the connected NVIDIA M2090 GPU cluster.
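The thread-per-hierarchy-member mapping with constant-memory coefficients can be sketched as follows; this is a toy stencil standing in for the actual HEOM coupling structure, and all names are invented:

    #define MAX_COUPLINGS 64
    __constant__ float cCoupling[MAX_COUPLINGS];  // time-independent constants,
                                                  // set once via cudaMemcpyToSymbol

    __global__ void heomStep(const float2 *rho, float2 *rhoNext,
                             int nMembers, int nCouplings, float dt)
    {
        int m = blockIdx.x * blockDim.x + threadIdx.x; // one thread per member
        if (m >= nMembers) return;
        float2 acc = rho[m];
        for (int k = 0; k < nCouplings; ++k) {
            // placeholder neighbor pattern, not the real hierarchy topology
            int nb = min(nMembers - 1, max(0, m + k - nCouplings / 2));
            acc.x += dt * cCoupling[k] * rho[nb].x;   // explicit Euler update
            acc.y += dt * cCoupling[k] * rho[nb].y;
        }
        rhoNext[m] = acc;
    }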
 
Keywords:
Computational Physics, Quantum Chemistry, Desktop & Application Virtualization, GTC 2014 - ID S4490
Streaming:
Download:
 
Plasma Turbulence Simulations: Porting Gyrokinetic Tokamak Solver to GPU Using CUDA
Praveen Narayanan (NVIDIA)
The process of porting a large-scale particle-in-cell solver (GTS) to the GPU using CUDA is described. We present weak scaling results run at scale on Titan, which show a speedup of 3-4x for the entire solver. Starting from a performance analysis of the computational kernels, we systematically eliminate the most significant bottlenecks in the code - in this case the PUSH step, which constitutes the 'gather' portion of the gather-scatter algorithm that characterizes this PIC code. Points we think will be instructive to developers include: (1) using the PGI CUDA Fortran infrastructure to interface between CUDA C and Fortran; (2) memory optimizations, including creation of a device memory pool and use of pinned memory; (3) a demonstration of how communication causes performance degradation at scale, with implications for shifter performance in PIC solvers generally, and why we need algorithms that handle communication in particle shifters more effectively; and (4) use of textures and LDG for irregular memory accesses.
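Point (4) refers to routing data-dependent loads through the read-only data cache; in CUDA this is a one-intrinsic change. The sketch below is an invented, much-simplified PUSH-style gather, not the GTS code:

    // Gather a field value per particle through the read-only cache
    // (__ldg requires sm_35 or later).
    __global__ void push(const int *cellOfParticle, const float *fieldAtCell,
                         float *vel, int nParticles, float dt)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= nParticles) return;
        int c = cellOfParticle[p];          // irregular, data-dependent index
        float e = __ldg(&fieldAtCell[c]);   // route through the texture cache
        vel[p] += dt * e;                   // toy acceleration update
    }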
 
Keywords:
Computational Physics, Computational Fluid Dynamics, Supercomputing, GTC 2014 - ID S4495
Streaming:
 
Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction
Andreas Herten (Forschungszentrum Julich GmbH)
PANDA is a next-generation particle physics experiment involving a novel data acquisition mechanism. Commonly, particle physics experiments read out the full detector response of particle collisions only when a fast hardware-level trigger fires. In contrast, PANDA uses a sophisticated event-filtering scheme that involves reconstructing the whole incoming data stream in real time (online) to distinguish signal from background events. At a rate of about 20 million events per second, a massive amount of computing power is needed to reduce the incoming data rate of 100 GB/s to 2 PB/year for permanent storage. We explore the feasibility of using GPUs for this task. This talk outlines the challenges PANDA faces with data acquisition and presents the status of the GPU investigations. Different reconstruction (tracking) algorithms running on NVIDIA GPUs are shown and their features and performance highlighted.
 
Keywords:
Computational Physics, GTC 2014 - ID S4499
Streaming:
Download:
 
Accelerating Particle-Mesh Interaction for Particle-in-Cell Simulation
Alberto Madonna (University of Padova)
We present a novel GPU implementation of a particle-in-cell code for plasma dynamics simulation on 3D unstructured grids. Starting from a proven codebase, we integrate solutions and ideas drawn from a thorough study of the state of the art in parallel plasma simulation and other fields, adding original contributions in areas such as workload management, particle ordering, and domain decomposition. The result is a flexible simulation pipeline, capable of performing more than an order of magnitude faster than the CPU implementation it originates from, while still presenting exciting opportunities for future development. Moreover, all the concepts presented apply not only to particle-in-cell simulation, but to any simulation relying on the interaction between Lagrangian particles and a spatial grid.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, GTC 2014 - ID S4500
Streaming:
Download:
 
In-Silico Optimization of DNA-Based Light-Harvesting Antennas for Biomimetic Photosynthesis
Mark Bathe (MIT)
Programmed self-assembly of nucleic acids offers the unique opportunity to engineer geometrically complex, megadalton-scale macromolecular architectures with atomic-level accuracy. The sequence specificity of DNA renders these nanoassemblies spatially addressable structural scaffolds that can host secondary molecules, including light-harvesting dyes and chemically functional groups. These properties may be exploited to rationally design biomimetic light-harvesting antennas that replicate aspects of bacterial photosynthesis. Here, I present our computational design framework CanDo (http://cando-dna-origami.org), which quantitatively predicts the 3D solution structure of megadalton-scale DNA-based nanoassemblies from the underlying DNA sequence, as well as their emergent light-harvesting properties when decorated with dyes. This computational framework enables the in silico design and optimization of functional DNA-based light-harvesting devices prior to time-consuming and costly synthesis and experimental validation.
 
Keywords:
Computational Physics, Bioinformatics & Genomics, Computer Aided Design, Energy Exploration, GTC 2014 - ID S4521
Streaming:
 
GPU Acceleration of a Variational Monte Carlo Method
Niladri Sengupta (Louisiana State University, Baton Rouge, USA)
The session will describe the CUDA implementation of a variational Monte Carlo method for the study of strongly correlated quantum systems, including high-temperature superconductors, magnetic semiconductors, and metal-oxide heterostructures. The presentation will cover the tuning and optimization strategies implemented in the GPU code. To eliminate bandwidth-limited performance we have used caching and a novel restructuring of the computation and data access patterns. We also perform two optimizations specific to Kepler. The code uses dynamic compilation to improve performance, especially in parts with limited parallelism. On Kepler, our code achieves speedups of 22x and 176x compared to 8-core and single-core CPU implementations, respectively. The GPU code allows us to obtain accurate results for large lattices, which are crucial for developing predictive capabilities for materials properties. The techniques we developed for matrix inverse and determinant updates can be reused in other quantum Monte Carlo methods.
 
Keywords:
Computational Physics, Quantum Chemistry, GTC 2014 - ID S4554
Streaming:
 
Lattice QCD using MILC and QUDA: Accelerating Calculations at the High-Energy Frontier
Justin Foley (Microway, NVIDIA)
Lattice Quantum Chromodynamics (QCD) is a numerical treatment of the theory of the strong nuclear force. Calculations in this field can answer fundamental questions about the nature of matter, provide insight into the evolution of the early universe, and play a crucial role in the search for new theories of fundamental physics. However, massive computational resources are needed to achieve these goals. In this talk, we describe how NVIDIA GPUs are powering Lattice QCD calculations involving the MILC code suite and the QUDA library. This code base has allowed lattice applications to access unparalleled compute power on leadership-class facilities such as Blue Waters and Titan.
 
Keywords:
Computational Physics, GTC 2014 - ID S4641
Streaming:
Download:
 
Accelerating Low-Lying Eigenmode Deflation for Lattice QCD Fermion Inverters on GPUs
Alexei Strelchenko (Fermi National Accelerator Laboratory)
Learn how to leverage the power of GPUs to accelerate the solution of large sparse linear systems with multiple right-hand sides by means of the incremental eigCG algorithm. For a given Hermitian system with multiple right-hand sides, this algorithm allows one (1) to incrementally compute a number of small-magnitude eigenvalues and corresponding eigenvectors while solving the first few systems with standard Conjugate Gradient (CG), and then (2) to reuse the computed eigenvectors to deflate the CG solver for the remaining systems. In this session we will discuss implementation aspects of the technique and analyze its efficiency using the example of lattice QCD fermion matrix inversions.
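Step (2) amounts to projecting each new right-hand side onto the stored eigenvectors to form a good starting guess, x0 = V diag(1/lambda_j) V^T b. A hedged real-arithmetic sketch with cuBLAS (names invented; production lattice QCD code would be complex-valued):

    #include <cublas_v2.h>

    // dV: n x k eigenvector matrix (column-major, device); hLambda: k
    // eigenvalues on the host; dTmp: k-element device scratch vector.
    void deflatedGuess(cublasHandle_t h, int n, int k, const float *dV,
                       const float *hLambda, const float *db,
                       float *dx0, float *dTmp)
    {
        const float one = 1.0f, zero = 0.0f;
        // tmp = V^T b  (projections onto the eigenvectors)
        cublasSgemv(h, CUBLAS_OP_T, n, k, &one, dV, n, db, 1, &zero, dTmp, 1);
        for (int j = 0; j < k; ++j) {            // tmp_j /= lambda_j
            float inv = 1.0f / hLambda[j];       // one tiny scal per entry:
            cublasSscal(h, 1, &inv, dTmp + j, 1);// clear, if not the fastest way
        }
        // x0 = V tmp  (deflated starting guess handed to CG)
        cublasSgemv(h, CUBLAS_OP_N, n, k, &one, dV, n, dTmp, 1, &zero, dx0, 1);
    }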
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4693
Streaming:
Download:
 
GPU-Accelerated Modeling of Coherent Processes in Magnetic Nano-Structures
Aleksey Demenev (Perm State University)
Multi-scale molecular dynamics of systems of nanomagnets is investigated by numerical simulation using parallel algorithms. The Fortran code Magnetodynamics-F supports several types of studies: regulating the switching time of the magnetic moment of a nanostructure; estimating the role of nanocrystal geometry in the super-radiance of 1-, 2-, and 3-dimensional objects; studying the magnetodynamics of nanodots inductively coupled to a passive resonator; and exploring how the solution depends on the initial orientation of the magnetic moments, in order to find the configurations for which super-radiance and radiative damping are maximal. The parallel programs were created using the OpenMP and OpenACC application programming interfaces. Estimates of the speedup and efficiency of the implemented algorithms, compared with their sequential counterparts, have been obtained. We show that NVIDIA Tesla GPUs accelerate simulations of magnetic dynamics in systems comprising thousands of magnetic nanoparticles.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Molecular Dynamics, GTC 2014 - ID S4493
Download:
Computational Structural Mechanics
Presentation
Media
Efficient Particle-Based Simulation of Dynamic Cracks and Fractures in Ceramic Material
Patrick Diehl (University of Bonn)
Ceramics are now widely used in the automotive and aeronautics industries, but simulating dynamic cracks and fractures in these materials is difficult because of bifurcations at the crack tips. In this session we present the benefits of GPUs for simulating dynamic cracks and fractures in solids, e.g., ceramic materials, using the peridynamics technique. (1) Most discrete equations of particle-based methods depend on finding neighborhoods, so we present our novel library for finding the k nearest neighbors efficiently on the GPU. (2) The high parallelism of the GPU allows us to increase the number of particles, which improves the reliability of the simulation. To validate our GPU implementation we simulate a common high-velocity impact scenario and compare our results with experimental data.
 
Keywords:
Computational Structural Mechanics, Digital Manufacturing, Visual Effects & Simulation, GTC 2014 - ID S4255
Streaming:
Download:
Computer Aided Design
Presentation
Media
No More NURBS: Try PSPS
Qingde Li (University of Hull)
The goal of this session is to show how to create geometric shapes on GPUs, taking advantage of the GPU's tessellation feature, using the state-of-the-art spline technique called PSP splines (PSPS). PSPS are simpler than B-splines in their mathematical form, but much more powerful than NURBS for geometric design. Compared with Bezier, B-spline, and NURBS representations, designing a geometric shape using PSPS is more efficient, flexible, and intuitive. In this session we will describe what PSPS are and demonstrate how to implement them directly in GLSL or HLSL in the tessellation stages to create new geometry.
 
Keywords:
Computer Aided Design, Digital Product Design & Styling, Game Development, GTC 2014 - ID S4240
Streaming:
 
Advanced Solutions for Media & Entertainment, Engineering and Design from HP and NVIDIA (Presented by HP)
Sean Young (HP)
Come learn about the technology and partnership between HP and NVIDIA that empowers users around the world to design and create without limitations.
 
Keywords:
Computer Aided Design, Digital Manufacturing, Media & Entertainment, Ray Tracing, GTC 2014 - ID S4883
Streaming:
Computer Vision
Presentation
Media
Session 2: Fast, Parallel Algorithms for Computer Vision and Machine Learning with GPUs (Presented by ArrayFire)
Umar Arshad (ArrayFire)
Working on image processing, computer vision, or machine learning? Learn best practices for implementing parallel versions of popular algorithms on GPUs. Instead of reinventing the wheel, you will learn where to find and how to use excellent versions of these algorithms already available in the CUDA and ArrayFire libraries. You will walk away equipped with the best tools and knowledge for implementing accelerated image processing and machine learning. This session will also include information about programming CUDA on Tegra mobile devices for computer vision applications.
 
Keywords:
Computer Vision, Numerical Algorithms & Libraries, Machine Learning & AI, Video & Image Processing, GTC 2014 - ID S4711
Streaming:
 
A GPU-Based Free-Viewpoint Video System for Surgical Training
Pierre Boulanger (University of Alberta)
In this presentation, we propose a novel GPU-based algorithm capable of generating free viewpoints from a network of fixed HD video cameras. This free-viewpoint TV system consists of two main sub-systems: a real-time depth estimation sub-system, which extracts a disparity map from a network of cameras, and a synthetic viewpoint generation sub-system that uses the disparity map to interpolate new views between the cameras. We use a space-sweep algorithm to estimate depth information, which is amenable to parallel implementation. The view generation sub-system generates new synthetic images from 3D vertices and renders them from an arbitrary viewpoint specified by the user. Both steps are computationally intensive, but the computations can easily be divided from each other and thus efficiently implemented in parallel using CUDA. A surgical training application is presented.
 
Keywords:
Computer Vision, Virtual & Augmented Reality, Medical Imaging & Visualization, Video & Image Processing, GTC 2014 - ID S4247
Streaming:
Download:
 
Terrestrial 3D Mapping with Parallel Computing Approach
Janusz Bedkowski (Institute of Mathematical Machines)
This work concerns the parallel implementation of a 3D mapping algorithm. Several nearest-neighborhood search strategies are compared, and the accuracy of the final 3D map is evaluated with geodetic precision. This work can be used in several applications, such as mobile robotics and spatial design. Attendees will learn how to choose the proper nearest-neighbor search strategy for 3D data registration, how to build accurate 3D maps, how to evaluate a 3D mapping system with geodetic precision, and how parallel programming influences performance and accuracy.
 
Keywords:
Computer Vision, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4353
Streaming:
 
Real-Time 3D Pose Estimation of Hundreds of Objects
Karl Pauwels (University of Granada, Spain)
Discover how hundreds of objects can be simultaneously located and tracked in 3D through the real-time combination of visual simulation and visual perception. A tight integration of GPU graphics and compute has allowed us to continuously update a 3D scene model on the basis of dense visual cues, while at the same time feeding information from this model back to facilitate the cue estimation process itself. In this session we will describe (1) the low-level dense motion and stereo engine that can exploit such model feedback, (2) the 6DOF pose (location and orientation) estimation of hundreds of rigid objects at 40 Hz, and (3) how the same framework enables multi-camera and/or complex articulated object tracking. Throughout the session, we will pay special attention to implementation and system integration aspects of our real-time demonstrator system.
 
Keywords:
Computer Vision, Machine Learning & AI, Mobile Applications, Video & Image Processing, GTC 2014 - ID S4381
Streaming:
Download:
 
Real-Time Affine-Invariant Feature Extraction: Object Recognition Under Extreme Viewpoint Change
Valeriu Codreanu (Eindhoven University of Technology)
Learn how to efficiently design affine-invariant feature extractors using GPU hardware for the purpose of robust object recognition. Local feature extraction from images is one of the main topics in pattern matching and computer vision in general. Some of the best feature extractors, such as SIFT and SURF, are scale, rotation, and translation invariant, but fall short when illumination and viewpoint change are taken into account. To increase the viewpoint invariance of SIFT, the fully affine-invariant ASIFT was developed, but at a very high computational cost. We present results from using our simple image transformation framework to achieve real-time affine-invariant object recognition that also scales with the number of GPU devices used. Participants in this session will learn more about this high-performance CUDA solution for adding viewpoint invariance to any feature extractor, relying on the hardware features of modern GPU devices.
 
Keywords:
Computer Vision, Video & Image Processing, GTC 2014 - ID S4401
Streaming:
Download:
 
High-Resolution Facial Performance Capture Using CUDA
Jerome Courchay (Telecom SudParis)
Learn how to use the GPU to accelerate 3D registration with a Kinect or similar device in order to capture highly detailed facial performance in real time or at interactive speed. We describe the energy-based approach that we borrowed from the Hao Li et al. paper published at SGP 2008, explain how GPU computation lets us achieve higher quality and more detail at interactive speeds, and elaborate on how real-time performance can be achieved by improving our CUDA-based implementation.
 
Keywords:
Computer Vision, Virtual & Augmented Reality, Game Development, Real-Time Graphics Applications, GTC 2014 - ID S4414
Streaming:
 
Accelerating 3D Facial Modeling Using ArrayFire, OpenCV and CUDA
Umar Arshad (ArrayFire)
This session will discuss the lessons learned during the development of a facial modeling application used for glasses.com. The application made use of OpenCV and OpenMP to create a 3D representation of a person's face. Glasses.com wanted to improve the performance of the application to reduce run times and hardware costs. We will discuss the performance requirements and the techniques used to meet their goals. Attendees will leave having learned how NVIDIA's Visual Profiler is essential to profiling multi-threaded applications.
 
Keywords:
Computer Vision, GTC 2014 - ID S4426
Streaming:
 
Accelerating 3D Reconstruction from Range Images with a Novel Cyclic Scheme
Christopher Schroers (Saarland University)
Attend this session to get a deep understanding of variational range image integration methods. Such approaches can deal with a substantial amount of noise and outliers, while regularizing and thus creating smooth 3D reconstructions. See how incorporating a new direction-dependent smoothing behavior yields better control of the smoothing with respect to the local structure of the unknown surface, and thus state-of-the-art results. Also learn how the integration can be accelerated with a novel, generic cyclic scheme named Fast Jacobi. Fast Jacobi is essentially a modified Jacobi over-relaxation (JOR) method in which the relaxation parameter is not fixed but varied cyclically. This makes Fast Jacobi much more efficient than JOR while remaining just as simple to implement and perfectly suited for parallelization. Furthermore, the Fast Jacobi scheme is applicable to a large range of other PDE-based image analysis problems.
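The scheme is easy to state in code: an ordinary JOR sweep, with the relaxation parameter drawn from a short repeating cycle on the host. The sketch below is illustrative only; the cycle values are placeholders, not the authors' tuned parameters, and a 2D Laplace-type problem stands in for the integration energy.

    // One relaxed Jacobi sweep over the interior of an nx-by-ny grid.
    __global__ void jorSweep(const float *u, float *uNew,
                             int nx, int ny, float omega)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;
        int id = j * nx + i;
        float jac = 0.25f * (u[id - 1] + u[id + 1] + u[id - nx] + u[id + nx]);
        uNew[id] = u[id] + omega * (jac - u[id]);   // relaxed Jacobi update
    }

    // Host loop: vary omega cyclically instead of keeping it constant.
    //   const float cycle[4] = {0.6f, 1.2f, 1.8f, 0.9f};  // placeholders
    //   for (int it = 0; it < nIter; ++it) {
    //       jorSweep<<<grid, block>>>(d_u, d_uNew, nx, ny, cycle[it % 4]);
    //       std::swap(d_u, d_uNew);
    //   }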
 
Keywords:
Computer Vision, Numerical Algorithms & Libraries, GTC 2014 - ID S4570
Streaming:
Download:
 
Deep Neural Networks for Visual Pattern Recognition
Dan Ciresan (IDSIA)
GPU-optimized deep neural networks (DNNs) excel at image classification, detection, and segmentation tasks. They are the current state-of-the-art method for many visual pattern recognition problems, by a significant margin. DNNs are already better than humans at recognizing handwritten digits and traffic signs, and complex handwritten Chinese characters are recognized with almost human performance. DNNs are successfully used for automotive problems like traffic sign and pedestrian detection; they are fast and extremely accurate. DNNs help the field of connectomics by making it possible, for the first time, to segment and reconstruct the neuronal connections in large sections of brain tissue, which will bring a new understanding of how biological brains work. Detecting mitotic cells in breast cancer histology images can be done quickly and efficiently with DNNs, and segmenting blood vessels in retinal images with DNNs helps diagnosticians detect glaucoma.
 
Keywords:
Computer Vision, Automotive, Machine Learning & AI, Medical Imaging & Visualization, GTC 2014 - ID S4636
Streaming:
 
A Real-Time Defocus Deblurring Method for Semiconductor Manufacturing
Tsutomu Sakuyama (Dainippon Screen Mfg. Co., Ltd.)
This session will present a real-time defocus deblurring method for industrial semiconductor equipment. Many studies have proposed fast deblurring methods for natural and medical images, but these methods run into difficulty in this equipment for the following reasons: most approaches require the distance between the imaging device and the object, which cannot be obtained in most cases; in addition, the processing must finish within a constant cycle time determined by the equipment's specification, which is what 'real time' means in a production setting. In this session, we propose a deblurring method that satisfies these constraints.
 
Keywords:
Computer Vision, Computational Photography, Real-Time Graphics Applications, Video & Image Processing, GTC 2014 - ID S4695
Streaming:
Download:
 
GPU Computing for Cognitive Robotics
Martin Peniak (Plymouth University)
Learn how GPU computing impacts cognitive robotics in the areas of: (1) software development; (2) action and language learning in humanoid robots based on complex artificial neural networks; and (3) 3D object recognition evolved by simulating the process of natural evolution. The presentation will feature the latest state-of-the-art results and videos from each of these areas.
 
Keywords:
Computer Vision, Machine Learning & AI, GTC 2014 - ID S4703
Streaming:
Download:
Debugging Tools & Techniques
Presentation
Media
CUDA Debugging with Command Line Tools
Vyas Venkataraman (NVIDIA)
The CUDA debugging tools CUDA-GDB and CUDA-MEMCHECK provide a whole new feature set to help improve your CUDA application development cycle. This session is a detailed walk-through of the key debugger features and advanced techniques for using printf, CUDA-GDB, and CUDA-MEMCHECK together to improve overall code productivity on Linux and Mac OS platforms.
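As a flavor of the workflow, the toy program below (illustrative only) contains a deliberate out-of-bounds write; device-side printf reports from inside the kernel, and the tool invocations in the trailing comments flag the bad access:

    #include <cstdio>

    __global__ void oops(int *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i == 0) printf("n = %d\n", n);  // device printf, visible on stdout
        data[i] = i;                         // no bounds check: writes past n
    }

    int main()
    {
        int *d;
        cudaMalloc(&d, 32 * sizeof(int));
        oops<<<2, 32>>>(d, 32);              // 64 threads, 32 ints: OOB writes
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }

    // Build with device debug info, then run under the tools:
    //   nvcc -g -G oops.cu -o oops
    //   cuda-memcheck ./oops     # reports the invalid global writes
    //   cuda-gdb ./oops          # interactive kernel debugging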
 
Keywords:
Debugging Tools & Techniques, GTC 2014 - ID S4578
Streaming:
Download:
 
Debugging PGI CUDA Fortran and OpenACC on GPUs with Allinea DDT
Sebastien Deldon (PGI), David Lecomber (Allinea)
The PGI CUDA Fortran and OpenACC compilers are used extensively to take advantage of CUDA and NVIDIA GPUs within Fortran applications. We will present new work that enables developers to debug GPU kernels on the GPUs interactively. This work brings the benefits of true debugging to developers using CUDA and OpenACC for Fortran, including the ability to examine device state and memory and to control GPU threads. It supplies a vital but previously missing weapon in the armory of GPU developers, and is available in the Allinea DDT 4.2.1 and PGI 14.1 releases.
 
Keywords:
Debugging Tools & Techniques, GTC 2014 - ID S4284
Streaming:
 
Accelerating Software: Experiences with Profiling and Debugging in Enabling OpenACC Codes to Fly
Beau Paisley (Allinea Software)
We present experiences in adapting major applications to OpenACC and show how debugging and profiling tools have enabled successful development on several large, high-profile GPU systems. Adapting complex parallel codes requires effective parallel profiling to discover the most promising GPU-offload opportunities, and equally requires effective debugging to resolve bugs in the resulting code. We will show how Allinea's tools have been used throughout this process with CUDA and OpenACC for parallel applications at scale, demonstrate real examples from leading GPU adopters, and show how they have been able to turn development potential into results.
 
Keywords:
Debugging Tools & Techniques, Clusters & GPU Management, Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4362
Streaming:
 
CUDA Dynamic Parallelism: A Debugger Developer's Take on the Kernel of a Revolution
Larry Edelstein (Klocwork)
You've heard about CUDA dynamic parallelism, but you probably have questions: Should I be using dynamic parallelism in my programs? What trade-offs and constraints do I need to know about? We've been thinking about these questions at Rogue Wave Software as we adapt our TotalView debugger to provide the visibility and control needed to troubleshoot codes that use CUDA dynamic parallelism. This talk looks at dynamic parallelism from two perspectives. The first is how dynamic parallelism is implemented and what users need to know to understand what is really going on in their programs. We then look at it as a debugging challenge, discussing what we needed to do in TotalView to support it and exploring the capabilities developers need to troubleshoot CUDA dynamic parallel programs.
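For readers who have not used the feature, the minimal sketch below (illustrative, not TotalView material) shows what dynamic parallelism looks like: a parent kernel launches a child grid from the device. It requires sm_35 or later and separate compilation (-rdc=true, linked against cudadevrt); note that device-side cudaDeviceSynchronize was legal in the CUDA 6 era but has since been deprecated.

    __global__ void child(float *x, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;
    }

    __global__ void parent(float *x, int n)
    {
        if (threadIdx.x == 0 && blockIdx.x == 0) {
            child<<<(n + 255) / 256, 256>>>(x, n);  // device-side launch
            cudaDeviceSynchronize();                 // wait on the child grid
        }
    }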
 
Keywords:
Debugging Tools & Techniques, Performance Optimization, Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4399
Streaming:
Download:
 
CUDA Debugging Tools: CUDA-GDB and CUDA-MEMCHECK
Vyas Venkataraman (NVIDIA)
Advanced debugging techniques with CUDA-GDB and CUDA-MEMCHECK will be covered that can assist even the most advanced CUDA developer in locating program-correctness issues.
 
Keywords:
Debugging Tools & Techniques, GTC 2014 - ID S4580
Streaming:
Download:
 
CUDA Application Development Life Cycle Using NVIDIA(R) Nsight(TM), Eclipse Edition
Satish Salian (NVIDIA)
NVIDIA(R) Nsight(TM), Eclipse Edition for Linux and Mac is an all-in-one development environment that lets you develop, debug, and optimize CUDA code in an integrated UI. This session demonstrates all of Nsight's capabilities, including the CUDA-aware source editor, build integration of the CUDA tool chain, the graphical debugger for both CPU and GPU, and the graphical profiler. New features in the upcoming release will be revealed.
 
Keywords:
Debugging Tools & Techniques, GTC 2014 - ID S4591
Streaming:
Download:
 
NVIDIA(R) Nsight(TM) Visual Studio Edition 4.0: A Fast-Forward of All the Greatness of the Latest Edition
Sebastien Domine (NVIDIA)
NVIDIA Nsight Visual Studio Edition is the most complete application development environment for heterogeneous platforms on Windows. The tool is capable of GPU kernel and shader debugging for CUDA, OpenGL, and DirectX, along with API debugging and profiling. In this session, Sebastien Domine from NVIDIA will review the new features of the latest release: CUDA 6.0 with Unified Virtual Memory support, OpenGL 4.3, dynamic shader editing, the brand-new Direct3D 9 and 11 GUI with D3D assembly debugging support, and much more. Sebastien will demonstrate many of these features live to illustrate how powerful they can be for day-to-day GPU compute and graphics development.
 
Keywords:
Debugging Tools & Techniques, Performance Optimization, GTC 2014 - ID S4683
Streaming:
Download:
Defense
Presentation
Media
CUDA-Accelerated Wireless Communication Technique for Aerospace Exploration
Ying Liu (University of Chinese Academy of Sciences)
Learn the structure of a typical telemetry system for aerospace exploration, identify the bottleneck in the processing flow, and understand its computational complexity. We then present our approach to accelerating the Multiple Symbol Detection (MSD) demodulation method. The computational core of MSD is a 'sliding correlation' problem, which calculates the correlation between a long vector and a set of short vectors. An efficient CUDA parallelization scheme is proposed to accelerate MSD: high thread-level parallelism is achieved, and various optimization techniques are applied to improve performance. CU-MSD is implemented by adapting the sliding correlation, and good speedups are observed on data sets generated from a real aerospace PCM/FM integrated baseband system.
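The sliding correlation itself is simple to express in CUDA; a hedged sketch (names and data layout are invented, not CU-MSD's) assigns one thread per signal offset and one block row per template:

    // corr[t][off] = sum_k signal[off + k] * templates[t][k]
    __global__ void slidingCorrelation(const float *signal, int sigLen,
                                       const float *templates, int tmplLen,
                                       int nTemplates, float *corr)
    {
        int offset = blockIdx.x * blockDim.x + threadIdx.x; // signal position
        int t = blockIdx.y;                                  // template index
        if (offset + tmplLen > sigLen || t >= nTemplates) return;
        const float *tmpl = templates + t * tmplLen;
        float acc = 0.0f;
        for (int k = 0; k < tmplLen; ++k)
            acc += signal[offset + k] * tmpl[k];             // dot product
        corr[t * (sigLen - tmplLen + 1) + offset] = acc;
    }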
 
Keywords:
Defense, Signal & Audio Processing, GTC 2014 - ID S4189
Streaming:
Download:
 
Batch QR Decomposition of Small Matrices on the GPU Using Givens Rotations
Pierre-Yves Taunay (The Pennsylvania State University)
This work details several GPU implementations of the QR decomposition algorithm using Givens rotations, with a particular focus on large batches of small matrices, displaying performance improvements over similar CPU routines. Each approach consists essentially of successive operations on the input matrix that transform it into the upper triangular matrix R, while accumulating the operations in the matrix Q. Each GPU block operates on one or more matrices, with care taken to avoid thread divergence and large memory transfers.
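The per-rotation math at the heart of the method is compact; a sketch of the device-side helpers (a batched kernel assigning one matrix, or a few, per block would call these; names are illustrative):

    // Compute c, s such that [c s; -s c] * [a b]^T = [r 0]^T.
    __device__ void givens(float a, float b, float *c, float *s)
    {
        if (b == 0.0f) { *c = 1.0f; *s = 0.0f; return; }
        float inv = rsqrtf(a * a + b * b);   // 1 / sqrt(a^2 + b^2)
        *c = a * inv;
        *s = b * inv;
    }

    // Apply the rotation to two rows of length n, zeroing rowK's pivot entry.
    __device__ void applyGivens(float *rowI, float *rowK, int n,
                                float c, float s)
    {
        for (int j = 0; j < n; ++j) {
            float ti = rowI[j], tk = rowK[j];
            rowI[j] = c * ti + s * tk;
            rowK[j] = -s * ti + c * tk;
        }
    }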
 
Keywords:
Defense, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4304
Streaming:
 
Power-Aware Software on ARM
Paul Fox (EM Photonics, Inc.)
Learn how to optimize your software application with power awareness, to decrease the size, weight, and power of the overall system. Advances in processing technology have provided considerable gains in performance and power savings. The latest generation of mobile processors enables smartphones that can remain idle for days, or operate under heavy use for an entire transcontinental flight. These advances have mainly been achieved with low-power-by-design approaches, which allow processors to consume less energy when not in use. Unfortunately, situations requiring persistent use, such as navigation, severely limit the benefits of existing designs. Come see how EM Photonics is making software more "power-aware" to benefit soldiers in the field, and how these techniques may benefit your application.
 
Keywords:
Defense, Mobile Applications, GTC 2014 - ID S4319
Streaming:
Download:
 
GPU Accelerated 3D Point Clouds Generation from Stereo Images
Bingcai Zhang (BAE Systems)
Automatic image understanding and object recognition/extraction have many applications in geospatial intelligence, remote sensing, image processing, and robotics. However, the radiometric properties and spectral characteristics of image pixels are complex and variable. We take a new approach by extracting 3D objects from 3D point clouds generated from stereo images. This bypasses the complexity of image pixel properties and directly uses the invariant 3D properties of any 3D object. One of the critical technologies in this approach is the generation of 3D point clouds from stereo images: the point clouds must be accurate and the generation process must be fast. We have developed a GPU-accelerated "Automatic Spatial Modeler" (ASM) application that generates accurate 3D point clouds from stereo images. ASM matches every pixel and generates very dense, accurate 3D point clouds. The advanced image matching algorithms in ASM are based on many years of R&D at a global defense, aerospace, and security company.
 
Keywords:
Defense, Computer Vision, GTC 2014 - ID S4325
Streaming:
Download:
 
GPU-Accelerated SDR Implementation of a Multi-User Detector for Satellite Return Links
Chen Tang (German Aerospace Center)
In this session a novel GPU-based Software Defined Radio (SDR) implementation of a Multi-User Detector (MUD) receiver for the transparent satellite return link is presented. In the past decade new satellite applications have emerged which require a bidirectional satellite link. Due to the scarcity and high cost of satellite frequency spectrum, it is very important to utilize the available spectrum as efficiently as possible. Efficient usage of the spectrum in the satellite return link is a challenging task, especially if multiple users are present. In previous work, MUD techniques have been widely studied as a way to increase the spectral efficiency of the satellite return link. However, due to their high computational complexity and sensitivity to synchronization and channel estimation errors, only a few implementations of MUD for satellite communications exist. Here we will present a GPU-accelerated MUD receiver for satellite return links that operates in real time and achieves a decoding throughput of 290 Kbps.
 
Keywords:
Defense, Signal & Audio Processing, GTC 2014 - ID S4361
Streaming:
 
ATCOM: A Real-Time Image Enhancement Platform for Surveillance
Eric Kelmelis (EM Photonics, Inc.)
Learn how GPUs can be applied to real-time, real-world image processing applications. Images and videos recorded at long distances (greater than 1 mile) often suffer degradation due to the atmospheric turbulence between the subject and the camera, which severely limits the quality of data captured by high-end imaging systems. We will discuss the practical considerations of keeping up with real-time video, including kernel performance and pipelining, and effectively using multiple GPUs in a real-time context. We have optimized specifically for the Kepler warp-shuffle instruction and will go in depth on the performance boosts offered by this technology.
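For readers unfamiliar with the instruction the talk optimizes for: Kepler's warp shuffle lets the 32 threads of a warp exchange register values without going through shared memory, which makes the reductions inside image processing kernels cheap. A minimal sketch using the Kepler-era intrinsic (later CUDA toolkits spell it __shfl_down_sync):

    __inline__ __device__ float warp_reduce_sum(float val)
    {
        // Each step adds in the value held by the lane 'offset' positions
        // higher, halving the number of partial sums; no shared memory used.
        for (int offset = 16; offset > 0; offset >>= 1)
            val += __shfl_down(val, offset);
        return val;   // lane 0 ends up with the sum over all 32 lanes
    }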
 
Keywords:
Defense, Video & Image Processing, GTC 2014 - ID S4439
Streaming:
 
Applications of GPU Computing to Mission Design and Satellite Operations at NASA's Goddard Space Flight Center
Abel Brown (A.I. Solutions)
The computational intensity required for modern-day space missions is quickly outgrowing existing CPU capabilities. The Magnetosphere Multiscale (MMS) mission is the first NASA mission to fly four satellites in formation, and thus has uniquely challenging design and operational requirements, namely the mitigation of collision scenarios involving space debris and/or the formation with itself. By design, no more than 1 in 1,000 unsafe close approaches may go undetected, while operationally no more than 1 in 20 alarms raised may be false, so as to minimize science interruptions. The confidence intervals required to satisfy such requirements pose daunting computational demands which, operationally, cannot be met using traditional CPU solutions. Here it is demonstrated how GPU-accelerated solutions are being deployed, for the first time, at the NASA Goddard Space Flight Center (GSFC) to meet operational MMS mission requirements. Additional applications to Space Situational Awareness and mission design are discussed.
 
Keywords:
Defense, Numerical Algorithms & Libraries, Scientific Visualization, Supercomputing, GTC 2014 - ID S4571
Streaming:
Download:
 
On Designing Accelerator-Based System Architectures for Demanding Signal Processing Applications
Bracy Elton (Dynamics Research Corporation, High Performance Technologies Group), Ross Smith (Dynamics Research Corporation, High Performance Technologies Group)
The advent of (1) high performance PCIe 3.0 compatible accelerators that provide for direct accelerator-to-accelerator communication, e.g., via NVIDIA GPUDirect RDMA, (2) PCIe 3.0 switches and devices, e.g., from PLX Technology, and (3) PCIe 3.0 high bandwidth network adaptors and switches, such as those for 40 Gb/s Ethernet and 56 Gb/s FDR InfiniBand, presents opportunities for designing systems that enable demanding signal processing applications, such as real-time image and radar processing and domain decomposition approaches for fluid dynamics. We combine the above and present ideas for designing systems onto which such demanding signal processing applications can be mapped.
 
Keywords:
Defense, Signal & Audio Processing, Supercomputing, GTC 2014 - ID S4648
Streaming:
Download:
 
Embedding CUDA
Dustin Franklin (GE Intelligent Platforms)
Rugged GPUs are bringing leading-edge performance and mission-critical reliability to platforms with harsh operating environments. Follow advances in GPU technology which unlock real-time CUDA capabilities for low-latency GPU applications. Learn how to architect systems with GPUDirect and third-party I/O devices and interconnects for efficient data streaming and increased scalability. Tune your CUDA kernels and control logic for low-latency asynchronous behavior with response times down in the microseconds. Explore embedded GPU applications in signal processing, imaging, avionics, vetronics, and shipboard systems.
 
Keywords:
Defense, Real-Time Graphics Applications, Signal & Audio Processing, Video & Image Processing, GTC 2014 - ID S4675
Streaming:
 
Exploiting the GPU for High Performance Geospatial Situational Awareness Involving Massive and Dynamic Data Sets
Bart Adams (Luciad)
Geospatial Situational Awareness (SA) engines face stringent accuracy and performance requirements. Large volumes of static and dynamic data need to be analyzed and visualized, in both 2D and 3D and in various geographic projections, at sub-centimeter accuracy and interactive update rates. In contrast to game engines, where such data can be pre-processed and stored in optimized data structures, the data comes in any form and needs to be interpreted on-the-fly. This talk will discuss these challenges and the advanced GPU rendering techniques and algorithms that address them. We will show that by exploiting the GPU, terabytes of terrain and imagery data, in combination with highly dynamic data streams that can contain millions of tracks, multiple radar feeds and orthorectified UAV video streams, can be handled on a world-scale theater at update rates of over 60 Hz.
 
Keywords:
Defense, Big Data Analytics & Data Algorithms, Combined Simulation & Real-Time Visualization, Desktop & Application Virtualization, GTC 2014 - ID S4680
Streaming:
Download:
 
A GPU-Based Computational Framework for Large-Scale Critical Infrastructure Mapping Using Satellite Imagery
Dilip Patlolla (Oak Ridge National Laboratory)
Assessing and monitoring critical infrastructure from space is a cost-effective and efficient solution. Satellite images are now available with spatial resolutions and acquisition rates that make image-driven large-scale mapping and monitoring of critical infrastructure a viable possibility. However, processing huge volumes of high spatial resolution imagery is not a trivial task. Solutions often require advanced algorithms capable of extracting, representing, modeling, and interpreting scene features that characterize spatial, structural, and semantic attributes. Furthermore, these solutions should scale to big image datasets: at half-meter pixel resolution the earth's land surface comprises roughly 600 trillion pixels, and the requirement to process at this scale at repeated intervals demands highly scalable solutions. In this research, we present a GPU-based computational framework designed for identifying critical infrastructure from large-scale satellite or aerial imagery to assess vulnerable populations.
 
Keywords:
Defense, Big Data Analytics & Data Algorithms, Supercomputing, Video & Image Processing, GTC 2014 - ID S4706
Streaming:
Download:
 
GAIA: The GPU-Accelerated Distributed Database and Computational Framework Solving the Infinite Data Problem
Nima Negahban (GIS Federal)
GAIA is a distributed database and computational framework designed to leverage the GPU. GAIA's unique semantic type system, coupled with its near-real-time processing, query, and visualization capability, has made it the solution for government agencies coming to grips with querying and visualizing high-volume data streams. GAIA has been distributed to multiple government agencies including the Army, Navy, and DHS.
 
Keywords:
Defense, Big Data Analytics & Data Algorithms, Cloud Visualization, Supercomputing, GTC 2014 - ID S4831
Streaming:
Download:
Desktop & Application Virtualization
Presentation
Media
Application Optimized GPU System Solutions: Winning Strategies for Selecting Best Platforms (Presented by Super Micro)
Don Clegg (Super Micro Computer, Inc)
Power budget challenges? Space limitations? CAPEX and OPEX constraints? Unmet performance expectations? Compressed deployment schedules? Server management interoperability? Serviceability? Density? Compatibility? What's YOUR biggest hardware challenge when deciding upon YOUR ideal GPU system architecture? Choosing the right hardware platform plays a tremendous role in the success of any GPU solution endeavor. Supermicro has the broadest portfolio of GPU system building blocks in the industry. This session will give an overview of the many GPU system choices Supermicro provides to engineers and solutions architects. Strengths, optimizations, and competitive advantages within each of the Supermicro GPU product families will be reviewed so that the audience can gain a better understanding of how to select the best platform to meet their unique demands.
 
Keywords:
Desktop & Application Virtualization, Performance Optimization, Supercomputing, GTC 2014 - ID S4958
Streaming:
Download:
Digital Manufacturing
Presentation
Media
Redefining In-Vehicle Interfaces with UI Composer Studio
Gavin Kistner (NVIDIA), Stephen Mendoza (NVIDIA)
Panel discussion on experiences using UI Composer to develop real in-car applications, including IVI and instrument clusters.
 
Keywords:
Digital Manufacturing, Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4615
Streaming:
 
PLM 2020: Redesigning the Design Process for Virtual First Generation
Kenneth Wong (Desktop Engineering), Dave Coldron (Lightwork Design Ltd.), Jeff Retey (Gulfstream), Safir Bellali (Vans, a VF Company), Oran Davis (Applied Materials), Andras Kemeny (Renault), Roger Lanctot (Strategy Analytics)
Online and mobile experiences are changing how we buy products. Online, mobile and digital showrooms enable us to choose the product configuration we want, which is then manufactured and packaged on demand for us. The advent of low-cost sensors and internet-based data services changes how we can understand actual product use. In manufacturing, our product design, development and production processes are changing to support consumer customization, on-demand production, sensor-mediated feedback and delivery. But how do these changes impact our PLM tools? Are the current PLM technologies and processes designed to support these new ways of driving product demand? If not, how do we redesign the product development tools to accommodate them? This session, moderated by Kenneth Wong, Senior Editor of Desktop Engineering, will explore these questions.
 
Keywords:
Digital Manufacturing, GTC 2014 - ID S4950
Streaming:
 
Beyond 4k: Video Walls and Interactive Displays at High Resolutions using Multi-Machine Clusters
Erik Beaumont (Ventuz)
See how companies and broadcasters are tackling the challenge of filling ever larger displays at ever higher resolutions with dynamic, well-designed content, which additionally often needs to be interactive. In this talk, we will look at a variety of challenges, from the technical hurdles of clustering and framelocking across large video walls, projector setups and multiple machines, to the issues of producing and running content at these very high resolutions. We will discuss how GPUs and real-time rendering are the only feasible answer going forward as resolutions increase, and look into how interactivity can be achieved in these settings. We will discuss how we have dealt with these problems in real-world projects and also look at how the future of large-scale displays, whether LED walls, high-density displays or high-resolution projectors, is shaping up, and what benefits and risks these technologies might bring.
 
Keywords:
Digital Manufacturing, Collaborative & Large Resolution Displays, Media & Entertainment, Real-Time Graphics Applications, GTC 2014 - ID S4216
 
Getting Maximum Performance in CATIA Live Rendering
Tim Lawrence (BOXX Technologies)
Rendering performance depends on the available hardware: the more GPU power, the faster the rendering, and hence the more production. Make sure you have the correct hardware and configuration, and use certified solutions for all software. Use benchmark tools to determine modeling horsepower requirements. Optimal configuration has a real impact on rendering performance, so what are the best hardware/software configurations for CATIA? We cover best practices for rendering in CATIA and the steps to make your process optimal.
 
Keywords:
Digital Manufacturing, Performance Optimization, Clusters & GPU Management, Rendering & Animation, GTC 2014 - ID S4309
Streaming:
Download:
 
Virtual Automotive: Projection Mapped Graphics for Automotive Design
Roy Anthony (Christie Digital), Kevin Moule (Christie Digital)
Explore the challenges and issues that arose during the design and implementation of a projection-mapped model of a 1/5th-scale Audi R8: adapting a traditional rendering pipeline to generate image content suitable for projection-mapped graphics (a projection-mapped model requires a single view point, but ideally view points better suited to the physical setup of the projectors and surface), and efficiently warping and blending the results for a seamless image. Even with content rendered from an ideal eye point, the physical setup of the system (placement of the projectors and car) requires that some warping be applied to precisely align the content to the car. Finally, we will explore the next steps in applying the above to a full-scale car, and the challenges in taking projection-mapped graphics to scale.
 
Keywords:
Digital Manufacturing, Automotive, Collaborative & Large Resolution Displays, Real-Time Graphics Applications, GTC 2014 - ID S4622
Streaming:
Download:
 
Social, Mobile, Cloud, GPU: The Technology Stack For Untethered Product Development
Randall Newton (Consilia Vektor)
High-performance visual computing is the 'missing link' that, when combined with mobile, social, and cloud technologies, allows for a new technology stack in product development. This session will explore new, untethered uses for existing product development tools, and new products and techniques only possible with the addition of GPU technology.
 
Keywords:
Digital Manufacturing, Digital Product Design & Styling, Mobile Applications, GTC 2014 - ID S4627
Streaming:
Download:
 
Advanced HMI Design Workflow at PSA Peugeot Citroen
Benoit Deschamps (PSA Peugeot Citroën), Alain Gonzalez (PSA Peugeot Citroën)
The PSA Peugeot Citroen IT department is working to improve the HMI design workflow. This presentation will explain how to provide tools, efficient workflows and hardware in order to design, evaluate and simulate the HMIs embedded in new car models.
 
Keywords:
Digital Manufacturing, Automotive, Combined Simulation & Real-Time Visualization, Manufacturing, GTC 2014 - ID S4629
Streaming:
Download:
 
Using GPU-Based RealityServer for an Online Laboratory Instrument Configurator
Mark Keenan (Technicon)
Learn how an online configurator integrated with GPU-based RealityServer lets lab managers interactively lay out equipment and platforms, view 3D images of layouts and share designs with colleagues. Deployed for Thermo Fisher Scientific, a leading innovator in laboratory automation systems, the system is particularly valuable for lab managers who may not have experience laying out the equipment. Configurator rules ensure items are properly located and oriented. Using RealityServer, lab managers can quickly generate 3D views of their layout from any angle and location to see exactly what they're ordering and how it's assembled. 3D images can be compiled into "photo-rolls" to fully document the layout. Lab managers can also view a budgetary estimate and equipment list for their layout.
 
Keywords:
Digital Manufacturing, Manufacturing, Rendering & Animation, GTC 2014 - ID S4630
Streaming:
Download:
 
TOPS: Real-Time Automotive Appearance Evaluation for Non-Prototype Design
Daisuke Ide (Honda R&D, Japan)
At Honda, our aim is to utilize and evaluate computer-generated appearance in the same way as a physical model. For CG-based design, we need physically accurate results, not just artistic representations, and we also need real-time performance. To accomplish this, we developed a rendering software solution called "TOPS". TOPS is already deployed to field users, and we have now achieved real-time performance on a GPU cluster. Finally, we are able to obtain physically correct rendering based on measured data with real-time performance.
 
Keywords:
Digital Manufacturing, Automotive, GTC 2014 - ID S4639
Streaming:
 
Rethinking the Skateboard Shoe
Safir Bellali (Vans, a VF Company)
This presentation will highlight how the iconic lifestyle and action sports brand is embracing virtual technology to redefine the way its products are designed, developed, and marketed while preserving the authenticity of its heritage. From the iconic Classic line of sneakers to the ground-breaking LXVI offering, see how the Innovation Team at Vans is leveraging digital tools initially developed for the automotive industry to meet the ever-evolving expectations of a new generation.
 
Keywords:
Digital Manufacturing, Computer Aided Design, Digital Product Design & Styling, Rendering & Animation, GTC 2014 - ID S4658
Streaming:
Download:
 
Creative Artistry Driven by GPU Technology: A Look Inside Armstrong White
John Willette (Armstrong White)
Since building a virtual garage for Daimler Chrysler in 2003, Armstrong White has built a reputation for converting CAD data into sophisticated CGI imagery for clients such as The Designory, Saatchi & Saatchi, BBDO, Campbell-Ewald, Team Detroit, Doner, JWT, Organic and McCann Erickson, and for manufacturing brands including Subaru, Mazda, Hyundai, Infiniti and Aston Martin. The company has pioneered the use of GPU computing to create visual innovations in automotive product marketing. The talk will focus on how GPU technologies provide a foundation that enables Armstrong White's creative team to create automotive art that leads to marketing materials that inspire product demand.
 
Keywords:
Digital Manufacturing, Automotive, Manufacturing, Rendering & Animation, GTC 2014 - ID S4663
Streaming:
Download:
 
Supercharging Engineering Simulations at Mercury Marine with NVIDIA GPUs
Arden Anderson (Mercury Marine)
Mercury Marine will discuss their recent evaluation of NVIDIA GPUs for accelerating the performance of Abaqus FEA. As part of the talk, Arden will highlight the critical metrics for the evaluation, and how they chose between having the GPUs at the local desktop or installed in the back-room cluster. Arden will also discuss the business impact for the company of using a GPU-accelerated FEA implementation. Lastly, Arden will discuss what Mercury sees as the future potential for leveraging GPUs as part of their design workflow.
 
Keywords:
Digital Manufacturing, Clusters & GPU Management, Computational Fluid Dynamics, Computational Structural Mechanics, GTC 2014 - ID S4669
Streaming:
Download:
 
Immersive Design and Robotic Fabrication in the Trillion Dollar U.S. Construction Industry
Greg Howes (IDEAbuilder)
An explosion of data and accelerating demand for ubiquitous, real-time computation to model, design, construct and manage the built environment is driving innovation in architecture, engineering and construction. In this session we will explore case-study projects featuring complex, high-performance buildings in a fully digitized process integrating immersive design and CAM/CNC fabrication of building components and assemblies. We will also explore how new software and hardware enable multiple users to interact with these massive data sets and physical buildings using augmented reality, virtual reality and perceptual computing technologies.
 
Keywords:
Digital Manufacturing, Virtual & Augmented Reality, Combined Simulation & Real-Time Visualization, Real-Time Graphics Applications, GTC 2014 - ID S4677
Streaming:
Download:
 
NVIDIA Driven Image Generation in Immersive Fast-Jet Simulators
William Paone (Boeing)
The realism of immersive flight simulation visual systems depends on the NVIDIA GPU roadmap. Image Generator (IG) designs with scalable architectures driving these systems succeed by adopting the newest NVIDIA releases at market launch. With known use-case benchmarks and effective system scaling, development and production can be successful. In fast-jet simulations, absolute determinism is required due to the scale (number of rendering platforms) and number of video streams, and adequate scene quality for both low-level and high-altitude flight must be provided. To do this, benchmarking needs to be done for all scene components in order to allocate proper margins in the GPU for each. This talk will present an example scalable image generator design built on NVIDIA solutions, with example images. We will review the challenges and issues with traditional visual-database-to-IG release paths and discuss newer technology that renders and tessellates directly from source data.
 
Keywords:
Digital Manufacturing, Combined Simulation & Real-Time Visualization, GTC 2014 - ID S4678
Streaming:
Download:
 
Photo-Realistic Real-Time Digital Mock-Up Design Review in a Five-Sided 4Kx4K Immersive Room
Andras Kemeny (Renault)
Renault has recently put into use a new CAVE(TM), a 5-sided virtual reality room with a combined resolution of 70 megapixels, distributed over sixteen 4K projectors and two 2K projectors, as well as an additional 3D HD collaborative power wall. Images of the studied vehicle are displayed in real time thanks to a cluster of 20 HP Z800 computers with 24 GB of RAM and 40 NVIDIA Quadro 6000 graphics boards. Renault's CAVE(TM) aims to answer the needs of the various vehicle styling and engineering design steps. Starting from vehicle architecture and moving through the subsequent design steps, from ergonomic and perceived-quality control to production, Renault has built up a list of use cases and has already carried out a number of major Digital Mockup Design Review (DMDR) validations in this CAVE for ongoing vehicle projects since early 2013. The talk will discuss the use of the CAVE for digital manufacturing design review and its role in the automotive design process.
 
Keywords:
Digital Manufacturing, Automotive, Collaborative & Large Resolution Displays, Digital Product Design & Styling, GTC 2014 - ID S4688
Streaming:
Download:
 
Increase Traffic and Revenues: Lightworks Iray + Photorealistic Interactive 3D-Online, In-Store POS Digital Configurators
Dave Coldron (Lightwork Design Ltd.)
Learn how to take your 3D online and in-store point-of-sale digital configuration experiences to new levels using Lightworks Iray+ interactive photorealistic visualization to drive your next digital product campaign. We demonstrate how to free yourself from the constraints of image-based configurators, allowing true customization of model data, camera, material look and lighting, leading to a great consumer experience, increased traffic and increased conversions. Drive your configurator directly from your 3D catalog, avoiding complex, costly and error-prone image management, enabling faster product updates, reducing in-store inventory and allowing a true connection between the consumer and the manufacturing and production process.
 
Keywords:
Digital Manufacturing, Automotive, Ray Tracing, Rendering & Animation, GTC 2014 - ID S4689
Streaming:
 
Our Ride: Taking Design Offroad with the Carducci Dual Sport SC3 Adventure Motorcycle
Jim Carducci (Carducci Dual Sport LLC)
Unusually for custom motorcycle builders, before any metal is cut, Carducci Dual Sport utilizes CAD and computer graphics tools to design, analyze, and render new SC3 Adventure dual sport motorcycles. The tools minimize time-consuming and costly redesigns and reworks by enabling design architecture trade-offs, structural analysis of stressed components for safety, CFD heat-transfer analysis for reliability, and 3D renders of the full model to visualize the motorcycle. This talk is an overview of the SC3 Adventure motorcycle, the CAD/CG tools, and the design and development process used to create an innovative, reproducible custom dual sport motorcycle.
 
Keywords:
Digital Manufacturing, GTC 2014 - ID S4717
Streaming:
Download:
 
GPU Accelerated Physically Accurate Rendering in Autodesk Revit for Modern BIM Workflows
Mark Green (Oldcastle BuildingEnvelope(R)), Paul Arden (migenius)
Learn how Oldcastle BuildingEnvelope is revolutionising the relationship between designers and manufacturers with GPU-accelerated, physically accurate rendering integrated directly into the design process. Autodesk Revit is one of the most widely utilised tools for Architectural Design and Construction (AEC) and Building Information Management (BIM) in North America today. Rather than forcing designers to change the way they work, Oldcastle BuildingEnvelope has taken the unique step of developing its own solution running directly inside Revit and making it available to designers directly. Called BIM IQ, it uses the power of both GPU cloud computing and users' local GPU resources to provide fully physically accurate rendering and energy analytics in an easy-to-use add-on. This solution allows designers to understand how their specific manufacturer-provided material selections will affect their projects, in order to make informed design choices with predictable outcomes.
 
Keywords:
Digital Manufacturing, Computer Aided Design, Ray Tracing, Rendering & Animation, GTC 2014 - ID S4720
Streaming:
Download:
 
The Display of ALL
Kobi Ben Tzvi (Mishor3D)
Heads-Up Displays (HUDs), once a unique and expensive technology only available in the cockpits of multimillion-dollar airplanes, are now finding their way into many passenger cars, giving the driver the ability to access visually displayed information in closer proximity to forward scene events than conventional instrument panel displays allow. Contextual and augmented reality HMI (sometimes also referred to as contact analogue) is an approach which superimposes virtual markings, indications and other information onto the driver's actual view of the real world, providing an improved, more intuitive HMI for the driver. While AR HMI can be partially implemented, with limited results, in a 2D fashion (for example over a video screen), it is the use of a Heads-Up Display which provides the driver with the three-dimensional spatial impression needed for the best user experience. Come see how it is done.
 
Keywords:
Digital Manufacturing, Automotive, In-Vehicle Infotainment (IVI) & Safety, GTC 2014 - ID S4742
Streaming:
 
Virtual Engine Assembly Training Utilizing zSpace 3D Stereo Displays
Jeff Fisher (National Institute for Aviation Research, Wichita State University)
This session will showcase the use of digital models of an aircraft engine developed in CATIA to create an interactive learning experience on the zSpace platform. This includes a customized environment developed using 3DVIA Studio Pro to give students the ability to gain the necessary knowledge of how to assemble an aircraft engine.
 
Keywords:
Digital Manufacturing, Combined Simulation & Real-Time Visualization, Real-Time Graphics Applications, GTC 2014 - ID S4843
Streaming:
 
Leveraging a Super Computer to Achieve Real Time Interaction for a Digital Peugeot Car with Full Global Illumination
Benoit Deschamps (PSA Peugeot Citroen), Alain Gonzalez (PSA Peugeot Citroen), Arnaud Renard (Reims Champagne Ardennes University), Michael Krajecki (Reims Champagne Ardennes University), Julien Berta (Mechdyne)
PSA Peugeot Citroen, in partnership with Reims University, RTT and Barco, will show a car model with real-time full global illumination using the Romeo supercomputer based in Reims, France, equipped with 260 Tesla K20s. The car model will be loaded into RTT DeltaGen, which will be connected to Reims through a remote display to leverage the horsepower of the GPU cluster. Tapping into this power on demand allows Peugeot to achieve stunning photorealistic results in real time, as if the real vehicle were right in front of them. With this, design changes for materials and colors can be visualized instantly and decisions can be made faster.
 
Keywords:
Digital Manufacturing, Automotive, Digital Product Design & Styling, Rendering & Animation, GTC 2014 - ID S4845
Streaming:
Download:
 
From Play to Presence (Presented by Unity)
Paul Tham (Unity Technologies)
Unity launched in 2005 as a 3D games authoring tool for the Mac. Nine years later, enabled by the ubiquity of GPU technologies, with over 2.5 million registered developers and 350 million Unity Web Player installs, Unity powers thousands of games on mobile devices, web browsers and desktop OSs. However, Unity is not just about games. Unity is increasingly used by manufacturing companies to engage, educate and explain new products and visualize new developments. Porsche and Lego use Unity to power their online 3D configurators, NASA used Unity to educate the populace about the Mars rover, and the US Army uses Unity to train for maintenance on the Apache helicopter. Siemens is using Unity to teach maintenance engineers how to maintain renewable energy solutions such as wind farms, while research centers like the Vienna University of Technology are using Unity to create advanced prostheses. In the AEC industries, companies such as Arch Virtual are pioneering the successful use of real-time visualization in architecture for prestigious projects such as the $85 million Rutgers University School of Business development. This talk uses industry case studies to illustrate Unity's journey from creating entertainment products to how designers and researchers are using Unity to create and innovate new product experiences.
 
Keywords:
Digital Manufacturing, GTC 2014 - ID S4876
Streaming:
Education
Presentation
Media
Establishing a CUDA Research Center at Penn State: Perspectives on GPU-Enabled Teaching and Research
William Brouwer (The Pennsylvania State University)
The Research Computing and Cyberinfrastructure (RCC) unit at The Pennsylvania State University (PSU) has a strong commitment to GPU-enabled research, and is currently a CUDA Research Center. The main GPU cluster, Lion-GA, consists of Tesla M2070 and M2090 GPU cards, and newer devices including the K20 are available for interactive use. Lion-GA is capable of delivering over thirty teraflops during peak usage, and delivered almost twenty GPU-years of computation in 2012. This presentation will detail experiences in establishing GPU-enriched teaching and research at Penn State, covering a broad range of topics including benchmarking, administration, and high-level code development.
 
Keywords:
Education, Clusters & GPU Management, GTC 2014 - ID S4298
Streaming:
Download:
 
Democratizing Parallel Computing, Democratizing Education: Teaching a MOOC About GPU Computing
David Luebke (NVIDIA), John Owens (University of California, Davis)
Modern graphics processing units, or GPUs, herald the democratization of parallel computing. Today's GPUs not only render video game frames, they also accelerate astrophysics, video transcoding, image processing, protein folding, seismic exploration, computational finance, radio astronomy, heart surgery, self-driving cars - the list goes on and on. It is imperative that we teach students parallel computing: they will inherit a world in which there exists no other kind. Meanwhile, the world of education is being shaken up by massive open online courses, or MOOCs, that offer a democratization of education. Universities and companies suddenly offer high-quality courses over the internet - for free! - to anybody in the world. John Owens (UC Davis) and David Luebke (NVIDIA) have been teaching a MOOC focused on GPU computing. The Udacity course has over 40,000 registered students from over 130 countries. This session will present their experience and thoughts on GPUs, MOOCs, and parallel computing education.
 
Keywords:
Education, Programming Languages & Compilers, GTC 2014 - ID S4705
Streaming:
 
Teaching Parallel Programming with CUDA
Mark Ebersole (NVIDIA)
Learn how to use the CUDA computing platform as a tool to teach a wide array of parallel programming concepts. Examples will be given which range from onboarding introductory students to using the available tools to dive deep into complex parallel programming concepts. We'll also look at educators already using the CUDA platform and the results they've attained. The world is parallel, and it's imperative we prepare students for the future. Recognizing this, the second most important Key Area added in the ACM/IEEE CS2013 curriculum guidelines was specifically Parallel and Distributed Computing.
 
Keywords:
Education, GTC 2014 - ID S4937
Streaming:
Energy Exploration
Presentation
Media
High Frequency Elastic Seismic Modeling on GPUs Without Domain Decomposition
Thor Johnsen (Chevron)
What if you want to do FDTD modeling on a dataset that cannot possibly fit into GPU memory? This session explores design patterns that take advantage of two levels of the GPU memory hierarchy that are often overlooked, host memory and disk, thereby greatly expanding the size of problem that can be handled. Two seismic modeling kernels were implemented: acoustic TTI with variable density, and elastic triclinic. We show that these GPU kernels can handle extremely large datasets without domain decomposition (tens of billions of cells) while also taking full advantage of the computational throughput of 16 Kepler GPUs, achieving 20-30x better throughput than highly optimized CPU code running on a dual-socket Sandy Bridge server. We also show that this design pattern can be applied to other numerical methods that have a concept of timestepping and exhibit good spatial locality, such as Lattice Boltzmann methods for fluid flow modeling.
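The heart of such a design pattern is cycling slabs of the domain through the GPU so that compute overlaps the PCIe transfers. A minimal sketch (illustrative names; the per-slab halo handling a real FDTD scheme needs is omitted), assuming h_domain is pinned host memory so the copies can run asynchronously:

    #include <cuda_runtime.h>

    __global__ void update_slab(float *u, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) u[i] *= 0.99f;      // stand-in for the real wavefield update
    }

    // Stream a domain of n_total cells through the GPU in n_slab-sized pieces,
    // double-buffered across two streams so copy and compute overlap.
    void stream_through_gpu(float *h_domain, size_t n_total, size_t n_slab)
    {
        float *d_buf[2];
        cudaStream_t stream[2];
        for (int b = 0; b < 2; ++b) {
            cudaMalloc(&d_buf[b], n_slab * sizeof(float));
            cudaStreamCreate(&stream[b]);
        }
        for (size_t s = 0; s < n_total / n_slab; ++s) {
            int b = (int)(s % 2);                // alternate buffer/stream
            float *h = h_domain + s * n_slab;
            cudaMemcpyAsync(d_buf[b], h, n_slab * sizeof(float),
                            cudaMemcpyHostToDevice, stream[b]);
            update_slab<<<(unsigned)((n_slab + 255) / 256), 256, 0, stream[b]>>>(
                d_buf[b], (int)n_slab);
            cudaMemcpyAsync(h, d_buf[b], n_slab * sizeof(float),
                            cudaMemcpyDeviceToHost, stream[b]);
        }
        cudaDeviceSynchronize();
        for (int b = 0; b < 2; ++b) {
            cudaFree(d_buf[b]);
            cudaStreamDestroy(stream[b]);
        }
    }

The same structure extends to a third level that prefetches slabs from disk into host memory, which is the second overlooked tier the abstract refers to.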
 
Keywords:
Energy Exploration, Numerical Algorithms & Libraries, GTC 2014 - ID S4145
Streaming:
Download:
 
Multi-Block GPU Implementation of a Stokes Equations Solver for Absolute Permeability Computation
Nicolas Combaret (FEI Visualization Sciences Group)
The goal of this session is to show a multi-block implementation of a Stokes equations solver in Avizo(R) Fire for absolute permeability computation. The challenges of computing such a complex property in a general-purpose software application will first be laid out to explain the basis of this work. A Stokes equations solver developed for GPGPU computing will then be presented, with details of the multi-block approach that allows large datasets to be processed in acceptable time on one GPU. Examples and performance metrics will be shown before closing with future perspectives.
 
Keywords:
Energy Exploration, Computational Fluid Dynamics, Computational Physics, GTC 2014 - ID S4209
Streaming:
Download:
 
High Performance Numerical Algorithms for Seismic and Reservoir Simulations
Hatem Ltaief (KAUST), Rio Yokota (KAUST)
Learn how to leverage current numerical algorithms for solving challenging reservoir and seismic simulation problems on GPUs using: 1) a novel preconditioner technique based on massively parallel, compute-intensive fast N-body methods; 2) an optimized implementation of the sparse matrix-vector multiplication used during the iterative solver phase, which exploits the existing structure of the sparse matrix; and 3) a synchronization-reducing algorithm for stencil-based computation during explicit time integration.
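For orientation, the baseline form of the sparse matrix-vector product in point 2) is a one-thread-per-row CSR kernel like the minimal sketch below (illustrative names; the talk's optimized version exploits the matrix structure beyond this):

    __global__ void spmv_csr(int nrows, const int *rowptr, const int *colidx,
                             const float *vals, const float *x, float *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < nrows) {
            float sum = 0.0f;
            // rowptr[row]..rowptr[row+1] delimit the nonzeros of this row
            for (int k = rowptr[row]; k < rowptr[row + 1]; ++k)
                sum += vals[k] * x[colidx[k]];
            y[row] = sum;
        }
    }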
 
Keywords:
Energy Exploration, Numerical Algorithms & Libraries, GTC 2014 - ID S4287
Streaming:
Download:
 
Simulating Generation, Retention and Expulsion of Hydrocarbons on GPUs
Massimo Bernaschi (National Research Council of Italy)
Learn how to use GPUs as batch processors to simulate thousands of independent systems that have complex dynamics but relatively limited computing requirements. By using an apparently naive approach in which a single CUDA thread simulates an entire system, it is possible to obtain excellent global performance while minimizing the differences in results with respect to the original serial implementation of the same application. Crucial to the success of the porting is a proper choice of data structures, which need to be designed so that the global memory of the GPU can be accessed effectively even though threads work on distinct problems. The application we present simulates the products of primary migration and the expulsion of hydrocarbons from source rock, but the idea can be applied to other fields. The final result in our case is a highly scalable code that runs transparently on multiple GPUs and can be more easily updated when the underlying model changes.
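A minimal sketch of the one-thread-one-system pattern (toy dynamics and a hypothetical state layout, not the authors' geochemical model): storing the state system-major, so that consecutive threads touch consecutive addresses, is what lets global memory accesses coalesce even though every thread simulates a distinct system:

    __global__ void simulate_batch(float *state, int nvars, int nsystems,
                                   int nsteps, float dt)
    {
        int sys = blockIdx.x * blockDim.x + threadIdx.x;  // one thread = one system
        if (sys >= nsystems) return;
        for (int step = 0; step < nsteps; ++step) {
            for (int v = 0; v < nvars; ++v) {
                // Coalesced: the system index varies fastest across threads.
                float x = state[v * nsystems + sys];
                state[v * nsystems + sys] = x + dt * (-x);  // toy relaxation update
            }
        }
    }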
 
Keywords:
Energy Exploration, Numerical Algorithms & Libraries, Computational Physics, GTC 2014 - ID S4316
Streaming:
Download:
 
Enhanced Oil Recovery Simulation Performances on New Hybrid Architectures
Thomas Guignon (IFP Energies Nouvelles)
The goal of this session is to show that GPU linear solvers with highly parallel preconditioners can compete with the most advanced ones (CPR-AMG) that use the classical MPI-based programming model, in the context of reservoir simulation.
 
Keywords:
Energy Exploration, Numerical Algorithms & Libraries, GTC 2014 - ID S4373
Streaming:
 
Porting CPU-Based Multiprocessing Algorithms to GPU for Distributed Acoustic Sensing
Steve Jankly (Halliburton / Pinnacle)
This talk describes our endeavors, from start to finish, in implementing a parallelizable and computationally intensive process on a GPU for fiber optic solutions, specifically Distributed Acoustic Sensing (DAS) interrogation systems. Applications for DAS vary, and include stimulation and production monitoring, verification of downhole equipment operation, pipeline monitoring and the collection of seismic imaging data. These systems can produce up to a few gigabytes per second of data which need to be processed in real time. We previously utilized embedded processors, but the need for faster computation arose with the next-generation system, due to the increased amounts of data and more complex data processing. We will discuss the process we undertook in porting the parallelized CPU version of the algorithms to NVIDIA GPUs utilizing CUDA C. We also explore the various GPUs tested, and provide performance metrics.
 
Keywords:
Energy Exploration, Computational Physics, GTC 2014 - ID S4470
Streaming:
Download:
 
Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migration
Max Grossman (Repsol)
This talk explores a computationally demanding seismic imaging algorithm, Kirchhoff Migration, with a particular focus on memory and bandwidth management. Supporting large data sets via efficient bandwidth and memory utilization is a must given the richness of current data acquisition systems. This work builds on past work presented at GTC 2013 on adaptive scheduling in distributed and GPU-centric platforms. Our implementation has been extended to support larger-granularity tasks and larger data sets. This enables more effective utilization of network bandwidth but requires the implementation of GPU "virtual memory", as the data usually exceed device capacity. The discussion will cover both stages of Kirchhoff Migration, travel time calculation and seismic data migration, each of which presents unique challenges to effective memory utilization. This talk will include an in-depth analysis of performance, scheduling, and bandwidth metrics generated by the target application under real-world workloads.
 
Keywords:
Energy Exploration, Computational Physics, Supercomputing, GTC 2014 - ID S4539
Streaming:
Download:
 
An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling
Robin Weiss (The University of Chicago)
In this session we will describe our experience porting finite-difference time-domain (FDTD) algorithms for solving 3D anisotropic elastic wave equations to the GPU, and extending the implementation to support clusters of GPU-equipped compute nodes. These implementations have been integrated with the open-source Madagascar seismic processing software package to allow for accelerated computation of 3D anisotropic elastic wave models. In our work we adopt a straightforward porting strategy that leads to a transparent yet high-performance implementation suitable for mid-sized computational grids. The approach is based on a stress-stiffness formulation on a non-staggered grid and achieves significant speedup compared to a parallel CPU-based implementation, allowing for computation of seismic data at lower hardware cost and in less time than was previously possible. We also report details of our implementation strategy as well as performance evaluations in varied heterogeneous compute environments with a number of different GPU architectures.
 
Keywords:
Energy Exploration, Numerical Algorithms & Libraries, Computational Physics, Scientific Visualization, GTC 2014 - ID S4599
Streaming:
Download:
 
Exploring the Earth in 3D: Multiple GPUs for Accelerating Inverse Imaging
Chris Leader (Stanford)
Discover how we can harness the power of multiple GPUs to explore the Earth with seismic data. A wave-equation-based inversion process is used to turn these data into a high-fidelity image; however, for contemporary datasets this requires around 10^18 operations, if not more. GPUs can ease this computational bottleneck, but they create two further limiting factors: exacerbated disk accesses and global memory limitations. These can be addressed by manipulating the domain boundaries and by decomposing the problem across multiple GPUs. We will show you how we can create detailed seismic images without these traditional restrictions.
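As one concrete flavor of the multi-GPU decomposition mentioned above: when the wavefield is split into depth slabs across two GPUs, only the boundary planes need to be exchanged each time step. A minimal sketch (illustrative layout, not the authors' code: each GPU holds nz_local interior planes of nx*ny cells plus one ghost plane on each side, and peer access has already been enabled via cudaDeviceEnablePeerAccess):

    #include <cuda_runtime.h>

    void exchange_halos(float *d_u0, float *d_u1, int nx, int ny, int nz_local)
    {
        size_t plane = (size_t)nx * ny;
        // GPU 0's top interior plane -> GPU 1's bottom ghost plane
        cudaMemcpyPeer(d_u1, 1,
                       d_u0 + (size_t)nz_local * plane, 0,
                       plane * sizeof(float));
        // GPU 1's bottom interior plane -> GPU 0's top ghost plane
        cudaMemcpyPeer(d_u0 + (size_t)(nz_local + 1) * plane, 0,
                       d_u1 + plane, 1,
                       plane * sizeof(float));
    }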
 
Keywords:
Energy Exploration, Numerical Algorithms & Libraries, Computational Physics, Scientific Visualization, GTC 2014 - ID S4632
Streaming:
Download:
 
Accelerating Reverse Time Migration on GPUs: A Dataflow Approach
Hicham Lahlou (Xcelerit)
Learn how to map Reverse Time Migration (RTM) applications to a dataflow model for high-performance execution on GPUs and multi-core CPUs with an improved developer experience. As oil & gas exploration is pushed towards more complex geologies, RTM has become the de facto standard algorithm for constructing images of the Earth's subsurface from seismic wave data, and GPUs make it possible to cope with the enormous computational complexity involved. This talk shows how RTM algorithms can be modeled and implemented as dataflow graphs. The benefits of using this model for high-performance execution are detailed, e.g., the exposed levels of parallelism, memory locality, and optimization opportunities. The application code is portable between different hardware and the execution can be managed automatically, improving the user experience. We use a practical example to demonstrate the performance that can be achieved.
 
Keywords:
Energy Exploration, GTC 2014 - ID S4647
Streaming:
 
Large Scale Reservoir Simulation Utilizing Multiple GPUs
Garfield Bowen (Ridgeway Kite Software)
Reservoir simulation has a long history as a tool used by reservoir engineers to plan and optimize (oil & gas) field developments. These simulations are inevitably 3-dimensional and transient, and hence require considerable computing resources. Traditional simulators are typically constrained by the bandwidth to memory. The GPU architecture gives access to greater bandwidth once the simulator is parallel; however, the memory capacity of a GPU limits the problem size that can be tackled. In this presentation we describe a paradigm where we utilize a single GPU if the problem fits into its memory, and simply scale to multiple GPUs as the memory requirement grows. The practicality is demonstrated by running a 32-million-cell case on 32 Tesla GPUs.
 
Keywords:
Energy Exploration, Computational Fluid Dynamics, Computer Aided Design, GTC 2014 - ID S4727
Streaming:
Download:
Finance
Presentation
Media
Pricing American Options with Least Square Monte Carlo simulations on GPUs
Massimiliano Fatica (NVIDIA)
This talk will present a CUDA implementation of the Least Square Monte Carlo method by Longstaff and Schwartz to price American options on GPUs. We will examine all the details of the implementation, from random number and path generation to the ...Read More
This talk will present a CUDA implementation of the Least Square Monte Carlo method by Longstaff and Schwartz to price American options on GPUs. We will examine all the details of the implementation, from random number and path generation to the least-squares estimation of the continuation value. The implementation can price a put option with 200,000 paths and 50 time steps in less than 10 ms on a Tesla K20X.  Back
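For reference, the Longstaff-Schwartz backward-induction loop is compact enough to sketch in full. The NumPy version below prices an American put with the same 200,000 x 50 configuration; the quadratic regression basis and the parameters are illustrative choices, not necessarily those of the talk.

# Longstaff-Schwartz least-squares Monte Carlo for an American put (sketch).
import numpy as np

rng = np.random.default_rng(0)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths, n_steps = 200_000, 50
dt = T / n_steps

# Simulate geometric Brownian motion paths.
z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))

cash = np.maximum(K - S[:, -1], 0.0)          # exercise value at maturity
for t in range(n_steps - 2, -1, -1):
    cash *= np.exp(-r * dt)                   # discount one step back
    itm = K - S[:, t] > 0                     # only regress in-the-money paths
    if itm.sum() < 3:
        continue
    x = S[itm, t]
    # Regress the continuation value on a small polynomial basis.
    A = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(A, cash[itm], rcond=None)
    cont = A @ coef
    exercise = K - x
    ex_now = exercise > cont                  # exercise where intrinsic beats continuation
    idx = np.where(itm)[0][ex_now]
    cash[idx] = exercise[ex_now]

price = np.exp(-r * dt) * cash.mean()
print(round(float(price), 3))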
 
Keywords:
Finance, Numerical Algorithms & Libraries, GTC 2014 - ID S4154
Streaming:
Download:
 
GPUs in Quantitative Asset Management
Daniel Egloff (Incube Advisory and QuantAlea)
Modern portfolio theory, initially developed by Harry Markowitz, has been used in the industry for several decades to construct optimal portfolios, which properly balance risk and return. In recent years more refined quantitative methods have been de ...Read More
Modern portfolio theory, initially developed by Harry Markowitz, has been used in the industry for several decades to construct optimal portfolios, which properly balance risk and return. In recent years more refined quantitative methods have been developed to improve asset allocations and create optimal portfolios in a more stable and robust manner. We will discuss some of these new ideas and explain where large-scale numerical problems appear and how they can be solved with special algorithms on GPUs. You will learn how GPUs can help to consistently blend historical data and expert views in order to obtain more robust and realistic inputs for portfolio optimization, either with Bayesian techniques or with the minimum discrimination information principle, and how back-testing can be brought to a new level of sophistication.   Back
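As a toy illustration of "blending historical data and expert views", a precision-weighted Bayesian shrinkage of mean returns followed by unconstrained mean-variance weights can be sketched in a few lines. All the numbers and the simple independent-view model are illustrative assumptions, far simpler than the methods the talk discusses.

import numpy as np

mu_hist = np.array([0.06, 0.08, 0.04])      # sample means from history
tau2 = 0.02**2                              # sampling variance of those means
mu_view = np.array([0.05, 0.10, 0.04])      # expert views on the same assets
omega2 = 0.03**2                            # confidence (variance) of the views

# Posterior mean under independent Gaussian views on each asset:
# precision-weighted average of history and views.
w_hist = (1 / tau2) / (1 / tau2 + 1 / omega2)
mu_post = w_hist * mu_hist + (1 - w_hist) * mu_view

cov = np.diag([0.04, 0.09, 0.02])           # toy covariance matrix
risk_aversion = 3.0
weights = np.linalg.solve(risk_aversion * cov, mu_post)
print(mu_post.round(4), weights.round(2))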
 
Keywords:
Finance, GTC 2014 - ID S4175
Streaming:
Download:
 
Effortless GPU Models for Finance
Ben Young (SunGard)
Learn how SunGard provides support for GPUs such that both SunGard engineers and quantitative developers at our clients have to make only trivial code changes to exploit both the CPU and GPU to full effect. ...Read More
Learn how SunGard provides support for GPUs such that both SunGard engineers and quantitative developers at our clients have to make only trivial code changes to exploit both the CPU and GPU to full effect.   Back
 
Keywords:
Finance, GTC 2014 - ID S4199
Streaming:
Download:
 
GPU Implementation of Explicit and Implicit Finite Difference Methods in Finance
Mike Giles (University of Oxford)
This talk will explain how to achieve excellent performance with GPU implementations of standard explicit and implicit finite difference methods in computational finance. Implicit methods are much harder to implement efficiently, but the task is mad ...Read More
This talk will explain how to achieve excellent performance with GPU implementations of standard explicit and implicit finite difference methods in computational finance. Implicit methods are much harder to implement efficiently, but the task is made easier through the development of library software for the solution of multiple tridiagonal systems in parallel. The implementation strategies depend on the size and dimensionality of the problems being solved. 1D problems can be solved within one SMX unit of a GPU, 2D problems usually require more than one SMX, and 3D / 4D problems require the entire GPU for their solution. Computational performance results will be given for Kepler GPUs, and the talk will also discuss whether single precision arithmetic provides sufficient accuracy.  Back
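The building block behind the implicit schemes is the tridiagonal solve. A batched Thomas algorithm, the primitive that such library software parallelizes across many independent systems, can be sketched as follows; this is an illustrative NumPy version, not the speaker's library.

# Batched Thomas algorithm: solve many independent tridiagonal systems at once.
import numpy as np

def thomas_batched(a, b, c, d):
    """Solve a[i]x[i-1] + b[i]x[i] + c[i]x[i+1] = d[i] for each row (system)."""
    n = b.shape[1]
    cp, dp = np.empty_like(b), np.empty_like(b)
    cp[:, 0] = c[:, 0] / b[:, 0]
    dp[:, 0] = d[:, 0] / b[:, 0]
    for i in range(1, n):                      # forward elimination
        m = b[:, i] - a[:, i] * cp[:, i - 1]
        cp[:, i] = c[:, i] / m
        dp[:, i] = (d[:, i] - a[:, i] * dp[:, i - 1]) / m
    x = np.empty_like(b)
    x[:, -1] = dp[:, -1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[:, i] = dp[:, i] - cp[:, i] * x[:, i + 1]
    return x

# Four diagonally dominant systems of size five, solved in one call.
rng = np.random.default_rng(1)
a = rng.uniform(-1, 0, (4, 5)); a[:, 0] = 0
c = rng.uniform(-1, 0, (4, 5)); c[:, -1] = 0
b = 4 + np.zeros((4, 5))
d = rng.standard_normal((4, 5))
x = thomas_batched(a, b, c, d)
# Verify: the a[:,0]=0 and c[:,-1]=0 entries cancel the wrapped terms.
res = b * x + a * np.roll(x, 1, axis=1) + c * np.roll(x, -1, axis=1)
assert np.allclose(res, d)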
 
Keywords:
Finance, GTC 2014 - ID S4227
Streaming:
 
Heavy Parallelization of Alternating Direction Schemes in Multi-Factor Option Valuation Models
Cris Doloc (Quantras Research Ltd.)
Learn how to use the latest GPU technology to substantially improve the performance of numerical implementations of Alternating Direction schemes. These numerical methods are used in pricing problems associated with high-dimensional PDEs, where the use of ...Read More
Learn how to use the latest GPU technology to substantially improve the performance of numerical implementations of Alternating Direction schemes. These numerical methods are used in pricing problems associated with high-dimensional PDEs, where more common finite difference techniques like Crank-Nicolson are very challenging and inefficient. The Alternating Direction schemes, both implicit and explicit, are unconditionally stable and very efficient second-order methods in both the space and time variables. The proposed GPU implementation of the heavily parallelized Alternating Direction scheme provides a significant increase in performance over CPUs when dealing with multi-factor exotic derivatives like barrier or rainbow options. The goal of this session is to offer insight into how technology-savvy trading firms can use the latest GPU architecture to improve the efficiency of real-time risk control while reducing the costs associated with their technology infrastructure.   Back
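To make the alternating-direction structure concrete, here is one Peaceman-Rachford ADI step for the 2D heat equation, a stand-in for a two-factor pricing PDE: implicit in one direction at a time, with each half-step reducing to a batch of tridiagonal solves. Parameters and boundary handling are simplified assumptions for the sketch.

# One ADI step for u_t = u_xx + u_yy on a square grid (zero boundaries).
import numpy as np
from scipy.linalg import solve_banded

n, dt, dx = 64, 1e-4, 1.0 / 64
r = dt / (2 * dx**2)

# Tridiagonal (I - r*D2) in banded storage for solve_banded.
ab = np.zeros((3, n))
ab[0, 1:] = -r; ab[1, :] = 1 + 2 * r; ab[2, :-1] = -r

def d2(u):
    """Second difference along axis 0 with zero Dirichlet boundaries."""
    d = -2.0 * u
    d[1:] += u[:-1]
    d[:-1] += u[1:]
    return d

u = np.zeros((n, n)); u[n // 2, n // 2] = 1.0   # point heat source

# Half-step 1: implicit in x (axis 0), explicit in y -- n tridiagonal solves.
rhs = u + r * d2(u.T).T
u = solve_banded((1, 1), ab, rhs)
# Half-step 2: implicit in y, explicit in x.
rhs = u + r * d2(u)
u = solve_banded((1, 1), ab, rhs.T).T
print(round(float(u.sum()), 6))                 # heat diffuses, total ~conserved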
 
Keywords:
Finance, GTC 2014 - ID S4291
Streaming:
Download:
 
Fast and Easy GPU Offloading for Computational Finance
Lukasz Mendakiewicz (Microsoft Corp)
This session provides insight into how to obtain superior performance for computational finance workloads without compromising developer productivity. C++ AMP technology lets you write C++ STL-style code that runs on GPUs (and CPUs) in a platform (Windows ...Read More
This session provides insight into how to obtain superior performance for computational finance workloads without compromising developer productivity. C++ AMP technology lets you write C++ STL-style code that runs on GPUs (and CPUs) in a platform- (Windows and Linux) and vendor-agnostic manner. The session will start with an overview of C++ AMP, dive into C++ AMP features, list the various compilers that support C++ AMP, and showcase the performance characteristics of option-pricing workloads written using C++ AMP. Attend this talk to see how you can write productive, easy-to-maintain code that offers superior performance, letting you write the code once and still exploit the hardware to its fullest.  Back
 
Keywords:
Finance, Big Data Analytics & Data Algorithms, Programming Languages & Compilers, GTC 2014 - ID S4331
Streaming:
Download:
 
Monte Carlo Calibration to Implied Volatility Surface: A New Computational Paradigm
Chuan-Hsiang Han (National Tsing-Hua University)
This presentation shows that Monte Carlo simulation is capable of quickly solving the calibration problem for implied volatility surfaces. Dimension separation and standard error reduction constitute the two-stage procedure. The first ...Read More
This presentation shows that Monte Carlo simulation is capable of quickly solving the calibration problem for implied volatility surfaces. Dimension separation and standard error reduction constitute the two-stage procedure. The first stage reduces the dimensionality of the underlying optimization problem by utilizing the Fourier transform representation of the volatility dynamics. The second stage provides a high-performance computing paradigm for option pricing via standard error reduction. The GPU, as a parallel accelerator, drastically increases the total number of simulations, complementing variance reduction algorithms. By virtue of its flexibility, this two-stage Monte Carlo method is applied to estimate various volatility models such as hybrid models and multiscale stochastic volatility models.  Back
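The standard-error-reduction stage can be illustrated with the simplest variance-reduction device, antithetic variates. The sketch below (a plain Black-Scholes call with illustrative parameters) shows the estimator whose sample count a GPU multiplies; it is a generic textbook example, not the author's calibration code.

# Antithetic variates: pair each normal draw z with -z and average the payoffs.
import numpy as np

rng = np.random.default_rng(0)
S0, K, r, sigma, T, n = 100.0, 105.0, 0.02, 0.25, 1.0, 500_000

z = rng.standard_normal(n)

def terminal(zz):
    return S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * zz)

disc = np.exp(-r * T)
plain = disc * np.maximum(terminal(z) - K, 0.0)
anti = 0.5 * (plain + disc * np.maximum(terminal(-z) - K, 0.0))

# Same price, visibly smaller standard error for the antithetic estimator.
for name, est in [("plain", plain), ("antithetic", anti)]:
    print(name, round(float(est.mean()), 4),
          round(float(est.std(ddof=1) / np.sqrt(n)), 4))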
 
Keywords:
Finance, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4360
Streaming:
 
Hybridizer: Develop in Dot Net - Debug and Execute on GPU
Florent Duguet (Altimesh)
GPU computing performance and capabilities have improved at an unprecedented pace. CUDA dramatically reduced the learning curve to GPU usage for general purpose computing. The Hybridizer takes a step further in enabling GPUs in other development ecos ...Read More
GPU computing performance and capabilities have improved at an unprecedented pace. CUDA dramatically reduced the learning curve for using GPUs in general-purpose computing. The Hybridizer takes a step further in enabling GPUs in other development ecosystems (C#, Java, .NET) and execution platforms (Linux, Windows, Excel). Transforming .NET binaries into CUDA source code, the Hybridizer is your in-house GPU guru. With a growing number of features, including virtual functions, generics and more, the Hybridizer also offers a number of coding features for multi- and many-core architectures, while making use of advanced optimization features like AVX and ILP.  Back
 
Keywords:
Finance, Programming Languages & Compilers, GTC 2014 - ID S4376
Streaming:
Download:
 
Incremental Risk Charge With cuFFT: A Case Study of Enabling Multi Dimensional Gain with Few GPUs
Amit Kalele (Tata Consultancy Services Limited), Manoj Nambiar (Tata Consultancy Services Limited)
GPUs are well suited for massively parallel problems, but users often hesitate to adopt them due to the limited memory bandwidth between host and device. The problem of Incremental Risk Charge calculation was posed to us by one of our customers. ...Read More
GPUs are well suited for massively parallel problems, but users often hesitate to adopt them due to the limited memory bandwidth between host and device. The problem of Incremental Risk Charge calculation was posed to us by one of our customers. This proof of concept demonstrates that GPUs, with the cuFFT library and multi-stream computation, not only enable speedy performance but also achieve a substantial reduction in hardware footprint and energy consumption. These gains cannot be overlooked by any business unit. This study also helps in making an informed decision when choosing the right technology for business use.  Back
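The cuFFT pattern at the heart of such a proof of concept is frequency-domain convolution. As a stand-in, the NumPy sketch below convolves two independent loss distributions into a portfolio loss distribution and reads off a tail quantile; the two-name toy portfolio and the 99.9% level are illustrative assumptions, not the customer's model.

# Portfolio loss PMF by FFT convolution (NumPy's FFT standing in for cuFFT).
import numpy as np

# PMFs over integer loss buckets 0..3 for two independent positions.
p1 = np.array([0.90, 0.05, 0.03, 0.02])
p2 = np.array([0.80, 0.10, 0.06, 0.04])

m = len(p1) + len(p2) - 1                 # support of the convolution
f = np.fft.rfft(p1, m) * np.fft.rfft(p2, m)
portfolio = np.fft.irfft(f, m)            # PMF of the summed loss

# A tail quantile such as the one IRC needs (99.9%, purely illustrative):
var_bucket = int(np.searchsorted(np.cumsum(portfolio), 0.999))
print(portfolio.round(4), var_bucket)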
 
Keywords:
Finance, Numerical Algorithms & Libraries, GTC 2014 - ID S4407
Streaming:
 
GPU Computing in .NET for Financial Risk Analytics
Ryan Deering (Chatham Financial)
Learn how a rapidly growing mid-sized financial company incorporated GPU computing into its quantitative finance models. Our quantitative development team faced two major obstacles in adopting GPU computing. The first obstacle is the large cost of s ...Read More
Learn how a rapidly growing mid-sized financial company incorporated GPU computing into its quantitative finance models. Our quantitative development team faced two major obstacles in adopting GPU computing. The first obstacle is the large cost of switching away from our mature .NET development process. The other obstacle arises from the difficulty of synchronizing a slow hardware purchasing cycle with a fast software delivery cycle. We addressed these concerns by creating a hybrid linear algebra library in .NET that dynamically switches to GPU computing when CUDA hardware is available. This library allows our developers to code in .NET and focus on the mathematical and financial models without worrying about CUDA syntax. In this session we will describe how we built the library in .NET using CUBLAS, CURAND, and CUDA Runtime libraries. We will also show the performance gains from switching to GPU computing in pricing Bermudan swaptions using the Libor Market Model.  Back
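The dynamic-dispatch idea translates naturally to other ecosystems. As an analogy only, with CuPy standing in for the authors' .NET/CUBLAS wrapper, a backend-switching layer can be sketched like this:

# Pick a GPU array module when available, otherwise fall back to the CPU,
# behind one interface -- the same pattern as the hybrid .NET library.
import numpy as np

try:
    import cupy as xp          # GPU backend if CUDA hardware + CuPy exist
    GPU = True
except ImportError:
    xp = np                    # transparent CPU fallback
    GPU = False

def mc_paths(n_paths, n_steps, mu=0.0, sigma=0.2, dt=1.0 / 252):
    """Simulate cumulative log-return paths on whichever backend was selected."""
    z = xp.random.standard_normal((n_paths, n_steps))
    return xp.cumsum((mu - 0.5 * sigma**2) * dt + sigma * dt**0.5 * z, axis=1)

paths = mc_paths(10_000, 252)
print("GPU" if GPU else "CPU", float(paths[:, -1].mean()))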
 
Keywords:
Finance, GTC 2014 - ID S4451
Streaming:
Download:
 
An Approach to Parallel Processing of Big Data in Finance for Alpha Generation and Risk Management
Yigal Jhirad (Cohen & Steers), Blay Tarnoff (Cohen & Steers)
This session discusses the convergence of parallel processing and big data in finance as the next step in the evolution of risk management and trading systems. We advocate that risk management in finance should evolve from traditional inter-day, top-down metrics ...Read More
This session discusses the convergence of parallel processing and big data in finance as the next step in the evolution of risk management and trading systems. We advocate that risk management in finance should evolve from traditional inter-day, top-down metrics to an intra-day, bottom-up approach using signal generation and pattern recognition. We have also found parallel processing to be a key tool for absorbing greater insight into market patterns, providing a "trading DNA" and more effective tools to manage risk in real time.   Back
 
Keywords:
Finance, Big Data Analytics & Data Algorithms, Numerical Algorithms & Libraries, GTC 2014 - ID S4536
Streaming:
Download:
 
Accelerating Option Risk Analytics in R Using GPUs
Matthew Dixon (University of San Francisco)
Learn how to combine the convenience of the R Statistical Software Package with the computational resources provided by GPUs to accelerate computationally intensive financial computations exhibiting high degrees of parallelism. In this talk, we descr ...Read More
Learn how to combine the convenience of the R Statistical Software Package with the computational resources provided by GPUs to accelerate computationally intensive financial computations exhibiting high degrees of parallelism. In this talk, we describe ongoing work towards the development of an R library providing GPU-optimized, computationally intensive kernels frequently appearing in option risk analytics applications. Such kernels are bottlenecks in a workflow which is often highly dependent on a rich set of numerical and statistical functionality native to R, functionality that may be difficult to replicate outside of R. We demonstrate the utility of our approach on the intra-day calibration of the Bates stochastic volatility jump-diffusion model, often used for risk analysis of equity derivatives. The combined performance gain from rewriting the error function in C++ and deploying the computations on an NVIDIA Tesla K20c (Kepler architecture) is approximately 760x. Detailed results will be presented during the talk.   Back
 
Keywords:
Finance, Numerical Algorithms & Libraries, GTC 2014 - ID S4557
Streaming:
Download:
 
The Esther Solution for XVA Mega-Models: An In-Memory Architecture Built Around the K10 As An Algebraic Engine
Claudio Albanese (Global Valuation Ltd)
Mega-models denoted by three-letter acronyms such as CVA/FVA/DVA (collectively XVA) have sprouted on the wave of banking reform and represent a major challenge to the traditional cluster-computing paradigm of parallelism. The Esther architecture ...Read More
Mega-models denoted by three-letter acronyms such as CVA/FVA/DVA (collectively XVA) have sprouted on the wave of banking reform and represent a major challenge to the traditional cluster-computing paradigm of parallelism. The Esther architecture is an innovative solution breaking new ground in this space. Esther is the first in-memory risk analytics engine running on large-memory servers. It is based on new mathematics built from the ground up with the objective of capturing bottlenecks in matrix multiplication logic handled by K10 multi-GPU engines. It achieves unparalleled levels of performance on standard XVA metrics and grants access to new classes of hard XVA metrics for massive portfolios. Applications include interactive pre-trade XVA analytics, capital simulation for balance-sheet optimization, and waterfall modelling at clearing houses.   Back
 
Keywords:
Finance, Supercomputing, GTC 2014 - ID S4777
Streaming:
 
Monte-Carlo Simulation of American Options with GPUs
Julien Demouth (NVIDIA)
In this session we will present our work on the computation of the Greeks of multi-asset American options. We will describe our implementation of the Longstaff-Schwartz algorithm and explain the programming techniques used to obtain a very efficient ...Read More
In this session we will present our work on the computation of the Greeks of multi-asset American options. We will describe our implementation of the Longstaff-Schwartz algorithm and explain the programming techniques used to obtain a very efficient code for the Andersen-QE path discretization. This solution was developed in collaboration with IBM and STAC and is used to calculate the Greeks in real time on a single workstation with Tesla GPUs.  Back
 
Keywords:
Finance, GTC 2014 - ID S4784
Streaming:
Download:
Game Development
Presentation
Media
NVIDIA VisualFX SDK: Enabling Cinematic Effects in Games
Monier Maher (NVIDIA), Nathan Reed (NVIDIA), Simon Green (NVIDIA), Tae-Yong Kim (NVIDIA)
The NVIDIA VisualFX SDK provides game developers a turnkey solution for enabling cinematic effects like interactive fire and smoke, fur, waves, global illumination and more in games. All these complex, realistic effects are provided in an easy-to-use ...Read More
The NVIDIA VisualFX SDK provides game developers a turnkey solution for enabling cinematic effects like interactive fire and smoke, fur, waves, global illumination and more in games. All these complex, realistic effects are provided in an easy-to-use SDK to facilitate integration and tuning in any given game engine. In this session we will provide an overview of the different VisualFX SDK modules, the roadmap, and some case studies on how they have been used successfully.  Back
 
Keywords:
Game Development, Rendering & Animation, Visual Effects & Simulation, GTC 2014 - ID S4618
Streaming:
Download:
 
Bringing Digital Fur to Computer Games
Tae-Yong Kim (NVIDIA)
Fur rendering is one of the most important, but computationally expensive tasks in digitally creating animal creatures in films and games. We explain how features of recent GPUs can be used to create visually realistic rendering and simulation of fu ...Read More
Fur rendering is one of the most important, but computationally expensive, tasks in digitally creating animal creatures in films and games. We explain how features of recent GPUs can be used to create visually realistic rendering and simulation of fur and hair. Our fur technology consists of 1) an authoring pipeline to prepare hair assets in artist-friendly tools; 2) a simulation engine to move hairs on skinned, animated characters; and 3) a rendering and tessellation engine that creates millions of hair primitives on the fly, entirely on the GPU. We also share real-world challenges we faced in integrating the fur module into highly anticipated upcoming games such as The Witcher 3 and Call of Duty: Ghosts.   Back
 
Keywords:
Game Development, Combined Simulation & Real-Time Visualization, Real-Time Graphics Applications, Visual Effects & Simulation, GTC 2014 - ID S4179
Streaming:
 
Smoke & Mirrors: Advanced Volumetric Effects for Games
Nuttapong Chentanez (NVIDIA), Simon Green (NVIDIA)
Learn how to add volumetric effects to your game engine - smoke, fire and explosions that are interactive, more realistic, and can actually render faster than traditional sprite-based techniques. Volumetrics remain one of the last big differences bet ...Read More
Learn how to add volumetric effects to your game engine - smoke, fire and explosions that are interactive, more realistic, and can actually render faster than traditional sprite-based techniques. Volumetrics remain one of the last big differences between real-time and offline visual effects. In this talk we will show how volumetric effects are now practical on current GPU hardware. We will describe several new simulation and rendering techniques, including new solvers, combustion models, optimized ray marching and shadows, which together can make volumetric effects a practical alternative to particle-based methods for game effects.  Back
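The core of the volumetric rendering described above is the ray-marching loop: accumulate color and opacity front-to-back through a density field, terminating early once the ray is effectively opaque. A single-ray NumPy sketch, with made-up constants and no lighting model, looks like this; it is an illustration of the technique, not NVIDIA's renderer.

# Front-to-back ray marching through a toy density volume.
import numpy as np

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 32)) * 0.1        # toy density field

def march(origin, direction, n_steps=64, step=0.5):
    color, alpha = 0.0, 0.0
    pos = np.array(origin, float)
    d = np.array(direction, float); d /= np.linalg.norm(d)
    for _ in range(n_steps):
        i, j, k = int(pos[0]), int(pos[1]), int(pos[2])
        if 0 <= i < 32 and 0 <= j < 32 and 0 <= k < 32:
            density = vol[i, j, k] * step
            a = 1.0 - np.exp(-density)      # opacity of this segment
            color += (1.0 - alpha) * a      # emissive white "smoke"
            alpha += (1.0 - alpha) * a
            if alpha > 0.99:                # early ray termination
                break
        pos += d * step
    return color, alpha

print(march(origin=(0, 16, 16), direction=(1, 0, 0)))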
 
Keywords:
Game Development, Rendering & Animation, Visual Effects & Simulation, GTC 2014 - ID S4607
Streaming:
 
DirectX 11 Rendering and NVIDIA GameWorks in Batman: Arkham Origins
Colin Barre-Brisebois (WB Games Montreal)
This talk presents several rendering techniques behind Batman: Arkham Origins (BAO), the third installment in the critically-acclaimed Batman: Arkham series. This talk focuses on several DirectX 11 features developed in collaboration with NVIDIA spec ...Read More
This talk presents several rendering techniques behind Batman: Arkham Origins (BAO), the third installment in the critically-acclaimed Batman: Arkham series, focusing on several DirectX 11 features developed in collaboration with NVIDIA specifically for the high-end PC enthusiast. We will present how tessellation significantly improves the visuals behind Batman's iconic cape and brings our deformable snow technique from the consoles to the next level on PC. We will also cover physically-based particles with PhysX, particle fields with Turbulence, improved shadows, temporally stable dynamic ambient occlusion, bokeh depth-of-field and improved anti-aliasing. Additionally, other improvements to image quality, visual fidelity and compression will be showcased, such as improved detail normal mapping via Reoriented Normal Mapping, and how chroma subsampling at various stages of our lighting pipeline was essential in doubling the size of our open world while still fitting on a single DVD.  Back
 
Keywords:
Game Development, Real-Time Graphics Applications, Rendering & Animation, Visual Effects & Simulation, GTC 2014 - ID S4614
Streaming:
Download:
Graphics Virtualization
Presentation
Media
Intro to Virtualization 101
Jared Cowart (NVIDIA), Luke Wignall (NVIDIA)
This session introduces the audience to the concepts of server, desktop, and application virtualization. The audience will learn about the key concepts and technologies used in virtualization as a foundation for understanding how NVIDIA is leading the way in adding ...Read More
This session introduces the audience to the concepts of server, desktop, and application virtualization. The audience will learn about the key concepts and technologies used in virtualization as a foundation for understanding how NVIDIA is leading the way in adding a key ingredient: a superior end-user experience. Jared and Luke will discuss the business reasons that make virtualization such a powerful answer, the important considerations before moving to virtualization, and what typical environments look like, with live demos.  Back
 
Keywords:
Graphics Virtualization, Desktop & Application Virtualization, GTC 2014 - ID S4726
Streaming:
Download:
 
Virtual is Better than Physical - Delivering a Delightful User Experience from a Virtual Desktop
Kenneth Fingerlos (Lewan Technology)
Desktop virtualization has been around for years, with a large number of very good reasons to deploy it. However, in the effort to control costs and deliver desktops over poor connections and skinny pipes, IT admins have often resorted to delivering sub-par ...Read More
Desktop virtualization has been around for years, with a large number of very good reasons to deploy it. However, in the effort to control costs and deliver desktops over poor connections and skinny pipes, IT admins have often resorted to delivering sub-par user experiences. This session focuses on technologies that allow delivery of stunning, responsive, rich user experiences from virtual desktops without breaking the bank. With a focus on user experience, it delves into IO, graphics, memory, and CPU, and how to get the most smiles for your dollar. The session includes specific discussion of GPU virtualization, IO optimization, and flash storage for virtual desktop environments, including configurations for Citrix XenApp, XenDesktop, and VMware View.  Back
 
Keywords:
Graphics Virtualization, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4783
Streaming:
Download:
 
Smackdown GPU Optimized VDI Solutions: 2014 Edition
Ruben Spruijt (PQR)
Get up to speed with GPU optimized VDI solutions. More and more customers see the benefits of server-hosted Desktop Virtualization solutions such as VDI. There are several important players in this market space, and from a marketing perspective these ...Read More
Get up to speed with GPU-optimized VDI solutions. More and more customers see the benefits of server-hosted desktop virtualization solutions such as VDI. There are several important players in this market space, and from a marketing perspective these solutions have a lot in common. This presentation is based on industry analysis and customer cases and covers the good, the bad and the ugly of VDI; the various use-case scenarios for GPUs in desktop virtualization; the technical differences between the Microsoft, Citrix and VMware VDI solutions and their approaches to delivering graphical, resource-intensive applications to end users; and, finally, tips on how to choose the right solution. Join Ruben Spruijt (MVP, CTP and vExpert; CTO @ PQR) as he provides opinions and solid argumentation on these topics.   Back
 
Keywords:
Graphics Virtualization, Desktop & Application Virtualization, GTC 2014 - ID S4118
Streaming:
 
Cloud Gaming & Application Delivery with NVIDIA GRID Technologies
Franck Diard (NVIDIA)
This session presents the technologies behind NVIDIA GRID(TM) and the future of game engines and application delivery running in the cloud. The audience will learn about the key components of NVIDIA GRID, like optimal capture, efficient compression, ...Read More
This session presents the technologies behind NVIDIA GRID(TM) and the future of game engines and application delivery running in the cloud. The audience will learn about the key components of NVIDIA GRID, like optimal capture, efficient compression, fast streaming, and low latency rendering that make cloud gaming and application delivery possible. Franck will demonstrate how these components fit together, how to use the GRID APIs, and how to optimize their usage to deliver an ultimate experience, with live demos.  Back
 
Keywords:
Graphics Virtualization, Game Development, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4159
Streaming:
Download:
 
Reality Check: GRID-Accelerated High-End Graphics Performance in Virtual Desktops
Bernhard Tritsch (Bluecue Consulting), Shawn Bass (Syn-Net)
How good are today's virtual desktop remoting protocols when combined with NVIDIA GRID cards? Virtualization experts Benny Tritsch and Shawn Bass developed a unique, vendor-independent test methodology allowing them to visually compare Microsoft Rem ...Read More
How good are today's virtual desktop remoting protocols when combined with NVIDIA GRID cards? Virtualization experts Benny Tritsch and Shawn Bass developed a unique, vendor-independent test methodology allowing them to visually compare Microsoft RemoteFX, VMware/Teradici PCoIP and Citrix HDX head-to-head under different network conditions. Join them in their session where they walk you through the results of their NVIDIA-accelerated tests. See the difference between shared and dedicated GPUs in virtual desktops running on different popular virtualization platforms.  Back
 
Keywords:
Graphics Virtualization, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4251
Streaming:
 
How to Virtualize 3D Workstations? High-End Graphics for VDI in 10 Easy Steps
Mayunk Jain (Citrix Systems), Praveen Prakash (Citrix Systems)
In this session, people with diverse technical backgrounds can learn what it takes to embark on a project with GPU-enabled virtual desktops and applications. We will talk about what components are needed, how they interact, where to learn more, and p ...Read More
In this session, people with diverse technical backgrounds can learn what it takes to embark on a project with GPU-enabled virtual desktops and applications. We will talk about what components are needed, how they interact, and where to learn more, and we will pick up optimization best practices along the way. You will gain the knowledge to plan your own roadmap for adopting this fantastic technology.  Back
 
Keywords:
Graphics Virtualization, Computer Aided Design, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4278
Streaming:
 
Move Your Showroom into the Cloud Using GRID VCA
Stefan Schoenefeld (NVIDIA)
This talk will give an overview of GRID VCA and how it can be used outside a standard Virtual Desktop Infrastructure. Learn how GRID VCA can replace the workstation-based infrastructure in a showroom without major changes to the existing software ...Read More
This talk will give an overview of GRID VCA and how it can be used outside a standard Virtual Desktop Infrastructure. Learn how GRID VCA can replace the workstation-based infrastructure in a showroom without major changes to the existing software solution. We will also cover modifications made to the GRID VCA architecture to enable custom application features, as well as an outlook on future development plans.  Back
 
Keywords:
Graphics Virtualization, Digital Manufacturing, Automotive, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4369
Streaming:
 
Product Innovation Using Private & Public Cloud
Ravi Kunju (Altair)
Simulation-driven product innovation leads to many design explorations that traditionally require significant investment in computing infrastructure. Cloud-based solutions have promising potential to become a channel for such massive computations ...Read More
Simulation-driven product innovation leads to many design explorations that traditionally require significant investment in computing infrastructure. Cloud-based solutions have promising potential to become a channel for such massive computations; however, the biggest challenge is the visualization of the big data generated by these large computations. A software- and hardware-engineered appliance providing a unified interface for the entire product simulation lifecycle will be demonstrated with examples, as the framework for Altair's private and public cloud offerings.   Back
 
Keywords:
Graphics Virtualization, Digital Manufacturing, Computational Structural Mechanics, Digital Product Design & Styling, GTC 2014 - ID S4449
Streaming:
 
Customer Experiences with GPU Virtualization and 3D Remoting
Derek Thorslund (Citrix)
Industry leaders have been delivering centralized 3D graphics applications using Citrix XenDesktop HDX 3D Pro since 2009, and now the cost per user is more attractive than ever thanks to GPU-sharing technologies including XenServer/NVIDIA GRID vGPU. ...Read More
Industry leaders have been delivering centralized 3D graphics applications using Citrix XenDesktop HDX 3D Pro since 2009, and now the cost per user is more attractive than ever thanks to GPU-sharing technologies including XenServer/NVIDIA GRID vGPU. Learn about real-world customer experiences with GPU virtualization and 3D graphics remoting, including use cases, benefits and business drivers, scalability measurements, bandwidth consumption per session, cost savings, and best practices.   Back
 
Keywords:
Graphics Virtualization, Cloud Visualization, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4581
Streaming:
Download:
 
Simulation On Demand: Using GRID as a Platform to Develop Simulation-Based Training with a Distributed Team
Joshua Lewis (Check-6 Training Systems)
How do you deploy a complex virtual simulation application to users across the continental US with a small support staff and minimal cost? Centralize it! Check-6 serves the energy industry by creating and deploying innovative solutions that blend i ...Read More
How do you deploy a complex virtual simulation application to users across the continental US with a small support staff and minimal cost? Centralize it! Check-6 serves the energy industry by creating and deploying innovative solutions that blend interactive courseware, knowledge and skills assessment, and simulation-based training. The discussion will be on the process and benefits of using GRID as a centralized platform to build simulation-based procedural training for industrial systems using gaming technology. Topics will include the training development lifecycle, assets and models appropriate for training-oriented simulations, and the GRID assessment, deployment, and configuration process including some unexpected challenges and benefits.  Back
 
Keywords:
Graphics Virtualization, Digital Manufacturing, Desktop & Application Virtualization, Energy Exploration, GTC 2014 - ID S4637
Streaming:
Download:
 
Next Technology Steps for Applied Materials Global Engineering Collaboration Using CAD in the Cloud
Oran Davis (Applied Materials)
Applied Materials' experiences and impressions while migrating its 3D mechanical CAD cloud to next-generation technology. Applied Materials has 2000+ engineers spread across 93 locations in 22 countries. Each engineer has access to a private-cloud CAD ...Read More
Applied Materials' experiences and impressions while migrating its 3D mechanical CAD cloud to next-generation technology. Applied Materials has 2000+ engineers spread across 93 locations in 22 countries. Each engineer has access to a private-cloud CAD blade-station accessing terabytes of engineering data within the server room. Applied Materials has begun deploying the next generation of CAD-in-the-cloud technologies. The goal is to enable improved service and resource management for the private cloud. Audience members will learn about the details of operating a cloud infrastructure, the challenges involved with current technologies, and the benefits that Applied Materials is currently seeing. From this real-world example, audience members will gain key insights for determining whether this type of solution is right for their company.  Back
 
Keywords:
Graphics Virtualization, Digital Manufacturing, Computer Aided Design, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4666
Streaming:
Download:
 
NVIDIA GRID for VDI: How To Design And Monitor Your Implementation
Florian Becker (Lakeside Software Inc.), Ben Murphy (Lakeside Software Inc.)
Learn how to implement NVIDIA GRID technology in virtual desktop and application environments for graphics accelerated use cases. Apply industry-leading best practices to accurately assess the user community's current graphical application parameter ...Read More
Learn how to implement NVIDIA GRID technology in virtual desktop and application environments for graphics-accelerated use cases. Apply industry-leading best practices to accurately assess the user community's current graphical application parameters and GPU utilization, and use the data to accurately size and scale the vGPU implementation in VDI use cases. Monitor virtual GPUs to proactively detect changes in the performance requirements of the end-user community, manage the end-user experience, and pinpoint performance bottlenecks in the environment.   Back
 
Keywords:
Graphics Virtualization, Big Data Analytics & Data Algorithms, Computer Aided Design, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4686
Streaming:
Download:
 
ArcGIS Pro - 3D GIS in Virtualized Environments
John Meza (ESRI)
The ESRI user community encompasses a wide set of industries, from local and federal government and power utilities to oil and gas, and many users are deployed in VDI environments. NVIDIA GRID cards allow those users to take advantage of the 3D graphics now ...Read More
The ESRI user community encompasses a wide set of industries, from local and federal government and power utilities to oil and gas, and many users are deployed in VDI environments. NVIDIA GRID cards allow those users to take advantage of the 3D graphics now available in our newest GIS analytical product: ArcGIS Pro. This presentation will show results of the performance and scalability testing of ArcGIS Pro in NVIDIA GRID-enabled VDI environments. The presentation will include: VDI vendor tools available for administration and tuning, performance and scalability metrics, and challenges in designing, developing and testing 3D applications for these environments.  Back
 
Keywords:
Graphics Virtualization, Defense, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4715
Streaming:
 
Delivering High-Performance Remote Graphics with NVIDIA GRID Virtual GPU
Andy Currid (NVIDIA)
Learn how to deploy and optimize high-performance remote graphics applications using NVIDIA GRID Virtual GPU. This session will include an architectural overview of GRID Virtual GPU, which provides true hardware virtualization and sharing of the ...Read More

Learn how to deploy and optimize high-performance remote graphics applications using NVIDIA GRID Virtual GPU. This session will include an architectural overview of GRID Virtual GPU, which provides true hardware virtualization and sharing of the GPU between multiple virtual machines, a walkthrough of Virtual GPU setup on Citrix XenServer with remote graphics, and examples of how to tune the configuration for optimum remote graphics performance.

  Back
 
Keywords:
Graphics Virtualization, Cloud Visualization, Media & Entertainment, GTC 2014 - ID S4725
Streaming:
Download:
 
Remote Graphics VDI for the Digital Factory at Gulfstream
Jeff Retey (Gulfstream)
Gulfstream, The World Standard(R) in business aviation, worked with NVIDIA technology to enable an entirely new remote graphics VDI for manufacturing, service and support users around the globe. This technology allows the business to react quickly ...Read More
Gulfstream, The World Standard(R) in business aviation, worked with NVIDIA technology to enable an entirely new remote graphics VDI for manufacturing, service and support users around the globe. This technology allows the business to react quickly to expanding operations and increases in staffing. Leveraging NVIDIA GRID(TM) technology with Citrix XenDesktop, CATIA 3D model-based design data and PLM data are accessible where needed, when needed. This capability protects the company's most valuable IP assets while providing high-performance 3D graphics on non-CAD workstations and mobile tablet devices.  Back
 
Keywords:
Graphics Virtualization, Digital Manufacturing, Cloud Visualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4735
Streaming:
Download:
 
Anatomy of an Aerospace VDI Project: Discover, Plan and Implement a Large Scale VDI 3D GPU Project
Steve Greenberg (Thin Client Computing LLC)
The information presented is based on the discovery and design phase of a real-world aerospace company project. Topics include how to classify users and use cases, expected performance levels, approaches to infrastructure design, and developing an effective ...Read More
The information presented is based on the discovery and design phase of a real-world aerospace company project. Topics include how to classify users and use cases, expected performance levels, approaches to infrastructure design, and developing an effective security model. A sample bill of materials for servers, storage and GPUs will be presented, as well as guidance on configuring the data center and on how to transform the organization to adopt cloud concepts, strategies and operational practices.  Back
 
Keywords:
Graphics Virtualization, Desktop & Application Virtualization, Real-Time Graphics Applications, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4782
Streaming:
 
Using GPUs in the Cloud for Scalable HPC in Engineering and Manufacturing
David Pellerin (Amazon Web Services)
The use of HPC Cloud Computing environments continues to accelerate with the advent of higher performing infrastructures and capabilities. In this presentation, AWS and HGST will provide a view into specific use cases that highlight how HPC and GPU ...Read More
The use of HPC Cloud Computing environments continues to accelerate with the advent of higher performing infrastructures and capabilities. In this presentation, AWS and HGST will provide a view into specific use cases that highlight how HPC and GPU Cloud Computing is being used as a competitive advantage with CAD/CAM and electronic design automation (EDA), with the ability to spin up clusters running HPC applications. Real-world manufacturing use cases will be discussed.   Back
 
Keywords:
Graphics Virtualization, Digital Manufacturing, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4844
Streaming:
Download:
 
Explore Dell Wyse Datacenter's Graphics Options for Virtual Desktop Computing (Presented by Dell)
Gary Radburn (Dell, Inc.)
In this session we will explore the various options Dell supports in its Dell Wyse DataCenter solution offerings. This session will describe various platform offerings, such as the PowerEdge 720, PowerEdge C8220x and Precision 7610 with the various g ...Read More
In this session we will explore the various options Dell supports in its Dell Wyse Datacenter solution offerings. This session will describe various platform offerings, such as the PowerEdge 720, PowerEdge C8220x and Precision 7610, with the various graphics card options. In addition, we will discuss the solution offerings around VMware View, Citrix XenDesktop, and Microsoft with Dell vWorkspace. Lastly, we will detail the capabilities of those solution offerings with various hypervisors, such as VMware vSphere, Citrix XenServer, and Microsoft Windows Server 2012. This will provide attendees with an overall view of what Dell can offer, giving customers multiple options to pick from.  Back
 
Keywords:
Graphics Virtualization, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4850
Streaming:
 
The State of the Industry: How GPU Technologies Are Set to Empower the VDI Experience
Gunnar Berger (Gartner)
Hear the state of the VDI industry and how GPU is set to transform the way enterprises see and use a virtual desktop. In this presentation attendees will (1) learn about the typical VDI use cases and how virtual technologies are changing the paradigm ...Read More
Hear the state of the VDI industry and how GPU is set to transform the way enterprises see and use a virtual desktop. In this presentation attendees will (1) learn about the typical VDI use cases and how virtual technologies are changing the paradigm; (2) understand the different GPU technologies that exist for SBC and VDI workloads; and (3) see visual demonstrations of each graphics technology.  Back
 
Keywords:
Graphics Virtualization, GTC 2014 - ID S4903
Streaming:
 
If You Build It, Will They Come? Better Question Is, Will They Stay?
Dane Young (Entisys Solutions)
Building on S4726 (Intro to Virtualization) and S4783 (Virtual is Better than Physical), this session will take the audience through the most crucial phases of the development lifecycle: Pilot, Production Build, and Roll-out. Regardless of your motiv ...Read More
Building on S4726 (Intro to Virtualization) and S4783 (Virtual is Better than Physical), this session will take the audience through the most crucial phases of the development lifecycle: Pilot, Production Build, and Roll-out. Regardless of your motivations and business drivers to virtualize, if users don't catch the organizational vision, adoption may fail and projects may stall. If not tended to properly, this could turn the organization's pricey CapEx investment into a rather large paperweight! In this session, attendees will learn from the trenches what to do, and what not to do, when it comes time to extend the virtualized solutions to end users. Staying actively engaged during this phase will ensure that the solution continues to move forward and achieve enterprise adoption.  Back
 
Keywords:
Graphics Virtualization, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4948
Streaming:
Download:
 
LoginVSI: New Graphics Workloads for Industry-Standard Performance & Scalability Testing
Eric-Jan van Leeuwen (Login VSI), Ian Williams (NVIDIA Corporation)
It's all about the user experience, and there is no second chance to make a first impression. Testing the performance and scalability of a centralized desktop environment like VDI is crucial to ensuring a great user experience. Knowing the limits ...Read More
It's all about the user experience, and there is no second chance to make a first impression. Testing the performance and scalability of a centralized desktop environment like VDI is crucial to ensuring a great user experience. Knowing the limits of your environment, and knowing what to do about them, is not a "nice to know" but an essential "need to know" to keep your users on board. Learn how Login VSI, the standard in VDI performance and benchmark testing, and NVIDIA have partnered to extend Login VSI's performance and benchmark testing framework to include graphics workloads. Graphics workloads are geared towards 3D modeling applications using real data and real applications. The session will cover how the framework was extended, what is required of a graphics workload, and results seen from scale testing using an example of a prototype AutoCAD workload.   Back
 
Keywords:
Graphics Virtualization, Cloud Visualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4949
Streaming:
Download:
Keynote
Presentation
Media
Opening Keynote
Jen-Hsun Huang (NVIDIA)
Don't miss the opening keynote featuring Jen-Hsun Huang, Co-Founder, President, and CEO of NVIDIA. Hear about what's next in visual computing, and preview disruptive technologies and exciting demonstrations across industries. ...Read More
Don't miss the opening keynote featuring Jen-Hsun Huang, Co-Founder, President, and CEO of NVIDIA. Hear about what's next in visual computing, and preview disruptive technologies and exciting demonstrations across industries.  Back
 
Keywords:
Keynote, GTC 2014 - ID S4736
Streaming:
 
Keynote: Video Games and the Future of Cognitive Enhancement
Adam Gazzaley (UCSF)
A fundamental challenge of modern society is the development of effective approaches to enhance brain function and cognition in both healthy and impaired individuals. For the healthy, this serves as a core mission of our educational system and f ...Read More

A fundamental challenge of modern society is the development of effective approaches to enhance brain function and cognition in both healthy and impaired individuals. For the healthy, this serves as a core mission of our educational system, and for the cognitively impaired it is a critical goal of our medical system. Unfortunately, there are serious and growing concerns about the ability of either system to meet this challenge. I will describe an approach developed in our lab that uses custom-designed video games to achieve meaningful and sustainable cognitive enhancement (e.g., Anguera, et al. Nature 2013), as well as the next stage of our research program, which uses video games integrated with technological innovations in software (e.g., brain computer interface algorithms, GPU computing) and hardware (e.g., virtual reality headsets, mobile EEG, transcranial electrical brain stimulation) to create a novel personalized closed loop system. I will share with you a vision of the future in which high-tech is used as an engine to enhance our brain's information processing systems, thus reducing our reliance on non-specific drugs to treat neurological and psychiatric conditions and allowing us to better target our educational efforts.

This keynote will be preceded by naming the winner of the CUDA Center of Excellence Achievement Award, winner for Best Poster, and the new CUDA Fellows, followed by the launch announcement of the Global Impact Award. (Award ceremony duration approximately 15 minutes).

  Back
 
Keywords:
Keynote, Medical Imaging & Visualization, Video & Image Processing, GTC 2014 - ID S4780
Streaming:
 
Keynote: Using NVIDIA GPUs for Feature Film Production at Pixar
Danny Nahmias (Pixar), Dirk Van Gelder (Pixar)
This presentation will show how Pixar uses GPU technology to empower artists in the animation and lighting departments. By providing our artists with high-quality, interactive visual feedback, we enable them to spend more time making creative de ...Read More

This presentation will show how Pixar uses GPU technology to empower artists in the animation and lighting departments. By providing our artists with high-quality, interactive visual feedback, we enable them to spend more time making creative decisions. Animators interactively pose characters in order to create a performance. When features like displacement, fur, and shadows become critical for communicating the story, it is vital to be able to represent these visual elements in motion at interactive frame rates. We will show Presto, Pixar's proprietary animation system, which uses GPU acceleration to deliver real-time feedback during the character animation process, using examples from Pixar's recent films. Lighting artists place and adjust virtual lights to create the mood and tone of the scene as well as guide the audience's attention. A physically-based illumination model allows these artists to create visually-rich imagery using simpler and more direct controls. We will demonstrate our interactive lighting preview tool, based on this model, built on NVIDIA's OptiX framework, and fully integrated into our new Katana-based production workflow.

  Back
 
Keywords:
Keynote, Media & Entertainment, GTC 2014 - ID S4884
Streaming:
Large Scale Data Visualization & In-Situ Graphics
Presentation
Media
How to Visualize Your GPU-Accelerated Simulation Results
Peter Messmer (NVIDIA)
Learn how to take advantage of GPUs to visualize results of your GPU-accelerated simulation! This session will cover a broad range of visualization and analysis techniques allowing you to investigate your data on the fly. Starting with some basic CUD ...Read More
Learn how to take advantage of GPUs to visualize the results of your GPU-accelerated simulation! This session will cover a broad range of visualization and analysis techniques allowing you to investigate your data on the fly. Starting with some basic CUDA/OpenGL interoperability, we will introduce more sophisticated data models allowing you to take advantage of widely used tools like ParaView and VisIt to visualize your GPU-resident data. Topics such as parallel compositing, remote visualization and application steering will be addressed, allowing you to take full advantage of the GPUs installed in your supercomputing system.  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Combined Simulation & Real-Time Visualization, Scientific Visualization, Supercomputing, GTC 2014 - ID S4244
Streaming:
Download:
 
Scientific Data Visualization on GPU-Enabled, Hybrid HPC Systems
Mel Krokos (University of Portsmouth)
Our session will focus on exploitation of emerging GPU-enabled, hybrid HPC architectures for scientific data visualization. We employ Splotch - a rendering algorithm that allows production of high quality imagery and supports very large-scale dataset ...Read More
Our session will focus on exploitation of emerging GPU-enabled, hybrid HPC architectures for scientific data visualization. We employ Splotch - a rendering algorithm that allows production of high quality imagery and supports very large-scale datasets. We summarize a previously developed CUDA implementation of Splotch referring to the underlying performance model for data transfers, computations and memory access. We subsequently focus on exploitation of HyperQ to allow GPU sharing among multiple cores within nodes, followed by an MPI-based approach to distribute workloads across multiple hybrid nodes within HPC systems. A work-offloading model is finally discussed based on MPI-2 remote memory access features for exploiting multi-node, multi-core and multi-coprocessor accelerated computations towards achieving an optimal level of parallelism. We discuss performance results using reference datasets coming from large-scale astrophysical simulations.  Back
 
Keywords:
Large Scale Data Visualization & In-Situ Graphics, Astronomy & Astrophysics, Scientific Visualization, Supercomputing, GTC 2014 - ID S4516
Streaming:
Download:
Machine Learning & AI
Presentation
Media
Massively-Parallel Stochastic Control and Automation: A New Paradigm in Robotics
Jonathan Rogers (Georgia Institute of Technology)
Uncertainty in locomotion and sensing is one of the primary challenges in the robotics domain. GPUs are emerging as powerful new tools for uncertainty quantification through their ability to perform real-time Monte Carlo simulation as part of a ...Read More
Uncertainty in locomotion and sensing is one of the primary challenges in the robotics domain. GPUs are emerging as powerful new tools for uncertainty quantification through their ability to perform real-time Monte Carlo simulation as part of a closed-loop control system. By coupling GPU-based uncertainty propagation with optimal control laws, robotic vehicles can "hedge their bets" in unknown environments and protect themselves from unexpected disturbances. Examples of GPU-based stochastic controllers will be discussed for several robotic systems of interest, including simulated and experimental results demonstrating unique improvements in obstacle avoidance and accuracy. The theoretical concepts behind GPU-based control will be described, allowing application of these control laws to a wide array of robotic systems.  Back
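The closed-loop idea can be sketched in a few lines: for each candidate action, roll out many sampled disturbances and pick the action with the lowest expected cost. The one-step dynamics, cost function and sample counts below are illustrative toys, not the speaker's controllers.

# Monte Carlo action selection under disturbance uncertainty (toy example).
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])                 # robot state: position, velocity
goal = 5.0
actions = np.linspace(-1.0, 1.0, 21)     # candidate accelerations
n_mc, dt = 2000, 0.1

def rollout_cost(state, a, noise):
    # One-step stochastic dynamics; a GPU would evaluate all samples in parallel.
    v = state[1] + (a + noise) * dt
    p = state[0] + v * dt
    return (p - goal)**2 + 0.1 * v**2

best = min(actions,
           key=lambda a: rollout_cost(x, a, rng.standard_normal(n_mc) * 0.3).mean())
print(float(best))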
 
Keywords:
Machine Learning & AI, GTC 2014 - ID S4261
Streaming:
Download:
 
Preliminary Work on Fast Radix-Based k-NN Multiselect on the GPU
Roshan D'Souza (University of Wisconsin - Milwaukee)
In this presentation we describe an efficient multi-level parallel implementation of the most-significant-bit (MSB) radix sort-based multi-select algorithm for k-NN. Our implementation processes multiple queries within a single kernel call, with each thread ...Read More
In this presentation we describe an efficient multi-level parallel implementation of the most-significant-bit (MSB) radix sort-based multi-select algorithm for k-NN. Our implementation processes multiple queries within a single kernel call, with each thread block/warp simultaneously processing different queries. Our approach is incremental and reduces memory transactions through the use of bit operators, warp voting functions, and shared memory. Benchmarks show significant improvement over previous implementations of k-NN search on the GPU.   Back
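For intuition, MSB radix selection can be sketched serially: view non-negative float32 distances as uint32 (an order-preserving reinterpretation for non-negative IEEE floats) and descend bit by bit, so the k-th smallest distance is found without a full sort. The GPU version parallelizes the per-bit counting across threads; this NumPy sketch is illustrative, not the authors' kernels.

# MSB radix descent: find the k-th smallest non-negative float32 without sorting.
import numpy as np

def kth_smallest(dist, k):
    u = np.ascontiguousarray(dist, dtype=np.float32).view(np.uint32)
    prefix, want = 0, k
    for b in range(31, -1, -1):
        # Count elements whose bits [31..b] match the chosen prefix with bit b = 0.
        zeros = int(np.count_nonzero((u >> b) == (prefix >> b)))
        if zeros < want:
            want -= zeros             # skip the "bit b == 0" bucket entirely
            prefix |= 1 << b          # the k-th element has bit b == 1
    return np.array([prefix], dtype=np.uint32).view(np.float32)[0]

rng = np.random.default_rng(0)
d = rng.random(10_000).astype(np.float32)     # e.g. squared distances to a query
k = 16
assert kth_smallest(d, k) == np.sort(d)[k - 1]
# The k nearest neighbours are then the elements with d <= kth_smallest(d, k).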
 
Keywords:
Machine Learning & AI, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4494
Streaming:
Download:
 
Building Random Forests on the GPU with PyCUDA
Alexander Rubinsteyn (NYU)
Random Forests have become an extremely popular machine learning algorithm for making predictions from large and complicated data sets. The currently highest performing implementations of Random Forests all run on the CPU. We implemented a Random For ...Read More
Random Forests have become an extremely popular machine learning algorithm for making predictions from large and complicated data sets. The currently highest performing implementations of Random Forests all run on the CPU. We implemented a Random Forest learner for the GPU (using PyCUDA and runtime code generation) which outperforms the currently preferred libraries (scikit-learn and wiseRF). The "obvious" parallelization strategy (using one thread block per tree) results in poor performance. Instead, we developed a more nuanced collection of kernels to handle the various tradeoffs between the number of samples and the number of features.   Back
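The flavor of kernel involved can be illustrated with a vectorized split search for a single feature: sort once, then score every candidate threshold with a cumulative-sum Gini computation, rather than looping threshold by threshold. This NumPy sketch illustrates the idea only; it is not the authors' PyCUDA code.

# Vectorised best-split search over one feature column.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000).astype(np.float32)       # one feature column
y = (x + 0.1 * rng.standard_normal(1000) > 0.5).astype(np.int8)

order = np.argsort(x)
xs, ys = x[order], y[order]
n = len(ys)

# After sorting, the split "between i and i+1" leaves i+1 samples on the left.
left_pos = np.cumsum(ys)[:-1].astype(np.float64)    # positives on the left
left_n = np.arange(1, n, dtype=np.float64)
right_pos = ys.sum() - left_pos
right_n = n - left_n

def gini(pos, cnt):
    p = pos / cnt
    return 1.0 - p**2 - (1.0 - p)**2

impurity = (left_n * gini(left_pos, left_n)
            + right_n * gini(right_pos, right_n)) / n
best = int(np.argmin(impurity))
print("threshold ~", float((xs[best] + xs[best + 1]) / 2),
      "impurity", float(impurity[best]))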
 
Keywords:
Machine Learning & AI, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4525
Streaming:
 
Deep Learning Meets Heterogeneous Computing
Ren Wu (Distinguished Scientist, Baidu)
The rise of the internet, especially the mobile internet, has accelerated the data explosion - a driving force behind the great success of deep learning in recent years. Behind the scenes, heterogeneous high-performance computing is another key enabler ...Read More
The rise of the internet, especially the mobile internet, has accelerated the data explosion - a driving force behind the great success of deep learning in recent years. Behind the scenes, heterogeneous high-performance computing is another key enabler of that success. In this talk, we will share some of the work we did at Baidu. We will highlight how big data, deep analytics and high-performance heterogeneous computing can work together with great success.  Back
 
Keywords:
Machine Learning & AI, Big Data Analytics & Data Algorithms, Supercomputing, Video & Image Processing, GTC 2014 - ID S4651
Streaming:
Download:
 
Machine Learning with GPUs: Fast Support Vector Machines without the Coding Headaches
Stephen Tyree (Washington University in St. Louis)
Speeding up machine learning algorithms has often meant tedious, bug-ridden programs tuned to specific architectures, all written by parallel programming amateurs. But machine learning experts can leverage libraries such as CuBLAS to greatly ease the ...Read More
Speeding up machine learning algorithms has often meant tedious, bug-ridden programs tuned to specific architectures, all written by parallel programming amateurs. But machine learning experts can leverage libraries such as CuBLAS to greatly ease the burden of development and make fast code widely available. We present a case study in parallelizing Kernel Support Vector Machines, powerful machine-learned classifiers which are very slow to train on large data. In contrast to previous work which relied on hand-coded exact methods, we demonstrate that a recent approximate method can be compelling for its remarkably simple implementation, portability, and unprecedented speedup on GPUs.  Back
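One well-known approximate method that fits this description is random Fourier features (Rahimi & Recht), which turns kernel SVM training into dense matrix multiplies of exactly the kind cuBLAS accelerates. The sketch below substitutes ridge-regularized least squares for the hinge loss and is offered as an illustration of the approach, not necessarily the talk's exact method.

# Random Fourier features approximating an RBF-kernel classifier.
import numpy as np

rng = np.random.default_rng(0)
n, d, D, gamma = 2000, 20, 256, 0.5         # samples, dims, features, RBF width

X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(n))

# z(x) = sqrt(2/D) * cos(Wx + b) approximates k(x,x') = exp(-gamma ||x-x'||^2)
# when W ~ N(0, 2*gamma*I). One big GEMM -- ideal for cuBLAS.
W = rng.standard_normal((d, D)) * np.sqrt(2 * gamma)
b = rng.uniform(0, 2 * np.pi, D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Linear model on the features (ridge least squares as a simple stand-in
# for the SVM hinge loss).
lam = 1e-2
w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
acc = np.mean(np.sign(Z @ w) == y)
print(round(float(acc), 3))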
 
Keywords:
Machine Learning & AI, Big Data Analytics & Data Algorithms, Numerical Algorithms & Libraries, GTC 2014 - ID S4656
Streaming:
Download:
 
10 Billion Parameter Neural Networks in Your Basement
Adam Coates (Stanford University)
See how a cluster of GPUs has enabled our research group to train Artificial Neural Networks with more than 10 billion connections. "Deep learning" algorithms, driven by bigger datasets and the ability to train larger networks, have led to advancements in diverse applications including computer vision, speech recognition, and natural language processing. After a brief introduction to deep learning, we will show how neural network training fits into our GPU computing environment and how this enables us to duplicate deep learning results that previously required thousands of CPU cores.  Back
 
Keywords:
Machine Learning & AI, Computer Vision, Supercomputing, GTC 2014 - ID S4694
Streaming:
Download:
 
GPU-Optimized Deep Learning Networks for Automatic Speech Recognition
Jessica Ray (MIT Lincoln Laboratory)
In this talk, we compare the implementation of deep learning networks [1] on traditional x86 processors with the implementation on NVIDIA Tesla K20 GPU Accelerators for the purposes of training Restricted Boltzmann Machines [2] and for deep network back propagation in a large-vocabulary speech recognition task (automatic transcription of TED talks). Two GPU implementations are compared: 1) a high-level implementation using Theano [3] and 2) a native implementation using low-level CUDA BLAS libraries. We describe the scaling properties of these implementations in comparison to a baseline batched-x86 implementation as a function of training data size. We also explore the development time tradeoffs for each of the implementations.   Back
 
Keywords:
Machine Learning & AI, Performance Optimization, Defense, GTC 2014 - ID S4732
Streaming:
 
Using GPUs to Accelerate Learning to Rank
Alexander Shchekalev (Yandex)
Machine learning is a powerful tool for processing large amounts of data. Learning to rank plays a key role in many information retrieval problems and constructs a ranking model from training data. Ensemble methods allow us to make a trade-off between the quality of the obtained model and the computational time of the learning process, and many of these algorithms also lend themselves to parallel processing of data. We describe the task of machine-learned ranking and consider the MatrixNet algorithm, based on decision-tree boosting. We present a GPU-optimized implementation of this method, which performs more than 20 times faster than the CPU-based version while retaining the same ranking quality.  Back
 
Keywords:
Machine Learning & AI, GTC 2014 - ID S4739
Streaming:
Download:
 
Visual Object Recognition Using Deep Convolutional Neural Networks
Rob Fergus (New York University / Facebook)
This talk will describe recent progress in object recognition using deep convolutional networks. Over the last 18 months, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g. Google, Facebook, Microsoft, Baidu). Rob Fergus will outline how these models work, and describe architectures that produce state-of-the-art results on the leading recognition benchmarks. GPUs are an essential component to training these models. The talk will conclude with a live demo.   Back
 
Keywords:
Machine Learning & AI, Computer Vision, GTC 2014 - ID S4753
Streaming:
 
Clarifai: Enabling Next Generation Intelligent Applications
Matthew Zeiler (Clarifai)
Significant advances have recently been made in the fields of machine learning and image recognition, impacted greatly by the use of NVIDIA GPUs. Leading performance is harnessed from deep neural networks trained on millions of images to predict thousands of categories of objects. Our expertise at Clarifai in deep neural networks helped us achieve the world's best published image labeling results [ImageNet 2013]. We use NVIDIA GPUs to train large neural networks within practical time constraints and are creating a developer API to enable the next generation of applications in a variety of fields. This talk will describe what these neural networks learn from natural images and how they can be applied to auto-tagging new images, searching large untagged photo collections, and detecting near-duplicates. A live demo of our state of the art system will showcase these capabilities and allow audience interaction.  Back
 
Keywords:
Machine Learning & AI, GTC 2014 - ID S4959
Streaming:
Media & Entertainment
Presentation
Media
Full GPU Image Processing Pipeline for Camera Applications: An Overview
Fyodor Serzhenko (Fastvideo)
The goal of this session is to demonstrate how to combine fast performance and high quality in a full image processing pipeline on the GPU for real-time camera applications. In this session we will present: a detailed analysis of the GPU image processing pipeline for cameras and its constituent parts (dark frame subtraction, flat-field correction, PRNU, white balance, demosaicing, ICC profiling and color management, output via OpenGL, compression to JPEG) and their suitability for the GPU architecture; an analysis of the achieved results and a comparison with existing implementations; and applications to machine vision, broadcasting and high-speed imaging.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4728
Streaming:
Download:
 
An Overview of FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs
Marco Aldinucci (Computer Science Department, University of Torino)
Learn how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and quick to prototype. With FastFlow patterns we go beyond simple loop parallelism by combining data parallelism and streaming. As a use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4729
Streaming:
Download:
 
On-Line and Batch Stitching of Gigapixel Images Using OpenGL and CUDA Frameworks
Daniel Marks (Duke University)
We present GPU-based methods for generating gigapixel-scale image renderings from the AWARE multi-scale gigapixel cameras. We demonstrate a streaming zoomable gigapixel video interface, allowing viewers to digitally zoom by 30x over a 100 degree field of view. We also discuss adaptive batch gigapixel image stitching for online distribution. We compare the performance and utility of OpenGL-based image rendering through the traditional GPU video pipeline and CUDA-based image rendering via GPGPU methods.

  Back
 
Keywords:
Media & Entertainment, Computational Photography, Video & Image Processing, GTC 2014 - ID S4737
Streaming:
 
VRender: Pixar's GPU-Accelerated Volume Renderer
Florian Hecht (Pixar)
Pixar has developed an interactive, progressive renderer to speed up the workflows involving volumetric special effects for our feature films. The renderer has been implemented with NVIDIA's CUDA and makes full use of available GPU performance and memory. The renderer supports various area light sources and modifiers and implements a physically-based shading model using multiple importance sampling. It is used not just to create a preview but to produce the final frames for compositing in our movies. We'll talk about how the renderer is structured in terms of rendering phases and corresponding kernels on the GPU. We'll discuss how data is laid out and accessed in memory and how we deal with memory limitations. We'll go into detail about how various features of the renderer work, such as customizable shaders, motion blur, shadows from surfaces, deep data output, screen-space caching and guaranteed frame-rate interactivity.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4747
Streaming:
Download:
 
From Tent-Pole to Indie: How OctaneRender is Changing the Industry
Jules Urbach (OTOY Inc. and LightStage)
OTOY has built a production pipeline from high-quality content creation to delivery of photorealistic graphics in real-time. In this session, attendees will get a sneak peek under the hood as Jules Urbach unveils the 2014 roadmap for OctaneRender. The unbiased, physically-based OctaneRender, which draws on the power of NVIDIA GPUs, promises to revolutionize visual effects. Attendees will learn how the VFX community is using OctaneRender to transform their production pipelines. OctaneRender uses NVIDIA GPUs to deliver a rich feature set and interactive, final-quality previews to artists and TDs, allowing them to set cameras, lights and materials without time consuming iterations. From tent-pole to indie, Octane has been adopted by some of the industry's leading production designers.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4766
Streaming:
 
Advances in Chaos V-Ray RT Towards GPU Production Rendering
Vladimir "Vlado" Koylazov (Chaos Group)
Learn about the recent advances of V-Ray RT GPU for photorealistic production and interactive rendering. The talk will follow the R&D process of Chaos Software for V-Ray RT GPU towards the goal of delivering production-quality final-frame rendering on the GPU as well as improving the performance of the interactive renderer. The various obstacles along the way and the resulting solutions will be discussed. The talk will offer a behind-the-scenes glimpse into the exciting world of GPU programming, hopefully provide valuable insight for other software developers, and will appeal to GTC attendees with an interest in photorealistic rendering, ray tracing, distributed calculations, and programming in CUDA.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4779
Streaming:
 
GPU Usage and the VFX Industry (Presented by Lenovo)
Allen Bolden (Bit Theory, Inc.)
A case study on the top methods used in the VFX pipeline, and some of the more daring "out of the box" uses coming in the near future for VFX and beyond. The goal of this presentation is to show how multiple technologies tie into a pipeline which is accessible anywhere and, powered by a backbone of GPUs, puts production on set in real time, when time on set is most critical.

  Back
 
Keywords:
Media & Entertainment, Video & Image Processing, GTC 2014 - ID S4809
Streaming:
Download:
 
Speeding Innovation for the Global Multiscreen Market with Virtualization through Software-defined Video Processing
Jesse Rosenzweig (Elemental)
The biggest challenge facing multiscreen content providers is keeping pace. Last year, Elemental, the leading multiscreen content delivery solutions supplier and pioneer of the use of GPUs to optimize video streaming over IP networks, responded with breakneck-paced innovation. In just under 11 months, Elemental launched the most complete HEVC codec implementation, supported the first real-time 4Kp60 live transmissions, and delivered hybrid ground-to-cloud solutions to major brand customers, including the industry's only cloud-bursting workflow to feature ground and cloud clusters working together with full feature parity. The key enabler: flexible software built upon high-performance, programmable hardware. This presentation will explore how Software-Defined Video Processing (SDVP) and the ubiquity of GPUs as virtual machines on the ground and in the Cloud provide the optimal core for large-scale media deployment architectures. It will also explore challenges with current virtual machine solutions and address customer uncertainty about virtualized video processing reliability and functionality.

  Back
 
Keywords:
Media & Entertainment, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4833
Streaming:
 
Creating CONSTRUCT: How NVIDIA GPUs are Defining a New Filmmaking Paradigm
Kevin Margo (Blur Studio)
Kevin Margo will describe how he is using Chaos Group's V-Ray RT renderer in "CONSTRUCT", an upcoming CG-animated short film with final-production frames rendered entirely on NVIDIA GPUs. Follow early development of this project with behind-the-scenes breakdowns, concepts, motion-capture fight choreography, models, look development, and final-render clips from a segment of the short film. Focusing on both the creative and technical demands of the film, Kevin will show how GPU technology is enabling his small team of artists, working nights and weekends, to achieve excellent results in a short period of time while trailblazing new filmmaking workflows made possible only by recent gains in GPU rendering and performance.

  Back
 
Keywords:
Media & Entertainment, Rendering & Animation, GTC 2014 - ID S4855
Streaming:
 
The Path to Fast Lines in Adobe Illustrator
Vineet Batra (Adobe)
This talk covers a real-world application of NVIDIA's path rendering technology (NVPR) for accelerating 2D vector graphics, based on the Adobe PDF model. We shall demonstrate the use of this technology for real-time, interactive rendering in Adobe Illustrator CC. The substantial performance improvement is primarily attributed to NVPR's ability to render complex cubic Bezier curves independently of device resolution. Further, we shall also discuss the use of NVIDIA's Blend extension to support compositing of transparent artwork in conformance with the Porter-Duff model, using 8x multisampling and per-sample fragment shaders. Using these technologies, we achieve 30 FPS when rendering and scaling a complex artwork consisting of a hundred thousand cubic Bezier curves with ten thousand blend operations per frame on a GTX 780 Ti graphics card.

  Back
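For readers unfamiliar with NVPR, its core "stencil, then cover" mechanism looks roughly like the fragment below. The path data is illustrative and stencil-state setup is omitted; this is a sketch of the NV_path_rendering extension in general, not Illustrator's code:

    GLuint path = glGenPathsNV(1);
    static const GLubyte cmds[]   = { GL_MOVE_TO_NV, GL_CUBIC_CURVE_TO_NV,
                                      GL_CLOSE_PATH_NV };
    static const GLfloat coords[] = { 10, 10,                   // move-to
                                      30, 90, 70, 90, 90, 10 }; // one cubic Bezier
    glPathCommandsNV(path, 3, cmds, 8, GL_FLOAT, coords);
    // Fill in two steps; the curve is resolved by the GPU at the current
    // device resolution, which is what makes interactive scaling cheap:
    glStencilFillPathNV(path, GL_COUNT_UP_NV, 0xFF);
    glCoverFillPathNV(path, GL_BOUNDING_BOX_NV);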
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4867
Streaming:
 
Building Photo-Real Virtual Reality from Real Reality, Byte by Byte
Scott Metzger (Nurulize)
This talk describes the process for creating and viewing immersive VR environments of extraordinary visual quality, building on work that was used to launch NVIDIA's Quadro K6000 at SIGGRAPH 2013. It will review how laser scanning, 3D modeling, HDR image capture, and 3D paint tools were combined with high-frame-rate playback to create highly interactive worlds. Scott will review the Mari workflow used for early production of Rise, a theatrical release now in production, and demonstrate the results on an Oculus Rift head-mounted display.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4880
Streaming:
 
Working with the Latest Oculus Rift Hardware and Software
Michael Antonov (Oculus)
Oculus VR is revolutionizing the way people experience 3D worlds. The company's first product, Oculus Rift, is a virtual reality headset that allows users to step inside virtual environments. It provides an immersive, stereoscopic 3D experience with an ultra-wide field of view and super-low-latency head tracking. Since the debut of the Oculus Rift development kit at Game Developers' Conference 2013, Oculus has added a high-definition display, positional tracking and low-persistence support. Also, we've made critical improvements to the Oculus SDK, adding new features while making things simpler and reducing latency. In this talk, we'll discuss everything you need to get started integrating the latest Oculus Rift hardware with your 3D environment. The talk includes an overview of the latest hardware, a technical breakdown for engineers and a game design discussion. We'll also talk about our vision for future hardware development leading to the consumer Rift.

  Back
 
Keywords:
Media & Entertainment, Virtual & Augmented Reality, Defense, Medical Imaging & Visualization, GTC 2014 - ID S4886
Streaming:
 
Project Lighthouse: Final-Frame Rendering with a GPU
Mike Romey (ZOIC Studios)
Zoic Studios will discuss our joint efforts with Chaos Group and NVIDIA to fully realize a final-frame GPU rendering pipeline in production. This unique talk identifies the critical pinch points of an existing V-Ray pipeline and the level of attention contemporary GPUs demand to not only meet, but greatly exceed, the expectations of quick-turn, episodic television production. Romey will include case-study statistics from Zoic's juggernaut ZEUS virtual production pipeline, which typically averages 300-400 shots per two-week cycle for some of today's most demanding visual effects TV shows, films, commercials and games. Discussion will include a candid evaluation of final-frame rendering on the GPU, its performance, deployment cost and effects on current and future production cycles. Additionally, a diary of technical events and challenges from this project will be discussed, as well as a roadmap for future development needed to meet the insatiable demand for rendering visual effects.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4887
Streaming:
 
Leveraging GPUs on Amazon Web Services for Media and Entertainment
John Phillips (Amazon Web Services), Jules Urbach (OTOY Inc. and LightStage)
Amazon EC2 now offers the G2, a new NVIDIA GPU instance type capable of running 3D graphics and GPU compute workloads in the AWS cloud. In this session, attendees will learn about the adoption and evolution of media workflows on the AWS cloud, including how OTOY and AWS have teamed up to bring the full power of NVIDIA GPUs to media and entertainment. Hosted on AWS' infrastructure, OTOY's ORBX-powered AMIs offer M&E professionals a fully customizable "PC in the Cloud," making sophisticated 3D design, rich-media and creative applications available through a web browser. In this presentation, OTOY and AWS will provide a snapshot of industry-specific use cases, including how ORBX streaming technology can be coupled with real-time 3D rendering via OTOY's OctaneRender.

  Back
 
Keywords:
Media & Entertainment, GTC 2014 - ID S4899
Streaming:
 
Creating High-Dynamic-Range Content for Dolby Vision
Thad Beier (Dolby Laboratories)
Thad Beier will present Dolby's high-dynamic range, wide color gamut system called "Dolby Vision", describing the motivation behind its development and the positive, visceral reaction that content producers and viewers alike have on first seeing content created and viewed in this radically wider image space. He will discuss how NVIDIA's GPU technology is integral to every step of the production process, from off-line computation to real-time image processing.   Back
 
Keywords:
Media & Entertainment, Visual Effects & Simulation, GTC 2014 - ID S4960
Streaming:
Media & Entertainment Summit
Presentation
Media
Accelerated Visual Effects Made Accessible with Javascript
Sean Safreed (Red Giant)
Learn how to exploit the powerful new platform from Red Giant that leverages OpenGL and OpenCL on the latest GPUs, coupled with easy-to-use Javascript, to create visual effects tools that run on a variety of operating systems and host applications for video editing and compositing. The Red Giant platform lets artists create both simple image processing tools and complete user interfaces with just a few simple lines of code. This session will provide both an architectural overview and live examples of advanced tools that exploit the Red Giant framework. In addition, this session will show the power of connecting real-time gaming render techniques and visual effects.   Back
 
Keywords:
Media & Entertainment Summit, Real-Time Graphics Applications, Video & Image Processing, GTC 2014 - ID S4134
Streaming:
Download:
 
Generation, Simulation and Rendering of Large Varied Animated Crowds
Isaac Rudomin (Barcelona Supercomputing Center), Benjamin Hernandez (Barcelona Supercomputing Center)
We discuss several steps in the process of simulating and visualizing large and varied crowds, in real time, on consumer-level computers and graphics cards (GPUs). We cover methods for generating, simulating, animating and rendering crowds of varied appearance and a diversity of behaviors.   Back
 
Keywords:
Media & Entertainment Summit, Combined Simulation & Real-Time Visualization, Rendering & Animation, Visual Effects & Simulation, GTC 2014 - ID S4229
Streaming:
Download:
 
GPU Renderfarm with Integrated Asset Management & Production System
Chen Quan (Nanyang Technological University (NTU))
We propose an integrated system combining a GPU renderfarm with an asset management and production system that can greatly streamline computer graphics (CG) movie production. The two main advantages of our system are: 1. The asset management system eases the handling of assets shared between multiple artists on a CG movie production. 2. The 3D CG assets stored in the system can be submitted directly to the GPU renderfarm for rapid rendering, without the need to manually download the assets from the asset management system. Moreover, using the GPU renderfarm accelerates rendering time significantly.  Back
 
Keywords:
Media & Entertainment Summit, Clusters & GPU Management, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4356
Streaming:
Download:
 
Real-Time 4K JPEG2000 for Broadcast and Digital Cinema
Jiri Matela (Comprimato)
JPEG2000 is the compression standard for digital cinema post-production and an emerging standard for broadcast contribution and archiving. Until now, the JPEG2000 format was considered computationally too heavy to be used for anything other than standardized applications such as cinema distribution. We present a successful GPU design and implementation of a JPEG2000 codec that allows real-time film compression and decompression in digital cinema and broadcast applications. Fast GPU processing will help further spread JPEG2000 as an archiving and mezzanine format.  Back
 
Keywords:
Media & Entertainment Summit, Medical Imaging & Visualization, Video & Image Processing, GTC 2014 - ID S4434
Streaming:
 
Graphics and Computer Vision for Live Augmented Reality: The 34th America's Cup
Tim Heidmann (Serious Intent LLC)
For the 2013 America's Cup sailboat races, the event tech team tracked the yachts, marks, and HDTV helicopter cameras with unprecedented accuracy, enabling a real-time augmented reality graphics system called AC LiveLine. This was used extensively throughout the over 100 hours of international live television broadcast. In 2012, it received the Emmy for Technical Achievement in Sports Broadcast. Visuals provided identification of the yachts, details of the course, graphical display of tactical information, and a number of detailed insights into wind, course, and currents. GPU technology was pivotal in solving the problems of simulation, display, tracking, and visual processing inherent in such a complex project. This talk builds on a talk from last year's conference, and includes new topics such as using computer vision techniques to fine-tune the positioning of the yachts, corner tracking to eliminate the jitter of graphics relative to the video, and accelerated particle system techniques to simulate and display visual effects.  Back
 
Keywords:
Media & Entertainment Summit, Virtual & Augmented Reality, Video & Image Processing, GTC 2014 - ID S4446
Streaming:
Download:
 
Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics
Gergely Klar (UCLA, Graphics Lab)
The Material Point Method's potential for computer graphics was demonstrated at SIGGRAPH 2013, where it was used with success in the simulation of snow. This opens up a range of exciting new opportunities for the simulation of elastic, viscous and fracturing materials. In this talk, we present an efficient, massively parallel implementation of MPM on the GPU. We show a radical new approach to removing the parallelization bottleneck imposed by the method's particles-to-grid rasterization step. The rasterization is implemented with atomic instructions, while our specialized arrangement of the particles minimizes the occasions when actual synchronization is required. We demonstrate the efficiency of this speculative-atomics approach by comparing it to more traditional implementations in which the rasterization step is transformed into a gather operation. This combination of atomic instructions and special arrangement is not limited to MPM, but can be used in any simulation with a particles-to-grid rasterization step.  Back
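The baseline that speculative atomics improves on is a plain atomic scatter in the particles-to-grid step. A minimal sketch of that step is shown below, with invented names, linear B-spline weights, and no particle reordering; the talk's contribution is arranging particles so that these atomicAdd calls rarely contend:

    #include <cuda_runtime.h>   // float3/int3 vector types

    __global__ void p2gMass(const float3 *pos, const float *mass, int nP,
                            float *gridMass, int3 dim, float h)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= nP) return;
        float3 gp = make_float3(pos[p].x / h, pos[p].y / h, pos[p].z / h);
        int3 base = make_int3((int)gp.x, (int)gp.y, (int)gp.z);
        for (int dz = 0; dz < 2; ++dz)
        for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx) {
            int3 n = make_int3(base.x + dx, base.y + dy, base.z + dz);
            if (n.x >= dim.x || n.y >= dim.y || n.z >= dim.z) continue;
            float w = (1.f - fabsf(gp.x - n.x)) *
                      (1.f - fabsf(gp.y - n.y)) *
                      (1.f - fabsf(gp.z - n.z));     // trilinear weight
            // Atomic accumulation into the grid node; contention is rare
            // once particles are arranged so neighbors land in different warps.
            atomicAdd(&gridMass[(n.z * dim.y + n.y) * dim.x + n.x], w * mass[p]);
        }
    }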
 
Keywords:
Media & Entertainment Summit, Performance Optimization, Rendering & Animation, Visual Effects & Simulation, GTC 2014 - ID S4458
Streaming:
Download:
 
GPU-Based Multiplatform Transcoding
Mahmut Samil Sagiroglu (Erlab Software)
Learn how to take advantage of the GPU for video processing and encoding in order to produce highly efficient, real-time, multiplatform video output. Contemporary trends make it mandatory to transmit digital media to all platforms. We use GPU processing and NVENC hardware encoding to produce video stream output in different formats simultaneously, with minimum latency and high quality.  Back
 
Keywords:
Media & Entertainment Summit, Video & Image Processing, GTC 2014 - ID S4475
Streaming:
Download:
 
Next Gen Indie Filmmaking: The Technology, Workflow and Power at Our Fingertips
James Fox (Dawnrunner)
Learn how to leverage breakthrough technologies within a nimble independent production environment to create Hollywood-challenging films. With 1) GPU acceleration providing an unprecedented economy of time-to-power, and 2) desktop virtualization creating an opportunity for hassle-free expansion and contraction of the workforce, the barrier to entry for bringing your vision to life has been substantially lowered. Concepts in technology and their integration into a developed workflow will be illustrated, along with a side-by-side comparison of a project that was transformed within this next-gen model.  Back
 
Keywords:
Media & Entertainment Summit, Graphics Virtualization, Desktop & Application Virtualization, GTC 2014 - ID S4480
Streaming:
Download:
 
High Performance Video Pipelining: A Flexible Architecture for GPU Processing of Broadcast Video
Peter Walsh (ESPN)
Discover how to architect a system for the real-time GPU processing of broadcast video. Learn how the current generation of GPU hardware and the CUDA computing platform can be used to support simultaneous video acquisition, transfer of video from CPU to GPU, processing on the GPU, transfer of video from the GPU to the CPU, and processing on the CPU. The architecture described achieves high throughput by pipelining these operations while maintaining the flexibility for easy reconfiguration. A common buffer mechanism will be described for both CPU and GPU memory. This buffer mechanism also supports buffers having different line pitches, which may be required depending on the hardware configuration. In addition to video processing, the interoperation between graphics and CUDA processing is also addressed within the same framework.  Back
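A minimal sketch of such a pipelined architecture, assuming hypothetical buffer names and a stand-in kernel: three pinned host buffers and three CUDA streams rotate so that upload, GPU processing, and download of successive frames overlap.

    #include <cuda_runtime.h>

    __global__ void processFrame(const uchar4 *in, uchar4 *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];            // stand-in for real processing
    }

    int main()
    {
        const int N = 1920 * 1080;            // pixels in one HD frame
        const size_t BYTES = N * sizeof(uchar4);
        const int SLOTS = 3;                  // triple buffering
        uchar4 *hIn[SLOTS], *hOut[SLOTS], *dIn[SLOTS], *dOut[SLOTS];
        cudaStream_t st[SLOTS];
        for (int s = 0; s < SLOTS; ++s) {
            cudaHostAlloc((void **)&hIn[s], BYTES, cudaHostAllocDefault);  // pinned
            cudaHostAlloc((void **)&hOut[s], BYTES, cudaHostAllocDefault);
            cudaMalloc((void **)&dIn[s], BYTES);
            cudaMalloc((void **)&dOut[s], BYTES);
            cudaStreamCreate(&st[s]);
        }
        for (int f = 0; f < 300; ++f) {       // e.g. 10 seconds of 30 fps video
            int s = f % SLOTS;                // rotate buffer/stream slots
            cudaStreamSynchronize(st[s]);     // slot's previous frame done; refill hIn[s] here
            cudaMemcpyAsync(dIn[s], hIn[s], BYTES, cudaMemcpyHostToDevice, st[s]);
            processFrame<<<(N + 255) / 256, 256, 0, st[s]>>>(dIn[s], dOut[s], N);
            cudaMemcpyAsync(hOut[s], dOut[s], BYTES, cudaMemcpyDeviceToHost, st[s]);
        }
        cudaDeviceSynchronize();
        return 0;
    }

Pinned (cudaHostAlloc'd) host buffers are what allow the async copies to truly overlap with kernel execution.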
 
Keywords:
Media & Entertainment Summit, Real-Time Graphics Applications, Video & Image Processing, GTC 2014 - ID S4481
Streaming:
Download:
 
WYSIWYG Computational Photography via Viewfinder Editing
Jongmin Baek (Stanford University)
Digital cameras with electronic viewfinders provide a relatively faithful depiction of the final image, providing a WYSIWYG experience. If, however, the image is created from a burst of differently captured images, or non-linear interactive edits significantly alter the final outcome, then the photographer cannot directly see the results, but instead must imagine the post-processing effects. In this talk we explore the notion of viewfinder editing, which makes the viewfinder more accurately reflect the final image the user intends to create. We demonstrate an application that allows the user to alter the local or global appearance (tone, color, saturation, or focus) via stroke-based input, and propagate the edits spatiotemporally. The system then delivers a real-time visualization of these modifications to the user, and drives the camera control routines to select better capture parameters.  Back
 
Keywords:
Media & Entertainment Summit, Computational Photography, Mobile Applications, Video & Image Processing, GTC 2014 - ID S4505
Streaming:
Download:
 
Shadertoy: Do You Know What a Fragment Shader Can Do?
Pol Jeremias (Beautypi), Inigo Quilez (Beautypi)
Shadertoy is a website that allows developers to live-code shaders that react to music, videos and webcam using WebGL. These creations are shared with the community, making Shadertoy a great repository for finding inspiration, learning and teaching about shading, reactivity and rendering. The website has one challenge: developers have to create their content by only using one full-screen fragment shader. This restriction has pushed the boundaries of what is possible to render in only two triangles. In this session, we are going to walk the audience through Shadertoy and some of the most innovative, artistic and creative GPU algorithms that our community has been developing over the last 6 months. This includes shaders employing raymarching, procedural texturing, modelling and animation, fractal geometry, image compression and volumetric rendering.  Back
 
Keywords:
Media & Entertainment Summit, Real-Time Graphics Applications, Rendering & Animation, GTC 2014 - ID S4550
Streaming:
 
The Future of Entertainment: Immersive Reality System
Matteo Garibotti (University of Genoa)
Sooner or later, the worlds of videogames and cinema will converge: we will have videogames with the quality of movies, and movies with the interactivity of videogames. With a novel stereoscopic 3D visualization technique we developed and patented (www.truedynamic3d.com), we are able to create an immersive reality system in which the user perceives the virtual world completely merged with the real world. This will lead to a new generation of entertainment content, in which movies will no longer be confined to the frame of the monitor but will surround the user.   Back
 
Keywords:
Media & Entertainment Summit, Virtual & Augmented Reality, Computer Vision, Game Development, GTC 2014 - ID S4640
Streaming:
Download:
 
High-Performance Video Encoding Using NVIDIA GPUs
Abhijit Patait (NVIDIA)
This session is intended to provide a broad overview of the video encoding capabilities of the current (Kepler) and future (Maxwell) generations of NVIDIA GPUs. We will provide an overview of the hardware capabilities and software APIs used for video encoding, including recent improvements made in the performance and quality of the encoder. We will also provide a quick overview of how NVIDIA video encoding can be used in various applications such as transcoding, low-latency applications, virtualization and streaming.   Back
 
Keywords:
Media & Entertainment Summit, Cloud Visualization, Desktop & Application Virtualization, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4646
Streaming:
Download:
 
Porting Fabric Engine to NVIDIA Unified Memory: A Case Study
Peter Zion (Fabric Engine Inc.)
Fabric Engine is a platform for the development of high-end 3D production tools. Fabric Engine exposes high-performance computation through the KL programming language. KL is a high-level, single-source language that was initially designed to leverage multicore CPUs for parallelism, and was later adapted to additionally support modern GPUs. In this talk we present a case study of adding support for NVIDIA GPUs to KL through the use of the new unified memory feature, including key challenges faced and solutions architected to overcome these challenges.   Back
 
Keywords:
Media & Entertainment Summit, Programming Languages & Compilers, Visual Effects & Simulation, GTC 2014 - ID S4657
Streaming:
Download:
 
Boosting Image Processing Performance in Adobe Photoshop with GPGPU Technology
Joseph Hsieh (Adobe)
Get an inside look at two Photoshop features and how they use OpenCL. The Smart Sharpen and Blur Gallery features use OpenCL to vastly improve performance, even while operating over very large images. Challenges and solutions will be discussed, followed by a demonstration of the achievements.  Back
 
Keywords:
Media & Entertainment Summit, Performance Optimization, Computational Photography, Video & Image Processing, GTC 2014 - ID S4662
Streaming:
 
Implementing OptiX Raytracing Features into FurryBall GPU Renderer
Jan Tomanek (AAA studio)
This session describes our three-month experience adding GPU ray tracing acceleration to our FurryBall GPU rasterized rendering engine by incorporating NVIDIA OptiX. We'll review the advantages of rasterization techniques for hair, displacement and antialiasing versus the benefits of ray tracing for indirect illumination, reflection and refraction--and how to combine the best of both. We'll show comparisons of rasterized-only scenes and the value of OptiX ray tracing.   Back
 
Keywords:
Media & Entertainment Summit, Ray Tracing, Rendering & Animation, GTC 2014 - ID S4690
Streaming:
Download:
 
Realtime Preview For VFX: Challenges and Rewards
Damien Fagnou (MPC)
In this session we will discuss the challenges and benefits of interactively visualizing large scenes in modern big-budget VFX-driven movies. We will share some examples of the scale and complexity we experienced in our recent productions at MPC and the value of being able to visualize them without the need to go through long offline render processes. We will show initial results of our work using NVIDIA's OptiX framework and Fabric Engine to assemble and render large scenes in an interactive environment, taking advantage of the power of high-end GPUs.

  Back
 
Keywords:
Media & Entertainment Summit, Large Scale Data Visualization & In-Situ Graphics, Rendering & Animation, Visual Effects & Simulation, GTC 2014 - ID S4697
Streaming:
Download:
Medical Imaging & Visualization
Presentation
Media
Real-Time Functional Brain Imaging: How GPU Acceleration Redefines Each Stage
Adam Gazzaley (UCSF), Tim Mullen (Swartz Center for Computational Neuroscience, Institute for Neural Computation, UC San Diego), Christian Kothe (UCSD), Oleg Konings (Gazzaley Lab at UCSF)
Learn how massively parallel CPU-GPU architectures and distributed optimization algorithms are advancing the state of the art in real-time non-invasive electroencephalography (EEG) and brain-computer interfaces (BCI), offering new perspectives on how we study and interface with the human brain. Specifically, we will discuss recent efforts to accelerate key computationally intensive inference problems. These include accurate neuronal source reconstruction, large-scale dynamical system identification, graph-theoretic connectivity analysis, and statistical machine learning for improved neuronal and cognitive state inference. We will examine distributed implementations of Alternating Direction Method of Multipliers (ADMM) convex optimization, using cuBLAS and custom CUDA kernels. Among these, a CUDA implementation of the sum-of-norms regularization (group lasso) will be discussed and compared to a serial C++ implementation and an optimized multi-core CPU MATLAB implementation.  Back
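The sum-of-norms (group lasso) proximal step inside ADMM maps naturally onto a CUDA kernel. The sketch below is one straightforward way to write it, not the authors' implementation (names are invented; one block per coefficient group, blockDim a power of two): each group g of coefficients v_g is scaled by max(0, 1 - lambda/||v_g||2).

    __global__ void groupSoftThreshold(float *v, const int *offset, float lambda)
    {
        extern __shared__ float sh[];
        int g = blockIdx.x;                       // one block per group
        int begin = offset[g], end = offset[g + 1];
        float partial = 0.f;                      // partial sum of squares
        for (int i = begin + threadIdx.x; i < end; i += blockDim.x)
            partial += v[i] * v[i];
        sh[threadIdx.x] = partial;
        __syncthreads();
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // block reduction
            if (threadIdx.x < s) sh[threadIdx.x] += sh[threadIdx.x + s];
            __syncthreads();
        }
        float norm = sqrtf(sh[0]);                // ||v_g||2
        float scale = (norm > lambda) ? 1.f - lambda / norm : 0.f;
        for (int i = begin + threadIdx.x; i < end; i += blockDim.x)
            v[i] *= scale;                        // block soft-thresholding
    }

A typical launch would be groupSoftThreshold<<<nGroups, 256, 256 * sizeof(float)>>>(v, offset, lambda).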
 
Keywords:
Medical Imaging & Visualization, Performance Optimization, Real-Time Graphics Applications, GTC 2014 - ID S4633
Streaming:
 
Experiences Porting Real Time Signal Processing Pipeline CUDA Kernels to Kepler and Windows 8
Ismayil Guracar (Siemens Medical Solutions USA, Inc. Ultrasound Business Unit)
The move to the Kepler generation of GPU cards created new challenges and opportunities for an existing medical ultrasound imaging product with high-performance real-time signal processing kernels based on CUDA, running on the Fermi-based Quadro 2000 under Windows XP. The initial port to Kepler and Windows 8 required only a new driver; however, significant degradation in execution speed was noted compared to the earlier generation. I will show how the various causes of the slowdown were identified and the strategies we developed, including increasing instruction-level parallelism (ILP), to refactor kernels to achieve the full potential of the Kepler architecture.  Back
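To illustrate the ILP idea in isolation (this is a toy reduction with invented names, not one of the Siemens kernels): giving each thread several independent accumulator chains lets Kepler issue more instructions back to back instead of stalling on a single serial dependency.

    __global__ void dotILP(const float *a, const float *b, float *partial, int n)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        int stride = gridDim.x * blockDim.x;
        float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;   // 4 independent chains
        for (int i = tid; i + 3 * stride < n; i += 4 * stride) {
            s0 += a[i]              * b[i];
            s1 += a[i + stride]     * b[i + stride];
            s2 += a[i + 2 * stride] * b[i + 2 * stride];
            s3 += a[i + 3 * stride] * b[i + 3 * stride];
        }
        // (tail elements when n is not a multiple of 4*stride omitted for brevity)
        partial[tid] = (s0 + s1) + (s2 + s3);   // partial sized gridDim.x*blockDim.x
    }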
 
Keywords:
Medical Imaging & Visualization, Performance Optimization, Signal & Audio Processing, GTC 2014 - ID S4148
Streaming:
Download:
 
Computation of Mutual Information Metric for Image Registration on Multiple GPUs
Andrew Adinetz (Julich Supercomputing Centre, Forschungszentrum Julich)
Because of their computational power, GPUs are widely used in the field of image processing. And while registration of brain images has previously been accelerated with GPUs, registration of human brain images presents new challenges due to the large amounts of data and images not fitting in the memory of a single device. We present how we address these challenges with a multi-GPU approach, and describe in detail how we overcome the challenges arising from highly irregular communication during metric computation. Our evaluation demonstrates that adequate performance is achieved with multiple GPUs even with a high volume of communication.   Back
 
Keywords:
Medical Imaging & Visualization, GTC 2014 - ID S4270
Streaming:
Download:
 
CUDA-Accelerated MATLAB without Parallel Computing Toolbox for 3D Medical Image Segmentation
Jung W. Suh (KLA-Tencor)
Learn how to accelerate your MATLAB code using CUDA without the Parallel Computing Toolbox. Although the Parallel Computing Toolbox is useful for speed-ups, this toolbox may not be accessible to every MATLAB user and may have limitations in fully exploiting the power of both MATLAB and CUDA. For general speed-ups of MATLAB applications, GPU utilization through c-mex provides more flexibility and power in many situations. This session will go through the MATLAB implementation of atlas-based 3D hippocampus segmentation for MRI images as an example. Atlas-based segmentation is widely used in neuroimage analysis due to its reliable segmentation results, even for challenging target objects with ambiguous and complicated boundaries. However, it requires high computational power because 3D image registration is used during the segmentation process. This session will show each step of CUDA optimization for our atlas-based segmentation MATLAB code, from profiling to CUDA conversion through c-mex.  Back
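For orientation, a c-mex gateway around a CUDA kernel can be as small as the sketch below (a toy scaling kernel with invented names, shown without error checking; one common build recipe is compiling the .cu with nvcc and linking through mex). The segmentation kernels discussed in the session are of course far more involved.

    #include "mex.h"
    #include <cuda_runtime.h>

    __global__ void scaleKernel(float *x, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= a;                    // toy stand-in for real work
    }

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        if (nrhs != 2 || !mxIsSingle(prhs[0]))
            mexErrMsgTxt("usage: y = gpuscale(single(x), a)");
        const float *x = (const float *)mxGetData(prhs[0]);
        float a = (float)mxGetScalar(prhs[1]);
        int n = (int)mxGetNumberOfElements(prhs[0]);

        plhs[0] = mxCreateNumericMatrix(1, n, mxSINGLE_CLASS, mxREAL);
        float *y = (float *)mxGetData(plhs[0]);

        float *dx;                               // device buffer
        cudaMalloc((void **)&dx, n * sizeof(float));
        cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
        scaleKernel<<<(n + 255) / 256, 256>>>(dx, a, n);
        cudaMemcpy(y, dx, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dx);
    }

From MATLAB the compiled gateway would then be called as y = gpuscale(single(x), 2.0).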
 
Keywords:
Medical Imaging & Visualization, Computer Vision, Video & Image Processing, GTC 2014 - ID S4342
Streaming:
Download:
 
Accelerated X-Ray Imaging: Real-Time Multi-Plane Image Reconstruction with CUDA
Prashanth Bhat (Manipal Dot Net Pvt. Ltd.)
Explore the realm of modern X-ray fluoroscopy, where ever-increasing data rates and computational requirements are the norm. This session presents an efficient and scalable CUDA solution for multi-plane image reconstruction, an essential yet computationally challenging component of these systems. Our parallelization strategy incorporates several non-trivial techniques to improve performance: (a) reduce run-time computations by using pre-computed LUTs; (b) reduce memory bandwidth consumption by accumulating computations in registers before writing to memory; (c) exploit 2D data locality by using the GPU's texture memory and cache; (d) optimize occupancy by tuning the thread-block configuration. We present experimental results on three Kepler GPUs: GeForce GTX 690, Tesla K10, and Tesla K20. On the GTX 690, we show real-time rates of 15 fps for 32 1000x1000 image planes, with speed-ups of 6000x over a CPU implementation, and 10x over an alternative CUDA approach. On both Tesla GPUs, we show linear scaling, making a multi-GPU solution viable.  Back
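Techniques (a)-(c) combine naturally in a kernel of roughly the following shape (a simplified illustration with invented names, not the actual reconstruction code): the per-view detector coordinates come from a precomputed LUT, detector samples are fetched through the texture cache, and the output pixel accumulates in a register so global memory is written exactly once.

    #include <cuda_runtime.h>

    // Detector images for all views assumed packed into one 2D texture; the
    // host-side binding and LUT construction are omitted from this sketch.
    texture<float, cudaTextureType2D, cudaReadModeElementType> detectorTex;

    __global__ void reconstructPlane(const float2 *lut,  // (view, y, x) -> detector coords
                                     float *plane, int w, int h, int nViews)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;
        float acc = 0.f;                           // (b) accumulate in a register
        for (int v = 0; v < nViews; ++v) {
            float2 uv = lut[(v * h + y) * w + x];  // (a) LUT replaces run-time geometry math
            acc += tex2D(detectorTex, uv.x, uv.y); // (c) cached texture fetch
        }
        plane[y * w + x] = acc;                    // single global write per pixel
    }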
 
Keywords:
Medical Imaging & Visualization, Ray Tracing, Video & Image Processing, GTC 2014 - ID S4363
Streaming:
Download:
 
Enabling Real-Time Cancer Research Tools: Accelerating Analysis of Cell Responses to Chemotactic Gradients
Jimmy Pettersson (High Performance Consulting)
Learn how we used CUDA to accelerate cancer research by building a complete real-time automated analysis tool for research scientists. By shortening an analysis process from days down to less than a minute, we're enabling scientists to spend more time focusing on their work: cancer research, molecular drug screening on a cellular level, and more. The talk will also get into some of the computational challenges and algorithm design opportunities that were seized upon.   Back
 
Keywords:
Medical Imaging & Visualization, Bioinformatics & Genomics, GTC 2014 - ID S4364
Streaming:
Download:
 
A New GPU-Based Level Set Method for Medical Image Segmentation
Wenzhe Xue (Mayo Clinic Arizona; Arizona State University)
We have developed a new approach to measure lesion volumes in medical images using GPU programming. The approach is based on the level set method and minimizes the number of voxels included in the computational domain with unique efficiency. The underlying cost function and specifics of the level sets approach are not limited by the implementation, and multiple methods for determining the boundary progression speed are possible. We have experimented with intensity-based approaches as well as higher-order feature spaces using multiple image contrasts. We have tested our approach on synthetic images and in a clinical setting. GPU programming also enables real-time 3D rendering and visualization of the propagating level set surface volume. This GPU-enabled combination of speed and interactivity makes our approach an excellent candidate for use in oncology where change in tumor volume guides clinical decision making and assessment of treatment effectiveness.  Back
 
Keywords:
Medical Imaging & Visualization, Combined Simulation & Real-Time Visualization, Video & Image Processing, GTC 2014 - ID S4422
Streaming:
Download:
 
GPU Acceleration of Processing and Visualization for Various Optical Coherence Tomography Methodologies
Kevin Wong (Simon Fraser University)
The goal of this session is to explore the many GPU computing applications for accelerating the processing pipeline and visualization algorithms in Fourier Domain Optical Coherence Tomography (FDOCT) for medical applications, such as ophthalmic imaging. We will describe the GPU programming techniques that we used for accelerating and optimizing real-time FDOCT processing, which is currently the fastest GPU implementation for FDOCT to the best of our knowledge. We will demonstrate two additional novel variations of functional OCT imaging made possible by GPU acceleration: real time speckle variance OCT (svOCT) to visualize the vasculature network in the retina, and wavefront sensorless adaptive optics OCT for ultrahigh resolution volumetric imaging. We will present videos to illustrate the use of our GPU-based FDOCT processing and the imaging applications in a clinical environment.  Back
 
Keywords:
Medical Imaging & Visualization, GTC 2014 - ID S4513
Streaming:
Download:
 
Scalable Rendering Architecture: Challenges & Approaches
Ketan Mehta (Vital Images Inc)
Learn about challenges in deploying medical imaging advanced visualization in enterprise data-centers and the impacts of virtualization. This talk will discuss and dispel some myths about virtualization with GPUs. The talk will also cover key challenges of designing scalable rendering architecture that can support tens to hundreds of users, focusing on system and architecture challenges along with software design concerns that need to be addressed. Active participation and sharing of experiences from the audience is welcome and encouraged.  Back
 
Keywords:
Medical Imaging & Visualization, Clusters & GPU Management, Remote Graphics & Cloud-Based Graphics, GTC 2014 - ID S4645
Streaming:
Download:
Mobile Applications
Presentation
Media
Making Subdivision Surfaces an Industry Standard
David Yu (Pixar Animation Studios)
Catmull-Clark subdivision surfaces provide a powerful way to represent geometry in animation systems. At a base level they extend B-spline patches to handle arbitrary topology. And with advanced features such as semi-sharp creasing, face-varying texture coordinates, and hierarchical control meshes, they can encode higher-frequency modeling features with a surprising economy of control points. In this talk Bill Polson will cover: What is subdivision? What are the advanced features and why are they useful? How are the surfaces drawn and evaluated on CPUs and GPUs? Bill will also discuss OpenSubdiv, Pixar's open-source implementation of subdivision surfaces. He'll demonstrate OpenSubdiv in action and provide an update on the project, the engineering roadmap, and industry adoption.

  Back
 
Keywords:
Mobile Applications, Media & Entertainment, Rendering & Animation, GTC 2014 - ID S4856
Streaming:
Mobile Summit
Presentation
Media
The Mobile Revolution: Key Challenges and Opportunities
Neil Trevett (NVIDIA), Roman Hasenbeck (Metaio US), Paul Travers (Vuzix), Tim Droz (SoftKinetic), Matt Bell (Matterport)
A lively and interactive panel session where we bring together leading experts from NVIDIA and the industry to discuss how the mobile revolution is changing the world we live in and the key challenges that are yet to be solved. This is your chance to get your gnarliest mobile questions answered!
 
Keywords:
Mobile Summit, GTC 2014 - ID S4786
Streaming:
 
Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's Latest Developer Tools Suite
Sebastien Domine (NVIDIA)
The audience will learn about the latest developer tools suite, specifically designed to unleash the power of Tegra K1 for Android application developers. The broad scope of this tutorial spans from advanced graphics to compute and multi-core CPU tools, enabling developers to take full advantage of the heterogeneous computing horsepower available. More specifically, compute developers will learn about the tools available for programming CUDA on Tegra K1. Graphics developers will be introduced to the new Tegra Graphics Debugger for Tegra K1. This new mobile graphics development tool supports all the advanced features that Tegra K1 has to offer, via OpenGL ES 2.0, 3.0 and OpenGL 4.3. Finally, game developers will see how to manage their Android build configuration and debugging sessions entirely within the latest Visual Studio 2013, and how to profile their application to identify hot spots and corresponding call stacks with our brand-new release of Tegra System Profiler.
 
Keywords:
Mobile Summit, Debugging Tools & Techniques, Performance Optimization, Game Development, GTC 2014 - ID S4825
Streaming:
Download:
 
Image and Vision Processing on Tegra
Elif Albuz (NVIDIA)
Processing live and offline camera frames, images and video streams, and extracting semantic information enables various applications on mobile and embedded platforms. Image and vision computing algorithms are inherently highly parallel, and fast processing of these algorithms enables new paradigms in embedded and mobile applications. Tegra K1 is built to address data-parallel embedded and mobile applications, with a CUDA-enabled GPU, an image signal processing engine, a NEON-enabled quad-core ARM CPU, and encode and decode accelerator hardware. Tegra software libraries wrap all this capability and make it available to developers. In this session, we will present an overview of the software libraries and architecture relevant to image and vision computing on Tegra platforms.
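To make the data-parallel nature concrete, here is a minimal, hypothetical CUDA kernel of the kind such vision pipelines are built from, a per-pixel RGBA-to-luma conversion; it is a generic sketch, not code from the Tegra libraries.

```cuda
// Generic sketch of a per-pixel vision kernel: RGBA8 to 8-bit luma.
// One thread per pixel; the mapping is embarrassingly parallel, which is
// what makes such stages a natural fit for a mobile GPU.
__global__ void rgbaToLuma(const uchar4* rgba, unsigned char* luma,
                           int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 p = rgba[y * width + x];   // assuming R,G,B,A byte order
    // BT.601 luma weights
    luma[y * width + x] =
        (unsigned char)(0.299f * p.x + 0.587f * p.y + 0.114f * p.z);
}
```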
 
Keywords:
Mobile Summit, Automotive, Computational Photography, Computer Vision, GTC 2014 - ID S4873
Streaming:
 
Flap Higher Than the Birds: Differentiate Your Android Game and Allegorithmic Substance
Andrew Edelsten (NVIDIA), Sebastien Deguy (Allegorithmic)
Android continues its meteoric rise as the world's dominant mobile operating system. Every day developers large and small discover new ways to delight users, but getting noticed is increasingly difficult. The latest NVIDIA Tegra K1 processors provide developers with a host of new features to differentiate their titles and get them flying above the rest of the crowd. During this session, discover the new CPU, GPU, and multimedia features the latest Tegra processors offer and learn how to use them to enhance and extend your applications. As an example of the type of differentiation the Tegra K1 makes possible, Allegorithmic and RUST Ltd will provide a hands-on demo of physically based shading (PBR), dynamic texturing and high-resolution GPU-based particle throwing using the latest Allegorithmic Substance texturing pipeline.
 
Keywords:
Mobile Summit, Game Development, Mobile Applications, GTC 2014 - ID S4877
Streaming:
Download:
 
Mobile GPU Compute with Tegra K1
Amit Rao (NVIDIA), Mark Ebersole (NVIDIA)
An in-depth session that explores how sophisticated mobile applications can harness the power of GPU compute using the Kepler GPU in the Tegra K1 SoC. Topics to be covered include: (1) an overview of the GPU compute capability of Tegra K1; (2) a review of the various GPU compute APIs, with their relative strengths and weaknesses, including CUDA, RenderScript, OpenGL Compute Shaders and OpenCL; (3) getting up and running on the Tegra Development Platform with GPU compute; (4) principles and considerations for programming with CUDA on Tegra; and (5) walk-throughs of GPU compute coding examples using CUDA for Tegra K1.
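One Tegra-specific consideration worth sketching: because the CPU and GPU share physical memory on the SoC, mapped (zero-copy) host allocations can avoid redundant copies. This is the standard CUDA pattern for such allocations, shown as a general sketch rather than an official Tegra recipe.

```cuda
// Sketch: zero-copy (mapped) host memory, often attractive on SoCs such as
// Tegra K1 where the CPU and GPU share the same physical DRAM.
#include <cuda_runtime.h>

void allocateZeroCopy()
{
    const size_t N = 1 << 20;
    float *h_data = nullptr, *d_data = nullptr;

    cudaSetDeviceFlags(cudaDeviceMapHost);          // before first context use
    cudaHostAlloc((void**)&h_data, N * sizeof(float),
                  cudaHostAllocMapped);             // page-locked + mapped
    cudaHostGetDevicePointer((void**)&d_data, h_data, 0);  // GPU-visible alias

    // Kernels can now read/write d_data directly; no cudaMemcpy is required.
    // myKernel<<<blocks, threads>>>(d_data, N);    // hypothetical kernel
}
```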
 
Keywords:
Mobile Summit, GTC 2014 - ID S4906
Streaming:
Download:
 
Enhancement of Tegra Tablet's Computational Performance by GeForce Desktop and Wifi
Di Zhao (Ohio State University)
Learn how to develop Tablet-Wifi-Desktop applications to enhance your tablet's performance. This talk includes: (1) a comprehensive discussion of the popular Tablet-Wifi-Desktop architecture; (2) an introduction to programming approaches for Tablet-Wifi-Desktop; (3) an introduction to the decomposition of computationally intensive problems, such as the Fast Fourier Transform for computer graphics, across Tablet-Wifi-Desktop; and (4) computational results illustrating the performance enhancement of a Tegra tablet by Tablet-Wifi-Desktop.
 
Keywords:
Mobile Summit, Mobile Applications, GTC 2014 - ID S4191
Streaming:
Download:
 
eyeSight on Android-based Set-Top-Boxes
Gideon Shmuel (eyeSight Technologies Ltd.)
More and more, we are seeing gesture recognition interfaces being integrated into the digital devices around us. TVs and PCs with pre-installed gesture controls are becoming a standard feature in new devices launched in the market. As a provider of gesture solutions, we will discuss the benefits of running gesture engines on the GPU, as well as how Tegra-based devices, including set-top boxes, can benefit from such touch-free interfaces.
 
Keywords:
Mobile Summit, Computer Vision, Smart TV, Mobile & Second Screen Applications, Video & Image Processing, GTC 2014 - ID S4197
Streaming:
 
From CPU to GPU: Optimizing 3D Depthmap and Filtering
Tim Droz (SoftKinetic)
The advent of 3D technologies has created a particular strain on processing resources for embedded platforms such as Tegra. 3D depthmap generation and filtering have traditionally been processed on the CPU, but by offloading these capabilities to the much more capable GPU, up to a quarter of the bottleneck created in processing 3D images can be eliminated. In this session, learn how utilizing the GPU to free up processing resources allows for a leaner, faster developer experience. We will also discuss how to manage and overcome the most difficult part of GPU processing: synchronization.
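Since the talk singles out synchronization as the hardest part, a brief sketch of the event-based pattern commonly used to order work between CUDA streams may be useful. This is generic CUDA, not SoftKinetic's code, and the two kernels are hypothetical placeholders.

```cuda
// Sketch: ordering dependent stages across streams with events, so the
// filter stage starts only after depthmap generation completes, without
// stalling the whole device. generateDepthmap and filterDepthmap stand in
// for real kernels; d_raw, d_depth, d_filtered, grid, block are assumed.
cudaStream_t genStream, filterStream;
cudaEvent_t  depthReady;
cudaStreamCreate(&genStream);
cudaStreamCreate(&filterStream);
cudaEventCreateWithFlags(&depthReady, cudaEventDisableTiming);

generateDepthmap<<<grid, block, 0, genStream>>>(d_raw, d_depth);
cudaEventRecord(depthReady, genStream);
cudaStreamWaitEvent(filterStream, depthReady, 0);  // cross-stream dependency
filterDepthmap<<<grid, block, 0, filterStream>>>(d_depth, d_filtered);
```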
 
Keywords:
Mobile Summit, Numerical Algorithms & Libraries, Computer Vision, Programming Languages & Compilers, GTC 2014 - ID S4392
Streaming:
Download:
 
Khronos Open API Standards for Mobile Graphics, Compute and Vision Processing
Neil Trevett (NVIDIA)
Discover how over 100 companies cooperate at the Khronos Group to create open, royalty-free standards that help define the future of mobile silicon. This session explores the role of industry standards in maximizing mobile market opportunities and provides an overview of the state of the art in acceleration APIs on Android and ARM-based systems, including: (1) accelerating time to productive ecosystems rather than minimizing time to proprietary specifications; (2) balancing and reconciling the opposing benefits of "differentiation" and "fragmentation"; (3) designing open standards that drive innovation while allowing room for healthy competition; (4) an overview of Khronos ecosystem APIs for graphics, compute, sensor and vision processing; and (5) accelerating advanced applications such as augmented reality.
 
Keywords:
Mobile Summit, Virtual & Augmented Reality, Computer Vision, Real-Time Graphics Applications, GTC 2014 - ID S4785
Streaming:
Download:
 
Streamlined Transmission of 3D Assets with glTF
Fabrice Robinet (MontageJS)
This session explores issues around delivering real-time 3D content to mobile and web applications by considering the following questions: (1) Images have JPEG and music has MP3; why not a format to deliver 3D content? (2) When designing a format for delivery, we can't ignore the underlying graphics API (GL); wouldn't the most efficient engine formats therefore eventually converge on the same kind of design? (3) Once content is baked and ready to be consumed by GL, how can we improve transfer rates with dedicated compression? (4) Wouldn't it be great to have a declarative way to represent GL content, so that developers can easily build a data-driven engine? (5) Why not centralize these common and so-far redundant efforts to design a delivery and runtime format that is truly efficient for GL APIs? During this show-and-tell presentation, glTF (graphics library Transmission Format) will be introduced. Following an overview of the ecosystem, an introduction to the glTF design and catchy demos from different implementations will be shown. Finally, compression results leveraging Open3DGC will be shared.
 
Keywords:
Mobile Summit, Web Acceleration, Game Development, Real-Time Graphics Applications, GTC 2014 - ID S4805
Streaming:
Download:
 
HTML5 Outside the Browser Survival Guide: The Challenges of Hybrid Apps and Games
Iker Jamardo (Ludei)
Web-based apps and games are growing both in number and complexity. Running outside the browser on a mobile device is still a challenging path, full of bumps and hoops to jump through. From efficient memory management to access to native features, hybrid apps provide a great way to solve these problems and combine the advantages of both worlds: web and native. Far from the media fight over which is best, a combination of both technologies provides a much richer development experience. In this talk, attendees will learn how to solve important issues when dealing with system webview fragmentation, the poor bandwidth of native bridges, or the lack of support for certain important technologies like WebGL.
 
Keywords:
Mobile Summit, Debugging Tools & Techniques, Web Acceleration, Game Development, GTC 2014 - ID S4807
Streaming:
Download:
 
Real-Time Facial Motion Capture and Animation on Mobile
Emiliano Gambaretto (Mixamo), Stefano Corazza (Mixamo)
3D animation is the art form of the present and the future, with hundreds of millions of people drawn to its emotional power in movie theaters and games every year. Mixamo recently developed a facial capture and animation technology that enables anybody to create compelling animated content that is immediately reflected on a character's face. The technology was originally developed for 3D professionals, but with the recent introduction of new-generation mobile GPU hardware supporting OpenCL APIs, such as the Tegra K1, it is now possible to port the technology to mobile devices. In the course of this presentation we will introduce numerical approaches to facial motion capture and animation that are based on a mixture of global and local models of human facial expressions and shape. The presenter will also go into the details of implementing the real-time technology on a Tegra K1 device.
 
Keywords:
Mobile Summit, Computer Vision, Game Development, Machine Learning & AI, GTC 2014 - ID S4808
Streaming:
Download:
 
NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web
Mark Kilgard (NVIDIA)
Come see how NVIDIA is transforming your web browser into a fully GPU-accelerated experience. NVIDIA Path Rendering provides GPU acceleration for web graphics standards such as Scalable Vector Graphics (SVG), HTML5 Canvas, PDF documents, and font rendering. On mobile devices, screen resolutions and densities vary, so vector graphics is a natural way to deploy 2D graphics experiences such as games, maps, and traditional web pages. Watch as we demonstrate accelerated SVG viewers and web browsers on Tegra devices. We do this with an OpenGL extension available on all of NVIDIA's latest desktop and mobile GPUs.
 
Keywords:
Mobile Summit, Web Acceleration, In-Vehicle Infotainment (IVI) & Safety, Real-Time Graphics Applications, GTC 2014 - ID S4810
Streaming:
Download:
 
Augmented Reality Gets Deep: The Impact of 3-D and Depth Cameras on Visualization
Rajesh Narasimha (Metaio)
Learn how the introduction and future integration of embedded 3-D cameras will affect computer vision and augmented reality experiences. This session will look at 3-D camera companies like Primesense and SoftKinetic, as well as the enabling technologies that take advantage of them. Learn also how this technology can be applied in the automotive, manufacturing, and even consumer sectors.
 
Keywords:
Mobile Summit, Virtual & Augmented Reality, Automotive, Digital Product Design & Styling, GTC 2014 - ID S4826
Streaming:
 
Embedded Vision: Enabling Smarter Mobile Apps and Devices
Jeff Bier (Embedded Vision Alliance)
For decades, computer vision technology was found mainly in university laboratories and a few niche applications. Today, virtually every tablet and smartphone is capable of sophisticated vision functions such as hand gesture recognition, face recognition, gaze tracking, and object recognition. These capabilities are being used to enable new types of applications, user interfaces, and use cases for mobile devices. We illuminate the key drivers behind the rapid proliferation of vision capabilities in mobile devices, and highlight some of the most innovative processor architectures, sensors, tools and APIs being used for mobile vision.
 
Keywords:
Mobile Summit, Virtual & Augmented Reality, Computational Photography, Computer Vision, GTC 2014 - ID S4829
Streaming:
Download:
 
Chrome on Mobile at 60 FPS: Now and In the Future
Nat Duca (Google)
Chrome for Android's GPU compositing architecture renders trillions of frames a day at 60fps. On one hand, it is tasked with rendering the most common desktop pages: content designed for desktop-class hardware that, without careful handling, would bring a phone to its knees. On the other hand, it needs to render, also at 60fps, dynamic, touch-driven mobile applications. In this talk, we will explain how this architecture works, reviewing the design choices we've had to make around CPU/GPU alternatives, power efficiency, and memory usage. We will close with a discussion of where we're going in the future, especially around use of the GPU for rasterization of web content.
 
Keywords:
Mobile Summit, Web Acceleration, GTC 2014 - ID S4832
Streaming:
Download:
 
WebGL, HTML5 and How the Mobile Web Was Won
Tony Parisi (Vizi)
After years of fear, uncertainty and doubt, the jury is now in: HTML5 is the platform of choice for building cross-platform, connected applications for desktop and mobile. The advanced programming, animation and multimedia capabilities of modern web browsers, combined with the hardware-accelerated 3D rendering provided by WebGL, represent a combination with limitless possibilities. With these technologies, developers can create immersive 3D games, integrated 2D/3D presentations, product displays, social media sites and more, all coded in JavaScript and running in the browser. This power is also available on mobile devices: WebGL is now built into Android, and there are quality adapter libraries for developing hybrid applications (native + WebKit) for iOS. With HTML5 and WebGL, developers can build high-performance mobile 3D applications and web sites rivaling native implementations, in a fraction of the time. Join 3D pioneer and WebGL guru Tony Parisi as he explores the technology, busts the myths and tells us where it's really at for creating the next generation of 3D web and mobile applications.
 
Keywords:
Mobile Summit, Virtual & Augmented Reality, Web Acceleration, Game Development, GTC 2014 - ID S4837
Streaming:
Download:
 
Enabling Direct3D Content on OpenGL Platforms: Mac, Linux, Android, and Beyond!
Gavriel State (TransGaming Inc.)
Today's developers face unprecedented challenges in choosing which platforms to target when developing games and applications meant to be used by a wide consumer audience. Beyond the Windows desktop, there is now a huge variety of new choices: alternative desktop OS platforms such as Linux and Mac OS X; mobile devices such as phones and tablets; HTML-based web platforms running on cloud-based servers; and a plethora of embedded CE systems, ranging from video game consoles to TV platforms. All of these platforms use some variety of OpenGL or OpenGL ES, rather than Direct3D. If you have games or other Direct3D-based content that you want to retarget to a new platform, this session will show you how to quickly and easily enable your graphics code to run on OpenGL platforms using TransGaming's shim technology.
 
Keywords:
Mobile Summit, Game Development, Media & Entertainment, Programming Languages & Compilers, GTC 2014 - ID S4846
Streaming:
Download:
 
Using AR Capabilities to build new User Paradigms with Wearable Glasses
Raymond Lo (Meta Company)
Learn how to develop novel user interfaces with Spaceglasses, the world's first wearable augmented reality glasses to combine an optical see-through stereoscopic display with a time-of-flight range-imaging camera for hand tracking and recognition of the user's environment. With the ability to track the wearer's hands and surrounding surfaces in 3D space, a new form of human-computer interaction can be enabled by turning everyday objects into interactive surfaces with augmented graphics. Example augmented reality applications will be shown to illustrate what can be developed with our current AR-enabled eyeglasses.
 
Keywords:
Mobile Summit, Virtual & Augmented Reality, Computer Vision, Signal & Audio Processing, GTC 2014 - ID S4847
Streaming:
Download:
 
Project Tango: Giving Mobile Devices a Human-Scale Understanding of Space and Motion
Johnny Lee (Google)
Project Tango is a focused effort to harvest research from the last decade of work in computer vision and robotics and concentrate that technology into a mobile platform. It uses computer vision and advanced sensor fusion to estimate the position and orientation of the device in real time, while simultaneously generating a 3D map of the environment. We will discuss the underlying technologies that make this possible, such as the hardware sensors and some of the software algorithms. We will also show demonstrations of how the technology could be used in both gaming and non-gaming applications. This is just the beginning and we hope you will join us on this journey. We believe it will be one worth taking.
 
Keywords:
Mobile Summit, Virtual & Augmented Reality, Computer Vision, Game Development, GTC 2014 - ID S4848
Streaming:
 
Take GPU Power to the Maximum for Vision and Depth Sensor Processing: From NVIDIA's K1 Tegra to GPUs on the Cloud
Chen Sagiv (SagivTech Ltd.), Eri Rubin (SagivTech Ltd.)
Over the last six months SagivTech has been intensively developing CUDA code on the Tegra K1 for mobile computer vision applications that require immense computing power. In this talk we will share our joint effort with NVIDIA and Mantis Vision to port the core algorithms of Mantis Vision's depth camera to the NVIDIA Tegra K1. We will also introduce SceneNet, a project funded by the European Commission that uses the power of crowd sourcing, in the form of multiple mobile phone users, to create a higher-quality 3D video scene experience. We will discuss SagivTech's vision of exploiting the compute power of a hybrid platform composed of the Tegra K1 and discrete GPUs in the cloud for computationally intensive, online and interactive applications. We will conclude with some take-home tips on writing CUDA for the Tegra K1.
 
Keywords:
Mobile Summit, Computer Vision, GTC 2014 - ID S4872
Streaming:
Download:
 
Android Differentiation Options on Tegra K1
Andrew Edelsten (NVIDIA)
Android continues its meteoric rise as the world's dominant mobile operating system. Every day developers large and small discover new ways to delight users, but getting noticed is increasingly difficult. The latest NVIDIA Tegra K1 processors provide developers with a host of new features to differentiate their titles and get them flying above the rest of the crowd.
 
Keywords:
Mobile Summit, Game Development, Mobile Applications, GTC 2014 - ID S4878
Streaming:
 
Efficient Parallel Computation on Android
Jason Sams (Google), Tim Murray (Google)
Mobile is very different from traditional desktop and HPC compute, which creates a new set of problems. Android's RenderScript addresses these problems by taking a different approach to compute. We will cover the problems presented by mobile and how RenderScript solves them. Results from the K1 running typical RenderScript workloads will also be presented and discussed.
 
Keywords:
Mobile Summit, Mobile Applications, Programming Languages & Compilers, GTC 2014 - ID S4885
Streaming:
Download:
 
Integrating Computer Vision Sensor Innovations into Mobile Devices
Eli Savransky (NVIDIA)
Integrating innovative computer vision sensors into new mobile devices is a very complex endeavor, especially when popular devices like tablets and smartphones already offer a very high level of functionality and user expectations are sky-high. In this talk, we will discuss the challenges and opportunities of developing a new device with advanced sensor capabilities.
 
Keywords:
Mobile Summit, Virtual & Augmented Reality, Computer Vision, Game Development, GTC 2014 - ID S4900
Streaming:
 
The Evolution and Future of Wearable Displays
Paul Travers (Vuzix)
Throughout the history of wearable displays, an industry with such great promise, there have been very few winners except in niche markets. This talk will explore the history of wearable display technology, sharing some of the failures and a few of the successes, and leading to what is happening today, where the real opportunity for wearable displays is finally being realized. We will close with some thoughts on how the technology is evolving to create possibly one of the most exciting new technologies and market opportunities since the smartphone itself.
 
Keywords:
Mobile Summit, GTC 2014 - ID S4901
Streaming:
Download:
 
WebGL Magic for Mortals (Not Everyone is a Wizard)
Victor Sand (Goo Technologies)
Since the introduction of the GPU-powered web a few years back, graphics experts and hackers have been working hard to harness WebGL's power. The API is approaching its second version (WebGL 2.0) and the time has come to bring it to the masses. In this presentation, Goo Technologies will talk about (and show!) how they use WebGL, HTML5 and JavaScript not only as prerequisites, but as the complete foundation for bringing graphics technology to a broader audience. The session will include descriptions and demos of the modern 3D API Goo Engine and the cutting-edge online 3D editor Goo Create.
 
Keywords:
Mobile Summit, Web Acceleration, Real-Time Graphics Applications, Rendering & Animation, GTC 2014 - ID S4902
Streaming:
 
Mobile Depth Sensing Methods and What They're Good For
Gur Arie Bittan (Mantis Vision)
An introduction to the main mobile depth sensing technologies, and an understanding of their limitations and value for different kinds of mobile apps (gesture, AR, modeling, content creation). In this session we will review the inherent differences between time-of-flight depth sensing (SoftKinetic), passive triangulation (multi-aperture, stereo, shape from motion), and active triangulation/coded light methods (Mantis Vision, Primesense), and in particular the effect of their performance differences on different kinds of mobile apps.
 
Keywords:
Mobile Summit, GTC 2014 - ID S4904
Streaming:
Molecular Dynamics
Presentation
Media
OpenMM Molecular Dynamics on Kinases: Key Cancer Drug Targets Revealed with New Methods and GPU Clusters
Vijay Pande (Stanford University)
Learn how to use GPU-enabled molecular dynamics codes, parallelized on a cluster of 100 GPUs, to sample key conformational transitions. When applied to protein kinase molecules, key targets of anti-cancer drugs, these methods reveal new insights into how to target new drugs to these systems.
 
Keywords:
Molecular Dynamics, Big Data Analytics & Data Algorithms, Bioinformatics & Genomics, Computational Physics, GTC 2014 - ID S4133
Streaming:
 
Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM
Antti-Pekka Hynninen (National Renewable Energy Laboratory)
This presentation provides a first glimpse of a heterogeneous CPU+GPU Molecular Dynamics (MD) engine in CHARMM. In the MD engine, the GPU is used for the calculation of the direct part of the non-bonded force calculation, while the CPU takes care of the rest of the work (reciprocal force calculation, bonded force calculation, integration, etc.). The MD engine is built around the CHARMM domain decomposition code enabling massively parallel MD simulations on multiple CPU+GPU nodes. The new MD engine outperforms the CPU code by a factor of 8 or more.
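The division of labor described above maps naturally onto asynchronous launches: the GPU computes direct-space non-bonded forces while the CPU proceeds with the rest of the step. A schematic sketch of that overlap follows; it is illustrative only, not CHARMM source, and the helper functions are hypothetical placeholders.

```cuda
// Schematic CPU+GPU overlap for one MD step: launch the direct non-bonded
// force kernel asynchronously, do the CPU-side work meanwhile, then combine.
// directNonbondedForces and the compute*/integrate* helpers are placeholders.
directNonbondedForces<<<grid, block, 0, stream>>>(d_coords, d_forces);

computeReciprocalForces(h_coords, h_recipForces);   // CPU (e.g., PME reciprocal)
computeBondedForces(h_coords, h_bondedForces);      // CPU

cudaStreamSynchronize(stream);                      // wait for GPU forces
cudaMemcpy(h_directForces, d_forces, nAtoms * sizeof(float3),
           cudaMemcpyDeviceToHost);
integrateStep(h_directForces, h_recipForces, h_bondedForces);  // CPU
```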
 
Keywords:
Molecular Dynamics, Numerical Algorithms & Libraries, Computational Physics, Supercomputing, GTC 2014 - ID S4163
Streaming:
Download:
 
OpenMM: GPU Accelerated Algorithm Development for Molecular Dynamics
Peter Eastman (Stanford University)
Learn how to develop molecular dynamics algorithms for a GPU without writing any GPU code. OpenMM provides a high-level scripting language in which scientists describe the computation to be done using mathematics, not code. The equations are automatically analyzed and transformed into highly optimized CUDA kernels. This happens at runtime and is invisible to the user. Entirely novel algorithms can be implemented in just a few lines by someone with no CUDA programming experience, yet they run at full speed on the GPU hardware. This talk will describe how to use OpenMM to dramatically simplify and accelerate MD algorithm development. It will also describe the techniques used to transform equations into optimized code, making it relevant to programmers who want to apply similar techniques to other fields.
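To give a flavor of what such runtime code generation produces, here is a hand-written sketch of the kind of pair-force kernel that a user-supplied energy expression like a Lennard-Jones form might compile into. This is conceptual only, not OpenMM's actual generated code, which uses tiling and neighbor lists rather than this naive all-pairs loop.

```cuda
// Conceptual sketch of a generated pair-force kernel for the expression
// E(r) = 4*eps*((sig/r)^12 - (sig/r)^6), differentiated analytically.
__global__ void customPairForces(const float4* pos, float4* force,
                                 float eps, float sig, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 f = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[j].x - pos[i].x;
        float dy = pos[j].y - pos[i].y;
        float dz = pos[j].z - pos[i].z;
        float r2  = dx*dx + dy*dy + dz*dz;
        float sr2 = sig * sig / r2;
        float sr6 = sr2 * sr2 * sr2;
        // Scalar force term: 24*eps*(2*(sig/r)^12 - (sig/r)^6) / r^2
        float s = 24.f * eps * (2.f * sr6 * sr6 - sr6) / r2;
        f.x -= s * dx; f.y -= s * dy; f.z -= s * dz;  // repulsion pushes i away
    }
    force[i] = make_float4(f.x, f.y, f.z, 0.f);
}
```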
 
Keywords:
Molecular Dynamics, Computational Physics, GTC 2014 - ID S4184
Streaming:
Download:
 
Deep Optimization of the Parallel Algorithm for Molecular Dynamics Simulations
Witold Rudnicki (University of Warsaw, ICM)
An in-depth analysis of optimizations of molecular dynamics code for large-scale simulations will be presented. The optimizations were performed on the GPU port of the IMD code used for MD simulations of large solid-state systems. Several optimization techniques were developed for the linked-cell protocol of MD simulations: (1) tiling of atom-atom interactions; (2) implementation of the action-reaction principle; (3) removal of redundant atoms and tiles; and (4) pipelining of the computations for subsequent layers of cells. These methods were compared with a brute-force approach and tested on the Fermi and Kepler architectures. The optimizations employed allowed up to a 5-fold performance improvement over the straightforward port on Kepler and up to a 3-fold improvement on Fermi. Up to 60-fold speedups of the force kernels were observed in comparison with a single CPU core. A single workstation with a K20 card was equivalent to 64 MPI processes on a cluster.
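For readers who want to picture technique (1), below is a schematic of the classic shared-memory tiling of pairwise interactions. This is the textbook pattern with a placeholder pair term, not the IMD port itself.

```cuda
// Schematic tiling of atom-atom interactions: each block stages a tile of
// positions in shared memory so global loads drop from O(N^2) to O(N^2/TILE).
// Assumes blockDim.x == TILE; the force law here is a placeholder.
#define TILE 128

__global__ void tiledForces(const float4* pos, float4* force, int n)
{
    __shared__ float4 tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float4 pi = (i < n) ? pos[i] : make_float4(0.f, 0.f, 0.f, 0.f);
    float3 f = make_float3(0.f, 0.f, 0.f);

    for (int base = 0; base < n; base += TILE) {
        int j = base + threadIdx.x;
        tile[threadIdx.x] = (j < n) ? pos[j] : make_float4(0.f, 0.f, 0.f, 0.f);
        __syncthreads();
        for (int k = 0; k < TILE && base + k < n; ++k) {
            if (base + k == i) continue;
            float dx = tile[k].x - pi.x, dy = tile[k].y - pi.y,
                  dz = tile[k].z - pi.z;
            float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;  // softened distance
            float s  = rsqrtf(r2) / r2;                // placeholder pair term
            f.x += s * dx; f.y += s * dy; f.z += s * dz;
        }
        __syncthreads();
    }
    if (i < n) force[i] = make_float4(f.x, f.y, f.z, 0.f);
}
```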
 
Keywords:
Molecular Dynamics, Computational Physics, Supercomputing, GTC 2014 - ID S4295
Streaming:
 
GPU Accelerated Parallel Simulated Annealing for Fitting Molecular Dynamics Potentials
Pierre-Yves Taunay (The Pennsylvania State University)
This work presents a parallel simulated annealing implementation for fitting molecular dynamics potentials. In our implementation, each GPU is given a random set of Lennard-Jones parameters sigma and epsilon, and separately performs a molecular dynamics simulation. A derived quantity, the structure factor, is then compared to experimental data and determines the quality of the fitting parameters. Information about the best fit is exchanged across GPUs after a fixed number of iterations. The choice of random parameters is then restarted in the vicinity of the best parameter set. Using GPUs, a larger parameter set can be explored in a given time, as molecular dynamics simulations benefit greatly from GPU acceleration.
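A skeletal view of the multi-GPU loop described above may help orient readers; the structure is ours, and Params plus every helper function are hypothetical placeholders, not the authors' code.

```cuda
// Skeleton of parallel simulated annealing across GPUs: each device runs an
// independent MD evaluation of a random (sigma, epsilon) pair; the best fit
// is then shared and the search restarts around it.
#include <cuda_runtime.h>
#include <vector>

Params best;                              // hypothetical parameter struct
const int nIterations = 100;

int nGpus = 0;
cudaGetDeviceCount(&nGpus);
std::vector<float> score(nGpus);

for (int iter = 0; iter < nIterations; ++iter) {
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        Params p = randomNear(best);      // perturb around current best
        launchMdSimulation(g, p);         // async per-GPU MD run (placeholder)
    }
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        cudaDeviceSynchronize();
        score[g] = structureFactorError(g);  // compare to experiment
    }
    best = paramsOfBestScore(score);      // exchange best fit across GPUs
}
```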
 
Keywords:
Molecular Dynamics, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4307
Streaming:
 
Computing the Cure: Combining Sequencing and Physical Simulation on GPUs to Provide Patient Customized Cancer Treatments
Ross Walker (UCSD)
The sequencing revolution is completely changing the landscape of cancer treatment, ushering in the era of personalized medicine, where individual treatments will be customized for a specific patient. Instead of simply looking at stained tumor biopsy sections under a microscope, cancer diagnosis is going high-tech: sequencing of patient tumors (and patient genomes) can determine what precise molecular events cause an individual cancer. In principle, this sequence information holds the key to individually targeted therapies with enormously increased success rates in treating (and even curing) cancer. This is the "molecular oncology" revolution and it will completely change the cancer diagnosis and treatment landscape in the next decade. This talk will highlight work by scientists at MSKCC, Stanford and UCSD to build the tools needed to determine drug susceptibilities using a combination of sequencing data and *physical* simulation. This work will ultimately provide a way to compute patient-customized cancer treatments.
 
Keywords:
Molecular Dynamics, Bioinformatics & Genomics, Computational Physics, Computational Structural Mechanics, GTC 2014 - ID S4333
Streaming:
 
Visualization and Analysis of Petascale Molecular Simulations with VMD
John Stone (University of Illinois)
We present recent successes in the use of GPUs to accelerate challenging molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray XK7 supercomputers. This talk will focus on recent algorithm developments and the applicability and efficient use of new CUDA features on state-of-the-art Kepler GPUs. We will present the latest performance results for GPU accelerated trajectory analysis runs on Cray XK7 petascale systems and GPU-accelerated workstation platforms. We will conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.
 
Keywords:
Molecular Dynamics, Big Data Analytics & Data Algorithms, Scientific Visualization, Supercomputing, GTC 2014 - ID S4410
Streaming:
Download:
 
Peer-to-Peer Molecular Dynamics and You
Scott LeGrand (Amazon Web Services)
Recent code optimization within AMBER has improved single-node performance by up to 30% and multi-GPU scaling by up to 70%. The latter was achieved by aggressive use of peer-to-peer copies and RDMA. This has unleashed new time-scale regimes for sampling and simulation on low-end GPU clusters, beating every known software-based molecular dynamics codebase in existence at the time of submission. This talk will first cover how AMBER's already efficient single-node performance was made even more so; then the challenge not only of enabling peer-to-peer copies between GPUs, but of obtaining hardware capable of supporting them; and finally, up-to-the-minute results using MVAPICH2 and OpenMPI for RDMA directly between GPUs on separate nodes connected by dual-line FDR InfiniBand.
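The peer-to-peer machinery the speaker credits is exposed through a small set of CUDA runtime calls; a minimal sketch follows (generic CUDA, not AMBER source; the force buffers are illustrative).

```cuda
// Minimal peer-to-peer sketch: enable direct access between GPU 0 and GPU 1,
// then copy a force buffer without staging through host memory.
#include <cuda_runtime.h>

void peerForceExchange(float4* d_forces_gpu0, float4* d_forces_gpu1,
                       int nAtoms, cudaStream_t stream)
{
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can device 0 map device 1?
    cudaDeviceCanAccessPeer(&can10, 1, 0);

    if (can01 && can10) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);    // flags argument must be 0
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);

        // Direct device-to-device transfer, bypassing the host entirely:
        cudaMemcpyPeerAsync(d_forces_gpu1, 1, d_forces_gpu0, 0,
                            nAtoms * sizeof(float4), stream);
    }
}
```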
 
Keywords:
Molecular Dynamics, Big Data Analytics & Data Algorithms, Supercomputing, GTC 2014 - ID S4460
Streaming:
Download:
 
Accelerating the Discrete Element Method for Faceted Particles Using HOOMD-Blue
Matthew Spellings (University of Michigan)
Explore the concepts behind large-scale modeling of faceted anisotropic particles. Dynamical methods are the most direct way to study the full set of properties of systems of colloidal and nanoscale particles. Classical and event-driven molecular dynamics simulations of the past have focused on behavior of isotropic particles and limited classes of anisotropic particles such as ellipsoids. In this talk, we discuss the algorithms and data structures behind a GPU-accelerated implementation of the discrete element method for polyhedral particles in HOOMD-Blue. This formulation allows us to efficiently simulate conservative and non-conservative dynamics of faceted shapes within a classical molecular dynamics framework. Research applications include studies of nucleation and growth, granular materials, glassy dynamics and active matter.
 
Keywords:
Molecular Dynamics, Computational Physics, GTC 2014 - ID S4477
Streaming:
Download:
 
GPU Accelerated Fully Flexible Haptic Protein-Ligand Docking
Thanasis Anthopoulos (Cardiff University)
This presentation describes a haptic protein-ligand docking (HPLD) application developed in the Molecular Modelling Lab of the Cardiff School of Pharmacy. The talk explains in detail how GPUs enabled the application to run with a fully flexible ligand and protein target. The first part covers the algorithm used to perform the MMFF94s force-field energy and force calculations, with performance benchmarks showing the speed-up gained from the presented CUDA algorithms. The second part describes an evolutionary algorithm designed to exploit Hyper-Q capabilities and asynchronously evaluate energy kernels using the algorithm explained in the first part. Performance benchmarks show how this algorithm can achieve an additional 2-3x speedup, depending on system size, when run on a GK110 chipset. The session closes with results generated by researchers using the CUDA version of the application.
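The Hyper-Q usage referred to above amounts to issuing independent energy evaluations into separate streams so that GK110's multiple hardware work queues can overlap them. Schematically (the kernel is a placeholder, not the MMFF94s implementation):

```cuda
// Schematic Hyper-Q usage: independent energy evaluations for different
// candidate poses go into separate streams, letting GK110's 32 hardware
// work queues execute them concurrently.
#include <cuda_runtime.h>

__global__ void evaluateEnergy(const float4* pose, float* energy)
{ /* placeholder for a real force-field energy kernel */ }

void evaluatePoses(const float4* d_poses, float* d_energies,
                   int nPoses, int poseStride, dim3 grid, dim3 block)
{
    cudaStream_t streams[32];
    for (int p = 0; p < nPoses; ++p) {
        cudaStreamCreate(&streams[p]);
        evaluateEnergy<<<grid, block, 0, streams[p]>>>(
            d_poses + p * poseStride, d_energies + p);
    }
    for (int p = 0; p < nPoses; ++p) {
        cudaStreamSynchronize(streams[p]);
        cudaStreamDestroy(streams[p]);
    }
}
```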
 
Keywords:
Molecular Dynamics, Computational Physics, Supercomputing, GTC 2014 - ID S4492
Streaming:
Download:
 
Accelerating Dissipative Particle Dynamics Simulation on Kepler: Algorithm, Numerics and Application
Yu-Hang Tang (Brown University)
This talk focuses on the implementation of a highly optimized dissipative particle dynamics (DPD) simulation code in CUDA, which achieves a 20x speedup on a single Kepler GPU over 12 Ivy Bridge cores. We will introduce a new pair-searching algorithm that is parallel, deterministic, atomics-free, and capable of generating a strictly ordered neighbor list. Such a neighbor list leads to optimal memory efficiency when combined with proper particle-reordering schemes. We also propose an in-situ generation scheme for Gaussian random numbers that delivers better performance without losing quality. In addition, details will be given on how to design custom transcendental functions tailored specifically to our DPD functional form. The code is scalable and can run on over a thousand nodes of the Titan supercomputer. Demonstrations of large-scale DPD simulations of vesicle assembly and red blood cell suspension hydrodynamics using our code will be given.
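For orientation, the stock device-side cuRAND pattern for in-kernel Gaussian draws is shown below; the authors describe a custom generator tuned beyond this standard approach.

```cuda
// Device-side Gaussian random numbers with cuRAND: each thread owns a
// generator state and draws normals in-kernel, avoiding a separate RNG
// pass and the memory traffic it would cost.
#include <curand_kernel.h>

__global__ void initRng(curandState* states, unsigned long long seed, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) curand_init(seed, i, 0, &states[i]);
}

__global__ void dpdRandomForces(curandState* states, float* fr, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    curandState local = states[i];     // work on a register copy
    fr[i] = curand_normal(&local);     // one standard normal draw
    states[i] = local;                 // persist state for the next step
}
```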
 
Keywords:
Molecular Dynamics, Numerical Algorithms & Libraries, Computational Fluid Dynamics, Supercomputing, GTC 2014 - ID S4518
Streaming:
Download:
 
Virtual Screening of One Billion Compound Libraries Using Novel GPU-Accelerated Cheminformatics Approaches
Olexandr Isayev (University of North Carolina at Chapel Hill)
Recent years have seen an unprecedented growth of chemical databases, incorporating tens of millions of available compounds and up to 170 billion synthetically feasible chemical compounds. They offer unprecedented opportunities for discovering novel molecules with the desired therapeutic and safety profile. However, current cheminformatics technologies and software relying on conventional CPUs are not capable of handling, characterizing, and virtually screening such "Big Data" chemical libraries. We present the first proof-of-concept study of GPU-accelerated cheminformatics software capable of calculating chemical descriptors for a billion-molecule library. Furthermore, we demonstrate the ability of GPU-based virtual screening software to rapidly identify compounds with specific properties in extremely large virtual libraries. We posit that in the era of the big-data explosion in chemical genomics, GPU computing represents an effective and inexpensive architecture for developing and employing a new generation of cheminformatics methods and tools.
 
Keywords:
Molecular Dynamics, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4561
Streaming:
Download:
 
BUDE: GPU-Accelerated Molecular Docking for Drug Discovery
Richard Sessions (University of Bristol)
The Bristol University Docking Engine (BUDE) is next-generation molecular docking software that exploits GPUs to deliver a step change in performance. Massive sampling of the search space, coupled with a novel method of estimating the free energy of binding between the receptor and ligand (the docking partners), enables novel science. BUDE and a medium-sized GPU-enabled supercomputer can be used to perform: (1) virtual screening by docking of 10 million drug-like molecules against a protein for drug discovery in a few days; (2) scanning of the surface of a protein with hundreds of drug-like molecules to locate binding sites; and (3) protein-protein docking in real space for predicting important protein interactions involved in cellular signaling. In recent optimization work with BUDE, we have achieved a sustained 46% of theoretical peak FLOPs on an NVIDIA GTX 680.
 
Keywords:
Molecular Dynamics, Bioinformatics & Genomics, Supercomputing, GTC 2014 - ID S4604
Streaming:
Download:
Numerical Algorithms & Libraries
Presentation
Media
Session 4: Deploying Your CUDA Applications Into The Wild (Presented by ArrayFire)
Umar Arshad (ArrayFire)
Excited about CUDA but concerned about deployment? In this session, you will learn best practices for deploying your CUDA application and how to resolve issues that commonly arise in the process. You will learn about scaling your application to multiple GPUs to handle large amounts of data (such as streams and/or files on disk). You will also learn about deploying your CUDA-based applications in the cloud using Node.js, containers via Docker, etc.
 
Keywords:
Numerical Algorithms & Libraries, Big Data Analytics & Data Algorithms, Clusters & GPU Management, Mobile Applications, GTC 2014 - ID S4713
Streaming:
 
Easy Multi-GPU Programming with CUDArrays
Javier Cabezas (Barcelona Supercomputing Center)
Learn how to boost your productivity with CUDArrays. CUDArrays is a user-level library that eases the development of CUDA programs by offering a multi-dimensional array data type that can be used in both host and device code. This data type relieves programmers of the burden of managing multi-dimensional arrays through flat "C"-style memory allocations. Moreover, in systems with several GPUs and P2P memory access support, CUDArrays transparently distributes the computation across several GPUs. Using data access pattern information provided by the compiler, the runtime automatically determines how to partition (or replicate) the arrays to minimize the number of accesses to other GPUs' memories. Results show that linear speedups can be achieved in most cases. Examples will be provided for different types of scientific computations.
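For contrast, this is the manual, flat-allocation style that such a library abstracts away. It is plain CUDA shown as the baseline; CUDArrays' own API is not reproduced here.

```cuda
// The hand-indexed style CUDArrays is designed to replace: a 2D array
// expressed as a pitched 1D allocation with explicit index arithmetic.
#include <cuda_runtime.h>

void allocateGrid2D(int rows, int cols)
{
    float* d_grid = nullptr;
    size_t pitchBytes = 0;
    cudaMallocPitch((void**)&d_grid, &pitchBytes,
                    cols * sizeof(float), rows);

    // Inside a kernel, element (r, c) must be addressed through the pitch:
    //   float* row = (float*)((char*)d_grid + r * pitchBytes);
    //   float  v   = row[c];
    cudaFree(d_grid);
}
```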
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4142
Streaming:
Download:
 
Efficient GPU-Friendly Pre-Conditioners for Large-Scale Finite Element Analysis
Krishnan Suresh (University of Wisconsin)
The goal of this session is to introduce a new GPU-friendly pre-conditioner, specifically for finite-element applications. The pre-conditioner is assembly-free in that neither the finite-element stiffness matrix nor the pre-conditioner is assembled (ever!). The memory footprint is therefore extremely small, and the GPU implementation is, in most cases, compute-bound. A CUDA implementation will be discussed, followed by examples of finite element problems with tens of millions of degrees of freedom. It is assumed that registrants are already familiar with finite element techniques.
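The assembly-free idea can be illustrated on the simplest possible element, a 1D bar: apply the operator by looping over elements and scattering their local contributions, never forming the global matrix. This toy sketch is ours, not the speaker's code.

```cuda
// Toy assembly-free application of a 1D linear-element stiffness operator
// y = K*x: one thread per element, scattering the 2x2 local contribution
// k * [1 -1; -1 1] with atomicAdd instead of ever assembling K.
__global__ void applyStiffness1D(const float* x, float* y, float k, int nElem)
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= nElem) return;

    float xi = x[e], xj = x[e + 1];
    atomicAdd(&y[e],     k * (xi - xj));  // local row for node e
    atomicAdd(&y[e + 1], k * (xj - xi));  // local row for node e+1
}
```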
 
Keywords:
Numerical Algorithms & Libraries, Computational Structural Mechanics, Computer Aided Design, GTC 2014 - ID S4171
Streaming:
Download:
 
Fast Evaluation of the Inverse Poisson Cumulative Distribution Function
Mike Giles (University of Oxford)
The inverse of the Poisson cumulative distribution function maps uniformly-distributed random numbers to Poisson random variates. This talk describes a fast implementation for GPUs which is based on some novel approximations of the inverse of the closely-related incomplete gamma function for the case of large Poisson rates. Both single-precision and double-precision versions have been developed, and in each case the computational cost is not much more than the cost of the corresponding function for inverting the Normal cumulative distribution function. The software is freely available as open source from http://people.maths.ox.ac.uk/gilesm/poissinv/
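As a point of reference, the straightforward inversion that the talk's approximations accelerate simply walks up the CDF term by term. The naive device-side version below is our illustration, not the released code; see the link above for the real implementation.

```cuda
// Naive reference inversion of the Poisson CDF by direct summation: walk up
// the CDF until it exceeds u. Note that expf(-lambda) underflows for large
// rates, which is exactly the regime the talk's incomplete-gamma
// approximations are designed to handle.
__device__ int poissinvNaive(float u, float lambda)
{
    float p = expf(-lambda);       // P(X = 0)
    float cdf = p;
    int k = 0;
    while (cdf < u && k < 1000) {  // bounded walk up the CDF
        ++k;
        p *= lambda / k;           // P(X = k) from P(X = k-1)
        cdf += p;
    }
    return k;
}
```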
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4173
Streaming:
Download:
 
How to Avoid Global Synchronization by Domino Scheme
Lung-Sheng Chien (NVIDIA)
Learn how to trace a data dependence graph without global synchronization. Such a dependence graph can arise from a sparse triangular solve, incomplete Cholesky factorization, or incomplete LU factorization. We will address several issues, including: (1) how to reproduce the result without atomic operations; (2) how to track the data dependence graph within a single kernel; (3) how to keep the working space small, because the GPU has limited device memory; and (4) the penalty of warp size on this latency-sensitive application.
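To suggest the flavor of the approach, here is a heavily simplified, dense toy version of flag-based dependence propagation; it is our sketch of the general technique, not the speaker's domino scheme, and a real implementation must guarantee forward progress regardless of block scheduling order.

```cuda
// Toy dependence-driven lower-triangular solve with no global barrier and no
// atomics: each row publishes a ready flag once its result is visible.
// CAUTION: assumes block i becomes resident no later than block j for i < j;
// production code must not rely on scheduling order like this.
__global__ void lowerSolveDomino(const float* L, const float* b,
                                 volatile float* x, volatile int* ready, int n)
{
    int row = blockIdx.x;              // one (single-threaded) block per row
    if (threadIdx.x != 0) return;

    float sum = b[row];
    for (int j = 0; j < row; ++j) {
        while (ready[j] == 0) ;        // spin until x[j] is published
        sum -= L[row * n + j] * x[j];
    }
    x[row] = sum / L[row * n + row];
    __threadfence();                   // make x[row] globally visible ...
    ready[row] = 1;                    // ... before raising the flag
}
```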
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4188
Streaming:
Download:
 
Finite Difference Simulations on GPU Clusters: How Far Can You Push 1D Domain Decomposition?
Pierre Wahl (Brussels Photonics Team/ Vrije Universiteit Brussel)
To fully utilize a GPU cluster, both the single-GPU code and the inter-GPU communication need to be efficient. In this session the FDTD code B-CALM is introduced and used as a case study to explain by example how both targets can be met. We explain how the memory-bound kernels of B-CALM were optimized for Fermi and Kepler, and how efficient inter-GPU communication was enabled by using CUDA-aware MPI. We explain in detail how this was done and present two performance models we developed to estimate single-GPU performance as well as the scaling limits. To validate the models, performance results from different systems are presented, including an InfiniBand cluster with GPUDirect RDMA.
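For readers new to CUDA-aware MPI, the key point is that device pointers can be handed straight to MPI calls. A sketch of a 1D-decomposed halo exchange follows; the variable names are illustrative, not B-CALM source.

```cuda
// Sketch of a 1D-decomposed FDTD halo exchange with CUDA-aware MPI: device
// pointers go straight to MPI, which (with GPUDirect RDMA) can move data
// NIC-to-GPU without host staging. Edge ranks would pass MPI_PROC_NULL.
#include <mpi.h>

void exchangeHalos(float* d_field, int nx, int haloSize,
                   int leftRank, int rightRank)
{
    MPI_Request reqs[4];
    // Receive into ghost layers, send boundary layers, all in device memory.
    MPI_Irecv(d_field,                     haloSize, MPI_FLOAT, leftRank,  0,
              MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(d_field + nx - haloSize,     haloSize, MPI_FLOAT, rightRank, 1,
              MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(d_field + haloSize,          haloSize, MPI_FLOAT, leftRank,  1,
              MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(d_field + nx - 2 * haloSize, haloSize, MPI_FLOAT, rightRank, 0,
              MPI_COMM_WORLD, &reqs[3]);
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
}
```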
 
Keywords:
Numerical Algorithms & Libraries, Clusters & GPU Management, Computational Physics, Supercomputing, GTC 2014 - ID S4190
Streaming:
Download:
 
GPU Acceleration of Sparse Matrix Factorization in CHOLMOD
Steven Rennich (NVIDIA), Tim Davis (University of Florida)
Sparse direct solvers, and their requisite factorization step, are a critical component of computational engineering and science codes. High performance is typically achieved by reducing the sparse problem to dense sub-problems and applying dense math kernels. However, achieving high performance on a GPU is complicated by the range of sizes of the dense sub-problems, irregular memory access patterns, and the limited communication bandwidth between the host system and the GPU. This talk will describe the high factorization performance achieved in CHOLMOD using the GPU and discuss in detail the key techniques used to achieve this performance, including minimizing communication and maximizing concurrency.
 
Keywords:
Numerical Algorithms & Libraries, Big Data Analytics & Data Algorithms, Computational Structural Mechanics, GTC 2014 - ID S4201
Streaming:
Download:
 
Multifrontal Sparse QR Factorization on the GPU
Tim Davis (University of Florida)
Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge and is up to eleven times faster than a highly optimized method on a multicore CPU. Our method is unique compared with prior methods, since it factorizes many frontal matrices in parallel and keeps all the data transmitted between frontal matrices on the GPU. A novel bucket scheduler algorithm extends the communication-avoiding QR factorization for dense matrices by exploiting more parallelism and by exploiting the staircase form present in the frontal matrices of a sparse multifrontal method. Peak performance is over 80 Gflops on a Fermi Tesla C2070, in double precision. This is joint work with Nuri Yeralan and Sanjay Ranka.
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4204
Streaming:
 
PARALUTION: A Library for Iterative Sparse Methods on Multi-core CPUs and GPUs
Dimitar Lukarski (Uppsala University, Sweden)
Dive deep into sparse iterative solvers on GPUs without touching CUDA, with advanced preconditioning techniques and full portability of your program towards CPUs! Learn how the PARALUTION library is able to handle these features! The library provides various Krylov subspace and algebraic/geometric multigrid solvers, including ILU and approximate-inverse types of preconditioners/smoothers. You will investigate the design of the library in detail, learn about its key techniques for fine-grained parallelism, and additionally take note of the latest performance benchmarks on multi-core CPUs, GPUs and Xeon Phi. Source code examples will be presented to show the ease of use. Finally, the talk will give insight on how to directly integrate PARALUTION into your application using the C++ API or with the supplied plug-ins for FORTRAN, Deal.II, OpenFOAM, Elmer and Agros2D.
 
Keywords:
Numerical Algorithms & Libraries, Computational Fluid Dynamics, Supercomputing, GTC 2014 - ID S4207
Streaming:
Download:
 
Fast N-body Methods as a Compute-Bound Preconditioner for Sparse Solvers on GPUs
Rio Yokota (KAUST)
Learn how to unleash the full power of GPUs on one of the more difficult problems -- preconditioning in sparse solvers -- by using fast N-body methods as a preconditioner. Fast N-body methods have been able to achieve a high percentage of peak performance since the early days of GPU computing. However, their successful applications have been limited to astrophysics and molecular dynamics, where the physics itself is naturally described by a collection of discrete points. Mathematically, there is nothing that prevents the use of fast N-body methods as a solver for a more general class of PDEs. This would not have been a good idea back when flops were expensive, since it essentially turns the sparse matrix into a dense matrix of the same size before hierarchically grouping the off-diagonal blocks. But now that flops are becoming comparatively cheap, the notion of a "compute-bound preconditioner" sounds more attractive than ever. We will demonstrate how competitive such a preconditioner actually is on Kepler.
 
Keywords:
Numerical Algorithms & Libraries, Computational Physics, GTC 2014 - ID S4228
Streaming:
Download:
 
Breaking Computational Barriers: Multi-GPU High-Order RBF Kernel Problems with Millions of Points
Peter Zaspel (University of Bonn)
Join our presentation to get insight into our latest developments on meshless numerical methods with millions of degrees of freedom. So-called radial basis function (RBF) kernel methods allow us to attack numerical problems such as interpolation, quadrature or the solution of partial differential equations with provable high-order convergence. Important applications include knowledge extraction from extreme-size data sets and, e.g., fluid dynamics. However, kernel methods usually involve solving dense linear systems (with O(N^3) complexity), which makes them unattractive for large-scale problems. We are now able to overcome this complexity bottleneck by an appropriately preconditioned iterative approach, achieving O(N^2) or even O(N log N) complexity. The preconditioner fully decouples many small subproblems and thus fits perfectly to multi-GPU and, later on, exascale systems. Overall, the method allows for almost perfect scaling on hundreds of GPUs for RBF kernel problems with millions of unknowns.
 
Keywords:
Numerical Algorithms & Libraries, Computational Physics, Machine Learning & AI, Supercomputing, GTC 2014 - ID S4235
Streaming:
Download:
 
A GPU Sparse Direct Solver for AX=B
Jonathan Hogg (Science and Technology Facilities Council (STFC))
The solution of Ax=b for sparse A is one of the core computational kernels ("dwarves") used in scientific computing. While there are many GPU iterative-methods libraries available, these can only tackle a limited range of problems due to preconditioning requirements. On the CPU, black-box direct solvers are often the first port of call for more challenging problems; however, little GPU support is present in existing libraries. We present a new direct solver library capable of performing the factorization and solve for symmetric problems entirely on the GPU. The talk will cover our solutions to a number of the challenges involved in making this a reality, and present results across a number of application areas including FEM and optimization.
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4243
Streaming:
Download:
 
Efficient Solution of Multiple Scalar and Block-Tridiagonal Equations
Endre Laszlo (University of Oxford, Oxford e-Research Center)
Many numerical methods require the solution of multiple independent tridiagonal systems. This talk will describe optimized methods for solving such systems, considering both the case where the tridiagonal elements are scalars and the case where they are composed of square blocks of dimension D, typically 3-8. For the scalar case, very good performance is achieved using a combination of the Thomas algorithm and parallel cyclic reduction. In the block case, it is shown that good performance can be achieved by using D cooperating threads, all within the same warp. (A Thomas-algorithm sketch follows this entry.)
 
Keywords:
Numerical Algorithms & Libraries, Computational Physics, Finance, GTC 2014 - ID S4289
Streaming:
Download:
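A minimal sketch of the Thomas algorithm applied to a batch of independent scalar systems, one thread per system; the array names, the scratch array cp, and the contiguous per-system layout are illustrative (a coalesced implementation would interleave the systems):

// Each thread solves one tridiagonal system: forward elimination
// followed by back substitution. d holds the right-hand side on entry
// and the solution on exit.
__global__ void thomas_batch(const float* a, const float* b, const float* c,
                             float* d, float* cp, int n, int num_systems) {
    int sys = blockIdx.x * blockDim.x + threadIdx.x;
    if (sys >= num_systems) return;
    int off = sys * n;                 // each system stored contiguously
    cp[off] = c[off] / b[off];
    d[off]  = d[off] / b[off];
    for (int i = 1; i < n; ++i) {      // forward sweep
        float m = 1.0f / (b[off+i] - a[off+i] * cp[off+i-1]);
        cp[off+i] = c[off+i] * m;
        d[off+i]  = (d[off+i] - a[off+i] * d[off+i-1]) * m;
    }
    for (int i = n - 2; i >= 0; --i)   // back substitution
        d[off+i] -= cp[off+i] * d[off+i+1];
}

The session's combination with parallel cyclic reduction trades extra arithmetic for parallelism within a single system, which this per-system baseline does not show.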
 
Fast Solvers for Linear Systems on the GPU
Cornelis Vuik (Delft University of Technology)
Some examples are given of solving large linear systems arising from practical/industrial applications. The methods are based on preconditioned Krylov subspace methods. Most building blocks are easily implemented on the GPU; the most involved operation is the preconditioner. In this talk three variants are discussed: (1) Neumann series, (2) deflation techniques, and (3) recursive red-black ordering. The methods are applied to multi-phase flow and a ship-simulator application and show speedups of a factor of 30-40.
 
Keywords:
Numerical Algorithms & Libraries, Computational Fluid Dynamics, Manufacturing, Supercomputing, GTC 2014 - ID S4299
Streaming:
Download:
 
Performance Impact of Dynamic Parallelism on Clustering Algorithms on GPUs
Michela Taufer (University of Delaware)
Discover and quantify the performance gains of dynamic parallelism for clustering algorithms on GPUs. Dynamic parallelism effectively eliminates superfluous back-and-forth communication between the GPU and CPU through nested kernel computations. The change in performance is measured using two well-known clustering algorithms that exhibit data dependencies: K-means clustering and hierarchical clustering. K-means has a sequential data dependence wherein iterations occur in a linear fashion, while hierarchical clustering has a tree-like dependence that produces split tasks. Analyzing the performance of these data-dependent algorithms gives us a better understanding of the benefits and potential drawbacks of CUDA 5's new dynamic parallelism feature. (A dynamic-parallelism sketch follows this entry.)
 
Keywords:
Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4318
Streaming:
Download:
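A minimal sketch of the nested-kernel pattern dynamic parallelism enables (compute capability 3.5+, compiled with -rdc=true); both kernels and the clustering step they stand in for are hypothetical:

__global__ void update_centroids(float* centroids, const float* points, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // ... one clustering update step over the data (elided) ...
}

// Parent kernel, launched as kmeans_iterations<<<1, 1>>>(...): dependent
// iterations stay on the device instead of returning to the CPU each time.
__global__ void kmeans_iterations(float* centroids, const float* points,
                                  int n, int iters) {
    for (int it = 0; it < iters; ++it) {
        update_centroids<<<(n + 255) / 256, 256>>>(centroids, points, n);
        cudaDeviceSynchronize();   // wait for the child grid before the next step
    }
}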
 
Multi-GPU Iterative Solvers Made Easy Using ArrayFire
Pavan Yalamanchili (ArrayFire)
Learn how to control the location of your data while relieving the burden of managing the communication between GPUs using ArrayFire. ArrayFire is a scientific library that has fast implementations of hundreds of algorithms, including linear algebra (sparse and dense) and numerical methods. ArrayFire's easy-to-use array notation, coupled with fast, out-of-core implementations of commonly used algorithms, helps users easily implement traditional and customized iterative solvers.
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4349
Streaming:
 
GPU Floating Point Accuracy: Theory and Practice
Lars Nyland (NVIDIA), Dale Southard (NVIDIA), Alex Fit-Florea (NVIDIA)
With computational rates in the teraflops, GPUs can accumulate round-off errors at an alarming rate. The errors are no different than those on other IEEE-754-compliant hardware, but GPUs are commonly used for much more intense calculations, so the concern for error is, or should be, significantly increased. In this talk, we'll examine the accumulation of round-off errors in the n-body application from the CUDA SDK, showing how varied the results can be depending on the order of operations. We'll then explore a solution that tracks the accumulated errors, motivated by the methods suggested by Kahan (Kahan summation) and Gustavson, Moreira & Enekel (from their work on stability and accuracy regarding Java portability). The result is a dramatic reduction in round-off error, typically yielding the nearest floating-point value to the infinitely precise answer. Furthermore, we will show that the performance impact of tracking the errors is small, even on numerically intense algorithms such as the n-body algorithm. (A Kahan-summation sketch follows this entry.)
 
Keywords:
Numerical Algorithms & Libraries, Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4370
Streaming:
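A minimal sketch of Kahan (compensated) summation as a device function, representative of the error-tracking technique discussed (the exact scheme presented, following Gustavson et al., may differ); kahan_add and its arguments are hypothetical:

__device__ void kahan_add(float x, float& sum, float& comp) {
    float y = x - comp;     // apply the running compensation to the new term
    float t = sum + y;      // low-order bits of y may be lost in this add...
    comp = (t - sum) - y;   // ...and are recovered here for the next step
    sum = t;                // new running sum
}

Each accumulation costs four flops instead of one, consistent with the talk's point that the overhead is small relative to a numerically intense kernel.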
 
AMR Based on a Space-Filling Curve for Stencil Applications
Takayuki Aoki (Tokyo Institute of Technology)
AMR (Adaptive Mesh Refinement) is an efficient method capable of assigning a mesh with a proper resolution to any local area. It has great advantages from the viewpoint of computational cost and memory usage for practical stencil applications such as computational fluid dynamics. Following the octree data structure, the refinement process is recursive and the computation is carried out on the leaf meshes. By using leaves bigger than those typical on CPUs, we can assign a CUDA block to each leaf with enough threads. We show a GPU implementation in which the leaves are connected by the Hilbert space-filling curve and discuss the overhead of the data management.
 
Keywords:
Numerical Algorithms & Libraries, Computational Fluid Dynamics, Supercomputing, GTC 2014 - ID S4371
Streaming:
 
Raising the Roofline on GPU Applications with Stacked Memory
Lorena Barba (George Washington University)
GPU applications face three potential bottlenecks: instruction throughput, memory throughput and latency. Sometimes we can refactor the algorithm to improve performance after profiling. Another approach is to use the roofline model to analyze computational kernels and identify performance limitations on specific hardware. Such analysis characterizes many important scientific algorithms as memory-bound when running on GPUs. But as we look forward to new generations endowed with stacked DRAM, we see the roof magically lifting due to reduced latencies and higher bandwidths, leading to unprecedented speed-up factors in memory-bound algorithms. With my co-author Manuel Ujaldon, NVIDIA CUDA Fellow and Professor of Computer Architecture at the University of Malaga (Spain), we are looking at how scientific algorithms may benefit from the stacked DRAM of future GPU generations. In this talk, I will present how we characterize GPU application performance via the roofline model and analyze the contribution of stacked DRAM to anticipate its impact in raising performance ceilings in future GPUs like Volta. (The roofline formula is sketched after this entry.)
 
Keywords:
Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4486
Streaming:
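The roofline model referenced here bounds attainable performance by arithmetic intensity; a standard statement (the speakers' exact formulation may differ):

P_{\text{attainable}} = \min\left( P_{\text{peak}},\; B \cdot I \right), \qquad I = \frac{\text{flops executed}}{\text{bytes moved}}

For an illustrative kernel with I = 0.5 flop/byte on a GPU with B = 250 GB/s, the memory roof caps performance at 125 Gflop/s regardless of peak compute, so raising B with stacked DRAM raises that cap proportionally.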
 
Parallelizing a Real-Time 3D Finite Element Algorithm using CUDA: Limitations, Challenges and Opportunities
Vukasin Strbac (KULeuven University, Leuven)
Learn about the challenges of parallelizing a finite element problem using the Total Lagrangian Explicit Dynamic formulation. We examine the algorithm and perform a detailed analysis of the performance-limiting factors of parallelization using CUDA. Potential optimization benefits are elucidated in terms of register usage thresholds and other factors for better performance. Results of a larger usability study on a simple problem are presented, examining the single/double-precision tradeoff on a wide range of GPUs and problem sizes. Discover the impact that real-time FE can bring to the intraoperative surgical setting, with in-the-loop computation facilitating surgical robotics.
 
Keywords:
Numerical Algorithms & Libraries, Computational Physics, Computational Structural Mechanics, GTC 2014 - ID S4497
Streaming:
Download:
 
Explore Computational Power of GPU in Electromagnetics and Micromagnetics
Sidi Fu (UCSD)
This session presents how GPUs are utilized to parallelize computation in realistic, large-scale, practical electromagnetic and micromagnetic simulators. Two important algorithms are discussed: the Non-uniform Fast Fourier Transform (NUFFT) and Sparse Matrix-Vector Multiplication (SpMV). Methods used to overcome the bottlenecks related to communication between threads and irregular data access patterns are presented. We also outline a scheme to distribute the computations between multiple GPUs for further acceleration. We then demonstrate how these GPU-accelerated methods are used in electromagnetic and micromagnetic solvers for modeling the magnetization dynamics in ultra-complex magnetic nanostructured materials and devices. (A CSR SpMV sketch follows this entry.)
 
Keywords:
Numerical Algorithms & Libraries, Computational Physics, GTC 2014 - ID S4522
Streaming:
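A minimal scalar CSR SpMV kernel, shown to make the irregular-access bottleneck concrete; this one-thread-per-row baseline is generic, not the speakers' optimized scheme:

__global__ void spmv_csr(int nrows, const int* row_ptr, const int* col_idx,
                         const float* vals, const float* x, float* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) return;
    float acc = 0.0f;
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        acc += vals[j] * x[col_idx[j]];   // indirect, hard-to-coalesce load of x
    y[row] = acc;
}

Adjacent threads process rows of different lengths and gather x through col_idx, so loads are hard to coalesce; common remedies assign a warp per row or reorder the matrix.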
 
Sparse LU Factorization on GPUs for Accelerating SPICE Simulation
Xiaoming Chen (Tsinghua University)
Simulation Program with Integrated Circuit Emphasis (SPICE) simulators are widely used for transistor-level simulation in IC design and verification. The time cost of SPICE simulators is dominated by two parts: MOSFET model evaluation and the sparse linear solver. This session will talk about our work on GPU-based sparse LU factorization specially designed for SPICE simulation. In particular, we will introduce the challenges of mapping a sparse solver onto a GPU, our parallelization strategies for sparse LU factorization, and performance optimization approaches. Experimental results will be presented and discussed as well.
 
Keywords:
Numerical Algorithms & Libraries, Electronic Design Automation, GTC 2014 - ID S4524
Streaming:
Download:
 
MAGMA: Development of High-Performance Linear Algebra for GPUs
Stan Tomov (University of Tennessee, Knoxville)
In this session you will learn about the newest developments in high-performance numerical linear algebra for heterogeneous GPU-based systems. We will show a number of novel algorithms for solving linear systems and eigenvalue problems. Besides the algorithmic developments, we will present the methodology for their implementation on multi-GPU platforms. Ease of development is achieved through a programming model that allows algorithms to be expressed as sequential code that gets executed in parallel by a run-time system, which schedules the execution over GPUs and multicore CPUs while seamlessly moving data between GPUs and CPUs when needed. The implementations are open source, available through the MAGMA library - a next generation of Sca/LAPACK for heterogeneous architectures. Besides the Sca/LAPACK functionality for dense linear algebra problems, we will present a new MAGMA component that deals with sparse linear algebra problems as well.
 
Keywords:
Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4541
Streaming:
Download:
 
GAMPACK: A Scalable GPU-Accelerated Algebraic Multigrid Package
Yongpeng Zhang (Stone Ridge Technology)
We present our latest development work for GAMPACK, a fully GPU-accelerated Algebraic Multigrid PACKage. GAMPACK is used to solve elliptic PDEs found in various applications including reservoir simulation, CFD and structural mechanics. We compare classical and aggregation-based AMG algorithms on GPUs and demonstrate substantial acceleration of both the setup and solve phases over CPU-only implementations. We discuss how we achieve good scaling for large problems by utilizing all computing resources (including multi-GPU, multi-core CPU and clusters), by overlapping communication and computation and by optimally distributing the workload across available hardware resources. Finally, we describe how accelerated AMG can benefit engineering and scientific applications by significantly reducing the time to solution.
 
Keywords:
Numerical Algorithms & Libraries, Clusters & GPU Management, GTC 2014 - ID S4542
Streaming:
Download:
 
CUB: "Collective" Software Primitives for CUDA Kernel Development
Duane Merrill (NVIDIA)
Learn how to use the CUB library of "collective" SIMT primitives to simplify CUDA kernel development, maintenance, and tuning. Constructing, tuning, and maintaining kernel code is perhaps the most challenging, time-consuming aspect of CUDA programming. CUDA kernel software is where the complexity of parallelism is expressed. Programmers must reason about deadlock, livelock, synchronization, race conditions, shared memory layout, plurality of state, granularity, throughput, latency, memory bottlenecks, etc. However, there are few (if any) other software libraries of reusable kernel primitives; in the CUDA ecosystem, CUB is unique in this regard. CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model: device-wide primitives (sort, prefix scan, reduction, histogram, etc.); block-wide "collective" primitives (I/O, sort, prefix scan, reduction, histogram, etc.); and warp-wide "collective" primitives (prefix scan, reduction, etc.). (A BlockScan sketch follows this entry.)
 
Keywords:
Numerical Algorithms & Libraries, Programming Languages & Compilers, GTC 2014 - ID S4566
Streaming:
Download:
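A minimal use of one block-wide collective, cub::BlockScan; the kernel name and the fixed block size of 128 threads are illustrative:

#include <cub/cub.cuh>

__global__ void block_scan_example(const int* in, int* out) {
    // Specialize the collective for int data and 128 threads per block.
    typedef cub::BlockScan<int, 128> BlockScan;
    __shared__ typename BlockScan::TempStorage temp_storage;

    int idx = blockIdx.x * 128 + threadIdx.x;
    int thread_data = in[idx];
    // Cooperative exclusive prefix sum across the whole block.
    BlockScan(temp_storage).ExclusiveSum(thread_data, thread_data);
    out[idx] = thread_data;
}

The collective hides the shared-memory layout and synchronization the abstract lists among the hard parts of hand-written kernels.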
 
Reasoning About Memory Performance Using Index-Digit Notation
Brandon Lloyd (NVIDIA)
Achieving good memory performance in CUDA for algorithms on arrays with non-trivial access patterns, such as transpose or FFT, requires careful attention to shared memory bank conflicts, global memory coalescing and, on older GPUs, partition camping. Thinking about memory performance issues in the native multi-dimensional problem domain can sometimes be challenging. Index-digit notation provides an abstract representation of memory access patterns that can make reasoning about solutions to memory performance issues easier. In this session, learn how to resolve bank conflicts, coalescing, and partition camping by performing simple transformations in index-digit notation. Applications to transpose and FFT will be discussed. (A padded-tile sketch follows this entry.)
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4586
Streaming:
Download:
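One fix that index-digit reasoning recovers is the classic padded shared-memory transpose tile; this generic sketch assumes a square matrix whose width is a multiple of the tile size:

#define TILE 32
__global__ void transpose(const float* in, float* out, int width) {
    __shared__ float tile[TILE][TILE + 1];   // +1 column of padding
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];   // coalesced read
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;                  // swap block coords
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}

The pad shifts each row by one bank, so reading a tile column touches 32 distinct banks instead of repeatedly hitting one.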
 
Chrono::Flex - A Flexible Multibody Dynamics Framework on the GPU
Daniel Melanz (University of Wisconsin - Madison)
In this work, we investigate the performance gains that the Spike::GPU methodology offers over alternative solutions based on other linear solvers, such as Pardiso. We present results for problems of sizes that are relevant in engineering applications; for example, a net simulation composed of approximately one million beam elements.
 
Keywords:
Numerical Algorithms & Libraries, Combined Simulation & Real-Time Visualization, Computational Physics, Computational Structural Mechanics, GTC 2014 - ID S4593
Streaming:
 
In-Place Array Transposition and Fast Array of Structure Accesses
Bryan Catanzaro (NVIDIA)
We'll present a new algorithm for in-place array transposition. The algorithm is useful for in-place transposes of large matrices, as well as in-place conversions between Arrays of Structures and Structures of Arrays. The simple structure of this algorithm enables full memory bandwidth accesses to Arrays of Structures. We'll discuss the algorithm, as well as several implementations on GPUs and CPUs.
 
Keywords:
Numerical Algorithms & Libraries, GTC 2014 - ID S4664
Streaming:
Download:
 
General Transformations for GPU Execution of Tree Traversals
Milind Kulkarni (Purdue University)
We present general-purpose techniques for implementing irregular algorithms on GPUs that exploit similarities in algorithmic structure rather than application-specific knowledge. We demonstrate these techniques on several tree traversal algorithms, achieving speedups of up to 38x over 32-thread CPU versions.
 
Keywords:
Numerical Algorithms & Libraries, Performance Optimization, Programming Languages & Compilers, GTC 2014 - ID S4668
Streaming:
Download:
Performance Optimization
Presentation
Media
CUDA Optimization with NVIDIA Nsight(TM) Visual Studio Edition: A Case Study
Julien Demouth (NVIDIA)
In this session, we will study a real CUDA application and use Nsight(TM) Visual Studio Edition on Windows to optimize the performance of the code. Attendees will learn a method for analyzing their code and how to use the tools to apply those ideas.
 
Keywords:
Performance Optimization, GTC 2014 - ID S4160
Streaming:
Download:
 
CUDA Optimization with NVIDIA(R) Nsight(TM) Eclipse Edition: A Case Study
Julien Demouth (NVIDIA), Cliff Woolley (NVIDIA)
In this session, we will study a real CUDA application and use NVIDIA(R) Nsight(TM) Eclipse Edition on Linux to optimize the performance of the code. Attendees will learn a method for analyzing their code and how to use the tools to apply those ideas.
 
Keywords:
Performance Optimization, GTC 2014 - ID S4165
Streaming:
Download:
 
Part 4: Essential CUDA Optimization Techniques (Presented by Acceleware)
Dan Cyca (Acceleware Ltd.)
Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the improved performance analysis tools available in CUDA 6.0 and key optimization strategies for compute-, latency- and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms we will discuss how to improve branching efficiency, use intrinsic functions, and unroll loops. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented, highlighting the differences between the Fermi and Kepler architectures. This session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures. (A coalescing sketch follows this entry.)
 
Keywords:
Performance Optimization, GTC 2014 - ID S4702
Streaming:
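A generic illustration of the access-pattern point, contrasting coalesced and strided global loads; kernel names are illustrative:

__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];   // adjacent threads touch adjacent addresses
}

__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];   // wastes most of each memory transaction
}

On both Fermi and Kepler, the strided version's effective bandwidth drops roughly in proportion to the stride, up to the transaction size.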
 
Session 3: Advanced CUDA Optimizations (Presented by ArrayFire)
Umar Arshad (ArrayFire)
In this session, we will examine instruction-level parallelism (ILP) and Kepler-specific optimizations, including shuffle instructions and dynamic parallelism. We will also equip you with knowledge of important profiling and debugging tools to improve GPU utilization and kernel performance. (A shuffle-reduction sketch follows this entry.)
 
Keywords:
Performance Optimization, Debugging Tools & Techniques, Numerical Algorithms & Libraries, Programming Languages & Compilers, GTC 2014 - ID S4712
Streaming:
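A minimal warp reduction built on shuffle instructions; the CUDA 6-era API discussed at the session would use __shfl_down (without _sync), while the modern synchronized form is shown here:

__device__ float warp_reduce_sum(float val) {
    // Each step folds the upper half of the active lanes onto the lower half.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;   // lane 0 ends up holding the sum of all 32 lanes
}

Because the data moves through registers, no shared memory or __syncthreads is needed within the warp.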
 
Fast and Precise: GPU Techniques for High Quality 2D CAD Graphics
Ravi Krishnaswamy (Autodesk Inc.)
Learn GPU techniques to render high-quality CAD geometry typical of 2D CAD using analytic definitions of geometry on the GPU. 2D CAD documents are increasingly relevant for GPU solutions, with increased consumption of 2D CAD documents in electronic form on a variety of devices with GPU support. The goal is to show specific implementations for a range of DX/OpenGL versions and shader models that provide progressive benefits. The key areas discussed will be: (1) GPU algorithms for analytic representation; (2) addressing 1D stylization (linetypes) through 2D textures; and (3) optimizing memory use through instancing and instance management.
 
Keywords:
Performance Optimization, Automotive, Computer Aided Design, Real-Time Graphics Applications, GTC 2014 - ID S4210
Streaming:
Download:
 
RDMA GPU Direct for the Fusion-io ioDrive
Robert Wipfel (Fusion-io)
Learn how to eliminate I/O bottlenecks by integrating Fusion-io's ioDrive flash storage into your GPU applications. The first part of this session is a technical overview of Fusion-io's PCIe-attached ioDrive. The second part presents developer best practices and tuning for GPU applications using ioDrive-based storage. Topics will cover threading, pipelining, and data path acceleration via RDMA GPU Direct. Demos and example code showing integration between RDMA GPU Direct and Fusion-io's ioDrive will be given.
 
Keywords:
Performance Optimization, Big Data Analytics & Data Algorithms, Finance, GTC 2014 - ID S4265
Streaming:
Download:
 
Resolving False Dependence on Shared Memory
Patric Zhao (NVIDIA)
The large shared memory provided by the GPU can hugely improve application performance, and the shared memory programming model has been widely used for commercial and scientific purposes. However, many barriers arise when shared memory is employed immoderately, causing most of the running time to be wasted on synchronization. Furthermore, false-dependence issues occur in some cases and can dramatically depress performance. In this session, we demonstrate how to identify false-dependence issues. We also propose various strategies and solutions to deal with false dependence at both the application-algorithm and GPU-kernel level. A performance analysis of NAMD, a very popular molecular dynamics program, has been done, and a code example is provided. By applying our strategies, the effective occupancy is improved to 0.98 and the synchronization time is reduced by 70%, which finally brings about a 30% performance increase.
 
Keywords:
Performance Optimization, Molecular Dynamics, GTC 2014 - ID S4279
Streaming:
Download:
 
How to Combine OpenMP, Streams, and ArrayFire for Maximum Multi-GPU Throughput
Shehzan Mohammed (ArrayFire)
You've finished tuning your algorithm on a single GPU, and it's time to integrate it into your multi-threaded host code. What's next? This session will explore how to combine CUDA streams and contexts with OpenMP threads to maximize throughput. It will also cover how this mapping works for out-of-core problems to keep your GPUs fed with data. The session will cover these techniques in CUDA and the ArrayFire library for productive GPU computing, using examples from the analysis of large-scale financial data and a structure-from-motion algorithm from computer vision. Attendees will leave with an excellent understanding of how to handle out-of-core data; the ability to program using CUDA streams and to integrate these with ArrayFire; and knowledge of three techniques for mapping OpenMP threads to CUDA devices and when to use each technique. (A thread-per-GPU sketch follows this entry.)
 
Keywords:
Performance Optimization, GTC 2014 - ID S4386
Streaming:
Download:
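A minimal sketch of one such mapping, one OpenMP thread per device, each driving its own stream; run_multi_gpu and the elided per-GPU work are hypothetical:

#include <omp.h>
#include <cuda_runtime.h>

void run_multi_gpu(int ngpus) {
    #pragma omp parallel num_threads(ngpus)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);               // bind this thread to one GPU
        cudaStream_t stream;
        cudaStreamCreate(&stream);
        // ... enqueue async copies and kernels on `stream` for this GPU ...
        cudaStreamSynchronize(stream);
        cudaStreamDestroy(stream);
    }
}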
 
SPECACCEL: The New Standard for Accelerator Performance Benchmarking
Mathew Colgrove (NVIDIA), Robert Henschel (Indiana University)
The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation that produces, maintains and publishes results of standardized performance benchmarks for high-performance computers. SPEC benchmark suites produced by the SPEC High-Performance Group (HPG) are generally comprised of applications focused on scientific and technical computing, coded using standard parallel programming interfaces. SPECACCEL is the latest benchmark suite from SPEC HPG, designed to objectively compare the performance of accelerator hardware systems, accelerator programming models and accelerator-enabled compilers. This talk will give an overview of the SPECACCEL suite, the benchmark run rules and processes for reporting results, and some sample performance results. Finally, we'll take an in-depth look at a few of the benchmarks to see what they can reveal about the performance characteristics of various accelerators.
 
Keywords:
Performance Optimization, Supercomputing, GTC 2014 - ID S4437
Streaming:
Download:
 
Performance Analysis and Optimization of OpenACC Applications
Michael Wolfe (NVIDIA)
Learn how to use performance analysis tools to find the bottlenecks in your OpenACC applications. With the proper performance information, and the feedback from the compiler, you can tune your application and improve overall performance. Live demonstrations will use PGI's pgprof, NVIDIA's Visual Profiler and command-line nvprof, and additional tools available to the parallel computing community.
 
Keywords:
Performance Optimization, GTC 2014 - ID S4472
Streaming:
Download:
 
An Elegantly Simple Design Pattern for Building Multi-GPU Applications
Bob Zigon (Beckman Coulter)
GPU-based applications can be architected in different ways. The simplest approach has a client application tightly coupled to a single GPU. A second approach has a client application tightly coupled to multiple GPUs by way of operating-system threads and GPU contexts. Finally, in scientific computing, a common pattern is to use MPI, multiple Intel cores and multiple GPUs that work cooperatively to solve a fixed problem. This session will describe a design pattern that loosely couples a client application to a collection of GPUs by way of a public-domain "data structure server" called Redis. The approach works well for both fat-client and thin-client applications. The compelling aspects of the approach are (1) the ease of debugging and (2) the ease with which multiple GPUs can be added to handle increased user load.
 
Keywords:
Performance Optimization, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4572
Streaming:
Download:
 
CUDA Profiling Tools
Sandarbh Jain (NVIDIA)
The NVIDIA Visual Profiler, nvvp, and the command-line profiler, nvprof, are powerful profiling tools that you can use to maximize your CUDA application's performance. The NVIDIA Visual Profiler helps you understand your application's behavior with a detailed timeline and data from GPU performance counters. This session will provide an overview of the new GPU profiling features that can help you better tune your CUDA application.
 
Keywords:
Performance Optimization, GTC 2014 - ID S4587
Streaming:
Download:
 
Configuring Workstation Performance to the Max! (Presented by Dell)
Scott Hamilton (Dell)
Engineers, media and entertainment professionals, and scientists perform mission-critical roles in their organizations, so it is essential to provide them with the best tools. Workstations are the preferred systems of choice to deliver performance for the compute- and graphics-intensive applications required by these users, due to their superior architecture and available components. The question becomes which combination of components and tools will provide the greatest performance for your application. Dell will discuss the various considerations for configuring and tuning a workstation to get the best overall performance and reliability.
 
Keywords:
Performance Optimization, GTC 2014 - ID S4869
Streaming:
Download:
Programming Languages & Compilers
Presentation
Media
Introduction to Accelerated Computing Using Directives
Jeff Larkin (NVIDIA)
OpenACC and OpenMP 4.0 provide directives-based approaches to rapidly accelerating applications for GPUs and other parallel architectures. This tutorial serves as an introduction to programming with OpenACC 2.0 and OpenMP 4.0. Participants will learn how to apply compiler directives to an existing application to parallelize it for accelerated architectures. No prior GPU experience is required for this tutorial.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4167
Streaming:
Download:
 
Advanced Accelerated Computing Using Directives
Jeff Larkin (NVIDIA)
This tutorial will expand upon participants' experience with accelerator directives (OpenACC and OpenMP) by focusing on performance optimization and interoperability with other programming models. Participants will learn about the multiple levels of parallelism that can be expressed in OpenACC and OpenMP and how to apply them to their application code. They will also learn how asynchronous execution improves application performance. Finally, they will learn how compiler directives interoperate with other accelerated computing technologies such as CUDA C, CUDA Fortran, and libraries.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4200
Streaming:
Download:
 
NumbaPro: High-Level GPU Programming in Python for Rapid Development
Siu Kwan Lam (Continuum Analytics, Inc), Travis Oliphant (Continuum Analytics, Inc)
Learn about high-level GPU programming in NumbaPro to reduce development time and produce high-performance data-parallel code with the ease of Python. This tutorial is for beginning to intermediate CUDA programmers who already know Python. The audience will learn about (1) high-level Python decorators that turn simple Python functions into data-parallel GPU kernels without any knowledge of the CUDA architecture; (2) CUDA library bindings that can be used as a drop-in to speed up existing applications; and (3) how to reuse existing CUDA C/C++ code in Python with JIT linking.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4413
Streaming:
Download:
 
Panel on Compiler Directives for Accelerated Computing
Jeff Larkin (NVIDIA), James Beyer (Cray Inc.), Fernanda Foertter (Oak Ridge National Laboratory), Nathan Sidwell (Mentor Graphics)
Representatives from multiple organizations will discuss the current state and future directions for accelerated computing with compiler directives (OpenACC and OpenMP). Topics will include the status of OpenACC and OpenMP, commercial and freely available compilers, and user experiences.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4514
Streaming:
 
Part 1: An Introduction to CUDA Programming (Presented by Acceleware)
Chris Mason (Acceleware Ltd.)
Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be provided.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4699
Streaming:
 
Part 2: GPU Architecture & The CUDA Memory Model (Presented by Acceleware)
Chris Mason (Acceleware Ltd.)
Explore the memory model of the GPU! The session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features available in the Kepler architecture such as the shuffle instruction, shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered.
 
Keywords:
Programming Languages & Compilers, Performance Optimization, GTC 2014 - ID S4700
Streaming:
 
Part 3: Asynchronous Operations & Dynamic Parallelism in CUDA (Presented by Acceleware)
Dan Cyca (Acceleware Ltd.)
This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations and dynamic parallelism will be included.
 
Keywords:
Programming Languages & Compilers, Performance Optimization, GTC 2014 - ID S4701
Streaming:
 
Session 1: Introduction to Productive GPU Programming (Presented by ArrayFire)
Umar Arshad (ArrayFire)
Excited to get started with GPU computing? Learn about the best libraries and tools to quickly get started with GPUs. We will introduce you to the cutting-edge libraries that exist in the CUDA ecosystem and how to use them efficiently. You will walk away knowing the right tools and libraries to accelerate your applications with increased productivity. Some of the libraries discussed will include cuBLAS, cuFFT, ArrayFire and Thrust.
 
Keywords:
Programming Languages & Compilers, Debugging Tools & Techniques, Performance Optimization, Numerical Algorithms & Libraries, GTC 2014 - ID S4710
Streaming:
 
Languages, Libraries and Development Tools for GPU Computing
Will Ramey (NVIDIA)
Get a head start on the conference with this introduction to key technologies for GPU computing. This tutorial will cover the key features of major programming language solutions, libraries and development tools for GPU computing that are available today. You will also learn which sessions to attend to learn more about each of the topics covered.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4874
Streaming:
Download:
 
Portability and Performance: A Functional Language for Stencil Operations
Gerhard Zumbusch (Friedrich-Schiller Universitat Jena)
A new programming language designed for stencil operations in explicit finite-difference and image-processing applications is introduced. Learn to use a small, domain-specific functional language that allows for a short and portable way to express numerical schemes. Objects are immutable functions without storage type and side effects. The results are independent of the order of instructions and of decisions to redundantly re-compute partial results. The scheduling of instructions, the storage layout, the partition into GPU kernels and the memory management are all left to the compiler. Learn about the parallel patterns used by the compiler to create high-performance implementations of a numerical scheme for a specific problem size and hardware configuration. These include data layout for effective vectorization, strategies to re-compute or cache intermediate results, sliding-window and space-time tiling of the iteration space, and list scheduling to create code blocks for off-loading; these strategies are useful in general.
 
Keywords:
Programming Languages & Compilers, Numerical Algorithms & Libraries, GTC 2014 - ID S4155
Streaming:
Download:
 
CUDA Streams: Best Practices and Common Pitfalls
Justin Luitjens (NVIDIA)
Using streams in CUDA is a fundamental optimization that many programmers overlook. This tutorial will teach you how to use streams in your application and cover the many mistakes people make when using them. After attending this talk you will be equipped with the necessary knowledge to use streams within your application. This talk is appropriate for all skill levels, whether you have never heard of streams or use them regularly. (A streams-overlap sketch follows this entry.)
 
Keywords:
Programming Languages & Compilers, Performance Optimization, GTC 2014 - ID S4158
Streaming:
Download:
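A minimal two-stream copy/compute overlap, illustrating both the pattern and a classic pitfall the talk covers (pageable host memory silently serializes transfers); the process kernel, pipeline function, and sizes are hypothetical:

#include <cuda_runtime.h>

__global__ void process(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // placeholder work
}

// h_buf must be pinned (allocated with cudaMallocHost) for the async
// copies to overlap with kernel execution.
void pipeline(float* h_buf, int n) {
    cudaStream_t s[2];
    float* d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));
    int half = n / 2;
    for (int i = 0; i < 2; ++i) {
        cudaStreamCreate(&s[i]);
        cudaMemcpyAsync(d_buf + i * half, h_buf + i * half,
                        half * sizeof(float), cudaMemcpyHostToDevice, s[i]);
        process<<<(half + 255) / 256, 256, 0, s[i]>>>(d_buf + i * half, half);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < 2; ++i) cudaStreamDestroy(s[i]);
    cudaFree(d_buf);
}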
 
Accelerated JavaScript: How to Access the GPU Without Leaving the Comfort of JavaScript
Norman Rubin (NVIDIA)
JavaScript is often considered a low-performance scripting language, but lately more and more developers are recognizing that the portability of JavaScript makes it an ideal target for web applications. This talk will describe a research prototype that allows JavaScript to access accelerators using natural JavaScript idioms, with no GPU knowledge required. Even on lower-performance GPUs, the implementation delivers substantial performance for many applications. The talk will include a demo.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4168
Streaming:
Download:
 
How to Design a Language Integrated CUDA Compiler with LLVM
Xiang Zhang (QuantAlea), Aaron Brewbaker (QuantAlea)
We introduce the concept of a language-integrated compiler for CUDA and explain how it can be implemented with the new LLVM tool chain of CUDA. To set the stage we give an introduction to LLVM and NVVM in CUDA 5 and illustrate the differences from OpenCL. We then give an overview of the design of a language-integrated compiler in .NET developed with F#. We explain how CUDA resources are defined, how GPU algorithms are programmed in a kernel DSL, and how the runtime driver is implemented to compile, link and execute kernel code dynamically at runtime. The last part of the talk considers the pcalc framework, a higher-level abstraction on top of CUDA that simplifies the composition and reuse of GPU algorithms.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4174
Streaming:
Download:
 
Killer-App Fundamentals: Massively-Parallel Data Structures, Performance to 13 PF/s, Portability, Transparency, and More
Rob Farber (BlackDog Endeavors, LLC)
Discover killer-app fundamentals, including how to tame dynamic parallelism with a robust-performance parallel stack that allows both host- and device-side fast memory allocation and transparent data transfer of arbitrarily complex data structures and general C++ classes. A low-wait approach (related to wait-free methods) is used to create a performance-robust parallel counter. You definitely want to use this counter for histograms! New results extending machine learning and big data analysis to 13 PF/s average sustained performance using 16,384 GPUs in the ORNL Titan supercomputer will be presented. General programming approaches for graph algorithms and identifying 100x speedups in algorithms like Kriging interpolation will be discussed. Both portability to -- and performance comparisons against -- other architectures such as Intel Xeon Phi will also be covered. Specific examples of this technology for social media analysis and brain research will be highlighted.
 
Keywords:
Programming Languages & Compilers, Machine Learning & AI, Supercomputing, GTC 2014 - ID S4178
Streaming:
Download:
 
Shared Memory Multiplexing: An Efficient Way to Utilize Shared Memory
Huiyang Zhou (North Carolina State University), Yi Yang (NEC Laboratories America, Inc.)
Do you ever wish for larger shared memory? The on-chip shared memory is a valuable resource. However, using it extensively may limit the number of threads that can run concurrently. In this session, we will introduce a new way to make effective use of shared memory. The key idea is to time-multiplex this precious resource, which can be achieved with either pure-software code transformations or hardware-assisted schemes.
 
Keywords:
Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4202
Streaming:
Download:
 
Dandelion: A Unified Programming Model for GPU Clusters
Jon Currey (Microsoft Research), Chris Rossbach (Microsoft Research)
Dandelion takes applications written in a managed language using familiar LINQ data-manipulation constructs and automatically executes them on multi-core CPUs and GPUs, either single-machine or distributed across a cluster. The system generates the necessary GPU code and nested dataflow graphs to coordinate execution across a cluster and the CPUs and GPUs within each machine. Dandelion is built on top of PTask, a runtime for dataflow-based execution on GPUs. We describe the system in detail and its performance gains across a variety of applications, and then dig into the issues encountered as we built and used the system, and the enhancements to both Dandelion and PTask that this prompted.
 
Keywords:
Programming Languages & Compilers, Clusters & GPU Management, Machine Learning & AI, GTC 2014 - ID S4221
Streaming:
Download:
 
Array-Oriented Python on the GPU with Parakeet
Alexander Rubinsteyn (NYU)
Python is quickly becoming the "glue" language of choice for scientific and numerical computing. For performance-critical algorithms, however, programmers still have to offload computations into compiled code. Parakeet is a runtime compiler for a numerical subset of Python which lifts this productivity burden. Parakeet intercepts calls into array-oriented Python functions and transparently compiles them into CUDA implementations. Come learn how to use Parakeet to gain orders of magnitude performance improvements over Python/NumPy programs.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4226
Streaming:
 
Halloc: A High-Throughput Dynamic Memory Allocator for GPGPU Architectures
Andrew Adinetz (Julich Supercomputing Centre, Forschungszentrum Julich)
Dynamic memory management is something that is taken for granted in almost all modern CPU runtime environments. However, on GPUs it became available only recently, and has so far seen very limited adoption due to low speed and lack of scalability. We present Halloc, a malloc/free-style dynamic memory allocator for GPUs. It is built around the idea of using a hash function-based procedure to search chunk bit arrays for free blocks. We describe the algorithms used to manage slabs, the large regions of memory from which smaller blocks are allocated. An evaluation of the allocator shows that it is scalable to tens of thousands of threads and working sets of hundreds of MiB, can perform up to 1.7 billion allocations/s, demonstrates stable performance in a variety of scenarios, and performs 2-100x better than state-of-the-art malloc/free-style GPU memory allocators.
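As a point of reference, CUDA's built-in device-side malloc/free (available since the Fermi generation) already supports this usage pattern; Halloc targets the same malloc/free style at much higher throughput. A minimal sketch of the pattern, not of Halloc's internals:

    // Each thread allocates, uses, and frees scratch space on the device heap.
    __global__ void scratch_kernel(int n)
    {
        int* scratch = (int*)malloc(n * sizeof(int));
        if (scratch == nullptr) return;        // device allocation can fail
        for (int i = 0; i < n; ++i) scratch[i] = i;
        free(scratch);
    }

    int main()
    {
        // The device heap has a fixed size; enlarge it before launching.
        cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 << 20);   // 64 MiB
        scratch_kernel<<<64, 128>>>(16);
        cudaDeviceSynchronize();
        return 0;
    }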
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4271
Streaming:
Download:
 
OpenACC Parallelization and Optimization of NAS Parallel Benchmarks
Rengan Xu (University of Houston)
The goal of this session is to present how to program large applications and apply efficient optimization techniques using OpenACC. We will present our experiences in porting the NAS parallel benchmarks, which include five kernels and three pseudo-applications (EP, FT, IS, CG, MG, LU, SP, and BT), to OpenACC. These benchmarks exhibit different program characteristics, such as being compute bound or memory bound and having irregular memory access. We will present our evaluation results compared with the serial and OpenMP versions. Advantages and limitations of OpenACC will also be discussed.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4340
Streaming:
Download:
 
OpenUH: An Open Source OpenACC Compiler
Xiaonan Tian (University of Houston)
The goal of this session is to present the research and development experiences of creating an open-source compiler implementation of the OpenACC API. We will discuss the multiple loop mapping techniques available in our OpenACC compiler that offer an efficient distribution of parallel loops to the threading architecture of GPGPUs, and present our findings as guidance to users on how to choose the most suitable loop mapping technique for the application being considered.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4343
Streaming:
Download:
 
C# with CUDAfy: Image Stitching Concepts
John Hauck (LECO Corporation)
Learn to add GPU calculations to your C# application. The example presented restores an image that has been divided into tiles, where the tiles have been randomly rearranged. The same source code is used to execute the algorithm on both the CPU and the GPU.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4375
Streaming:
Download:
 
High-Performance Domain-Specific Languages for GPU Computing
Marcel Koster (Saarland University)
In this talk we present AnyDSL - a compiler framework for domain-specific languages (DSLs). The framework helps with defining concise and compact languages with high-level abstractions. At the same time, AnyDSL completely removes the overhead of these abstractions. Via its LLVM back end, AnyDSL supports a wide range of architectures including PTX to target NVIDIA GPUs.
 
Keywords:
Programming Languages & Compilers, Video & Image Processing, GTC 2014 - ID S4378
Streaming:
Download:
 
GPU Computing with MATLAB
Andy The (MathWorks)
Learn how to use NVIDIA GPUs to accelerate computationally intensive MATLAB applications in areas such as image processing, signal processing, and computational finance. We will use an image processing example to demonstrate how you can speed up your MATLAB code by using built-in GPU-enabled functionality or by replacing key computations with CUDA kernels. We will also illustrate how MATLAB can be used as a development environment and test framework for CUDA kernel evaluation, visualization, and validation.
 
Keywords:
Programming Languages & Compilers, Medical Imaging & Visualization, Video & Image Processing, GTC 2014 - ID S4421
Streaming:
Download:
 
Enabling Efficient Many-Task Computing on GPGPUs
Scott Krieder (Illinois Institute of Technology)
Current software and hardware limitations prevent Many-Task Computing (MTC) workloads from leveraging hardware accelerators boasting many-core computing architectures. Some broad application classes that fit the MTC paradigm are workflows, MapReduce, high-throughput computing, and a subset of high-performance computing. MTC emphasizes using many computing resources over short periods of time to accomplish many computational tasks (including both dependent and independent tasks), where the primary metrics have been measured in seconds; this work aims to reduce this granularity to milliseconds. Learn how to enable efficient Many-Task Computing through the use of a CUDA-based framework that (1) features a daemon kernel on the device managing compute elements, and (2) enables efficient dynamic memory management through a sub-allocator.
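The daemon-kernel idea follows the well-known persistent-threads pattern; here is a minimal CUDA sketch under stated assumptions (the task body and array layout are hypothetical, and the framework's actual scheduler and sub-allocator are far more elaborate):

    __device__ int next_task;   // reset to 0 (e.g. via cudaMemcpyToSymbol) before launch

    // Blocks stay resident and repeatedly pull task indices from a global
    // atomic counter until the queue is drained.
    __global__ void daemon_kernel(const float* in, float* out, int num_tasks)
    {
        __shared__ int t;
        while (true) {
            if (threadIdx.x == 0)
                t = atomicAdd(&next_task, 1);   // grab the next task index
            __syncthreads();
            if (t >= num_tasks) return;         // uniform exit: queue empty

            // Hypothetical task body: arrays hold num_tasks * blockDim.x floats.
            int i = t * blockDim.x + threadIdx.x;
            out[i] = 2.0f * in[i];
            __syncthreads();                    // finish before t is reused
        }
    }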
 
Keywords:
Programming Languages & Compilers, Clusters & GPU Management, Supercomputing, GTC 2014 - ID S4429
Streaming:
 
What's new in OpenACC 2.0 and OpenMP 4.0
Jeff Larkin (NVIDIA)
In 2013, both OpenACC and OpenMP released significant updates to their respective standards to better support GPUs. This talk will discuss what's new in each standard and how these features simplify GPU programming.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4438
Streaming:
Download:
 
Cross-Platform Performance Portability Using OpenACC
Michael Wolfe (The Portland Group)
OpenACC is designed to support performance portable parallel programming across a wide variety of heterogeneous and parallel node configurations. Learn what that means and how it affects the programs you write today and in the future. Examples will include NVIDIA Kepler and AMD Radeon targets.
 
Keywords:
Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4468
Streaming:
Download:
 
Scaling OpenACC Across Multiple GPUs
Michael Wolfe (NVIDIA)
Learn how to scale your OpenACC application across multiple GPUs. This example-based presentation will cover three methods of using multiple GPUs. First, you can use MPI with OpenACC to program a different GPU from each MPI process. You can even share data on the GPU across the MPI processes when you have multiple MPI processes on a single node. Second, you can use OpenMP with OpenACC, assigning a different GPU to each OpenMP thread. If you have more CPU threads than GPUs, you can share some GPUs across multiple threads. Third, even a single thread or process can distribute data and computation across multiple GPUs. By dynamically selecting the device, you can easily split or replicate data across multiple devices.
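A minimal sketch of the second method, assuming one GPU per OpenMP thread and an evenly divisible array; the runtime calls are from the standard OpenACC API, and error handling is omitted:

    #include <openacc.h>
    #include <omp.h>

    void scale(float* x, int n)
    {
        int ngpus = acc_get_num_devices(acc_device_nvidia);
        #pragma omp parallel num_threads(ngpus)
        {
            int dev = omp_get_thread_num();
            acc_set_device_num(dev, acc_device_nvidia);   // bind this thread to a GPU
            int chunk = n / ngpus;                        // assumes n % ngpus == 0
            float* part = x + dev * chunk;
            #pragma acc parallel loop copy(part[0:chunk])
            for (int i = 0; i < chunk; ++i)
                part[i] *= 2.0f;
        }
    }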
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4474
Streaming:
Download:
 
Enabling Efficient Use of UPC and OpenSHMEM PGAS Models on GPU Clusters
Dhabaleswar K. (DK) Panda (The Ohio State University)
Learn about extensions that enable efficient use of Partitioned Global Address Space (PGAS) models like OpenSHMEM and UPC on supercomputing clusters with NVIDIA GPUs. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop applications with dynamic communication patterns. However, the existing UPC and OpenSHMEM standards do not allow communication calls to be made directly on GPU device memory. Data has to be moved to the CPU before PGAS models can be used for communication. This talk discusses simple extensions to the OpenSHMEM and UPC models that address this issue. They allow direct communication from GPU memory and enable runtimes to optimize data movement using features like CUDA IPC and GPUDirect RDMA, in a way that is transparent to the application developer. We present designs which focus on performance and truly one-sided communication. We use application kernels to demonstrate the use of the extensions and the performance impact of the runtime designs on clusters with GPUs.
 
Keywords:
Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4528
Streaming:
Download:
 
Exploring New Optimizations for Hybrid Programming Using OpenSHMEM and OpenACC
Oscar Hernandez (Oak Ridge National Laboratory)
With the new accelerator-based systems, heterogeneous hybrid programming models are the natural choice to exploit the available hardware. On accelerator-based systems, previous efforts looking into hybrid models have primarily focused on using MPI (for inter-node programming on a cluster) in association with OpenACC/CUDA/OpenCL/HMPP (for intra-node programming on the accelerator). As accelerators get added into the mix, and as hardware support for PGAS languages/APIs improves, new and unexplored heterogeneous hybrid models will be needed to effectively leverage the new hardware. In this session we explore the use of OpenACC directives to program GPUs and the use of OpenSHMEM, a PGAS library, for one-sided communication between nodes. We will also discuss how these two specifications interoperate and what new features are needed in the specifications to make this hybrid programming model work better.
 
Keywords:
Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4576
Streaming:
Download:
 
OpenACC vs. OpenMP4: The Strong, the Weak, the Missing to Develop Performance Portable Applications on GPU and Xeon Phi
James Lin (Shanghai Jiao Tong University (SJTU))
Learn to develop a single code base to target both NVIDIA GPUs and Intel Xeon Phi by using a directive-based programming approach (OpenACC and OpenMP4). We carried out early experiments on Pi, the GPU-Phi supercomputer of the SJTU CCOE, with the CAPS OpenACC compiler and HOMP, the OpenMP4-to-CUDA compiler based on the ROSE compiler from LLNL. In this session we will show preliminary results of the evaluation with benchmarks and mini-apps, then discuss the different optimization methods applied, and finally identify the strong, the weak, and the missing features for OpenACC and OpenMP4 to achieve good performance portability on very different architectures.
 
Keywords:
Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4595
Streaming:
Download:
 
Generating Optimized CUDA Code from Parallel Patterns
HyoukJoong Lee (Stanford University)
Using high-level languages for GPU programming improves programmer productivity, but the compiler must apply GPU-specific optimizations to match the performance of manually optimized kernels. In this talk, we explore building a compiler based on structured parallel patterns to generate efficient code for GPUs. In particular, we describe techniques for mapping nested parallel patterns onto the GPU and for using shared memory. We compare the performance of our compiler with manually written kernels and show the impact of the optimizations applied.
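As a rough illustration of the mapping problem (not the compiler's actual output), a nested pattern such as map(rows){ reduce(+, row) } is typically lowered to one thread block per outer element with a shared-memory tree reduction inside:

    // One block per row; blockDim.x must be a power of two.
    __global__ void row_sums(const float* m, float* sums, int cols)
    {
        extern __shared__ float s[];
        const float* row = m + blockIdx.x * cols;

        float acc = 0.0f;
        for (int j = threadIdx.x; j < cols; j += blockDim.x)
            acc += row[j];                       // inner reduce, strided reads
        s[threadIdx.x] = acc;
        __syncthreads();

        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) sums[blockIdx.x] = s[0];
    }
    // Launch: row_sums<<<rows, 256, 256 * sizeof(float)>>>(m, sums, cols);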
 
Keywords:
Programming Languages & Compilers, Large Scale Data Analytics, Defense, GTC 2014 - ID S4602
Streaming:
 
Inside Thrust: Building Parallel Algorithms with Bulk
Jared Hoberock (NVIDIA)
Learn how to build high performance and robust CUDA kernels with Bulk, the CUDA C++ library powering Thrust's high performance algorithm implementations. Learn how virtual shared memory, cooperative algorithms, and bulk-synchronous task launch make CUDA programming easier, more productive, and fun.
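From the user's side, the Bulk-backed algorithms are reached through Thrust's ordinary interface; a small example of the style of code whose kernels Bulk implements underneath:

    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <cstdio>

    struct square { __host__ __device__ float operator()(float x) const { return x * x; } };

    int main()
    {
        thrust::device_vector<float> v(1 << 20, 2.0f);
        // One call hides the kernel launches, decomposition, and synchronization.
        float sum_sq = thrust::transform_reduce(v.begin(), v.end(),
                                                square(), 0.0f, thrust::plus<float>());
        printf("sum of squares = %f\n", sum_sq);
        return 0;
    }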
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4673
Streaming:
Download:
 
Comparing OpenMP 4.0 Device Constructs to OpenACC 2.0
James Beyer (Cray Inc)
The talk will briefly introduce two accelerator programming directive sets with a common heritage, OpenACC 2.0 and OpenMP 4.0. After introducing the two directive sets, a side-by-side comparison of available features along with code examples will be presented to help developers understand their options as they begin programming with these two models, both of which are becoming available in production compilers.
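To give a flavor of such a comparison, here is the same SAXPY loop offloaded with each directive set (a simplified sketch with clause spellings per the respective 2.0 and 4.0 specifications, not material from the talk):

    void saxpy_acc(int n, float a, const float* x, float* y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
    }

    void saxpy_omp(int n, float a, const float* x, float* y)
    {
        #pragma omp target teams distribute parallel for \
                map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
    }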
 
Keywords:
Programming Languages & Compilers, Performance Optimization, Supercomputing, GTC 2014 - ID S4708
Streaming:
Download:
 
CUDA 6 and Beyond
Mark Harris (NVIDIA)
CUDA is NVIDIA's parallel computing platform and programming model. CUDA 6 dramatically increases developer productivity with the introduction of Unified Memory, which simplifies memory management by automatically migrating data between the CPU and GPU. Unified Memory and other new features in CUDA tools and libraries make GPU computing easier than ever before. In this talk you'll hear about these features and get insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You will learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.
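Unified Memory in a nutshell: a single cudaMallocManaged() pointer is valid on both host and device, so the explicit cudaMemcpy calls disappear. A minimal example:

    #include <cstdio>

    __global__ void increment(int* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main()
    {
        int n = 1024, *data;
        cudaMallocManaged(&data, n * sizeof(int));
        for (int i = 0; i < n; ++i) data[i] = i;     // CPU writes directly

        increment<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();                     // required before the CPU reads

        printf("data[1] = %d\n", data[1]);           // prints 2
        cudaFree(data);
        return 0;
    }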
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4830
Streaming:
Download:
 
Accelerating Java on GPUs
Vinod Grover (NVIDIA)
In this talk we'll describe a dynamic approach for compiling and executing Java byte-codes on GPUs. This method is based on NVIDIA's Compiler SDK that underlies the CUDA platform. Our approach is independent of the underlying JVM and uses reflection to translate bytecodes to LLVM IR, which is then translated to GPU code using libNVVM. In this talk we give an overview of the approach and will show how Java code can be accelerated.
 
Keywords:
Programming Languages & Compilers, GTC 2014 - ID S4939
Streaming:
Quantum Chemistry
Presentation
Media
Acceleration of Electron Repulsion Integral Evaluation on Graphics Processing Units via Use of Recurrence Relations
Yipu Miao (University of Florida)
A fast and efficient implementation of ab initio quantum chemistry calculations on the GPU with a novel accuracy level. Our software supports Hartree-Fock and DFT calculations with 10-100x speedups relative to traditional CPU nodes.
 
Keywords:
Quantum Chemistry, GTC 2014 - ID S4211
Streaming:
 
Speeding-up NWChem on Heterogeneous Clusters
Antonino Tumeo (Pacific Northwest National Laboratory)
Learn the approaches that we implemented to accelerate NWChem, one of the flagship high performance computational chemistry tools, on heterogeneous supercomputers. In this talk we will discuss the new domain specific code generator, the auto-tuners for the tensor contractions, and the related optimizations that enable acceleration of the Coupled-Cluster methods module for single- and multi-reference formulations of NWChem.
 
Keywords:
Quantum Chemistry, Clusters & GPU Management, Computational Fluid Dynamics, Supercomputing, GTC 2014 - ID S4329
Streaming:
 
GRID-Based Methods for the Analysis of the Wave Function in Quantum Chemistry Accelerated by GPUs
Jorge Garza (Universidad Autonoma Metropolitana-Iztapalapa)
Learn how to distribute on GPUs the scalar and vector fields defined in quantum chemistry. In this talk we analyze the wave function obtained by Hartree-Fock, density functional theory, or many-body perturbation theory to second order by using the atoms-in-molecules approach. The gradient and Laplacian of the electron density are used as examples of fields that can be evaluated easily on GPUs. The performance of our algorithms is contrasted with that of algorithms not accelerated by GPUs.
 
Keywords:
Quantum Chemistry, Numerical Algorithms & Libraries, Supercomputing, GTC 2014 - ID S4389
Streaming:
Download:
 
Great Performance for Tiny Problems: Batched Products of Small Matrices
Nikolay Markovskiy (NVIDIA)
Learn how to get great performance on Kepler GPUs for small dense matrix products. Dense linear algebra operations are generally best performed in cuBLAS, but for batches of very small matrices, it may be possible to exploit some extra knowledge of your particular application to improve the performance. After an analysis of an initial implementation, we will look into different algorithmic improvements (tiling, prefetching), use special features of the Kepler architecture and finally investigate autotuning to select the best implementation for a given problem size.
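One common starting point for this problem is to assign one thread block per matrix product and one thread per output element, staging both operands in shared memory. A baseline sketch under that assumption, not the tuned kernels from the talk (cuBLAS also offers cublasSgemmBatched() for the library route):

    #define N 8   // small, fixed matrix dimension

    __global__ void batched_mm(const float* A, const float* B, float* C)
    {
        __shared__ float a[N][N], b[N][N];
        const float* Ai = A + blockIdx.x * N * N;   // this block's operands
        const float* Bi = B + blockIdx.x * N * N;
        int r = threadIdx.y, c = threadIdx.x;

        a[r][c] = Ai[r * N + c];                    // stage both matrices
        b[r][c] = Bi[r * N + c];
        __syncthreads();

        float acc = 0.0f;
        for (int k = 0; k < N; ++k) acc += a[r][k] * b[k][c];
        C[blockIdx.x * N * N + r * N + c] = acc;
    }
    // Launch: batched_mm<<<batch_count, dim3(N, N)>>>(A, B, C);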
 
Keywords:
Quantum Chemistry, Numerical Algorithms & Libraries, GTC 2014 - ID S4391
Streaming:
Download:
 
Achievements and Challenges Running GPU-Accelerated Quantum ESPRESSO on Heterogeneous Clusters
Filippo Spiga (Quantum ESPRESSO Foundation)
Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nano-scale. Within the Quantum ESPRESSO suite, the Plane-Wave Self-Consistent Field (PWscf) code represents a powerful computational tool for scientists in both academia and industry for electronic-structure calculations at the nanoscale. Due to the wide adoption of GPU computing, it is now mandatory to push further the capability of the code by adding new functionalities and explicit optimizations. The aim of this talk is to present challenges and achievements of running a CPU-GPU code on heterogeneous clusters of various sizes. Benchmarks are performed on Darwin (University of Cambridge GPU cluster) and Titan (Oak Ridge National Laboratory). Input cases are provided by researchers in both academia and private companies.
 
Keywords:
Quantum Chemistry, Clusters & GPU Management, Computational Physics, GTC 2014 - ID S4397
Streaming:
 
Virtual Molecular Modelling Kits: Playing Games with Quantum Chemistry
Nathan Luehr (Stanford University)
We discuss the impact of GPU-based quantum chemistry calculations for small molecules. Based on a specially optimized version of TeraChem, we demonstrate real-time molecular dynamics for systems up to a few dozen atoms. Harnessing this performance, we describe the development of interactive interfaces to virtual quantum chemistry models. Such interfaces make possible a new paradigm for chemical education and research.
 
Keywords:
Quantum Chemistry, Combined Simulation & Real-Time Visualization, Molecular Dynamics, GTC 2014 - ID S4427
Streaming:
Download:
 
Enabling Gaussian 09 on GPGPUs
Roberto Gomperts (NVIDIA)
In 2011 Gaussian, Inc., NVIDIA Corp. and PGI started a long-term project to enable all the performance-critical paths of Gaussian on GPGPUs. While the ultimate goal is to show significant performance improvement by using accelerators in conjunction with CPUs, the initial efforts are directed towards creating an infrastructure that will leverage the current CPU code base and at the same time minimize the additional maintenance effort associated with running on GPUs. Here we present the current status of this work for Direct Hartree-Fock and triples-correction calculations, as applied for example in Coupled Cluster calculations, using mostly the directive-based OpenACC framework.
 
Keywords:
Quantum Chemistry, Programming Languages & Compilers, Supercomputing, GTC 2014 - ID S4613
Streaming:
Download:
 
GPUs and Real-Space Grids: A Powerful Alternative for the Simulation of Electrons
Xavier Andrade (Harvard University)
Learn why modeling electrons is important and what we can learn from these simulations, followed by a very brief introduction to the method of density functional theory (DFT) as an approximation of quantum mechanics to model electrons in molecular systems. This presentation also introduces the traditional method used in quantum chemistry to solve the DFT equations, namely the expansion of the molecular orbitals in a basis of Gaussian functions, and discusses its limitations for parallelization on GPUs. An alternative is then presented: simulating electrons with real-space grids and finite differences, and applying GPUs to accelerate real-space calculations. The presentation will explain the scheme we developed to expose the data parallelism available in the DFT approach. Finally, results for current-generation GPUs will be presented, showing that our scheme, implemented in the free code Octopus, can reach a sustained performance of up to 90 GFlops for a single GPU, a significant speed-up over the CPU version of the code.
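The data parallelism the real-space approach exposes is easy to see in its core finite-difference operation: every grid point is an independent thread. A simplified second-order Laplacian stencil on a uniform grid, interior points only; a generic sketch, not Octopus code:

    __global__ void laplacian(const float* f, float* out,
                              int nx, int ny, int nz, float inv_h2)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z * blockDim.z + threadIdx.z;
        if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1)
            return;                                // boundary points skipped here

        int i = (z * ny + y) * nx + x;
        out[i] = (f[i - 1]       + f[i + 1]        // x neighbors
                + f[i - nx]      + f[i + nx]       // y neighbors
                + f[i - nx * ny] + f[i + nx * ny]  // z neighbors
                - 6.0f * f[i]) * inv_h2;
    }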
 
Keywords:
Quantum Chemistry, Performance Optimization, Computational Physics, GTC 2014 - ID S4625
Streaming:
Download:
 
VASP: A Case Study for Accelerating Plane Wave DFT Codes
Sarah Tariq (NVIDIA), Przemyslaw Tredak (University of Warsaw)
In this session we will detail how we accelerated the VASP software package, used for atomic-scale material modeling, on GPUs. Presenters in past years have shown that a straightforward implementation of VASP on GPUs with the help of the GPU-accelerated cuFFT and cuBLAS libraries can yield reasonable speedups, but we will show in this session that by targeting the implementation more towards the GPU's strengths and porting additional work, we can achieve more than a 3x speedup over this. We will present the methodology we followed for improving both single-GPU performance and multi-GPU, multi-node scaling. This work has been implemented in collaboration by NVIDIA interns and engineers (Jeroen Bedorf, Przemyslaw Tredak, Dusan Stosic, Arash Ashari, Paul Springer, Darko Stosic and Sarah Tariq), and researchers from ENS Lyon, IFPEN (Paul Fleurat-Lessard and Anciaux Sedrakian), CMU (Michael Widom) and the University of Chicago (Maxwell Hutchinson).
 
Keywords:
Quantum Chemistry, GTC 2014 - ID S4692
Streaming:
Download:
Ray Tracing
Presentation
Media
Mining Hidden Coherence for Parallel Path Tracing on GPUs
Yangdong Deng (Tsinghua University)
As one of the essential global illumination algorithms, Monte Carlo path tracing has long been considered a typical irregular problem that is less friendly to graphics hardware. To improve the efficiency of Monte Carlo path tracing, many techniques have been proposed to exploit the inherent coherence in processing different paths and materials for better SIMD efficiency on GPUs. In this work, we develop a novel technique to extract extra parallelism in Monte Carlo path tracing by identifying hidden coherence. The basic idea is to perform a partial traversal in the fast on-chip memory of the GPU and then identify coherent paths by analyzing the traversal results as well as other features of the rays. Our technique enables a higher level of parallelism that not only compensates for the overhead of the traversal, but also leads to improved performance. Experiments show that our technique delivers up to 15% higher traversal throughput.
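A generic building block for this kind of coherence extraction (not the authors' exact scheme): once each path is tagged with a key describing where its partial traversal ended, sorting paths by that key groups similar work so that neighboring threads follow similar code paths:

    #include <thrust/device_vector.h>
    #include <thrust/sort.h>

    // keys[i] might encode, say, the BVH node or material reached by path i.
    void group_coherent_paths(thrust::device_vector<unsigned int>& keys,
                              thrust::device_vector<int>& path_ids)
    {
        thrust::sort_by_key(keys.begin(), keys.end(), path_ids.begin());
    }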
 
Keywords:
Ray Tracing, Performance Optimization, Visual Effects & Simulation, GTC 2014 - ID S4156
Streaming:
 
Quickly Adding Photographic Quality Ray Tracing to Your Application
Dave Hutchinson (Lightwork Design Ltd.)
Learn how adding interactive and production rendering into your application or workflow just got easier. This session explores how a streamlined, user-centered solution based around NVIDIA Iray technology is ushering in a new era of usability and consistency for both developers and users. The thirst for the highest level of photo-realism through design, configuration, collaboration, and cloud solutions is greater than ever; yet the availability of an easy-to-develop-with, consistent, and future-proof platform for software developers has remained unanswered, until now.
 
Keywords:
Ray Tracing, Media & Entertainment, Rendering & Animation, GTC 2014 - ID S4317
Streaming:
 
Petascale Molecular Ray Tracing: Accelerating VMD/Tachyon with OptiX
John Stone (University of Illinois)
We describe the adaptation of VMD, a popular molecular visualization and analysis tool, to exploit the Tesla K20X GPU for acceleration of large scale molecular visualization runs on Cray XK7 petascale supercomputers such as Blue Waters and Titan. We will describe ray tracing performance benefits and memory efficiency optimizations achieved through the use of custom geometric primitives and triangle mesh formats, and relate our experiences adapting the Tachyon CPU-based ray tracing engine used by VMD to NVIDIA's OptiX GPU ray tracing framework. We will present performance data for large visualization runs on the Cray XK7, discuss our approach to integrating OptiX into VMD, and describe avenues for further improvement.
 
Keywords:
Ray Tracing, Molecular Dynamics, Scientific Visualization, Supercomputing, GTC 2014 - ID S4400
Streaming:
Download:
 
Advanced OptiX Programming
David McAllister (NVIDIA)
As more and more demanding customers adopt OptiX, we are learning more about what it takes to write high-performance yet elegant code using OptiX. This talk will use case studies to teach OptiX developers how to get the highest performance out of OptiX and how to debug OptiX code.
 
Keywords:
Ray Tracing, Manufacturing, Media & Entertainment, Rendering & Animation, GTC 2014 - ID S4597
Streaming:
Download:
Real-Time Graphics Applications
Presentation
Media
OpenGL 4.4 Scene Rendering Techniques
Christoph Kubisch (NVIDIA), Markus Tavenrath (NVIDIA)
OpenGL 4.4 provides new features for accelerating scenes with many objects, which are typically found in professional visualization markets. This talk will provide details on the usage of the features and their effect on real-life models. Furthermore we will showcase how more work for rendering a scene can be off-loaded to the GPU, such as efficient occlusion culling or matrix calculations.
 
Keywords:
Real-Time Graphics Applications, Performance Optimization, Media & Entertainment, GTC 2014 - ID S4379
Streaming:
Download:
 
Order Independent Transparency in OpenGL
Christoph Kubisch (NVIDIA)
Rendering many transparent surfaces is still a challenge in real-time rendering. With hardware features exposed in OpenGL 4 it is possible to minimize the number of geometry passes and create transparency effects with order-independent drawing. Several techniques with different quality, performance, and memory usage characteristics will be presented. The approaches will be evaluated on two different scenarios: hair and CAD model rendering.
 
Keywords:
Real-Time Graphics Applications, Performance Optimization, GTC 2014 - ID S4385
Streaming:
Download:
 
GPU-Based Visualization for Flight Simulation
Tim Woodard (Diamond Visionics)
Learn about the unique challenges which arise in designing modern visualization systems for use in real-time flight simulation and how recent GPU advancements are helping to address them. Scene generation tasks that have traditionally required extensive pre-computation can now be performed on the GPU. This has numerous advantages including instant feedback, greater scene complexity and fidelity, and allows for hardware consolidation.
 
Keywords:
Real-Time Graphics Applications, Virtual & Augmented Reality, Collaborative & Large Resolution Displays, Combined Simulation & Real-Time Visualization, GTC 2014 - ID S4440
Streaming:
Download:
 
Practical Real-Time Voxel-Based Global Illumination for Current GPUs
Alexey Panteleev (NVIDIA)
This session describes the work of making the voxel-based global illumination (GI) approach practical for use in games running on current-generation graphics hardware such as Kepler. Based upon Cyril Crassin's research, a library has been developed that allows applications to render GI effects for large and fully dynamic scenes at 30 frames per second or more, producing soft diffuse indirect lighting and blurry specular reflections, and providing emissive material support. During the session, Alexey will talk about the cone tracing GI algorithm in general and get into the details of scene representation, efficient multi-resolution voxelization, and indirect light gathering.
 
Keywords:
Real-Time Graphics Applications, Performance Optimization, Game Development, Mobile Applications, GTC 2014 - ID S4552
Streaming:
Download:
 
OpenGL: 2014 and Beyond
Cass Everitt (NVIDIA), Seth Williams (NVIDIA)
Learn techniques for efficiently using the GPU and detecting and eliminating driver overhead. See the direction that OpenGL is heading in to embrace multi-threaded, multi-core CPU app designs. Also, the GPU can construct and update app rendering data structures with very little CPU intervention. We will also explore subdivision surfaces and how to get them automatically GPU-accelerated with a new extension. And hand-in-glove with subdivision surfaces is PTEX support in OpenGL. Finally, while OpenGL is the most broadly available open API for 3D graphics, it's also the most fragmented. We will explore Regal, an open source library that illustrates how to de-fragment the OpenGL landscape and keep your graphics back end code from becoming a patchwork of platform #ifdefs.
 
Keywords:
Real-Time Graphics Applications, Performance Optimization, Game Development, Rendering & Animation, GTC 2014 - ID S4610
Streaming:
 
Realizing High-Performance Pipelines Using Piko
Kerry Seitz (University of California, Davis), Anjul Patney (NVIDIA), Stanley Tzeng (NVIDIA)
We present Piko, a system abstraction to help implement high-level algorithmic pipelines on modern parallel architectures. We define 'pipelines' as a sequence of complex, dynamically-scheduled kernels that combine to implement a complex application. While primarily targeted towards efficient graphics applications, the way in which Piko exposes both parallelism and locality can naturally be applied to other domains as well. The abstraction helps programmers define work granularities as the data evolves across stages of an application. These definitions are disjoint from the underlying algorithms, which helps authors of Piko pipelines explore tradeoffs between locality and parallelism across varying application configurations and target architectures. As a consequence, Piko helps design high-performance software pipelines that are flexible as well as portable across architectures.
 
Keywords:
Real-Time Graphics Applications, Performance Optimization, Programming Languages & Compilers, Ray Tracing, GTC 2014 - ID S4650
Streaming:
Download:
 
First In Vivo Medical Images Using Photon-Counting, Real-Time GPU Reconstruction
Augustus Lowell (Triple Ring Technologies)
Triple Ring Technologies has worked on several generations of a cardiology imaging system. The unique x-ray imaging chain allows for up to 20x radiation exposure reduction and 3D localization. Each generational improvement in image quality required a 10x or more increase in the number of computations required to process the images. With sample rates of nearly 1 Msps and high-density detectors comprising over 200,000 elements, the latest generation system generates 160 billion samples per second and processes them into real-time images useful to a cardiologist. Historically, the processing elements used to achieve required computation rates were created using pipelined parallel processing stages in state-of-the-art FPGAs, or exotic massively-parallel processor arrays. The latest generation of NVIDIA GPUs have changed this. We have recently implemented a novel image processor using an array of nine GPUs. We will show the first cardiac imaging study using this approach.
 
Keywords:
Real-Time Graphics Applications, Clusters & GPU Management, Computational Physics, Medical Imaging & Visualization, GTC 2014 - ID S4743
Streaming:
Download:
Rendering & Animation
Presentation
Media
Multi-GPU Rendering
Ingo Esser (NVIDIA), Shalini Venkataraman (NVIDIA)
With more workstation applications utilizing more efficient rendering pipelines and rendering larger scenes with more complex fragment shaders, GPUs can become the bottleneck in a system. The first part of this talk will be a refresher on multi-GPU programming basics to scale your rendering tasks. We show how to target individual GPUs programmatically, as well as how to structure your application using multiple threads and OpenGL contexts and how to handle the synchronization and data transfer. The second part will dive into the details of designing a rendering pipeline that can efficiently utilize a multi-GPU setup by splitting rendering tasks into a set of phases. These phases represent a set of threads that distribute the rendering load across a set of GPUs. The talk will cover how to set up a multithreaded application using C++11 constructs, how to analyze and debug the performance of a graphics application, how to do PCIe transfers efficiently, and how to optimally distribute workload across different GPUs.
 
Keywords:
Rendering & Animation, Large Scale Data Visualization & In-Situ Graphics, Media & Entertainment, Real-Time Graphics Applications, GTC 2014 - ID S4455
Streaming:
 
GPU Ray Tracing and Advanced Rendering Solutions from NVIDIA
Phillip Miller (NVIDIA)
Learn how GPU computing is revolutionizing performance and possibilities in both interactive and production rendering. The latest capabilities of NVIDIA's Advanced Rendering solutions will be explored and demonstrated, along with what's possible with the latest in NVIDIA OptiX for accelerating custom ray tracing solutions. Trends in the industry, along with guidelines for configuring optimal rendering, will also be discussed.
 
Keywords:
Rendering & Animation, Manufacturing, Media & Entertainment, Ray Tracing, GTC 2014 - ID S4626
Streaming:
 
NVIDIA Rendering Innovations within Autodesk Maya and 3ds Max
Peter de Lappe (NVIDIA), Bart Gawboy (NVIDIA), Julia Flototto (NVIDIA), David Hackett (The Mill), Jonathan Beals (Hinge Digital)
Come learn about the latest rendering capabilities of Autodesk 3ds Max and Maya from the makers of NVIDIA mental ray and iray. Moderated by Phil Miller, Director, Advanced Rendering at NVIDIA, this course will focus on artist workflow and also discuss the science behind the features, along with recent studio work showcasing how it is used in production, making this of value to anyone producing or even appreciating 3D rendering and animation.
 
Keywords:
Rendering & Animation, Media & Entertainment, Ray Tracing, GTC 2014 - ID S4721
Streaming:
Download:
 
Adaptive NURBS Tessellation with CUDA
Jie Qiu (Nanyang Technological University, Multi-plAtform Game Innovation Centre)
This session presents a method for on-the-fly tessellation of NURBS surfaces on the GPU. The method involves conversion from NURBS to rational Bezier patches, tessellation interval estimation, and tessellation of the rational Bezier patches. The tessellation intervals are estimated for each rational Bezier patch to guarantee that the tessellation is within a given approximation error. Each rational Bezier patch is tessellated independently while the tessellation is assured to be gap-free. In each stage of the process, various strategies of parallelism are designed to improve performance on the GPU. The proposed tessellation enables complicated NURBS models to be rendered in real time and within a prescribed tolerance.
 
Keywords:
Rendering & Animation, Computer Aided Design, Real-Time Graphics Applications, GTC 2014 - ID S4567
Streaming:
Download:
 
Interactive 3D Data Visualization of 700 GB
Jorg Mensmann (NVIDIA), Tom-Michael Thamm (NVIDIA)
Technical presentation of the latest version of NVIDIA IndeX (TM), with an emphasis on large volumetric data visualization. IndeX is a scalable GPU-based software framework which renders high-quality images at interactive frame rates.
 
Keywords:
Rendering & Animation, Big Data Analytics & Data Algorithms, Clusters & GPU Management, Combined Simulation & Real-Time Visualization, GTC 2014 - ID S4716
Streaming:
 
Sharing Physically Based Materials Between Renderers with MDL
Lutz Kettner (NVIDIA), Phillip Miller (NVIDIA)
The basics of NVIDIA's Material Definition Language (MDL) will be discussed, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically-based definitions can be created, while developers will learn what's entailed in supporting MDL within their own product/renderer.
 
Keywords:
Rendering & Animation, Manufacturing, Media & Entertainment, Ray Tracing, GTC 2014 - ID S4722
Streaming:
Download:
 
Interactive Global Illumination with NVIDIA Visual Computing Cluster
Phillip Miller (NVIDIA), Stefan Radig (NVIDIA), Ankit Patel (NVIDIA)
It is now possible to have physically-based global illumination at interactive speeds with minimal noise. Come learn about Iray Nitro software and NVIDIA's new appliance designed specifically for high-bandwidth cluster rendering, the perfect solution for companies needing digital design reviews with uncompromised realism. Hundreds of GPUs will be used to show near-linear scaling of interactive GI driven from numerous professional software client solutions. See also how cluster management is made easy, allowing users to access all or part of the cluster with minimal networking knowledge.
 
Keywords:
Rendering & Animation, Automotive, Manufacturing, Ray Tracing, GTC 2014 - ID S4723
Streaming:
Scientific Visualization
Presentation
Media
Gesture-Based Interactive Visualization of Large-Scale Data using GPU and Latest Web Technologies
Ibrahim Demir (University of Iowa)
As geoscientists are confronted with increasingly massive datasets from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data. Recent developments in web technologies make it easy to manage, visualize, and share large data sets with the public. Novel visualization techniques and dynamic user interfaces allow users to interact with data and change parameters to create custom views of the data, to gain insight from simulations and environmental observations. This requires intelligent knowledge discovery techniques to extract information from complex computational simulations and large data repositories. This presentation provides an overview of the information visualization and communication tools developed for communicating radar- and satellite-based rainfall products, utilizing the graphics processing unit and the latest web technologies in a web browser. The user interface allows users to interact with the data using hand gestures.
 
Keywords:
Scientific Visualization, Large Scale Data Visualization & In-Situ Graphics, Real-Time Graphics Applications, GTC 2014 - ID S4203
Streaming:
Download:
 
Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale
Kenneth Moreland (Sandia National Laboratories)
Visualization on today's GPU technology and at extreme scale requires massive concurrency. The Dax Toolkit is a development framework for designing algorithms for, and making use of, such devices. Learn how to use Dax to execute classic visualization and analysis algorithms on a variety of mesh data structures and to adapt the templated toolkit to your own data structures. Also, design your own massively-threaded visualization algorithms in a simplified development environment that allows you to focus on the mathematical and algorithmic design. Dax's concept and scheduling mechanisms automatically build parallel scheduling and communication code from C++ signatures.
 
Keywords:
Scientific Visualization, Large Scale Data Visualization & In-Situ Graphics, GTC 2014 - ID S4620
Streaming:
Download:
 
Interactive Processing and Visualization of Geospatial Imagery
Brian Smith (The Boeing Company)
Airborne and spaceborne sensors have the capability to collect massive quantities of geospatial imagery data. These sensors often operate outside the range of the human visual system, presenting a challenge for analysis and visualization. The Agility framework was developed by The Boeing Company to more easily leverage GPU technology for interactive visualization of large geospatial datasets. Come see how interactive processing can increase the utility of geospatial data with examples from synthetic aperture radar and hyperspectral imaging systems.
 
Keywords:
Scientific Visualization, Defense, GTC 2014 - ID S4778
Streaming:
Download:
Signal & Audio Processing
Presentation
Media
Hardware and Software Design for a 1000 FPS Real-Time Soft-Field Tomography System
Patrik Gebhardt (Ruhr-University Bochum)
See how to build a high-speed, low-latency real-time measurement system for industrial process tomography based on Electrical Impedance Tomography, capable of generating more than 1000 cross-sectional images of a pipe per second, using (A) FPGAs for data acquisition from several ADCs and preprocessing in parallel, and (B) GPUs for solving the underlying PDE and reconstructing these images with a latency of approx. 50 ms. Examples of the signal processing algorithms as well as the methods used to accelerate the reconstruction process on GPUs will be given.
 
Keywords:
Signal & Audio Processing, Medical Imaging & Visualization, GTC 2014 - ID S4206
Streaming:
Download:
 
A Flexible IIR filtering Implementation for Audio Processing
Juergen Schmidt (Technicolor Research & Innovation)
Infinite impulse response (IIR) filters are used in almost every signal processing area. In the field of audio applications they are used for loudspeaker equalization, crossover filtering, or for sound control in mixing consoles. Modern audio applications like 3D sound require many audio channels to be processed in parallel at high precision. This is often implemented using high-order IIR filter chains. A straightforward implementation of IIR filters would lead to poor utilization of the GPU, though, due to OpenCL's lack of support for recursive processing. In this contribution, an efficient implementation will be presented that circumvents the recursive-processing problem of OpenCL. It allows the processing of more than 64 audio channels with IIR filters of order 40 or more with scalable latency. It is implemented for all major operating systems in a flexible OpenCL/C++ framework.
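The usual way around the recursion constraint, sketched here in CUDA rather than the authors' OpenCL/C++ framework: keep each filter's feedback loop sequential within one thread and take the parallelism across channels. A generic direct-form-II biquad per channel, with an interleaved sample layout assumed (higher-order filters would cascade several such sections):

    struct Biquad { float b0, b1, b2, a1, a2; };

    __global__ void iir_channels(const float* in, float* out,
                                 const Biquad* coeff, int frames, int channels)
    {
        int ch = blockIdx.x * blockDim.x + threadIdx.x;
        if (ch >= channels) return;

        Biquad q = coeff[ch];
        float z1 = 0.0f, z2 = 0.0f;             // per-channel filter state
        for (int t = 0; t < frames; ++t) {      // the recursion stays sequential
            float x = in[t * channels + ch];
            float w = x - q.a1 * z1 - q.a2 * z2;
            out[t * channels + ch] = q.b0 * w + q.b1 * z1 + q.b2 * z2;
            z2 = z1; z1 = w;
        }
    }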
 
Keywords:
Signal & Audio Processing, GTC 2014 - ID S4382
Streaming:
Download:
 
Making It Fast and Reliable: Speech Recognition with GPUs by Sequential Utilization of Available Knowledge Sources
Alexei V. Ivanov (Pervoice SPA), Fabio Brugnara (Fondazione Bruno Kessler (FBK), Trento, Italy)
Join application field experts for a discussion on the methods of speech recognition/understanding where the original task is factorized into smaller sub-tasks that individually can be efficiently implemented on the GPU. Speech recognition is an instance of a multi-criteria optimization problem that allows sequential refinement of a pool of reasonably plausible hypotheses by pruning according to several optimality criteria originating from independent knowledge sources. Intermediate results are represented by graph-like objects: lattices and confusion networks of alternatives. Besides compactness, this representation allows for efficient quantification of the system's confidence in the output. Computation time analysis reveals improvements in comparison to the traditional implementation of a speech recognizer. A live demonstration will be provided to illustrate the advantages of the proposed approach.
 
Keywords:
Signal & Audio Processing, Defense, GTC 2014 - ID S4405
Streaming:
Download:
 
GPU Accelerated Model Combination for Robust Speech Recognition and Keyword Search
Wonkyum Lee (Carnegie Mellon University)
Learn how to develop GPU-accelerated model combination for robust speech recognition and keyword search, built on (1) GPU-accelerated acoustic score computation for DNN and GMM models, (2) acoustic-score-level combination with different combination techniques, and (3) efficient rescoring of hypotheses on hybrid architectures of GPUs and multicore CPUs. Evaluation will be given on the 2013 OpenKWS evaluation task, a challenging corpus that shows how combination helps both the speech recognition and keyword search tasks.
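The abstract does not spell out the combination techniques used; as a hedged sketch, one common baseline is log-linear interpolation of the two models' per-(frame, state) acoustic log-scores, which on a GPU is an embarrassingly parallel element-wise kernel. The kernel name and the single weight parameter are illustrative assumptions.

    // Hypothetical sketch: log-linear interpolation of DNN and GMM acoustic
    // scores. dnn and gmm hold per-(frame, state) log-likelihoods; w is the
    // interpolation weight. The session's actual techniques may differ.
    __global__ void combineScores(const float* __restrict__ dnn,
                                  const float* __restrict__ gmm,
                                  float* combined, float w, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            combined[i] = w * dnn[i] + (1.0f - w) * gmm[i];
    }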
 
Keywords:
Signal & Audio Processing, Defense, Machine Learning & AI, Mobile Applications, GTC 2014 - ID S4533
Streaming:
Download:
 
Making Games Sound as Good as They Look: Real-time Geometric Acoustics on the GPU
Zuofu Cheng (University of Illinois at Urbana-Champaign)
Geometric acoustics (GA), which involves directly simulating in real time the acoustic transfer between sound sources and listeners in a virtual space, is considered the holy grail of game audio. We present a GA method and optimizations which, along with the massive parallelism of modern GPUs, allow for immersive sound rendering at interactive frame rates. This talk focuses on optimizations made for Fermi and Kepler GPUs on the two main components of our engine: the ray-acoustic engine and the per-path head-related transfer function (HRTF) renderer. Audio examples will be given using the open-source id Tech 3 engine, comparing original assets from the Quake 3 game rendered via traditional positional audio to the same assets processed through our engine.
 
Keywords:
Signal & Audio Processing, Virtual & Augmented Reality, Defense, Game Development, GTC 2014 - ID S4537
Streaming:
Download:
 
Signal Processing Libraries for High Performance Embedded Computing (Presented by GE)
David Tetley (GE Intelligent Platforms)
High Performance Embedded Computing (HPEC) is bringing hardware found in the Top500 supercomputers into the embedded market space. This is leading to Linux clusters consisting of a mixture of CPUs and GPUs being deployed to tackle signal and image processing applications such as those found on Intelligence, Surveillance and Reconnaissance (ISR) platforms. Developers, whilst wanting to take advantage of the potential performance of GPUs, want to keep their code architecture-agnostic so it can be ported to other hardware platforms without significant re-design. Whilst CUDA and OpenCL are emerging to offer this capability at a lower programming level, the industry-standard Vector Signal and Image Processing Library (VSIPL) API provides a higher level of abstraction, with over 600 signal processing and vector math functions to choose from. This enables developers to build portable signal processing algorithms that can be targeted at either the CPU or GPU with no source code changes. This session provides an overview of the VSIPL standard and demonstrates the portability between CPU and GPU platforms.
 
Keywords:
Signal & Audio Processing, Numerical Algorithms & Libraries, Performance Optimization, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4934
Streaming:
Download:
Supercomputing
Presentation
Media
Multi GPU Programming with MPI (Part I+II+III)
Jiri Kraus (NVIDIA), Peter Messmer (NVIDIA)
In this session you will learn how to program GPU clusters using the Message Passing Interface (MPI) and OpenACC or CUDA. Part I of this session will explain how to get started by giving a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Part II will explain more advanced topics like GPU-aware MPI and how to overlap communication with computation to hide communication times. Finally, Part III will cover how to use the NVIDIA performance analysis tools in an MPI environment and give an overview of third-party tools specifically designed for GPU clusters.
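As a flavor of the combination the session teaches, here is a minimal, hedged sketch of a GPU-aware MPI program: device pointers are handed directly to MPI calls so no explicit staging through host buffers is needed. It assumes an MPI library built with CUDA support; buffer size and rank pairing are illustrative.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int N = 1 << 20;
        float* d_buf;                       // device buffer
        cudaMalloc(&d_buf, N * sizeof(float));

        int peer = rank ^ 1;                // pair up ranks 0-1, 2-3, ...
        if (peer < size)
            // CUDA-aware MPI accepts the device pointer directly.
            MPI_Sendrecv_replace(d_buf, N, MPI_FLOAT, peer, 0, peer, 0,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }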
 
Keywords:
Supercomputing, Performance Optimization, GTC 2014 - ID S4236
Streaming:
Download:
 
Optimizing an LBM Code for Compute Clusters with Kepler GPUs
Jiri Kraus (NVIDIA)
To fully utilize a GPU cluster, the single-GPU code as well as the inter-GPU communication needs to be efficient. In this session an LBM code applying a D2Q37 model is used as a case study to explain by example how both targets can be met. The compute-intensive collide kernel of the LBM code is optimized for Kepler, specifically considering the large amount of state needed per thread due to the complex D2Q37 model. To achieve efficient inter-GPU communication, CUDA-aware MPI was used. We explain how this was done and present performance results on an InfiniBand cluster with GPUDirect RDMA.
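The session's code is not shown here; the following hedged CUDA sketch illustrates the standard pattern such codes use to hide communication behind computation: boundary cells are computed first in one stream and exchanged while interior cells compute in another. Kernel names, grid sizes, and cell counts are illustrative assumptions.

    #include <mpi.h>
    #include <cuda_runtime.h>

    // Illustrative stand-in for the D2Q37 collide step over a cell range.
    __global__ void collideKernel(float* f, int firstCell, int numCells) { /* ... */ }

    void timeStep(float* d_f, float* d_haloSend, float* d_haloRecv,
                  int haloCount, int up, int down)
    {
        cudaStream_t boundary, interior;
        cudaStreamCreate(&boundary);
        cudaStreamCreate(&interior);

        // Boundary cells first, so their results are ready to exchange...
        collideKernel<<<64, 256, 0, boundary>>>(d_f, 0, 16384);
        // ...while interior cells compute concurrently in another stream.
        collideKernel<<<1024, 256, 0, interior>>>(d_f, 16384, 262144);

        // Exchange halos with CUDA-aware MPI once the boundary stream is done.
        cudaStreamSynchronize(boundary);
        MPI_Sendrecv(d_haloSend, haloCount, MPI_FLOAT, up, 0,
                     d_haloRecv, haloCount, MPI_FLOAT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaStreamSynchronize(interior);

        cudaStreamDestroy(boundary);
        cudaStreamDestroy(interior);
    }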
 
Keywords:
Supercomputing, Computational Fluid Dynamics, GTC 2014 - ID S4186
Streaming:
Download:
 
Implementation and Optimization of a Three-Dimensional UPML-FDTD Algorithm on a GPU Platform
Lei Xu (Shanghai Supercomputing Center)
The finite difference time domain (FDTD) method usually requires a large number of floating-point operations for the numerical simulation of electromagnetic (EM) fields. The goal of this session is to present the implementation and optimization of a 3D UPML-FDTD algorithm on GPU clusters. The numerical results of the EM simulation were validated against the analytic solution for the EM field of an electric dipole excitation source, and the numerical solver is found to be accurate. A set of techniques is utilized to optimize the FDTD algorithm, for example: i) use of GPU texture memory, ii) asynchronous data transfer between CPU and GPU, and iii) concurrent kernel execution. The performance of the parallel FDTD algorithm is tested on Tesla M2070 and K20m GPU clusters. The application of the optimization techniques is found to improve performance by around 20% on 64 GPU cards. The scalability of the algorithm is shown for up to 80 Tesla K20m GPUs, where the parallel efficiency is around 90%.
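Of the three techniques, the texture-memory one is the least self-explanatory; below is a minimal, hedged sketch (not the session's code) of binding a linear device buffer to a CUDA texture object so field reads go through the cached, read-only texture path, as supported since CUDA 5.0 on Kepler-era hardware.

    #include <cuda_runtime.h>

    // Read one field value through the texture path (cached, read-only).
    __global__ void readField(cudaTextureObject_t tex, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = tex1Dfetch<float>(tex, i);
    }

    // Bind an existing linear device buffer of n floats to a texture object.
    cudaTextureObject_t bindLinear(float* d_field, size_t n)
    {
        cudaResourceDesc resDesc = {};
        resDesc.resType = cudaResourceTypeLinear;
        resDesc.res.linear.devPtr = d_field;
        resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
        resDesc.res.linear.sizeInBytes = n * sizeof(float);

        cudaTextureDesc texDesc = {};
        texDesc.readMode = cudaReadModeElementType;

        cudaTextureObject_t tex = 0;
        cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
        return tex;
    }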
 
Keywords:
Supercomputing, Computational Physics, Computer Aided Design, GTC 2014 - ID S4196
Streaming:
Download:
 
Kokkos: A Manycore Device Performance Portability Library for C++ HPC Applications
H. Carter Edwards (Sandia National Laboratories)
Discover how the Kokkos library enables you to develop HPC scientific applications that are performance portable across disparate manycore devices such as NVIDIA Kepler and Intel Xeon Phi. Portable programming models such as OpenMP, OpenACC, OpenCL, and Thrust focus on parallel execution but fail to address the memory access patterns critical for achieving best performance. Thus codes must be extensively re-written to meet device-specific memory access pattern requirements; e.g., data structures and loops transformed from array-of-structures patterns to structure-of-arrays patterns. We address this issue by integrating compile-time polymorphic data layout with parallel execution. We will present manycore performance portability of the LAMMPS molecular dynamics code and Trilinos/Tpetra linear solvers implemented with MPI+Kokkos, run on clusters with Intel Xeon Phi and NVIDIA Kepler devices.
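To make the layout issue concrete, here is a plain CUDA illustration of the problem that Kokkos' compile-time polymorphic layouts solve (this is not Kokkos code): on a GPU, the structure-of-arrays form lets consecutive threads read consecutive addresses (coalesced), while the array-of-structures form strides through memory and wastes bandwidth.

    // Array-of-structures: thread i touches p[i].x, a stride-3 access pattern.
    struct ParticleAoS { float x, y, z; };

    __global__ void shiftAoS(ParticleAoS* p, float dx, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i].x += dx;            // strided, poorly coalesced
    }

    // Structure-of-arrays: thread i touches x[i], fully coalesced.
    __global__ void shiftSoA(float* x, float dx, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] += dx;              // consecutive addresses
    }

On a CPU, the preference can invert (AoS keeps a particle's fields in one cache line), which is why selecting the layout at compile time per device, as Kokkos does, matters for portability.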
 
Keywords:
Supercomputing, Numerical Algorithms & Libraries, Programming Languages & Compilers, GTC 2014 - ID S4213
Streaming:
Download:
 
GGAS: Global GPU Address Spaces for Efficient Communication in GPU Clusters
Holger Froning (University of Heidelberg)
Modern GPUs are powerful high-core-count processors, widely used to accelerate computationally intensive general-purpose tasks. For peak performance, GPUs are distributed throughout the cluster. Current solutions typically combine the bulk-synchronous task model of GPUs with message passing semantics, which significantly increases complexity and requires the CPUs to communicate among distributed GPUs. This talk presents Global GPU Address Spaces (GGAS), which span the device memories of GPUs at the cluster level for sharing and aggregation purposes. GGAS allow low-overhead synchronization and efficient data movement between GPUs and confine control flow to the GPU domain for all computation and communication tasks. Both aspects contribute to time and energy savings. In addition, GGAS maintain the GPUs' bulk-synchronous programming model by relying on a thread-collective communication model, which significantly reduces the complexity of parallel programming on distributed GPUs.
 
Keywords:
Supercomputing, Clusters & GPU Management, Programming Languages & Compilers, GTC 2014 - ID S4262
Streaming:
Download:
 
Scaling Soft Matter Physics to Thousands of GPUs in Parallel
Alan Gray (EPCC, The University of Edinburgh)
Discover how to adapt a real, complex application such that it can efficiently utilize thousands of GPUs in parallel. We describe our successes in combining CUDA with MPI to simulate a wide variety of complex fluids of key importance to everyday life. We are careful to present our work in a generalizable way, such that others can learn from our experience, follow our methodology and even re-use our highly efficient communication library. We detail our efforts to maximize both performance and maintainability, noting that we support both CPU and GPU versions (where the latter is 3.5-5 times faster comparing equal numbers of GPUs and fully-utilized CPUs). We present our work to carefully schedule and overlap lattice based operations and halo-exchange communication mechanisms, allowing excellent scaling to at least 8,192 GPUs in parallel on the Titan supercomputer.
 
Keywords:
Supercomputing, Computational Fluid Dynamics, Computational Physics, GTC 2014 - ID S4285
Streaming:
Download:
 
Harnessing the Power of Titan with the Uintah Computational Framework
Alan Humphrey (Scientific Computing and Imaging Institute, University of Utah)
This presentation will discuss how directed acyclic graph (DAG) approaches provide a powerful abstraction for solving challenging engineering problems and how, using this abstraction and DAG approach, computational frameworks such as Uintah can be extended with relative ease to efficiently leverage GPUs, even at scale. Attendees will learn how frameworks like Uintah are able to shield the application developer from the complexities of the deep memory hierarchies and multiple levels of parallelism found in heterogeneous supercomputers such as Titan.
 
Keywords:
Supercomputing, Computational Fluid Dynamics, GTC 2014 - ID S4341
Streaming:
Download:
 
GPU Cluster with Proprietary Interconnect Utilizing GPU Direct Support for RDMA
Toshihiro Hanawa (The University of Tokyo)
Learn how our proprietary interconnect works and achieves good performance on a GPU cluster using GPUDirect support for RDMA. We promote the HA-PACS project at the Center for Computational Sciences, University of Tsukuba, in order to build up the HA-PACS base cluster system as a commodity GPU cluster, and to develop an experimental system based on the Tightly Coupled Accelerators (TCA) architecture, a proprietary interconnect connecting GPUs across nodes using the GPUDirect support for RDMA mechanism. In this session, we describe the TCA architecture and the design and implementation of PEACH2, which realizes the TCA architecture using an FPGA. We also introduce APIs for the TCA cluster environment and show application performance on the TCA cluster.
 
Keywords:
Supercomputing, Clusters & GPU Management, GTC 2014 - ID S4383
Streaming:
 
Exploiting Application Scalability Using GPU Direct RDMA and Infiniband on Heterogeneous Clusters
Filippo Spiga (HPCS, University of Cambridge)
One of the main limitations on application scalability on heterogeneous clusters is the fact that, prior to any communication, data has to be transferred from device memory to host memory. NVIDIA introduced GPUDirect RDMA, which provides a direct peer-to-peer data path between GPU memory and the InfiniBand card. We recently designed a new GPU cluster to be of the highest performance and highly scalable. The system consists of 128 Ivy Bridge Intel nodes and 256 NVIDIA K20s. In order to exploit GPUDirect RDMA on every GPU, each node is equipped with two Mellanox FDR Connect-IB cards, each one sharing the same PCIe bus as one GPU. Thus the system represents the most scalable GPU cluster architecture possible today. The aim of this talk is to present improvements in application performance, showing some best practices to properly exploit GPUDirect. The pool of applications includes codes from both the scientific open-source community and industrial partners of HPCS in the domains of molecular modeling, electronic structure, and CFD.
 
Keywords:
Supercomputing, GTC 2014 - ID S4393
Streaming:
 
Attacking HIV with Petascale Molecular Dynamics Simulations on Titan and Blue Waters
James Phillips (University of Illinois)
The highly parallel molecular dynamics code NAMD was chosen in 2006 as a target application for the NSF petascale supercomputer now known as Blue Waters. NAMD was also one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007. When Blue Waters entered production in 2013, the first breakthrough it enabled was the complete atomic structure of the HIV capsid, obtained through calculations using NAMD and featured on the cover of Nature. How do the GPU-accelerated Cray XK7 Blue Waters and ORNL Titan machines compare to CPU-based platforms for a 64-million-atom virus simulation? Come learn the opportunities and pitfalls of taking GPU computing to the petascale, and the importance of CUDA 5.5 and Kepler features in combining multicore host processors and GPUs in a legacy message-driven application.
 
Keywords:
Supercomputing, Molecular Dynamics, GTC 2014 - ID S4394
Streaming:
Download:
 
Towards Real-Time Nanostructure Prediction with GPUs
Abhinav Sarje (Lawrence Berkeley National Laboratory)
Nanostructure prediction at synchrotron light sources through X-ray scattering requires compute-intensive analysis of massive amounts of data, making it an ideal example of a Big Compute and Big Data application. In this session you will learn how hybrid computing with NVIDIA graphics processors and multi-core CPUs is making faster-than-ever data analysis at light sources possible. Two major components of such analyses will be covered: (1) forward simulations, and (2) inverse modeling. Software tools developed at Berkeley Lab for this purpose will be taken as a case study. Details of the implementations and code-optimization strategies of these software tools on massively parallel GPU clusters will be given, along with performance studies on state-of-the-art supercomputers.
 
Keywords:
Supercomputing, Computational Physics, GTC 2014 - ID S4443
Streaming:
Download:
 
Harnessing Irregular Parallelism: A Case Study on Unstructured Meshes
Cliff Woolley (NVIDIA)
Traversal of unstructured meshes presents an interesting challenge for massively parallel processors such as GPUs. The problem offers abundant but irregular parallelism. Fortunately this irregular parallelism can still be harnessed to provide a speedup on GPUs. This talk presents our work on accelerating UMT2013, a benchmark that performs distributed 3D unstructured-mesh photon transport. UMT leverages both OpenMP and SIMD parallelism on CPUs, but neither by itself is sufficient to allow UMT to scale onto a GPU. Using the CPU and GPU together to detect and resolve sequential dependencies across the mesh, we can maximize parallelism.
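The talk's exact scheme is not public; one common, hedged sketch of harnessing this kind of irregular parallelism is wavefront (level) scheduling: the host groups mesh cells whose upstream dependencies are already satisfied into a level, launches one kernel per level over an indirection array, and proceeds level by level. All names below are illustrative, not UMT's code.

    // Process one dependency level: cells in the level are independent of each
    // other, so they run in parallel; levels run sequentially on the host.
    __global__ void sweepLevel(const int* __restrict__ cellsInLevel,
                               int count, float* psi)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= count) return;
        int cell = cellsInLevel[t];   // indirection: irregular but parallel
        // ... update psi[cell] from already-computed upstream neighbors ...
        psi[cell] += 0.0f;            // placeholder for the real transport update
    }

    // Host side (illustrative): d_levels[k] lists cells that become ready at step k.
    // for (int k = 0; k < numLevels; ++k)
    //     sweepLevel<<<(levelCount[k]+255)/256, 256>>>(d_levels[k], levelCount[k], d_psi);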
 
Keywords:
Supercomputing, Computational Physics, GTC 2014 - ID S4489
Streaming:
Download:
 
Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand
Dhabaleswar K. (DK) Panda (The Ohio State University)
Learn about the latest developments in the MVAPICH2 library, which simplifies the task of porting Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit, providing optimized performance on different GPU node configurations. These optimizations are integrated transparently under the standard MPI API, for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using GPUDirect RDMA, a framework for MPI datatype processing using CUDA kernels, support for heterogeneous clusters with GPU and non-GPU nodes, and more. We use the popular OSU micro-benchmark suite and example applications to demonstrate how developers can effectively take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We provide guidance on issues like processor affinity to GPU and network that can significantly affect the performance of MPI applications that use MVAPICH2.
 
Keywords:
Supercomputing, Performance Optimization, Programming Languages & Compilers, GTC 2014 - ID S4517
Streaming:
Download:
 
Accelerating HPL on Heterogeneous Clusters with NVIDIA GPUs
Dhabaleswar K. (DK) Panda (The Ohio State University)
Learn about the design and use of a hybrid High-Performance Linpack (HPL) benchmark to measure the peak performance of heterogeneous clusters with GPU and non-GPU nodes. HPL continues to be used as the yardstick for ranking supercomputers around the world. Many clusters, of different scales, are being deployed with only a subset of nodes equipped with NVIDIA GPU accelerators. Their true peak performance is not reported due to the lack of a version of HPL that can take advantage of all the CPU and GPU resources available. We discuss a simple yet elegant approach of a fine-grain weighted MPI process distribution to balance the load between CPU and GPU nodes. We use techniques like process reordering to minimize communication overheads. We use a real-world cluster, Oakley at the Ohio Supercomputer Center, to evaluate our approach. On a heterogeneous configuration with 32 GPU and 192 non-GPU nodes, we achieve up to 50% of the combined theoretical peak and up to 80% of the combined actual peak performance of the GPU and non-GPU nodes.
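To make the weighting idea concrete, here is a hedged toy calculation (not the authors' scheme): each node class receives a share of HPL's work proportional to an assumed relative speed, so a handful of fast GPU nodes can carry a disproportionate share of the matrix. The node counts mirror the abstract; the weights are made-up assumptions.

    #include <stdio.h>

    int main(void)
    {
        const int    gpuNodes = 32, cpuNodes = 192;  // mix from the abstract
        const double gpuW = 4.0,    cpuW = 1.0;      // assumed relative speeds
        double total = gpuNodes * gpuW + cpuNodes * cpuW;
        // Each class's share of the global matrix is weight-proportional.
        printf("GPU-node share of work: %.1f%%\n", 100.0 * gpuNodes * gpuW / total);
        printf("CPU-node share of work: %.1f%%\n", 100.0 * cpuNodes * cpuW / total);
        return 0;
    }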
 
Keywords:
Supercomputing, GTC 2014 - ID S4535
Streaming:
Download:
 
The Operational Impact of GPUs on ORNL's Cray XK7 Titan
Jim Rogers (Oak Ridge National Laboratory)
With a peak computational capacity of more than 27PF, Oak Ridge National Lab's Cray XK7, Titan, is currently the largest computing resource available to the US Department of Energy. Titan contains 18,688 individual compute nodes, where each node pairs one commodity x86 processor with a single NVIDIA Kepler GPU. When compared to a typical multicore solution, the ability to offload substantive amounts of work to the GPUs provides benefits with significant operational impacts. Case studies show time-to-solution and energy-to-solution that are frequently more than 5 times better than in the non-GPU-enabled case. The need to understand how effectively the Kepler GPUs are being used by these applications is met by changes to the Kepler device driver and the Cray Resource Utilization software, which now provide a mechanism for reporting valuable GPU usage metrics for scheduled work and memory use on a per-job basis.
 
Keywords:
Supercomputing, Performance Optimization, GTC 2014 - ID S4670
Streaming:
Download:
 
GPU-Accelerated Ab-Initio Simulations of Low-Pressure Turbines
Richard Sandberg (Aerodynamics and Flight Mechanics Group, University of Southampton), Vittorio Michelassi (General Electric Global Research)
Gas turbines (GT) are widely applied to aircraft propulsion and power generation. A 2 to 3% GT efficiency improvement is worth $3-6B/year of fuel, and a corresponding reduction of environmental impact, for the GE GT fleet alone. Although GT performance has improved considerably, it is now becoming increasingly difficult to make further advances with current design tools. High-fidelity Computational Fluid Dynamics (CFD) promises to accelerate this advancement by shifting from modelling to resolving flow phenomena at an unprecedented level of detail. Still, resolving all scales of turbulence present in a GT constitutes a formidable computational challenge that can only be met by algorithms that exploit the latest GPU-accelerated architectures. The talk will present the porting of a highly efficient hybrid OpenMP/MPI flow solver to Titan using OpenACC. Performance of the novel code version and initial results of GPU-accelerated ab-initio simulations at realistic engine conditions will be shown. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.
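The solver itself is not public; as a hedged, minimal illustration of the OpenACC approach used for such ports, a loop nest can be offloaded with a single directive while the surrounding MPI/OpenMP structure stays intact. The stencil, array names, and data clauses below are illustrative assumptions.

    // Minimal OpenACC sketch (not the session's solver): offload a 2D
    // smoothing stencil; copyin/copyout manage host-device data movement.
    void smooth(const float* u, float* v, int nx, int ny)
    {
        #pragma acc parallel loop collapse(2) copyin(u[0:nx*ny]) copyout(v[0:nx*ny])
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                v[j*nx + i] = 0.25f * (u[j*nx + i - 1] + u[j*nx + i + 1]
                                     + u[(j-1)*nx + i] + u[(j+1)*nx + i]);
    }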
 
Keywords:
Supercomputing, GTC 2014 - ID S4756
Streaming:
Download:
 
GPU Acceleration of the Fine/Turbo CFD Solver at the HPC Scale
David Gutzwiller (Numeca-USA)
In response to long-term trends in power consumption and cost, the GPU has become a common component in many of the world's largest supercomputers. However, industrial CFD solvers are typically not well suited for quick or effective GPU porting. Successfully utilizing the computational power of the GPU in an HPC environment requires very careful planning. This talk explores the GPU acceleration of the Fine/Turbo multi-block structured CFD solver through targeted porting of a computationally expensive module. The session will begin with an overview of the "CPUBooster" convergence acceleration module, which was chosen as a promising candidate for acceleration. The restructuring of this module with a GPU-oriented programming model and the tuning of the new implementation for optimal performance will also be explored. Further discussion will highlight the positive impact of the GPU developments from an HPC user's point of view. Recent design optimization work by Ramgen Power Systems on the ORNL Titan supercomputer will showcase the performance gains. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.
 
Keywords:
Supercomputing, GTC 2014 - ID S4757
Streaming:
Download:
 
Accelerating Three-Body MD Potentials Using NVIDIA Tesla K20X GPUs
Masako Yamada (GE Global Research)
We will give an overview of porting a three-body molecular dynamics potential from the CPU-only Jaguar supercomputer to the hybrid CPU/GPU Titan at Oak Ridge National Lab. We achieved >5x acceleration in a 1-million-molecule droplet simulation by moving the 3-body potential and neighbor lists to the Tesla K20X GPU accelerator, while keeping the time integration, thermostat/barostat, bond/angle calculations, and statistics on the AMD Opteron CPUs. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.
 
Keywords:
Supercomputing, GTC 2014 - ID S4758
Streaming:
Download:
 
High Resolution Catastrophe Modeling Using CUDA
Dag Lohmann (KatRisk)
Extreme weather and climate events are costly, dangerous, and disruptive. Our ability to estimate the current and future risk of such events is important for emergency response preparedness, climate change adaptation, relevant public policies, and insurance. After a short introduction to catastrophe modeling we will talk about high-resolution global flood risk models. With CUDA-based fluid mechanics code running on the latest generation of NVIDIA Kepler GPUs (in-house, as well as on the Oak Ridge National Lab Titan supercomputer) it is now possible to create flood maps and probabilistic flood models at 10m to 90m resolution worldwide. We will talk about specific challenges (coding, atmospheric data, terrain models, data volume, etc.) encountered during this project and will show exciting results of our simulations. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.
 
Keywords:
Supercomputing, GTC 2014 - ID S4759
Streaming:
Download:
 
Accelerating Research and Development Using the Titan Supercomputer
Fernanda Foertter (Oak Ridge National Laboratory)
The Oak Ridge Leadership Computing Facility (OLCF) has seen steady growth in allocation requests specifically to use GPU accelerators on Titan. These early adopters have been successful in accelerating their applications and time to results. In this presentation, we will explore the growth and maturation of GPU use on Titan, with particular focus on important technical considerations. Early science users from academia and industry will share their motivation and experience transitioning to running on Titan and the impact of accelerators on their research. We will also discuss the future of OLCF and forthcoming opportunities as we prepare for exascale in the next decade. Lastly, an overview of OLCF user programs will be shared, with specific information on how to apply for allocations of computing resources. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.
 
Keywords:
Supercomputing, GTC 2014 - ID S4760
Streaming:
Download:
 
System Design Considerations for GPU-Enabled HPC Solutions (Presented by Dell)
Onur Celebioglu (Dell, Inc.)
GPU-accelerated systems have become an integral component of the HPC ecosystem. GPUs provide a quantum leap in performance across a wide spectrum of HPC applications. However, to fully realize these gains, it is important to design balanced systems, and this talk will discuss different system-level considerations for various use cases. We will utilize HPL to analyze performance and power consumption at a system level, and compare today's GPUs to previous-generation GPUs, as applicable, to highlight the improvements. The goal is to provide the audience with information and best practices for designing GPU-enabled systems using Kepler GPUs, considering parameters such as power consumption, system size, and system-level features.
 
Keywords:
Supercomputing, GTC 2014 - ID S4835
Streaming:
 
HP and NVIDIA: Delivering Innovative HPC Solutions (Presented by HP)
Ed Turkel (HP)
High Performance Computing is characterized by user demand for increasing levels of performance to accomplish their science, engineering, or analytics workloads. These demands for performance growth are becoming more limited by the power, space and cost of deployment of new systems. For years, HP has partnered with NVIDIA to develop HPC solutions that are purpose-built for performance and scalability, while delivering innovative energy and space efficiency, with a focus on customer ROI. This session will showcase HP and NVIDIA's latest technologies and solutions in use today by leaders in the HPC community, plus trends for the future.
 
Keywords:
Supercomputing, Clusters & GPU Management, Scientific Visualization, GTC 2014 - ID S4879
Streaming:
Download:
 
Diving for Dollars: Liquid Cooling for Better TCO (Presented by Penguin Computing)
Philip Pokorny (Penguin Computing)
Penguin Computing is the largest private supplier of complete high performance computing (HPC) solutions in North America and has built and operates the leading specialized public HPC cloud service Penguin Computing on Demand (POD). Penguin Computing also applies its core expertise in the field of distributed large-scale enterprise computing, delivering scale-out compute, storage, virtualization, and cloud solutions for organizations looking to take advantage of modern open data center architectures. Attend this session to learn about Penguin's liquid cooling that provides significant reduction in power consumption across CPU and GPU as well as saves on OPEX, CAPEX and floor space.
 
Keywords:
Supercomputing, Clusters & GPU Management, Energy Exploration, GTC 2014 - ID S4936
Streaming:
Video & Image Processing
Presentation
Media
Topics in GPU-Based Video Processing
Thomas True (NVIDIA)
The GPU is a high-performing floating-point parallel processor with extremely high memory bandwidth. This makes it ideally suited for video and image processing applications. This tutorial will present the latest techniques for optimal GPU-based video processing.
 
Keywords:
Video & Image Processing, Performance Optimization, Media & Entertainment, Real-Time Graphics Applications, GTC 2014 - ID S4324
Streaming:
Download:
 
Detailed Overview of NVENC Encoder API
Swagat Mohapatra (NVIDIA), Abhijit Patait (NVIDIA)
This session gives a detailed overview of the NVENC encoder interface. The session will focus on the software aspects of NVENC and will include a detailed tutorial on how to correctly use the encoder interface to take advantage of the hardware encoder, covering the steps to create a hardware encoder session and to use the encoder asynchronously to encode frames. The second part of the tutorial will focus on the various knobs provided in the interface to control the performance and quality of the encoder session.
 
Keywords:
Video & Image Processing, Media & Entertainment, GTC 2014 - ID S4654
Streaming:
Download:
 
SIFT Descriptor Extraction on the GPU for Large-Scale Video Analysis
Hannes Fassold (Joanneum Research)
Learn how the analysis of large-scale video data sets can be greatly accelerated by taking advantage of the power of GPUs. Due to their robustness, SIFT (Scale-Invariant Feature Transform) descriptors are very popular for all sorts of video analysis tasks. In this talk, we will first present an efficient GPU implementation of an interest point detector (e.g., using the DoG or LoG operator) and the extraction of SIFT descriptors around these interest points. We will compare the GPU implementation with the reference CPU implementation from the HessSIFT library in terms of runtime and quality. Furthermore, we will talk about the benefits of GPU-accelerated SIFT descriptors for applications such as near-duplicate video detection, which aims at detecting almost identical video segments in large video data sets, or linking video segments by shooting location or salient objects.
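As a hedged illustration of the detector stage mentioned above (not the authors' code): the DoG operator subtracts two adjacent Gaussian scale levels of the image, and interest point candidates are the extrema of this response. The blurring itself (a separable Gaussian convolution) is omitted here.

    // Difference-of-Gaussians: element-wise subtraction of two pre-blurred
    // scale levels, one thread per pixel. Buffer names are illustrative.
    __global__ void dogKernel(const float* __restrict__ blurA,
                              const float* __restrict__ blurB,
                              float* dog, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            dog[i] = blurA[i] - blurB[i];  // extrema here are keypoint candidates
    }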
 
Keywords:
Video & Image Processing, Computer Vision, Media & Entertainment, GTC 2014 - ID S4147
Streaming:
Download:
 
Full GPU Image Processing Pipeline for Camera Applications
Fyodor Serzhenko (Fastvideo)
The goal of this session is to demonstrate how to combine fast performance and high quality for a full image processing pipeline on the GPU for camera applications in real time. In this session we will present: a detailed analysis of the GPU image processing pipeline for cameras and its constituent parts (dark frame subtraction, flat-field correction, PRNU, white balance, demosaicing, ICC profiling and color management, output via OpenGL, compression to JPEG) and their suitability for the GPU architecture; an analysis of achieved results and comparison with existing implementations; and applications to machine vision, broadcasting, and high speed imaging.
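To give a feel for why these stages suit the GPU, here is a hedged sketch (not the session's code) of the first two: dark-frame subtraction and flat-field correction are purely per-pixel, so one thread per pixel saturates memory bandwidth. The normalized gain map and buffer names are illustrative assumptions.

    // Dark-frame subtraction followed by flat-field correction on raw sensor
    // data; flat holds a normalized per-pixel gain map.
    __global__ void darkFlatCorrect(const float* __restrict__ raw,
                                    const float* __restrict__ dark,
                                    const float* __restrict__ flat,
                                    float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = (raw[i] - dark[i]) * flat[i];
    }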
 
Keywords:
Video & Image Processing, Computer Vision, Mobile Applications, GTC 2014 - ID S4151
Streaming:
Download:
 
Pronunciation Assistance Based on Automatic Speech and Facial Recognition
Maria Pantoja (Santa Clara University)
Learn how to create, develop, and implement an L2 learning assistance instructional e-tool (desktop plus mobile application) capable of assessing a student's pronunciation and providing accurate corrective feedback. The model presented integrates speech and image recognition technology capable of quantifying and analyzing the learner's input and providing evaluative feedback, plus data to evaluate the model's performance. The image/audio analysis and the expert system needed to provide recommendations to students run on the GPU to allow for fast feedback to students.
 
Keywords:
Video & Image Processing, Computer Vision, Mobile Applications, GTC 2014 - ID S4267
Streaming:
Download:
 
Optimization Opportunities and Pitfalls when Implementing High Performance 2D Convolutions
Ian Wainwright (High Performance Consulting Sweden)
Learn how to develop high-performance 2D convolutions using Kepler-specific features such as warp shuffle and __restrict__ pointers. Alternative strategies, such as FFT-based and shared memory-based implementations, and their disadvantages will also be presented.
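A hedged sketch of one of the named features (not the speaker's code): marking read-only inputs with const __restrict__ lets the Kepler compiler route them through the read-only data cache. A full implementation would additionally tile with shared memory or exchange neighbor pixels via warp shuffles; the filter size and names here are illustrative.

    #define RADIUS 3  // assumed 7-tap filter for illustration

    __global__ void rowConvolve(const float* __restrict__ in,
                                const float* __restrict__ filt,  // 2*RADIUS+1 taps
                                float* out, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < RADIUS || x >= width - RADIUS || y >= height) return;
        float acc = 0.0f;
        #pragma unroll                      // fully unroll the fixed-size tap loop
        for (int k = -RADIUS; k <= RADIUS; ++k)
            acc += filt[k + RADIUS] * in[y * width + x + k];
        out[y * width + x] = acc;
    }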
 
Keywords:
Video & Image Processing, Medical Imaging & Visualization, Signal & Audio Processing, GTC 2014 - ID S4297
Streaming:
Download:
 
Object Tracking Under Nonuniform Illumination Conditions
Kenia Picos (CITEDI-IPN)
The goal of this session is to demonstrate the performance of object tracking with correlation filtering on nonuniformly illuminated scenes. For this work, there are two fundamental limiters to kernel performance: memory usage and processed frames per second. In this session we will describe the source code used for the image processing and correlation techniques. Concepts will be illustrated with an example of object recognition and tracking using ArrayFire, a new-generation image processing library that brings sequential algorithms onto highly parallel GPU and multicore architectures.
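ArrayFire's internals aside, the core of correlation filtering can be sketched in raw CUDA with cuFFT (a hedged illustration, not the session's implementation): correlate by multiplying the scene spectrum with the conjugate of the filter spectrum, then inverse-transform and locate the peak. Buffer names and sizes are illustrative.

    #include <cufft.h>
    #include <cuda_runtime.h>

    // Frequency-domain correlation: out = scene_spectrum * conj(filter_spectrum).
    __global__ void conjMultiply(const cufftComplex* scene,
                                 const cufftComplex* filt,
                                 cufftComplex* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        cufftComplex a = scene[i], b = filt[i];
        out[i].x = a.x * b.x + a.y * b.y;   // Re(a * conj(b))
        out[i].y = a.y * b.x - a.x * b.y;   // Im(a * conj(b))
    }

    // Host side (error handling omitted):
    //   cufftHandle plan;  cufftPlan2d(&plan, height, width, CUFFT_C2C);
    //   cufftExecC2C(plan, d_scene, d_scene, CUFFT_FORWARD);
    //   cufftExecC2C(plan, d_filt,  d_filt,  CUFFT_FORWARD);
    //   conjMultiply<<<(n + 255) / 256, 256>>>(d_scene, d_filt, d_corr, n);
    //   cufftExecC2C(plan, d_corr, d_corr, CUFFT_INVERSE);
    // The location of the correlation peak gives the tracked object's position.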
 
Keywords:
Video & Image Processing, Computer Vision, GTC 2014 - ID S4328
Streaming:
Download:
 
High Performance 2D Convolution and Block Matching on the GPU
Brant Zhao (NVIDIA)
2D convolution is the most basic algorithm in image processing, while 2D block matching (BM) can be found in many application areas such as stereo, motion estimation, and video compression. In this work, we will present their high-performance implementations on the GPU. The optimized versions are not only of great use in real applications, but also expose several common performance problems encountered by many applications: a low compute/bytes ratio (2D convolution), a massive number of compute operations (BM), and low throughput of specific instructions (IMAD and ISAD). On GK208, for the 2D convolution case, the optimized version is math-limited and has achieved 88% of peak performance for a general filter window size. For the BM case, its peak performance is 1.7x faster than the ISAD version and our implementation achieves 85% of peak performance.
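For context on the ISAD reference above, a hedged sketch of naive block matching (not the speakers' optimized code): the sum of absolute differences (SAD) between a reference block and each candidate accumulates through the CUDA __sad intrinsic, |x - y| + acc, which maps to the ISAD instruction discussed in the session. Block size and addressing are illustrative.

    #define B 8  // assumed 8x8 blocks for illustration

    __global__ void sadSearch(const unsigned char* __restrict__ ref,
                              const unsigned char* __restrict__ cand,
                              unsigned int* sadOut, int width, int numCandidates)
    {
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        if (c >= numCandidates) return;
        const unsigned char* p = cand + c;   // candidate block origin (illustrative)
        unsigned int acc = 0;
        for (int y = 0; y < B; ++y)
            for (int x = 0; x < B; ++x)
                acc = __sad((int)ref[y * width + x], (int)p[y * width + x], acc);
        sadOut[c] = acc;                     // best match = candidate with min SAD
    }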
 
Keywords:
Video & Image Processing, Computer Vision, Media & Entertainment, Mobile Applications, GTC 2014 - ID S4334
Streaming:
 
Optical Character Recognition with GPUs: Document Processing Throughput Increased by a Magnitude
Jeremy Reed (University of Kentucky)
Learn how an OCR engine, built from scratch for the GPU, enables businesses to turn document images into searchable, editable text several orders of magnitude faster than is possible with currently available commercial software. Several case studies will be presented outlining the cost and technical benefits and use cases of the technology before diving deeper into the technical details of the software itself. A demo of the software will also be given.
 
Keywords:
Video & Image Processing, Big Data Analytics & Data Algorithms, GTC 2014 - ID S4345
Streaming:
Download:
 
High Performance Edge-Preserving Filter on GPU
Jonas Li (NVIDIA)
The goal of this session is to show the GPU implementation of a novel approach for performing high-quality edge-preserving filtering of images and videos in real time. A variety of effects can be achieved with this filter, including edge-preserving filtering, depth-of-field effects, and stylization. We developed a CUDA-based high-performance GPU implementation of the edge-preserving filter. In this session, we will present our efforts to address some of the challenges of optimizing its performance on the GPU, touching on issues such as a highly dependent workload, warp synchronization, divergent memory access, and transposed data storage. With these optimizations applied, the GPU implementation can filter 256 megapixels of color images per second on a Tesla K20c card.
 
Keywords:
Video & Image Processing, Performance Optimization, Mobile Applications, GTC 2014 - ID S4355
Streaming:
Download:
 
A Parallel GPU Solution to the Maximal Clique E