GTC On-Demand

GTC On-Demand Featured Talks

GPU computing is a transformational force in high-performance computing, enabling developers, engineers, programmers, and researchers across a myriad of industry verticals, as well as academia, to accelerate research and mission-critical applications. See our featured sessions highlighting some of our best talks, or delve headlong into the many other keynotes, technical sessions, presentations, research posters, webinars, and tutorials we make available to you at any time on GTC On-Demand.

Astronomy & Astrophysics
Shooting for the Stars with GPUs
Hatem Ltaief (KAUST), Damien Gratadour (Université Paris Diderot & LESIA, Observatoire de Paris)
Come and learn how GPUs can help discover the most distant galaxies by performing close to real-time simulations, at an unprecedented scale, of the multi-object adaptive optics (MOAO) technique. The European Southern Observatory (ESO) is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m-diameter telescope, to provide Europe with the biggest eye on the universe ever built. MOAO is the most complex adaptive optics concept proposed for the E-ELT, and simulating the instrument at full scale is extremely compute-intensive. The tomographic reconstructor (TR) is one of the core components of both the design simulations and, eventually, system operations, and it requires the inversion of a large dense covariance matrix.
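As a hedged illustration of the kind of dense linear algebra such a reconstructor relies on, the sketch below factorizes a symmetric positive-definite covariance matrix on the GPU with cuSOLVER; the function name and sizes are illustrative assumptions, not the speakers' pipeline.

```cuda
#include <cusolverDn.h>
#include <cuda_runtime.h>

// Dense Cholesky factorization of a covariance matrix with cuSOLVER; the
// inverse can then be applied via triangular solves (cusolverDnSpotrs).
void factor_covariance(float *d_C, int n)           // d_C: n x n, column-major, on the GPU
{
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    int lwork = 0;
    cusolverDnSpotrf_bufferSize(handle, CUBLAS_FILL_MODE_LOWER, n, d_C, n, &lwork);

    float *d_work; int *d_info;
    cudaMalloc(&d_work, lwork * sizeof(float));
    cudaMalloc(&d_info, sizeof(int));

    // In-place lower Cholesky factor: C = L * L^T
    cusolverDnSpotrf(handle, CUBLAS_FILL_MODE_LOWER, n, d_C, n, d_work, lwork, d_info);

    cudaFree(d_work);
    cudaFree(d_info);
    cusolverDnDestroy(handle);
}
```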
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID S5122
Streaming:
 
GPU-Accelerated Image Processing for NASA's Solar Dynamics Observatory
Mark Cheung (Lockheed Martin Solar & Astrophysics Laboratory)
Since its launch in 2010, NASA's Solar Dynamics Observatory (SDO) has continuously monitored the Sun's changes in magnetic activity. Both the Atmospheric Imaging Assembly (AIA) and Helioseismic & Magnetic Imager (HMI) instruments onboard SDO deliver 4096x4096 pixel images at a cadence of more than one image per second. Although SDO images are free from distortion by absorption and scattering in the Earth's atmosphere, images are still blurred by the intrinsic point spread functions of the telescopes. In this presentation, we show how the instrument teams have deployed CUDA-enabled GPUs to perform deconvolution of SDO images. The presentation will demonstrate how we leveraged cuFFT and Thrust to implement an efficient image processing pipeline.
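A minimal sketch of a frequency-domain deconvolution of the kind described, using cuFFT for the transforms and a simple regularized division by the optical transfer function; the filter, the eps regularizer, and the kernel name are assumptions for illustration, not the instrument teams' actual pipeline.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>

#define NX 4096
#define NY 4096

// Pointwise regularized division by the optical transfer function (the FFT of the PSF).
__global__ void divide_by_otf(cufftComplex *img, const cufftComplex *otf,
                              int n, float eps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    cufftComplex a = img[i], b = otf[i];
    float denom = b.x * b.x + b.y * b.y + eps;       // |OTF|^2 + eps
    img[i].x = (a.x * b.x + a.y * b.y) / denom;      // a * conj(b) / denom
    img[i].y = (a.y * b.x - a.x * b.y) / denom;
}

void deconvolve(cufftComplex *d_img, cufftComplex *d_otf)
{
    cufftHandle plan;
    cufftPlan2d(&plan, NX, NY, CUFFT_C2C);
    cufftExecC2C(plan, d_img, d_img, CUFFT_FORWARD);     // image -> frequency domain
    int n = NX * NY;
    divide_by_otf<<<(n + 255) / 256, 256>>>(d_img, d_otf, n, 1e-3f);
    cufftExecC2C(plan, d_img, d_img, CUFFT_INVERSE);     // back to image space
    // cuFFT leaves the inverse unnormalized; scale by 1/(NX*NY) afterwards.
    cufftDestroy(plan);
}
```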
 
Keywords:
Astronomy & Astrophysics, Video & Image Processing, GTC 2015 - ID S5209
Streaming:
 
Embedded Supercomputing: Radio Astronomy at the Limit
Simon Ratcliffe (SKA South Africa)
Radio astronomy imaging is a complex, compute- and memory-intensive problem that is dominating the cost of next-generation radio facilities. Using the MeerKAT telescope, currently under construction in South Africa, as a primer, we describe the development of a highly parallel, low-power, low-cost imager using System-on-Chip devices. In particular, NVIDIA's TK1 and its successors are considered. The talk will also briefly describe the opportunities and solutions presented by the forthcoming Square Kilometre Array, whose processing costs require game-changing technology shifts to become achievable.
 
Keywords:
Astronomy & Astrophysics, Embedded Systems, GTC 2015 - ID S5222
Streaming:
Download:
 
Taranis: Ray-Traced Radiative Transfer in Smoothed Particle Hydrodynamics
Sam Thomson (University of Edinburgh)
We introduce Taranis, a library for performing ray-traced radiation transport in smoothed particle hydrodynamics (SPH) entirely on the GPU. We discuss the design, algorithm, and key optimizations (such as ray packets) for our use case. Taranis is motivated by the current intractability of coupled radiation-hydrodynamics simulations. This talk focuses on Taranis' tracing component, which has been influenced by recent work in computer graphics. It outperforms a 32-core CPU code on a single GPU. Our scheme allows particles to be updated independently and requires fewer rays than a typical 'long characteristics' method. Taranis' radiation transport solver is also implemented on the GPU and targets large-scale simulations of reionization. However, the tracing API exists as a standalone entity.
 
Keywords:
Astronomy & Astrophysics, Computational Physics, Rendering & Ray Tracing, GTC 2015 - ID S5266
Streaming:
Download:
 
Optimization of GPU-Based Signal Processing of Radio Telescopes
Vinay Deshpande (NVIDIA)
We present a summary of optimization work on a GPU-based correlator pipeline code. This is an ongoing joint effort between the National Centre for Radio Astrophysics (NCRA) and NVIDIA. The central goal of the effort is to upgrade the Giant Metrewave Radio Telescope (GMRT) receiver with a wide-band, GPU-based back-end and to extend this design as a proposed back-end for the LOW-frequency array of the SKA telescope. We look at the various processing stages in the pipeline to explore optimization possibilities, with some interesting results already achieved.
 
Keywords:
Astronomy & Astrophysics, GTC 2015 - ID S5302
Streaming:
Download:
 
Statistics of the Universe: Exa-Calculations and Cosmology's Data Deluge
Matthew Bellis (Siena College), Deborah Bard (SLAC National Accelerator Laboratory)
Learn how to use GPUs on the desktop to study the structure and evolution of the Universe: how galaxies are pulled together by gravity, and how space expands under the influence of Dark Energy. Metrics used to describe this structure are the two- and three-point correlation functions, which quantify the clustering of galaxies. Cosmological datasets can number in the millions (and soon billions) of galaxies, making these O(N^2) and O(N^3) metrics computationally challenging. This talk will detail how we have ported solutions to the GPU. In particular, we focus on the novel histogramming bottlenecks inherent in these calculations, and how they can be mitigated. Throughout, we will emphasise how GPUs and heterogeneous computing can be used for everyday data analysis with large datasets.
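To make the histogramming bottleneck concrete, here is a hedged sketch of a brute-force two-point pair count that bins galaxy separations into a per-block shared-memory histogram before merging into global memory; the kernel name, the linear binning, and the launch configuration are illustrative assumptions, not the speakers' implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Brute-force pair counting: each thread pairs one galaxy with all later
// galaxies and bins the separation. Per-block shared-memory histograms keep
// most atomics off global memory, one way the histogramming cost can be cut.
__global__ void pair_count(const float3 *gal, int n,
                           unsigned long long *hist, int nbins, float rmax)
{
    extern __shared__ unsigned long long sh[];            // per-block histogram
    for (int b = threadIdx.x; b < nbins; b += blockDim.x) sh[b] = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float3 a = gal[i];
        for (int j = i + 1; j < n; ++j) {
            float dx = a.x - gal[j].x, dy = a.y - gal[j].y, dz = a.z - gal[j].z;
            float r = sqrtf(dx * dx + dy * dy + dz * dz);
            int bin = (int)(r / rmax * nbins);
            if (bin < nbins) atomicAdd(&sh[bin], 1ULL);    // cheap shared-memory atomic
        }
    }
    __syncthreads();
    for (int b = threadIdx.x; b < nbins; b += blockDim.x)
        atomicAdd(&hist[b], sh[b]);                        // one global update per bin per block
}

// Launch sketch:
//   pair_count<<<(n + 255) / 256, 256,
//                nbins * sizeof(unsigned long long)>>>(d_gal, n, d_hist, nbins, rmax);
```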
 
Keywords:
Astronomy & Astrophysics, Computational Physics, GTC 2015 - ID S5509
Streaming:
Download:
 
The Ramses Code for Numerical Astrophysics: Toward Full GPU Enabling
Claudio Gheller (ETHZ CSCS)
The evolution of the universe is an extraordinarily fascinating and, of course, complex problem. Scientists use the most advanced simulation codes to try to describe and understand the origin and the behavior of the incredible variety of objects that populate it: stars, galaxies, black holes. The most powerful computing systems are required to pursue such goals, and GPUs represent an outstanding opportunity. In this talk, we present one of these codes, Ramses, and the ongoing work to enable it to efficiently exploit GPUs through the adoption of the OpenACC programming model. The most recent achievements will be shown, together with some of the scientific challenges GPUs can help address.
 
Keywords:
Astronomy & Astrophysics, OpenACC, Computational Physics, Supercomputing, GTC 2015 - ID S5531
Streaming:
Download:
 
Pulsar Hunting with the Square Kilometre Array
Ewan Barr (Swinburne University of Technology)
In this talk I will give an introduction to the biggest of the upcoming big-data science facilities, the Square Kilometre Array radio telescope (SKA), and will look at how GPUs will enable this instrument to discover exotic, rapidly spinning radio pulsars. Radio pulsars provide us with phenomenal tools with which we may probe the most extreme environments in the Universe. More massive than our Sun, yet spinning faster than a kitchen blender and sending jets of radio waves out from their magnetic poles, these exotic cosmic lighthouses are key to understanding gravity and allow us to ask the question: was Einstein right? To answer this question we must use the SKA to scour the Galaxy in search of exotic pulsar binary systems. This task is extremely computationally expensive, requiring the execution of many billions of Fourier transforms. Here I will review the work being done to leverage the power of GPUs to solve the SKA's pulsar-searching challenge.
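Because the search boils down to enormous numbers of Fourier transforms, a hedged sketch of how batches of de-dispersed time series might be transformed with a single cuFFT plan is shown below; the function name and sizes are illustrative, not the SKA pipeline.

```cuda
#include <cufft.h>

// Batched 1D real-to-complex FFTs over many de-dispersed time series, the kind
// of bulk Fourier-transform work a periodicity search needs.
void batched_fft(cufftReal *d_series, cufftComplex *d_spectra,
                 int nsamples, int nbatch)
{
    cufftHandle plan;
    int n[1] = { nsamples };
    // NULL embed pointers mean tightly packed input and output batches.
    cufftPlanMany(&plan, 1, n,
                  NULL, 1, nsamples,             // input layout: one series after another
                  NULL, 1, nsamples / 2 + 1,     // output: nsamples/2+1 complex bins per series
                  CUFFT_R2C, nbatch);
    cufftExecR2C(plan, d_series, d_spectra);
    cufftDestroy(plan);
}
```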
 
Keywords:
Astronomy & Astrophysics, Big Data Analytics, Supercomputing, GTC 2015 - ID S5875
Streaming:
Download:
Augmented Reality & Virtual Reality
The Future of Human Vision: Preferential Augmentation Using GPUs
Muhammad Shamim (Baylor College of Medicine)
Loss of vision can result from an enormous number of visual disorders, a small subset of which can be addressed using traditional corrective lenses, i.e., by transforming light in accordance with Snell's law of refraction. In principle, a more general class of transformations might help address a broader range of disorders. Discover how GPUs are being used in augmented reality applications to correct or alleviate vision deterioration in real time, as well as to personalize vision in novel ways.
 
Keywords:
Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, Medical Imaging, Video & Image Processing, GTC 2015 - ID S5182
Streaming:
Download:
 
Accelerating Computer Vision and Augmented Reality via GPGPU Computing
Jack Dashwood (Metaio)
It is no secret that augmented reality is a computationally intensive endeavor. While human sight is taken for granted by the average person, getting a computer to "see" in a way that remotely resembles our expectations is an extremely complex challenge. Tasks such as extracting significant features from a camera feed, estimating a camera pose, and finally rendering digital content appropriately demand a huge amount of processing power from the CPU of today's mobile devices. Compounding this problem is the increasing interest in 3D object tracking and depth-sensing cameras. Metaio CEO Dr. Thomas Alt will illustrate the current "CV bottlenecks" and how GPU-based solutions can significantly improve the increasingly important mobile computer vision and augmented reality apps coming to market.
 
Keywords:
Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID S5626
Streaming:
Download:
 
VR Direct: How NVIDIA Technology Is Improving The VR Experience
Nathan Reed (NVIDIA), Dario L. Sancho Pradel (Crytek)
Virtual reality is the next frontier of gaming, and NVIDIA is leading the way by introducing VR Direct, a set of hardware and software technologies we're creating to cut down graphics latency and accelerate stereo rendering performance. In this talk, we'll show how developers can use NVIDIA GPUs and VR Direct to improve the gaming experience on the Oculus Rift and other VR headsets.
 
Keywords:
Augmented Reality & Virtual Reality, Game Development, Real-Time Graphics, GTC 2015 - ID S5668
Streaming:
Download:
 
Augmented Reality with Google's Project Tango and NVIDIA Technology
Wil Braithwaite (NVIDIA)
This talk presents a system for the visualization of professional graphics, such as ray tracing, on a low-latency device, such as a head-mounted display or tablet. I will describe the issues encountered and the algorithms used. The example I will demonstrate showcases the NVIDIA® VCA cluster for cloud-based rendering, NVENC for low-latency video encoding, and Google's Project Tango with the Tegra K1 processor for pose tracking and video decoding. The demo system presented can also serve graphics to multiple low-latency devices, such as a virtual reality HMD, at a rate much faster than the graphics are rendered.
 
Keywords:
Augmented Reality & Virtual Reality, Media & Entertainment, Real-Time Graphics, GTC 2015 - ID S5733
Streaming:
 
VR Everywhere: Consumer Virtual Reality for Desktop, Mobile and Web
Tony Parisi (Third Eye)
Virtual reality has taken the computer industry by storm. Developers, artists, end users, educators, advertisers and retailers are flocking by the thousands to realize the decades-long dream of virtual reality for the masses. The combination of GPU acceleration and cheap sensors has enabled low-cost, consumer-grade VR, and the rapid adoption of software development kits is paving the way for creating virtual reality apps on platforms from desktops to smartphones, and even running in your web browser using WebGL. Join VR pioneer and WebGL developer Tony Parisi as he explores this exciting frontier. This session will take a look at the latest VR hardware devices, supported operating systems and software development kits, and the wide range of applications already being deployed.
 
Keywords:
Augmented Reality & Virtual Reality, Developer - Tools & Libraries, Real-Time Graphics, GTC 2015 - ID S5737
Streaming:
Download:
Automotive
Vision-Based Driver Assistance: Seeing the Way Forward
Ian Riches (Strategy Analytics)
This market introduction to vision-based solutions in advanced driver assistance systems will highlight the regions, applications and vehicle sectors that are driving the growth. Current and likely future architectures will be explored, and the implications for both traditional and non-traditional automotive suppliers will be highlighted. Finally, the role and implications of automated driving will be investigated and analyzed.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5108
Streaming:
Download:
 
Through the Eyes of a Car: Visualizing a Car's Camera System
Gernot Ziegler (NVIDIA)
Learn how the GPU's real-time graphics capabilities can be used to interactively visualize and enhance the camera system of modern cars. The GPU simplifies design, interactive calibration and testing of the car's computer vision systems, and even allows for creating simulated environments where the behavior of the car's computer vision can be tested to pass standard safety tests or navigational street situations.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Real-Time Graphics, GTC 2015 - ID S5123
Streaming:
Download:
 
Rapidly Prototyping Automotive User Experiences at Jaguar Land Rover
Matt Jones (Jaguar Land Rover)
Learn how Jaguar Land Rover is using the power of the GPU to design, create and test next-generation user interfaces for cars.
 
Keywords:
Automotive, Embedded Systems, Manufacturing, Real-Time Graphics, GTC 2015 - ID S5137
Streaming:
 
Next Generation Surround-View for Cars
Miguel Sainz (NVIDIA), Timo Stich (NVIDIA)
A robust proof-of-concept Surround-Vision and Top-View system for cars includes four car-mounted cameras as inputs and the Jetson Pro platform as the computation and display unit, relying on CUDA and OpenGL for both GPGPU and rendering of the final views. Topics covered will include the placement and calibration of the cameras, color correction, and data preprocessing. A technical deep dive will highlight common visual artefacts in Top-View visualizations and present the algorithmic building blocks to correct those errors.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Real-Time Graphics, GTC 2015 - ID S5295
Streaming:
Download:
 
Pimp My Ride: How to Mod Cars with Tegra
Dave Anderson (NVIDIA)
Tapping into in-vehicle architectures for infotainment and driver information applications is a huge challenge. We will examine several production cars as examples and provide insight into how NVIDIA automotive Tegra processors can be retrofitted into these cars as a proof of concept for next-generation digital clusters and infotainment systems.
 
Keywords:
Automotive, Embedded Systems, Video & Image Processing, GTC 2015 - ID S5396
Streaming:
 
Enabling Next-Gen Vehicle Architectures with Embedded Supercomputing
Uday Pitambare (Delphi)
The evolution of GPU-accelerated computing is enabling us to rethink vehicle architecture in ways previously believed infeasible. We will see how Delphi's signature Integrated Cockpit and Multi-domain controller projects now leverage parallel computing to up-integrate traditionally disparate vehicle systems. We will also discuss the advantages and challenges involved in this process.
 
Keywords:
Automotive, GTC 2015 - ID S5469
Streaming:
Download:
 
Safe and Seamless Integration of Tegra into the In-Vehicle Network
Stefaan Sonck Thiebaut (OpenSynergy)
Virtualization is playing a more important role in the development of in-vehicle systems. Users of the NVIDIA Vibrante SDK/PDK can use OpenSynergy's integrated automotive solution to realize CAN communication and AUTOSAR compliance within the timing and safety constraints required by the automotive industry. In addition, learn how the solution allows controlled communication between virtualized operating systems and the vehicle networks while maintaining the isolation between both.
 
Keywords:
Automotive, GTC 2015 - ID S5532
Streaming:
Download:
 
Benchmarking Real-World In-Vehicle Applications
Michael Carstens-Behrens (mycable GmbH)
Learn how to perform a critical use-case analysis to ensure your high-end embedded system provides the required application-specific performance. Typical GPU and CPU benchmarks return performance values under optimized conditions, but real-world applications, such as infotainment systems, will find the bottlenecks in your system. Find them before the project fails, or find options to transfer tasks to the GPU (e.g., using CUDA). Attendees will see how to transform a system architecture into a "System Resource Model", find the "Critical Use Cases" of the application, and match them with this model. This practical approach will show how to set up benchmarks in parallel to emulate use cases under reproducible conditions, based on an example automotive infotainment system.
 
Keywords:
Automotive, Embedded Systems, Developer - Performance Optimization, GTC 2015 - ID S5587
Streaming:
Download:
 
Self-Driving Vehicles: Changing the Mission of Human-Machine Interface
Walter Sullivan (Elektrobit)
Highly connected vehicles have clear implications on various aspects of the driver-vehicle interaction. The HMI design will be influenced by the high load of information that will be put on the driver. How can information best be presented? How can it be selected? Is the idea of a workload manager still relevant? On the other hand, autonomous driving brings new challenges for the vigilance and distraction of the driver. How can the driver be pulled back into the loop when required? When is it required? How can drivers be informed about the limits of the machine? We will also discuss methods on how to "measure" HMI and driving performance in automation, such as steering wheel reversal rate, standard deviation of lane position, speed keeping and more.
 
Keywords:
Automotive, Augmented Reality & Virtual Reality, GTC 2015 - ID S5588
Streaming:
Download:
 
Gesture Recognition: Using a Multi Sensor Approach
Shalini Gupta (NVIDIA)
For accurate and power-efficient in-vehicle hand-gesture recognition, a novel multi-sensor system combines a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions. The radar and depth sensors are jointly calibrated, and a convolutional deep neural network fuses data from the multiple sensors to classify the gestures. This algorithm accurately recognizes 10 different gestures acquired indoors and outdoors in a car, during the day and at night, while consuming significantly less power than purely vision-based systems.
 
Keywords:
Automotive, Computer Vision & Machine Vision, GTC 2015 - ID S5599
Streaming:
Download:
 
Robust Speech Recognition for Cars
Ian Lane (Carnegie Mellon University)
One aspect of speech recognition work at Carnegie Mellon University is specifically focused on noise-robust speech recognition for automotive environments. By combining state-of-the-art methods in deep learning with GPU-accelerated embedded hardware, we are able to significantly improve the performance of speech recognition, even in challenging noise conditions.
 
Keywords:
Automotive, Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5633
Streaming:
 
ZFAS - The Brain of Piloted Driving at Audi
Matthias Rudolph (Audi AG)
During the last several years, Audi has developed with partners a platform that enables piloted driving and piloted parking. At CES 2015 it was shown that the system can drive piloted on the highway from Silicon Valley to Las Vegas. The computational platform, or brain, of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and the motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5637
Streaming:
Download:
 
The Fast Lane from Silicon Valley to Munich
Uwe Higgen (BMW Group)
Learn how the BMW Group Technology Office in Silicon Valley integrates with the automaker's world-wide research and development departments, with a specific focus on an active safety system running on NVIDIA hardware, recently developed for the i3. As one of the first automakers to open a research and development office in Silicon Valley, BMW has a long history of innovation in the Bay Area. Projects range from series vehicles to Formula 1, and from research to pre-development, including the iDrive interface, Apps4Automotive, the all-electric Mini-E, and Head-Up Displays.
 
Keywords:
Automotive, Embedded Systems, Computer Vision & Machine Vision, GTC 2015 - ID S5789
Streaming:
Download:
 
Audi Piloted Driving: In the Fast Lane to the Future
Daniel Lipinski (Audi of America)
On the eve of CES 2015, Audi, ERL and VW Group Research accomplished the most dynamic automated driving road test yet, with non-engineers behind the wheel for more than 550 miles on public freeways. With the advanced Highway Pilot technology built into a car nicknamed "Jack", Audi demonstrated how far automated driving technology has matured within the last decade. What enabled such complex technology is the massive growth in processing power, a field in which NVIDIA processors will be playing a central role in the future.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5870
Streaming:
Download:
 
Ubiquitous Perceptive 3D Sensing for a Smart Internet of Things
Louay Eldada (Quanergy Systems, Inc.)
Innovations in perceptive smart sensors comprising solid-state 3D LiDARs and GPUs with artificial intelligence software have reached a cost level that allows them to be deployed ubiquitously, supporting a smart Internet of Things (IoT). These smart sensors provide real-time information on billions of 'Things' and their surroundings (through 3D object detection, tracking, and classification) and, when needed, the ability to control them. The 'Things' include vehicles, infrastructure, buildings, homes, appliances, light controls, thermostats, medical devices, computers, and handheld devices. Growth of personal devices (phones, tablets, laptops, game consoles) is limited by the number of people in the world. The largest growth will come from connected devices in areas such as smart energy, home automation, and transportation.
 
Keywords:
Automotive, Computer Vision & Machine Vision, Machine Learning & Deep Learning, GTC 2015 - ID S5918
Streaming:
 
Electronics & APIs: The Aftermarket's new Bondo
John Waraniak (Specialty Equipment Market Association (SEMA)), John Ellis (Ellis & Associates)
As the automotive industry relies on electronics and software for more and more active safety capabilities, how does a software or electronics company deliver its exciting value while ensuring that what it delivers doesn't "break" the vehicle? Drawing heavily on the Vehicle Dynamics Program, the Specialty Equipment Market Association (SEMA) has developed the Vehicle Electronics Program to ensure that the next generation of in-car electronics realizes its full potential. Learn about this new program, including the new proposed federal motor vehicle standard, FMVSS 150. In addition, we'll cover the resources and opportunities available to developers for designing and customizing vehicles.
 
Keywords:
Automotive, Product Design & Styling, GTC 2015 - ID S5545
Streaming:
Download:
Big Data Analytics
From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA
Paul Richmond (University of Sheffield)
Complex systems are prevalent throughout various levels of biology, from the molecular and cellular scales through to populations of interacting individuals. This talk discusses how a formal state-based representation of agents within a complex system can be simulated and visualized at large scales using the open-source FLAME GPU framework. Methods of code generation from XML documents and the use of CUDA streams for heterogeneous state execution are presented. Examples include cellular tissue modelling and large-scale crowd dynamics.
 
Keywords:
Big Data Analytics, Developer - Tools & Libraries, Life & Material Science, GTC 2015 - ID S5133
Streaming:
Download:
 
Coordinating More Than 3 Million CUDA Threads for Social Network Analysis
Adam McLaughlin (Georgia Institute of Technology)
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs, as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near-linear speedup and performance exceeding tens of GTEPS when running BC on 192 GPUs.
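Betweenness centrality is built on repeated shortest-path traversals; as a hedged illustration of the level-synchronous building block involved, here is a simple vertex-parallel BFS step in CUDA. The CSR arrays, kernel name, and host loop are illustrative assumptions, not the speakers' hybrid implementations.

```cuda
#include <cuda_runtime.h>

// One level of a level-synchronous BFS over a CSR graph: every vertex on the
// current frontier (dist == level) relaxes its neighbors. The race on dist[w]
// is benign because all writers store the same value, level + 1.
__global__ void bfs_level(const int *rowptr, const int *colidx, int *dist,
                          int level, int num_vertices, int *frontier_nonempty)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= num_vertices || dist[v] != level) return;
    for (int e = rowptr[v]; e < rowptr[v + 1]; ++e) {
        int w = colidx[e];
        if (dist[w] == -1) {
            dist[w] = level + 1;
            *frontier_nonempty = 1;
        }
    }
}

// Host loop sketch: set dist[source] = 0 and all other entries to -1, then
// launch one kernel per level until frontier_nonempty stays 0.
```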
 
Keywords:
Big Data Analytics, Developer - Algorithms, Supercomputing, GTC 2015 - ID S5156
Streaming:
Download:
 
Fast Triangle Counting for Social Network Analytics on the K40
Oded Green (ArrayFire)
In this session we will explore a new approach for counting triangles in networks that partitions the work at multiple parallel granularities. This new approach is highly scalable and is appropriate for both sparse and dense networks.
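For orientation, a hedged sketch of one common GPU formulation of triangle counting follows: one thread per edge intersects the two endpoints' sorted adjacency lists, and every triangle is counted once per edge, so the host divides the total by three. The data layout and names are assumptions, not the approach presented in the session.

```cuda
#include <cuda_runtime.h>

// One thread per undirected edge (u, v): two-pointer intersection of the
// sorted adjacency lists of u and v counts the common neighbors.
__global__ void count_triangles(const int *rowptr, const int *colidx,
                                const int *edge_u, const int *edge_v,
                                int num_edges, unsigned long long *count)
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= num_edges) return;
    int u = edge_u[e], v = edge_v[e];
    int i = rowptr[u], iend = rowptr[u + 1];
    int j = rowptr[v], jend = rowptr[v + 1];
    unsigned long long c = 0;
    while (i < iend && j < jend) {
        int a = colidx[i], b = colidx[j];
        if (a == b)     { ++c; ++i; ++j; }
        else if (a < b) { ++i; }
        else            { ++j; }
    }
    atomicAdd(count, c);
}
// Host: triangles = *count / 3 (each triangle is found once per edge).
```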
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5176
Streaming:
Download:
 
Big Data on a Budget: Cost Efficient Large-Scale Graph Analytics
Joe Schneible, Ph.D. (Technica Corporation)
The attendee will take away an appreciation for the nuances involved in performing large-scale graph analytics on a budget. The discussion will center around utilizing graphics processing hardware in a limited-memory environment. This will include insights into data storage structures for I/O-efficient processing, as well as the application of the massive parallelism of the GPU to real-world graph data.
 
Keywords:
Big Data Analytics, Developer - Algorithms, Machine Learning & Deep Learning, GTC 2015 - ID S5200
Streaming:
Download:
 
High Performance Indexing of Large Data Sets Using GPU
Massimo Bernaschi (National Research Council of Italy)
Learn how to use multi-GPU systems and CUDA to speed up text analysis, indexing, and searching of textual data. We present a new framework to index large data sets of heterogeneous data. Our approach is based on a combination of HPC techniques aimed at improving the efficiency and reliability of the indexing process. The solution we propose is scalable and exploits in-memory computing to minimize I/O operations and enhance performance. Moreover, we describe the CUDA-based parallelization of the most compute-intensive tasks involved in the indexing process. The integration of the CUDA components within an architecture that is mostly Java-based led us to develop a technique for Java-CUDA interoperability that can be applied to other applications. Some visualisation results will also be presented.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5212
Streaming:
Download:
 
Maximize the Performance of your Cluster: Marrying GPUs and Dataflow Graph Processing
Nam-Luc Tran (EURA NOVA)
Get the best out of your processing cluster by equipping nodes with a GPU. Many distributed processing models have emerged these past years, driven by the need to scale out applications and by the affordability of clusters running on commodity hardware. Among these, the dataflow graph processing model is the most general, representing jobs as distributed operators (nodes) connected by data channels (edges). In this talk, we explain how we have extended an existing dataflow graph processing framework to fully take into account GPU resources in the cluster. We show how this paradigm fully exploits the batch and streaming features of the GPU in a distributed job. We then finally expose our model for scheduling on this heterogeneous processing framework.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Machine Learning & Deep Learning, GTC 2015 - ID S5215
Streaming:
Download:
 
Unleashing The Power Of GPUs Over The Web
Vishal Vaidyanathan (Royal Caliber)
GPUs have demonstrated regime-changing performance in a wide variety of applications. But there remain many engineering challenges to the adoption of GPUs in the mainstream, especially when operating at scale. We present a new infrastructure that provides a suite of GPU-driven machine learning and graph algorithms as a web service. The effortless usability of an HTTP API unlocks the power of GPU computing with none of the attendant complexities. As examples, we will show interactive analytics on web-scale graphs and deep learning on large data sets using nothing more than a modern web browser.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Machine Learning & Deep Learning, GTC 2015 - ID S5224
Streaming:
 
Towards Fast SQL Query Processing in DB2-BLU Using GPUs
Sina Meraji (IBM)
Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. IBM DB2-BLU is an example of such a database. In order to improve the performance of query processing in such databases, GPUs can be used as fast, high-bandwidth co-processors. As part of our work, we integrate NVIDIA GPUs into DB2-BLU by changing the infrastructure of DB2-BLU and developing GPU kernels. We have a hybrid design in which we use some of DB2-BLU's features on IBM's POWER8 processor and NVIDIA's GPU accelerator technology for fast query processing. This work was done in collaboration with Peter Kokosielis.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5229
Streaming:
Download:
 
PG-Strom: Query Acceleration Engine of PostgreSQL Powered by GPGPU
Kohei KaiGai (NEC)
This session will introduce how we integrated GPU acceleration into the PostgreSQL database while keeping 100% compatibility with the application landscape. The RDBMS is a long-standing and widely used technology that is still at the core of business activities; however, increasing data sizes raise performance concerns. PG-Strom is an extension of the PostgreSQL database, designed to off-load several CPU-intensive query workloads (scan, join and aggregation, at present) to the GPGPU, making them up to 10x faster than the existing SQL implementation. Its characteristics fit the usual workloads of BI (business intelligence) tools in a cost-effective way, but not all. The PG-Strom extension is released under the GPLv2 terms and will be supported by PostgreSQL v9.5.
 
Keywords:
Big Data Analytics, GTC 2015 - ID S5276
Streaming:
Download:
 
Recent Advances in Multi-GPU Graph Processing
Giancarlo Carbone (Sapienza Universtity of Rome)
Learn how to use GPUs as a computing platform to solve problems with irregular memory access patterns and low arithmetic intensity. We have shown that a proper data-to-thread mapping and a combination of techniques to reduce data traffic allow excellent performance to be achieved in the traversal, via a level-synchronous Breadth First Search (BFS), of large-scale graphs (i.e., millions of nodes and billions of edges) using a multi-GPU system. We are going to present our recent activities related to GPU-based graph processing: a new implementation of the BFS based on a 2D partitioning exploiting the atomic operations of the Kepler architecture, two solutions to the st-connectivity problem, and all-pairs shortest path. Some of these can be of immediate use in the analysis of large sets of data.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5337
Streaming:
Download:
 
GPU-Accelerated Network Centrality
Erik Saule (University of North Carolina at Charlotte, Department of Computer Science)
This session is about how to efficiently compute shortest-path-based network centrality metrics using the GPU. Performing shortest-path computation on a GPU is an expensive operation because of the many idempotent operations (computation and memory accesses) that need to be performed to ensure the computation is correct. We will show how to interleave shortest-path-based computation in the context of a network centrality metric to reduce the number of memory accesses and to maximize their coalescing. We will also see how the representation of the network in memory is key to balancing thread divergence and the number of atomic operations.
 
Keywords:
Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5425
Streaming:
Download:
 
Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Computing
Rajesh Bordawekar (IBM T. J. Watson Research Center), Ruchir Puri (IBM Research)
In this session you will learn how IBM is exploiting GPUs in its new IBM OpenPOWER platform for acceleration of Big Data Analytics and Cognitive Computing solutions. The Hardware Acceleration Lab in IBM's Software Group is partnering with IBM Research to develop optimized heterogeneous computing solutions. With the creation of the OpenPOWER consortium last year, IBM has created an open ecosystem along with heterogeneous computing platforms that include NVIDIA's Tesla GPUs. GPUs are gaining traction in the enterprise as accelerators for Big Data Analytics and Cognitive Computing workloads. This session will focus on industrial case studies and exploitation of GPUs. Some early results will also be shared.
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5459
Streaming:
Download:
 
Multi-Dimensional, In-GPU-Memory Databases: Streaming Conditional Calculations in Big Data Sets
Peter Strohm (Jedox AG)
Learn how in-GPU-memory databases can change the way real-world Big Data sets, such as social media entries, webpage hits or business data, are analyzed. Analytical queries in databases often involve calculations over extremely large areas of aggregated values as input for further processing, such as conditional calculation (if-then-else) or top-k evaluation, and therefore often run into memory problems. We present the design of optimized condition-based processors for large data sets, combined with a floating-frame approach to stream through these data areas. Conditional calculations are especially useful for splitting large value sets into clusters for further analysis or aggregation, and we will provide examples on real-world social media data, including localized Twitter trends and Wikipedia page hits.
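As a hedged sketch of the kind of elementwise if-then-else pass such queries require, the CUDA kernel below applies a simple rule to one chunk of aggregated values; the rule, names, and chunked launch are illustrative assumptions rather than the Jedox engine's design.

```cuda
#include <cuda_runtime.h>

// Elementwise if-then-else rule over one chunk of aggregated values. A large
// value area would be processed chunk by chunk (the "floating frame"), with
// each chunk copied in via cudaMemcpyAsync on its own stream.
__global__ void if_then_else(const float *in, float *out, int n,
                             float threshold, float then_factor, float else_factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = (in[i] > threshold) ? in[i] * then_factor
                                     : in[i] * else_factor;
}

// Launch sketch for one chunk of n values already on the GPU:
//   if_then_else<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 100.0f, 1.1f, 0.9f);
```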
 
Keywords:
Big Data Analytics, Developer - Performance Optimization, GTC 2015 - ID S5481
Streaming:
Download:
 
Large-Scale Spatial Query Processing on GPU-Accelerated Big Data Systems by Extending Cloudera Impala
Jianting Zhang (The City College of New York)
Geo-referenced spatial (or geospatial) data volumes are increasing. Traditional data management techniques, such as Geographical Information Systems (GIS) and spatial databases, do not work well for big spatial data, while existing Big Data systems do not support geospatial data. In addition to our work on managing spatial data on single-node GPUs, we have integrated our parallel designs with an open-source big data system called Impala to support both efficient and scalable distributed spatial query processing in an interactive SQL environment. We present the system architecture, data-parallel designs for spatial indexing and query processing, as well as performance on real datasets for point-in-polygon-test-based spatial joins.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5489
Streaming:
 
Map-D: Hyper-Interactive GPU-Powered Visualytics for Big Data
Todd Mostak (Map-D)
As people wish to interactively explore increasingly larger datasets, existing tools are unable to deliver acceptable performance. The distributed nature of systems like Spark leads to latencies detrimental to interactive data exploration, while single-node visualization solutions like Tableau and Qlikview are not powerful enough to deliver sub-second response times for even intermediate-sized datasets. In this talk, we will argue that dense GPU servers, containing 4-16 GPUs each, can provide analytics query throughput exceeding what can be achieved on even large clusters, while avoiding the latencies and complications associated with running over a network. We will look at MapD, which can query and visualize multi-billion-row datasets in milliseconds, as an example of such a system. Finally, we will show how the significantly higher performance achievable with a GPU system translates into new modes and paradigms of data analysis.
 
Keywords:
Big Data Analytics, Data Center, Cloud Computing & HPC, Real-Time Graphics, GTC 2015 - ID S5544
Streaming:
 
Scaling Data Visualization with GPUs and Design
Leo Meyerovich (Graphistry, Inc.)
GPUs are ushering in a new era of data visualization. Today, shoving one hundred thousand query results into a chart makes an illegible mess and kills interactivity. The good news is that infovis researchers have invented smarter layouts that maximize visibility. The bad news is that these layouts and basic interactions are computationally intensive enough that analysts can no longer simply slide a slider, drag a graph cluster, etc. With the availability of GPUs, however, the rules have changed. This talk shows examples of smarter designs and how we use GPUs to turn them into interactive tools. For experts, we will discuss how running in browsers and even on phones led to Graphistry's tiered GPU visualization engine approach, and touch on our use of WebGL, WebCL, and our own in-house libraries.
 
Keywords:
Big Data Analytics, Web Acceleration, Visualization - In-Situ & Scientific, GTC 2015 - ID S5589
Streaming:
 
Fighting Malware With GPUs in Real Time
Peter Kovac (Avast Software)
Dive deep into the problem of protecting electronic devices such as PCs, smartphones and tablets against malicious software. In this talk we will show you how we handle the ever-increasing number of malware samples produced by the malware ecosystem every day. To leverage similarities between the samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are directly used to generate rules to identify similar samples on the machines of our customers.
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5612
Streaming:
Download:
 
Single CUDA Block Implementation of Time Synchronous Viterbi Search for Speech Recognition
Nigel Cannings (Chase Information Technology Services Limited)
A time-synchronous Viterbi search algorithm for automatic speech recognition is implemented using a counter-intuitive single-CUDA-block approach. Decoding of a single utterance is carried out on a single streaming multiprocessor (SM), and multiple utterances are decoded simultaneously using CUDA streams. The single-CUDA-block approach is shown to be substantially more efficient and enables overlapping of CPU and GPU computation by merging tens of thousands of separate CUDA kernel calls for each utterance. The proposed approach has the disadvantage of a large GPU global memory requirement because of the simultaneous decoding feature. However, the latest GPU cards, with up to 12GB of global memory, fulfill this requirement, and full utilization of the GPU card is possible using all available SMs.
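A greatly simplified, hedged sketch of the launch pattern described, one single-block kernel per utterance on its own CUDA stream, follows; the toy left-to-right scoring inside the kernel and all names (decode_utterance, NUM_STATES) are illustrative stand-ins for a real Viterbi decoder.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

#define NUM_STATES 256   // one thread per HMM state; illustrative size

// Simplified time-synchronous pass: one thread block per utterance, shared
// memory for the per-frame scores of a left-to-right model.
__global__ void decode_utterance(const float *emission,   // [num_frames][NUM_STATES] log scores
                                 int num_frames, float *final_score)
{
    __shared__ float prev[NUM_STATES], cur[NUM_STATES];
    int s = threadIdx.x;
    prev[s] = emission[s];                                   // frame 0
    __syncthreads();
    for (int t = 1; t < num_frames; ++t) {
        float best = prev[s];                                // self-loop
        if (s > 0 && prev[s - 1] > best) best = prev[s - 1]; // transition from the left
        cur[s] = best + emission[t * NUM_STATES + s];
        __syncthreads();
        prev[s] = cur[s];
        __syncthreads();
    }
    if (s == 0) {
        float best = -FLT_MAX;
        for (int i = 0; i < NUM_STATES; ++i) best = fmaxf(best, prev[i]);
        *final_score = best;
    }
}

// Host side: one stream per utterance so several single-block kernels can run
// concurrently on different SMs.
void decode_batch(float **d_emission, const int *frames, float **d_score, int num_utt)
{
    cudaStream_t *streams = new cudaStream_t[num_utt];
    for (int u = 0; u < num_utt; ++u) {
        cudaStreamCreate(&streams[u]);
        decode_utterance<<<1, NUM_STATES, 0, streams[u]>>>(d_emission[u], frames[u], d_score[u]);
    }
    cudaDeviceSynchronize();
    for (int u = 0; u < num_utt; ++u) cudaStreamDestroy(streams[u]);
    delete[] streams;
}
```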
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5658
Streaming:
 
POWER8 and GPUs: Helping Unfold the Intricate Loops of Genome Architecture (Presented by IBM)
Ido Machol (Baylor College of Medicine)
Develop new approaches and algorithms for high-throughput, systematic identification of chromatin loops between genomic regulatory elements, utilizing Tesla GPUs to efficiently search, in parallel, the space of possible chromatin interactions for true chromatin loops. This team is working with IBM POWER8 and NVIDIA Tesla GPU technologies to create customized algorithms that enable genomics scientists to see fine details of genome folding and learn more about genetic regulation. The maps of looping revealed thousands of hidden switches not previously known to exist. For genes that cause diseases or cancers, locating these switches is essential. GPUs help speed up these algorithms by up to 200x, reducing the cycle time to process a single chromosome from a week-long process to less than a coffee break.
 
Keywords:
Big Data Analytics, Developer - Algorithms, Life & Material Science, GTC 2015 - ID S5821
Streaming:
Download:
 
SenDISA: Distributed Intelligent, Video, Sensor & Actuator Analytics Platform for Smart Cities (Presented by Sensen)
Dr. Subhash Challa (Sensen Networks)
This session will introduce SenSen's proprietary Video, Sensor and Actuator Analytics Platform (SenDISA), which is used by some of the world's most prestigious and trusted organizations, including the Abu Dhabi airport, the Singapore police, Roads & Maritime Services Australia, the Westgate Bridge in Melbourne, Australia, the City of Trondheim, Norway, and the cities of Brisbane, Ipswich and Manly, among others. We will present how our innovative algorithms, powered by the GPGPU-based SenDISA platform, enable Big Data analytic applications by fusing data from video, sensor and IoT devices and combining them with other transaction data to deliver smart city solutions across the globe. We will provide insights into the architecture of SenDISA and the market-specific Big Data solutions serving different market verticals.
 
Keywords:
Big Data Analytics, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5869
Streaming:
Download:
Computational Fluid Dynamics
Out-of-Core Proximity Computation on GPU for Particle-Based Fluid Simulations
Duksu Kim ((KISTI) Korea Institute of Science and Technology Information)
Learn how to use your GPU for massive-scale particle-based fluid simulations that require a larger amount of memory space than the video memory provides. We introduce a novel GPU-based neighbor search algorithm used in particle-based fluid simulations such as SPH. With the proposed method, we can efficiently handle a massive-scale particle-based fluid simulation with limited GPU video memory in an out-of-core manner. We have demonstrated that our method robustly handles massive-scale benchmark scenes consisting of up to 65 million particles and requiring up to 16 GB of memory by using a GPU having only 3 GB of memory. It shows up to 26 times higher performance compared to using NVIDIA's mapped memory technique and 51 times higher performance compared to using a CPU core.
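For a rough sense of what out-of-core processing on the GPU involves (not the authors' neighbor-search algorithm), here is a hedged sketch that streams particle chunks from pinned host memory through two fixed-size device buffers with double buffering; all names, the placeholder kernel, and the chunking scheme are assumptions.

```cuda
#include <cuda_runtime.h>

// Toy out-of-core pattern: particle positions live in pinned host memory and
// are streamed through two device buffers, overlapping transfers with a
// placeholder per-chunk kernel. Stream ordering makes buffer reuse safe.
__global__ void process_chunk(const float4 *pos, int n, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = pos[i].x + pos[i].y + pos[i].z;   // placeholder work
}

void stream_particles(const float4 *h_pos, float *h_out,  // both pinned host buffers
                      size_t total, size_t chunk)
{
    float4 *d_pos[2]; float *d_out[2]; cudaStream_t s[2];
    for (int k = 0; k < 2; ++k) {
        cudaMalloc(&d_pos[k], chunk * sizeof(float4));
        cudaMalloc(&d_out[k], chunk * sizeof(float));
        cudaStreamCreate(&s[k]);
    }
    int b = 0;
    for (size_t off = 0; off < total; off += chunk, b ^= 1) {
        size_t n = (total - off < chunk) ? total - off : chunk;
        cudaMemcpyAsync(d_pos[b], h_pos + off, n * sizeof(float4),
                        cudaMemcpyHostToDevice, s[b]);
        process_chunk<<<(int)((n + 255) / 256), 256, 0, s[b]>>>(d_pos[b], (int)n, d_out[b]);
        cudaMemcpyAsync(h_out + off, d_out[b], n * sizeof(float),
                        cudaMemcpyDeviceToHost, s[b]);
    }
    cudaDeviceSynchronize();
    for (int k = 0; k < 2; ++k) { cudaFree(d_pos[k]); cudaFree(d_out[k]); cudaStreamDestroy(s[k]); }
}
```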
 
Keywords:
Computational Fluid Dynamics, Developer - Algorithms, Computational Physics, Real-Time Graphics, GTC 2015 - ID S5116
Streaming:
Download:
 
AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications
Bjoern Landmann (FluiDyna GmbH)
The presentation shows the potential of GPU acceleration for reducing turn-around times of industrial CFD applications. FluiDyna is addressing this issue with a modular approach: the library "Culises" was developed to accelerate matrix operations originating from arbitrary problems. This approach can be complemented by a second module that generates the linear system directly on the GPU, the resulting code being less general but allowing higher speed-ups. The code aeroFluidX is a finite volume solver dedicated to incompressible aerodynamics, combining a SIMPLE algorithm for unstructured grids with state-of-the-art RANS turbulence modelling. MPI parallelization allows calculations to be split across multiple GPU-enabled nodes, leading to speed-ups of 2.5-3x for industrial-scale problems.  Back
 
Keywords:
Computational Fluid Dynamics, Automotive, Computational Physics, Supercomputing, GTC 2015 - ID S5189
Streaming:
Download:
 
GiMMiK: Generating Bespoke Matrix-Multiplication Kernels for NVIDIA GPUs
Freddie Witherden (Imperial College London)
Learn how run-time code generation can be used to generate high-performance matrix-multiplication kernels for GPUs. In this talk, I will introduce GiMMiK, an open-source framework for generating bespoke kernels for performing block-by-panel type matrix-matrix multiplications. The techniques employed by GiMMiK will be described in detail. Benchmarks comparing GiMMiK to cuBLAS will be presented and speed-ups of up to 10x will be demonstrated. Specific applications of GiMMiK in the field of high-order computational fluid dynamics will also be highlighted.  Back
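The central idea behind bespoke kernels of this kind is that the operator matrix is small and fully known at generation time, so the multiplication can be completely unrolled and zero entries dropped before compilation. A toy host-side generator illustrating that idea (this is not GiMMiK's actual output or API, just a sketch of the technique):

```cuda
#include <sstream>
#include <string>
#include <vector>

// Emit an unrolled kernel for C = A * B, where A is a small constant
// m x k matrix known at generation time and B is a k x n panel.
// Zero entries of A are skipped entirely at code-generation time.
std::string emit_panel_kernel(const std::vector<std::vector<double>>& A)
{
    const size_t m = A.size(), k = A[0].size();
    std::ostringstream src;
    src << "extern \"C\" __global__ void panel_mm(const double* B, double* C, int n)\n"
           "{\n"
           "  int col = blockIdx.x * blockDim.x + threadIdx.x;\n"
           "  if (col >= n) return;\n";
    for (size_t i = 0; i < m; ++i) {
        src << "  C[" << i << " * n + col] =";
        bool first = true;
        for (size_t j = 0; j < k; ++j) {
            if (A[i][j] == 0.0) continue;              // drop zeros here
            src << (first ? " " : " + ") << A[i][j]
                << " * B[" << j << " * n + col]";
            first = false;
        }
        if (first) src << " 0.0";                      // all-zero row
        src << ";\n";
    }
    src << "}\n";
    return src.str();   // compile at run time, e.g. with NVRTC or nvcc
}
```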
 
Keywords:
Computational Fluid Dynamics, Developer - Performance Optimization, Computational Physics, Supercomputing, GTC 2015 - ID S5207
Streaming:
Download:
 
Multiphysics Simulation Using GPUs
Arman Pazouki (University of Wisconsin-Madison)
We present a GPU-based framework for the fully-resolved simulation of interacting rigid and deformable solid objects that move in fluid flow. The fluid dynamics is based on a meshless approach. Moving Lagrangian markers, distributed in the fluid domain as well as on the solid surfaces, are used to capture the fluid dynamics, fluid-solid, and solid-solid interactions. Mass and momentum exchange between neighboring markers are determined in a parallel spatial subdivision algorithm. The solid objects' distributed forces are reduced in parallel via Thrust reduction algorithms and later used for the temporal update via lightweight GPU kernels. Scenarios containing tens of thousands of floating rigid and flexible objects were exercised on several GPU architectures, demonstrating linear scalability.  Back
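The per-body force reduction mentioned above maps naturally onto a segmented reduction. A hedged sketch of that step with Thrust, assuming each marker carries a body ID (the functor and function names are illustrative, not the authors' code):

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

// float3 has no operator+, so supply the binary op for the reduction.
struct Float3Add {
    __host__ __device__ float3 operator()(const float3& a, const float3& b) const {
        return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
    }
};

// Sum per-marker forces into one resultant force per solid body.
// body_id[i] is the body that surface marker i belongs to.
void reduce_marker_forces(thrust::device_vector<int>&    body_id,
                          thrust::device_vector<float3>& marker_force,
                          thrust::device_vector<int>&    out_body,
                          thrust::device_vector<float3>& out_force)
{
    // bring markers of the same body together
    thrust::sort_by_key(body_id.begin(), body_id.end(), marker_force.begin());

    out_body.resize(body_id.size());
    out_force.resize(body_id.size());

    // segmented sum: one float3 total per body
    auto ends = thrust::reduce_by_key(body_id.begin(), body_id.end(),
                                      marker_force.begin(),
                                      out_body.begin(), out_force.begin(),
                                      thrust::equal_to<int>(), Float3Add());

    out_body.resize(ends.first  - out_body.begin());
    out_force.resize(ends.second - out_force.begin());
}
```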
 
Keywords:
Computational Fluid Dynamics, Developer - Algorithms, Computational Physics, Developer - Tools & Libraries, GTC 2015 - ID S5238
Streaming:
Download:
 
Next-Generation CFD: Real-Time Computation and Visualization
Christian Janssen (Hamburg University of Technology)
Dive deep into the fascinating world of real-time computational fluid dynamics. We present details of our CUDA-accelerated flow solver for the simulation of non-linear violent flows in marine and coastal engineering. The solver, the efficient lattice Boltzmann environment elbe, is accelerated with recent NVIDIA graphics hardware and allows for three-dimensional simulations of complex flows in or near real time. Details of the very efficient numerical back end, the pre- and postprocessing tools and the integrated OpenGL visualizer will be presented. Join us in this talk to learn about a prototype for next-generation CFD tools for simulation-based design (SBD) and interactive flow field monitoring on commodity hardware.  Back
 
Keywords:
Computational Fluid Dynamics, Visualization - In-Situ & Scientific, Computational Physics, Real-Time Graphics, GTC 2015 - ID S5304
Streaming:
 
Rolls-Royce Hydra on GPUs Using OP2
Istvan Reguly (University of Oxford)
Learn how a Domain Specific Language can be used to accelerate a full-scale industrial CFD application. With OP2, you can easily describe your computational problem at a high level, and then generate CUDA code. We show how parallelization on an unstructured mesh is handled over a cluster of GPUs, and how a range of optimizations can be automatically applied during code generation for GPUs, such as conversion from Array-of-Structures to Structure-of-Arrays and the use of shared memory or caches to improve data reuse. We demonstrate that a 4x performance increase can be achieved with a K40 GPU over a server CPU, and present scaling up to 16 GPUs.  Back
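One of the transformations named above, Array-of-Structures to Structure-of-Arrays, is easiest to see in isolation. A minimal illustration of why generated GPU code prefers the SoA layout (generic CUDA, not OP2 output):

```cuda
// Array-of-Structures: threads in a warp read x, y, z with a stride of
// sizeof(NodeAoS), so each field access touches many cache lines.
struct NodeAoS { double x, y, z; };

__global__ void scale_aos(NodeAoS* nodes, double s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { nodes[i].x *= s; nodes[i].y *= s; nodes[i].z *= s; }
}

// Structure-of-Arrays: consecutive threads read consecutive doubles,
// so each field access coalesces into few memory transactions per warp.
struct NodesSoA { double *x, *y, *z; };

__global__ void scale_soa(NodesSoA nodes, double s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { nodes.x[i] *= s; nodes.y[i] *= s; nodes.z[i] *= s; }
}
```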
 
Keywords:
Computational Fluid Dynamics, Computational Physics, Developer - Programming Languages, GTC 2015 - ID S5318
Streaming:
Download:
 
GPU-Accelerated Fluid Flow with Compute Shaders
Maciej Matyka (University of Wroclaw)
Learn how to utilize compute shaders and write your own efficient fluid flow solver accelerated with a single GPU. First, I will introduce the basics of the Lattice Boltzmann method, including additional turbulence modelling. Then, an implementation in modern OpenGL will be discussed. I will investigate the efficiency of the code and discuss its potential applications in games, medicine and other end-user tools.  Back
 
Keywords:
Computational Fluid Dynamics, Visualization - In-Situ & Scientific, Computational Physics, GTC 2015 - ID S5343
Streaming:
Download:
 
Sparse Fluid Simulation in Direct X
Alex Dunn (NVIDIA)
Learn how to simulate and render game-ready, high-resolution fluid in real time on the GPU using DirectX. We'll present a new method for sparsely simulating and rendering traditional grid-based fluid systems. By utilizing a simple CPU prediction algorithm, we can update the virtual memory table of the GPU to reflect only the active areas of a simulation volume, providing compressed memory storage and hardware-level memory translation for performing region lookups. This CPU prediction mechanism has a much wider use case than just fluid simulation, and is a must-know for anyone planning on using tiled resources in the future.  Back
 
Keywords:
Computational Fluid Dynamics, Developer - Algorithms, Real-Time Graphics, GTC 2015 - ID S5756
Streaming:
Download:
Computational Physics
Presentation
Media
GPU Acceleration of HFSS Transient
Hsueh-Yung Chao (ANSYS Inc.)
HFSS Transient is a 3-D full-wave time domain electromagnetic field solver based on the discontinuous Galerkin time domain (DGTD) method. It is equipped with local time stepping for efficient simulation on hp-adaptive tetrahedral meshes. The presentation will demonstrate how GPUs can benefit the solution of radiation and scattering problems involving multiscale geometry and complex materials. When there are multiple HPC tasks for parametric sweeps or network analyses with multiple excitations, the speedup with GPU acceleration scales linearly with respect to the number of GPUs. Concepts will be explained for increasing parallel efficiency on mixed-order meshes dominated by low-order elements. This work was done in collaboration with Stylianos Dosopoulos, Senior R&D Engineer, ANSYS Inc., and Rickard Petersson, Senior R&D Manager, ANSYS Inc.  Back
 
Keywords:
Computational Physics, Manufacturing, GTC 2015 - ID S5183
Streaming:
Download:
 
Compact Cancer Killers: Simulating Next-Generation Laser-Driven Ion Accelerators with GPUs
Michael Bussmann (Helmholtz-Zentrum Dresden - Rossendorf), Axel Huebl (Helmholtz-Zentrum Dresden - Rossendorf)
Radiation therapy with ion beams precisely targets the tumor, leaving surrounding healthy tissue unharmed. Usually, ion accelerators are huge in size and thus found in only a few facilities worldwide. Using high-power laser systems for accelerating the ions could reduce the size and cost of such systems, potentially increasing the number of treatment facilities and thus giving more patients access to this promising therapy method. In order to bring laser acceleration of ions to application, realistic simulations of the acceleration process are needed. We present PIConGPU, a relativistic particle-in-cell plasma simulation code implemented on GPUs that is ideal for optimizing laser ion acceleration.  Back
 
Keywords:
Computational Physics, Life & Material Science, Supercomputing, GTC 2015 - ID S5193
Streaming:
Download:
 
Solutions for Efficient Memory Access for Cubic Lattices and Random Number Algorithms
Matteo Lulli ('Sapienza', University of Rome)
The cubic stencil is one of the most common data layouts for on-lattice algorithms, and high-quality random numbers are useful in many areas. Based on the lessons we learned during the development of a highly-tuned implementation of a Monte Carlo (MC) simulator for the three-dimensional Ising spin glass, we present solutions for an efficient memory access pattern for the cubic stencil and for lagged-Fibonacci-like PRNGs, in particular the famous Mersenne Twister MT19937. We will show both single- and multi-GPU results, highlighting the advantages of our approach also in multi-GPU settings, and a comparison of the performance of our PRNG implementations with that of the cuRAND library.  Back
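The comparison baseline named above is cuRAND, which ships an MT19937 generator in its host API. Filling a device buffer with it looks roughly like this (error checking omitted; a sketch of the baseline, not the authors' custom PRNG):

```cuda
#include <curand.h>
#include <cuda_runtime.h>

// Fill a device buffer with uniform floats using cuRAND's MT19937
// generator, the library counterpart of the custom PRNGs in the talk.
void fill_uniform(float* d_buf, size_t n, unsigned long long seed)
{
    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MT19937);
    curandSetPseudoRandomGeneratorSeed(gen, seed);
    curandGenerateUniform(gen, d_buf, n);   // (0, 1] uniform floats
    curandDestroyGenerator(gen);
}
```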
 
Keywords:
Computational Physics, Developer - Algorithms, GTC 2015 - ID S5220
Streaming:
Download:
 
BLAZE-DEM: A Polyhedral Discrete Element Simulation Framework for NVIDIA® Kepler GPUs.
Nicolin Govender (CSIR/University of Pretoria and University of Johannesburg)
This talk describes in detail the DEM algorithms and heuristics that are optimized for the parallel NVIDIA® Kepler GPU architecture, including a novel collision detection algorithm for convex polyhedra based on the separating plane (SP) method. Our algorithms have minimalistic memory requirements, which enables us to store data in the limited but high-bandwidth constant memory on the GPU. We then systematically verify the DEM implementation after demonstrating the computational scaling on two large-scale simulations.  Back
 
Keywords:
Computational Physics, Astronomy & Astrophysics, GTC 2015 - ID S5244
Streaming:
Download:
 
Fast and Scalable Eigenvalue Solvers for 3D Photonic Crystals on GPUs
Weichung Wang (National Taiwan University)
Explore new algorithms and techniques to solve large-scale Maxwell eigenvalue problems arising in simulations for bandgap engineering. By using the proposed algorithms and implementations, we have successfully computed the desired multiple interior eigenvalues of eigensystems with dimensions as large as 4.2 million within 100 seconds using a single GPU. The techniques are extended to multiple GPUs to solve eigenvalue problems with different wave vectors simultaneously, so that we can shorten the time to plot a complete band structure diagram from days to minutes. The codes also achieve almost-linear scalability on parallel computers ranging from a workstation with multiple GPUs to a cluster with homogeneous or heterogeneous CPUs and GPUs.  Back
 
Keywords:
Computational Physics, Developer - Algorithms, Life & Material Science, Supercomputing, GTC 2015 - ID S5254
Streaming:
Download:
 
DiamondTile Algorithm for High-Performance Wave Modeling
Anastasia Perepelkina (KIAM RAS)
We will explain how to construct Locally Recursive non-Locally Asynchronous (LRnLA) algorithms that allow reaching peak performance for large, memory-bound problems. The DiamondTile algorithm is presented for explicit stencil-based modeling on GPGPUs. It is implemented for finite difference simulation of the acoustic wave equation, elastic seismic media, FDTD electromagnetics, and the RKDG method of gas dynamics. The resulting performance in 2nd-order wave simulation exceeds that of the optimized CUDA example codes and reaches more than 50 billion cells per second on a single GPGPU device.  Back
 
Keywords:
Computational Physics, Developer - Algorithms, Energy Exploration, GTC 2015 - ID S5315
Streaming:
Download:
 
Accelerating CICE on the GPU
Rob Aulwes (Los Alamos National Laboratory)
CICE is a sea ice model that is part of the Los Alamos National Laboratory's Climate, Ocean and Sea Ice Modeling Group. CICE can be used in a fully coupled atmosphere-ice-ocean-land global climate model. It can also be used as a stand-alone application. This talk presents the effort currently under way to accelerate CICE on the GPU.  Back
 
Keywords:
Computational Physics, OpenACC, GTC 2015 - ID S5322
Streaming:
 
PyFR: Next Generation Computational Fluid Dynamics on GPU Platforms
Freddie Witherden (Imperial College London)
Discover how GPUs are being used to accelerate high-fidelity computational fluid dynamics (CFD) simulations on unstructured grids. In this talk I will (i) introduce the flux reconstruction approach to high-order methods; a discretization that is particularly well-suited to many-core architectures, (ii) introduce our massively parallel implementation PyFR (www.pyfr.org), which through run-time code generation is able to target NVIDIA GPU hardware and, (iii) showcase some of the high-fidelity, unsteady, flow simulations undertaken using PyFR on both desktop and HPC systems.  Back
 
Keywords:
Computational Physics, Developer - Algorithms, Computational Fluid Dynamics, Supercomputing, GTC 2015 - ID S5372
Streaming:
Download:
 
Chrono::SPIKE: A Nonsmooth Contact Dynamics Framework on the GPU
Daniel Melanz (University of Wisconsin - Madison)
The dynamic simulation of systems involving contacts between bodies is complicated by the nonsmooth nature of frictional constraints. When the number of contacts between bodies increases to millions, as in the case of granular flows in silos or in soil dynamics, the computational efficiency of traditional methods can become an issue even on supercomputers. A second-order primal-dual interior point (PDIP) method is used to solve a nonlinear optimization problem entirely on the GPU. The method displays faster convergence than traditional first-order methods and calls for a significantly smaller number of iterations. To alleviate the computational bottleneck of solving large linear systems, this work uses the parallel sparse solver SPIKE::GPU to accelerate the PDIP solution.  Back
 
Keywords:
Computational Physics, Developer - Algorithms, Supercomputing, GTC 2015 - ID S5400
Streaming:
Download:
 
GPU vs Xeon Phi: Performance of Bandwidth Bound Applications with a Lattice QCD Case Study
Mathias Wagner (Indiana University)
Accelerators have become a key ingredient in HPC. GPUs had a head start and are already widely used in HPC applications but now are facing competition from Intel's Xeon Phi accelerators. The latter promise comparable performance and easier portability and even feature a higher memory bandwidth - key to good performance for a wide range of bandwidth-bound HPC applications. In this session we compare their performance using a Lattice QCD application as a case study. We give a short overview of the relevant features of the architectures and discuss some implementation details. Learn about the effort it takes to achieve great performance on both architectures. See which accelerator is more energy efficient and which one takes the performance crown at about 500 GFlop/s.  Back
 
Keywords:
Computational Physics, Developer - Performance Optimization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5447
Streaming:
Download:
 
GPU Parallelization is the Perfect Match with the Discrete Particle Method for Blast Analysis
Wayne Mindle (CertaSIM, LLC)
The computational efficiency of GPU technology to compute the physics of discrete particle interaction is presented. The physical problems that require this type of computation are numerous. In particular, one such area is modeling blast shields that are designed to provide underbody protection of military vehicles from explosive mines. The solver technology is called the Discrete Particle Method (DPM) and by itself has proven to be an accurate and predictive tool for simulating the blast event. Combine that with parallel processing of the GPU and you have an efficient and cost effective tool as well.  Back
 
Keywords:
Computational Physics, Manufacturing, GTC 2015 - ID S5449
Streaming:
Download:
 
Practical Combustion Kinetics with CUDA
Russell Whitesides (Lawrence Livermore National Laboratory)
Find out how we transformed our algorithms for combustion kinetics to exploit parallelism on GPUs to enable ~10x speedup of combustion simulations with detailed chemistry. The necessary data access patterns and code organization required CUDA native implementations of the thermodynamic and kinetic functions. We also exploit CUDA libraries for sparse and dense matrix factorization. The results are shown as improvements in overall simulation speedup along with cost breakdowns for the various portions of the simulation.  Back
 
Keywords:
Computational Physics, Automotive, Manufacturing, GTC 2015 - ID S5468
Streaming:
 
Dealing with Thread Divergence in a GPU Monte Carlo Radiation Therapy Simulator
Nick Henderson (Stanford University)
We present our continued efforts to produce a high performance Monte Carlo simulator for radiation therapy using CUDA and NVIDIA GPUs. The code is based on the main algorithm used in Geant4, a particle physics simulation toolkit. Our work has progressed on two fronts. First, we have improved the accuracy of the simulation predictions against computational benchmarks and some experimental data. Second, we have improved the run-time performance using CUB sort and reduce routines to mitigate thread divergence. The technique involves sorting particles into threads based on the selected physics process in each iteration of the simulation algorithm.  Back
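The divergence-mitigation step described, sorting particles by the physics process selected for the current iteration, maps onto CUB's device-wide radix sort. A hedged sketch of that step (buffer names are illustrative, not the simulator's code):

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Group particle indices by the physics process selected for this step,
// so that threads in a warp execute the same process branch.
void sort_by_process(const int* d_proc_in, int* d_proc_out,
                     const int* d_part_in, int* d_part_out, int n)
{
    void*  d_temp = nullptr;
    size_t temp_bytes = 0;

    // First call only queries the required temporary storage size.
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_proc_in, d_proc_out,
                                    d_part_in, d_part_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call performs the key-value sort (process id -> particle id).
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_proc_in, d_proc_out,
                                    d_part_in, d_part_out, n);
    cudaFree(d_temp);
}
```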
 
Keywords:
Computational Physics, Developer - Performance Optimization, Life & Material Science, GTC 2015 - ID S5471
Streaming:
Download:
 
GPU-Accelerated Solver for the 3D Groundwater Flow Equation
Bob Zigon (Beckman Coulter)
Learn how to build a 3D solver for the groundwater flow equation that is accelerated by the GPU. The underlying mathematical model treats the entire subsurface, both saturated and unsaturated, as a whole. The governing nonlinear, time dependent, parabolic partial differential equation is discretized into 19 million nodes. The resulting K20-based GPU solver is 20 times faster than the original single CPU Fortran code.  Back
 
Keywords:
Computational Physics, GTC 2015 - ID S5503
Streaming:
Download:
 
Porting and Optimizing GTC-P Code to NVIDIA GPU
Bei Wang (Princeton University)
Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable particle-in-cell (PIC) code for studying microturbulence in magnetically-confined plasmas. As a representative PIC code, GTC-P includes algorithmic-level "scatter" and "gather" operations, which feature random memory access, potential fine-grained synchronization and low computational intensity. However, it is challenging to optimize this class of irregular codes on current HPC architectures. In this talk, we will present our efforts in porting and optimizing the GTC-P code on NVIDIA GPUs. In particular, we will discuss the redesign of the "shift" kernel for the Kepler architecture. The performance of the code will be demonstrated on the top 7 supercomputers worldwide.  Back
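The "scatter" phase of a PIC code deposits each particle's contribution onto neighboring grid points, and the conflicting updates it generates are the main source of the irregularity mentioned above. A simplified 1D charge deposit illustrating the pattern (linear weighting, atomics for conflicts; a generic sketch, not GTC-P's actual kernel):

```cuda
// Simplified 1D charge deposit ("scatter"): each particle adds its weight
// to the two grid points that bracket it. Concurrent updates to the same
// cell are the fine-grained synchronization hazard the talk refers to.
__global__ void deposit_charge(const float* __restrict__ x,   // particle positions
                               const float* __restrict__ q,   // particle charges
                               float* grid, float dx, int np, int ng)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= np) return;

    float s = x[p] / dx;            // position in cell units
    int   i = static_cast<int>(s);
    float w = s - i;                // linear weight to the right neighbor

    if (i >= 0 && i + 1 < ng) {
        atomicAdd(&grid[i],     q[p] * (1.0f - w));
        atomicAdd(&grid[i + 1], q[p] * w);
    }
}
```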
 
Keywords:
Computational Physics, Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5650
Streaming:
Download:
Computer Vision & Machine Vision
Presentation
Media
Mobile Visual Search
Martin Peniak (Cortexica)
The attendees will learn about Cortexica's FindSimilar technology. Its algorithms are based on the way the human visual cortex recognises images and objects, meaning that poor lighting conditions, rotated or skewed images and other 'imperfect' objects can all be recognized accurately. In this presentation, you will learn about the challenges in the field of visual search and how our company addresses them by leveraging the processing power of GPUs including the latest NVIDIA K1 processor. This session will include several demonstrations of our technology and the latest mobile applications using NVIDIA K1 processors to speedup the visual search performance.  Back
 
Keywords:
Computer Vision & Machine Vision, Embedded Systems, Video & Image Processing, GTC 2015 - ID S5131
Streaming:
 
SMTool: A GPU based Satellite Image Analysis Tool
Dilip Patlolla (Oak Ridge National Laboratory)
This session will demonstrate our advanced satellite image analytics tool, referred to as SMTool, built on the CUDA platform to process city-scale sub-meter resolution satellite imagery to detect and discriminate man-made structures. Automated analysis of large-scale high-resolution satellite imagery requires computationally efficient image representation techniques that characterize the distribution of structures in the scene. The structures of interest range from simple edges and lines to complex shapes of objects on the ground. Different representation techniques and their careful implementation exploiting the GPU architecture will be reviewed. We present results of SMTool from our ongoing work supporting global-scale population mapping and polio eradication and immunization efforts.  Back
 
Keywords:
Computer Vision & Machine Vision, Big Data Analytics, Machine Learning & Deep Learning, Supercomputing, GTC 2015 - ID S5201
Streaming:
Download:
 
Real-Time and High Resolution Feature Tracking and Object Recognition
Peter Andreas Entschev (ArrayFire)
This session will cover real-time feature tracking and object recognition in high resolution videos using GPUs and productive software libraries including ArrayFire. Feature tracking and object recognition are computer vision problems that have challenged researchers for decades. Over the last 15 years, numerous approaches were proposed to solve these problems, some of the most important being SIFT, SURF and ORB. Traditionally, these approaches are so computationally complex that processing more than a few frames per second is impossible. Using an NVIDIA K20 GPU with ORB, we are able to process more than 30 frames per second on images on the order of 10000x10000 pixels. Multiple quality and timing benchmarks will be presented, covering some of the most robust feature tracking methods.  Back
 
Keywords:
Computer Vision & Machine Vision, Developer - Algorithms, Video & Image Processing, GTC 2015 - ID S5205
Streaming:
Download:
 
Tracking Objects Better, Faster, Longer
Alptekin Temizel (Middle East Technical University)
In this talk, we demonstrate a real-time long-term tracker, Hybrid-TLD (H-TLD), which is based on the recently proposed Tracking-Learning-Detection (TLD) framework. TLD simultaneously tracks the object, learns its appearance and detects when it re-appears. While it has been shown to have promising results, its high computational cost prohibits running it at higher resolutions and frame rates. We present our analysis of the framework and our modifications to make it work effectively in a CPU-GPU hybrid setting, with high utilization of both processing units, using OpenMP and CUDA. Our results show that a 10.25x speedup at 1920x1080 resolution can be obtained. The source code of the developed H-TLD library has been made publicly available.  Back
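Keeping both processors busy at once, as described above, is commonly expressed with OpenMP on the host while CUDA work proceeds asynchronously in a stream. A schematic sketch of one frame under that pattern (the Frame type and the tracker/detector function names are hypothetical placeholders, not the H-TLD API):

```cuda
#include <omp.h>
#include <cuda_runtime.h>

// Placeholder frame type and hypothetical processing routines standing in
// for the real tracker components.
struct Frame { /* image data omitted */ };
void launch_detector_kernels(const Frame&, cudaStream_t); // GPU detector (hypothetical)
void run_median_flow_tracker(const Frame&);               // CPU tracker   (hypothetical)
void update_learning_model(const Frame&);                 // CPU learner   (hypothetical)

// One frame of the hybrid pipeline: the GPU detector runs asynchronously
// in a stream while a CPU thread runs the tracker and learning stages.
void process_frame(const Frame& frame, cudaStream_t stream)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            launch_detector_kernels(frame, stream);
            cudaStreamSynchronize(stream);   // wait for the GPU work
        }
        #pragma omp section
        {
            run_median_flow_tracker(frame);  // overlapped with the GPU section
            update_learning_model(frame);
        }
    }
}
```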
 
Keywords:
Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5221
Streaming:
Download:
 
Power Efficient Visual Computing on Mobile Platforms
Brant ZHAO (NVIDIA)
Tegra K1 brings a desktop-class GPU into the mobile world, which makes it possible for mobile platforms to succeed at more and more complex visual computing tasks. With future, more powerful Tegra family chips, many more compute applications are expected in the mobile world. Besides performance tuning for all these applications, it is also critical to make them power efficient, as they run on mobile devices with a limited power budget. In this work, we will present a methodology for power analysis and optimization of mobile computing workloads. Three case studies will be presented to explain the three items of the methodology: (1) analyze the whole pipeline at the system level; (2) use energy-efficient features of the target platforms; (3) reduce the total instruction count to save energy.  Back
 
Keywords:
Computer Vision & Machine Vision, Developer - Performance Optimization, Automotive, GTC 2015 - ID S5255
Streaming:
Download:
 
Development of a GPU Accelerated Visual Tracking Framework
David Concha (Universidad Rey Juan Carlos)
This session presents the development of a visual tracking system whose ultimate goal is to track multiple articulated objects. Throughout the development, different technologies for GPU programming are used, such as OpenGL, Cg and CUDA; various types of sensors, such as cameras and Kinects; and different methodologies, such as particle filters, Kalman filters and the Variable Neighborhood Search (VNS) metaheuristic.  Back
 
Keywords:
Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5317
Streaming:
Download:
 
SceneNet: 3D Reconstruction of Videos Taken by the Crowd on GPU
Chen Sagiv (SagivTech Ltd.)
If you have visited a rock concert recently, you probably noticed how many people are taking videos of the scene using their mobile phone cameras. The aim of SceneNet is to use these multiple video sources to create a high quality 3D video scene that can be shared via social networks. The SceneNet pipeline starts at the mobile device, where the video streams are acquired, pre-processed and transmitted to the server, where the various video streams are registered and submitted to 3D reconstruction. We will share the compute challenges of SceneNet and the GPU-based acceleration on mobile devices and the server, from pre-processing on the mobile device to extremely computationally demanding algorithms such as bundle adjustment and 3D reconstruction. SceneNet is an FP7 European-funded project.  Back
 
Keywords:
Computer Vision & Machine Vision, Developer - Algorithms, Video & Image Processing, GTC 2015 - ID S5333
Streaming:
Download:
 
A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience
John Long (New York University Langone Medical Center)
Computer vision techniques for 3D reconstruction and kinematic modeling are positioned to bring about a major advance in the field of behavioral neuroscience. Integrating GPUs into the software pipeline has qualitatively improved our ability to fit, inspect, and refine complex kinematic models. Our custom markerless motion capture system, in conjunction with our use of high-density silicon neural implants (? 100 channels), provides an unprecedented glimpse into the relationship between the brain, memory, and behavior.  Back
 
Keywords:
Computer Vision & Machine Vision, Developer - Algorithms, Life & Material Science, GTC 2015 - ID S5362
Streaming:
Download:
 
GPU + Drones + 3D Imaging for Precision Farming and Construction
Bingcai Zhang (BAE Systems)
Agriculture and construction are two of the largest industries in the world. Democratization of 3-D imaging technology with drones, digital cameras, and GPUs is applicable to precision farming and construction. Precision farming can increase crop yields, reduce pollution, save water, and increase productivity. The demand for precision farming continues to increase as more people live on a planet with fixed natural resources. Timely, precise 3-D measurements are important for construction; today, most of these measurements are obtained manually. BAE Systems is developing GPU-accelerated 3-D imaging technology with drone images for precision farming and construction.  Back
 
Keywords:
Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5373
Streaming:
Download:
 
Maximizing Face Detection Performance on GPUs
Paulius Micikevicius (NVIDIA)
In this talk we look at GPU performance optimization for face detection using various techniques and features, including cascades with Haar-like features and multi-block local binary patterns. For each approach we examine various implementation tradeoffs and their performance limiters, as well as performance dependence on data. We also investigate optimization by combining the approaches and by pruning additional work.  Back
 
Keywords:
Computer Vision & Machine Vision, Developer - Performance Optimization, GTC 2015 - ID S5457
Streaming:
Download:
 
CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service
Dhruv Batra (Virginia Tech)
In this talk, attendees can expect to learn about CloudCV, an ambitious system that will provide access to state-of-the-art distributed computer vision algorithms as a cloud service. Our goal is to democratize computer vision; one should not have to be a computer vision, big data and distributed computing expert to have access to state-of-the-art distributed computer vision algorithms. As the first step, CloudCV is focused on object detection and localization in images. CloudCV provides APIs for detecting whether any of 200 different object categories, such as entities (person, dog, cat, horse, etc.), indoor objects (chair, table, sofa, etc.) and outdoor objects (car, bicycle, etc.), are present in the image.  Back
 
Keywords:
Computer Vision & Machine Vision, Data Center, Cloud Computing & HPC, Machine Learning & Deep Learning, GTC 2015 - ID S5474
Streaming:
Download:
 
GPU Accelerated Haze Removal on Tegra K1
Bin Zhou (University of Science and Technology of China)
This talk shows how the Tegra K1 GPU accelerates the dehazing process for outdoor computer vision systems. Toxic haze has become a major air pollution threat in China, affecting not only public health but also outdoor computer vision systems. By adapting the dark channel prior method to the dehazing process, very good results are achieved. However, the huge processing requirements bring big challenges. We refined the parallel algorithm and performed deep optimization on the Tegra K1 Jetson platform. Compared to the ARM CPU, experiments show a 156x speedup. The results show Tegra K1 has great potential for embedded real-time computer vision processing.  Back
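The dark channel prior mentioned above takes, for each pixel, the minimum intensity over the color channels within a local patch. A straightforward (unoptimized) CUDA kernel for that step, shown as a sketch of the parallelization rather than the authors' refined implementation:

```cuda
// Dark channel: per-pixel minimum over RGB within a (2r+1)x(2r+1) patch.
// Naive version; the shared-memory and platform optimizations from the
// talk are omitted.
__global__ void dark_channel(const uchar3* __restrict__ img, float* dark,
                             int width, int height, int r)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int m = 255;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            int xx = min(max(x + dx, 0), width  - 1);   // clamp to image
            int yy = min(max(y + dy, 0), height - 1);
            uchar3 p = img[yy * width + xx];
            int c = min((int)p.x, min((int)p.y, (int)p.z));
            m = min(m, c);
        }
    dark[y * width + x] = m / 255.0f;
}
```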
 
Keywords:
Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5546
Streaming:
 
Building State-of-Art Face Processing Pipeline with GPU
Shuchang Zhou (Megvii Inc.)
Megvii Inc. revisited face-related problems with deep learning techniques powered by GPUs. Substantial progress has been made, and performance keeps increasing with the inflow of data. This brings facial recognition technology closer to solving the identity problem, which is fundamental to the security, credibility and accountability of the Internet. The availability and power efficiency of GPUs enable Megvii to explore deeper and more complex neural network topologies, handle higher resolution images and videos, and extend to embedded devices with more limited power profiles. As of this writing, Face++ by Megvii is a leading face recognition service provider in the cloud, has processed more than 40 billion images, and runs on 50 million devices.  Back
 
Keywords:
Computer Vision & Machine Vision, Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5577
Streaming:
Download:
 
Stereovision and the Future of Autonomous Machines
Edwin Azzam (STEREOLABS)
Discover how stereovision and 3D depth sensing on mobile GPUs enable the development of future autonomous cars, drones and robots. We will discuss the benefits and challenges of using stereo cameras as depth sensing sensors, and how to leverage the power of embedded GPU to overcome these challenges. We will also show demonstrations of how the technology can be used to create 3D surrounding reconstruction, detect obstacles and navigate autonomously.  Back
 
Keywords:
Computer Vision & Machine Vision, Automotive, Video & Image Processing, GTC 2015 - ID S5751
Streaming:
 
Achieving Real-Time Performances on Facial Motion Capture and Animation on Mobile GPUs
Emiliano Gambaretto (Mixamo)
3D animation is one of the most prominent forms of contemporary art, with millions of people drawn to its emotional power in movie theaters and games every year. Mixamo developed a GPU-powered facial capture and animation technology to enable users to animate a character's face in real time. This technology, originally targeted at desktop and laptop GPUs, is now available on mobile thanks to the improved performance of new generation hardware. This presentation will focus on the challenges faced and the strategies adopted to port this technology to Tegra K1 powered devices. We adopted two parallel approaches: one approach optimized our tracking algorithm and ported our code to CUDA (from OpenCL); the other completely changed the facial tracking paradigm, focusing on an intrinsically faster machine learning approach based on a cascade of simple regressors. We will compare the performance and strengths of both approaches.  Back
 
Keywords:
Computer Vision & Machine Vision, Developer - Performance Optimization, Developer - Algorithms, GTC 2015 - ID S5761
Streaming:
 
Project Tango Tablet: Application Rapid Fire Presentations
Larry Yang (Google), Eric Lee (Left Field Labs), Jeff Schmitz (NVYVE), Iman Mostafavi (Limbic)
Come hear from the first wave of application developers exploring the unique odometry and depth sensor capabilities of Google's Tango Tablet using Tegra K1. Five leading-edge developers will showcase the applications they are developing for Tango, explain how they are using Tango's spatial awareness, and share the lessons learned so far.  Back
 
Keywords:
Computer Vision & Machine Vision, GTC 2015 - ID S5816
Streaming:
Download:
Data Center, Cloud Computing & HPC
Presentation
Media
Cluster Monitoring and Management Tools
Rajat Phull (NVIDIA), Rob Todd (NVIDIA)
Learn about the monitoring and management tools that NVIDIA provides for professional GPUs in HPC, cluster and datacenter environments. This talk will provide a high level overview of the relevant APIs and utilities, dive more deeply into several new features from recent CUDA releases, and review how this functionality is integrated into user environments and 3rd-party software.  Back
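The programmatic layer behind these tools is NVML. A minimal sketch that sweeps the GPUs in a node and prints utilization and temperature (error handling omitted):

```cuda
#include <nvml.h>
#include <stdio.h>

// Minimal NVML sweep: print utilization and temperature for every GPU.
int main(void)
{
    nvmlInit();

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);

        nvmlUtilization_t util;
        nvmlDeviceGetUtilizationRates(dev, &util);

        unsigned int temp = 0;
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);

        printf("GPU %u: %u%% SM, %u%% mem, %u C\n",
               i, util.gpu, util.memory, temp);
    }

    nvmlShutdown();
    return 0;
}
```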
 
Keywords:
Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5144
Streaming:
Download:
 
Data Movement Options for Scalable GPU Cluster Communication
Benjamin Klenk (Ruprecht-Karls University of Heidelberg)
In this talk we will explore how existing communication models map to GPUs and what advantages specialized communication models for GPUs offer. GPU computing is used pervasively for many reasons including performance increase and improved energy efficiency. The Green500 list reveals that the top 10 most energy-efficient computing clusters rely on GPU acceleration. GPU computing at cluster-level is challenging though, as communication models match poorly and hybrid programming models like CUDA+MPI have to be employed. This talk provides observations and insights from experiments with different communication models, and shows promising paths to overcome these limitations.  Back
 
Keywords:
Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5146
Streaming:
 
Achieving Near-Native GPU Performance in the Cloud
John Paul Walters (USC Information Sciences Institute)
Explore the use of GPUs in virtualized environments. In this session we describe how GPUs can be used within virtual environments with near-native performance. We begin by showing GPU performance across four hypervisors: VMware ESXi, KVM, Xen, and LXC. After showing the performance characteristics of each platform, we extend the results to the multi-node case with nodes interconnected by QDR InfiniBand. We demonstrate multi-node GPU performance using GPUDirect-enabled MPI, achieving efficiencies of 97-99% of a non-virtualized system. Examples are drawn from signal processing, big data analytics, and molecular dynamics. The session will conclude with a discussion of the next steps in extending HPC to virtual environments, including our work with the OpenStack platform.  Back
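The GPUDirect-enabled MPI results rely on handing device pointers directly to MPI. With a CUDA-aware MPI build, a halo exchange between nodes needs no staging through host memory; a minimal sketch (buffer and rank arguments are illustrative):

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Exchange a halo buffer that lives in GPU memory. With a CUDA-aware MPI
// (and GPUDirect RDMA where available), the device pointers are passed to
// MPI directly; no cudaMemcpy staging through the host is required.
void exchange_halo(float* d_send, float* d_recv, int count,
                   int left, int right, MPI_Comm comm)
{
    MPI_Sendrecv(d_send, count, MPI_FLOAT, right, 0,
                 d_recv, count, MPI_FLOAT, left,  0,
                 comm, MPI_STATUS_IGNORE);
}
```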
 
Keywords:
Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5323
Streaming:
Download:
 
E4-ARKA: ARM64+GPU+IB is Now Here (Presented by E4 Computer Engineering)
Piero Altoè (E4 Computer Engineering)
E4 Computer Engineering introduces ARKA, the first server solution based on a 64-bit ARM SoC dedicated to HPC. The compute node is boosted by discrete NVIDIA K20 GPU cards and has both 10Gb Ethernet and FDR InfiniBand networks implemented by default. In this talk the hardware configuration of the compute node is described in detail, and to demonstrate the unique capabilities of the ARM+GPU+IB combination, many synthetic benchmarks and application tests are reported, with particular attention to molecular dynamics software.  Back
 
Keywords:
Data Center, Cloud Computing & HPC, Life & Material Science, Supercomputing, GTC 2015 - ID S5422
Streaming:
Download:
 
High-Performance Broadcast with GPUDirect RDMA and InfiniBand Hardware Multicast for Streaming Applications
Dhabaleswar K. (DK) Panda (The Ohio State University)
Learn about the latest developments in middleware design that boosts the performance of GPGPU based streaming applications. Several middlewares already support communication directly from GPU device memory and optimize it using various features offered by the CUDA toolkit, providing optimized performance. Some of these middlewares also take advantage of novel features like hardware based multicast that high performance networks like InfiniBand offer to boost broadcast performance. This talk will focus on challenges in combining and fully utilizing GPUDirect RDMA and hardware multicast features in tandem to design support for high performance broadcast operation for streaming applications. Performance results will be presented to demonstrate the efficacy of the proposed designs.  Back
 
Keywords:
Data Center, Cloud Computing & HPC, Developer - Tools & Libraries, Supercomputing, GTC 2015 - ID S5507
Streaming:
 
HP/NVIDIA Solutions for HPC Compute and Visualization Performance (Presented by HP)
Ed Turkel (HP)
High Performance Computing is characterized by user demand for increasing levels of computational performance, combined with exploding volumes of data, to accomplish their science, engineering, or analytics workloads. Demands for performance growth are becoming increasingly limited by the power, space and cost of deployment of new systems, while exploding data volumes challenge traditional client/server computing models. For years, HP has partnered with NVIDIA to develop HPC solutions that are purpose-built for compute and visualization performance and scalability, while delivering innovative energy and space efficiency, with a focus on customer ROI. This session will showcase HP and NVIDIA's latest technologies and solutions in use today by leaders in the HPC community, plus trends for the future.  Back
 
Keywords:
Data Center, Cloud Computing & HPC, GTC 2015 - ID S5825
Streaming:
Download:
 
Supermicro's Application Optimized GPU System Solutions: Winning Strategies for Selecting Best Platforms (Presented by Supermicro)
Don Clegg (Supermicro)
As GPU-enabled computing matures, selecting the best hardware platform is more essential than ever. Successful enterprises understand the importance of optimizing compute power and density, I/O bandwidth and latency, plus electrical power efficiency and cooling to ideally match the intended application within the specified budget. Supermicro, with its industry-leading building-block solutions, delivers the most comprehensive range of GPU-optimized platforms on the market. This presentation, featuring Supermicro's FatTwin, Twin, SuperBlade and rack/tower building blocks, will highlight some of the most important architectural innovations to consider when selecting the best GPU platforms.  Back
 
Keywords:
Data Center, Cloud Computing & HPC, GTC 2015 - ID S5865
Streaming:
 
3D Cloud Workstations: Scyld Cloud Workstation (Presented by Penguin Computing)
Gary Yee (Penguin Computing), Thomas Ruge (Colorado Code Craft)
Learn why Scyld Cloud Workstation, a browser-based, high quality, low-bandwidth, 3D accelerated desktop can greatly streamline cloud HPC workflows. Penguin's Scyld Cloud Workstation provides significant time savings by eliminating the need for downloading large data files when using the cloud for HPC workflows. For example, a typical manufacturing engineer using a Scyld Cloud Workstation can run real-time, interactive GUI tools and 3D visualizations in the cloud, removing the traditional barrier of moving large files down to on-premises workstations. Additionally, since there is no browser plug-in or application installation needed, the difficulty of security changes is eliminated - allowing for easy integration with industry security policies.  Back
 
Keywords:
Data Center, Cloud Computing & HPC, Graphics Virtualization, GTC 2015 - ID S5867
Streaming:
 
Huawei and NVIDIA Collaborations in Optimized GPU Computing Solutions (Presented by Huawei)
Gary Xia (Huawei), Francis Lam (Huawei)
This presentation outlines the strong collaboration between Huawei and NVIDIA to offer state-of-the-art optimized GPU computing solutions. An overview of Huawei HPC computing systems and use cases will be presented in addition to a deeper dive into the innovative Huawei FusionAccess remote desktop solution.  Back
 
Keywords:
Data Center, Cloud Computing & HPC, GTC 2015 - ID S5868
Streaming:
Download:
Defense
Presentation
Media
Parallel Breadth First Search on GPU Clusters
Bryan Thompson (SYSTAP, LLC)
The goal of this session is to demonstrate our work on scalable and high-performance BFS on GPU clusters. Our proposed implementation achieves over 30 billion edges traversed per second on a cluster of 64 GPUs. The SIMT architecture of the GPUs, the imbalance between GPU memory and communication bandwidths, and the irregular nature of the graphs make it difficult to develop efficient, scalable graph analytics programs. In this session, we present the secret ingredients of our BFS implementation that help us overcome those difficulties and achieve high performance and scalability. We also show the performance and scalability characteristics of our implementation with a wide range of synthetic and real-life graphs. This is a collaborative work with Dr. Martin Berzins and Harish Kumar Dasari from the University of Utah.  Back
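A common single-GPU building block for BFS is a level-synchronous expansion over a CSR graph; the distributed implementation described above layers partitioning and inter-GPU communication on top of a kernel along these lines (a generic sketch, not the SYSTAP code):

```cuda
// One level of level-synchronous BFS over a CSR graph. Vertices at the
// current depth expand their neighbors; unvisited neighbors get depth+1.
// *changed signals whether another level needs to be processed.
__global__ void bfs_level(const int* __restrict__ row_ptr,
                          const int* __restrict__ col_idx,
                          int* depth, int current, int num_vertices,
                          int* changed)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= num_vertices || depth[v] != current) return;

    for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
        int u = col_idx[e];
        if (depth[u] == -1) {          // unvisited
            depth[u] = current + 1;    // benign race: all writers store the same value
            *changed = 1;
        }
    }
}
```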
 
Keywords:
Defense, Big Data Analytics, Developer - Performance Optimization, GTC 2015 - ID S5154
Streaming:
Download:
 
Acceleration of Electromagnetic Scattering from Discrete Bodies of Revolution (DBOR)
Eric Dunn (Leidos)
1987 was the year that "The Simpsons" first aired on TV - now the longest-running scripted show in television history. Perhaps a slightly lesser known streak is that in 1987 one of the first studies of electromagnetic scattering from discrete bodies of revolution (DBOR) was published. Today, more than 25 years later, this same algorithm is still in use. Come join us to learn how we have used the latest in GPU technology to continue to accelerate this legendary algorithm - including our newest results using a library called Momentous (developed by TechX) that enables distributed GPU matrix factorization.  Back
 
Keywords:
Defense, Computational Physics, Supercomputing, GTC 2015 - ID S5197
Streaming:
Download:
 
Advanced Geospatial Image Processing Using Graphics Processing Units
Ronald Kneusel (Exelis Visual Information Solutions), Atle Borsholm (Exelis Visual Information Solutions)
Attendees will learn about advanced geospatial algorithms implemented for GPUs and integrated with existing high-level programming and analysis environments. Geospatial imagery presents a unique challenge for GPU analysis because of its massive size, often over 32 GB and larger per image. This talk will introduce a library and framework for working with geospatial images from within existing tools while allowing the user to easily develop new kernels or make use of the existing library of geospatial algorithms optimized for the GPU.  Back
 
Keywords:
Defense, Developer - Performance Optimization, Developer - Algorithms, Developer - Tools & Libraries, Video & Image Processing, GTC 2015 - ID S5236
Streaming:
Download:
 
Accelerating Automated Image Processing Pipelines for Cameras with Novel CFAs on GPUs
Qiyuan Tian (Stanford University), Haomiao Jiang (Stanford University)
L3 (Local, Linear, Learned) is a new technology to automate and customize the design of image processing pipelines for cameras with novel architecture, such as unconventional color filter arrays. L3 classifies sensor image pixels into categories that are local in space and response and automatically learns linear operators that transform pixels to the calibrated output space using training data from camera simulation. The local and linear processing of individual pixels makes L3 ideal for parallelization. We accelerated the L3 pipeline on NVIDIA® Shield Tablets using GPUs for real time rendering of video captured by a multispectral camera prototype. The combination of L3 and GPUs delivers high performance with low power for image processing on mobile devices.  Back
 
Keywords:
Defense, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5251
Streaming:
Download:
 
Implementing Graph Analytics with Python and Numba
Siu Kwan Lam (Continuum Analytics, Inc), Stanley Seibert (Continuum Analytics, Inc)
We demonstrate how to implement the densest k-subgraph algorithm by Papailiopoulos et al, using the Numba CUDA compiler for Python. With the rise of social networks, more data scientists want to study the connections within and between the communities that dynamically organize on the Internet. Python is a very productive language for data scientists, but, on its own, may not provide the performance needed to analyze big data sets. To bridge this gap, the Numba compiler allows CUDA kernels to be written directly in the Python language and compiled for GPU execution. Using the densest k-subgraph algorithm as an example, we will show how the agility of Python can be combined with the high performance of GPU computing for graph analytics.  Back
 
Keywords:
Defense, Big Data Analytics, Developer - Algorithms, Developer - Programming Languages, GTC 2015 - ID S5419
Streaming:
Download:
 
Creating Dense Mixed GPU and FPGA Systems With Tegra K1 Using OpenCL
Lance Brown (Colorado Engineering Inc)
With the introduction of comprehensive OpenCL support and IEEE 754 hard floating-point units for Altera FPGAs, and the availability of NVIDIA® Tegra® K1 GPUs, compact solutions that used to require many discrete boards can now be built in small form factors for Distributed Aperture Systems (DAS), Situational Awareness 360 (SA360), Digital Signal Processing (DSP), and hundreds of other high-performance embedded computing (HPEC) applications, from mil-aero to commercial, industrial, medical, and consumer markets. Funded by the Missile Defense Agency, Lance Brown will discuss the challenges and benefits of using multiple Altera Arria 10 FPGAs and multiple NVIDIA® Tegra® K1 GPUs on a single card to speed up six-degrees-of-freedom simulations.  Back
 
Keywords:
Defense, Embedded Systems, Computer Vision & Machine Vision, Signal & Audio Processing, GTC 2015 - ID S5429
Streaming:
Download:
 
GPUdb: GPU-Accelerated Distributed Database
Eli Glaser (GIS Federal)
GPUdb is a high performance GPU-accelerated distributed database. Users can ingest arbitrary data and then run queries against the data via an SQL-like syntax. Queries are handled by our highly optimized GPU-accelerated distributed back-end. Complex filters and server-side visualizations typically complete in under one second, even for billions of objects. Find out how GPUdb can help you solve your big data challenges.  Back
 
Keywords:
Defense, Big Data Analytics, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5484
Streaming:
Download:
 
Real-Time Image Segmentation for Homeland Security Exploiting Hyper-Q Concurrency
Fanny Nina-Paravecino (Northeastern University)
This talk will describe how concurrent kernel execution with Hyper-Q can impact our national security. By exploiting 32 concurrent work queues between the host and the device, we can identify the contents of baggage using CT images. This talk focuses on using Hyper-Q for real-time image segmentation as applied to luggage scanning at airports. Image segmentation plays a key role in this compute pipeline, where the accuracy and real-time constraints of the application pose computational barriers. We discuss our ability to scale the number of streams using Hyper-Q, running on an NVIDIA GK110 GPU. We achieve a ~47x speedup when processing 32 megapixels versus an optimized OpenMP implementation running on an Intel Core i7-3770K.  Back
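A minimal sketch of the stream setup that lets Hyper-Q overlap independent work on a GK110-class GPU; the per-tile kernel here is a toy placeholder, not the segmentation algorithm from the talk:

    // Launch independent work in many CUDA streams so that Hyper-Q can
    // execute the kernels concurrently. process_tile() is a stand-in for
    // a per-tile segmentation kernel.
    #include <cuda_runtime.h>

    __global__ void process_tile(float *tile, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) tile[i] = tile[i] > 0.5f ? 1.0f : 0.0f;   // toy threshold
    }

    int main() {
        const int kStreams = 32, kTile = 1 << 20;
        cudaStream_t streams[kStreams];
        float *tiles[kStreams];

        for (int s = 0; s < kStreams; ++s) {
            cudaStreamCreate(&streams[s]);
            cudaMalloc(&tiles[s], kTile * sizeof(float));
            // Each stream gets its own independent tile of the image.
            process_tile<<<(kTile + 255) / 256, 256, 0, streams[s]>>>(tiles[s], kTile);
        }
        cudaDeviceSynchronize();

        for (int s = 0; s < kStreams; ++s) {
            cudaStreamDestroy(streams[s]);
            cudaFree(tiles[s]);
        }
        return 0;
    }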
 
Keywords:
Defense, Developer - Performance Optimization, Developer - Algorithms, Video & Image Processing, GTC 2015 - ID S5510
Streaming:
Download:
 
Applying MapGraph: GPU-Accelerated Merlin Decision Support and COA Generation for Electronic Warfare Operations
Brad Bebee (SYSTAP, LLC), Matthew Goldsbury (Chesapeake Technology International Corp.)
This session presents work by Chesapeake Technology International Corp (CTI) and SYSTAP to accelerate automated decision support and course of action (COA) generation using MapGraph. COA generation is an enabling analytic for tactical Airborne Electronic Attack (AEA) and for combined cyber and electronic warfare operations. CTI developed the Merlin capability to construct an automated decision space within dense, operationally-relevant environments. This capability enabled solutions to complete problems within tactically-relevant timelines (seconds or minutes) rather than hours. Many of these analytics may be represented as data-parallel graph analytics. GPU-acceleration enables new capabilities to provide the operator multiple COAs in near-realtime with dynamic updates in seconds.  Back
 
Keywords:
Defense, Big Data Analytics, Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5576
Streaming:
Download:
Developer - Algorithms
Presentation
Media
Accurate Floating-Point Summation in CUB
Uri Verner (NVIDIA)
We address the problem of accurate parallel floating-point summation. Two issues with current methods for parallel summation of floating-point numbers on GPUs are (1) loss of precision due to error propagation, and (2) the bitwise-exact result is not reproducible with a different architecture or configuration. We present a new efficient method for parallel accurate summation of an array of floating point numbers in CUB. The method computes a full-precision sum by recovering and keeping track of the round-off error. The method is implemented using parallel primitives such as sort and scan, and so it takes advantage of future optimizations of these primitives to new architectures. Our method can reduce the number of iterations in some iterative linear solvers, such as lattice QCD.  Back
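The exact CUB-based formulation is best taken from the session itself, but the basic building block for tracking round-off is the error-free two-sum transformation. A small sketch, assuming compilation without aggressive floating-point optimizations such as -ffast-math:

    // Error-free transformation (Knuth's TwoSum): s + err == a + b exactly.
    // Carrying 'err' alongside the running sum is the standard building
    // block for round-off-tracking summation schemes.
    __host__ __device__ inline void two_sum(double a, double b,
                                            double &s, double &err)
    {
        s = a + b;
        double bv = s - a;              // the part of b actually absorbed into s
        err = (a - (s - bv)) + (b - bv);
    }

    // Simple sequential compensated (Kahan-style) sum for comparison.
    __host__ __device__ inline double compensated_sum(const double *x, int n)
    {
        double sum = 0.0, c = 0.0;
        for (int i = 0; i < n; ++i) {
            double s, e;
            two_sum(sum, x[i], s, e);
            sum = s;
            c += e;                     // accumulate the lost low-order bits
        }
        return sum + c;
    }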
 
Keywords:
Developer - Algorithms, Developer - Tools & Libraries, GTC 2015 - ID S5211
Streaming:
Download:
 
Asynchronous K-Means Clustering of Multiple Data Sets Using CUDA
Marek Fiser (Purdue University)
We present an efficient, asynchronous implementation of the k-means clustering algorithm that is optimized for clustering a large number of small data sets. The implementation is designed to run multiple instances of k-means clustering in parallel. In addition, our implementation hides all disk I/O and network latency by using asynchronous CUDA streams, which can load new data sets or download results while the GPU is busy clustering other data sets. Our implementation was tested on artificially generated data sets as well as on Flow Cytometry data obtained from Acute Myeloid Leukemia samples. We compute 55,000 clusterings from 3,000 data sets in 3.5 minutes on an NVIDIA Tesla K40 processor.  Back
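A minimal sketch of the overlap pattern described above, assuming pinned host buffers and a placeholder kmeans_kernel (the real implementation runs full k-means iterations per data set):

    // Overlap upload of the next data set with clustering of the current one
    // using two CUDA streams and pinned host buffers.
    #include <cuda_runtime.h>

    __global__ void kmeans_kernel(const float *pts, int n, float *centroids, int k)
    {
        // Placeholder: a real kernel would assign points to the nearest
        // centroid and accumulate partial sums for the centroid update.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { (void)pts[i]; (void)centroids; (void)k; }
    }

    void cluster_all(float **host_sets, int num_sets, int n, int k,
                     float *d_centroids)
    {
        cudaStream_t stream[2];
        float *d_buf[2];
        for (int s = 0; s < 2; ++s) {
            cudaStreamCreate(&stream[s]);
            cudaMalloc(&d_buf[s], n * sizeof(float));
        }

        for (int i = 0; i < num_sets; ++i) {
            int s = i & 1;  // ping-pong between the two buffers/streams
            // Asynchronous copy: requires host_sets[i] to be pinned memory.
            cudaMemcpyAsync(d_buf[s], host_sets[i], n * sizeof(float),
                            cudaMemcpyHostToDevice, stream[s]);
            kmeans_kernel<<<(n + 255) / 256, 256, 0, stream[s]>>>(
                d_buf[s], n, d_centroids, k);
        }
        cudaDeviceSynchronize();

        for (int s = 0; s < 2; ++s) {
            cudaStreamDestroy(stream[s]);
            cudaFree(d_buf[s]);
        }
    }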
 
Keywords:
Developer - Algorithms, Developer - Performance Optimization, Big Data Analytics, GTC 2015 - ID S5234
Streaming:
Download:
 
Numerical Reproducibility Challenges on Extreme Scale Multi-Threading GPUs
Michela Taufer (University of Delaware)
Learn how to mitigate rounding errors that can hamper result reproducibility when concurrent executions burst and workflow determinism vanishes. This talk unveils the power of mathematical methods to model rounding-errors in scientific applications and illustrates how these methods can mitigate error drifting on new generation, many-core GPUs. We will discuss performance and accuracy issues for a diverse set of scientific applications that rely on floating point arithmetic. In particular, our experimental study will cover the following exploration space: floating point format and precision (e.g., single, double, and composite precision), numerical range used by the computation, degree of multi-threading, thread scheduling scheme, and algorithmic variant.  Back
 
Keywords:
Developer - Algorithms, Computational Physics, Supercomputing, GTC 2015 - ID S5245
Streaming:
Download:
 
Synthesizing Effective Data Compression Algorithms for GPUs
Martin Burtscher (Texas State University)
Learn how to automatically generate high-performance lossless compression algorithms that are suitable for massively parallel execution on a GPU. Our technique requires no user guidance and can even be employed to synthesize a compressor that is optimized for a specific file or data set. While we target single- and double-precision floating-point data, our approach is equally applicable to other domains. This talk explains how the algorithm generator works and demonstrates how it can create completely novel and GPU-friendly algorithms that achieve heretofore unreached compression ratios.  Back
 
Keywords:
Developer - Algorithms, GTC 2015 - ID S5260
Streaming:
Download:
 
DAG-Scheduled Linear Algebra Using Template-Based Building Blocks
Jonathan Hogg (Science and Technology Facilities Council)
We describe our experiences using DAG-driven algorithms built from templated BLAS-like building blocks to implement LAPACK-like functionality at the single kernel level. There will be a particular focus on strong scaling of multiple small dense factorizations, as required for sparse direct methods. The main objective is to overlap expensive latency-bound pivoting operations with highly parallel matrix-matrix multiplication operations. As the latter are dependent on the output of previous pivoting decisions, a directed acyclic graph (DAG) scheduler is implemented using global memory to manage fine-grained inter-block parallelism.  Back
 
Keywords:
Developer - Algorithms, Developer - Tools & Libraries, GTC 2015 - ID S5316
Streaming:
Download:
 
Accelerating Sparse Cholesky Factorization on GPUs
Steven Rennich (NVIDIA), Darko Stosic (CIn/UFPE), Tim Davis (Texas A&M)
Sparse matrix factorization is a fundamental tool in scientific computing. As a major component in sparse direct solvers, it embodies the dominant computational cost in many scientific and engineering applications. Although previous GPU optimizations of sparse factorization present impressive results, they still remain far from attaining the GPU's theoretical flop rate due to their highly irregular nature. This talk will focus on such limitations in CHOLMOD, a high-performance and well-known sparse matrix factorization, and discuss in detail the techniques used to overcome them. The limitations involve PCIe communication, kernel launch overhead, and device occupancy, while the core optimization techniques focus on factorizing branches of the elimination tree entirely on the GPU.  Back
 
Keywords:
Developer - Algorithms, Big Data Analytics, Supercomputing, GTC 2015 - ID S5355
Streaming:
Download:
 
Parallel Analysis of Parallelism: Verifying Concurrent Software System Designs Using GPUs
Anton Wijs (Eindhoven University of Technology)
Learn how formal verification techniques can be used to determine that designs of concurrent systems are correct, and how GPUs can be used effectively to speed up these computationally intensive operations. Concurrent systems are very hard to reason about, and formal verification tools, among which are model checkers, can be used to do so automatically. In this session, I will explain how model checkers work, and highlight some of the underlying algorithms, namely the on-the-fly learning and exploring of design state spaces, and decomposing given state spaces into components relevant for the verification of functional properties. Then I will present how these operations can be adapted to employ the power of GPUs, and thereby speed up model checking considerably.  Back
 
Keywords:
Developer - Algorithms, GTC 2015 - ID S5401
Streaming:
Download:
 
Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement
Timothy Davis (Texas A&M University)
Sparse QR factorization is a critical kernel for many problems in computational science (statistics, 'big data', and many more). We present a multi-GPU algorithm that is 5x faster on 8 GPUs vs a single GPU, which itself is 5x-10x faster than a multicore CPU. We rely on three levels of parallelism: large fronts are factorized on many GPUs, each GPU handles large sub-forests, and within each GPU a bucket scheduler extracts parallelism within each front. The scheduler acts like the game of Mancala: each pebble is a row-tile of the matrix. A pebble sits in a bucket corresponding to its leftmost nonzero column tile. The scheduler picks up some pebbles and factorizes them: one pebble stays in place; the rest move one bucket down. The matrix is factorized when each bucket holds one pebble.  Back
 
Keywords:
Developer - Algorithms, GTC 2015 - ID S5424
Streaming:
 
SPIKE::Hybrid - A Hybrid GPU/CPU Linear System Solver for Banded and Sparse Linear Systems
Ang Li (Department of Electrical and Computer Engineering, University of Wisconsin-Madison)
We present a solver for dense banded or sparse linear systems based on the SPIKE methodology. Spike::Hybrid leverages unified memory support to solve systems of moderate size. We discuss three reordering algorithms aimed at improving robustness and efficiency: (1) a diagonal boosting algorithm, which attempts to bring large entries to the diagonal; (2) an algorithm for bandwidth reduction; and (3) an algorithm for wavefront reduction. Spike::Hybrid is two to three times faster than Intel's MKL on banded problems and outperforms Pardiso in 25% of the cases for a test set of 120 sparse problems from various applications. We provide details on the CPU and GPU algorithmic steps and highlight the role that unified memory support played in the hybrid implementation. Co-authors: Dan Negrut and Radu Serba.  Back
 
Keywords:
Developer - Algorithms, Developer - Performance Optimization, Developer - Tools & Libraries, GTC 2015 - ID S5454
Streaming:
 
Energy Efficient, High-Performance Solvers through Small Dense Matrix Computations on GPUs
Azzam Haidar (UTK), Stanimire Tomov (UTK)
Here you will learn techniques for small matrix computations on GPUs and their use in energy-efficient, high-performance solvers. Work on small problems delivers high performance through improved data re-use. Many numerical libraries and applications need this functionality further developed. We describe the main factorizations (LU, QR, and Cholesky) for a set of small dense matrices in parallel. We achieve significant acceleration and reduced energy consumption compared with other solutions. Our techniques are of interest to GPU application developers in general. We will show extensions to large, entirely GPU-based solvers, review and compare against the hybrid CPU-GPU algorithms in MAGMA, and analyze the pros and cons of hybrid versus GPU-only approaches on high-end systems and low-end embedded devices.  Back
 
Keywords:
Developer - Algorithms, Developer - Tools & Libraries, Supercomputing, GTC 2015 - ID S5476
Streaming:
Download:
 
Exploiting GPU Caches in Sparse Matrix Vector Multiplication
Yusuke Nagasaka (Tokyo Institute of Technology)
We show a technique for sparse matrix-vector multiplication (SpMV) that fully exploits the GPU's caches. Many sparse algorithms, such as conjugate gradient, are dominated by SpMV computation, which includes random memory accesses to the input vector. On the GPU, the problem is more serious because the cache is small. Our new sparse matrix formats for many-core processors significantly increase the cache hit ratio by segmenting the matrix along the columns. Performance evaluations show that we achieve up to a 3.0x speedup in SpMV and 1.12x in CG, compared to cuSPARSE and recently proposed formats such as SELL-C-sigma. For iterative methods, we devise an auto-tuning mechanism for the segment sizes.  Back
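The segmented formats from the talk are not reproduced here, but the baseline they improve on is the familiar one-thread-per-row CSR SpMV, whose scattered reads of the input vector are exactly the accesses a cache-friendly format tries to keep resident:

    // Baseline CSR SpMV: one thread per row. The random reads of x[col[j]]
    // are the cache-unfriendly accesses that column-segmented formats aim
    // to keep in cache.
    __global__ void spmv_csr(int num_rows, const int *row_ptr, const int *col,
                             const float *val, const float *x, float *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= num_rows) return;

        float dot = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            dot += val[j] * x[col[j]];      // scattered gather from x
        y[row] = dot;
    }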
 
Keywords:
Developer - Algorithms, Supercomputing, GTC 2015 - ID S5518
Streaming:
Download:
 
Binary Segmentation of Many 3D Cubes in CUDA
Julien Demouth (NVIDIA)
In this session, we will present a CUDA implementation to segment many 3D cubes on GPUs. Our implementation relies on an efficient strategy to decompose the work among blocks of threads. We will also analyze the performance of our code using NVIDIA® Nsight Visual Studio Edition.  Back
 
Keywords:
Developer - Algorithms, Energy Exploration, Medical Imaging, GTC 2015 - ID S5555
Streaming:
Download:
 
Fast and Power Efficient Algorithms for Matrix Multiplication
M Clark (NVIDIA), Ben Barsdell (Harvard University)
We will present the results of an investigation into speeding up and improving the power efficiency of dense matrix multiplications in CUDA. These techniques give an effective compute rate greater than the peak performance of a GPU, allowing us to approach 10 TFLOPS sustained in matrix multiplication on a single GPU. Techniques applied include exploiting Gauss's complex multiplication algorithm and implementing a Strassen-like algorithm to reduce the computational cost from the naive O(n^3). We will discuss how the power efficiency of these dense linear algebra computations can be improved through tile size and input word size choices. Results from the Tesla K80 will show that improving power efficiency is equivalent to improving absolute performance.  Back
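Gauss's complex multiplication trick mentioned above trades one real multiplication for extra additions; a small device/host helper shows the arithmetic (the function name is illustrative):

    // Gauss's 3-multiplication complex product:
    // (a + bi)(c + di) uses 3 real multiplies instead of 4.
    __host__ __device__ inline void cmul_gauss(float a, float b, float c, float d,
                                               float &re, float &im)
    {
        float k1 = c * (a + b);
        float k2 = a * (d - c);
        float k3 = b * (c + d);
        re = k1 - k3;   // = a*c - b*d
        im = k1 + k2;   // = a*d + b*c
    }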
 
Keywords:
Developer - Algorithms, Supercomputing, GTC 2015 - ID S5601
Streaming:
 
CUB: Using "Collective" Software Design to Improve Performance-Portability and Reduce Software Lifecycle Overhead
Duane Merrill (NVIDIA)
This tutorial introduces the CUB library of "collective" software abstractions for kernel-level programming. We present examples for how these primitives and the accompanying design methodology can be used to reduce CUDA software development and maintenance overhead as well as to improve tuning and performance portability.  Back
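As a taste of the collective primitives the tutorial covers, a block-wide sum with cub::BlockReduce looks roughly like this (the kernel and buffer names are illustrative):

    #include <cub/cub.cuh>

    // Block-wide sum using CUB's BlockReduce collective: each thread
    // contributes one value, and thread 0 of the block receives the total.
    template <int BLOCK_THREADS>
    __global__ void block_sum(const float *in, float *block_totals)
    {
        typedef cub::BlockReduce<float, BLOCK_THREADS> BlockReduce;
        __shared__ typename BlockReduce::TempStorage temp_storage;

        float thread_val = in[blockIdx.x * BLOCK_THREADS + threadIdx.x];
        float total = BlockReduce(temp_storage).Sum(thread_val);

        if (threadIdx.x == 0)
            block_totals[blockIdx.x] = total;   // only thread 0 holds the result
    }

    // Example launch: block_sum<256><<<num_blocks, 256>>>(d_in, d_totals);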
 
Keywords:
Developer - Algorithms, Developer - Performance Optimization, Developer - Programming Languages, GTC 2015 - ID S5617
Streaming:
Developer - Performance Optimization
Presentation
Media
GPU Source-Code Optimizations: Increase Performance, Reduce Energy
Jared Coplin (Texas State University)
Learn how source-code optimizations can work alone and in combination to improve not only the performance but also the energy consumption and power draw of a modern compute GPU. In addition, understand how lowering the GPU frequency, enabling ECC, and switching from single to double precision affects runtime, energy, and power.  Back
 
Keywords:
Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5110
Streaming:
 
Shared Memory vs. D-Cache: Which works better and why?
Huiyang Zhou (North Carolina State University)
This talk discusses some interesting and somewhat unexpected tradeoffs between shared memory and L1 D-caches. Through detailed case studies, we aim to provide insights into two questions: (1) Is it worthwhile for application developers to explicitly manage shared memory given the hardware-managed L1 D-caches in GPUs? (2) What are the main reasons for code utilizing shared memory to outperform code leveraging L1 D-caches (and vice versa)?  Back
 
Keywords:
Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5113
Streaming:
 
Voting And Shuffling For Fewer Atomic Operations
Elmar Westphal (Forschungszentrum Jülich GmbH)
Even though atomic operations became much faster with the introduction of the Kepler architecture, they are still a bottleneck in many algorithms and applications. This is especially true for operations that are not natively supported on the device and have to be implemented using atomicCAS loops (e.g. double precision additions), because when multiple threads within the same warp modify the same data, warp divergence also stalls the threads that have already finished. This talk will show how to use warp votes and shuffle operations to pre-combine data within a warp by destination address, in parallel. This can significantly reduce the total number of atomic operations in a kernel call and eliminates CAS loop iterations within the same warp.  Back
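A simplified sketch of the principle for the special case where every thread in a warp updates the same address: reduce within the warp first, then let a single lane issue the atomic. The talk's technique goes further and combines values per destination address; the *_sync shuffle variants used here are the modern spelling of the warp shuffles available at the time.

    // Pre-combine values within a warp before touching the atomic counter:
    // a warp-level shuffle reduction turns up to 32 atomicAdd calls into one.
    // Assumes blockDim.x is a multiple of 32 so every warp is full.
    #include <cuda_runtime.h>

    __global__ void warp_combined_sum(const float *data, int n, float *total)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float val = (i < n) ? data[i] : 0.0f;   // out-of-range lanes contribute 0

        // Tree reduction within the warp using shuffles (no shared memory,
        // no atomics yet).
        for (int offset = 16; offset > 0; offset >>= 1)
            val += __shfl_down_sync(0xffffffff, val, offset);

        // Only lane 0 of each warp performs the atomic update.
        if ((threadIdx.x & 31) == 0)
            atomicAdd(total, val);
    }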
 
Keywords:
Developer - Performance Optimization, Developer - Algorithms, GTC 2015 - ID S5151
Streaming:
Download:
 
Maximizing Scalability Performance in HOOMD-blue by Exploiting GPUDirect® RDMA on Green500 Supercomputer
Pak Lui (Mellanox Technologies), Gilad Shainer (Mellanox Technologies)
To demonstrate the application performance improvement using GPUDirect RDMA, we utilized a general-purpose GPU Molecular Dynamics simulation application called HOOMD-blue. The code was modified and tuned for GPUDirect RDMA and for dual GPU/InfiniBand configuration in order to exploit higher scalability performance than ever achieved on this energy-efficient cluster before the introduction of GPUDirect RDMA. The goal is to present the improvements seen in the application performance of HOOMD-blue, as well as to show the best practices for properly configuring and running GPUDirect RDMA over both of the GPUs and the dual FDR InfiniBand hardware available on the Wilkes supercomputer.  Back
 
Keywords:
Developer - Performance Optimization, Computational Physics, Life & Material Science, Supercomputing, GTC 2015 - ID S5169
Streaming:
Download:
 
CUDA Optimization with NVIDIA Nsight Eclipse Edition: A Case Study
Christoph Angerer (NVIDIA), Julien Demouth (NVIDIA)
In this session, we will study a real CUDA application and use NVIDIA® Nsight Eclipse Edition on Linux to optimize the performance of the code. The attendees will learn a method to analyze their codes and how to use the tools to apply those ideas.  Back
 
Keywords:
Developer - Performance Optimization, Developer - Tools & Libraries, GTC 2015 - ID S5173
Streaming:
Download:
 
CUDA Optimization with NVIDIA Nsight Visual Studio Edition: A Case Study
Christoph Angerer (NVIDIA), Julien Demouth (NVIDIA)
In this session, we will study a real CUDA application and use Nsight(TM) Visual Studio Edition on Windows to optimize the performance of the code. The attendees will learn a method to analyze their codes and how to use the tools to apply those ideas.  Back
 
Keywords:
Developer - Performance Optimization, Developer - Tools & Libraries, GTC 2015 - ID S5174
Streaming:
Download:
 
When 10,000 Threads Aren't Enough: Scaling GPU Codes to Multiple Nodes (Presented by Allinea)
Beau Paisley (Allinea Software)
Scientific computation benefits hugely from the power offered by GPUs, but maintaining high performance when scaling to multiple nodes is hard. In this talk we show how a single-page HTML report can summarize an entire GPU-enabled parallel run, highlighting performance bottlenecks and offering solutions. We also blur the lines between profiling and debugging in Allinea's high-productivity development tool for science and simulation: Allinea Forge.  Back
 
Keywords:
Developer - Performance Optimization, Developer - Tools & Libraries, Supercomputing, GTC 2015 - ID S5179
Streaming:
 
Modeling CUDA Compute Performance by Critical Path
Patric Zhao (NVIDIA)
Currently, tools such as NVVP can capture the CUDA kernel timeline on the GPU. However, they are not sufficient for analyzing heterogeneous applications where both the GPU and the CPU are used for computation. In reality, the dependencies between CPU and GPU API calls, and their execution times, also play a key role in the whole application's performance. Hence, we need to determine these dependencies and execution times to identify the critical path for further performance analysis. In this talk, we introduce a workflow for gathering the necessary data with the CUDA tool chain, determining the critical path, and visualizing the data.  Back
 
Keywords:
Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5180
Streaming:
 
Porting Real-Time Signal Processing Pipeline CUDA Kernels from Kepler to Maxwell
Ismayil Guracar (Siemens Medical Solutions)
Responding to frequent improvements in GPU technology is a constant challenge in diagnostic medical imaging equipment with decade-long product lifetimes. Our high-performance signal processing kernels, originally developed for the Fermi-based Quadro 2000 and recently ported to the Kepler-based K2000, must now be retargeted to the new Maxwell-architecture K2200 card. Our experiences in preparing for this change will be presented, along with details about the observed performance differences (better with Maxwell). Experiments on instruction-level parallelism and device memory bandwidth will be explained, which can also be used to better understand your target GPU and prepare new and existing kernels to take advantage of the latest architectures and capabilities.  Back
 
Keywords:
Developer - Performance Optimization, Medical Imaging, GTC 2015 - ID S5223
Streaming:
Download:
 
GPU Acceleration of WSMP (Watson Sparse Matrix Package)
Natalia Gimelshein (NVIDIA), Steve Rennich (NVIDIA)
The Watson Sparse Matrix Package (WSMP) is a well-known collection of algorithms for efficiently solving sparse systems of linear equations that has long been among the best-performing sparse solver codes on the CPU. Recently, the direct sparse solver capabilities of WSMP have been modified to leverage GPU computing, resulting in significant performance improvements. This talk will focus on detailing the very non-invasive approach used to accelerate WSMP's direct sparse capabilities using GPUs. Performance results for the case of both single-node and distributed-memory solves will also be presented. This work was done in collaboration with Seid Koric from NCSA and Anshul Gupta from IBM.  Back
 
Keywords:
Developer - Performance Optimization, Developer - Algorithms, Supercomputing, GTC 2015 - ID S5232
Streaming:
Download:
 
MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction
Eri Rubin (SagivTech LTD.)
In this talk, we present MAPS: a novel library that helps developers write CUDA kernels faster, easier and with better performance, without losing flexibility. This library exposes a set of data structures and iterators, similar to STL containers, eliminating the need for complex index calculations that appear when implementing memory optimizations. The resulting code is shorter and simpler. Under the hood, the library implements complex platform-specific memory optimizations. Benchmarks show that the library has minimal overhead compared to implementing such optimizations manually, and sometimes even surpasses their performance. The library is header-only and open source.  Back
 
Keywords:
Developer - Performance Optimization, Developer - Tools & Libraries, GTC 2015 - ID S5263
Streaming:
Download:
 
Avoiding Shared Memory Bank Conflicts in Rate Conversion Filtering
Mrugesh Gajjar (Siemens Corporate Technology)
Shared memory bank conflicts can be a significant performance limiter, depending on thread-dependent access patterns. We will present ideas on how to reduce shared memory bank conflicts in rate conversion filtering--a frequently used signal processing function in a variety of tasks such as image resizing. We find severe performance degradation for specific downsampling factors in rate conversion due to heavy bank conflicts in shared memory. We propose a novel technique for avoiding it via the use of scrambled addressing across threads. This technique is applicable more generally across many GPU architectures. We will demonstrate effectiveness with specific examples and performance measurements on NVIDIA GPUs and leave the attendee with ideas on how to identify and mitigate bank conflicts.  Back
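The scrambled-addressing scheme from the talk is not reproduced here, but the classic padding trick illustrates the same problem: without the extra column, the column-wise reads below would hit a single shared memory bank 32 times. This sketch assumes a square matrix whose side is a multiple of 32 and a 32x32 thread block.

    // Shared memory tile with one column of padding: TILE+1 shifts
    // consecutive rows to different banks, removing the conflicts that a
    // plain TILE x TILE layout would cause on the column-wise reads.
    #define TILE 32

    __global__ void transpose_padded(const float *in, float *out, int width)
    {
        __shared__ float tile[TILE][TILE + 1];   // +1 column avoids bank conflicts

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];
        __syncthreads();

        // Write transposed: swapped block indices, column-wise read of the tile.
        int tx = blockIdx.y * TILE + threadIdx.x;
        int ty = blockIdx.x * TILE + threadIdx.y;
        out[ty * width + tx] = tile[threadIdx.x][threadIdx.y];
    }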
 
Keywords:
Developer - Performance Optimization, Medical Imaging, Video & Image Processing, GTC 2015 - ID S5282
Streaming:
Download:
 
Direct Convolution for Deep Neural Network Classification on Tegra X1
Alan Wang (NVIDIA)
We prototype a direct convolution implementation to accelerate classification with a deep neural network. We take the Overfeat network as an example, analyzing some of its properties like math/memory ratio and input/coefficient ratio. We then discuss the workload distribution of the implementation and how we partition the computation into CUDA blocks. We also dive into details about how we optimize for data reuse, including the use of 3D texture for input pixels and a coefficient layout designed for coalesced stores. Experiments with Overfeat Layer 6 on Tegra X1 show that we can achieve 75% utilization of GFLOPs currently, with room for further optimization as future work.  Back
 
Keywords:
Developer - Performance Optimization, Video & Image Processing, GTC 2015 - ID S5306
Streaming:
Download:
 
Memory Bandwidth Bootcamp: Best Practices
Tony Scudiero (NVIDIA)
A GPU's high bandwidth memory system is one of its best features for helping real workloads achieve high performance. To maximize application performance, some understanding of how best to utilize the memory system is essential. This talk covers the basics of a CUDA programmer's view of the GPU memory hierarchy. It will cover many of the topics discussed in the CUDA best practices programming guide including measuring and understanding kernel performance, bottleneck analysis, as well as common topics such as address coalescing, shared memory, and occupancy with a focus on their impact on GPU DRAM bandwidth.  Back
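One of the basics covered here, address coalescing, in its smallest form: with an array-of-structures layout, adjacent threads touch addresses 16 bytes apart, while a structure-of-arrays layout gives fully coalesced accesses (the particle example is illustrative):

    // Coalescing in practice: the AoS kernel fetches 16 bytes per thread to
    // use 4 of them; the SoA kernel reads consecutive floats with
    // consecutive threads, so every byte fetched is used.
    struct ParticleAoS { float x, y, z, w; };

    __global__ void scale_x_aos(ParticleAoS *p, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i].x *= s;      // strided access pattern
    }

    __global__ void scale_x_soa(float *x, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;        // fully coalesced access pattern
    }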
 
Keywords:
Developer - Performance Optimization, GTC 2015 - ID S5353
Streaming:
 
Memory Bandwidth Bootcamp: Beyond Best Practices
Tony Scudiero (NVIDIA)
Some compute kernels simply do not have controllable access patterns, and no reasonable restructuring will change that. For kernels limited by memory bandwidth with unpredictable access patterns, low occupancy, or divergent execution, optimizations beyond the basics can be employed to achieve better bandwidth utilization. This talk continues from the foundation of the CUDA best practices guide and explores methods to improve memory utilization among kernels which do not have well-behaved memory access patterns.  Back
 
Keywords:
Developer - Performance Optimization, GTC 2015 - ID S5376
Streaming:
Download:
 
Exploiting CUDA Dynamic Parallelism for Low-Power ARM-Based Prototypes
Vishal Mehta (Barcelona Supercomputing Centre)
Learn to exploit CUDA features to save energy, and thus money. This session describes the Pedraforca prototype developed at the Barcelona Supercomputing Centre under the Mont-Blanc project. The prototype is based on NVIDIA® Tegra® and NVIDIA® Tesla® platforms and aims at reducing the raw power footprint of HPC clusters. The session describes in depth how to exploit CUDA dynamic parallelism and CUDA streams when porting GPU applications to low-power ARM-based prototypes. It also includes an architectural description of the prototype, power budget comparisons, and various example codes for improving the programming skills of CUDA users.  Back
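A minimal, self-contained example of the dynamic parallelism feature the session builds on; it requires compute capability 3.5 or later and relocatable device code. This is the standard parent/child launch pattern, not the prototype's actual code.

    // CUDA dynamic parallelism: a parent kernel launches a child grid from
    // the device. Compile with: nvcc -arch=sm_35 -rdc=true -lcudadevrt
    #include <cstdio>

    __global__ void child(int parent_block)
    {
        printf("child thread %d launched by parent block %d\n",
               threadIdx.x, parent_block);
    }

    __global__ void parent()
    {
        // One device-side launch per parent block, issued by thread 0 only.
        if (threadIdx.x == 0)
            child<<<1, 4>>>(blockIdx.x);
    }

    int main()
    {
        parent<<<2, 32>>>();
        cudaDeviceSynchronize();
        return 0;
    }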
 
Keywords:
Developer - Performance Optimization, Embedded Systems, Supercomputing, GTC 2015 - ID S5384
Streaming:
Download:
 
Building High Performance Input-Adaptive GPU Applications with Nitro
Saurav Muralidharan (University of Utah)
Many irregular parallel computations such as sparse matrix-vector multiply (SpMV) and sorting have multiple different implementations (a.k.a variants) that are each suited for different classes of inputs. In this talk, attendees will learn about the Nitro automatic performance tuning framework and how it can be used to build high-performance input-adaptive GPU applications that automatically use the optimal variant for the given input data set. This is accomplished using a machine learning-based model that maps properties of the input data set to variants. We will present the Nitro C++ library and Python-based tuning interface and demonstrate their use in tuning 5 high-performance CUDA benchmarks. Finally, we will use heuristics in Nitro to reduce training time and other overheads.  Back
 
Keywords:
Developer - Performance Optimization, Developer - Tools & Libraries, GTC 2015 - ID S5520
Streaming:
Download:
 
Featured Talk: Memory Management Tips, Tricks and Techniques
Stephen Jones (SpaceX)
GPUs can push teraflops of mathematical power, but feeding the SMs with data can often be harder than optimising your algorithm. A well-designed program must take into account both access of data from within the GPU as well as allocation and transfer of data between CPU and GPU. This talk will cover techniques including sub-allocation, shared memory management, and parallel memory structures such as stacks, queues and ring-buffers which can greatly improve the throughput of your algorithms. 75% of programs are limited by memory bandwidth and not compute power, so careful memory management is critical to a high-performance program.  Back
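One of the techniques named above, sub-allocation, in minimal form: reserve one large device allocation up front and hand out aligned slices with a bump pointer, avoiding repeated cudaMalloc/cudaFree calls. A rough sketch; the structure and the 256-byte alignment choice are assumptions, not the speaker's implementation.

    // Minimal device sub-allocator: one big cudaMalloc, then bump-pointer
    // slices. reset() lets the arena be reused every frame or time step.
    #include <cuda_runtime.h>
    #include <cstddef>

    struct DeviceArena {
        char  *base = nullptr;
        size_t capacity = 0;
        size_t offset = 0;

        void init(size_t bytes) {
            cudaMalloc(&base, bytes);
            capacity = bytes;
            offset = 0;
        }
        // Returns a 256-byte aligned slice, or nullptr if the arena is full.
        void *alloc(size_t bytes) {
            size_t aligned = (offset + 255) & ~size_t(255);
            if (aligned + bytes > capacity) return nullptr;
            offset = aligned + bytes;
            return base + aligned;
        }
        void reset()   { offset = 0; }
        void destroy() { cudaFree(base); base = nullptr; }
    };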
 
Keywords:
Developer - Performance Optimization, Developer - Algorithms, Developer - Programming Languages, GTC 2015 - ID S5530
Streaming:
Download:
 
Tegra X1 Developer Tools
Sebastien Domine (NVIDIA)
The new Tegra X1 SoC will reach a variety of mobile and embedded devices, running operating systems from Linux BSPs to Android, and bringing a whole new level of compute sophistication. In this session, the audience will learn how to quickly get started developing for Tegra X1 via JetPack and the Tegra Android Developer Pack (TADP), and learn about the current and future developer tools offerings. We will cover the entire spectrum: multi-core 64-bit application profiling with the latest version of the Tegra System Profiler, the latest CUDA developer tools that take advantage of the new Maxwell architecture, and the newest version of the Tegra Graphics Debugger, with its new ability to generate an offline capture with source code generation that allows cross compilation of the recording on a desktop PC.  Back
 
Keywords:
Developer - Performance Optimization, Embedded Systems, Developer - Tools & Libraries, GTC 2015 - ID S5648
Streaming:
Download:
Developer - Programming Languages
Presentation
Media
Accelerating R Applications with CUDA
Patric Zhao (NVIDIA)
R is a free software environment that provides a programming language and built-in libraries of mathematical operations for statistics, data analysis, machine learning, and much more. In this talk, I will give an overview of applying GPUs in R, focusing on three topics. First, I will introduce accelerating R computations with CUDA libraries, including applying the drop-in library (nvblas) with zero coding effort, and a step-by-step guide to calling CUDA-accelerated libraries such as cuFFT. Second, I will show how to accelerate legacy code with directives (OpenACC) and how to write your own CUDA algorithms in R. Third, I will illustrate how to use the CUDA tool chain with R, including nvprof, cuda-memcheck, and cuda-debug. Finally, I will present CUDA-accelerated results for several R benchmarks.  Back
 
Keywords:
Developer - Programming Languages, Big Data Analytics, OpenACC, GTC 2015 - ID S5145
Streaming:
 
Session 2 of 4: An Introduction to the GPU Memory Model (Presented by Acceleware)
Chris Mason (Acceleware Ltd.)
This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. To learn the basics of CUDA programming required for this session, attend the session entitled An Introduction to GPU Programming. This session begins with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant, and global memory and discuss the best locations to store your application data for optimized performance. A programming demonstration of shared and constant memory will be delivered. Printed copies of the material will be provided to all attendees for each session; collect all four!  Back
 
Keywords:
Developer - Programming Languages, Developer - Performance Optimization, GTC 2015 - ID S5662
Streaming:
 
Session 4 of 4: Essential CUDA Optimization Techniques (Presented by Acceleware)
Dan Cyca (Acceleware Ltd.)
This tutorial is for those with some background in CUDA, including an understanding of the CUDA memory model and the streaming multiprocessor. Our earlier tutorials provide the background information necessary for this session. This informative tutorial will provide an overview of the performance analysis tools and key optimization strategies for compute-, latency-, and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. It will also include code examples and a programming demonstration highlighting the optimal global memory access pattern applicable to all GPU architectures. Printed copies of the material will be provided to all attendees for each session; collect all four!  Back
 
Keywords:
Developer - Programming Languages, Developer - Performance Optimization, GTC 2015 - ID S5664
Streaming:
 
CUDA 7 and Beyond
Mark Harris (NVIDIA)
CUDA is NVIDIA's parallel computing platform and programming model. In this talk you'll learn how new support for C++11 in CUDA 7, along with new features and performance improvements in the Thrust C++ parallel algorithms library, and support for runtime compilation, makes parallel C++ more productive than ever. CUDA 7 also includes cuSOLVER, a new direct linear solver library, as well as new features and improved performance in other CUDA libraries. In this talk you'll hear about these features and get insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You will learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.  Back
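A small, self-contained taste of the C++11-era Thrust usage referred to above: host-side auto plus a device functor driving a transform-reduce entirely on the GPU (extended device lambdas arrived in a later CUDA release, so the device operation here is a functor):

    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <cstdio>

    // Device-side operation: square each element.
    struct square {
        __host__ __device__ float operator()(float x) const { return x * x; }
    };

    int main()
    {
        thrust::device_vector<float> v(1 << 20, 0.5f);

        // Sum of squares computed entirely on the device.
        auto sum_sq = thrust::transform_reduce(v.begin(), v.end(),
                                               square(), 0.0f,
                                               thrust::plus<float>());
        std::printf("sum of squares = %f\n", sum_sq);
        return 0;
    }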
 
Keywords:
Developer - Programming Languages, GTC 2015 - ID S5820
Streaming:
Download:
Developer - Tools & Libraries
Presentation
Media
Radically Simplified GPU Parallelization: The Alea Dataflow Programming Model
Luc Bläser (HSR University of Applied Sciences Rapperswil), Daniel Egloff (InCube Group, Quantalea)
Many programmers still leave the massive parallel power of GPUs unused, be it for lack of experience in CUDA or because of limited time and budget. We aim to drastically simplify GPU parallelization by introducing our Alea dataflow programming model based on .NET. Complex computations can be easily and rapidly composed from a set of prefabricated, customizable operations that execute asynchronously. The runtime system automatically translates this abstract model into efficient GPU code and schedules the operations with minimal memory transfers. By way of illustrative application cases from finance and statistics, we explain the model, take a look at the runtime system, and demonstrate its performance, which proves to be as good as manually optimized CUDA implementations.  Back
 
Keywords:
Developer - Tools & Libraries, Developer - Programming Languages, GTC 2015 - ID S5132
Streaming:
Download:
 
Kokkos: Manycore Performance Portability for C++ HPC Applications
H. Carter Edwards (Sandia National Laboratories), Christian Trott (Sandia National Laboratories)
The Kokkos library enables development of HPC scientific applications that are performance portable across disparate manycore devices such as NVIDIA Kepler and Intel Xeon Phi. Portable programming models such as OpenMP, OpenACC, OpenCL, and Thrust focus on parallel execution but fail to address memory access patterns critical for best performance. In contrast Kokkos integrates compile-time polymorphic data layout with parallel execution policies to portably manage memory access patterns. This year we present recently added Kokkos concepts & capabilities with an emphasis on improvements to usability, such as C++11 lambda support now available in CUDA 7.0 for portable nested parallelism. We will also present a new application of Kokkos to Sandia's multithreaded graph library (MTGL).  Back
 
Keywords:
Developer - Tools & Libraries, Developer - Algorithms, Supercomputing, GTC 2015 - ID S5166
Streaming:
Download:
 
Data Visualization of the Graphics Pipeline: Tracking State with the StateViewer
Rama Hoetzlein (NVIDIA)
Graphics state has increased in complexity with advances in the GPU pipeline over time. Large graphics applications now have to record and track vertex buffers, frame buffers, constant buffers, textures, various shaders, and raster state. Existing solutions for state bucketing only observe API switches. A novel tool and technique, the StateViewer, is presented which can independently trace and visualize deep changes in state, which includes deltas in mapped buffer values. Data visualization of these values allows the user to visually identify patterns in graphics usage not previously observed. This can directly suggest focus areas in large applications that would benefit from redesign with an emphasis on next generation command-based graphics APIs.  Back
 
Keywords:
Developer - Tools & Libraries, Developer - Performance Optimization, GTC 2015 - ID S5186
Streaming:
Download:
 
Jacobi-Davidson Eigensolver in Cusolver Library
Lung-Sheng Chien (NVIDIA)
cuSOLVER, a new NVIDIA library, targets sparse linear systems and sparse eigenvalue problems on a single GPU. In this talk, we will present a Jacobi-Davidson method based on the batched sparse QR in the cuSOLVER library. Attendees will learn how to use the cuSOLVER library in scientific computation. Sparse eigenvalue problems arise in many fields, including quantum chemistry, photonic crystal structures, and structural mechanics. Jacobi-Davidson is a Newton-like subspace method for solving exterior/interior eigenpairs. To improve the performance of Jacobi-Davidson, preconditioners and subspace operations are the key components. In this work, we use batched sparse QR factorization to speed up the preconditioner on a subspace.  Back
 
Keywords:
Developer - Tools & Libraries, Developer - Algorithms, GTC 2015 - ID S5237
Streaming:
Download:
 
Efficient, Automatic Application Checkpointing as a Powerful Tool for CUDA Development
Max Grossman (Rice University)
Despite continued improvement in state-of-the-art CUDA debugging and profiling tools, optimizing performance and debugging numerical/correctness errors in complex, real-world CUDA programs can be a massive developer pain. This pain is primarily a result of: 1) the scale of CUDA programs and the data they process, 2) the complexity of the applications that CUDA is applied to, and 3) the inherent reduction in inspectability that comes with separate address spaces. This work uses efficient checkpointing of program and CUDA state to allow programmers to recall arbitrary points in program execution. This allows quick iteration on isolated application segments, drastically reducing the time to debug poor performance and correctness errors.  Back
 
Keywords:
Developer - Tools & Libraries, Energy Exploration, GTC 2015 - ID S5294
Streaming:
Download:
 
Thrust++ : Portable, Abstract Library for Medical Imaging Applications
Bharatkumar Sharma (Siemens Technology and Services), Santhosh Sharma (Siemens Technology and Services)
This talk will introduce you to the extensions made to the Thrust library to provide means and methodologies that facilitate the development of structured, efficient and portable parallel medical imaging applications with minimal loss in performance compared to hand-optimized code. The Thrust++ library adds 2D/3D data structures, imaging algorithms, and patterns optimized for use in medical applications. We will demonstrate the result of our extensions to Thrust on computed tomography reconstruction using the cone-beam reconstruction technique known as the Feldkamp algorithm. Our experimental results demonstrate that the abstraction improves productivity and not only ensures ease of use but also provides performance on par with a native implementation.
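The Thrust++ extensions themselves are not reproduced here, but the stock Thrust style they build on looks like the sketch below: a device container plus a user-defined functor expressing one imaging step (a made-up intensity-windowing operator) without any hand-written kernel.

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/reduce.h>
    #include <thrust/functional.h>

    // Clamp-and-rescale functor, a stand-in for a simple imaging operation.
    struct window_level {
        float lo, hi;
        __host__ __device__ float operator()(float v) const {
            return fminf(fmaxf((v - lo) / (hi - lo), 0.0f), 1.0f);
        }
    };

    int main() {
        thrust::device_vector<float> slice(512 * 512, 0.5f);   // one image slice on the GPU
        window_level wl = {0.2f, 0.8f};
        thrust::transform(slice.begin(), slice.end(), slice.begin(), wl);
        float total = thrust::reduce(slice.begin(), slice.end(), 0.0f, thrust::plus<float>());
        (void)total;
        return 0;
    }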
 
Keywords:
Developer - Tools & Libraries, Medical Imaging, GTC 2015 - ID S5338
Streaming:
Download:
 
Three Ways to Debug Parallel CUDA Applications: Interactive, Batch, and Corefile
Chris Gottbrath (Rogue Wave Software)
There are now three ways that one can approach debugging an MPI application running on a cluster that is accelerated with NVIDIA GPUs. These are interactive hands on debugging, non-interactive batch debugging and post-mortem debugging using the GPU corefile functionality introduced in CUDA 7.0. Each of these techniques is a useful tool with benefits and limitations. This talk will introduce these three debugging techniques and provide some suggestions on selecting the optimal approach for a variety of debugging scenarios such as hangs, numerical errors, and crashes. Specific examples will be given using the TotalView debugger but the concepts covered may apply to other debugging tools such as GDB and the NVIDIA NSIGHT debugger.  Back
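As a hedged illustration of the post-mortem path (drawn from the cuda-gdb documentation for CUDA 7-era toolkits rather than from the talk itself), the toy program below faults on the GPU; running it with the core-dump environment variable set should produce a GPU core file that can then be inspected offline.

    #include <cuda_runtime.h>

    __global__ void oob_write(int* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        data[i + n] = i;   // deliberate out-of-bounds write to trigger a GPU exception
    }

    int main() {
        int* d_data = 0;
        cudaMalloc((void**)&d_data, 256 * sizeof(int));
        oob_write<<<1, 256>>>(d_data, 256);
        cudaDeviceSynchronize();   // the fault surfaces here
        cudaFree(d_data);
        return 0;
    }

    // Assumed usage (check your toolkit's cuda-gdb manual):
    //   CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1 ./oob
    //   cuda-gdb ./oob        then: (cuda-gdb) target cudacore <generated core file>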
 
Keywords:
Developer - Tools & Libraries, Supercomputing, GTC 2015 - ID S5417
Streaming:
 
Adaptive OpenCL Libraries for Platform Portability
Amol Apte (EM Photonics)
Learn about considerations to account for when implementing OpenCL libraries in order to make them usable with acceptable performance across a wide range of devices. Because the many platforms supported by OpenCL function very differently, applications often end up with separate code paths for different platforms in order to achieve good performance. In order to avoid this, libraries providing implementations of commonly-used algorithms can be parameterized in a way that allows execution strategies to be automatically determined based on device characteristics that can be queried from the OpenCL runtime.  Back
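The session's examples are in OpenCL (clGetDeviceInfo and friends); purely as a language-consistent illustration of the same query-then-parameterize pattern, a CUDA analogue might look like the sketch below, where the launch configuration is derived from queried device characteristics instead of being hard-coded per platform.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        // Derive an execution strategy from what the device reports.
        int block = (prop.maxThreadsPerBlock >= 256) ? 256 : prop.maxThreadsPerBlock;
        int grid  = prop.multiProcessorCount * 4;        // enough blocks to keep every SM busy
        size_t tileBytes = prop.sharedMemPerBlock / 2;   // leave headroom in shared memory

        printf("device=%s block=%d grid=%d tileBytes=%zu\n",
               prop.name, block, grid, tileBytes);
        return 0;
    }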
 
Keywords:
Developer - Tools & Libraries, Developer - Algorithms, GTC 2015 - ID S5428
Streaming:
 
The Graphics Debugger for Linux
Sebastien Domine (NVIDIA)
Come see the exciting new Graphics Debugger for Linux and rejoice! No longer will you be stuck with a poor OpenGL debugging experience on Linux. Say goodbye to littering your precious engine code with glGetError macros! Say hello to actual hardware supported performance counters and debugging!  Back
 
Keywords:
Developer - Tools & Libraries, GTC 2015 - ID S5451
Streaming:
 
Boost.Compute: A C++ Library for Parallel Computing
Kyle Lutz (Google)
Learn about Boost.Compute, an open-source C++ library for parallel-computing based on OpenCL. This talk will give an overview of the library and its internals and will show how it can be used to quickly prototype and develop high-performance applications without the usual complexity of low-level GPU computing libraries. Its STL-like API and an embedded lambda-expression framework allow developers to express complex functionality in native C++ and run it seamlessly on the GPU using the run-time kernel generation and execution infrastructure. Attendees should have a working knowledge of C++ and GPU computing concepts.  Back
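A minimal sketch of the STL-like style described above, assuming a standard Boost.Compute installation (headers and names as in the library's public documentation): copy data to the device, apply a lambda expression, and copy the result back.

    #include <vector>
    #include <boost/compute.hpp>
    #include <boost/compute/lambda.hpp>

    namespace compute = boost::compute;

    int main() {
        compute::device gpu = compute::system::default_device();
        compute::context ctx(gpu);
        compute::command_queue queue(ctx, gpu);

        std::vector<float> host(1024, 1.0f);
        compute::vector<float> dev(host.size(), ctx);
        compute::copy(host.begin(), host.end(), dev.begin(), queue);

        // Embedded lambda expression, compiled to an OpenCL kernel at run time.
        using compute::lambda::_1;
        compute::transform(dev.begin(), dev.end(), dev.begin(), _1 * 2.0f + 1.0f, queue);

        compute::copy(dev.begin(), dev.end(), host.begin(), queue);
        return 0;
    }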
 
Keywords:
Developer - Tools & Libraries, Developer - Algorithms, GTC 2015 - ID S5488
Streaming:
Download:
 
How to Quickly Improve Data Layout in Legacy Code
Robert Strzodka (Heidelberg University)
Learn how a few changes can dramatically improve data layout in your code for GPU acceleration. If you have a large C/C++ project and are thinking about porting it to the GPU, or have started using the GPU but dread having to change data layout for better performance, this session is for you. Natural use of classes and structs creates an array-of-structs (AoS) data layout. However, regular parallel access to multi-valued containers on the GPU requires a struct-of-arrays (SoA) layout for performance. Along with massive parallelization, this is the most important code adaptation for the GPU. Unfortunately, changing an AoS code to an SoA code by hand is tedious and error-prone. The session will show how to reuse the existing AoS code and still improve data layout and speed with few code changes.
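To make the layout difference concrete, the toy kernels below process the same data in both layouts; the ParticleAoS/ParticlesSoA types are invented for illustration, not taken from the session. In the AoS version adjacent threads stride through memory, while in the SoA version they touch consecutive elements and the accesses coalesce.

    #include <cuda_runtime.h>

    // Array of structs: thread i reads p[i].x, thread i+1 reads a value 16 bytes away -> strided access.
    struct ParticleAoS { float x, y, z, w; };
    __global__ void scale_aos(ParticleAoS* p, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { p[i].x *= s; p[i].y *= s; p[i].z *= s; }
    }

    // Struct of arrays: thread i reads x[i], thread i+1 reads x[i+1] -> coalesced access.
    struct ParticlesSoA { float *x, *y, *z, *w; };
    __global__ void scale_soa(ParticlesSoA p, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { p.x[i] *= s; p.y[i] *= s; p.z[i] *= s; }
    }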
 
Keywords:
Developer - Tools & Libraries, Developer - Performance Optimization, GTC 2015 - ID S5501
Streaming:
 
A GPU Run-Time for Event-Driven Task-Parallelism
Athanasios Konstantinidis (Reservoir Labs)
This talk introduces a novel GPU run-time system implemented on top of CUDA, which presents an EDT (Event-Driven Tasks) execution model to developers and compilers allowing easier and more efficient utilization of GPU resources in the presence of non-embarrassingly parallel computations. The proposed model views GPU computations as directed dependence graphs that can be dynamically traversed at run-time. Our experimental evaluation showed substantial speed-ups over the state-of-the-art, thus indicating the effectiveness of our approach.  Back
 
Keywords:
Developer - Tools & Libraries, Developer - Performance Optimization, Developer - Programming Languages, GTC 2015 - ID S5506
Streaming:
Download:
 
DirectCompute for DirectX12 and Innovations from the Game Space
Chas. Boyd (Microsoft)
DirectCompute, first introduced in Windows 7 with DirectX 11, is currently used in many of the latest high-performance 3D games. Now, DirectX 12 adds new innovations that both improve performance-critical game scenarios and further broaden the applicability of GPU computing. Come and learn how game developers exploit current DirectCompute and new DirectX 12 compute capabilities and find out which ones can be beneficial to your use case. Techniques covered include persistent mapped memory ranges, hardware image format conversion, default asynchronous resource access, and asynchronous task dispatch. We will also present the improvements to the GPU programming language which support these advancements, and the advanced IDE tools for performance profiling and development.
 
Keywords:
Developer - Tools & Libraries, Game Development, GTC 2015 - ID S5561
Streaming:
 
Introduction to GPU Computing Using the ArrayFire Acceleration Library (Presented by ArrayFire)
Umar Arshad (ArrayFire)
New to GPU computing and don't know where to start? Are you an expert in GPU computing and tired of writing your kernels from scratch? If that is the case, this tutorial is perfect for you! In today's tutorial we will introduce you to a simple and intuitive API that will make GPU computing highly productive and stress free. Last November, the ArrayFire library was open-sourced and made available to the general public. In today's talk we will walk you through the installation process, introduce the ArrayFire API, and show several applications that have been developed using the API.
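A minimal sketch of what ArrayFire code looks like, using a handful of calls from its public C++ API (af::randu, af::sum, af::fft2); the workload itself is invented purely for illustration.

    #include <cstdio>
    #include <arrayfire.h>

    int main() {
        af::info();                               // report the active backend and device
        af::array a = af::randu(1024, 1024);      // data lives on the device
        af::array b = af::sin(a) * 2.0f + 1.0f;   // element-wise math, fused by the JIT
        float s = af::sum<float>(b);              // reduction back to the host
        af::array spectrum = af::fft2(a);         // one of many built-in algorithms
        printf("sum = %f\n", s);
        return 0;
    }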
 
Keywords:
Developer - Tools & Libraries, GTC 2015 - ID S5797
Streaming:
 
Application Development for Mobile Devices: A Case Study for the Tegra K1 (Presented by ArrayFire)
Oded Green (ArrayFire)
Embedded systems are ubiquitous in today's world. The Tegra K1 is a front-runner for low-power, efficient, high-performance embedded systems. In this tutorial we will share with you our experience in developing applications for the Tegra K1. We will show how the ArrayFire library and its simple API can be used to deploy stable applications for the Tegra K1 quickly. Specifically, we will cover the zero-copy capabilities of CUDA and show how ArrayFire leverages that capability to efficiently run a very large number of algorithms in real time. Using ArrayFire's graphics package it is easy to create stunning visuals with Tegra's OpenGL 4.4 capability; this too will be discussed in today's tutorial.
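The zero-copy pattern mentioned above can be sketched as follows: the buffer is allocated as mapped, pinned host memory and the kernel works on a device-visible alias of it, so on Tegra's physically unified memory no explicit copy is needed. The kernel and sizes are illustrative only.

    #include <cuda_runtime.h>

    __global__ void scale(float* buf, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) buf[i] *= s;
    }

    int main() {
        const int n = 1 << 20;
        cudaSetDeviceFlags(cudaDeviceMapHost);             // allow mapping host allocations
        float* h_buf = 0;
        cudaHostAlloc((void**)&h_buf, n * sizeof(float), cudaHostAllocMapped);
        for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;       // e.g. a camera frame written by the CPU
        float* d_view = 0;
        cudaHostGetDevicePointer((void**)&d_view, h_buf, 0);   // GPU-visible alias, no copy
        scale<<<(n + 255) / 256, 256>>>(d_view, 2.0f, n);
        cudaDeviceSynchronize();                           // h_buf now holds the result
        cudaFreeHost(h_buf);
        return 0;
    }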
 
Keywords:
Developer - Tools & Libraries, Embedded Systems, Developer - Algorithms, GTC 2015 - ID S5804
Streaming:
 
Automated Image Captioning with ConvNets and Recurrent Nets
Andrej Karpathy (Stanford University and Google Researcher)
A Meetup hosted by the Silicon Valley HPC & GPU Supercomputing Meetup Group. GTC attendees welcome.  Back
 
Keywords:
Developer - Tools & Libraries, GTC 2015 - ID S5926
Streaming:
Education & Training
Presentation
Media
The NVIDIA Education Outreach Program: Equipping Educators with GPU Computing Tools
Mark Ebersole (NVIDIA), Joe Bungo (NVIDIA)
NVIDIA's GPU Education Outreach Program enables classroom and lab use of NVIDIA technologies. Learn more about how NVIDIA plans to provide teaching materials, real GPU resources and software development tools for academic teaching faculty and system administrators world-wide. We will cover options available to give students access to GPU computing platforms, as well as how educators can access these systems and content. Additionally, we will discuss upcoming education outreach programs and seek feedback on how NVIDIA can help educators more easily teach massively parallel programming to their students or user base.  Back
 
Keywords:
Education & Training, GTC 2015 - ID S5898
Streaming:
Embedded Systems
Presentation
Media
Synthetic Aperture Radar on Jetson TK1
Massimiliano Fatica (NVIDIA)
This talk will present the details of Synthetic Aperture Radar (SAR) imaging on the smallest CUDA-capable platform available, the Jetson TK1. The full processing chain, starting from the raw radar data, has been implemented using both Octave with CUDA acceleration and CUDA directly. The results indicate that GPU-accelerated embedded platforms have considerable potential for this type of workload and, in conjunction with low power consumption, light weight and standard programming tools, could open new horizons in the embedded space.
 
Keywords:
Embedded Systems, Video & Image Processing, GTC 2015 - ID S5157
Streaming:
Download:
 
Evolutionary Artificial Potential Field for Path Planning: A GPU Implementation
Ulises Orozco-Rosas (Instituto Politécnico Nacional)
Autonomous path planning plays an important role in mobile robots, with many methods developed for off-line, on-line and combined approaches. Important points that designers consider in the development of new methods are computational complexity, which is closely related to the time needed to find the optimal path, reliability in real-life applications, computer resource requirements, and other factors. In this session a GPU implementation of the Evolutionary Artificial Potential Field (EAPF) is presented as an innovative method for path planning in mobile robot navigation. The results demonstrate that the parallel Evolutionary Artificial Potential Field outperforms both the sequential implementation and the original Artificial Potential Field proposal.
 
Keywords:
Embedded Systems, Developer - Performance Optimization, Automotive, GTC 2015 - ID S5272
Streaming:
Download:
 
Mobile 3D Mapping With Tegra K1
Karol Majek (Institute of Mathematical Machines)
This work presents a 3D mapping algorithm implemented on the Tegra K1 device. The data processing pipeline is implemented in parallel using CUDA. The performance and accuracy of the final model were compared to mobile and desktop GPU results. This work shows how to replace traditional CUDA-enabled laptops with the embedded Tegra K1. Attendees will learn about the problems and challenges of embedding a parallel 3D mapping algorithm and how to improve its speed.
 
Keywords:
Embedded Systems, Computer Vision & Machine Vision, GTC 2015 - ID S5383
Streaming:
Download:
 
Early Evaluation of the Jetson TK1 Development Board for Power and Performance
Jee Choi (Georgia Tech)
In this session, we will describe our experience in evaluating the Jetson TK1 development board for performance, energy and power. We first describe the benchmarks used in our evaluation and present performance and power results for various throughput benchmarks, including single- and double-precision compute, memory bandwidth, and more. We will also present our model for predicting the energy costs of various operations under different frequency and voltage settings, and show how different settings map to different arithmetic intensity regimes in terms of performance and energy efficiency. Finally, we present preliminary results in using the Jetson TK1 for computing the fast multipole method (FMM) kernel and compare its performance and energy efficiency against that of high-end Tesla GPUs.
 
Keywords:
Embedded Systems, GTC 2015 - ID S5407
Streaming:
Download:
 
Deploying Low-Power Embedded Devices with Tegra K1 (Presented by GE)
Dustin Franklin (GE Intelligent Platforms)
Tegra's low power and computational efficiency are driving the development of new and exciting embedded devices. Explore CUDA-accelerated applications in sensor processing, security & surveillance, robotics, networking, medical imaging, industrial machine vision, energy & agriculture, that tap TK1 to provide next-generation features and capabilities to the user, all while consuming minimal power. Miniaturized Tegra modules can be quickly integrated into end-user products with a variety of packaging options available. Leverage TK1's friendly software ecosystem and code compatibility with NVIDIA's discrete GPUs to architect scalable embedded systems with reduced risk and shortened development cycles.  Back
 
Keywords:
Embedded Systems, Video & Image Processing, GTC 2015 - ID S5436
Streaming:
Download:
 
TK1-Based Solutions for Intelligent Video Analytic Applications
Hai Tao (Beijing Vion Technology Inc. (BVT))
This talk demonstrates how GPU-based embedded computer vision systems are transforming the world of video processing in several vertical markets, including ATM safety, intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. By taking full advantage of the TK1's 300+ GFLOPS of computing power, BVT has built and deployed embedded systems for people counting, shopping traffic gender and age analysis, perimeter monitoring, violence and chasing detection, and ATM service area protection. These application systems require the development of custom-made computer vision algorithms and efficient implementation of these algorithms on the GPU. In addition, we will also demonstrate how the world's first TK1-based smart cameras are being developed for various applications including license plate recognition, face recognition and crowd management. Compared to the previous DSP-based smart camera solution, the powerful embedded GPU-based solution is the first that can support imaging sensor resolutions up to 12 megapixels. The talk will also provide technical details on the CUDA implementation of several computer vision algorithms.
 
Keywords:
Embedded Systems, Computer Vision & Machine Vision, GTC 2015 - ID S5811
Streaming:
 
A Performance, Energy and Accuracy Aware Benchmarking Methodology for Robot Vision
Luigi Nardi (Imperial College London)
We introduce SLAMBench, a publicly-available software framework for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption for real-time 3D scene understanding. 3D scene understanding offers great potential for a new level of scene modelling, localisation and real environmental interaction for many types of robot, but its high computational requirements means that use on mass market embedded platforms is challenging. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and a powerful mechanism for reliable accuracy comparison of different implementation and algorithms. We experimentally investigate SLAMBench execution time, energy and accuracy on a variety of multicore and GPU-accelerated platforms.  Back
 
Keywords:
Embedded Systems, Developer - Performance Optimization, Computer Vision & Machine Vision, GTC 2015 - ID S5896
Streaming:
Download:
 
Fly Me to the Moon: The Role of GPUs in Lunar Exploration (Presented by GE)
Kevin Peterson (Astrobotics)
Since the beginning of the space age, access to the Moon has been limited to a select few. Only three governments have landed robotic spacecraft on the lunar surface: the United States, the former Soviet Union, and China. The cost and complexity of missions to the Moon have restricted this activity to large national governments that invest hundreds of millions of dollars per mission. This paradigm is shifting as commercial lunar delivery services pave the way for low-cost access beyond low Earth orbit. GPUs are powering the design, analysis, and flight of these services. This talk presents the use of GPUs in the design and flight of Astrobotic's Griffin lander.
 
Keywords:
Embedded Systems, GTC 2015 - ID S5914
Streaming:
Energy Exploration
Presentation
Media
Accelerating Curvature Estimate in 3D Seismic Data Using GPGPU
Joner Duarte (Tecgraf)
In this session, a highly parallelized curvature estimate method will be presented and optimized for GPU using stencil computation. In the oil and gas industry, seismic interpretation is a vital step to cut production costs by helping geophysicists to choose proper well drilling locations. Volumetric curvature attributes are widely used to visualize folds, faults and other key structures that define a possible reservoir. We show an implementation that maximizes memory access, loading necessary data to GPU shared memory using a circular buffer. A visualization at interactive time is provided to fine adjust the parameters of calculation before processing the whole 3D volume.  Back
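The circular-buffer scheme from the talk is not reproduced here, but the shared-memory tiling it builds on can be sketched as follows: each block stages a tile plus a one-point halo in shared memory and then evaluates a second-derivative stencil, one of the building blocks of curvature attributes. For simplicity the grid dimensions are assumed to be multiples of the tile size.

    #define TILE 16

    __global__ void d2dx2(const float* in, float* out, int nx, int ny) {
        __shared__ float tile[TILE + 2][TILE + 2];
        int gx = blockIdx.x * TILE + threadIdx.x;
        int gy = blockIdx.y * TILE + threadIdx.y;
        int lx = threadIdx.x + 1, ly = threadIdx.y + 1;

        tile[ly][lx] = in[gy * nx + gx];
        // Halo cells, clamped at the domain border.
        if (threadIdx.x == 0)        tile[ly][0]        = in[gy * nx + max(gx - 1, 0)];
        if (threadIdx.x == TILE - 1) tile[ly][TILE + 1] = in[gy * nx + min(gx + 1, nx - 1)];
        if (threadIdx.y == 0)        tile[0][lx]        = in[max(gy - 1, 0) * nx + gx];
        if (threadIdx.y == TILE - 1) tile[TILE + 1][lx] = in[min(gy + 1, ny - 1) * nx + gx];
        __syncthreads();

        // Second derivative along x with unit spacing.
        out[gy * nx + gx] = tile[ly][lx - 1] - 2.0f * tile[ly][lx] + tile[ly][lx + 1];
    }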
 
Keywords:
Energy Exploration, Developer - Performance Optimization, GTC 2015 - ID S5240
Streaming:
Download:
 
Introducing GPUs to a Commercial Reservoir Simulator
Dominic Walsh (Schlumberger)
Reservoir simulators are used to model the flow of oil and gas from the rock substrate, through intelligent downhole devices, into wells and all the way to surface facilities. They live in a domain that couples high levels of uncertainty with high economic impact. Currently reservoir simulation is compute bound, and GPUs have the potential to greatly improve this situation. This talk will detail the introduction of a GPU-based linear solver to the INTERSECT simulator: how the level of parallelism impacted the linear solver's numerical algorithms, and how the off-load nature of the GPU introduced asynchronous patterns into the host application. Results will be presented demonstrating the improved performance not only in large cluster-based deployments but also in under-the-desk scenarios.
 
Keywords:
Energy Exploration, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5298
Streaming:
Download:
 
How Schlumberger Leveraged NVIDIA GPUs Using Open Inventor Toolkit
Michael Heck (FEI - Visualization Sciences Group), Oyvind Yrke (Schlumberger)
This session will explain how Schlumberger's Petrel leveraged NVIDIA GPUs through usage of Open Inventor toolkit. Seismic interpretation, including volume rendering in geobody recognition and height field rendering for huge horizons will be presented.  Back
 
Keywords:
Energy Exploration, Developer - Tools & Libraries, GTC 2015 - ID S5329
Streaming:
Download:
 
Justifying Reverse Time Migration Order of Accuracy on NVIDIA GPUs
Marcel Nauta (Acceleware)
Theoretical Full Wave Modelling improvements present compromises between various hardware metrics such as computational intensity, GPU/CPU memory usage, and hard disk requirements. Varying the spatial order of accuracy changes a kernel from memory bound to compute bound and strongly affects register usage. This non-linear relationship between compute cost and order of accuracy determines the optimal configuration on a given hardware architecture. The first part of the presentation will focus on optimizing GPU kernels with varying spatial and temporal orders of accuracy. The second part will discuss benchmarks that show the optimal throughput of RTM jobs. Both isotropic and TTI kernels will be considered to illustrate flavours of the wave equation with differing computational intensity.  Back
 
Keywords:
Energy Exploration, Computational Physics, GTC 2015 - ID S5350
Streaming:
Download:
 
Demonstrating Innovative Reservoir Modeling Workflows Enabled by a GPU-Accelerated Implicit Simulator
Dave Dembeck (Stone Ridge Technology)
Learn how the speed and compute density of GPUs are transforming engineering workflows. We have built a fully-accelerated reservoir simulator that reduces run times from hours to minutes. This increased speed has shifted emphasis from long single runs to much faster workflows where ensembles of hundreds of simulations are available for evaluation by engineers. We present real-field results generated by our simulator which are up to 50x faster than current commercial offerings. We discuss our workflow acceleration tools which display the ensemble results while maintaining model context.  Back
 
Keywords:
Energy Exploration, Visualization - In-Situ & Scientific, Computational Physics, Supercomputing, GTC 2015 - ID S5392
Streaming:
Download:
 
Speeding up a Finite Element Computation on GPU
Nelson Inoue (Pontifical Catholic University of Rio de Janeiro PUC-Rio)
Nowadays, the oil and gas industry has sought to apply numerical analysis to problems such as wellbore stability and reservoir simulation. Analytical solutions can only be applied to cases that have simple geometry, homogeneous material, and simple loading and boundary conditions. However, most real engineering problems can only be solved through numerical analysis, and in stress analysis the Finite Element Method (FEM) is the most widely employed of the numerical methods. Stress analysis programs based on FEM, by their nature, spend much time on petroleum problems, which have huge dimensions (petroleum reservoirs) and are solved over time. In this talk, we are going to present our research, development and implementation of an in-house finite element code in which the fou
 
Keywords:
Energy Exploration, Developer - Performance Optimization, Developer - Algorithms, GTC 2015 - ID S5403
Streaming:
Download:
 
GPU Acceleration of Acquisition Footprint Removal in Post-Stack Seismic Data
Jonathan Marbach (CGG GeoSoftware)
Learn how GPU-accelerated acquisition footprint removal improves seismic interpretation results and workflow throughput. Even in modern seismic surveys, acquisition footprint can persist in post-stack 3D surveys, causing artifacts in downstream interpretation workflows. CGG's Insight Earth can now perform structure-oriented de-striping, including removing oblique footprint, in record-time via GPU acceleration. In this talk, the presenters will not only demonstrate the benefits of these advances to interpreters, but will discuss how their perspective on GPU acceleration has changed after several years of inclusion in their commercial interpretation system.  Back
 
Keywords:
Energy Exploration, Developer - Performance Optimization, GTC 2015 - ID S5437
Streaming:
Download:
 
CUDA-Based Implementation of GSLIB: The Geostatistical Software Library
Daniel Baeza (Advanced Laboratory for Geostatistical and Supercomputing (ALGES))
GSLIB is a well-known toolbox for engineers and geologists for estimation, simulation and data exploration of mineral resources. It was developed thirty years ago and has been used in the mining industry until now without significant changes in its code. Many applications have been developed using this library, becoming essential for many practitioners. Recent efforts in multi-core implementations have been proposed, adding OpenMP pragmas in the legacy code, obtaining reasonable but bounded speedup measurements. This presentation shows our current efforts in order to apply CUDA into legacy GSLIB code. We will show two new CUDA implementations of GSLIB methods: variogram calculation and sequential indicator simulation.  Back
 
Keywords:
Energy Exploration, GTC 2015 - ID S5456
Streaming:
Download:
 
GPU Acceleration of Q Compensation Migration
Fei Han (China University of Petroleum, Beijing)
Learn how to use a single GPU as a batch processor to divide Q compensation migration into thousands of independent systems, each with complex dynamics but relatively limited computing requirements. We then present a new seismic imaging algorithm on the GPU, in which all computational sections of the application are executed on four GPUs in a node using CUDA and MPI. The application we implement is Q compensation migration, but the idea can be applied to other pre-stack time migrations. The speedup achieved on the GPU is very high. This work was done in collaboration with Sam Zandong Sun, Professor and Doctoral Supervisor at China University of Petroleum, Beijing, and Bin Zhou, Adjunct Research Professor at University of Science and Technology of China.
 
Keywords:
Energy Exploration, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5557
Streaming:
 
Game Engine Technology to Build Advanced Scientific Software Applications
Michele Isernia (HUE AS)
HueSpace is the only game engine for scientific computing, combining lightning fast computation on GPUs and CPUs with state of the art domain-oriented, multi-dimensional visualization and intelligent data handling of E&P data in an all-in-one, easy-to-use toolkit. Applications that rely on HueSpace benefit from unparalleled interactivity & scalability. The HueSpace Core Engine defines a scalable Object Model, which it then uses to implement an event-driven, multi-threaded dataflow architecture. It does this by efficiently managing the interaction between the Data, Compute, and Visualization systems to deliver exceptional application and system performance, with little to no limitations as far as data size. HueSpace truly realizes NVIDIA's Visual Computing for Science.  Back
 
Keywords:
Energy Exploration, Developer - Tools & Libraries, Manufacturing, GTC 2015 - ID S5623
Streaming:
Download:
 
GPU-Powered Simulations of Seismic Waves in Nonlinear Media
Daniel Roten (San Diego Supercomputer Center (SDSC))
3D wave propagation simulations help to predict source directivity and focusing effects that often lead to strong shaking and structure damage during large earthquakes. With the introduction of our highly optimized GPU finite difference code, the frequency band of such simulations was decisively broadened to overlap with the spectrum relevant for common buildings. Because it is generally accepted that the stress-strain relationship may become nonlinear at higher frequencies, our latest GPU code implements Drucker-Prager plasticity without compromising scalability. Simulations of M7.8 earthquakes on the southern San Andreas fault show that plastic yielding in the fault damage zone and in shallow sedimentary deposits could drastically alter the level of shaking in the Los Angeles basin. This work was done in collaboration with Yifeng Cui, Director at High Performance Geocomputing Laboratory.  Back
 
Keywords:
Energy Exploration, Supercomputing, GTC 2015 - ID S5731
Streaming:
Finance
Presentation
Media
Designing a GPU-Based Counterparty Credit Risk System
Patrik Tennberg (TriOptima)
Counterparty credit risk calculations such as CVA, DVA and FVA are complex and time-consuming. Using GPUs can drastically cut execution time at the cost of increased complexity. In this talk I will discuss our counterparty credit risk engine and how we were able to drastically cut development time and create an environment where quants can be productive without detailed knowledge of GPUs, multithreading and memory consumption. I will also discuss multi-GPU programming and how you can seamlessly provide a design where one physical GPU (e.g. a K40) can be divided into several logical GPUs without impacting the programming model. The advantage of this approach is that you can utilize the GPU better without adding complexity. I will also discuss memory management and CPU/GPU multithreading.
 
Keywords:
Finance, Big Data Analytics, GTC 2015 - ID S5125
Streaming:
Download:
 
GPU Accelerated Backtesting and Machine Learning for Quant Trading Strategies
Daniel Egloff (InCube Group and QuantAlea)
In algorithmic trading large amounts of time series data are analyzed to derive buy and sell orders so that the strategy is profitable but also risk measures are at an acceptable level. Bootstrapping walk forward optimization is becoming increasingly popular to avoid curve fitting and data snooping. It is computationally extremely expensive but can be very well distributed to a GPU cluster. We present a framework for bootstrapping walk forward optimization of trading strategies on GPU clusters, which allows us to analyze strategies in minutes instead of days. Moreover, we show how signal generation can be combined with machine learning to make the strategies more adaptive to further improve the robustness and profitability.  Back
 
Keywords:
Finance, Machine Learning & Deep Learning, GTC 2015 - ID S5126
Streaming:
Download:
 
Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGAs
Rajesh Bordawekar (IBM T. J. Watson Research Center)
We experimentally implement key financial risk modeling algorithms (e.g., Monte Carlo pricing) on the NVIDIA TK1 and compare its performance against an FPGA implementation. We compute both FLOPS/dollar and FLOPS/watt, and describe the pros and cons of using the two different architectures for implementing financial risk modeling algorithms.
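As a generic illustration of the Monte Carlo pricing workload mentioned above (not the speakers' implementation), the kernel below prices a European call with cuRAND's device API; each thread accumulates its own paths and the host later averages the partial sums and discounts by exp(-rT).

    #include <curand_kernel.h>

    __global__ void mc_call_price(float S0, float K, float r, float sigma, float T,
                                  int pathsPerThread, unsigned long long seed,
                                  float* partialSums) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        curandState state;
        curand_init(seed, tid, 0, &state);

        float sum = 0.0f;
        for (int i = 0; i < pathsPerThread; ++i) {
            float z  = curand_normal(&state);                        // standard normal draw
            float ST = S0 * __expf((r - 0.5f * sigma * sigma) * T    // terminal price under GBM
                                   + sigma * sqrtf(T) * z);
            sum += fmaxf(ST - K, 0.0f);                              // call payoff
        }
        partialSums[tid] = sum;   // host reduces, averages and discounts
    }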
 
Keywords:
Finance, Embedded Systems, Developer - Algorithms, GTC 2015 - ID S5227
Streaming:
Download:
 
Groovy and GPU: Enhancing Pricing Performance and Quant Productivity
Felix Grevy (Misys), Bram Leenhouwers (Misys)
Discover how Misys quants use a groovy DSL to write efficient GPU enabled pricing models without any OpenCL or CUDA knowledge. Allowing progressive migration from legacy code to GPU enabled models, this framework leverages GPGPU strengths to achieve high performance pricing with a really short learning curve. This session consists in a global overview of the framework, along with some simple pricing examples demonstrating the strengths and ease of use of this approach. We will also discuss how technical concerns are separated from financial modeling to maximize quants efficiency while leaving room for continuous platform improvement on the development side.  Back
 
Keywords:
Finance, GTC 2015 - ID S5249
Streaming:
Download:
 
Optimizing High-Dimensional Dynamic Stochastic Economic Models for MPI+GPU Clusters
Simon Scheidegger (University of Zurich)
In our talk we will present programming and optimization techniques for exposing the potential of CSCS's Cray XC30 "Piz Daint" cluster for economic modelling. Macroeconomic phenomena are often modeled as constrained optimization problems. Targeting a limited level of detail, it is often possible to find local solutions to such problems, useful only for examining the macro-economic dynamics around a steady state. Solving a model globally with high heterogeneity (different types of consumers, sectors, or countries) yields a dramatic increase in computational and storage costs. In our solver we combine adaptive sparse grids with an MPI+GPU implementation, which allows us to compute global solutions for, e.g., international real business cycle models with unprecedentedly high heterogeneity.
 
Keywords:
Finance, Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5259
Streaming:
Download:
 
Real-Time Heston Stochastic Volatility Tracking via GPUs for Streaming Transactions Data
Yong Zeng (University of Missouri at Kansas City)
Volatility is influential in investment, risk management and security valuation, and is regarded as one of the most important financial market indicators. For a model that fits the stylized facts of transactions data well, this session demonstrates how online tracking of Heston stochastic volatility is made possible by GPU computing. The evolving distribution of the volatility and other quantities as new trades occur is governed by a stochastic partial differential equation (SPDE). Numerically solving such an SPDE as new data flows in provides the tracking of volatility. The algorithm can be parallelized, and each group of threads solves a PDE using the red-black Gauss-Seidel algorithm. The workload sharing among GPUs is embarrassingly parallel and the code scales linearly with the number of GPUs.
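For readers unfamiliar with the solver named above, one red-black Gauss-Seidel sweep for a 2D Poisson-type model problem looks like the sketch below; the session's actual SPDE discretization is of course more involved.

    // Update one color (0 = red, 1 = black) per sweep; interior points only, unit spacing.
    __global__ void rb_gs_sweep(float* u, const float* f, int nx, int ny, int color) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;
        if (((i + j) & 1) != color) return;
        int idx = j * nx + i;
        u[idx] = 0.25f * (u[idx - 1] + u[idx + 1] + u[idx - nx] + u[idx + nx] - f[idx]);
    }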
 
Keywords:
Finance, Big Data Analytics, Supercomputing, GTC 2015 - ID S5273
Streaming:
 
A Fast, Portable and Robust Calibration Approach for Stochastic Volatility Models
Matthew Dixon (University of San Francisco)
The ability to rapidly recalibrate financial derivative models such as stochastic volatility models reduces model risk due to the reliance on stale option chain quotes. This talk will address the following objectives: (1) Gain insight into the challenges of robustly recalibrating stochastic volatility (SV) models and how frequent recalibration reduces error in pricing; (2) Learn about the challenges of deploying the same modeling codebase on GPUs and multi-core CPUs and, (3) Understand how the Xcelerit platform can be used to efficiently deploy C++ written SV models on GPUs and multi-core CPUs.  Back
 
Keywords:
Finance, GTC 2015 - ID S5334
Streaming:
 
Potential Future Exposure and Collateral Modelling of the Trading Book Using NVIDIA GPUs
Grigorios Papamanousakis (Aberdeen Asset Management), Jinzhe Yang (Aberdeen Asset Management), Grzegorz Kozikowski (University of Manchester)
We consider the problem of calculating the collateral exposure of a large derivative book (interest rate swaps, swaptions, inflation swaps, equity options, CDS and cross currency swaps) of a global asset manager. In our presentation we explain how we construct a multi-period, multi-curve, stochastic basis spread model for the calculation of the Potential Future Exposure and the future collateral requirements within an NVIDIA GPU framework. The complexity that arises through the 1mln scenarios x 100k deals x 100 time steps x 10+ curves is an ideal acceleration case for NVIDIA Tesla GPUs. We present the GPU architecture within our framework and the acceleration results.  Back
 
Keywords:
Finance, Big Data Analytics, GTC 2015 - ID S5360
Streaming:
Download:
 
Big Data in Real Time: An Approach to Predictive Analytics for Alpha Generation and Risk Management in the Financial Markets
Yigal Jhirad (Cohen & Steers), Blay Tarnoff (Cohen & Steers)
Our presentation this year will provide an update on the signal processing aspect of the presentation we gave last year - An Approach to Parallel Processing of Big Data in Finance for Alpha Generation and Risk Management. We will demonstrate the use of signal processing on financial time-series data to inform us of market patterns and signals that may be evolving in real time. We will implement a signal filtering algorithm on a real time basis on securities price time-series data and will develop a cluster chart organizing these patterns visually.  Back
 
Keywords:
Finance, Big Data Analytics, Supercomputing, GTC 2015 - ID S5498
Streaming:
 
Optimizing Performance of Financial Risk Calculations
Amit Kalele (Tata Consultancy Services Limited), Pradeep Gupta (NVIDIA), Mahesh Barve (Tata Consultancy Services, India)
Risk management is a classical problem in finance. Value at Risk (VaR) and Incremental Risk Charge (IRC) are used as important measures to quantify market and credit risk. The large number of instruments or assets and their frequent revaluations make these measures a significant computational task. These computations are repeated many times in tasks like back testing, deal synthesis and batch jobs, which run overnight or for days, so a significant reduction in turnaround time can be achieved. Current state-of-the-art platforms like the K40 GPU not only enable fast computations but also reduce the computational cost in terms of energy requirements. In this talk we present performance tuning of the VaR estimation problem, option pricing and IRC calculation on the latest NVIDIA platforms.
 
Keywords:
Finance, Developer - Performance Optimization, GTC 2015 - ID S5522
Streaming:
Download:
 
Retail Bank: 400 Times Faster
Jun Xie (Lactec)
We present a database query engine that uses the GPU to speed up the process of querying a database table. The engine has been applied to a CRM project at a large bank, where we observed queries running 400 times faster than with DB2.
 
Keywords:
Finance, Big Data Analytics, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5547
Streaming:
Download:
 
Accelerating Derivatives Contracts Pricing Computation with GPGPUs
Daniel Augusto Magalhães Borges da Silva (BMFBOVESPA), Alexandre Barbosa (BMFBOVESPA)
Explore new techniques used by BMFBOVESPA, the Brazilian Stock Exchange, in the implementation of a new Close-out Risk Evaluation system (CORE) that saved $5 billion in collateral. CORE uses a set of GPGPUs to produce future price estimates on time. Session topics are: a high-level overview of the BMFBOVESPA Clearing House; the CORE risk system; coding guidelines and the interface between CPU and GPU, as calculation routines needed to be the same for both environments; the use of GPUs to calculate 1.32 billion prices and its importance in a crisis event; performance analysis comparing CPU and GPU timings, showing that the CPU alone is not powerful enough; the multi-GPU, three-tier production environment; daily usage and market results. No prior knowledge is required to attend this session.
 
Keywords:
Finance, GTC 2015 - ID S5570
Streaming:
Download:
 
A True Story: GPU in Production for Intraday Risk Calculations
Regis Fricker (Societe Generale)
Explore a real-life case of GPU implementation in the context of trading and risk-management of numerically highly demanding financial computations. In this talk, we will present how, at Société Générale, we overcame the practical difficulties and the technical puzzles to put GPU into a concrete production environment. We will first show how GPU have changed the life of the trading desks by speeding up their pricing capabilities and delivering faster risk-analyses. We will then examine specific questions such as: How to use NVIDIA GPUs in a managed library (.NET) ? How to use this technology in the specific context of finance distributed calculation? Insights will be provided on the problems we encountered at each step and on the innovative solutions we have implemented to address them.  Back
 
Keywords:
Finance, GTC 2015 - ID S5666
Streaming:
Download:
Game Development
Presentation
Media
Far Cry 4 and Assassin's Creed Unity: Spicing Up PC Graphics with NVIDIA GameWorks
Andrei Tatarinov (NVIDIA)
In this talk we will unveil the details of the development of PC versions of Far Cry 4 and Assassin's Creed Unity, focusing on advanced graphics effects enabled on PC platform thanks to GameWorks. The talk will describe technical challenges that engineers and artists from NVIDIA and Ubisoft faced and solved together to make these two already great-looking games look even more stunning on PC.  Back
 
Keywords:
Game Development, Developer - Algorithms, Real-Time Graphics, GTC 2015 - ID S5671
Streaming:
Download:
General Interest
Presentation
Media
CUDA Center of Excellence Achievement Session & Awards
If you are an academic researcher you won't want to miss this session! In this session, we highlight and reward excellent research taking place at institutions at the forefront of GPU computing teaching and research - NVIDIA Centers of Excellence (COE). We asked each of our COEs to submit a proposal with their best achievement over the past year. An NVIDIA panel of GPU computing luminaries selected four exemplars from our twenty-two COEs to represent the amazing GPU computing research being done. Each of the finalists will give a 15-minute presentation. After the presentations we will award an NVIDIA Achievement Award to one of the four COEs. The COE finalists are: 1. Harvard University, Extended excitonic systems, vibrational-excitonic effects & GPUs 2. Technische Universität Dresden, The OpenACC Profiling Interface 3. Tokyo Tech, Big Data Processing on GPU-based Supercomputers 4. Universidade Federal Fluminense, Education, Outreach & GRID
 
Keywords:
General Interest, GTC 2015 - ID S5115
Streaming:
 
NVIDIA Graduate Fellow Fast Forward Talks
We invite you to a special presentation from our 2014-2015 Graduate Fellowship recipients to learn "what's next" out of the world of research and academia. The NVIDIA Graduate Fellowship recipients were selected from 200 applications in 27 countries. Sponsored projects involve a variety of technical challenges, including machine learning algorithms, computational photography, energy-efficient SRAMs, image processing languages, and much more. We believe that these minds lead the future in our industry and we are proud to support the 2014-2015 NVIDIA Graduate Fellows. For more information on the 2014-2015 NVIDIA Graduate Fellows, please visit www.nvidia.com/fellowship
 
Keywords:
General Interest, GTC 2015 - ID S5564
Streaming:
Graphics Virtualization
Presentation
Media
So You Want to Deploy High Resolution Graphics Desktop Virtualization
Chip Charnley (Ford Motor Company)
A review of the process and results of a High Resolution Graphics Proof of Concept for implementing XenApp and XenDesktop conducted jointly by Ford Motor Company, Cisco and Citrix.  Back
 
Keywords:
Graphics Virtualization, Manufacturing, GTC 2015 - ID S5206
Streaming:
Download:
 
University's Desktop Virtualization Delivers Graphics-Intense Apps on Any Device
George Thornton (Logical Front), Jim Galib (Roger Williams University), Ryan Tiebout (Roger Williams University)
The rapid evolution of technology is changing the way we learn, work and educate. Attend this session to hear from Roger Williams University and learn how they overcame their challenges with a solution from Logical Front, NVIDIA, Citrix, and Dell. Specifically, hear how they provide students remote access to graphics-intensive apps like AutoCAD, Revit, and Adobe Creative Suite 6; improve 3D rendering and user experience, even during peak traffic times; and allow students the flexibility to work from anywhere, on any device.
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5225
Streaming:
Download:
 
Customer Success Story: Desktop Virtualization with NVIDIA GRID for a Large Construction Company
Jits Langedijk (PQR)
Learn how one of the largest construction companies in the Netherlands successfully implemented a VDI environment with NVIDIA GRID, Citrix XenDesktop and VMware virtualization. Hear about their use cases and lessons learned, as well as how to run Autodesk, BIM and other applications with VDI.
 
Keywords:
Graphics Virtualization, Manufacturing, GTC 2015 - ID S5265
Streaming:
Download:
 
Remote Visualization in Healthcare
Michael Harwood (Wipro Limited)
Michael Harwood will be presenting on utilizing GPUs in the remote visualization of graphics-intensive healthcare applications. With a focus on best practices and real-world experiences in delivering radiology and oncology applications via Citrix-based remote applications and desktops, the discussion will delve into the key elements of the architecture, as well as specific metrics and solutions to commonly encountered challenges.
 
Keywords:
Graphics Virtualization, Medical Imaging, GTC 2015 - ID S5283
Streaming:
Download:
 
Building the Business Case on Implementing VDI for Professional Graphics
Tony Berholt (Xenit AB)
In this session we will explore a business case analysis of implementing a virtualized desktop infrastructure for delivering professional graphics based on exemplary projects. We will look at methods of illustrating and presenting performance metrics ...Read More
In this session we will explore a business case analysis of implementing a virtualized desktop infrastructure for delivering professional graphics, based on exemplary projects. We will look at methods of illustrating and presenting performance metrics, cost-benefit analysis and risks typical of this particular area. Which aspects should be examined in detail and which should not? What are the typical business gains and technical pitfalls in our exemplary case? We also discuss how to advance the organization from idea to POC to pilot and get outside stakeholders and the IT department on board.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5301
Streaming:
Download:
 
VDI Evolution at the Speed of GRID - VDI 2.0 IS Here! (Presented by Cisco)
Shawn Kaiser (Cisco Systems), Jason Marchesano (Cisco Systems, Inc.)
2015 will be the year for VDI and desktop virtualization Learn how Cisco and NVIDIA partner to deliver this next generation Desktop Virtualization Solution with GRID vGPU acceleration. VDI is not the same animal it used to be: User requirements and ...Read More
2015 will be the year for VDI and desktop virtualization. Learn how Cisco and NVIDIA partner to deliver this next-generation desktop virtualization solution with GRID vGPU acceleration. VDI is not the same animal it used to be: user requirements and expectations have changed, as have the operating system and applications that feed the beast. Join industry solution experts Shawn Kaiser and Jason Marchesano to discuss how you can evolve to VDI 2.0 and literally put the past to rest.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5336
Streaming:
Download:
 
Outscale/Dassault Cloud Implementation with NVIDIA GRID and Cisco UCS (Presented by Cisco)
Laurent Seror (Outscale)
Customer success story for NVIDIA GRID and Cisco UCS. ...Read More
Customer success story for NVIDIA GRID and Cisco UCS.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5342
Streaming:
 
Creating a "Snappy" Virtual Desktop User Experience
Aivars Apsite (Metro Health)
Metro Health will review their latest VDI implementation to address numerous performance issues, including graphics performance. They will address the need to have the VDI session be perceived as "snappy". A definition of "snappy" ...Read More
Metro Health will review their latest VDI implementation, which addressed numerous performance issues, including graphics performance. They will address the need to have the VDI session be perceived as "snappy". A definition of "snappy" will be provided, along with metrics showing how snappiness improved with their new spec. Latency and IOPS will be discussed. Metro will show how their latest hardware specs have improved performance. Lastly, video processing and 3D graphics have been a performance challenge for virtual desktops. Metro will review how their use of Teradici APEX CPU-offload cards and NVIDIA GRID K1 cards with vSGA improved their VDI video performance. Throughout the presentation, Metro Health will also provide their lessons learned from implementing virtual desktops.  Back
 
Keywords:
Graphics Virtualization, Medical Imaging, Real-Time Graphics, GTC 2015 - ID S5351
Streaming:
Download:
 
Training and Support System in the Cloud for Search and Rescue Missions
Pawel Musialik (Institute of Mathematical Machines)
This work concerns the development of training and support system for SAR missions based on NVIDIA GRID technology. The architecture of cloud system will be discussed. This system can be deployed in the disaster zone as Mobile Data Centre and in typi ...Read More
This work concerns the development of a training and support system for SAR missions based on NVIDIA GRID technology. The architecture of the cloud system will be discussed. The system can be deployed in the disaster zone as a Mobile Data Centre or in a typical Data Centre. We developed software tools for registering and gathering robotic data (3D point clouds) into a common coordinate system. The rendering of 3D data is accessible via the SaaS (Software as a Service) model. This software is dedicated to SAR teams working with modern UAVs (Unmanned Aerial Vehicles) and UGVs (Unmanned Ground Vehicles). GRID technology helps with the integration of many data sources and visualisation over Ethernet. The training system uses these 3D maps as a reference training area for rigid-body simulation of robots.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, Defense, Signal & Audio Processing, GTC 2015 - ID S5374
Streaming:
Download:
 
SoftLayer - NVIDIA GRID Cloud Solutions
Jerry Gutierrez (SoftLayer, an IBM Company)
NVIDIA is redefining general-purpose computing and proud to be working closely with IBM and SoftLayer to bring the capabilities of GPUs to the SoftLayer Cloud offerings. ...Read More
NVIDIA is redefining general-purpose computing and is proud to be working closely with IBM and SoftLayer to bring the capabilities of GPUs to the SoftLayer Cloud offerings.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5375
Streaming:
 
Benchmarking 3D Workloads at Scale on NVIDIA GRID with Horizon View 6 Using View Planner (Presented by VMWare)
Banit Agrawal (VMware), Luke Wignall (NVIDIA), Lan Vu (VMware)
If you are looking for the guidance and the "tool" on how to do the scaling for the 3D workloads in Horizon 6 with View on NVidia Grid GPU, you have come to the right session. In this session, we provide a deep dive on the scale testing of ...Read More
If you are looking for guidance and the "tool" for scaling 3D workloads in Horizon 6 with View on NVIDIA GRID GPUs, you have come to the right session. In this session, we provide a deep dive on the scale testing of various 3D workloads using the View Planner 3.5 tool. View Planner 3.5 is a capacity-planning tool that supports real user workloads, including Office applications, video, audio and interactive (mouse) tests; characterizes the true user experience for desktops; and also offers a bring-your-own-applications (BYOA) feature. Using the BYOA feature of this tool, we show how you can quickly characterize your GRID GPU to get scaling results for different 3D workloads and benchmarks while meeting the desired user experience.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5385
Streaming:
 
Citrix HDX 3D Virtualization: Six Years of Remoting 3D Apps
Derek Thorslund (Citrix Systems), Mayunk Jain (Citrix Systems)
Citrix introduced NVIDIA GPU accelerated remoting of 3D graphics in 2009, allowing organizations to keep their intellectual property safe in the data center while enabling secure remote access over WAN connections from a variety of devices. Learn how ...Read More
Citrix introduced NVIDIA GPU accelerated remoting of 3D graphics in 2009, allowing organizations to keep their intellectual property safe in the data center while enabling secure remote access over WAN connections from a variety of devices. Learn how this technology has evolved thanks to the ongoing collaboration between Citrix and NVIDIA, resulting in major gains in performance and reductions in total cost of ownership. Discover through a series of case studies how companies in various vertical markets have taken advantage of Citrix's proven HDX 3D Pro solution to realize a wide range of benefits. Catch up on the evolution of the HDX 3D Pro partner ecosystem and the latest hardware and software innovations in 3D graphics remoting that are now available with XenApp and XenDesktop.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5390
Streaming:
Download:
 
VMware Horizon 6 and NVIDIA vGPU: Installation and Configuration Best Practices
Jeff Weiss (NVIDIA), Luke Wignall (NVIDIA)
In this tutorial, we will train attendees on how to deploy Horizon 6 View with NVIDIA GRID based on best practices learned from the field. Some of the topics that will be covered include how to: install GPUs in a server, install vSphere for GRID vGPU ...Read More
In this tutorial, we will train attendees on how to deploy Horizon 6 with View and NVIDIA GRID based on best practices learned from the field. Some of the topics that will be covered include how to: install GPUs in a server; install vSphere for GRID vGPU and set up storage (vSAN); install the NVIDIA GRID VIB; launch the vCenter Web Client and create a VM; add a Shared PCI Device and set the vGPU profile; install VMware Horizon Agent 6.0.1 and set up Monterey Enable; add VMs to pools; manually clone VMs and add them to pools; monitor performance; and debug common errors.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5405
Streaming:
 
GPU-Enabled VDI and Rendering at Architecture and Engineering Firm HDR
Clint Pearson (HDR, Inc.), Jeremy Korell (HDR, Inc.)
At GTC 2013, Clint and Jeremy caught the vision of GPU-enabled VDI using NVIDIA GRID, as well as many other applications of GRID for HDR, a global engineering and architecture design firm based in Omaha Nebraska. Ever since, Clint and Jeremy have be ...Read More
At GTC 2013, Clint and Jeremy caught the vision of GPU-enabled VDI using NVIDIA GRID, as well as many other applications of GRID for HDR, a global engineering and architecture design firm based in Omaha, Nebraska. Ever since, Clint and Jeremy have been leading the HDR IT Group to fund and implement a GPU-enabled VMware View system to enable global work-sharing from a central data center.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, Manufacturing, GTC 2015 - ID S5414
Streaming:
Download:
 
Get the Best out of NVIDIA GPUs for 3D Design and Engineering in the Cloud
Andrea Rodolico (NICE)
Learn how interactive, responsive high-end 3D graphics is now possible and viable in many cloud infrastructures, both private or public, thanks to the NVIDIA GRID. NICE Desktop Cloud Visualization (DCV) leverages NVIDIA GPUs and provides hardware acc ...Read More
Learn how interactive, responsive high-end 3D graphics is now possible and viable in many cloud infrastructures, both private and public, thanks to NVIDIA GRID. NICE Desktop Cloud Visualization (DCV) leverages NVIDIA GPUs and provides hardware acceleration for both Linux and Windows applications, enabling remote working and collaboration across virtually any network and any device. In this talk we share success stories on how NVIDIA GPUs have helped enterprise and research customers run CAD, CAE, Oil & Gas and other heavy-duty 3D applications in many diverse network scenarios. We also update the audience on the latest features included in the 2014 release of NICE DCV, including GPU-accelerated compression leveraging the hardware encoder of the latest NVIDIA GPUs.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5415
Streaming:
Download:
 
Desktop Virtualization 101: The Technical and Business Drivers That Make It Happen
Jeff Weiss (NVIDIA), Luke Wignall (NVIDIA)
An overview of server and desktop virtualization. A brief history of virtualization, the business drivers and the technologies that help to enable virtual desktops and application remoting. ...Read More
An overview of server and desktop virtualization. A brief history of virtualization, the business drivers and the technologies that help to enable virtual desktops and application remoting.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5431
Streaming:
 
Rendering Rich User Experiences in Virtualized Environments
John Meza (Esri)
This presentation will show how NVIDIA® GRID K1 and K2 boards benefit Esri ArcGIS Pro, a newly developed 3D GIS platform using DirectX and OpenGL, in Hyper-V, XenDesktop, XenApp and Horizon View virtualized environments. We will show the FPS, el ...Read More
This presentation will show how NVIDIA® GRID K1 and K2 boards benefit Esri ArcGIS Pro, a newly developed 3D GIS platform using DirectX and OpenGL, in Hyper-V, XenDesktop, XenApp and Horizon View virtualized environments. We will show the FPS, elapsed time, virtual GPU configurations and other usability metrics for each virtualized environment configured with NVIDIA® GRID K1 and K2 boards. Attendees will see how the environment scaled while providing an acceptable user experience, and what density per GPU was achieved. The capabilities of cloud-based VDI environments configured with NVIDIA® GRID boards to support graphics applications will also be discussed and demonstrated. Attendees will see live demonstrations of the user experience in these virtualized environments.  Back
 
Keywords:
Graphics Virtualization, Real-Time Graphics, GTC 2015 - ID S5450
Streaming:
 
Protecting Intellectual Property: CAD/CAM for Contractors and Countries of Concern
Fred Devoir (Textron Inc.)
Attend this session and join a discussion about protecting intellectual property in a distributed design and development environment. We will explore the challenges of deploying 3D CAD/CAM toolsets for contract designers as well as remote manufacturi ...Read More
Attend this session and join a discussion about protecting intellectual property in a distributed design and development environment. We will explore the challenges of deploying 3D CAD/CAM toolsets for contract designers as well as remote manufacturing engineering facilities in countries of concern.  Back
 
Keywords:
Graphics Virtualization, Manufacturing, GTC 2015 - ID S5482
Streaming:
Download:
 
Exploring Design Considerations: CAD/CAM Experiences from the Experts Using Citrix and VMware
Fred Devoir (Textron Inc.), Randall Siggers (Textron Inc.)
Join an interactive discussion and Q&A session highlighting the design considerations for deployment of 3D remote graphics solutions with a focus on end-user experience as applied for use with CAD/CAM. Designing for user experience should include ...Read More
Join an interactive discussion and Q&A session highlighting the design considerations for deployment of 3D remote graphics solutions, with a focus on end-user experience as applied to CAD/CAM. Designing for user experience should include considerations for availability, performance, and personalization. Explore the different use cases for Citrix and VMware as applied to Autodesk AutoCAD, Dassault Catia, PTC ProEngineer, and Siemens NX. The session will include demonstrated tuning of Dassault Systèmes Catia V6 for Citrix XenDesktop, massive WriteCache design considerations for ANSYS, how to design for user concurrency without sacrificing performance or user experience, and VMware vGPU design considerations with regard to Autodesk on VMware Horizon View.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5485
Streaming:
Download:
 
Performance Testing in Virtualized Environments
Emily Apsey (Esri)
Esri is currently developing it's new Desktop GIS application, ArcGIS Pro, which uses DirectX and OpenGL rendering libraries for visualization of 2D and 3D spatial and raster data. We foresee heavy use of ArcGIS Pro in virtualized environments that ...Read More
Esri is currently developing its new desktop GIS application, ArcGIS Pro, which uses DirectX and OpenGL rendering libraries for visualization of 2D and 3D spatial and raster data. We foresee heavy use of ArcGIS Pro in virtualized environments that are configured to use NVIDIA GRID. This presentation will cover our test approach in architecture and methodology, including how we used Visual Studio load tests as well as a lightweight add-in we developed, and the lessons learned along the way. Further discussion points include required test metrics and how to acquire them, as well as the GPU monitoring tools used: NVIDIA-SMI, GPU-Z, and internally generated metrics.  Back
 
Keywords:
Graphics Virtualization, Developer - Performance Optimization, GTC 2015 - ID S5493
Streaming:
 
Virtual Texturing in the Cloud
Ananth Balasubramaniam (Amazon)
This session will show you how to leverage cloud rendering to render extremely detailed environments in real-time using virtual textures. The techniques shown here will allow you to overcome the storage and rendering constraints of mobile devices whi ...Read More
This session will show you how to leverage cloud rendering to render extremely detailed environments in real-time using virtual textures. The techniques shown here will allow you to overcome the storage and rendering constraints of mobile devices while still delivering stunning, interactive content on them. You will learn how a combination of Amazon's EC2 infrastructure (G2 instance powered by GRID K520), Amazon AppStream and the GRID SDK come together to deliver a graphics experience on par with high-end gaming PCs on low-power SoCs. The session will also walk you through design considerations, source code, performance comparisons, optimizations, and, of course, a demo.  Back
 
Keywords:
Graphics Virtualization, Game Development, Real-Time Graphics, GTC 2015 - ID S5499
Streaming:
 
Dedicating GPUs for VDI and SBC Workloads: How the ROI and Business Value More Than Justifies the Expense
Mark Margevicius (VMWare)
Many customers are experimenting with GPUs for their VDI and SBC-based users. And, while the necessary capital investments pose challenges, we will share how customers have successfully justified these costs based on ROI and other newly created value ...Read More
Many customers are experimenting with GPUs for their VDI and SBC-based users. And, while the necessary capital investments pose challenges, we will share how customers have successfully justified these costs based on ROI and other newly created value. This session will provide practical, real-world advice on how IT administrators can create a strong business case for GPUs.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5533
Streaming:
Download:
 
Scaling Out Virtual GPU with NVIDIA GRID and VMware Horizon
Aivars Apsite (Metro Health), Cedric Courteix (VMware), Clint Pearson (HDR Inc.), John Meza (Esri)
Let's review your design options to provide access for graphical intense applications MRI, CAD, and others in a virtual desktop environment. In this session, I will demonstrate the benefits of adding NVIDIA GRID cards to dedicate GPU for horizon ...Read More
Let's review your design options for providing access to graphically intense applications (MRI, CAD, and others) in a virtual desktop environment. In this session, I will demonstrate the benefits of adding NVIDIA GRID cards to provide dedicated GPUs for Horizon View applications. Soon you will be able to virtualize the GPU hardware and share it among multiple View desktops. NVIDIA GRID vGPU on VMware vSphere brings the full benefit of virtualized graphics acceleration on NVIDIA GRID. This technology provides exceptional graphics performance for virtual desktops, equivalent to local PCs, when sharing a GPU among multiple users.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5542
Streaming:
Download:
 
Scalability Testing for Virtualized GPU Environments
Manvender Rawat (NVIDIA), Jason K Lee (NVIDIA)
As more and more companies realize the benefits of virtualization, it is essential for success to ensure virtualized graphics is a true replacement to existing desktops and workstations. This means that graphics performance must be acceptable to Desi ...Read More
As more and more companies realize the benefits of virtualization, it is essential for success to ensure virtualized graphics is a true replacement for existing desktops and workstations. This means that graphics performance must be acceptable to designers and power users even under heavy server load and high user density. Learn how to deploy, configure and test performance scalability for graphics-rich applications using NVIDIA virtual GPU. This session will include a walkthrough of virtual GPU setups on both Citrix and VMware VDI platforms and will provide insight on how to optimize and tune the environments for best performance. The session will also give an overview of the deployment and performance measurement tools used for testing, as well as present detailed results of scale testing.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5560
Streaming:
 
Storage: Critical to the Success of vGPU Workloads (Presented by Pure Storage)
Kyle Grossmiller (Pure Storage), Ravi Venkat (Pure Storage)
Storage is a critical part of any Virtual Desktop Infrastructure (VDI) deployment. Successful deployment of a large scale of rich graphics VDI with vGPU requires determining the Input/Output Operations per Second (IOPs). The type of application, data ...Read More
Storage is a critical part of any Virtual Desktop Infrastructure (VDI) deployment. Successful deployment of large-scale, graphics-rich VDI with vGPU requires determining the Input/Output Operations per Second (IOPS) needed. The type of application, data set size, redundancy, and compressibility, combined with other parts of the VDI architecture, drive the need for IOPS. Fortunately, the advent of affordable all-flash arrays with enterprise services like HA, replication, and snapshots can meet or exceed the needs of these demanding systems. This session will cover the factors that determine the IOPS requirement and how all-flash arrays are a new class of storage ideally suited to power this demanding architecture.  Back
 
Keywords:
Graphics Virtualization, Developer - Performance Optimization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5594
Streaming:
 
Implementing NVIDIA GRID with XenDesktop: A Technical Deep Dive
Garrett Taylor (The Kanavel Group)
End-to-end implementation of GRID for XenDesktop including best practices and design considerations. The talk will be focused on XenServer 6.2 FP1 and XenDesktop/XenApp 7.6 with HDX 3D Pro but we will discuss VMWare solutions as well. We will also di ...Read More
End-to-end implementation of GRID for XenDesktop, including best practices and design considerations. The talk will focus on XenServer 6.2 FP1 and XenDesktop/XenApp 7.6 with HDX 3D Pro, but we will discuss VMware solutions as well. We will also discuss GRID in several industry verticals, including education and healthcare.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, Real-Time Graphics, GTC 2015 - ID S5620
Streaming:
Download:
 
Delivering Best-in-Class Graphic Intensive Virtual Workspaces with Cisco UCS (Presented by Cisco)
Aniket Patankar (Cisco Systems Inc.), Timothy Ma (Cisco)
Desktop virtualization implementers are reaping the benefits of Cisco Unified Computing to provide end users an uncompromised experience. This session will detail key design strategies and best practices for deploying Graphic Intensive VDI on Cisco U ...Read More
Desktop virtualization implementers are reaping the benefits of Cisco Unified Computing to provide end users an uncompromised experience. This session will detail key design strategies and best practices for deploying graphics-intensive VDI on Cisco UCS with NVIDIA GPUs. Topics will include: an overview of the enhanced graphics capabilities supported by the newer generation of UCS servers, UCS 2.0; how Cisco UCS Director can help unify and automate the management of your desktop virtualization infrastructure from end to end; how to simplify the manageability of GPU-enabled VDI solutions with Cisco UCS C-Series Rack Servers with Single Connect technology; and how to accelerate your path to ROI with VDI using the latest Cisco Validated Designs for VDI with FlexPod and VSPEX.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5638
Streaming:
 
Unlock the Benefits of Virtualized Desktops and Applications for Design and Engineering
Patrick Godwin (Godwin Global)
Attendees will hear how NVIDIA GRID Technology now enables professional designers and engineers to collaborate and interact with their projects on any device, anywhere on the network. NVIDIA GRID also provides the mobility, security and control that ...Read More
Attendees will hear how NVIDIA GRID Technology now enables professional designers and engineers to collaborate and interact with their projects on any device, anywhere on the network. NVIDIA GRID also provides the mobility, security and control that IT operations demand, making it an ideal solution for virtualizing 3D-intensive manufacturing workloads. With more and more manufacturing companies leveraging the value of virtualization, join this session to learn more about what it takes to maintain productive workflows and the value of having a GRID enabled virtualized environment. This talk will describe benefits for manufacturers such as mobility and productivity, IP protection, BYOD access with high performance and CAD/CAM application support.  Back
 
Keywords:
Graphics Virtualization, Manufacturing, GTC 2015 - ID S5698
Streaming:
Download:
 
Workstation Physical to Virtual: A Road Fraught with Perils? Here is your Guide! (Presented by Dell)
Gary Radburn (Dell)
In this session we intend to give you tips and tricks to navigate the implementation of a virtualized workstation environment to ensure it is a good fit for your company. How to look at the infrastructure, performance and end users and develop a plan ...Read More
In this session we intend to give you tips and tricks for navigating the implementation of a virtualized workstation environment and ensuring it is a good fit for your company: how to look at the infrastructure, performance and end users, and develop a plan that includes all needs and wants and satisfies all aspects. We will also look at items to help you along the way, including tools for performance analysis, and even how to set up a proof of concept in virtualization with no money down! Virtualization does not have to be daunting; now is the right time to see if this exciting technology is right for your organization.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5826
Streaming:
 
Worlds Collide: What Happens When VDI Meets GPU? (Presented by Citrix)
Gunnar Berger (Citrix)
As Windows and productivity applications like Microsoft office become more and more graphically aware and graphics rich applications demand mobility, security and increased accessibility, providing employees with graphical processing to scale and eco ...Read More
As Windows and productivity applications like Microsoft Office become more and more graphically aware, and graphics-rich applications demand mobility, security and increased accessibility, providing employees with graphics processing at scale and economically is more critical than ever. In this session attendees will learn how to break free from the traditional physical-GPU-per-user model. Virtualizing the GPU in a hosted environment with XenApp and XenDesktop opens a wide range of new mobile workstyles, cloud, and DaaS-based offerings for a wider range of employees. We will also dive into the security and compliance benefits of keeping your sensitive apps and data where you have the strongest control over them: your data center. Come armed with your questions and use cases. No topic is too controversial for our experts.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5872
Streaming:
Download:
 
Breaking the Barriers of Mobility and Cloud in Product Development and Engineering (Presented by HP)
Nicholas Holian (Hewlett Packard)
Take collaboration and mobility to new heights. Whether you're a CAD engineer or work in command and control, energy, gaming, or financial trading, you need a dedicated, secure infrastructure environment that allows you to collaborate, innovate, and ...Read More
Take collaboration and mobility to new heights. Whether you're a CAD engineer or work in command and control, energy, gaming, or financial trading, you need a dedicated, secure infrastructure environment that allows you to collaborate, innovate, and work anywhere, on any device. Join us for a live demonstration with HP Blade WS460c technology. Learn how the HP engineering Virtual Desktop Infrastructure (eVDI) solution can deliver a high-performance graphical environment, co-located with high-performance computing and product lifecycle management, that enables better collaboration, IP protection, reduced cost, and faster time to market for a mobile workforce. These solutions break the barriers of traditional thinking to allow for significant gains for IT and individual business units.  Back
 
Keywords:
Graphics Virtualization, Product Design & Styling, Manufacturing, GTC 2015 - ID S5349
Streaming:
Download:
 
NVIDIA GRID and vGPU: Best Practices for Designing and Monitoring
Florian Becker (Lakeside Software, Inc.), Ben Murphy (Lakeside Software Inc.)
Learn how to implement NVIDIA GRID technology in virtual desktop and application environments for graphics accelerated use cases. Apply industry-leading best practices to accurately assess the user community's current graphical application parameter ...Read More
Learn how to implement NVIDIA GRID technology in virtual desktop and application environments for graphics-accelerated use cases. Apply industry-leading best practices to accurately assess the user community's current graphical application parameters and GPU utilization, and use the data to accurately size and scale the vGPU implementation in VDI use cases. Monitor virtual GPUs to proactively detect changes in the performance requirements of the end-user community, manage the end-user experience, and pinpoint performance bottlenecks in the environment.  Back
 
Keywords:
Graphics Virtualization, Manufacturing, GTC 2015 - ID S5111
Streaming:
 
Case Study: Georgia Tech Uses Citrix XenApp with NVIDIA GRID to Deliver Engineering Applications
Florian Becker (Lakeside Software), Didier Contis (Georgia Institute of Technology College of Engineering)
Georgia Tech's College of Engineering (CoE) is providing students and faculty with a wide variety of complex software packages covering many engineering application domains. CoE has been leveraging a combination of Citrix XenDesktop/XenApp and NVIDI ...Read More
Georgia Tech's College of Engineering (CoE) provides students and faculty with a wide variety of complex software packages covering many engineering application domains. CoE has been leveraging a combination of Citrix XenDesktop/XenApp and NVIDIA GRID technologies to provide remote access to some of the most graphics-intensive engineering software. Learn from Georgia Tech's experiences about leading practices in the delivery of high performance graphics applications with NVIDIA GRID in conjunction with app and desktop virtualization. This session will also cover tools and processes that help GRID customers monitor and manage the user experience and make design decisions on how to size their XenDesktop/XenApp environments to meet demand.  Back
 
Keywords:
Graphics Virtualization, Manufacturing, GTC 2015 - ID S5128
Streaming:
Download:
 
Effective Planning for Density and Performance in a Virtual Desktop Deployment with NVIDIA GRID
Jason Southern (NVIDIA)
Adding GPU's to a virtual desktop deployment is only the beginning. Optimising the deployment to make best use of the GPU's and deliver the best experience to the end user. In this session we will discuss: choosing between Passthrough, vGPU or API ...Read More
Adding GPUs to a virtual desktop deployment is only the beginning; the deployment must then be optimised to make the best use of the GPUs and deliver the best experience to the end user. In this session we will discuss: choosing between Passthrough, vGPU or API Intercept methods; effectively selecting the right vGPU profile and card; benchmarking and the effects of virtualization; optimizing the virtual infrastructure; and fine-tuning remote graphics protocols. This session will include real-world examples and demonstrations of the impact minor changes have on performance.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5213
Streaming:
Download:
 
Delivering Production Deployments Using Virtualization and NVIDIA GRID
Adam Jull (IMSCAD)
Attendees will hear IMSCAD case studies of real world deployments using NVIDIA GRID, various design applications on Citrix and the challenges faced when deploying this technology. If your looking to virtualize with NVIDIA GRID, this session is a must ...Read More
Attendees will hear IMSCAD case studies of real-world deployments using NVIDIA GRID and various design applications on Citrix, and the challenges faced when deploying this technology. If you're looking to virtualize with NVIDIA GRID, this session is a must-see.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, Product Design & Styling, Manufacturing, GTC 2015 - ID S5219
Streaming:
Download:
 
Bringing Dassault Engineering Into The Cloud with NVIDIA GRID SDK
Stefan Schoenefeld (NVIDIA), Christophe Delattre (Dassault Systèmes)
In this session you will learn how Dassault Systems uses NVIDIA GRID cards and the NVIDIA GRID SDK to bring their applications into the cloud. We will take a look behind the scene at the technologies used to provide fast, high-quality graphics stream ...Read More
In this session you will learn how Dassault Systèmes uses NVIDIA GRID cards and the NVIDIA GRID SDK to bring their applications into the cloud. We will take a look behind the scenes at the technologies used to provide fast, high-quality graphics streaming and will cover implementation and design details of the Dassault remoting application. Furthermore, we will talk about new, exciting ideas for next-generation application remoting jointly developed by Dassault Systèmes and NVIDIA engineers. Finally, we will give demos of the Dassault remoting application as well as some of our next-generation prototypes.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, Manufacturing, GTC 2015 - ID S5253
Streaming:
 
Evolution of an NVIDIA GRID Deployment
Erik Bohnhorst (NVIDIA), Ronald Grass (Citrix)
In this session you will learn about the entire lifecycle of how a customer moved from a local workstation deployment to a centralized and remote solution with CITRIX XenDesktop and NVIDIA GRID. The session will guide you through the lifecycle of the ...Read More
In this session you will learn about the entire lifecycle of how a customer moved from a local workstation deployment to a centralized and remote solution with Citrix XenDesktop and NVIDIA GRID. The session will guide you through the lifecycle of the project, covering the business need, the proof of concept, challenges, learnings and the actual deployment. You will walk away from this session with a better understanding of the challenges and solutions of an NVIDIA GRID opportunity.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5393
Streaming:
 
Building the Best User Experience with Citrix XenApp & NVIDIA GRID
Thomas Poppelgaard (Poppelgaard.com)
Citrix XenApp (formerly Citrix WinFrame Server, Citrix MetaFrame Server and Citrix Presentation Server) is an application virtualization product that allows users to connect to their corporate applications from a wide range of computer systems and mo ...Read More
Citrix XenApp (formerly Citrix WinFrame Server, Citrix MetaFrame Server and Citrix Presentation Server) is an application virtualization product that allows users to connect to their corporate applications from a wide range of computer systems and mobile devices. XenApp can host applications on central servers and allow users to interact with them remotely, or stream and deliver them to user devices for local execution. In this session, learn from customer cases how and why NVIDIA GRID provided the best user experience. Learn how to build a better user experience with applications such as Google Earth, Adobe Reader and MS Office in Citrix XenApp with NVIDIA GRID.  Back
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5445
Streaming:
Download:
 
Publishing Medical Image Studies with NVIDIA GRID
G Allan Johnson (Duke University)
Biomedical imaging has experienced exponential growth over the last 30 years in instrumentation, applications, and data volume. Traditional methods for publishing imaging studies are no longer adequate. We describe a new paradigm for publication of b ...Read More
Biomedical imaging has experienced exponential growth over the last 30 years in instrumentation, applications, and data volume. Traditional methods for publishing imaging studies are no longer adequate. We describe a new paradigm for publication of biomedical imaging libraries: collections of large multi-dimensional images curated around a central theme. Libraries are shared via an NVIDIA GRID K2 server running Citrix. Several libraries have been assembled: 1) a 0.5 TB multidimensional atlas of the mammalian brain based on MRI of the mouse, rat, and monkey; 2) a 0.25 TB interactive CT/MR atlas of the mouse with both in vivo and ex vivo MR and CT at microscopic resolution; and 3) clinical libraries for teaching and surgical planning.  Back
 
Keywords:
Graphics Virtualization, Medical Imaging, GTC 2015 - ID S5558
Streaming:
 
Cloud Gaming & Application Delivery with NVIDIA GRID Technologies
Franck Diard (NVIDIA)
This session presents the future of game engines and application delivery running in the cloud and the technologies behind NVIDIA® GRID. The audience will learn about the key components of NVIDIA® GRID, like optimal capture, efficient compr ...Read More
This session presents the future of game engines and application delivery running in the cloud and the technologies behind NVIDIA® GRID. The audience will learn how the key components of NVIDIA® GRID, like optimal capture, efficient compression, fast streaming, and low-latency rendering, make cloud gaming and application delivery possible. Franck will demonstrate how these components fit together, how to use the GRID APIs, and how to optimize their usage to deliver an ultimate experience, with live demos.  Back
 
Keywords:
Graphics Virtualization, GTC 2015 - ID S5582
Streaming:
Life & Material Science
Presentation
Media
Many-Body Forces for Molecular Dynamics
Peter Eastman (Stanford University)
Learn to implement many-body forces on a GPU. Interactions involving three or more atoms are becoming increasingly important in molecular simulations. They present unique challenges not found in conventional pairwise forces. I will describe how we im ...Read More
Learn to implement many-body forces on a GPU. Interactions involving three or more atoms are becoming increasingly important in molecular simulations. They present unique challenges not found in conventional pairwise forces. I will describe how we implemented them in OpenMM, with an emphasis on optimization strategies to minimize thread divergence and avoid unnecessary memory access. I will also discuss how you can use OpenMM as a library to implement your own many-body forces without writing a line of CUDA code.  Back
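As a hedged illustration of the GPU mapping such many-body terms typically use (this is not OpenMM's actual kernel; the Angle struct, kernel name and harmonic energy form are assumptions made for this sketch), one common approach assigns one thread per bonded triple so every thread follows the same code path, and scatters the resulting forces with atomic adds:

#include <cuda_runtime.h>
#include <math.h>

// Hypothetical angle record: three particle indices plus harmonic parameters
// (equilibrium angle theta0 in radians, stiffness kTheta). Not OpenMM's layout.
struct Angle { int i, j, k; float theta0, kTheta; };

// One thread per precomputed angle triple; divergence is limited to the final
// bounds check, and forces are scattered with atomics because several angles
// may touch the same atom.
__global__ void harmonicAngleForces(const float4* __restrict__ pos,
                                    const Angle* __restrict__ angles,
                                    int numAngles,
                                    float* __restrict__ force /* 3*N floats */)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numAngles) return;
    Angle a = angles[idx];

    float3 v1 = make_float3(pos[a.i].x - pos[a.j].x,
                            pos[a.i].y - pos[a.j].y,
                            pos[a.i].z - pos[a.j].z);
    float3 v2 = make_float3(pos[a.k].x - pos[a.j].x,
                            pos[a.k].y - pos[a.j].y,
                            pos[a.k].z - pos[a.j].z);
    float d1 = sqrtf(v1.x*v1.x + v1.y*v1.y + v1.z*v1.z);
    float d2 = sqrtf(v2.x*v2.x + v2.y*v2.y + v2.z*v2.z);
    float cosT = fminf(1.f, fmaxf(-1.f,
                 (v1.x*v2.x + v1.y*v2.y + v1.z*v2.z) / (d1 * d2)));
    float sinT = fmaxf(sqrtf(1.f - cosT*cosT), 1e-6f);
    float theta = acosf(cosT);

    // E = 0.5*k*(theta - theta0)^2, so dE/dtheta = k*(theta - theta0).
    float c = a.kTheta * (theta - a.theta0) / sinT;

    float3 fi = make_float3(c * (v2.x/(d1*d2) - cosT*v1.x/(d1*d1)),
                            c * (v2.y/(d1*d2) - cosT*v1.y/(d1*d1)),
                            c * (v2.z/(d1*d2) - cosT*v1.z/(d1*d1)));
    float3 fk = make_float3(c * (v1.x/(d1*d2) - cosT*v2.x/(d2*d2)),
                            c * (v1.y/(d1*d2) - cosT*v2.y/(d2*d2)),
                            c * (v1.z/(d1*d2) - cosT*v2.z/(d2*d2)));

    // The central atom j receives minus the sum so the triple is self-consistent.
    atomicAdd(&force[3*a.i+0], fi.x); atomicAdd(&force[3*a.i+1], fi.y); atomicAdd(&force[3*a.i+2], fi.z);
    atomicAdd(&force[3*a.k+0], fk.x); atomicAdd(&force[3*a.k+1], fk.y); atomicAdd(&force[3*a.k+2], fk.z);
    atomicAdd(&force[3*a.j+0], -(fi.x+fk.x));
    atomicAdd(&force[3*a.j+1], -(fi.y+fk.y));
    atomicAdd(&force[3*a.j+2], -(fi.z+fk.z));
}

As the abstract notes, OpenMM can generate equivalent kernels for user-defined many-body energy expressions, so custom forces do not require writing CUDA by hand.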
 
Keywords:
Life & Material Science, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5130
Streaming:
Download:
 
Attacking HIV with Petascale Molecular Dynamics Simulations on Titan and Blue Waters
James Phillips (University of Illinois at Urbana-Champaign)
The highly parallel molecular dynamics code NAMD was was one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007, and is now used to perform petascale biomolecular simulations, including a 64-million-atom model of the ...Read More
The highly parallel molecular dynamics code NAMD was one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007, and is now used to perform petascale biomolecular simulations, including a 64-million-atom model of the HIV virus capsid, on the GPU-accelerated Cray XK7 Blue Waters and ORNL Titan machines. Come learn the opportunities and pitfalls of taking GPU computing to the petascale, the importance of CUDA 6.5 and Kepler/Maxwell features in combining multicore host processors and GPUs in a legacy message-driven application, and the promise of remote graphics for improving productivity and accessibility in petascale biology.  Back
 
Keywords:
Life & Material Science, Graphics Virtualization, Supercomputing, GTC 2015 - ID S5149
Streaming:
Download:
 
Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics
Jorge González-Domínguez (JGU Mainz)
In this session, you will learn about how to develop a bioinformatics tool for GPU clusters using CUDA and UPC++, a PGAS language. In particular, the analyzed tool detects epistasis between SNP-pairs of a GWAS dataset. I will describe: (1) how to dis ...Read More
In this session, you will learn how to develop a bioinformatics tool for GPU clusters using CUDA and UPC++, a PGAS language. In particular, the analyzed tool detects epistasis between SNP pairs of a GWAS dataset. I will describe: (1) how to distribute the workload among different GPUs using UPC++; and (2) how to exploit the GPU characteristics to speed up the epistasis detection. Results on two different clusters with different GPUs and characteristics will be presented.  Back
 
Keywords:
Life & Material Science, GTC 2015 - ID S5164
Streaming:
Download:
 
GPU-Accelerated Algorithm for the Whole Genome Assembly Problem
Michal Kierzynka (Poznan University of Technology)
Learn how to assemble genomes more efficiently and more accurately with a new GPU-accelerated de novo assembler. As Next Generation Sequencing tends to generate immense amounts of genomic data, scientists around the world are constantly looking for f ...Read More
Learn how to assemble genomes more efficiently and more accurately with a new GPU-accelerated de novo assembler. As Next Generation Sequencing tends to generate immense amounts of genomic data, scientists around the world are constantly looking for faster, cheaper and better software tools for DNA assembly. This session will describe the first ever GPU-based algorithm for DNA assembly and explain how GPUs are used to effectively tackle this complex problem. Participants will also learn some key optimizations that have helped us achieve peak performance in sequence alignment on GPUs. Moreover, examples will be given of how the software performs on real data coming from the Illumina sequencer. You cannot miss this session if you want to stay up to date!  Back
 
Keywords:
Life & Material Science, Developer - Algorithms, GTC 2015 - ID S5184
Streaming:
Download:
 
Scaling Ion Torrent Semiconductor Sequencing Analysis with GPU's
Mohit Gupta (Thermo Fisher Scientific), Jakob Siegel (Thermo Fisher Scientific)
Learn how GPU's are playing a central role in conquering compute challenges posed by current and next generation of Ion Torrent DNA sequencing chips in Ion Proton DNA sequencer. We will showcase our complete signal processing pipeline running on GPU ...Read More
Learn how GPUs are playing a central role in conquering the compute challenges posed by the current and next generation of Ion Torrent DNA sequencing chips in the Ion Proton DNA sequencer. We will showcase our complete signal processing pipeline running on GPUs and the journey of developing CUDA code for data-fitting algorithms targeted at different GPU architectures like Fermi, Kepler and Maxwell. We will also share our evaluation of NVIDIA's aligner nvBowtie and how it stands in terms of speed and accuracy of alignments. We will touch upon several examples in the life sciences field, covering cutting-edge research in clinical diagnostics, drug discovery and human identification, whose work is rapidly accelerated by the turnaround time of our GPU-powered technology.  Back
 
Keywords:
Life & Material Science, GTC 2015 - ID S5216
Streaming:
Download:
 
Accelerated Sparse Matrix Multiplication for Quantum Chemistry with CP2K on Hybrid Supercomputers
Ole Schütt (ETH Zürich)
Learn how we achieve great GPU performance with an auto-tuned sparse matrix multiplication library, enabling quantum simulation of millions of electrons. Our tool of choice is CP2K, a leading code in the field of electronic structure and simulation. ...Read More
Learn how we achieve great GPU performance with an auto-tuned sparse matrix multiplication library, enabling quantum simulation of millions of electrons. Our tool of choice is CP2K, a leading code in the field of electronic structure and simulation. Exploiting locality and sparsity, this code achieves linear computational complexity for DFT, allowing for novel science. Massive parallelism over thousands of GPUs leads to excellent time to solution. The major computational kernel is block-sparse matrix-matrix multiplication. We will discuss results and development insights, including GPU kernels and latency-hiding node-parallel techniques. We propose sparse matrix multiplication as a powerful abstraction for formulating streaming algorithms in general.  Back
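The computational core named in the abstract, block-sparse matrix-matrix multiplication, boils down to many small dense tile products. A minimal CUDA sketch of that inner step follows; it is an illustration only, not CP2K's auto-tuned kernels, and the task-list layout is an assumption made for this sketch.

#include <cuda_runtime.h>

#define BS 16   // illustrative tile edge; a production library auto-tunes many sizes

// One CUDA block per multiplication task C_blk += A_blk * B_blk, where each
// block is a dense BS x BS tile of a block-sparse matrix. tasks[t] holds the
// storage indices of the A, B and C tiles involved.
__global__ void blockSparseMultiply(const float* __restrict__ Ablocks,
                                    const float* __restrict__ Bblocks,
                                    float* __restrict__ Cblocks,
                                    const int3* __restrict__ tasks,
                                    int numTasks)
{
    int t = blockIdx.x;
    if (t >= numTasks) return;
    int3 job = tasks[t];
    int row = threadIdx.y, col = threadIdx.x;

    __shared__ float As[BS][BS];
    __shared__ float Bs[BS][BS];
    As[row][col] = Ablocks[job.x * BS * BS + row * BS + col];
    Bs[row][col] = Bblocks[job.y * BS * BS + row * BS + col];
    __syncthreads();

    float sum = 0.f;
    for (int k = 0; k < BS; ++k)
        sum += As[row][k] * Bs[k][col];

    // Several tasks can contribute to the same C tile, hence the atomic update.
    atomicAdd(&Cblocks[job.z * BS * BS + row * BS + col], sum);
}

// Launch with one 16x16 thread block per task, e.g.:
//   blockSparseMultiply<<<numTasks, dim3(BS, BS)>>>(dA, dB, dC, dTasks, numTasks);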
 
Keywords:
Life & Material Science, Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5217
Streaming:
Download:
 
Single Precision Hybrid Model for Molecular Dynamics Simulations
Ross Walker (University of California San Diego), Scott LeGrand (Amazon)
In this talk we will highlight the work we have done to develop what we term the SPXP precision model. This is the first fully single and fixed precision hybrid model to provide conservation of energy in MD simulations equivalent to full double preci ...Read More
In this talk we will highlight the work we have done to develop what we term the SPXP precision model. This is the first fully single and fixed precision hybrid model to provide conservation of energy in MD simulations equivalent to full double precision runs but without the need for double precision arithmetic. By exploiting the nature of fixed precision arithmetic and custom machine code accumulator functions we can effectively emulate double precision performance in the latest generation GPUs.  Back
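A minimal sketch of the fixed-point accumulation idea behind such single/fixed-precision schemes (an illustration only; the scale factor, names and toy force are assumptions, not the actual SPXP implementation): each contribution is converted to a 64-bit integer and summed with integer atomics, which makes the total independent of thread ordering and therefore bit-wise reproducible, unlike floating-point atomics.

#include <cuda_runtime.h>
#include <math.h>

// Illustrative scale: how many fixed-point "ticks" represent 1.0 in force units.
#define FORCE_SCALE (1ll << 40)

// Accumulate a float contribution into a 64-bit fixed-point accumulator.
// atomicAdd on unsigned long long wraps modulo 2^64, so negative values encoded
// in two's complement still sum correctly.
__device__ void atomicAddFixed(unsigned long long* acc, float value)
{
    long long fixed = (long long)((double)value * (double)FORCE_SCALE);
    atomicAdd(acc, (unsigned long long)fixed);
}

// Toy kernel: one thread per precomputed pair (i, j); both particles'
// x-force accumulators are updated, so atomics are required.
__global__ void pairForceFixed(const float4* __restrict__ pos,
                               const int2* __restrict__ pairs, int numPairs,
                               unsigned long long* __restrict__ fxAcc)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPairs) return;
    int i = pairs[p].x, j = pairs[p].y;
    float dx = pos[j].x - pos[i].x;
    float dy = pos[j].y - pos[i].y;
    float dz = pos[j].z - pos[i].z;
    float r2 = dx*dx + dy*dy + dz*dz + 1e-6f;
    float fx = dx / (r2 * sqrtf(r2));          // toy inverse-square force, x only
    atomicAddFixed(&fxAcc[i],  fx);            // Newton's third law pair
    atomicAddFixed(&fxAcc[j], -fx);
}

// Host side: convert an accumulator back to floating point after the kernel.
static double fixedToDouble(unsigned long long acc)
{
    return (double)(long long)acc / (double)FORCE_SCALE;
}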
 
Keywords:
Life & Material Science, Developer - Performance Optimization, GTC 2015 - ID S5226
Streaming:
 
GPUs in GAMESS: The Story of Libcchem
Dave Tomlinson (Iowa State University/Ames Lab)
Learn about GPU acceleration in the General Atomic and Molecular Electronic Structure System (GAMESS), one of the most widely used and freely available electronic structure codes in use today. The focus of this talk is libcchem, a high performance li ...Read More
Learn about GPU acceleration in the General Atomic and Molecular Electronic Structure System (GAMESS), one of the most widely used and freely available electronic structure codes in use today. The focus of this talk is libcchem, a high performance library developed for GAMESS to provide both high performance CPU and GPU code for performance critical methods. An overview of the methods in libcchem and how the methods are impacted by GPUs as well as a comparison of GAMESS CPU and GPU code will be given.  Back
 
Keywords:
Life & Material Science, GTC 2015 - ID S5241
Streaming:
Download:
 
Beyond Pair Potential: A CUDA Implementation of REBO Potential
Przemyslaw Tredak (University of Warsaw)
Classical molecular dynamics is a very important method in computational physics, chemistry and biology. It is also very computationally demanding. That is why it was among the first scientific methods to be ported to run on the GPUs. However, only s ...Read More
Classical molecular dynamics is a very important method in computational physics, chemistry and biology. It is also very computationally demanding, which is why it was among the first scientific methods to be ported to run on GPUs. However, only some types of potentials used in MD, namely pair potentials, were ported. Other types, like the REBO many-body potential, which is very important for simulating systems of C and H, are still computed on the CPU. The reason for this lies in the huge complexity of many-body potentials, as well as in the lack of an efficient communication scheme between threads that would resolve race conditions without atomic operations. This work shows a method of overcoming these difficulties in a CUDA implementation of the 2nd-generation REBO potential and the speedup achieved.  Back
 
Keywords:
Life & Material Science, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5358
Streaming:
Download:
 
Auto-Tuning Kernel Launch Parameters for Maximum Performance
Joshua Anderson (University of Michigan)
Learn how to efficiently auto-tune kernel launch parameters and attain maximum performance in your application. Launch parameters have a large effect on the run time of the kernel, and there is no way to know the best choice a priori. Auto-tuning par ...Read More
Learn how to efficiently auto-tune kernel launch parameters and attain maximum performance in your application. Launch parameters have a large effect on the run time of the kernel, and there is no way to know the best choice a priori. Auto-tuning parameters provides maximum performance under any circumstances. This talk introduces the auto-tuning method used in the HOOMD-blue particle simulation toolkit and shows how you can use the same technique in your own applications. In practice, the method is very effective at finding the optimal launch parameters, and it can retune new parameters as the application runs. Retuning is important in Molecular Dynamics (MD) and Monte Carlo (MC) applications where the kernel workload can change drastically as the application run progresses.  Back
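The general recipe is easy to reproduce outside HOOMD-blue: benchmark the same kernel over a set of candidate block sizes, keep the fastest, and repeat occasionally as the workload evolves. A minimal CUDA sketch under those assumptions (this is not HOOMD-blue's tuner):

#include <cuda_runtime.h>
#include <cfloat>
#include <cstdio>

// Toy workload whose optimal block size is unknown in advance.
__global__ void saxpy(float a, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Time one candidate block size with CUDA events; returns milliseconds per launch.
static float timeLaunch(int block, float a, const float* x, float* y, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    int grid = (n + block - 1) / block;
    cudaEventRecord(start);
    for (int rep = 0; rep < 10; ++rep)         // average over several launches
        saxpy<<<grid, block>>>(a, x, y, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / 10.0f;
}

// Sweep the candidates once and keep the fastest. A production tuner would also
// re-run the sweep periodically, since the optimum can drift as the workload changes.
int pickBlockSize(float a, const float* x, float* y, int n)
{
    const int candidates[8] = {64, 128, 192, 256, 384, 512, 768, 1024};
    int best = candidates[0];
    float bestMs = FLT_MAX;
    for (int c = 0; c < 8; ++c) {
        float ms = timeLaunch(candidates[c], a, x, y, n);
        if (ms < bestMs) { bestMs = ms; best = candidates[c]; }
    }
    printf("auto-tuned block size: %d (%.3f ms per launch)\n", best, bestMs);
    return best;
}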
 
Keywords:
Life & Material Science, Developer - Performance Optimization, GTC 2015 - ID S5433
Streaming:
Download:
 
GPU-Accelerated Quantum ESPRESSO: Achievements and Challenges in Running Real Science Cases
Filippo Spiga (High Performance Computing Service (University of Cambridge) / Quantum ESPRESSO Foundation)
Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nano-scale. The GPU-accelerated Quantum ESPRESSO project started early 2011 and it has evolved and extended beyond the initi ...Read More
Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nano-scale. The GPU-accelerated Quantum ESPRESSO project started in early 2011 and has evolved and extended beyond its initial goals; the QE-GPU plug-in is now used by many users who run their calculations on everything from small workstations to big supercomputers. Some new features have been under development and testing for a long time to ensure robustness, correctness, longevity and portability. Top performance is not always the first priority. The aim of this talk is to present the challenges and achievements of running on various heterogeneous systems by presenting real science cases gathered from Quantum ESPRESSO users.  Back
 
Keywords:
Life & Material Science, Supercomputing, GTC 2015 - ID S5441
Streaming:
 
Fast Sparse Matrix Multiplication for QMD using Parallel Merge
Jamaludin Mohd Yusof (Los Alamos National Laboratory), Nikolay Sakharnykh (NVIDIA)
We present a novel sparse matrix formulation that uses modified merge algorithms. In contrast to conventional sparse matrix algorithms, which suffer from data divergence within large work arrays, this method allows us to maintain contiguous data layo ...Read More
We present a novel sparse matrix formulation that uses modified merge algorithms. In contrast to conventional sparse matrix algorithms, which suffer from data divergence within large work arrays, this method allows us to maintain contiguous data layouts at all stages of the process. This also allows us to take advantage of ideas from optimized parallel merge algorithms for efficient GPU performance. Performance comparisons are presented. We are motivated by quantum mechanical simulations of atomic systems, which are limited by the computational cost of the eigenvalue solution. Linear scaling methods have been developed which require multiplication of large sparse matrices, where the number of non-zeros per row can be relatively large although still much less than the matrix dimension.  Back
 
Keywords:
Life & Material Science, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5443
Streaming:
 
GPU-Accelerated Virtual Screening: Rationale, Challenges, and Case Studies
Olexandr Isayev (University of North Carolina at Chapel Hill)
With the unprecedented growth of chemical databases incorporating billions of synthetically feasible chemicals, the current generation of cheminformatics software tools is not capable of handling, characterizing, and processing such extremely large c ...Read More
With the unprecedented growth of chemical databases incorporating billions of synthetically feasible chemicals, the current generation of cheminformatics software tools is not capable of handling, characterizing, and processing such extremely large chemical libraries. In this presentation, we will discuss the rationale and the main challenges (both theoretical and technical) for screening very large repositories of virtual compounds. We will present several proof-of-concept case studies regarding the screening of large libraries (~1 billion compounds) using our novel GPU-accelerated cheminformatics platform to (1) rapidly compute chemical descriptors, (2) identify molecules with a defined bioactivity, and (3) identify materials with a desired property.  Back
 
Keywords:
Life & Material Science, Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5463
Streaming:
 
Real-Time Data Compression for Mass Spectrometry
Jose de Corral (Waters Corporation)
Learn how the GPU enables a technique to perform Mass Spectrometry data compression in real-time. Mass Spectrometry data is large and it is getting larger with every new generation of instruments. This presents a serious problem of data storage. The ...Read More
Learn how the GPU enables a technique to perform Mass Spectrometry data compression in real-time. Mass Spectrometry data is large and it is getting larger with every new generation of instruments. This presents a serious problem of data storage. The GPU performs this data compression algorithm in real-time while the data is being acquired by the instrument, resulting in less data reaching the file system, and a reduced post-acquisition data processing time. Given the amount of computation involved, typically trillions of floating point operations, a conventional CPU solution cannot keep up with real-time acquisition.  Back
 
Keywords:
Life & Material Science, GTC 2015 - ID S5472
Streaming:
Download:
 
I Can't Believe It's Not Just Molecular Dynamics (It's Machine Learning Too).
Scott LeGrand (Amazon)
There is a surprising algorithmic overlap between Deep Neural Networks (DNNs) and Molecular Dynamics. This talk will describe bidirectional technology transfers between these two seemingly disparate fields that has resulted from applying the wisdom g ...Read More
There is a surprising algorithmic overlap between Deep Neural Networks (DNNs) and molecular dynamics. This talk will describe the bidirectional technology transfers between these two seemingly disparate fields that have resulted from applying the wisdom gained porting the AMBER molecular dynamics package to four generations of NVIDIA GPUs over the past six years to the development of a deep neural network system. Finally, I will present record-breaking AMBER performance numbers for Maxwell GPUs and GPU clusters.  Back
 
Keywords:
Life & Material Science, Machine Learning & Deep Learning, GTC 2015 - ID S5478
Streaming:
Download:
 
GPU-Optimized Algorithms for Coarse-Grained MD Simulations of Protein-Nanoparticle Biocorona Formation
Samuel Cho (Wake Forest University)
We will describe the GPU-optimized algorithms we developed in order to perform novel coarse-grained MD simulations of 15 apolipoproteins (243 residues each) interacting with a silver nanoparticle, represented by 500 individual beads. The advancement ...Read More
We will describe the GPU-optimized algorithms we developed in order to perform novel coarse-grained MD simulations of 15 apolipoproteins (243 residues each) interacting with a silver nanoparticle, represented by 500 individual beads. Advances in nanomedicine that can deliver drugs into previously inaccessible areas of cells are being realized through nanoparticle development, but nanoparticles readily interact with biomolecular species, resulting in biocorona formation and, in turn, nanotoxicity. We will outline the GPU-optimized neighbor list and cell list algorithms, as well as the bit-wise shift compression algorithms that decrease the data transfer between GPUs, that were necessary to perform these MD simulations.  Back
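As a hedged sketch of the cell-list idea mentioned in the abstract (illustrative only; the CellGrid layout and kernel names are assumptions, not the authors' code), particles are binned into cells at least as large as the interaction cutoff so that neighbor searches only need to visit the 27 surrounding cells:

#include <cuda_runtime.h>

// Hypothetical orthorhombic box divided into nx*ny*nz cubic cells whose edge is
// at least the interaction cutoff.
struct CellGrid { float3 boxLo; float cellSize; int nx, ny, nz; };

__device__ int cellIndex(const CellGrid& g, float3 p)
{
    int cx = max(0, min((int)((p.x - g.boxLo.x) / g.cellSize), g.nx - 1));
    int cy = max(0, min((int)((p.y - g.boxLo.y) / g.cellSize), g.ny - 1));
    int cz = max(0, min((int)((p.z - g.boxLo.z) / g.cellSize), g.nz - 1));
    return (cz * g.ny + cy) * g.nx + cx;
}

// Binning pass: each thread places its particle into a cell and reserves a slot
// with an atomic counter. A second pass (not shown) builds the neighbor list by
// scanning the 27 adjacent cells of each particle.
__global__ void binParticles(const float4* __restrict__ pos, int n,
                             CellGrid g, int maxPerCell,
                             int* __restrict__ cellCount,
                             int* __restrict__ cellMembers)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 p = make_float3(pos[i].x, pos[i].y, pos[i].z);
    int c = cellIndex(g, p);
    int slot = atomicAdd(&cellCount[c], 1);
    if (slot < maxPerCell)                     // overflowing cells would be rebinned
        cellMembers[c * maxPerCell + slot] = i;
}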
 
Keywords:
Life & Material Science, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5480
Streaming:
 
Fast Method to Find Critical Points of the Electron Density in Large Systems
Jorge Garza (Universidad Autonoma Metropolitana)
Learn how to distribute on GPUs the evaluation of some scalar fields involved in quantum chemistry methods. In particular, we analyze the electron density in large-size systems. We found critical points, bond paths and molecular graphs in a fast way ...Read More
Learn how to distribute on GPUs the evaluation of some scalar fields involved in quantum chemistry methods. In particular, we analyze the electron density in large systems. We find critical points, bond paths, and molecular graphs quickly by accelerating all evaluations (the density and its derivatives) with GPUs. Additionally, we show how the evaluation of atomic properties, defined within the atoms-in-molecules approach, has been implemented on GPUs. This presentation covers the final stage of an application designed to run on GPUs to analyze scalar and vector fields.  Back
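As background (a notational assumption on our part, not material from the speaker's slides), the critical points referred to above are the points where the gradient of the electron density vanishes; they are then classified by the eigenvalues of the density Hessian, for example a (3,-1) point corresponds to a bond critical point:

\nabla \rho(\mathbf{r}_c) = 0, \qquad
H_{ij}(\mathbf{r}_c) = \left.\frac{\partial^2 \rho}{\partial r_i \, \partial r_j}\right|_{\mathbf{r}_c}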
 
Keywords:
Life & Material Science, Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5538
Streaming:
Download:
 
GPU DNA Sequencing Base Quality Recalibration
Mauricio Carneiro (Broad Institute of MIT and Harvard), Nuno Subtil (NVIDIA)
Base recalibration is a crucial step in data processing for DNA and RNA sequencing. Established in 2010 by our group in conjunction with the 1000 Genomes project, recalibrating the probability of error for each base in a genome based on counting obse ...Read More
Base recalibration is a crucial step in data processing for DNA and RNA sequencing. Established in 2010 by our group in conjunction with the 1000 Genomes project, recalibrating the probability of error for each base in a genome, based on counting observations and re-modeling the empirical error, has proven to correctly account for the systematic errors made by the sequencing instrument, allowing Bayesian variant calling algorithms to make the most accurate calls. The task of counting observations across the entire genome is daunting and slow. In this talk we will show how we adapted the algorithm for GPU processing to improve the very long runtimes of this process, and how the use of GPUs puts us one step closer to enabling fast diagnostics for critical patients who need a rapid answer.  Back
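The counting-and-remodeling step described above can be pictured with a small sketch: bases are grouped by covariates (here only the reported quality, a deliberate simplification), mismatches at known variant-free sites are counted, and an empirical Phred-scaled quality is recomputed per group. Function and variable names are hypothetical, and this is not the production GATK or GPU code.

import numpy as np

def empirical_quality(n_errors, n_observations):
    """Phred-scaled empirical error rate for one covariate bin (with smoothing)."""
    p_err = (n_errors + 1) / (n_observations + 2)
    return -10.0 * np.log10(p_err)

# Toy counts of mismatches at known variant-free sites, binned by reported quality.
reported_q = np.array([20, 20, 30, 30, 40])
errors     = np.array([150, 150, 12, 12, 1])
observed   = np.array([10_000] * 5)

for q in np.unique(reported_q):
    m = reported_q == q
    print(f"reported Q{q} -> empirical Q{empirical_quality(errors[m].sum(), observed[m].sum()):.1f}")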
 
Keywords:
Life & Material Science, Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5579
Streaming:
 
Accelerating the Cure: GPU-Driven Drug Discovery for Targets in Cancer
Rommie Amaro (University of California, San Diego)
This session discusses our work under the Compute-the-Cure award. We are dramatically accelerating the drug discovery pipeline for targets in cancer by incorporating the outstanding advances achieved in molecular dynamics (MD) simulations via GPU tec ...Read More
This session discusses our work under the Compute-the-Cure award. We are dramatically accelerating the drug discovery pipeline for targets in cancer by incorporating the outstanding advances achieved in molecular dynamics (MD) simulations via GPU technologies into computer-aided drug design. GPU-based MD can be used to rapidly reveal novel druggable binding sites that are otherwise "hidden" in x-ray crystallographic structures and offer novel opportunities for drug discovery. I will also describe the development of automated GPU-based workflows to facilitate the broad adoption of GPU-based technologies in anti-cancer therapeutic discovery programs, with the hope to accelerate the discovery of new and safer medicines.  Back
 
Keywords:
Life & Material Science, Education & Training, Computational Physics, GTC 2015 - ID S5586
Streaming:
Download:
 
Acceleration of a Molecular Modelling Code for the Analysis and Visualization of Weak Interactions between Molecules
Michael Krajecki (Université de Reims Champagne-Ardenne)
At the interface between Chemistry, HPC and Biochemistry, this research work aims at exploiting the recent analysis method, "NCI" (Non Covalent Interactions), for molecular docking simulations using a new software, AlgoGen, within Drug-Desi ...Read More
At the interface between Chemistry, HPC and Biochemistry, this research work aims at exploiting the recent analysis method, "NCI" (Non Covalent Interactions), for molecular docking simulations using a new software, AlgoGen, within Drug-Design studies. NCI is a breakthrough in the field. The use of a GPU to accelerate this scientific application is very attractive in view of exploiting NCI in molecular docking, which is a very challenging tool in Medicinal Chemistry. A first GPU-accelerated version of the NCI code is proposed here.  Back
 
Keywords:
Life & Material Science, GTC 2015 - ID S5785
Streaming:
Download:
Machine Learning & Deep Learning
Presentation
Media
Faster Convolutional Neural Networks by Separable Filters
Che-Rung Lee (National Tsing Hua University)
Learn how to accelerate the training process of the convolutional neural networks (CNNs) for image recognition on GPU with separable filters. It uses Singular Value Decomposition (SVD) to approximate 2D filters by the product of two 1D filters, and p ...Read More
Learn how to accelerate the training of convolutional neural networks (CNNs) for image recognition on the GPU with separable filters. The approach uses Singular Value Decomposition (SVD) to approximate 2D filters by the product of two 1D filters, and performs two 1D convolutions consecutively. The GPU implementation consists of two kernels. The first is a batched SVD routine that can decompose multiple small matrices simultaneously. The second is the computation of the convolution, which combines three methods using different memory spaces for various filter sizes. Experimental results show that the implementation achieves a 1.35x to 2.66x speedup in the forward and backward passes compared to state-of-the-art GPU implementations of CNNs.  Back
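The separable-filter trick described above can be checked in a few lines of NumPy/SciPy: take the rank-1 SVD approximation of a 2D filter and verify that one 2D convolution with it equals two consecutive 1D convolutions. This is only an illustration of the idea, not the authors' batched GPU kernels.

import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
K = rng.standard_normal((5, 5))               # a 2D convolution filter
U, s, Vt = np.linalg.svd(K)
col = U[:, 0] * np.sqrt(s[0])                 # 1D column filter
row = Vt[0, :] * np.sqrt(s[0])                # 1D row filter
K1 = np.outer(col, row)                       # rank-1 approximation of K

img = rng.standard_normal((64, 64))
two_d = convolve2d(img, K1, mode="valid")                        # one 2D pass
one_d = convolve2d(convolve2d(img, col[:, None], mode="valid"),
                   row[None, :], mode="valid")                   # two 1D passes
print(np.allclose(two_d, one_d))              # True: the rank-1 filter is exactly separable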
 
Keywords:
Machine Learning & Deep Learning, Developer - Performance Optimization, Computer Vision & Machine Vision, GTC 2015 - ID S5147
Streaming:
 
Accelerate Computation in Collaborative Filtering Using a Multi-GPU Platform
Ying Liu (University of Chinese Academy of Sciences)
As the explosive growth of data, data mining has become a significant research domain. Recommendation systems, that automatically push knowledge from massive data collection to the users, is a hot topic. Collaborative filtering (CF) is one of the ess ...Read More
With the explosive growth of data, data mining has become a significant research domain. Recommendation systems, which automatically push knowledge from massive data collections to users, are a hot topic, and collaborative filtering (CF) is one of the essential algorithms in recommendation systems. The goal of this session is to show how to accelerate the computation in CF using a multi-GPU platform. First, we identify the computational kernel: similarity matrix calculation. We then present a CUDA multi-thread model in which data elements are processed in a data-parallel fashion, and propose a workload partitioning scheme so that a balanced workload can be distributed across GPUs. Experiments on a real-world dataset demonstrate the performance on a platform with 4 Tesla K10 cards.  Back
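The kernel named above, the item-item similarity matrix, and the row-wise workload partitioning can be sketched in NumPy as below. Plain CPU code stands in for the per-GPU shares, and the matrix sizes and names are made up for illustration rather than taken from the talk.

import numpy as np

def similarity_block(R_normalized, rows):
    """Cosine similarities of the given item rows against all items (one GPU's share)."""
    return R_normalized[rows] @ R_normalized.T

R = np.random.rand(1000, 400)                       # item x user rating matrix
Rn = R / (np.linalg.norm(R, axis=1, keepdims=True) + 1e-12)

n_gpus = 4
chunks = np.array_split(np.arange(R.shape[0]), n_gpus)   # balanced row ranges
S = np.vstack([similarity_block(Rn, rows) for rows in chunks])
print(S.shape)                                      # (1000, 1000) item-item similarity matrix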
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, GTC 2015 - ID S5158
Streaming:
Download:
 
Tuning the Performance of Convolutional Neural Network for Image Classification on GPU
Joseph Wang (Alibaba)
Convolutional neural networks (CNNs) have achieved an impressive suite of results on image classification. Industry adoption, for instance by Alibaba, also indicates bright prospects. In this talk we will present several methods to optimize and accel ...Read More
Convolutional neural networks (CNNs) have achieved an impressive suite of results on image classification. Industry adoption, for instance by Alibaba, also indicates bright prospects. In this talk we will present several methods to optimize and accelerate GPU implementation of Convolutional Neural Networks. An optimized implementation is given as an example which has smaller memory footprints and performs 1.4 to 3 times faster than Caffe.  Back
 
Keywords:
Machine Learning & Deep Learning, Developer - Performance Optimization, GTC 2015 - ID S5159
Streaming:
Download:
 
Accelerating Deep Convolution Neural Networks For Large-Scale Speech Tasks Using GPUs
Rajesh Bordawekar (IBM T. J. Watson Research Center)
This presentation describes GPU acceleration of convolution neural networks for speech processing workloads. We compare three alternatives for implementing core computational kernels, hand-coded, using CUBLAS, and using CUDNN. We describe impact of e ...Read More
This presentation describes GPU acceleration of convolutional neural networks for speech processing workloads. We compare three alternatives for implementing the core computational kernels: hand-coded, using cuBLAS, and using cuDNN. We describe the impact of each approach on the algorithmic design and discuss how each approach affects performance and result accuracy.  Back
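For the cuBLAS variant mentioned above, a standard way to express convolution as a dense matrix multiply is the im2col lowering. The NumPy sketch below shows only this general mapping; it is an assumption about the approach, not code from the talk, and the sizes are arbitrary.

import numpy as np

def im2col(x, k):
    """x: (C, H, W) input, k: filter size. Returns a (C*k*k, out_h*out_w) patch matrix."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w))
    row = 0
    for c in range(C):
        for i in range(k):
            for j in range(k):
                cols[row] = x[c, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

x = np.random.rand(3, 8, 8)              # 3-channel input
w = np.random.rand(16, 3, 5, 5)          # 16 filters of size 3x5x5
cols = im2col(x, 5)                      # (75, 16) patch matrix
out = (w.reshape(16, -1) @ cols).reshape(16, 4, 4)   # the GEMM performs the convolution
print(out.shape)                         # (filters, out_h, out_w)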
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5231
Streaming:
Download:
 
Deep Neural Network Training Accelerated by Multi-GPU
Xiaoliang Lu (IFLYTEK CO.,LTD.)
This presentation proposes a novel, multi-GPU, parallel strategy to speed up deep neural network training. The strategy proposed here, RSPS, is a data parallelism method. For RSPS, there is no need of center node, multiple GPUs make up a ring and wor ...Read More
This presentation proposes a novel multi-GPU parallel strategy to speed up deep neural network training. The strategy proposed here, RSPS, is a data-parallelism method. RSPS needs no central node: multiple GPUs form a ring and work asynchronously, with each GPU transmitting its model information directly to the next GPU. A theoretical analysis of the speedup and the model latency of RSPS is presented, and the limit of the achievable speedup is derived. The proposed strategy can easily be extended to multi-GPU and even multi-server architectures. Experimental results show that it achieves an approximately linear speedup from 3 to 8 GPUs without loss in recognition performance. The proposed strategy is an efficient and effective GPU parallel strategy for DNN training.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5277
Streaming:
 
Deep Learning at Scale
Ren Wu (Baidu)
We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies f ...Read More
We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images. On one of the most challenging computer vision benchmarks, the ImageNet classification challenge, our system has achieved the best result to date, with a top-5 error rate of 5.98% - a relative 10.2% improvement over the previous best result.  Back
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, Supercomputing, GTC 2015 - ID S5280
Streaming:
Download:
 
To 3D or not to 3D? Why GPUs Are Critical for 3D Mass Spectrometry Imaging
Eri Rubin (SagivTech Ltd.)
Big data kind of problems emerge in the analysis of biological samples. Advanced acquisition methods that provide 3D mass spectrometry information along with sophisticated learning algorithms call for fast computation methods. GPUs are an enabling te ...Read More
Big-data problems emerge in the analysis of biological samples. Advanced acquisition methods that provide 3D mass spectrometry information, along with sophisticated learning algorithms, call for fast computation methods. GPUs are an enabling technology for analyzing the ever-growing volume of mass spectrometry data. Come hear about the machine learning algorithms migrated to the GPU environment, including Probabilistic Latent Semantic Analysis and hierarchical clustering distance calculation, with accelerations of one to two orders of magnitude. This work was carried out within the framework of 3D Massomics, a European FP7-funded project that includes partners with expertise in imaging mass spectrometry, analytical chemistry, medicine, statistics, bioinformatics, and parallel computing.  Back
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, Life & Material Science, GTC 2015 - ID S5311
Streaming:
Download:
 
CUDA in Urban Search and Rescue: Mission Planning Module for the ICARUS Project
Pawel Musialik (Institute of Mathematical Machines)
This session will concentrate on the topic of mission planning for search hand rescue personnel and how CUDA can help in this task. Urban Search and Rescue is a challenging and important activity in current society. The ICARUS project (Integrated Com ...Read More
This session will concentrate on the topic of mission planning for search and rescue personnel and how CUDA can help in this task. Urban search and rescue is a challenging and important activity in today's society. The ICARUS project (Integrated Components for Assisted Rescue and Unmanned Search operations) concentrates on aiding these efforts by providing robotic components for rescue teams. Adding mobile robots to the mix raises the need for additional planning effort, which would consume a lot of time using a classical approach. We will present how this can be avoided by using CUDA-based mission planners to solve tasks such as path planning, patrolling, and communication relay placement. A number of CUDA-implemented algorithms will be shown along with example results.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, GTC 2015 - ID S5319
Streaming:
Download:
 
GPUs and Machine Learning: A Look at cuDNN
Sharan Chetlur (NVIDIA)
We describe cuDNN, NVIDIA's in-house CUDA library of deep learning primitives. Addressing the demand from engineers and data scientists, we created a library similar in intent to BLAS, with optimized routines for deep learning workloads. Previously, ...Read More
We describe cuDNN, NVIDIA's in-house CUDA library of deep learning primitives. Addressing the demand from engineers and data scientists, we created a library similar in intent to BLAS, with optimized routines for deep learning workloads. Previously, Neural Network framework developers had to implement these low-level routines for GPUs on an ad-hoc basis, optimizing individual computational kernels by hand and repeating this work as new parallel processors emerged. cuDNN alleviates this burden by providing tuned black box implementations of these functions. The library is easy to integrate into existing frameworks, and provides optimized performance and memory usage across GPU generations. We discuss supported functionality, algorithmic implementation details and performance achieved.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, Developer - Tools & Libraries, GTC 2015 - ID S5331
Streaming:
 
Nonlinear Structured Prediction Using the GPU
Alexander Schwing (University of Toronto)
Learn how to combine deep neural networks with probabilistic models to build classification algorithms that jointly reason about multiple variables. Typically, deep neural networks reason about a single object of interest within the observed data, e. ...Read More
Learn how to combine deep neural networks with probabilistic models to build classification algorithms that jointly reason about multiple variables. Typically, deep neural networks reason about a single object of interest within the observed data, e.g., a single object within an image. We show how to enrich deep learning to jointly predict a set of random variables while leveraging learned variable correlations. To this end we present an efficient GPU driven algorithm based on neural networks that is able to jointly capture nonlinearities for multiple variables and their correlations.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, GTC 2015 - ID S5368
Streaming:
Download:
 
A Reduction of the Elastic Net to Support Vector Machines Leveraging GPU Computing
Jacob Gardner (Washington University in St. Louis)
In past years we have seen many projects that build and maintain highly optimized implementations of Support Vector Machines (SVM) that leverage GPUs. Up to this point, no comparable effort has been made to parallelize the Elastic Net, despite its po ...Read More
In past years we have seen many projects that build and maintain highly optimized implementations of Support Vector Machines (SVM) that leverage GPUs. Up to this point, no comparable effort has been made to parallelize the Elastic Net, despite its popularity in many high impact applications, including genetics, neuroscience and systems biology. Rather than crafting a new GPU implementation for the Elastic Net, we introduce a novel reduction from the Elastic Net to the SVM, two seemingly disparate algorithms. This allows us to implement the Elastic Net in a way that spends almost all of its time in an SVM solver. As a result, we can leverage already existing GPU implementations of SVM solvers, and achieve in 11 lines of MATLAB code the fastest Elastic Net by multiple orders of magnitude.  Back
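For context, one common parameterization of the elastic net objective is given below; this is stated as an assumption on notation, not the exact form used in the talk. The talk's contribution is a reduction showing that this problem can be handed to an off-the-shelf GPU SVM solver.

\min_{w}\; \tfrac{1}{2}\,\lVert y - Xw \rVert_2^2
\;+\; \lambda_1 \lVert w \rVert_1
\;+\; \tfrac{\lambda_2}{2}\,\lVert w \rVert_2^2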
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, GTC 2015 - ID S5543
Streaming:
Download:
 
Transparent Parallelization of Neural Network Training
Cyprien Noel (Flickr / Yahoo Inc.), Simon Osindero (Flickr / Yahoo Inc.)
Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time consuming, even on GPU hardware. We describe a set of extensions to the state of the art Caffe library cite{Jia13caffe}, allowing trai ...Read More
Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time-consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library [Jia13caffe], allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture: implementing asynchronous SGD without increasing Caffe's complexity.  Back
 
Keywords:
Machine Learning & Deep Learning, Developer - Performance Optimization, Computer Vision & Machine Vision, GTC 2015 - ID S5552
Streaming:
Download:
 
A High-Density GPU Solution for DNN Training
Franco Mana (NUANCE)
We, at Nuance, use DNN in ASR systems in multiple languages and tasks. We train DNN with large amounts of data. DNNs are trained with gradient descent methods that are slow and difficult to parallelize across machines. In order to speed up the traini ...Read More
We at Nuance use DNNs in ASR systems across multiple languages and tasks, and we train these DNNs with large amounts of data. DNNs are trained with gradient descent methods that are slow and difficult to parallelize across machines. To speed up the training process, we have developed training algorithms and recipes that can train a DNN in parallel on multiple GPU devices, significantly reducing training time. We will present benchmark results covering the basic computational operations involved in DNN training (SGEMM, memory copy throughput, etc.) as well as end-to-end training time on different GPU-based hardware configurations. In particular, we will benchmark systems based on the K10 versus systems based on the K80, with the number of GPUs varying from 1 to 16.  Back
 
Keywords:
Machine Learning & Deep Learning, Signal & Audio Processing, Supercomputing, GTC 2015 - ID S5571
Streaming:
Download:
 
Application of GPUs to Classification Problems Using Deep Learning Architectures
Elliot English (MetaMind)
In this talk, we discuss the latest techniques for solving image classification, localization, and detection problems on a multi-GPU architecture. We will cover issues and algorithms associated with training convolutional neural networks, as well as ...Read More
In this talk, we discuss the latest techniques for solving image classification, localization, and detection problems on a multi-GPU architecture. We will cover issues and algorithms associated with training convolutional neural networks, as well as other network architectures, on small clusters of GPUs.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5580
Streaming:
Download:
 
Visual Object Recognition Using Deep Convolutional Neural Networks
Rob Fergus (Facebook)
This talk will describe recent progress in object recognition using deep convolutional networks. Over the last few years, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g. ...Read More
This talk will describe recent progress in object recognition using deep convolutional networks. Over the last few years, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g. Google, Facebook, Microsoft, Baidu). Rob Fergus will outline how these models work, and describe architectures that produce state-of-the-art results on the leading recognition benchmarks. GPUs are an essential component to training these models. The talk will conclude with a live demo.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, GTC 2015 - ID S5581
Streaming:
 
Multi-GPU Training for Large-Scale Visual Object Recognition
Wei Xia (Orbeus)
Despite the great progress of the deep learning models (Deep Convolutional Neural Networks) in the field of visual recognition in the past few years, one of the greatest bottlenecks lies in the extremely long training hours (from several weeks to mon ...Read More
Despite the great progress of the deep learning models (Deep Convolutional Neural Networks) in the field of visual recognition in the past few years, one of the greatest bottlenecks lies in the extremely long training hours (from several weeks to months) to handle tens of millions of training images. The goal of this session is to share the results that we achieved when we used multiple-GPUs installed in one server to speed-up the training process. By configuring 16 GPUs (8 Titan Zs) and optimizing the parallel implementation for the CNN training, up to 14x speed increase is achieved without compromising, and even sometimes boosting, the model's accuracy. Comprehensive experimental results have demonstrated the linear scalability of the proposed multi-GPU training processes.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID S5585
Streaming:
Download:
 
Deep Neural Networks for Visual Pattern Recognition Problems
Dan Ciresan (IDSIA)
GPU-optimized Deep Neural Networks (DNNs) excel on visual pattern recognition tasks. They are successfully used for automotive problems like pedestrian and traffic sign detection. DNNs are fast and extremely accurate. They help the field of connectom ...Read More
GPU-optimized Deep Neural Networks (DNNs) excel on visual pattern recognition tasks. They are successfully used for automotive problems like pedestrian and traffic sign detection. DNNs are fast and extremely accurate. They help the field of connectomics by making it possible to segment and reconstruct the neuronal connections in large sections of brain tissue for the first time. This will bring a new understanding of how biological brains work. DNNs power automatic navigation of a quadcopter in the forest.  Back
 
Keywords:
Machine Learning & Deep Learning, Medical Imaging, GTC 2015 - ID S5590
Streaming:
 
Extending the Limits of Machine Learning with GPUs
John Canny (UC Berkeley)
BIDMach is a rich, extensible machine learning toolkit that fully exploits GPU acceleration. On a single machine with NVIDIA GPU, it holds records for most common ML tasks, outperforming cluster systems. This tutorial will overview BIDMach, from its ...Read More
BIDMach is a rich, extensible machine learning toolkit that fully exploits GPU acceleration. On a single machine with an NVIDIA GPU, it holds records for most common ML tasks, outperforming cluster systems. This tutorial will give an overview of BIDMach, from its matrix layer through to defining new learning algorithms. The tutorial is interactive, and we will provide an EC2 image for participants to follow along. Specifically, we will cover: a hardware-agnostic matrix library (BIDMat), in-memory learning, scaling up to terabyte sources, parameter tuning, custom learners, creating new models, and interactive machine learning.  Back
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, Visualization - In-Situ & Scientific, GTC 2015 - ID S5621
Streaming:
Download:
 
Reconstruction Networks for Efficient Face Detection and Landmark Localization
Bo Yu (Carnegie Mellon University), Ian Lane (Carnegie Mellon University)
In this talk we introduce Reconstruction Networks, a novel neural network-structure that enables extremely efficient object detection in images. Reconstruction Networks directly reconstruct the regions of interest of one or more objects within an ima ...Read More
In this talk we introduce Reconstruction Networks, a novel neural network-structure that enables extremely efficient object detection in images. Reconstruction Networks directly reconstruct the regions of interest of one or more objects within an image without explicitly performing image segmentation or generating key point descriptors. We show that Reconstruction Networks can learn the structure of face and facial landmarks automatically, even under various poses and illumination conditions and outperform state-of-the-art performance for Face Detection and Facial Landmark Localization while requiring only a fraction of the computational cost.  Back
 
Keywords:
Machine Learning & Deep Learning, Automotive, Computer Vision & Machine Vision, GTC 2015 - ID S5629
Streaming:
 
Deep Learning Made Easy with GraphLab
Piotr Teterwak (Dato)
Deep Learning is a promising machine learning technique with a high barrier to entry. In this talk, we provide an easy entry into this field via "deep features" from pre-trained models. These features can be trained on one data set for one ...Read More
Deep Learning is a promising machine learning technique with a high barrier to entry. In this talk, we provide an easy entry into this field via "deep features" from pre-trained models. These features can be trained on one data set for one task and used to obtain good predictions on a different task, on a different data set. No prior experience necessary. Real-time demos will be given using GraphLab Create, a popular open-source-based package. GraphLab Create utilizes NVIDIA GPUs for a significant performance speedup.  Back
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, Video & Image Processing, GTC 2015 - ID S5630
Streaming:
Download:
 
Speech: The Next Generation
Bryan Catanzaro (Baidu)
Speech is the user interface of the future, but today's implementations often fail when we need them the most, such as in noisy environments or when the microphone isn't close at hand. At Baidu, an increasing fraction of our users employ speech int ...Read More
Speech is the user interface of the future, but today's implementations often fail when we need them the most, such as in noisy environments or when the microphone isn't close at hand. At Baidu, an increasing fraction of our users employ speech interfaces to find what they are looking for. In this talk, I will show how next generation deep learning models can provide state-of-the-art speech recognition performance. We train these models using clusters of GPUs using CUDA, MPI and Infiniband.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5631
Streaming:
 
Distributed Optimization of CNNs and RNNs
William Chan (Carnegie Mellon University)
Deep Learning methods including Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have demonstrated powerful acoustic modeling capabilities for Automatic Speech Recognition (ASR). However, these me ...Read More
Deep Learning methods including Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have demonstrated powerful acoustic modeling capabilities for Automatic Speech Recognition (ASR). However, these methods often need large volumes of training data and consequently long training times. In this GTC talk, we will describe our distributed asynchronous training platform for training CNNs and RNNs across an array of GPUs.  Back
 
Keywords:
Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5632
Streaming:
Download:
 
Featured Talk: Recent Advances in GPU-Accelerated Speech and Language Processing
Ian Lane (Carnegie Mellon University)
Recent advances in Deep Learning have resulting in significant improvements in speech recognition, natural language processing and related tasks. In this talk, I will give an overview of the state-of-the-art in Deep Learning for Speech and Language P ...Read More
Recent advances in Deep Learning have resulted in significant improvements in speech recognition, natural language processing, and related tasks. In this talk, I will give an overview of the state of the art in deep learning for speech and language processing and present recent work at CMU on GPU-accelerated methods for real-time speech and language processing, joint optimization for spoken language understanding, and continuous online learning methods.  Back
 
Keywords:
Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5634
Streaming:
 
GPU-Accelerated Large Vocabulary Continuous Speech Recognition (LVCSR) for Scalable Distributed Speech Recognition
Jungsuk Kim (Carnegie Mellon University)
In previous work, we developed GPU-Accelerated Speech Recognition Engine optimized for faster than real time speech recognition on heterogeneous CPU-GPU architecture. In this work, we extended this work to focus on developing a scalable server-client ...Read More
In previous work, we developed a GPU-accelerated speech recognition engine optimized for faster-than-real-time speech recognition on a heterogeneous CPU-GPU architecture. Here, we extend that work to focus on developing a scalable server-client speech recognition solution specifically optimized for simultaneous decoding of multiple users in real time. To efficiently support real-time speech recognition for multiple users, we applied a producer-consumer multi-threaded model, in which a single producer thread accepts work items and passes them to consumer threads via a work queue. We divide the entire speech recognition process into three consumer classes, which are pipelined and connected via task queues to achieve maximum hardware utilization.  Back
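A minimal producer-consumer skeleton in the spirit of the description above is sketched here. The thread roles and queue names are hypothetical, there is only one consumer stage instead of the three pipelined classes, and a string stands in for the GPU decoding work.

import queue
import threading

audio_q, result_q = queue.Queue(), queue.Queue()

def producer(utterances):
    for utt in utterances:                    # accept incoming work items
        audio_q.put(utt)
    audio_q.put(None)                         # sentinel: no more work

def recognizer():
    while True:                               # one consumer stage of the pipeline
        utt = audio_q.get()
        if utt is None:
            result_q.put(None)
            break
        result_q.put(f"transcript of {utt}")  # stand-in for the GPU decoding stage

threading.Thread(target=producer, args=(["utt-1", "utt-2", "utt-3"],)).start()
threading.Thread(target=recognizer).start()

while (res := result_q.get()) is not None:
    print(res)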
 
Keywords:
Machine Learning & Deep Learning, Signal & Audio Processing, GTC 2015 - ID S5635
Streaming:
Download:
 
RGBD Occlusion Detection via Deep Convolutional Neural Networks
Vivek Venugopalan (United Technologies Research Center)
Occlusion edge detection is a very challenging task in robotics and unmanned autonomous vehicles. The occlusion edges in images correspond to range discontinuity in the scene from the point of view of the observer and extracting these edges from raw ...Read More
Occlusion edge detection is a very challenging task in robotics and unmanned autonomous vehicles. Occlusion edges in images correspond to range discontinuities in the scene from the observer's point of view, and extracting these edges from raw images and videos is very computationally intensive. Deep learning techniques have largely replaced existing methods for extracting information in similar applications by mapping the problem to large multi-layer neural networks. These techniques rely on Deep Convolutional Neural Networks (DCNNs) with multiple hidden layers to capture the local spatial correlations that help identify occlusion edges in images and videos.  Back
 
Keywords:
Machine Learning & Deep Learning, Embedded Systems, Computer Vision & Machine Vision, GTC 2015 - ID S5646
Streaming:
Download:
 
GPUs and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
Olga Russakovsky (Computer Science, Stanford University), Alex Berg (UNC Chapel Hill)
This session will provide an introduction to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), analyze the results of the 2014 challenge and provide a glimpse into what the 2015 challenge has in store. Highlights include a discussion of ...Read More
This session will provide an introduction to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), analyze the results of the 2014 challenge and provide a glimpse into what the 2015 challenge has in store. Highlights include a discussion of large-scale image recognition, a history of the ILSVRC and an overview of current techniques and trends in image classification and object detection, as well as the role that GPUs have played in this challenge.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, GTC 2015 - ID S5665
Streaming:
 
Collaborative Feature Learning from Social Media
Hailin Jin (Adobe)
Image feature representation plays an essential role in image recognition. The current state-of-the-art feature learning paradigm is supervised learning from labeled data. However, this paradigm requires large-scale category labels, which limits its ...Read More
Image feature representation plays an essential role in image recognition. The current state-of-the-art feature learning paradigm is supervised learning from labeled data. However, this paradigm requires large-scale category labels, which limits its applicability to domains where labels are hard to obtain. I will present a new data-driven feature learning paradigm which does not rely on category labels. Instead, we learn from user behavior data collected on social media. We use the image relationship discovered in the latent space from the user behavior data to guide the image feature learning. Also presented is a new large-scale image and user behavior dataset collected on Behance.net. The dataset consists of 1.9 million images and over 300 million view records from 1.9 million users.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5713
Streaming:
Download:
 
Opening Keynote
Jen-Hsun Huang (NVIDIA)
Don't miss GTC's opening keynote address from NVIDIA CEO and co-founder Jen-Hsun Huang. He'll discuss the latest breakthroughs in visual computing, including how NVIDIA is fueling the revolution in deep learning. ...Read More
Don't miss GTC's opening keynote address from NVIDIA CEO and co-founder Jen-Hsun Huang. He'll discuss the latest breakthroughs in visual computing, including how NVIDIA is fueling the revolution in deep learning.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5715
Streaming:
Download:
 
DeepFont: Large-Scale Real-World Font Recognition from Images
Jianchao Yang (Adobe)
This works addresses the problem of recognizing font style of the text from an image. Our algorithm is based on a carefully designed deep convolutional neural network. Since collecting real-world training text images for font recognition is extremely ...Read More
This work addresses the problem of recognizing the font style of text in an image. Our algorithm is based on a carefully designed deep convolutional neural network. Since collecting real-world training text images for font recognition is extremely difficult, we have to resort to synthetic training data, which unfortunately has a large domain mismatch from the real-world test examples. Besides data augmentation techniques that add synthetic degradations, we also present a domain adaptation framework to bridge the gap between synthetic training and real-world testing. In particular, we introduce a convolutional neural network decomposition approach, based on stacked convolutional autoencoders, to obtain effective features for classification. Millions of images are used to train the model, which could not have been trained without the GPU and CUDA. The proposed DeepFont system achieves a top-5 accuracy of over 80% on a large labeled real-world test set we collected.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5720
Streaming:
Download:
 
Clarifai: Scaling Deep Learning
Matthew Zeiler (Clarifai)
The field of artificial intelligence and machine learning is in a period of rapid, revolutionary improvement. Across many application areas, deep learning is proving to outperform techniques that were considered state of the art only a few years ago. ...Read More
The field of artificial intelligence and machine learning is in a period of rapid, revolutionary improvement. Across many application areas, deep learning is proving to outperform techniques that were considered state of the art only a few years ago. Although the fundamental techniques were developed in the 1980s and 1990s, it was only recently that they were applied at large scale, due to the advent of general-purpose GPU computing and the availability of internet-scale datasets. The deep learning experts at Clarifai have spent years working alongside pioneers of the field and form a team who has vast experience developing new deep learning techniques and building state of the art systems that solve real problems. In this talk we will present some of the latest technologies we have developed and show how they can be applied to power a new generation of intelligent applications.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, Developer - Tools & Libraries, GTC 2015 - ID S5740
Streaming:
 
Real-Time, Content-Driven Representations at Twitter
Clement Farabet (Twitter)
Twitter is a unique source of real-time information, offering amazing opportunities for automatic content understanding. The format of this content is diverse (tweets, photos, videos, music, hyperlinks, follow graph, ...), the distribution of topics ...Read More
Twitter is a unique source of real-time information, offering amazing opportunities for automatic content understanding. The format of this content is diverse (tweets, photos, videos, music, hyperlinks, follow graph, ...), the distribution of topics ever-changing (on a weekly, daily, or sometimes hourly basis), and the volume ever-growing, making it very challenging to automatically and continuously expose relevant content. Manually defining features to represent this data is showing its limits. In this talk, I provide an overview of how automated, content-driven representations, enabled by modern deep-learning algorithms, enable us to build adaptive systems that capture the richness of this content. Specifically, the presentation focuses on deep representations for images and images+text.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, GTC 2015 - ID S5760
Streaming:
 
Recent Advances in Deep Learning at Microsoft: A Selected Overview
Li Deng (Microsoft)
Since 2009, Microsoft has engaged with academic pioneers of deep learning and has created industry-scale successes in speech recognition as well as in speech translation, object recognition, automatic image captioning, natural language, multimodal pr ...Read More
Since 2009, Microsoft has engaged with academic pioneers of deep learning and has created industry-scale successes in speech recognition as well as in speech translation, object recognition, automatic image captioning, natural language, multimodal processing, semantic modeling, web search, contextual entity search, ad selection, and big data analytics. Many of these successes are attributed to the availability of big datasets for training deep models, powerful general-purpose GPU computing, and innovations in deep learning architectures and algorithms. In this talk, a selected overview will be given to highlight our work in some of these exciting applications, as well as the lessons we have learned along the way as to what tasks are best solved by deep learning methods.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5788
Streaming:
 
Deep Convolutional Neural Network for Computer Vision Products
Li Xu (Sensetime Group Limited)
We have witnessed many ground-breaking results in computer vision research using deep learning techniques. In this talk, we introduce recent achievements in our group (http://sensetime.com/) which we believe will bridge the gap between research and p ...Read More
We have witnessed many ground-breaking results in computer vision research using deep learning techniques. In this talk, we introduce recent achievements in our group (http://sensetime.com/) which we believe will bridge the gap between research and product development and will bring about many computer-vision-enabled smart products. We show that our unified deep CNN framework, accelerated using modern GPU architecture, can be easily applied to various vision tasks including image processing, pedestrian detection, object localization and face recognition, meanwhile achieving state-of-the-art performance.  Back
 
Keywords:
Machine Learning & Deep Learning, Media & Entertainment, Video & Image Processing, GTC 2015 - ID S5800
Streaming:
Download:
 
Simplified Machine Learning Programming for CUDA (Presented by ArrayFire)
Umar Arshad (ArrayFire)
Looking for a simplified way to program machine learning algorithms? This tutorial will give you hands on experience implementing Deep Belief Networks using ArrayFire and other CUDA tools. Learn the best practices for implementing parallel versions o ...Read More
Looking for a simplified way to program machine learning algorithms? This tutorial will give you hands on experience implementing Deep Belief Networks using ArrayFire and other CUDA tools. Learn the best practices for implementing parallel versions of popular algorithms on GPUs. Instead of reinventing the wheel, you will learn where to find and how to use excellent versions of these algorithms already available in CUDA and ArrayFire libraries. You will walk away equipped with the best tools and knowledge for implementing accelerated machine learning algorithms.  Back
 
Keywords:
Machine Learning & Deep Learning, Developer - Algorithms, Developer - Tools & Libraries, GTC 2015 - ID S5803
Streaming:
Download:
 
Create Deep Intelligence in the Internet of Things (IoT) (Presented by Preferred Networks)
Nobuyuki Ota (Preferred Networks, Inc)
Preferred Networks, Inc (PFN) specialized in distributed machine learning technology, with a focus on Deep Learning, for the Internet of Things (IoT). In this session, we will first introduce PFN's goal - the realization of Distributed Deep Intellig ...Read More
Preferred Networks, Inc. (PFN) specializes in distributed machine learning technology, with a focus on deep learning, for the Internet of Things (IoT). In this session, we will first introduce PFN's goal: the realization of Distributed Deep Intelligence using GPU technology, that is, the synergistic implementation and integration of deep learning intelligence throughout IoT networks. We will show our current deep learning projects for IoT, including surveillance cameras, retail solutions, automotive, and bio/healthcare. In particular, we will discuss the development of distributed deep neural networks for drug discovery using the entire PubChem database via GPU technologies.  Back
 
Keywords:
Machine Learning & Deep Learning, Emerging Companies Summit, Life & Material Science, GTC 2015 - ID S5813
Streaming:
Download:
 
Large-Scale Deep Learning For Building Intelligent Computer Systems
Jeff Dean (Google)
Over the past few years, we have built large-scale computer systems for training neural networks, and then applied these systems to a wide variety of problems that have traditionally been very difficult for computers. We have made significant improve ...Read More
Over the past few years, we have built large-scale computer systems for training neural networks, and then applied these systems to a wide variety of problems that have traditionally been very difficult for computers. We have made significant improvements in the state-of-the-art in many of these areas, and our software systems and algorithms have been used by dozens of different groups at Google to train state-of-the-art models for speech recognition, image recognition, various visual detection tasks, language modeling, language translation, and many other tasks. In this talk, I'll highlight some of the distributed systems and algorithms that we use in order to train large models quickly. I'll then discuss ways in which we have applied this work to a variety of problems in Google's products, usually in close collaboration with other teams. This talk describes joint work with many people at Google.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5817
Streaming:
Download:
 
Deep Learning: What's Next
Andrew Ng (Baidu)
Deep Learning has transformed many important tasks, including speech and image recognition. Deep Learning systems scale well by absorbing huge amounts of data to create accurate models. The computational resources afforded by GPUs have been instrumen ...Read More
Deep Learning has transformed many important tasks, including speech and image recognition. Deep Learning systems scale well by absorbing huge amounts of data to create accurate models. The computational resources afforded by GPUs have been instrumental to this scaling. However, as Deep Learning has become more mainstream, it has generated some hype, and has been linked to everything from world peace to evil killer robots. In this talk, Dr. Ng will help separate hype from reality, and discuss potential ways that Deep Learning technologies can benefit society in the short and long term.  Back
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, GTC 2015 - ID S5818
Streaming:
Download:
 
Optimized GPU Kernels for Deep Learning
Amir Khosrowshahi (Nervana Systems)
Deep learning has recently achieved great success in domains such as images, speech, and text. These gains have been made possible by efficient GPU implementations such as cuDNN. We show optimizations at the assembly level that result in significant ...Read More
Deep learning has recently achieved great success in domains such as images, speech, and text. These gains have been made possible by efficient GPU implementations such as cuDNN. We show optimizations at the assembly level that result in significant performance improvements over existing methods. In particular, we show how operations such as convolutions and dense matrix multiply can be efficiently implemented using a custom assembler to attain state-of-the-art performance on the NVIDIA Maxwell GPU architecture. Additionally, we can significantly reduce memory bandwidth and run much larger models by using limited precision with a minimal tradeoff in model accuracy.  Back
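The limited-precision point above can be illustrated with a tiny NumPy experiment: store the operands in float16 (halving memory traffic per operand) while accumulating in float32, then inspect the resulting error. This shows only the trade-off in principle; the talk's assembly-level Maxwell kernels are not reproduced here.

import numpy as np

a32 = np.random.rand(256, 256).astype(np.float32)
b32 = np.random.rand(256, 256).astype(np.float32)

a16, b16 = a32.astype(np.float16), b32.astype(np.float16)    # half the bytes per operand
c_mixed = a16.astype(np.float32) @ b16.astype(np.float32)    # accumulate in float32
c_full  = a32 @ b32

rel_err = np.abs(c_mixed - c_full).max() / np.abs(c_full).max()
print(f"bytes per operand: {a16.nbytes} vs {a32.nbytes}, max relative error ~ {rel_err:.1e}")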
 
Keywords:
Machine Learning & Deep Learning, Developer - Performance Optimization, Computer Vision & Machine Vision, GTC 2015 - ID S5873
Streaming:
Download:
 
Featured Talk: Getting Started with DIGITS: Deep GPU Training System
Allison Gray (NVIDIA)
DIGITS provides a user-friendly interface for training and classification that can be used to train DNNs with a few clicks. DIGITS gives users easy access to existing databases and previously trained network models, as well as training activities in ...Read More
DIGITS provides a user-friendly interface for training and classification that can be used to train DNNs with a few clicks. DIGITS gives users easy access to existing databases and previously trained network models, as well as training activities in progress. Modifying your network configuration to maximize accuracy is easily accomplished with this platform too. The network configuration process is intuitive, making it easy for experienced DL experts to use and researchers just getting started. The main console helps users keep track of their changes. This tool runs as a web application making it easy to share results and collaborate. The workflow for using DIGITS will be presented and discussed in this session.  Back
 
Keywords:
Machine Learning & Deep Learning, GTC 2015 - ID S5924
Streaming:
Manufacturing
Presentation
Media
Performance Gains Achieved Through Modern OpenGL in the Siemens' DirectModel Rendering Engine
Jeremy Bennett (Siemens PLM Software), Michael Carter (Siemens PLM Software)
Advances in GPU Technology have opened the door for significant performance gains for applications willing to use the modern OpenGL APIs. This talk will provide details of how the Direct Model Scene Graph and Rendering Engine has adapted its renderin ...Read More
Advances in GPU technology have opened the door to significant performance gains for applications willing to use the modern OpenGL APIs. This talk will provide details of how the DirectModel scene graph and rendering engine has adapted its rendering architecture to handle not only today's advances but tomorrow's, and how the use of these technologies has significantly increased rendering performance.  Back
 
Keywords:
Manufacturing, Developer - Performance Optimization, Rendering & Ray Tracing, Real-Time Graphics, GTC 2015 - ID S5387
Streaming:
Download:
 
GPUs to Mars: Full-Scale Simulation of SpaceX's Mars Rocket Engine
Stephen Jones (SpaceX), Adam Lichtl (SpaceX)
SpaceX is designing a new, methane-fueled engine powerful enough to lift the equipment and personnel needed to colonize Mars. A vital aspect of this effort involves the creation of a multi-physics code to accurately model a running rocket engine. The ...Read More
SpaceX is designing a new, methane-fueled engine powerful enough to lift the equipment and personnel needed to colonize Mars. A vital aspect of this effort involves the creation of a multi-physics code to accurately model a running rocket engine. The scale and complexity of turbulent non-premixed combustion has so far made it impractical to simulate, even on today's largest supercomputers. We present a novel approach using wavelets on GPUs, capable of capturing physics down to the finest turbulent scales.  Back
 
Keywords:
Manufacturing, Developer - Algorithms, Computational Physics, Supercomputing, GTC 2015 - ID S5398
Streaming:
Download:
 
Simulation-Based CGI for Automotive Applications
Benoit Deschamps (PSA Peugeot Citroën)
To reduce the gap between a physical mock-up and a virtual mock-up, a combination of real-time rendering and simulation enable better decision making. Leveraging NVIDIA Optix to develop specific automotive tools, we are able to run simulations and vi ...Read More
To reduce the gap between a physical mock-up and a virtual mock-up, a combination of real-time rendering and simulation enable better decision making. Leveraging NVIDIA Optix to develop specific automotive tools, we are able to run simulations and visualize solutions to a wide range of problems, such as what is the best vehicle geometry to minimize gravel impact on the door. In addition, tools such as RTT DeltaGen enable photo real results that help us experiment and visualize changing vehicle designs; for example when changing the slope of the windshield, how are elements inside the car affected due to the reflective properties of glass.  Back
 
Keywords:
Manufacturing, Automotive, Rendering & Ray Tracing, GTC 2015 - ID S5628
Streaming:
Download:
 
GPU-Accelerated Finite Element Analysis and Design Optimization on the Cloud
Krishnan Suresh (University of Wisconsin, Madison)
The audience will learn how GPUs can accelerate cloud-based finite element analysis and design optimization. The computational challenges underlying such tasks will be discussed, followed by their solution through fast GPU linear solvers. A case-stud ...Read More
The audience will learn how GPUs can accelerate cloud-based finite element analysis and design optimization. The computational challenges underlying such tasks will be discussed, followed by their solution through fast GPU linear solvers. A case-study involving integration of massively parallel GPU computing with modern browser technology will demonstrate and identify new frontiers in engineering.  Back
 
Keywords:
Manufacturing, Data Center, Cloud Computing & HPC, Product Design & Styling, Computational Physics, GTC 2015 - ID S5330
Streaming:
 
Revolutionize Your Modeling and Design Workflow with CATIA Live Rendering, Iray and NVIDIA VCA
Pierre Maheut (Dassault Systèmes), Xavier Melkonian (Dassault Systèmes)
Using a concrete example with an actual CAD model running in CATIA, CATIA Live Rendering break down the frontier between industrial modeling and realistic rendering for design. Powered by Iray and coupled with NVIDIA VCA, it ensures real-time photo r ...Read More
Using a concrete example with an actual CAD model running in CATIA, CATIA Live Rendering breaks down the barrier between industrial modeling and realistic rendering for design. Powered by Iray and coupled with the NVIDIA VCA, it delivers real-time photorealistic rendering and unprecedented batch-rendering speed for all of your marketing assets. Follow a live creation workflow from ideation to marketing assets using the 3DEXPERIENCE platform.  Back
 
Keywords:
Manufacturing, Product Design & Styling, Visualization - Large Scale & Multi-Display, Rendering & Ray Tracing, GTC 2015 - ID S5541
Streaming:
Download:
 
Accelerating Mountain Bike Development with Optimized Design Visualization
Geoff Casey (Santa Cruz Bicycles)
Santa Cruz Bicycles is an industry leading manufacturer of high-end, high-performance mountain bikes. Join Product Design Manager, Geoff Casey as he demonstrates his team's approach to creating bikes that are at the forefront of engineering. With co ...Read More
Santa Cruz Bicycles is an industry leading manufacturer of high-end, high-performance mountain bikes. Join Product Design Manager, Geoff Casey as he demonstrates his team's approach to creating bikes that are at the forefront of engineering. With color and graphic design such a critical aspect of bike design, the company leverages visual computing tools to gain an advantage in a highly competitive industry. Harnessing the power of the GPU in conjunction with Bunkspeed's 3D visualization software, Santa Cruz's design team rapidly realizes their vision in real time, making on the fly design decisions that cut both time and cost out of the product development lifecycle.  Back
 
Keywords:
Manufacturing, Product Design & Styling, Rendering & Ray Tracing, GTC 2015 - ID S5659
Streaming:
Download:
 
WebGL Visualization Tools and GPUs for Marketing of Robotics and Automation Products
Steve Rueckhaus (Yaskawa America, Inc. Motoman Robotics Division)
Yaskawa Motoman successfully improved the speed and quality of rendering processes to promote its latest robotic and automation solutions by leveraging the strengths of WebGL visualization applications (CL3VER) and NVIDIA's Quadro GPU technology. Ga ...Read More
Yaskawa Motoman successfully improved the speed and quality of rendering processes to promote its latest robotic and automation solutions by leveraging the strengths of WebGL visualization applications (CL3VER) and NVIDIA's Quadro GPU technology. Gain insight into how Yaskawa's Sales & Marketing Group provides interactive 3D marketing experiences to enhance the promotion of next generation robotic solutions.  Back
 
Keywords:
Manufacturing, Product Design & Styling, GTC 2015 - ID S5673
Streaming:
Download:
 
NVIDIA GRID at PSA Peugeot Citroen: The Year in Review
Alain Gonzalez (PSA PEUGEOT CITROËN)
After a year of 500 users working with NVIDIA GRID in a virtualized CAD environment at PSA Peugeot Citroen, we will present the who, what, where, why, and how the PSA IT department enables CAD workstations end users to work almost anywhere. Learn how ...Read More
After a year of 500 users working with NVIDIA GRID in a virtualized CAD environment at PSA Peugeot Citroen, we will present the who, what, where, why, and how of the PSA IT department enabling CAD workstation end users to work almost anywhere. Learn how virtualization helps us to handle our business challenges and the benefits and improvements virtualization brought to our business.  Back
 
Keywords:
Manufacturing, Graphics Virtualization, Product Design & Styling, Automotive, GTC 2015 - ID S5625
Streaming:
Media & Entertainment
Presentation
Media
Impressions: The Global Impact of Culture, Imagery and Visual Communication
Don Levy (Smith Brook Farm)
We are what we see. The question is how does what we see influence our lives and the lives of future generations? We live in a visual world. This has brought us closer together and enabled people everywhere to share everything from the latest pop cul ...Read More
We are what we see. The question is how what we see influences our lives and the lives of future generations. We live in a visual world. This has brought us closer together and enabled people everywhere to share everything from the latest pop culture phenomenon to the most catastrophic news. Infographics and animation explain every subject. From an early age, I've appreciated the power of images to move people. Today, the line between fact and fiction is virtually gone. Many of the images that impressed me in my most formative years were of dreams, hope, and aspiration. Others made me think. With a curiosity born of my Hollywood experience in the dream factory, and thinking back on how the pictures of my own youth continue to influence me, I'll share with you some thoughts and ideas.  Back
 
Keywords:
Media & Entertainment, Augmented Reality & Virtual Reality, Developer - Performance Optimization, Computer Vision & Machine Vision, GTC 2015 - ID S5118
Streaming:
Download:
 
GPU-Accelerated Undecimated Wavelet Transform for Film and Video Denoising
Hermann Fuerntratt (Joanneum Research)
The Undecimated Wavelet transform (UWT) is a valuable tool for all kinds of image and video enhancement tasks such as denoising, deconvolution and superresolution. Due to its translation invariance, it provides superior results when compared with the ...Read More
The Undecimated Wavelet transform (UWT) is a valuable tool for all kinds of image and video enhancement tasks such as denoising, deconvolution and superresolution. Due to its translation invariance, it provides superior results when compared with the classical discrete wavelet transform, but at the cost of a significantly higher computational complexity. In this session, we will present a highly efficient GPU implementation of the UWT for 16-bit or 32-bit floating point images, based on modern GPU implementation strategies like register blocking and the computation of multiple outputs per thread. Furthermore, we will show how the UWT is used within a novel film and video denoising algorithm which is able to deal with very different kinds of noise like film grain and digital sensor noise.  Back
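As an illustration of the "multiple outputs per thread" strategy mentioned in the abstract, here is a minimal CUDA sketch of one à trous (undecimated) row-convolution pass, the basic building block of a UWT. The kernel name, the 5-tap B3-spline filter, and the two-outputs-per-thread launch shape are assumptions for the example, not the presenters' implementation.

```cuda
// Illustrative sketch only: one "a trous" (undecimated) row pass of a UWT,
// with each thread producing two adjacent outputs ("multiple outputs per thread").
__constant__ float c_taps[5] = { 1.f/16, 4.f/16, 6.f/16, 4.f/16, 1.f/16 };  // B3-spline filter (assumed)

__global__ void atrous_row_pass(const float* __restrict__ in, float* __restrict__ out,
                                int width, int height, int dilation /* = 1 << level */)
{
    int y  = blockIdx.y * blockDim.y + threadIdx.y;
    int x0 = 2 * (blockIdx.x * blockDim.x + threadIdx.x);   // two output pixels per thread
    if (y >= height) return;

    const float* row = in + y * width;
    #pragma unroll
    for (int k = 0; k < 2; ++k) {
        int x = x0 + k;
        if (x >= width) return;
        float acc = 0.f;
        #pragma unroll
        for (int t = -2; t <= 2; ++t) {                      // 5 taps with holes of size 'dilation'
            int xs = min(max(x + t * dilation, 0), width - 1);  // clamp at the image borders
            acc += c_taps[t + 2] * row[xs];
        }
        out[y * width + x] = acc;
    }
}
```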
 
Keywords:
Media & Entertainment, Video & Image Processing, GTC 2015 - ID S5152
Streaming:
Download:
 
So You Want to Create the Holodeck? A Closer Look at OTOY's Lightfield Technology
Jules Urbach (OTOY Inc.)
Attendees will learn about OTOY's light field rendering technology which allows for immersive experiences on mobile HMDs and next gen displays. OTOY is actively developing a groundbreaking light field rendering pipeline, including the world's first ...Read More
Attendees will learn about OTOY's light field rendering technology which allows for immersive experiences on mobile HMDs and next gen displays. OTOY is actively developing a groundbreaking light field rendering pipeline, including the world's first portable 360 LightStage capture system and a cloud-based graphics platform for creating and streaming light field media for virtual reality and emerging holographic displays. OTOY's breakthroughs in compression and rendering on NVIDIA GPUs have dramatically reduced the barriers for light field video streaming, making it a viable media format that gives content creators everywhere a simple, cost-effective way to bring high quality, interactive 3D content to multiple platforms for the world to enjoy.  Back
 
Keywords:
Media & Entertainment, Augmented Reality & Virtual Reality, Rendering & Ray Tracing, Real-Time Graphics, GTC 2015 - ID S5168
Streaming:
 
Real-Time Camera Tracking in the "1st & 10" System
Louis Gentry (Sportvision), Rand Pendleton (Sportvision)
Sportvision's "1st & 10" real-time system for displaying graphics during American football games has traditionally relied on hardware to calibrate and compute camera parameters necessary for inserting the "yellow line" and ot ...Read More
Sportvision's "1st & 10" real-time system for displaying graphics during American football games has traditionally relied on hardware to calibrate and compute camera parameters necessary for inserting the "yellow line" and other effects into the scene. The hardware solution is limited to lock-down, broadcast cameras only. The vast compute power available in GPUs today provided a means for expanding the system to support both lock-down and mobile cameras without the need for hardware sensors. In this presentation, we will discuss how the optical camera tracking system works and its use on live NFL broadcasts.  Back
 
Keywords:
Media & Entertainment, Augmented Reality & Virtual Reality, Real-Time Graphics, Video & Image Processing, GTC 2015 - ID S5187
Streaming:
Download:
 
FurryBall RT: New OptiX Core and 30x Speed Up
Jan Tománek (AAA studio)
Jan will present a completely rewritten FurryBall, a real-time, production-quality, GPU-accelerated renderer using CUDA and OptiX. Now called FurryBall RT, its performance and viewport interactivity have improved 10-30X compared to the earlier DX-bas ...Read More
Jan will present a completely rewritten FurryBall, a real-time, production-quality, GPU-accelerated renderer using CUDA and OptiX. Now called FurryBall RT, its performance and viewport interactivity have improved 10-30X compared to the earlier DX-based version. FurryBall's power was proven in rendering a complete, full-length animated 3D stereo movie for cinemas on NVIDIA GPUs.  Back
 
Keywords:
Media & Entertainment, Rendering & Ray Tracing, GTC 2015 - ID S5188
Streaming:
Download:
 
Get into VR with 360 Video
Nicolas Burtey (VideoStitch)
Both Facebook and Hollywood view VR as a new medium, not only for computer-generated images but also for video. VideoStitch has developed 360-degree video stitching software that combines multiple HD video streams in real time using CUDA and NVIDIA ...Read More
Both Facebook and Hollywood view VR as a new medium, not only for computer-generated images but also for video. VideoStitch has developed 360-degree video stitching software that combines multiple HD video streams in real time using CUDA and NVIDIA GPUs. Camera manufacturers, the defense industry and movie production companies are among initial customers. This talk gives an overview of the state of the art for creating 360-degree video, including the challenges of making multi-sensor cameras and combining 6-12 HD video streams for up to 8K video in real time with multiple GPUs.  Back
 
Keywords:
Media & Entertainment, Augmented Reality & Virtual Reality, Video & Image Processing, GTC 2015 - ID S5261
Streaming:
Download:
 
GPU Accelerated Video Frame Search on Video Streams
Halil Enver Soylu (Erlab Software)
In this session, attendees will learn how Erlab uses GPU processing for real-time analysis of broadcast video for an image search and automatic ad insertion system for catch-up TV. In existing conventional catch-up TV systems, operators watch tens of ...Read More
In this session, attendees will learn how Erlab uses GPU processing for real-time analysis of broadcast video for an image search and automatic ad insertion system for catch-up TV. In existing conventional catch-up TV systems, operators watch tens of channels to flag the first and last frames of programs to extract the program from the streams. It is a slow and costly operation. Our GPU-accelerated video frame catcher application extracts program contents from real-time streams automatically. The application compares program feeds against reference frames from the beginning and ending credits of each program and uses matches to signal the start and end of each program. The solution also opens new analysis opportunities for the advertising business, such as automated ad insertion.  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics, Video & Image Processing, GTC 2015 - ID S5274
Streaming:
 
True CMYK and N-Channel Rendering on GPU
Nathan Carr (Adobe Systems)
In this talk, we present our orchestration of the OpenGL pipeline in Adobe Illustrator for rendering vector graphics in true CMYK color space and the more general N-Channel color space. We use multiple color attachments on frame buffers to accommodat ...Read More
In this talk, we present our orchestration of the OpenGL pipeline in Adobe Illustrator for rendering vector graphics in true CMYK color space and the more general N-Channel color space. We use multiple color attachments on frame buffers to accommodate 4+ color channels. To utilize the blend hardware, the alpha channel requires special treatment and replication in all output attachments of the frame buffer. Based upon our experience, we recommend a few tweaks to the OpenGL pipeline to speed up 4+ color channel support.  Back
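As a rough sketch of the multiple-render-target idea described above (not Adobe's code), the following C++/OpenGL fragment shows two RGBA color attachments carrying 4+ channels, with alpha replicated into each attachment so that the fixed-function blend hardware composites every attachment consistently. The helper name, the channel packing, and the use of a GLEW-style loader with a current GL context are assumptions.

```cpp
// Hypothetical sketch: spread an N-channel color across two RGBA attachments,
// replicating alpha into each attachment so one blend state works for all of them.
#include <GL/glew.h>

GLuint makeNChannelTarget(int w, int h, GLuint tex[2])
{
    GLuint fbo;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glGenTextures(2, tex);
    for (int i = 0; i < 2; ++i) {
        glBindTexture(GL_TEXTURE_2D, tex[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, tex[i], 0);
    }
    const GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
    glDrawBuffers(2, bufs);                          // route fragment-shader outputs 0 and 1
    // The same blend state applies to every attachment, which is why each must carry alpha.
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);     // premultiplied "over"
    return fbo;
}

// Matching fragment-shader outputs (GLSL), alpha replicated per attachment:
//   layout(location = 0) out vec4 cmy_a;   // C, M, Y, alpha
//   layout(location = 1) out vec4 kspot_a; // K, spot1, spot2, alpha (replicated)
```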
 
Keywords:
Media & Entertainment, GTC 2015 - ID S5285
Streaming:
 
Practical Real-Time Video Rendering with Modern OpenGL and GStreamer
Heinrich Fink (ToolsOnAir Broadcast Engineering GmbH)
Learn about using OpenGL and GStreamer for advanced video rendering on the GPU. We will present two R&D projects at ToolsOnAir: (1) "gl-frame-bender", our open-source OpenGL benchmarking tool that we use to investigate advanced OpenGL m ...Read More
Learn about using OpenGL and GStreamer for advanced video rendering on the GPU. We will present two R&D projects at ToolsOnAir: (1) "gl-frame-bender", our open-source OpenGL benchmarking tool that we use to investigate advanced OpenGL methods for video rendering and (2) how we use and extend GStreamer to implement a live video mixing engine that is completely processed by graphics hardware. We will show practical examples of modern OpenGL techniques that we found to be most effective when rendering video. We will talk about our contribution to GStreamer's support for hardware codecs and OpenGL, and how it helps us to implement a flexible high-performance video mixing pipeline.  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics, Video & Image Processing, GTC 2015 - ID S5314
Streaming:
Download:
 
GPU Powered VDI Regenerates the Creative Capability of Dr. Who VFX Studio
Barry Daniels (Exponential-e)
Discover how to set creativity free from infrastructure and location restrictions, with applications such as Maya and NUKE running on virtual machines in the cloud, powered by NVIDIA GRID technology. The live demonstration will reveal the power of l ...Read More
Discover how to set creativity free from infrastructure and location restrictions, with applications such as Maya and NUKE running on virtual machines in the cloud, powered by NVIDIA GRID technology. The live demonstration will reveal the power of a low-latency network over the public internet, with access to files anywhere in the world. We'll seamlessly access power-hungry graphics files stored in a London VFX studio, demonstrating a solution that has the flexibility and performance of a local desktop. At the end of the session, you will understand the value of no longer being restricted to desk-bound workstations and be confident that the security and privacy of your creative files remain in your hands.  Back
 
Keywords:
Media & Entertainment, Graphics Virtualization, GTC 2015 - ID S5339
Streaming:
Download:
 
Shadertoy: From the Web to Virtual Reality
Pol Jeremias (Beautypi), Inigo Quilez (Beautypi)
What is beyond the web? In this talk the Shadertoy.com creators will cover how Shadertoy has changed and evolved over the years. The website started as a small community to create and share procedural shaders, growing to host more than 4000 creations ...Read More
What is beyond the web? In this talk the Shadertoy.com creators will cover how Shadertoy has changed and evolved over the years. The website started as a small community for creating and sharing procedural shaders, grew to host more than 4000 creations, evolved over time into a playground for creating sound on the GPU, and finally transitioned to virtual reality. Join the Shadertoy team for 25 minutes of live coding, GPU-generated music, procedural visuals, virtual reality and Shadertoy, lots of Shadertoy.  Back
 
Keywords:
Media & Entertainment, Augmented Reality & Virtual Reality, Real-Time Graphics, GTC 2015 - ID S5347
Streaming:
 
CTB Directional Gradient Detection Using 2D-DWT for Intra-Frame Prediction in HEVC
Maria Pantoja (Santa Clara University), Damian Ruiz Coll (Universidad Politecnica Valencia)
HEVC has 35 different intra prediction modes. The purpose of the project is to detect the dominant edge of the Prediction Blocks (PB). HEVC needs two arrays of neighbouring (up and left of the block) pixels of each available PB size to compute the pr ...Read More
HEVC has 35 different intra prediction modes. The purpose of the project is to detect the dominant edge of the Prediction Blocks (PB). HEVC needs two arrays of neighbouring (up and left of the block) pixels of each available PB size to compute the predictor. These inter-PB dependencies force the search for the optimal directional prediction in the HEVC reference software to be sequential. We propose a parallel algorithm that uses wavelets to estimate, for each Prediction Unit (PU), the directional modes with the highest probability of being optimal, reducing the pool of candidate directional modes to just 3 to 5. The 2D-DWT will be applied only at the CTB (64x64) level, and we will test different edge extensions (zero, mirror, etc.) and wavelet filters.  Back
 
Keywords:
Media & Entertainment, Video & Image Processing, GTC 2015 - ID S5365
Streaming:
Download:
 
Interactive Modelling and Rendering of Clouds
Jesper Mosegaard (Alexandra Instituttet)
In this presentation we explain how five small Danish animation/vfx companies and a Danish research institute worked together with the vision of increasing productivity and visual quality in clouds for small creative companies - these effects typical ...Read More
In this presentation we explain how five small Danish animation/vfx companies and a Danish research institute worked together with the vision of increasing productivity and visual quality in clouds for small creative companies - these effects typically require a lot of waiting time for simulation or rendering. Our solution is fully interactive through utilization of the GPU. The graphics artist can manipulate mesh geometry and will get interactive updates in final rendering quality of clouds with wispy features and multi-scatter light. We will explain how we carefully selected and implemented GPU algorithms going from mesh to voxel fields with wispy cloud appearances. We will also argue for an industry that needs more interactive tools to truly take advantage of the creative process.  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics, GTC 2015 - ID S5430
Streaming:
Download:
 
GPU Power through Javascript for Anyone with Universe 2.0 SDK
Sean Safreed (Red Giant)
Red Giant Universe is a set of tools for creating visual effects across a wide range of popular DCC apps. It is now accessible by artists with basic Javascript programming skills. The system enables users to create in minutes or hours what used to ta ...Read More
Red Giant Universe is a set of tools for creating visual effects across a wide range of popular DCC apps. It is now accessible by artists with basic Javascript programming skills. The system enables users to create in minutes or hours what used to take days or weeks to write in a mainstream computer language. This session follows on from the introductory session in 2014, with expanded coverage of the SDK, Javascript examples, and new additions to the system for real-time vector rendering and photo-based rendering, all running in real time on the GPU.  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics, Video & Image Processing, GTC 2015 - ID S5483
Streaming:
 
The Fabric Engine DFG: GPU Visual Programming for Visual Effects
Peter Zion (Fabric Software Inc.)
In this session you will learn how the Fabric Engine Data-Flow Graph (DFG) provides an easy-to-use but powerful node-based visual programming interface that can be used for programming CUDA GPUs. Fabric Engine is a platform for the creation of eff ...Read More
In this session you will learn how the Fabric Engine Data-Flow Graph (DFG) provides an easy-to-use but powerful node-based visual programming interface that can be used for programming CUDA GPUs. Fabric Engine is a platform for the creation of effects, simulations and tools for the media and entertainment industry, where visual programming is a popular development paradigm. Fabric Engine can be used standalone as well as integrated into popular off-the-shelf applications such as Maya, 3D Studio Max, and Softimage from Autodesk and Nuke from The Foundry. It can also be used for general CPU and GPU programming, providing a visual programming environment for general-purpose GPU development.  Back
 
Keywords:
Media & Entertainment, Developer - Programming Languages, Real-Time Graphics, GTC 2015 - ID S5553
Streaming:
 
Using OpenCL for Performance-Portable, Hardware-Agnostic, Cross-Platform Video Processing
Dennis Adams (Sony Creative Software Inc.)
This talk will discuss how Sony Creative Software used OpenCL to build a 4K video pipeline in Vegas Pro and the new Catalyst Prepare applications. It will cover the design as well as the promises and pitfalls of writing over 100 OpenCL kernels for al ...Read More
This talk will discuss how Sony Creative Software used OpenCL to build a 4K video pipeline in Vegas Pro and the new Catalyst Prepare applications. It will cover the design as well as the promises and pitfalls of writing over 100 OpenCL kernels for all aspects of video processing from color management to plug-in video effects.  Back
 
Keywords:
Media & Entertainment, Video & Image Processing, GTC 2015 - ID S5592
Streaming:
Download:
 
JPEG2000 on GPU: A Fast 4K Video Mastering, Archiving, and Contribution
Jiri Matela (Comprimato)
JPEG2000 is state-of-the-art video compression adopted by all digital cinemas. Besides that, it has become the format of choice for long-term archiving, mainly because it significantly saves disk space, provides superior image quality, and allows fo ...Read More
JPEG2000 is state-of-the-art video compression adopted by all digital cinemas. Besides that, it has become the format of choice for long-term archiving, mainly because it significantly saves disk space, provides superior image quality, and allows for mathematically lossless compression. The recent development in standardization of master video formats (IMF) makes JPEG2000 the emerging video compression for 4K delivery, and because of its very high image quality it is being used for broadcast contribution as well. The talk will cover various applications of JPEG2000 in digital video production workflows and explain how NVIDIA GPUs enable such workflows with speed sufficient for 4K video processing.  Back
 
Keywords:
Media & Entertainment, Defense, Medical Imaging, Video & Image Processing, GTC 2015 - ID S5602
Streaming:
 
Advancements in V-Ray RT GPU
Vladimir Koylazov (Chaos Software), Blagovest Taskov (Chaos Software Ltd.)
This talk discusses recent advancements in V-Ray RT GPU towards a fully-featured production renderer. Covered topics include implementations on the GPU for hair raytracing, sub-surface scattering, out-of-core texture paging, displacement and othe ...Read More
This talk discusses recent advancements in V-Ray RT GPU towards a fully-featured production renderer. Covered topics include implementations on the GPU for hair raytracing, sub-surface scattering, out-of-core texture paging, displacement and others.  Back
 
Keywords:
Media & Entertainment, Rendering & Ray Tracing, GTC 2015 - ID S5608
Streaming:
Download:
 
TurbulenceFD 2: Distributed Sparse Grid Fluid Simulation and Rendering for VFX
Jascha Wetzel (Jawset Visual Computing)
Voxel-based fluid simulation for VFX is currently mostly done on dense grids, but their poor spatial adaptivity restricts scalability. On GPUs, scalability is further restricted by the limited GPU memory. This talk gives an overview of the architectu ...Read More
Voxel-based fluid simulation for VFX is currently mostly done on dense grids, but their poor spatial adaptivity restricts scalability. On GPUs, scalability is further restricted by the limited GPU memory. This talk gives an overview of the architecture of TurbulenceFD 2 (TFD2) and how it handles fluid simulation and rendering. TFD2 implements a distributed sparse grid simulation and rendering framework that is highly adaptive and combines the memory and compute power of multiple GPUs and CPUs.  Back
 
Keywords:
Media & Entertainment, GTC 2015 - ID S5611
Streaming:
 
High-Performance Video Encoding Using NVIDIA GPUs
Abhijit Patait (NVIDIA)
This session is intended to provide a broad overview of the video encoding capabilities of current and future versions of NVIDIA's NVENC, a hardware accelerated encoder that ships with NVIDIA GPUs. We will provide an overview of the hardware capabil ...Read More
This session is intended to provide a broad overview of the video encoding capabilities of current and future versions of NVIDIA's NVENC, a hardware accelerated encoder that ships with NVIDIA GPUs. We will provide an overview of the hardware capabilities and software APIs used for video encoding, with an overview of recent improvements in features, performance and quality. We will also provide a quick overview of how NVIDIA video encoding can be used in applications such as transcoding, video streaming, and GPU virtualization.  Back
 
Keywords:
Media & Entertainment, Developer - Tools & Libraries, Video & Image Processing, GTC 2015 - ID S5613
Streaming:
Download:
 
GPU Computing: A VFX Plug-In Developer's Perspective
Stephen Bash (GenArts Inc.)
Making GPU plug-ins is hard! This talk is a very personal view on why, how CUDA helps, where it hurts, some of the emerging challenges, and what makes using CUDA for image-processing visual effects so worthwhile. I'll talk about multi-GPU, the chall ...Read More
Making GPU plug-ins is hard! This talk is a very personal view on why, how CUDA helps, where it hurts, some of the emerging challenges, and what makes using CUDA for image-processing visual effects so worthwhile. I'll talk about multi-GPU, the challenges of mixed languages, multiple OSes and supporting lots of hosts. We'll get into some technical details such as APIs and libraries, but it will be easily understood by anyone.  Back
 
Keywords:
Media & Entertainment, GTC 2015 - ID S5618
Streaming:
Download:
 
BLINK: A GPU-Enabled Image Processing Framework
Mark Davey (The Foundry)
We present BLINK, a language and framework for developing image processing algorithms across a range of computation devices. BLINK-based algorithms are automatically translated to optimised code for both GPUs and CPUs. This "write-once" app ...Read More
We present BLINK, a language and framework for developing image processing algorithms across a range of computation devices. BLINK-based algorithms are automatically translated to optimised code for both GPUs and CPUs. This "write-once" approach enables us to target both existing and new GPU hardware with minimal extra effort. Many algorithms produce visibly different results if mathematical operations are allowed to differ across platforms. Therefore BLINK has been designed to ensure numerically identical results between NVIDIA GPUs and CPUs. BLINK is at the heart of a number of key Foundry plug-ins and applications. An overview of this work and performance profiles will be presented, highlighting the speed gains achieved by using NVIDIA GPUs.  Back
 
Keywords:
Media & Entertainment, Video & Image Processing, GTC 2015 - ID S5619
Streaming:
Download:
 
GPU-Accelerated Image Processing for Modern Moving Images: Tachyon Wormhole
Lance Maurer (Cinnafilm, Inc.)
Cinnafilm CEO and founder Lance Maurer will discuss Tachyon Wormhole, a scalable, real-time, GPU-accelerated tool for lengthening or shortening video by precise amounts, avoiding the need for added editorial. This permits creating new commercial brea ...Read More
Cinnafilm CEO and founder Lance Maurer will discuss Tachyon Wormhole, a scalable, real-time, GPU-accelerated tool for lengthening or shortening video by precise amounts, avoiding the need for added editorial. This permits creating new commercial breaks and revenue opportunities. Processing is performed simultaneously on video, audio and captions, and the system also offers professional transcoding, motion-compensated frame-rate conversion, and unlimited format conversions. Wormhole is a software engineering marvel, receiving both the "Best of Show" award at NAB 2014 and the prestigious HPA Engineering Excellence Award for 2014. Wormhole is a joint project between Cinnafilm and Wohler Technologies.  Back
 
Keywords:
Media & Entertainment, Real-Time Graphics, Video & Image Processing, GTC 2015 - ID S5624
Streaming:
Download:
 
Designing Studio Infrastructure for Uncompressed 6K Workflow: Using Adobe Premiere for House of Cards and Gone Girl
Jeff Brue (Open Drives)
Jeff Brue, CTO of Open Drives and Post Production Engineer for House of Cards and Fox's upcoming Gone Girl, will discuss the infrastructure challenges and solutions for working in 6K. The talk will cover system requirements and the unique scenarios ...Read More
Jeff Brue, CTO of Open Drives and Post Production Engineer for House of Cards and Fox's upcoming Gone Girl, will discuss the infrastructure challenges and solutions for working in 6K. The talk will cover system requirements and the unique scenarios that arise when visual effects are integrated in a deep seamless manner into editorial through Adobe Premiere. Jeff will also discuss designing large scale editorial and VFX deployments with HP and NVIDIA.  Back
 
Keywords:
Media & Entertainment, GTC 2015 - ID S5636
Streaming:
Download:
 
Creating High-Dynamic-Range Content for Dolby Vision
Thad Beier (Dolby Laboratories)
Thad Beier will present Dolby's high-dynamic range, wide color gamut system called "Dolby Vision", describing the motivation behind its development and the positive, visceral reaction that content producers and viewers alike have on first ...Read More
Thad Beier will present Dolby's high-dynamic range, wide color gamut system called "Dolby Vision", describing the motivation behind its development and the positive, visceral reaction that content producers and viewers alike have on first seeing content created and viewed in this radically wider image space. He will discuss how NVIDIA's GPU technology is integral to every step of the production process, from off-line computation to real-time image processing.  Back
 
Keywords:
Media & Entertainment, GTC 2015 - ID S5639
Streaming:
 
Delta Mush: Smoothing Deformations While Preserving Detail for VFX and Game Characters
Joe Mancewicz (Rhythm & Hues Studios)
Peek under the hood of Rhythm & Hues Studio's powerhouse cleanup deformer, the Delta Mush. Delta Mush is a Voodoo deformer, which smooths arbitrary deformation of a polygonal mesh without smoothing the original detail of the model. Delta Mush do ...Read More
Peek under the hood of Rhythm & Hues Studios' powerhouse cleanup deformer, the Delta Mush. Delta Mush is a Voodoo deformer which smooths arbitrary deformation of a polygonal mesh without smoothing the original detail of the model. Delta Mush does not require meticulous up-front tuning; it easily accommodates model and rig changes, and it has proven to be versatile far beyond cleanup. It has been used in all character rigs at R&H since it was developed in 2010.  Back
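For readers unfamiliar with the technique, a much-simplified sketch of the published Delta Mush idea follows: smooth the rest pose, remember per-vertex deltas, then smooth the deformed pose and add the deltas back. This is not R&H's Voodoo implementation; for brevity the deltas here live in world space, whereas the production technique stores them in per-vertex local frames so they survive rotation.

```cpp
// Simplified Delta Mush sketch: uniform Laplacian smoothing plus stored detail deltas.
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

using Mesh = std::vector<Vec3>;                       // vertex positions
using Adjacency = std::vector<std::vector<int>>;      // one neighbor list per vertex

// Uniform Laplacian smoothing: each vertex moves to the average of its neighbors.
Mesh smooth(const Mesh& p, const Adjacency& adj, int iterations)
{
    Mesh cur = p;
    for (int it = 0; it < iterations; ++it) {
        Mesh next = cur;
        for (size_t v = 0; v < cur.size(); ++v) {
            if (adj[v].empty()) continue;
            Vec3 avg{0, 0, 0};
            for (int n : adj[v]) avg = avg + cur[n];
            next[v] = avg * (1.0f / adj[v].size());
        }
        cur = next;
    }
    return cur;
}

// Smooth both poses, then re-add the rest-pose detail delta on top of the smoothed deformation.
Mesh deltaMush(const Mesh& rest, const Mesh& deformed, const Adjacency& adj, int iters = 10)
{
    Mesh restSmooth = smooth(rest, adj, iters);
    Mesh defSmooth  = smooth(deformed, adj, iters);
    Mesh out(deformed.size());
    for (size_t v = 0; v < deformed.size(); ++v)
        out[v] = defSmooth[v] + (rest[v] - restSmooth[v]);   // world-space delta (simplification)
    return out;
}
```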
 
Keywords:
Media & Entertainment, Game Development, Real-Time Graphics, GTC 2015 - ID S5641
Streaming:
Download:
 
Canvas: GPU Image Processing on Giant Surfaces
Thomas Soetens (Immersive Design Studios)
We will discuss how we are bridging the transition from FPGA to GPU-based image processing with our proprietary software - CANVAS: a GPU image-processing platform designed for various AV applications including multi-screen warping, blending, pixel-m ...Read More
We will discuss how we are bridging the transition from FPGA to GPU-based image processing with our proprietary software - CANVAS: a GPU image-processing platform designed for various AV applications including multi-screen warping, blending, pixel-mapping and color matching. We will present a case study based on a project at Montreal's Bell Centre hockey arena, featuring projections on ice during the 2013 NHL playoffs. The installation required image warping and blending with 12 overlapping projectors - each set of 6 projectors mapping in 6K onto the arena ice. The use of CANVAS allowed for pixel-by-pixel resolution, easy warping and blending, as well as cutting the projector calibration time from 8-12 hrs down to just 15 min. Attendees will learn how to push the limits of the GPU.  Back
 
Keywords:
Media & Entertainment, Visualization - Large Scale & Multi-Display, Video & Image Processing, GTC 2015 - ID S5642
Streaming:
Download:
 
Redshift: Production-quality, final-frame rendering on the GPU
Panagiotis Zompolas (Redshift Rendering Technologies), Robert Slater (Redshift Rendering Technologies)
This talk introduces Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception towards GPU final-frame rendering. The talk covers features that make Redshift unique among commercial GPU renderers such ...Read More
This talk introduces Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception of GPU final-frame rendering. The talk covers features that make Redshift unique among commercial GPU renderers, such as out-of-core data access, memory efficiency, multiple GI modes and comprehensive shading capabilities, among others. It also focuses on the technical challenges the Redshift development team faced while implementing final-frame, production-quality rendering on the GPU. A few customer work examples will also be demonstrated. This talk will be of interest both to the industry professional who wants to learn more about GPU-accelerated production-quality rendering and to the software developer who is interested in GPU-accelerated rendering.  Back
 
Keywords:
Media & Entertainment, Rendering & Ray Tracing, GTC 2015 - ID S5716
Streaming:
Download:
Medical Imaging
Presentation
Media
Multi-GPU Accelerated Refraction-Corrected Reflection Reconstruction for 3D Ultrasound Breast Imaging
Qun (Maxine) Liu (QT Ultrasound,LLC), Martin Cwikla (QT Ultrasound,LLC)
In this session, the presenters will discuss the acceleration and parallelization of a series of algorithms which together comprise the overall refraction-corrected ray-tracing algorithm. In addition, the members of the audience will be shown a metho ...Read More
In this session, the presenters will discuss the acceleration and parallelization of a series of algorithms which together comprise the overall refraction-corrected ray-tracing algorithm. In addition, the members of the audience will be shown a method of overlapping CPU and GPU computations by grouping multiple CPU worker threads, as well as using CUDA streams and CUDA events, in order to improve GPU throughput. Lastly, the presenters will describe how best to manage the available memory in order to achieve optimal performance. More generally, the presenters will outline the overall software architecture and hierarchy, which simplifies code improvements, maintenance, and repair.  Back
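The copy/compute overlap pattern referred to above can be sketched generically as follows (this is not the presenters' code; the kernel, chunk sizes, and stream count are placeholders): asynchronous copies and kernels are issued into alternating CUDA streams so transfers and computation overlap, and in a larger design each CPU worker thread could own one such stream, with CUDA events signalling per-chunk completion.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void process(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f;              // placeholder for the real reconstruction step
}

int main()
{
    const int chunks = 8, n = 1 << 20;
    std::vector<float> host(chunks * (size_t)n, 1.0f);

    cudaStream_t streams[2];
    float* dev[2];
    for (int s = 0; s < 2; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&dev[s], n * sizeof(float));
    }
    // Pin the host buffer so cudaMemcpyAsync can actually overlap with kernels.
    cudaHostRegister(host.data(), host.size() * sizeof(float), cudaHostRegisterDefault);

    for (int c = 0; c < chunks; ++c) {
        int s = c % 2;                                 // alternate streams so work overlaps
        float* h = host.data() + (size_t)c * n;
        cudaMemcpyAsync(dev[s], h, n * sizeof(float), cudaMemcpyHostToDevice, streams[s]);
        process<<<(n + 255) / 256, 256, 0, streams[s]>>>(dev[s], n);
        cudaMemcpyAsync(h, dev[s], n * sizeof(float), cudaMemcpyDeviceToHost, streams[s]);
        // A cudaEvent_t recorded here would let a CPU worker poll for this chunk's completion.
    }
    cudaDeviceSynchronize();

    cudaHostUnregister(host.data());
    for (int s = 0; s < 2; ++s) { cudaStreamDestroy(streams[s]); cudaFree(dev[s]); }
    return 0;
}
```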
 
Keywords:
Medical Imaging, GTC 2015 - ID S5239
Streaming:
Download:
 
Using Multiple GPUs To Reconstruct The Brain From Histological Images.
Marcel Huysegoms (Forschungszentrum Jülich)
In this talk we present an effective approach for registering histological brain sections simultaneously by free-form deformations leading to improved 3D volume models. Since the task poses an optimization of several thousand transformation parameter ...Read More
In this talk we present an effective approach for registering histological brain sections simultaneously by free-form deformations, leading to improved 3D volume models. Since the task poses an optimization over several thousand transformation parameters, it is formulated as a Markov Random Field whose energy is composed of millions of similarity measurements, each requiring the construction and reduction of a joint histogram. The huge amount of computation is only feasible by utilizing multiple GPUs, and the required resources scale with the number of images involved. In order to achieve optimal weak scaling we include a detailed Kepler performance analysis and compare the 3D results with those of conventional registration algorithms.  Back
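As a hedged illustration of the joint-histogram step mentioned above (not the Jülich implementation), the CUDA kernel below accumulates a joint histogram of two 8-bit image patches in shared memory and then merges the per-block partial histograms with global atomics; a mutual-information-style similarity can then be reduced from the result. Bin count and block shape are arbitrary choices.

```cuda
#define BINS 32

__global__ void jointHistogram(const unsigned char* a, const unsigned char* b,
                               int numPixels, unsigned int* hist /* BINS*BINS, pre-zeroed */)
{
    __shared__ unsigned int sh[BINS * BINS];
    for (int i = threadIdx.x; i < BINS * BINS; i += blockDim.x) sh[i] = 0;
    __syncthreads();

    // Grid-stride loop: each thread bins a slice of the pixel pairs.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < numPixels; i += gridDim.x * blockDim.x) {
        int ba = a[i] * BINS / 256;
        int bb = b[i] * BINS / 256;
        atomicAdd(&sh[ba * BINS + bb], 1u);            // cheap shared-memory atomic
    }
    __syncthreads();

    // One global atomic per non-empty bin per block to merge the partial histograms.
    for (int i = threadIdx.x; i < BINS * BINS; i += blockDim.x)
        if (sh[i]) atomicAdd(&hist[i], sh[i]);
}
```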
 
Keywords:
Medical Imaging, Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5286
Streaming:
Download:
 
3 Engineers, 2 Months: The World's First Operating Room Enhanced by High Performance Computing
John Clarke (Cydar Ltd)
Learn how a tiny team built and deployed a 424TFlop/s supercomputer in only two months. This supercomputer is used to provide real-time enhanced visualizations to endovascular surgeons during aortic aneurysm repair. Real-time machine vision demands n ...Read More
Learn how a tiny team built and deployed a 424TFlop/s supercomputer in only two months. This supercomputer is used to provide real-time enhanced visualizations to endovascular surgeons during aortic aneurysm repair. Real-time machine vision demands not only massively parallel data processing but also massive dataflows and unavoidably serial processing. In this talk, we describe how three advanced machine vision algorithms were each taken from a single high-end GPU and moved to a cloud of GPU servers where the price-performance sweet spot is far from the high end. We describe the design and performance of our work and data distribution systems, which are solutions to the cloud-specific problems of slow intra-cloud networking and occasional cloud server hiatuses.  Back
 
Keywords:
Medical Imaging, Computer Vision & Machine Vision, Supercomputing, GTC 2015 - ID S5346
Streaming:
Download:
 
Mobile Wireless Ultrasound with GPU Beamforming
Jesper Mosegaard (Alexandra Instituttet)
Learn how the Synthetic Aperture Sequential Beamforming (SASB) algorithm is efficiently implemented on the GPU - and how this will enable high-quality mobile and wireless medical ultrasound imaging. The vision is to get rid of the thick cable and PC ...Read More
Learn how the Synthetic Aperture Sequential Beamforming (SASB) algorithm is efficiently implemented on the GPU - and how this will enable high-quality mobile and wireless medical ultrasound imaging. The vision is to get rid of the thick cable and PC that accompany current ultrasound transducers. SASB is divided into two stages: the first stage is simple enough to be implemented with simple (low-power) electronics, while the second stage executes efficiently even on mobile GPUs. The bandwidth requirements between these stages are significantly lower than in traditional beamforming, while retaining approximately the same image quality.  Back
 
Keywords:
Medical Imaging, GTC 2015 - ID S5357
Streaming:
Download:
 
Real-Time Adaptivity in Head-and-Neck and Lung Cancer Radiotherapy Using a Multi-GPU Framework
Anand Santhanam (University of California, Los Angeles)
Attendees will learn the specific steps involved in adaptive radiation therapy and how each step benefits significantly from GPU-based algorithms. We will present our GPU implementation details and results for enabling real-time adaptive ...Read More
Attendees will learn the specific steps involved in adaptive radiation therapy and how each step benefits significantly from GPU-based algorithms. We will present our GPU implementation details and results for enabling real-time adaptive radiotherapy for head and neck cancer. The methods will focus on the algorithmic details of deformable image registration, systematic model-guided validation of the registration process, and a non-voxel-based dose convolution, all of which are critical to improving head and neck cancer treatment. Results presented in this talk will shed light on the system's accuracy and computational speed compared with current clinical frameworks. A discussion of how this framework benefits from a cloud setup will conclude the presentation.  Back
 
Keywords:
Medical Imaging, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5438
Streaming:
Download:
 
Fast Digital Tomosynthesis for LIVE Radiation Therapy
Alexandros-Stavros Iliopoulos (Department of Computer Science, Duke University)
Learn about the recently developed LIVE radiation oncology imaging system for 4D localization of moving tumors, and how its computational reconstruction algorithm may enable clinical applicability during adaptive radiation therapy treatments. We disc ...Read More
Learn about the recently developed LIVE radiation oncology imaging system for 4D localization of moving tumors, and how its computational reconstruction algorithm may enable clinical applicability during adaptive radiation therapy treatments. We discuss the approach of LIVE for high-fidelity reconstruction from a partial patient scan, together with its clinical significance and resulting computational challenges. By exploiting the GPU computing model and using a novel algorithm formulation, we obtain a simple and efficient reconstruction process, allowing LIVE to go into clinical trials for the first time. We present results with patient data, and remark on remaining challenges.  Back
 
Keywords:
Medical Imaging, Video & Image Processing, GTC 2015 - ID S5492
Streaming:
Download:
 
Accelerating Proton Computed Tomography on Heterogeneous Systems
Thomas Uram (Argonne National Laboratory)
Proton computed tomography is a medical imaging technology with the potential to produce more accurate volumetric reconstructions at a lower radiation dose than X-ray computed tomography, albeit with greater computational demands. While in X-ray CT t ...Read More
Proton computed tomography is a medical imaging technology with the potential to produce more accurate volumetric reconstructions at a lower radiation dose than X-ray computed tomography, albeit with greater computational demands. While in X-ray CT the photons propagate through the target volume in straight lines, the protons in pCT are scattered by the material in the target volume, resulting in a curvilinear path that must be approximated, and a system of equations that must be iteratively solved. We describe the adaptation of the two dominant compute phases for the GPU, compare their performance on the CPU and GPU, and describe efforts to improve the GPU performance. The first phase achieves an 11x speedup; the second phase involves a sparse iterative solver and achieves a 2x speedup.  Back
 
Keywords:
Medical Imaging, GTC 2015 - ID S5497
Streaming:
 
Computer Aided Detection for 3D Breast Imaging and GPU Technology
Haili Chui (Hologic Inc.), Xiangwei Zhang (Hologic Inc.)
In this talk, we will provide an overview of the current CAD (Computer Aided Detection) system & technology, and we will discuss the use of GPU optimization as a key enabling factor in building such systems for 3D breast imaging. More specifically, ...Read More
In this talk, we will provide an overview of the current CAD (Computer Aided Detection) system & technology, and we will discuss the use of GPU optimization as a key enabling factor in building such systems for 3D breast imaging. More specifically, we will cover the following topics: 1) The ongoing transition of breast imaging from 2D to 3D; 2) The making of a 3D CAD system; 3) The role of GPU optimization; 4) Trends in medical imaging big data analysis and risk modeling.  Back
 
Keywords:
Medical Imaging, Video & Image Processing, GTC 2015 - ID S5512
Streaming:
Download:
 
NeuroGPU : Accelerating Biophysical Neuronal Modeling with CUDA
Roy Ben-Shalom (UCSF Neurology Department)
Learn how to implement fast, cheap and realistic neuronal modeling through NeuroGPU, the first open-source, biophysically rigorous compartmental neuronal modeling environment built for GPUs. In this talk we will discuss: 1) an overview of the mathem ...Read More
Learn how to implement fast, cheap and realistic neuronal modeling through NeuroGPU, the first open-source, biophysically rigorous compartmental neuronal modeling environment built for GPUs. In this talk we will discuss: 1) an overview of the mathematics of neuronal modeling, 2) computational challenges imposed by traditional modeling environments, and 3) how these can be overcome through implementation in CUDA. Examples of advanced scientific computing tasks, including evolutionary algorithms for model optimization, will be provided using NeuroGPU.  Back
 
Keywords:
Medical Imaging, Developer - Algorithms, Life & Material Science, Supercomputing, GTC 2015 - ID S5525
Streaming:
Download:
 
3D Backprojection: Meeting the Challenge for Performance in Medical Imaging
Lars Nyland (NVIDIA), Julien Demouth (NVIDIA), Feiwen Zhu (NVIDIA), Sky Wu (NVIDIA)
In this session, we present the implementation in CUDA of a backprojection kernel for Cone-Beam Computed Tomography (CBCT) and study the performance of this kernel from a GPU architectural point of view. We will explain how we measured the utilizatio ...Read More
In this session, we present the implementation in CUDA of a backprojection kernel for Cone-Beam Computed Tomography (CBCT) and study the performance of this kernel from a GPU architectural point of view. We will explain how we measured the utilization of different components of a GPU by this kernel. Our focus will be on Kepler and Maxwell architectures.  Back
 
Keywords:
Medical Imaging, Signal & Audio Processing, GTC 2015 - ID S5534
Streaming:
Download:
 
Real-Time Multi-Plane Tomosynthesis Using GPUs
Oleg Konings (Triple Ring Technologies), Tobias Funk, Ph.D. (Triple RingTechnologies Inc)
We have built a GPU accelerated X-ray tomosynthesis system for interventional radiology. Reduced scatter, a more efficient dose, and a larger field of view are the tangible results of the system. The goal of this session is to describe the novel hard ...Read More
We have built a GPU-accelerated X-ray tomosynthesis system for interventional radiology. Reduced scatter, a more efficient dose, and a larger field of view are the tangible results of the system. The goal of this session is to describe the novel hardware and software configuration of the scanning-beam digital x-ray reconstruction system. This system employs dual detectors which continuously generate sets of data that are processed and reconstructed as large 1000x1800 images. The presenters will explain the complex hardware configuration which uses FPGAs in conjunction with the GPUDirect RDMA feature of NVIDIA® Tesla GPUs. Details of the CUDA filtering and reconstruction algorithms will be examined, with an emphasis on low-level optimizations.  Back
 
Keywords:
Medical Imaging, Computational Physics, GTC 2015 - ID S5575
Streaming:
Download:
 
Acquiring Dramatic Gains in Image Quality: GPU-Accelerated Beamforming
Andre Lehovich (Decision Sciences Medical)
Synthetic-aperture ultrasound systems offer a potential for dramatic gains in image quality, compared to classical ultrasound, provided one can do the data acquisition and computations quickly enough. In our synthetic aperture implementation, we are ...Read More
Synthetic-aperture ultrasound systems offer a potential for dramatic gains in image quality, compared to classical ultrasound, provided one can do the data acquisition and computations quickly enough. In our synthetic aperture implementation, we are able to achieve 10 fps images by using a GPU to accelerate the beamforming process, a significant speedup over the frame rates available using CPU processing.  Back
 
Keywords:
Medical Imaging, GTC 2015 - ID S5615
Streaming:
Download:
NVScene
Presentation
Media
The Timeless Way of Building Geometry: How to Create Content with Signed Distance Functions
Johann Korndorfer (Demogroup Mercury)
Good Signed Distance Functions define geometry by providing a semantic description that is very close to the essence of what the shape actually is - but that is almost completely lost when working with polygons or voxels, which solve the problem by thr ...Read More
Good Signed Distance Functions define geometry by providing a semantic description that is very close to the essence of what the shape actually is - but that is almost completely lost when working with polygons or voxels, which solve the problem by throwing large amounts of data at it. Think vector graphics vs. pixels. As a result, SDFs can describe objects in an elegant way that makes variation, animation and last-minute changes trivial. Since building good SDFs is not straightforward, this talk will focus on patterns that help with the modelling process by making it more structured, such as two different families of operators, debugging views, and assorted best practices that have helped us build last year's "the timeless" and other 64k and 4k intros. A very short introduction will show what a Signed Distance Function is (spoiler: a piece of code), what properties it should have and how it is commonly rendered. We will then probably spend most of the time live-coding SDFs.  Back
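To make the idea concrete, here is a minimal sketch of the kind of building blocks such a talk covers: SDF primitives and a smooth-union operator, written as plain C++ for readability (in a demo these would normally live in a shader). The function names follow common published conventions rather than any specific codebase.

```cpp
// A signed distance function is just code that answers "how far is this point from the surface?"
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
static float length(Vec3 p) { return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z); }

// Primitives: distance to a sphere and to an axis-aligned box centred at the origin.
float sdSphere(Vec3 p, float r) { return length(p) - r; }
float sdBox(Vec3 p, Vec3 half)
{
    Vec3 q{ std::fabs(p.x) - half.x, std::fabs(p.y) - half.y, std::fabs(p.z) - half.z };
    Vec3 qc{ std::max(q.x, 0.f), std::max(q.y, 0.f), std::max(q.z, 0.f) };
    return length(qc) + std::min(std::max(q.x, std::max(q.y, q.z)), 0.f);
}

// Operators: hard union and a smooth union that blends two shapes over width k.
float opUnion(float a, float b) { return std::min(a, b); }
float opSmoothUnion(float a, float b, float k)
{
    float h = std::clamp(0.5f + 0.5f * (b - a) / k, 0.f, 1.f);
    return b + (a - b) * h - k * h * (1.f - h);      // standard polynomial smooth-min
}

// A whole scene is a composition of such calls, evaluated at every ray-march step.
float scene(Vec3 p)
{
    return opSmoothUnion(sdSphere(p, 1.0f),
                         sdBox({p.x, p.y + 1.2f, p.z}, {1.5f, 0.2f, 1.5f}), 0.3f);
}
```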
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5724
Streaming:
 
Textmode Is Awesome
Bo Samson (Gameloft)
We'll be defining what a textmode demo is, and why we should care. From there, we'll have a look at what makes one go from good to great, design-wise. From content to colours to graphic style, these tiny choices can have huge impact. We'll also se ...Read More
We'll be defining what a textmode demo is, and why we should care. From there, we'll have a look at what makes one go from good to great, design-wise. From content to colours to graphic style, these tiny choices can have a huge impact. We'll also see how your design can be dependent on your pixel-to-character converter. We'll cover the most popular approaches, both in colour mixing and subsampling, along with their strengths and drawbacks. If time allows, we'll look at what cool stuff has been or could be done with the extremely low resolution. Come along, we'll have a blast!  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5725
Streaming:
 
Tips and Techniques For Efficient and Impressive Animations
Alexander Lehmann (Filmmaker)
Creating animated short films, music videos and demos is an extremely complex process. Between concept and final release, many fields of filmmaking and design need to be mastered and applied. In this session Alexander will show and explain the ...Read More
Creating animated short films, music videos and demos is an extremely complex process. Between concept and final release, many fields of filmmaking and design need to be mastered and applied. In this session Alexander will show and explain the workflow which he developed to create impressive animations on a tight budget of both time and funds. We will look at fun and time-efficient processes and techniques that allow you to become a self-employed "one-man 3D army". Alexander will also cover how he started his animation studio and how the demoscene has always played a role in it.  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5726
Streaming:
 
On Finishing Creative Projects
Thomas Mann (Framefield)
Building on his talk at GTC 2014, Thomas will discuss his creative design process for real-time animations: how to turn abstract ideas and concepts into moving images, make consistent design decisions, get inspired by programming on ...Read More
Building on his talk at GTC 2014, Thomas will discuss his creative design process for real-time animations: how to turn abstract ideas and concepts into moving images, make consistent design decisions, get inspired by programming along the way, and tweak the look and timing into a finished product.  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5727
Streaming:
 
Graphics Programming Through the Ages
Michael Dille (GGT/NASA Ames Research Center), Keith Bare (NetApp)
Many new computers over the years have generated great excitement with ever more powerful graphical abilities. To facilitate this, machine designers developed a variety of creative (if largely now arcane) programming interfaces that allowed software ...Read More
Many new computers over the years have generated great excitement with ever more powerful graphical abilities. To facilitate this, machine designers developed a variety of creative (if largely now arcane) programming interfaces that allowed software authors to squeeze impressive displays from scant computational resources, an art perfected by the demoscene. This talk will focus on a few famous case studies such as the Commodore 64, the Amiga, and early PCs while exploring how each respective architecture influenced the style and appearance of demos on that platform. This chronology of hardware history provides the context to then appreciate the evolution of demos from machine-specific skills demonstrations to immersive graphical simulations, reaching the modern emphasis on aesthetics and production while offering a nod to today's "low-fi" demoscene that retains a focus on pure programming challenge.  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5728
Streaming:
 
Creating Interactive Visuals for Large Audiences
Joel Pryde (Stimulant)
This session will cover what Joel has learned from working on interactive visuals for festivals, conferences and public places and some of the challenges of working in these venues. Both for his professional work at Stimulant (http://stimulant.com) a ...Read More
This session will cover what Joel has learned from working on interactive visuals for festivals, conferences and public places, and some of the challenges of working in these venues, both in his professional work at Stimulant (http://stimulant.com) and in his side projects, where he has created a number of very large-scale interactive pieces for festivals such as Decibel, conferences like CES and various other public venues. The talk will include a quick high-level overview of this work and some of the tools and practices that have served him in building these creations and allowing them to react to music, the audience and changes in the environment.  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5729
Streaming:
 
Shadertoy Hackathon Finale
Inigo Quilez (Beautypi), Pol Jeremias (Beautypi)
The founders of Shadertoy will host a hackathon where the audience is invited to participate. More info TBA! ...Read More
The founders of Shadertoy will host a hackathon where the audience is invited to participate. More info TBA!  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5730
Streaming:
 
Reinventing the Wheel - One Last Time
Ricardo Cabello
- Hey! Assembly is in 4 months! Do you want to do a demo for it? - Let's do it! Do you have new effects? - Hmm... Yeah. But I think we should do a new demosystem. The code for our last demo ended up a bit too messy. - Oh! Ok. 8 years later... ...Read More
- Hey! Assembly is in 4 months! Do you want to do a demo for it? - Let's do it! Do you have new effects? - Hmm... Yeah. But I think we should do a new demosystem. The code for our last demo ended up a bit too messy. - Oh! Ok. 8 years later...  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5782
Streaming:
 
Thinking Outside the Cartridge: Modern Ideas Applied to Archaic Devices
Jake Taylor (Fuse)
Ever wanted to make something that looks and sounds awesome with a gaming console you've played on your entire life? Ever wanted to build a proper, modern toolset to do so and experiment with functional programming at the same time? Then this talk i ...Read More
Ever wanted to make something that looks and sounds awesome with a gaming console you've played on your entire life? Ever wanted to build a proper, modern toolset to do so and experiment with functional programming at the same time? Then this talk is for you! We'll take a look at two recent Super Nintendo demos, and, in particular, the ideas and methodologies applied when making them. First, we'll go over some of the details of the SNES' quirky hardware and the usual methods of making it tick. Building from there, we'll look at how most of this can be reduced to "simple" data processing, and how modern development techniques can be applied to make this simpler and more interesting. All in all, this talk aims to show how applicable modern programming practices can be to unexpected problem domains, and how inspiring it can be to work with creative, out-of-the-box solutions. After all, a little ancient console dev never hurt anyone, right?  Back
 
Keywords:
NVScene, Game Development, GTC 2015 - ID S5783
Streaming:
 
GPU Unchained
Timothy Lottes
A live voyage through a collection of low-level and advanced GPU programming topics with a focus on unconventional thinking. Starting with an interactive look at driving the CPU from the GPU: showing a GL-based pipeline where shaders can write to a com ...Read More
A live voyage through a collection of low-level and advanced GPU programming topics with a focus on unconventional thinking. Starting with an interactive look at driving the CPU from the GPU: showing a GL-based pipeline where shaders can write to a command buffer which the CPU executes, enabling GPU-driven reconfiguration of resources and the rendering pipeline. Exploring methods to use this kind of rapid development tool for manual run-time profile-guided optimization. Continuing with a visual exploration of advanced filtering techniques for real-time ray-based rendering, and methods to enable 1080p ray marching at 120Hz and beyond.  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5784
Streaming:
 
Real Virtuality: Adventures in WebVR
Antti Jädertpolm (Fthr / TPOLM) (Vizor.io)
In this session, we will go over the brief history of online VR (WebVR) and what the future holds for the medium. We will showcase different frameworks and tools that are currently available and being used for WebVR and discuss how both businesses an ...Read More
In this session, we will go over the brief history of online VR (WebVR) and what the future holds for the medium. We will showcase different frameworks and tools that are currently available and being used for WebVR and discuss how both businesses and art can benefit from it. Antti will demonstrate how he has used VR to create unique demoscene related effects through his own visual programming interface and share his experiences with VR in general.  Back
 
Keywords:
NVScene, Augmented Reality & Virtual Reality, Web Acceleration, GTC 2015 - ID S5860
Streaming:
Download:
 
Android Performance Patterns: Flow
Etienne Caron (TrueKey)
On mobile devices, tactile feedback provides a very close, personal interaction with users. Lack of speed or sluggishness compromises this feedback loop, and multiple UX studies have shown this has a very real impact on users and how they use your so ...Read More
On mobile devices, tactile feedback provides a very close, personal interaction with users. Lack of speed or sluggishness compromises this feedback loop, and multiple UX studies have shown this has a very real impact on users and how they use your software. Fluid feedback can have huge impact on getting your work noticed and adopted by users. A well crafted UI/UX can induce 'flow', or hyperfocus in your users. Something demos usually excel at provoking in viewers. In this session, we'll leverage demoscene know-how to create rich dynamic user interfaces, combining shader rendering tricks with traditional Android UI elements. We'll also learn how to efficiently use the Android platform tools to keep your framerate at a rock-solid 60fps.  Back
 
Keywords:
NVScene, Rendering & Ray Tracing, GTC 2015 - ID S5861
Streaming:
 
NVScene Opening and Shadertoy Hackathon Kickoff
 
Keywords:
NVScene, GTC 2015 - ID S5906
Streaming:
 
Enough: An Interactive Picture Book
Isaac Cohen (Cabbibo)
Enough is an exploration in interactive storytelling. Because of the power of real-time graphics, we are entering an era of redefining what fiction means. Although many times the progression of this art finds itself in the realm of RPGs and FPSs, En ...Read More
Enough is an exploration in interactive storytelling. Because of the power of real-time graphics, we are entering an era of redefining what fiction means. Although many times the progression of this art finds itself in the realm of RPGs and FPSs, Enough tries to reexamine what a 'Picture Book' could mean in the time of modern GPUs.  Back
 
Keywords:
NVScene, Real-Time Graphics, GTC 2015 - ID S5921
Streaming:
OpenACC
Presentation
Media
Showing the Missing Middle: Enabling OpenACC Performance Analysis
Guido Juckeland (Technische Universität Dresden - ZIH)
Learn how OpenACC runtimes now also expose performance-related information and how this can now be used to show where your OpenACC applications are wasting clock cycles. The talk will show that profilers can connect with OpenACC applications to record how much time is spent in OpenACC regions and what device activity they turn into. See how this can be turned into a natural timeline-based visualization to show in great detail what an OpenACC application is doing at any point in time.
 
Keywords:
OpenACC, Developer - Performance Optimization, Developer - Tools & Libraries, Supercomputing, GTC 2015 - ID S5139
Streaming:
Download:
 
Experiences in Porting Scientific Applications to GPUs Using OpenACC
Saber Feki (KAUST), Ahmed Al-Jarro (Fujitsu Laboratories of Europe Ltd)
Learn how to effectively use the directive-based OpenACC programming model to accelerate scientific applications and easily harness the computational power of GPUs. We share in this session our experiences in porting and tuning three applications to GPUs using OpenACC: (i) an explicit seismic imaging kernel used in the Reverse Time Migration and Full Waveform Inversion applications, widely used in oil and gas exploration, where we show that fine tuning some of its clauses results in better performance, (ii) an implicit solver used in CFD for simulating the fluid structure interaction of flow over airfoil, and (iii) a CEM code that is based on the time-domain volume-integral-equation for simulating transient electromagnetics using both CAPS and PGI compilers.  Back
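As a rough illustration of the kind of clause tuning described above, the sketch below shows a hypothetical stencil-style loop (not taken from the presented codes); the collapse, gang, vector, and vector_length clauses are the knobs being adjusted, and the values shown are assumptions rather than the speakers' settings.

    // Hypothetical 2D smoothing kernel standing in for a seismic stencil.
    // The collapse/gang/vector_length clauses illustrate the tuning knobs;
    // the chosen values are examples only.
    void smooth(int n, const float* p0, float* p1)
    {
        #pragma acc data copyin(p0[0:n*n]) copy(p1[0:n*n])
        {
            #pragma acc parallel loop collapse(2) gang vector vector_length(128)
            for (int i = 1; i < n - 1; ++i)
                for (int j = 1; j < n - 1; ++j)
                    p1[i*n + j] = 0.25f * (p0[(i-1)*n + j] + p0[(i+1)*n + j]
                                         + p0[i*n + j - 1] + p0[i*n + j + 1]);
        }
    }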
 
Keywords:
OpenACC, Developer - Performance Optimization, Computational Physics, Energy Exploration, Developer - Programming Languages, GTC 2015 - ID S5160
Streaming:
Download:
 
Introduction to Compiler Directives with OpenACC
Jeff Larkin (NVIDIA)
Compiler directives, such as OpenACC and OpenMP, simplify parallel programming by exposing concepts at a high level and insulating developers from low-level, architectural details. In this session participants will learn the fundamentals of using compiler directives to program for GPUs. This session will be taught using OpenACC, but the skills will be directly transferable to OpenMP. At the end of this tutorial participants will be able to use compiler directives to accelerate an application on a GPU.  Back
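For readers unfamiliar with the model, a minimal example of the kind of directive the tutorial covers might look like the following generic SAXPY loop (not the tutorial's own material):

    // A single OpenACC directive asks the compiler to offload the loop to the
    // GPU and to manage the data movement for x and y.
    void saxpy(int n, float a, const float* x, float* y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }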
 
Keywords:
OpenACC, Developer - Programming Languages, Supercomputing, GTC 2015 - ID S5192
Streaming:
Download:
 
Advanced OpenACC Programming
Jeff Larkin (NVIDIA)
This tutorial will teach advanced topics in using OpenACC to accelerate applications on GPUs. Some experience in OpenACC and/or OpenMP will be beneficial for attending this session. Participants will learn how to further improve the performance of an OpenACC application using advanced topics, such as asynchronicity and interoperability with accelerated libraries. After attending this session participants will be able to optimize an OpenACC application for additional GPU performance.
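A minimal sketch of the asynchronicity topic, assuming two independent loops and illustrative queue numbers, might look like this; for the interoperability topic, an OpenACC host_data use_device region is the usual way to hand device pointers to accelerated libraries.

    // Two independent kernels are issued on separate async queues so their
    // launches and transfers can overlap; the host then waits for both.
    void update(int n, float* a, float* b)
    {
        #pragma acc parallel loop async(1) copy(a[0:n])
        for (int i = 0; i < n; ++i) a[i] *= 2.0f;

        #pragma acc parallel loop async(2) copy(b[0:n])
        for (int i = 0; i < n; ++i) b[i] += 1.0f;

        #pragma acc wait(1,2)
    }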
 
Keywords:
OpenACC, Developer - Performance Optimization, Developer - Programming Languages, GTC 2015 - ID S5195
Streaming:
Download:
 
Comparing OpenACC and OpenMP Performance and Programmability
Jeff Larkin (NVIDIA), Guido Juckeland (Technische Universität Dresden - ZIH)
OpenACC and OpenMP provide programmers with two good options for portable, high-level parallel programming for GPUs. This talk will discuss similarities and differences between the two specifications in terms of programmability, portability, and performance.  Back
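As a rough illustration of how closely the two specifications can map onto the same code (not an example from the talk), the same loop can be written with an OpenACC directive and with an OpenMP 4.0 target directive:

    // The same vector scaling expressed in both models.
    void scale_acc(int n, float a, float* v)
    {
        #pragma acc parallel loop copy(v[0:n])
        for (int i = 0; i < n; ++i) v[i] *= a;
    }

    void scale_omp(int n, float a, float* v)
    {
        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i) v[i] *= a;
    }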
 
Keywords:
OpenACC, Developer - Programming Languages, Supercomputing, GTC 2015 - ID S5196
Streaming:
Download:
 
Featured Panel: GPU Computing with OpenACC and OpenMP
Jeff Larkin (NVIDIA), Michael Wolfe (NVIDIA), Fernanda Foertter (Oak Ridge National Lab), Duncan Poole (NVIDIA)
This panel will discuss the current state of GPU programming using compiler directives, such as OpenACC and OpenMP. This session is a forum for discussing both the successes and shortcomings of using compiler directives to program GPUs. The panel will include users, speakers from compiler and tools vendors, and representatives of open source efforts to support directives. Session participants are encouraged to participate in the discussions of this panel.  Back
 
Keywords:
OpenACC, Developer - Tools & Libraries, Developer - Programming Languages, Supercomputing, GTC 2015 - ID S5198
Streaming:
 
Porting Computational Physics Applications to the Titan Supercomputer with OpenACC and OpenMP (Presented by Cray)
Aaron Vose (Cray Inc.)
This session presents valuable "lessons learned" during the process of porting computational physics applications to the Titan supercomputer with hybrid OpenACC and OpenMP. Specifically, three real-world HPC codes are enhanced with OpenACC directives to take advantage of the Kepler GPUs and OpenMP directives to target the CPUs of the Titan supercomputer. The first application is TACOMA, a computational fluid dynamics code which solves finite-volume, block-structured, compressible flows. The second application is Delta5D, a Monte Carlo fusion code which follows particle orbits in Boozer space using Hamiltonian guiding center equations solved with an adaptive time step integrator. Finally, the third application is NekCEM, a high-fidelity electromagnetics solver based on spectral element methods. While the science behind these applications may differ significantly, the same porting process and lessons learned apply to each.  Back
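One common pattern for this kind of hybrid OpenACC/OpenMP port (a generic sketch, not code from TACOMA, Delta5D, or NekCEM) is to keep both sets of directives in the source and let the build flags decide which one is active:

    // When the code is built with OpenACC enabled (_OPENACC is predefined),
    // the loop is offloaded to the GPU; otherwise the OpenMP directive
    // parallelizes it across the CPU cores.
    void axpy(int n, double a, const double* x, double* y)
    {
    #ifdef _OPENACC
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    #else
        #pragma omp parallel for
    #endif
        for (int i = 0; i < n; ++i)
            y[i] += a * x[i];
    }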
 
Keywords:
OpenACC, Developer - Performance Optimization, Computational Physics, Developer - Programming Languages, GTC 2015 - ID S5202
Streaming:
Download:
 
GPU Acceleration Using OpenACC and C++ Classes
Mathew Colgrove (NVIDIA)
This tutorial provides strategies of using OpenACC to accelerate C++ classes. Examples illustrate topics such as member functions, inheritance, templates, containers, the implicit 'this' pointer, private data and deep copies. OpenACC 2.0 features such as unstructured data regions and the "routine" directive are highlighted. We also discuss current limitations and the future directions of OpenACC. Familiarity with OpenACC is recommended but not required.  Back
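A small sketch of the ideas listed above, using a hypothetical class rather than the tutorial's examples: the routine directive makes a member function callable in device code, and unstructured enter/exit data directives tie the device copies of the object and its dynamically allocated member to the object's lifetime.

    // Hypothetical container whose device data lifetime follows the object.
    class Grid {
    public:
        explicit Grid(int n) : n_(n), v_(new float[n]) {
            // Create device copies of the object and its array member.
            #pragma acc enter data copyin(this[0:1]) create(v_[0:n_])
        }
        ~Grid() {
            #pragma acc exit data delete(v_[0:n_], this[0:1])
            delete[] v_;
        }
        #pragma acc routine seq
        float& at(int i) { return v_[i]; }   // callable from device code

        void scale(float a) {
            #pragma acc parallel loop present(this[0:1], v_[0:n_])
            for (int i = 0; i < n_; ++i) at(i) *= a;
        }
    private:
        int n_;
        float* v_;
    };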
 
Keywords:
OpenACC, Developer - Programming Languages, Supercomputing, GTC 2015 - ID S5233
Streaming:
Download:
 
Extended OpenACC Programming to Exploit GPU-Specific Features Still at a High Level
Seyong Lee (Oak Ridge National Laboratory)
We present an extended OpenACC programming model to fully exploit GPU-specific features while staying at a high level. Directive-based accelerator programming models such as OpenACC have arisen as an alternative solution for GPU programming. However, too much abstraction in the directive models makes it difficult for users to control architecture-specific features, incurring a large performance gap between the directive models and low-level CUDA/OpenCL. We propose and implement new OpenACC extensions to support 1) hybrid programming of unified memory and separate memory and 2) exploiting GPU-specific memories and synchronizations in an abstract manner. Experimental results show that the extended OpenACC programs can perform similarly to low-level CUDA programs, while remaining at a high level.
 
Keywords:
OpenACC, Developer - Performance Optimization, Developer - Programming Languages, GTC 2015 - ID S5366
Streaming:
Download:
 
OpenACC 2.5 and Beyond
Michael Wolfe (NVIDIA)
Learn about the new features being added to OpenACC in the upcoming 2.5 version, and the new data management features being designed for the subsequent version. OpenACC is the popular directive-based API for GPU and accelerator programming, first released in 2011, supported by the Cray and PGI commercial products, and being implemented by numerous open-source compilers. The latest OpenACC release includes several simplifications and exposes some new behavior that programmers should be aware of. This presentation will also discuss the continuing work on deep data structure management features being designed for the subsequent release.  Back
 
Keywords:
OpenACC, Developer - Programming Languages, Supercomputing, GTC 2015 - ID S5382
Streaming:
Download:
 
OpenACC for Fortran Programmers
Michael Wolfe (NVIDIA)
Learn how to program NVIDIA GPUs using Fortran with OpenACC directives. The first half of this presentation will introduce OpenACC to new GPU and OpenACC programmers, providing the basic material necessary to start successfully using GPUs for your Fortran programs. The second half will be intermediate material, with more advanced hints and tips for Fortran programmers with larger applications that they want to accelerate with a GPU. Among the topics to be covered will be dynamic device data lifetimes, global data, procedure calls, derived type support, and much more.  Back
 
Keywords:
OpenACC, Developer - Programming Languages, Supercomputing, GTC 2015 - ID S5388
Streaming:
Download:
 
Porting Apps to Titan: Results from the Inaugural GPU Hackathon
Mi Sun Min (Argonne National Laboratory), Fernanda Foertter (Oak Ridge National Laboratory), Adam Simpson (Oak Ridge National Laboratory), Steven Young (Oak Ridge National Laboratory), Seth Johnson (Oak Ridge National Laboratory)
This session will showcase the results of the inaugural GPU Hackathon held at the Oak Ridge Leadership Computing Facility. The event hosted six teams paired with mentors over a week during which applications were ported to GPUs using OpenACC directives. The talk will describe the progress of each team from beginning to end as well as details about their implementations. Best practices, lessons learned, and anecdotes from mentors who participated in this training event will be shared.
 
Keywords:
OpenACC, Developer - Programming Languages, Supercomputing, GTC 2015 - ID S5515
Streaming:
Download:
OpenPOWER
Presentation
Media
Enabling Financial Service Firms to Compute Heterogeneously with Gateware Defined Networking (GDN)
John Lockwood (Algo-Logic Systems, Inc.)
Stock, futures, and option exchanges; market makers; hedge funds; and traders require real-time knowledge of the best bid and ask prices for the instruments that they trade. By monitoring live market data feeds and computing an order book with Field Programmable Gate Array (FPGA) logic, these firms can track the balance of pending orders for equities, futures, and options with sub-microsecond latency. Tracking the open orders by all participants ensures that the market is fair, liquidity is made available, trades are profitable, and jitter is avoided during bursts of market activity.  Back
 
Keywords:
OpenPOWER, Finance, Supercomputing, GTC 2015 - ID S5677
Streaming:
 
Accelerator Opportunities with OpenPower
Nick Finamore (Altera Corporation)
The OpenPower architecture provides unique capabilities which will enable highly effective and differentiated acceleration solutions. The OpenPower Accelerator Workgroup is chartered to develop both the hardware and software standards which give vendors the ability to develop these solutions. The presentation will cover an overview of the benefits of the OpenPower architecture for acceleration solutions. We will provide an overview of the Accelerator Workgroup's plans and standards roadmap, give an overview of the OpenPower CAPI development kit, and walk through an example of a CAPI-attached acceleration solution.
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5678
Streaming:
 
Using Docker in High Performance Computing in OpenPOWER Environment
Sam Sanjabi (IBM Systems & Technology Lab (Canada)), Xuebin Min (IBM (China))
OpenPOWER will be one of the major platforms in High Performance Computing (HPC). IBM Load Sharing Facility (LSF), a widely used cluster workload management product aimed at exploiting the computational capacity of HPC clusters to the maximum, has been proven to run well on the OpenPOWER platform. As an open platform for developers and system administrators to build, ship, and run applications, Docker has been widely used in the cloud. Can we extend Docker's benefits to HPC? Yes: by integrating LSF and Docker on the OpenPOWER platform, we achieved better Docker-based application deployment in OpenPOWER HPC.
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5680
Streaming:
 
Changing the Game: Burst Buffer Technologies
Jeff Sisilli (DataDirect Networks)
Planning for exascale, accelerating time to discovery and extracting results from massive data sets requires organizations to continually seek faster and more efficient solutions to provision I/O and accelerate applications. New burst buffer technologies are being introduced to address the long-standing challenges associated with the overprovisioning of storage by decoupling I/O performance from capacity. Some of these solutions allow large datasets to be moved out of HDD storage and into memory quickly and efficiently. Then, data can be moved back to HDD storage once processing is complete much more efficiently with unique algorithms that align small and large writes into streams, thus enabling users to implement the largest, most economical HDDs to hold capacity.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5681
Streaming:
 
Introducing the Little-Endian OpenPOWER Software Development Environment and Its Application Programming Interfaces
Michael Gschwind (IBM Systems & Technology Group)
Over the past three decades, the Power Architecture has been an important asset in IBM's systems strategy. During that time, Power-based systems powered desktops, technical workstations, embedded devices, game consoles, supercomputers, and commercial UNIX servers.
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5682
Streaming:
 
Porting Scientific Applications to OpenPOWER
Dirk Pleiter (Jülich Supercomputing Centre)
While over the past years significant experience for using GPUs with processors based on the x86 ISA has been obtained, GPU-accelerated systems with POWER processors have become available only very recently. In this talk we report on early experiences of porting selected scientific applications to GPU-accelerated POWER8 systems. We will explore basic performance features through micro-benchmarks, but our main focus will be on results for full applications or mini-applications. These have been selected such that hardware characteristics can be explored for applications with significantly different performance signatures. The application domains range from physics to life sciences and have in common that they are in need of supercomputing resources.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5683
Streaming:
 
The Future of Interconnect with OpenPOWER
Scot Schultz (Mellanox Technologies)
Mellanox ConnectX-4 EDR 100Gb/s technology was introduced in November at the SC'14 conference in New Orleans, LA. ConnectX-4 EDR 100Gb/s with CAPI support tightly integrates with the POWER CPU at the local bus level and provides faster access between the POWER CPU and the network device. We will discuss the latest interconnect advancements that maximize application performance and scalability on OpenPOWER architecture, including enhanced flexible connectivity with the latest Mellanox ConnectX-3 Pro Programmable Network Adapter. The new programmable adapter provides maximum flexibility for users to bring their own customized applications such as IPSEC encryption, enhanced flow steering and Network Address Translation (NAT), data inspection, data compression and others.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5685
Streaming:
 
Enabling Coherent FPGA Acceleration
Allan Cantle (Nallatech)
The presentation will introduce CAPI, Coherent Accelerator Processor Interface, to the audience and will detail the CAPI HDK, Hardware Development Kit, implementation that is offered to OpenPOWER customers through Nallatech. Several high level examples will be presented that show where FPGA acceleration brings significant performance gains and how these can often be further advantaged by the Coherent CAPI interface. Programming methodologies of the accelerator will also be explored where customers can either leverage pre-compiled accelerated libraries that run on the accelerator or where they can write their own Accelerated functions in OpenCL.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5686
Streaming:
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
John Ashley (NVIDIA)
Learn how applications can be accelerated on IBM Power8 systems with NVIDIA® Tesla® Accelerated Computing Platform, the leading platform for accelerating big data analytics and scientific computing. The platform combines the world's fastest GPU accelerators, the widely used CUDA® parallel computing model, NVLink, high-speed GPU interconnect to power supercomputers, and a comprehensive ecosystem of software developers, software vendors, and datacenter system OEMs to accelerate discovery and insight.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5687
Streaming:
 
PGI Compilers for OpenPOWER Platforms
Douglas Miles (PGI Compilers & Tools)
High-performance computing (HPC) systems are now built around a de facto node architecture with high-speed latency-optimized SIMD-capable CPUs coupled to massively parallel bandwidth-optimized Accelerators. In recent years, as many as 90% of the Top 500 Computing systems relied entirely on x86 CPU-based systems. OpenPOWER and the increasing success of Accelerator-based systems offer an alternative that promises unrivaled multi-core CPU performance and closer coupling of CPUs and GPUs through technologies like NVIDIA's NVLink high-speed interconnect.  Back
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5688
Streaming:
 
Using NVM Express SSDs and CAPI to Accelerate Data-Center Applications in OpenPOWER Systems
Stephen Bates (PMC-Sierra)
NVM Express is a standards based method of communication with PCIe attached Non-Volatile Memory. An NVM Express open-source driver has been an integrated part of the Linux kernel since March 2012 (version 3.3) and allows for very high performance. Currently there are NVM Express SSDs on the market that can achieve read speeds of over 3GB/s. We present results for a platform consisting of an NVM Express SSD, a CAPI accelerator card and a software stack running on a Power8 system. We show how the threading of the Power8 CPU can be used to move data from the SSD to the CAPI card at very high speeds and implement accelerator functions inside the CAPI card that can process the data at these speeds. We discuss several applications that can be serviced using this combination of NVMe SSD and CAPI.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5689
Streaming:
Download:
 
Life at the Intersection: OpenPOWER, Open Compute, and the Future of Cloud Software & Infrastructure
Aaron Sullivan (Rackspace)
Open hardware has the potential to disrupt the datacenter and the world of software development in very positive ways. OpenPOWER takes that potential a few steps further, both in the core system, and with technologies like CAPI. These innovations raise the possibility of performance and efficiency improvements to a magnitude not seen for a long time. This talk will explore past experience and current impressions of someone who has done development work at the intersection of OpenStack and Open Compute for a few years. It will cover his experience working with teams building & integrating hardware and software, for large scale as-a-Service deployments of OpenStack Nova and Ironic on Open Compute hardware.  Back
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5690
Streaming:
 
Reflections on Migrating IBM APP Genomic Workflow Acceleration to IBM POWER8
Chandler Wilkerson (Rice University)
Migrating any workflow to a new hardware platform generates challenges and requires adaptability. With the transition from POWER7 to POWER8, the addition of PowerKVM obviates the need for VIOS and provides the opportunity to manage virtual machines on the POWER platform in a much more Linux-friendly manner. In addition, a number of changes to Red Hat's Enterprise Linux operating system between versions 6 and 7 (7 being required for full POWER8 support at the time of this project's start) have required modifying the standard processes outlined in the tested IBM solution. This presentation will take attendees through the growing pains and lessons learned while migrating a complex system to a new platform.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5691
Streaming:
 
Trusted Computing Applied in OpenPOWER Linux
Mao Qiu Yin (Teamsun), Zhiqiang Tian (Teamsun)
Computer system security is receiving more and more emphasis from the Chinese government, which has created its own security standards. As a new open platform, OpenPOWER urgently needs to meet China's trusted computing security standard and provide a prototype system that conforms to the specifications in order to satisfy the demands of the growing OpenPOWER ecosystem in China.
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5692
Streaming:
 
Tyan OpenPOWER Products and Future Product Plans
Albert Mu (Tyan)
This session introduces TYAN and reviews the contributions it has made to the OpenPOWER community over the past twelve months. Invited to participate in the OpenPOWER Foundation, TYAN developed the OpenPOWER reference board following the spirit of innovation and collaboration that defines the OpenPOWER architecture, and contributed the associated reference design to the community. In this presentation, TYAN will share its value proposition to the community and reveal its future product plans and associated milestones to the audience of the first OpenPOWER Summit 2015.
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5693
Streaming:
 
Key-Value Store Acceleration with OpenPOWER
Michaela Blott (Xilinx)
Distributed key-value stores such as memcached form a critical middleware application within today's web infrastructure. However, typical x86-based systems yield limited performance scalability and high power consumption as their architecture with its optimization for single thread performance is not well-matched towards the memory-intensive and parallel nature of this application.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5694
Streaming:
 
FPGA Acceleration in a Power8 Cloud
Fei Chen (IBM, China Research Lab)
OpenStack is one of the most popular software stacks for running a cloud. It manages hardware resources like memory, disks, and x86 and POWER processors, and then provides IaaS to users. Building on existing OpenStack, additional kinds of hardware resources, such as GPUs and FPGAs, can also be managed by OpenStack and provided to users. FPGAs have been widely used for many kinds of applications, and the POWER8 processor integrates an innovative interface called CAPI (Coherent Accelerator Processor Interface) for direct connection between an FPGA and the POWER8 chip. CAPI not only provides a low-latency, high-bandwidth, cache-coherent interconnection between the user's accelerator hardware and the application software, but also provides easy programmability for accelerator hardware developers.
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5695
Streaming:
 
POWER8: The First OpenPOWER Processor
Michael Gschwind (IBM Systems & Technology Group)
The POWER8 processor is the latest RISC (Reduced Instruction Set Computer) microprocessor from IBM and the first processor supporting the new OpenPOWER software environment. Power8 was designed to deliver unprecedented performance for emerging workloads, such as Business Analytics and Big Data applications, Cloud computing and Scale out Datacenter workloads. It is fabricated using IBM's 22-nm Silicon on Insulator (SOI) technology with layers of metal, and it has been designed to significantly improve both single-thread performance and single-core throughput over its predecessor, the POWER7i processor.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5696
Streaming:
 
Data Centric Interactive Visualization of Very Large Data
Bruce D'Amora (IBM T. J. Watson Research Center), Gordon Fossum (Thomas J. Watson Research Center)
The traditional workflow for high-performance computing simulation and analytics is to prepare the input data set, run a simulation, and visualize the results as a post-processing step. This process generally requires multiple computer systems designed for accelerating simulation and visualization. In the medical imaging and seismic domains, the data to be visualized typically comprise uniform three-dimensional arrays that can approach tens of petabytes. Transferring this data from one system to another can be daunting and in some cases may violate privacy, security, and export constraints.  Back
 
Keywords:
OpenPOWER, Visualization - In-Situ & Scientific, Supercomputing, GTC 2015 - ID S5697
Streaming:
 
On Chip Controller (OCC)
Todd Rosedahl (IBM)
The On Chip Controller (OCC) is a co-processor that is embedded directly on the main processor die. The OCC can be used to control the processor frequency, power consumption, and temperature in order to maximize performance and minimize energy usage. This presentation will include an overview of the power, thermal, and performance data that the OCC can access as well as the various control knobs, including adjusting the processor frequency and memory bandwidth. Details about the OCC processor, firmware structure, loop timings, off-load engines, and bus accesses will be given along with descriptions of example algorithms, system settings, and potential future enhancements.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5699
Streaming:
Download:
 
HPC Solution Stack on OpenPOWER
Jing Li (IBM, STG China)
This demo will show how IBM OpenPOWER can serve as the foundation of a complete, complex High Performance Computing solution. From HPC cluster deployment, job scheduling, system management, and application management to the scientific computing workloads on top of them, all of these components can be constructed on top of the IBM OpenPOWER platform with good usability and performance. The demo also shows the simplicity of migrating a complete x86-based HPC stack to the OpenPOWER platform.
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5700
Streaming:
Download:
 
Power 8 Microprocessor
Satish Kumar Sadasivam (IBM STG)
The primary objective of this presentation is to provide the OpenPower user community with a performance evaluation methodology based on the advanced instrumentation capabilities available in the POWER8 microprocessor. It also presents a case study on how the CPI stack cycle accounting model can be used effectively to evaluate the performance of SPEC 2006 workloads in various SMT modes.
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5701
Streaming:
 
SuperVessel: OpenPOWER R&D Cloud with Operation and Practice Experience Sharing
Yong Hua Lin (IBM Research)
SuperVessel cloud (www.ptopenlab.com) is the cloud platform built on top of POWER/OpenPOWER architecture technologies. It aims to provide open remote access for all ecosystem developers and university students. We (IBM Research China, IBM System Technology Lab in China, and partners) built and launched this cloud more than three months ago, and it has rapidly attracted public users from more than 30 universities, including those in GCG and the United States.
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5702
Streaming:
 
Accelerated Photodynamic Cancer Therapy Planning with FullMonte on OpenPOWER
Jeffrey Cassidy (University of Toronto)
Photodynamic therapy (PDT) is a minimally-invasive cancer therapy which uses a light-activated drug (photosensitizer/PS). When the photosensitizer absorbs a photon, it excites tissue oxygen into a reactive state which causes very localized cell damage. The light field distribution inside the tissue is therefore one of the critical parameters determining the treatment's safety and efficacy. While FDA-approved and used for superficial indications, PDT has yet to be widely adopted for interstitial use for larger tumours using light delivered by optical fibres due to a lack of simulation and planning optimization software.  Back
 
Keywords:
OpenPOWER, Life & Material Science, Supercomputing, GTC 2015 - ID S5703
Streaming:
 
System Management Tool for OpenPOWER
Song Yu (IBM STG China), Li Guang Cheng (IBM STG China), Ma Yuan Liang (Teamsun), Chen Qing Hong (Teamsun)
OpenPOWER is a new generation platform. As a new system, infrastructure-level management is the most important requirement as OpenPOWER machines are widely used in both cloud and non-cloud environments. In the cloud, end users normally care about SaaS or PaaS, but cloud admins must consider how to manage the OpenPOWER physical nodes that provide the service. They must also be able to quickly and automatically provision physical machines and nodes into the cloud. Self-service for physical nodes is a new challenge in the public cloud.
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5704
Streaming:
Download:
 
Data Center and Cloud Computing Market Landscape and Challenges
Manoj Roge (Xilinx)
This talk surveys the data center and cloud computing market landscape and its challenges, discusses the technology challenges that limit the scaling of cloud computing as it grows at an exponential pace, and wraps up with insights into how FPGAs combined with general-purpose processors are transforming next-generation data centers with tremendous compute horsepower, low latency, and extreme power efficiency.
 
Keywords:
OpenPOWER, Data Center, Cloud Computing & HPC, Supercomputing, GTC 2015 - ID S5705
Streaming:
Download:
 
Power and Speed: Maximizing Application Performance on IBM Power Systems with XL C/C++ Compiler
Yaoqing Gao (IBM Canada)
This presentation will provide the latest news on IBM's compilers on Power. The major features to enhance portability such as improved standards compliance and gcc compiler source code and option compatibility will be presented. The presentation will also cover performance tuning and compiler optimization tips to maximize workload performance on IBM Power Systems including exploitation of the POWER8 processor and architecture.  Back
 
Keywords:
OpenPOWER, Developer - Performance Optimization, Supercomputing, GTC 2015 - ID S5706
Streaming:
 
XL C/C++ and GPU Programming on Power Systems
Kelvin Li (IBM)
The OpenPOWER Foundation is an organization with a mandate to enable member companies to customize POWER CPU processors and system platforms for optimization and innovation for their business needs. One such customization is the integration of graphics processing unit (GPU) technology with the POWER processor. IBM has recently announced the IBM POWER S824L system, a data processing powerhouse that integrates the NVIDIA Tesla GPU with IBM's POWER8 processor. This joint presentation by NVIDIA and IBM will contain details of the S824L system, including an overview of the Tesla GPU and how it interoperates with the POWER8 processor. It will also describe the NVIDIA software stack and how it works with the POWER8 compilers.
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5707
Streaming:
 
China POWER Technology Alliance (CPTA)
Zhu YaDong (SuZhou PowerCore)
The objective is to position China POWER Technology Alliance (CPTA) as a mechanism to help global OpenPOWER Foundation members engage with China organizations on POWER-based implementations in China.  Back
 
Keywords:
OpenPOWER, Supercomputing, GTC 2015 - ID S5708
Streaming:
 
Introduction to OPAL: the OpenPower Abstraction Layer
Stewart Smith (IBM)
A tour of the boot and runtime components of OpenPower firmware: the boot process, skiboot (boot and runtime), the petitboot bootloader, and where OEM customizations are possible.
 
Keywords:
OpenPOWER, GTC 2015 - ID S5791
Streaming:
 
Customizing and Contributing to OPAL: The OpenPower Abstraction Layer
Stewart Smith (IBM)
An overview of how to build and release OPAL for OEMs, where we will go over the OPAL development and release processes and cover how to work with upstream. This session is useful for OEMs and those deploying POWER systems who may want to customize their firmware.  Back
 
Keywords:
OpenPOWER, GTC 2015 - ID S5792
Streaming:
 
OpenPOWER ISV Roundtable Event
Randall Ross (Canonical)
Canonical (the company behind Ubuntu) is pleased to host the upcoming OpenPOWER Independent Software Vendor (ISV) Round-table. This event is your chance to meet OpenPOWER member organizations that are creating solutions that harness the unique capabilities of the OpenPOWER architecture in game-changing ways. The session will be hosted in a "birds of a feather" style. There will be brief presentations and demos by OpenPOWER members that showcase existing OpenPOWER-based solutions, less formal "lightning talks", and a facilitated round-table discussion to explore future solutions that would be a natural fit for OpenPOWER and for your business.
 
Keywords:
OpenPOWER, GTC 2015 - ID S5794
Streaming:
 
OpenPOWER Firmware Training Lab
Patrick Williams (IBM)
Architectural overview, selected deep dive, and a hands-on lab to learn building, modifying, and testing.
 
Keywords:
OpenPOWER, Embedded Systems, Education & Training, GTC 2015 - ID S5795
Streaming:
 
DB2 BLU w/GPU Demo: Concurrent Execution of an Analytical Workload on a POWER8 Server with K40 GPUs
Sina Meraji, PhD (IBM), Berni Schiefer (IBM)
In this technology preview demonstration, we will show the concurrent execution of an analytical workload on a POWER8 server with K40 GPUs. DB2 will detect both the presence of GPU cards in the server and the opportunity in queries to shift the processing of certain core operations to the GPU. The required data will be copied into the GPU memory, the operation performed and the results sent back to the P8 processor for any remaining processing. The objective is to 1) reduce the elapsed time for the operation and 2) Make more CPU available to other SQL processing and increase overall system throughput by moving intensive CPU processing tasks to GPU.  Back
 
Keywords:
OpenPOWER, GTC 2015 - ID S5835
Streaming:
Download:
 
OpenPOWER Firmware Training Lab
Patrick Williams (IBM)
Architectural overview, selected deep dive, and a hands-on lab to learn building, modifying, and testing.
 
Keywords:
OpenPOWER, Embedded Systems, Education & Training, GTC 2015 - ID S5859
Streaming:
 
IBM
Srini Chari (Cabot Partners)
 
Keywords:
OpenPOWER, GTC 2015 - ID S5922
Streaming:
 
Present and Future Leadership Computers at Oak Ridge National Laboratory
Jack Wells (Oak Ridge National Laboratory)
Pending
 
Keywords:
OpenPOWER, GTC 2015 - ID S5923
Streaming:
Real-Time Graphics
Presentation
Media
GPU-Driven Large Scene Rendering in OpenGL
Christoph Kubisch (NVIDIA), Pierre Boudier (NVIDIA)
We will present the latest OpenGL technology from NVIDIA (NV_command_list) and rendering algorithms to render large scenes, typically found in CAD/DCC applications. Through the use of new powerful OpenGL extensions, the GPU can be leveraged very efficiently to do more work autonomously of the CPU. We provide algorithms and usage scenarios for scenes made out of many parts (millions), including the GPU creating its own work for rendering (occlusion culling) and transformation updates. The data management minimizes data transfers and offers high flexibility for making changes to the scene, so that interactive editing and viewing of large data sets is possible.
 
Keywords:
Real-Time Graphics, Developer - Performance Optimization, Developer - Algorithms, Rendering & Ray Tracing, GTC 2015 - ID S5135
Streaming:
Download:
 
Nvpro-Pipeline: A Research Rendering Pipeline
Markus Tavenrath (NVIDIA)
Nvpro-pipeline is a research rendering pipeline based on SceniX featuring a scene graph, an effect system including support for OIT algorithms, an xbar which generates a flat list of objects to render, a frustum culling system, and RiX as the rendering backend, which supports several OpenGL techniques to keep the CPU cost of rendering as low as possible. This talk will present the different modules of the pipeline and some of the implementation details.
 
Keywords:
Real-Time Graphics, Developer - Performance Optimization, Rendering & Ray Tracing, GTC 2015 - ID S5148
Streaming:
Download:
 
GPU-Based Scene Generation for Flight Simulation
Tim Woodard (Diamond Visionics)
Flight simulation is incredibly demanding from a graphics perspective. Both fidelity and performance are of utmost importance. By leveraging modern GPU capabilities, it is now possible to greatly increase both performance and fidelity by orders of magnitude when compared to traditional scene-graph approaches. Furthermore, both significant consolidation of hardware and distributed rendering are now possible, greatly simplifying large-scale simulator facility design and maintenance. Learn how modern GPU-based approaches are being utilized to provide high-quality training for today's pilots.  Back
 
Keywords:
Real-Time Graphics, Visualization - Large Scale & Multi-Display, Developer - Algorithms, GTC 2015 - ID S5204
Streaming:
Download:
 
Displacement Mapping: a New Method to Achieve Realistic Geo-Specific Feature Densities
Brett Chladny (Renaissance Sciences Corporation)
Explore new techniques for identifying, representing, and rendering a realistic number of geo-specifically placed features in synthetic environments. Adding 3D models to synthetic environments greatly enhances the visual cues that enable the perception of depth, motion, and realism. However, constraints in hardware performance and budgets often limit the amount of 3D features in the scene. This session presents an innovative automated process that leverages geospatial data sources and GPU tessellation technologies to inject realistic numbers of features. A framework for extracting feature information from commonly available data will be discussed. We will also explore a new library that uses the power of modern GPUs to achieve near-constant rendering performance regardless of feature density.
 
Keywords:
Real-Time Graphics, Visualization - Large Scale & Multi-Display, GTC 2015 - ID S5235
Streaming:
Download:
 
Dense 3D Culture Rendering Using NVIDIA Solutions in Immersive Fast-Jet Simulators
William Paone (Boeing)
NVIDIA solutions used in immersive visual systems for flight simulation have allowed image generators to render complex scenes. This includes dense 3D terrain culture including buildings, trees, roads and towers. When rendered with high resolution photo-specific imagery, this culture improves low to medium altitude flight realism and situational awareness. However, adding dense culture to the scene over already complex terrain skin rendering creates heavy stresses on the system and GPUs. For immersive image generator system design, new NVIDIA technology has allowed these designs to be scaled to manageable and deliverable sizes. This talk will discuss how the GPU roadmap has allowed this to happen with low level flight examples, and source types that are being used for urban rendering.  Back
 
Keywords:
Real-Time Graphics, Visualization - Large Scale & Multi-Display, Manufacturing, GTC 2015 - ID S5258
Streaming:
Download:
 
Slicing the Workload: Multi-GPU Rendering Approaches
Ingo Esser (NVIDIA)
Since modern workstation applications become less CPU bound due to more efficient rendering pipelines, the GPU can become the bottleneck in a system and multi-GPU rendering can become an option to further speed up the rendering process. The first part of this talk will show the available tools for multi-GPU programming, including GPU-bound OpenGL contexts and functionality for synchronization and data transfer. The second part will dive into the details of designing a multi-threaded rendering pipeline which can be used to split up and distribute rendering tasks across a set of GPUs. Several split approaches and their resulting scaling behavior will be presented and discussed.
 
Keywords:
Real-Time Graphics, Media & Entertainment, Rendering & Ray Tracing, GTC 2015 - ID S5291
Streaming:
 
High-Quality Rasterization
Chris Wyman (NVIDIA)
We describe three new rendering algorithms that rasterize many samples per pixel, taking advantage of Maxwell GPU features to make images that are sharper and less aliased. "ACAA" is a simple variation of MSAA that uses less memory. "AGAA" brings MSAA quality to deferred rendering, while shading less than twice per pixel. And thirdly, "FTIZB" renders alias-free hard shadows with 32 samples per pixel at real-time speeds.  Back
 
Keywords:
Real-Time Graphics, Media & Entertainment, Rendering & Ray Tracing, Video & Image Processing, GTC 2015 - ID S5442
Streaming:
Download:
 
Massively-Parallel Vector Graphics
Diego Nehab (IMPA)
In this talk, we will describe the first massively-parallel vector graphics rendering pipeline. Traditional rendering methods draw shapes one after the other into an output image, or use sequential algorithms to build acceleration data structures before rendering all pixels in parallel. We present an acceleration data structure that can be built efficiently and in parallel for all input segments. We also show how to share samples between pixels in parallel to enable production-quality antialiasing filters and a large number of samples per pixel. The pipeline is feature-rich, and renders complex vector graphics with state-of-the-art quality and performance. The talk will be particularly interesting to researchers and developers who deal with rendering of complex 2D content.
 
Keywords:
Real-Time Graphics, Developer - Algorithms, GTC 2015 - ID S5578
Streaming:
Download:
 
WarThunder: Bringing WaveWorks Online
Tim Tcheblokov (NVIDIA)
Developing a game that works across multiple APIs and platforms and pushes graphics quality at the same time is no small feat. In this talk, we'll review how NVIDIA WaveWorks helped bring the graphics quality of WarThunder to the next level, fully u ...Read More
Developing a game that works across multiple APIs and platforms and pushes graphics quality at the same time is no small feat. In this talk, we'll review how NVIDIA WaveWorks helped bring the graphics quality of WarThunder to the next level, fully utilizing all capabilities of the PC platform.  Back
 
Keywords:
Real-Time Graphics, Developer - Algorithms, Game Development, GTC 2015 - ID S5669
Streaming:
Download:
 
NVIDIA VXGI: Dynamic Global Illumination for Games
Alexey Panteleev (NVIDIA)
VXGI is the new real-time dynamic global illumination technology from NVIDIA that can completely change the way that games look. We'll demonstrate the possibilities it provides, describe what is required to use VXGI in a rendering engine, and talk a ...Read More
VXGI is the new real-time dynamic global illumination technology from NVIDIA that can completely change the way that games look. We'll demonstrate the possibilities it provides, describe what is required to use VXGI in a rendering engine, and talk about the basics of the algorithm that is applied to compute indirect illumination, along with the limitations of this algorithm. We'll also show some techniques that you can use with VXGI's custom voxelization and cone tracing shaders.  Back
 
Keywords:
Real-Time Graphics, Game Development, GTC 2015 - ID S5670
Streaming:
Download:
 
Khronos API Standards Update: Including Vulkan, OpenCL 2.1 and SPIR-V
Neil Trevett (NVIDIA)
Discover how over 120 companies cooperate at the Khronos Group to create open, royalty free standards that enable developers to access the power of the GPU to accelerate demanding compute, graphics and vision applications. This session includes the v ...Read More
Discover how over 120 companies cooperate at the Khronos Group to create open, royalty free standards that enable developers to access the power of the GPU to accelerate demanding compute, graphics and vision applications. This session includes the very latest updates, including the newly announced Vulkan, SPIR-V and OpenCL 2.1 specifications.  Back
 
Keywords:
Real-Time Graphics, Web Acceleration, Computer Vision & Machine Vision, GTC 2015 - ID S5734
Streaming:
 
NVIDIA Turf Effects: Massive Grass Rendering With Dynamic Simulation
Evgeny Makarov (NVIDIA)
Rendering massive amount of grass is a challenging task. You have to deal with high geometry complexity and massive overdraw. Irregular grass blades distribution and assets variation requires efficient memory management and API state changes. Convinc ...Read More
Rendering a massive amount of grass is a challenging task. You have to deal with high geometry complexity and massive overdraw. Irregular grass blade distribution and asset variation require efficient memory management and API state changes. Convincing simulation is also required to achieve plausible rendering results. This talk focuses on the key aspects of the Turf Effects library, which provides a scalable solution for massive grass rendering. It shows how hardware tessellation can be paired with a geometrical representation of individual grass blades to support dense grass rendering with continuous level of detail and at various scales. The purely geometrical representation also plays an important role in advanced physical interaction, which includes collision with dynamic scene objects.  Back
 
Keywords:
Real-Time Graphics, Game Development, GTC 2015 - ID S5748
Streaming:
 
New GPU Features of NVIDIA's Maxwell Architecture
Alexey Panteleev (NVIDIA)
NVIDIA's GeForce® GTX 900-series GPUs, powered by NVIDIA Maxwell architecture, are the most power-efficient graphics cards on the planet. But Maxwell is also a trove of new and exciting graphics features that can be used to implement effects and ...Read More
NVIDIA's GeForce® GTX 900-series GPUs, powered by NVIDIA Maxwell architecture, are the most power-efficient graphics cards on the planet. But Maxwell is also a trove of new and exciting graphics features that can be used to implement effects and techniques not previously possible. In this talk, we'll discuss new functionality enabled by the Maxwell architecture, and examine practical ways to use those features.  Back
 
Keywords:
Real-Time Graphics, GTC 2015 - ID S5752
Streaming:
Download:
 
Delivering Workstation Class Graphics Anywhere with HP Remote Graphics Software (Presented by HP)
Annika Muehlbradt (HP)
HP Remote Graphics Software enables instant, secure access to graphics-rich application anywhere. With native Windows and Linux support, built in collaboration functionality, rock solid performance, and the ease and simplicity of deployment, HP RGS i ...Read More
HP Remote Graphics Software enables instant, secure access to graphics-rich applications anywhere. With native Windows and Linux support, built-in collaboration functionality, rock-solid performance, and ease and simplicity of deployment, HP RGS is the go-to remote protocol for workstation-class users. Join us for an interactive discussion highlighting the benefits of remote workstation access and a live demonstration of the latest software innovations in 3D graphics remoting.  Back
 
Keywords:
Real-Time Graphics, Graphics Virtualization, Media & Entertainment, GTC 2015 - ID S5824
Streaming:
 
Better Decisions in Moments of Truth with DTI 3D (Presented by DTI)
Tom Curtin (Dimension Technologies Inc (DTI))
Can 3D save lives? NASA believes it can. That's why they have been working with Dimension Technologies Inc. (DTI) on a glasses-free 3D display that can provide pilots and co-pilots with real-time information in 3D to improve their situational awaren ...Read More
Can 3D save lives? NASA believes it can. That's why they have been working with Dimension Technologies Inc. (DTI) on a glasses-free 3D display that can provide pilots and co-pilots with real-time information in 3D to improve their situational awareness. The goal is to make information easier and faster to comprehend, and improve decision making especially in moments of crisis when lives are on the line. This presentation will show you how DTI combines its unique Time Multiplexed Backlight integrated with an eye-tracking camera and a high-powered GPU. The result is a cockpit display that delivers full-resolution mission-critical 3D to two viewers at the same time -- without glasses, without restrictions in head movement, without moirés, and without compromising 2D image quality.  Back
 
Keywords:
Real-Time Graphics, Big Data Analytics, Emerging Companies Summit, Media & Entertainment, GTC 2015 - ID S5836
Streaming:
Download:
Rendering & Ray Tracing
Presentation
Media
GPU-Accelerated Spectral Caustic Rendering of Homogeneous Caustic Objects
Budianto Tandianus (Nanyang Technological University)
We propose a two-step acceleration scheme for spectral caustics rendering that takes into account information across visible wavelengths of the scene, index of refraction (caustic object), light power, and material reflectance (surface). In the first ...Read More
We propose a two-step acceleration scheme for spectral caustics rendering that takes into account information across visible wavelengths of the scene, index of refraction (caustic object), light power, and material reflectance (surface). In the first step, we analyze the index of refraction and we cluster the wavelengths based on refraction direction similarity in order to reduce the intersection tests. In the second step, we consider the surrounding objects' properties (material reflectance and light power) and we compute the refinement amount of each wavelength cluster. Our accelerated algorithm can produce rendering results close to the reference images with a significant acceleration. We implement our two acceleration schemes by using OptiX, a GPU rendering engine built on top of CUDA.  Back
 
Keywords:
Rendering & Ray Tracing, GTC 2015 - ID S5210
Streaming:
Download:
 
A Feasibility Study of Ray Tracing on Mobile GPUs
Yangdong Deng (Tsinghua University)
Ray tracing is considered to be a promising technology for enhancing visual experience of future graphics applications. This work investigates the feasibility of ray tracing on mobile GPUs exemplified by Tegra and PowerVR series. A ray tracer was dev ...Read More
Ray tracing is considered to be a promising technology for enhancing the visual experience of future graphics applications. This work investigates the feasibility of ray tracing on mobile GPUs exemplified by the Tegra and PowerVR series. A ray tracer was developed by integrating state-of-the-art construction and traversal algorithms. We then performed a detailed characterization of the ray tracing workload in terms of runtime, memory usage, and power consumption on both NVIDIA Tegra K1 and PowerVR SGX 544-MP3 GPUs. The results are compared against mobile CPU and desktop GPU implementations. The results show that the Tegra K1 GPU already allows constructing the acceleration structure of a 1M-triangle scene in ~100ms and performing traversal at a throughput of up to 70 million rays per second.  Back
 
Keywords:
Rendering & Ray Tracing, Embedded Systems, Real-Time Graphics, GTC 2015 - ID S5214
Streaming:
 
VMD: Publication-Quality Ray Tracing of Molecular Graphics with OptiX
John Stone (University of Illinois at Urbana-Champaign)
This session will describe the adaptation of the popular molecular graphics program VMD to support both batch and interactive ray tracing using NVIDIA OptiX, on computers ranging from laptops all the way up to large scale Cray XK7 supercomputers such ...Read More
This session will describe the adaptation of the popular molecular graphics program VMD to support both batch and interactive ray tracing using NVIDIA OptiX, on computers ranging from laptops all the way up to large scale Cray XK7 supercomputers such as Blue Waters and Titan. We will describe the benefits of custom VMD-specific geometric primitives and memory layouts, and relate our experiences adapting the Tachyon CPU-based ray tracing engine used by VMD, to NVIDIA's OptiX GPU ray tracing framework. The session will present performance data for workstation and supercomputer class visualizations, integration of OptiX into VMD, interactive ray tracing, many example movies and visualizations, and avenues for further improvement.  Back
 
Keywords:
Rendering & Ray Tracing, Visualization - In-Situ & Scientific, Media & Entertainment, GTC 2015 - ID S5386
Streaming:
Download:
 
Custom Iray Applications and MDL for Consistent Visual Appearance Throughout Your Pipeline
Dave Hutchinson (Lightworks), Dave Coldron (Lightworks)
Take a tour through the possibilities that Iray physically based visualization and GPU scaling can unlock for your interactive photoreal applications and workflows. We demonstrate how Iray features and technology can be integrated and exposed within ...Read More
Take a tour through the possibilities that Iray physically based visualization and GPU scaling can unlock for your interactive photoreal applications and workflows. We demonstrate how Iray features and technology can be integrated and exposed within your existing digital tools, like the new breakthrough Iray+ for 3DSMax plug-in. Iray can enable your entire workflow from design and validation through marketing and consumer experiences with the same consistent photorealistic MDL powered visualization. Whether you want to build custom standalone applications and integrations, or use remote visualization to enable mobile, collaborative or cloud workflows, you will leave this presentation with a very clear view on what your next steps need to be to achieve your Iray goals.  Back
 
Keywords:
Rendering & Ray Tracing, Developer - Tools & Libraries, Manufacturing, Media & Entertainment, GTC 2015 - ID S5409
Streaming:
 
Accelerad: Daylight Simulation for Architectural Spaces Using GPU Ray Tracing
Nathaniel Jones (MIT)
This talk introduces Accelerad, a simulation tool for modeling naturally and artificially lit spaces using NVIDIA® OptiX ray tracing engine. Three challenges encountered in implementing physically-based ray tracing on the GPU are presented: (1) ...Read More
This talk introduces Accelerad, a simulation tool for modeling naturally and artificially lit spaces using the NVIDIA® OptiX ray tracing engine. Three challenges encountered in implementing physically-based ray tracing on the GPU are presented: (1) the need for large numbers of bounces, which leads to poor warp coherence; (2) the use of irradiance caching, which does not naturally lend itself to parallelism; and (3) the need for validation against physical measurement. The solutions implemented in Accelerad are described, along with test results showing that Accelerad achieves accuracy comparable to current best simulation practices in the building industry while running at speeds up to fifty times faster.  Back
 
Keywords:
Rendering & Ray Tracing, Visualization - In-Situ & Scientific, Computational Physics, GTC 2015 - ID S5416
Streaming:
Download:
 
Browser-Based 3D Presentation Platform for AEC & MFG Made Easy (Presented by CL3VER)
Viktor Nordstrom (CL3VER)
In this session, you will learn how a cloud based authoring platform is used to create immersive, interactive 3D presentations for the web and mobile devices using existing CAD & Multi-media data. CL3VER presentations are completely interactive a ...Read More
In this session, you will learn how a cloud based authoring platform is used to create immersive, interactive 3D presentations for the web and mobile devices using existing CAD & Multi-media data. CL3VER presentations are completely interactive and NVIDIA GPU accelerated, enabling architects and manufacturers to engage new clients and stakeholders in a very compelling and immersive environment. Attendees will learn about case studies of real world deployments across various Industry applications and the challenges faced when deploying this technology. If you're looking to create awesome 3D Projects on any desktop browser or tablet, this session is a must see.  Back
 
Keywords:
Rendering & Ray Tracing, Emerging Companies Summit, Manufacturing, Real-Time Graphics, GTC 2015 - ID S5812
Streaming:
 
Sharing Physically Based Materials Between Renderers with MDL
Jan Jordan (NVIDIA), Lutz Kettner (NVIDIA)
The basics of NVIDIA's Material Definition Language (MDL) will be discussed, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically-based def ...Read More
The basics of NVIDIA's Material Definition Language (MDL) will be discussed, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically-based definitions can be defined while developers will learn what's entailed in supporting MDL within their own product/renderer.  Back
 
Keywords:
Rendering & Ray Tracing, Product Design & Styling, Media & Entertainment, GTC 2015 - ID S5190
Streaming:
Download:
 
Innovations in OptiX
David McAllister (NVIDIA)
OptiX is the industry's premier ray tracing engine in terms of performance, functionality, and adoption. We will present three recent advances in OptiX. First, the renovation of the core of OptiX, including using an LLVM-based compiler pipeline, whi ...Read More
OptiX is the industry's premier ray tracing engine in terms of performance, functionality, and adoption. We will present three recent advances in OptiX. First, the renovation of the core of OptiX, including using an LLVM-based compiler pipeline, which brings several performance benefits and opens the door for long-desired new features. Second, the OptiX VCA allows OptiX-based applications to transparently use NVIDIA Visual Computing Appliance for massively parallel, shared, remote rendering. Third, we will share exciting results of our top partners and their recent successes with OptiX.  Back
 
Keywords:
Rendering & Ray Tracing, Product Design & Styling, Media & Entertainment, GTC 2015 - ID S5246
Streaming:
Download:
 
NVIDIA Material Definition Language: A Sneak Peek at the MDL Handbook
Andy Kopra (NVIDIA)
This tutorial will describe the MDL Handbook currently in development by NVIDIA, the Handbook's strategy for explaining appearance models as used in rendering software, and how you can use the current Handbook materials now available through the Web ...Read More
This tutorial will describe the MDL Handbook currently in development by NVIDIA, the Handbook's strategy for explaining appearance models as used in rendering software, and how you can use the current Handbook materials now available through the Web to begin to learn MDL. The Handbook is designed for an audience of varying skill sets and experience, providing historical and technical background for important topics in physically based rendering, as well as practical examples of MDL code that will be useful in actual design and production environments. Artists, designers, and software engineers will find this tutorial a useful first look at how MDL describes physical appearance and the implications it can have on their working methods and final products.  Back
 
Keywords:
Rendering & Ray Tracing, Product Design & Styling, Media & Entertainment, GTC 2015 - ID S5303
Streaming:
Download:
 
Bringing Physically Based Rendering to your Application
Martin-Karl Lefrancois (NVIDIA)
A study in how key applications have incorporated NVIDIA® Iray® and how you can do the same. The presentation begins with an overview of NVIDIA® Iray®. We then will show examples of NVIDIA® Iray® integrations, an overview of v ...Read More
A study of how key applications have incorporated NVIDIA® Iray® and how you can do the same. The presentation begins with an overview of NVIDIA® Iray®. We will then show examples of NVIDIA® Iray® integrations, an overview of the various supported geometry types, an introduction to MDL, and rendering using the NVIDIA® Visual Computing Appliance. Come to this session to learn how to create a custom rendering experience that fits your application.  Back
 
Keywords:
Rendering & Ray Tracing, Product Design & Styling, Media & Entertainment, GTC 2015 - ID S5536
Streaming:
Download:
 
Advanced Rendering Solutions from NVIDIA
Phillip Miller (NVIDIA)
Learn about the latest breakthroughs and offerings in NVIDIA's Advanced Rendering Solutions, which scale smoothly from local GPU rendering to remote super-computer clusters. New capabilities and possibilities in Iray® and mental ray® will be ...Read More
Learn about the latest breakthroughs and offerings in NVIDIA's Advanced Rendering Solutions, which scale smoothly from local GPU rendering to remote supercomputer clusters. New capabilities and possibilities in Iray® and mental ray® will be explored and demonstrated, along with what's possible with the latest in NVIDIA OptiX for accelerating custom ray tracing development. Industry trends and production examples will also be explored, as advances in both interactive and production rendering continue to revolutionize workflows.  Back
 
Keywords:
Rendering & Ray Tracing, Product Design & Styling, Media & Entertainment, GTC 2015 - ID S5643
Streaming:
 
Flexible Cluster Rendering with NVIDIA VCA
Phillip Miller (NVIDIA), Ankit Patel (NVIDIA)
Learn how NVIDIA Visual Computing Appliances (VCA) are enabling a wide variety of rendering solutions to scale across hundreds of GPUs and stream their results back for interactive sessions of unprecedented performance. Commercial solutions employing ...Read More
Learn how NVIDIA Visual Computing Appliances (VCA) are enabling a wide variety of rendering solutions to scale across hundreds of GPUs and stream their results back for interactive sessions of unprecedented performance. Commercial solutions employing Iray, VRay-RT, and OptiX will all be shown working with a remote cluster of VCAs. The mechanics of supporting the VCA from applications, managing clusters, and possibilities for streaming will also be explored.  Back
 
Keywords:
Rendering & Ray Tracing, Product Design & Styling, Visualization - Large Scale & Multi-Display, GTC 2015 - ID S5644
Streaming:
 
Easy Photorealism with NVIDIA Iray
Phillip Miller (NVIDIA)
Come learn how you can create stunning photorealistic imagery and animations with interactive ease by employing Iray within your favorite 3D applications. A full spectrum of Iray possibilities will be discussed working within tools like 3d Max, Cinem ...Read More
Come learn how you can create stunning photorealistic imagery and animations with interactive ease by employing Iray within your favorite 3D applications. A full spectrum of Iray possibilities will be discussed within tools like 3ds Max, Cinema4D, Maya, Revit, and Rhino, each unfolding the latest capabilities of the new Iray 2015 framework. Distributed rendering to local machines or powerful VCA clusters will also be explored. The cross-use of material and light descriptions between applications will also be demonstrated.  Back
 
Keywords:
Rendering & Ray Tracing, Product Design & Styling, Media & Entertainment, GTC 2015 - ID S5645
Streaming:
Signal & Audio Processing
Presentation
Media
Implementing Radar Algorithms on CUDA Hardware
Pietro Monsurro (University of Rome "Sapienza")
This talk investigates the implementation of radar algorithms on GPUs. The focus is on electronically scanned search radars. GPUs enable us to develop high performance digital processing systems with limited development time. It is possible to employ ...Read More
This talk investigates the implementation of radar algorithms on GPUs. The focus is on electronically scanned search radars. GPUs enable us to develop high performance digital processing systems with limited development time. It is possible to employ a single commercial board to perform all the algorithms of a search radar including downconversion, amplitude/phase correction, pulse compression, beam forming, spectrum analysis, and CFAR noise floor estimation.  Back
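As a hedged illustration of one stage named above, pulse compression can be implemented as a frequency-domain matched filter with cuFFT: forward-transform the received samples, multiply by the (pre-conjugated) reference pulse spectrum, and inverse-transform. The kernel and buffer names below are illustrative assumptions, not taken from the talk.

```cpp
// Sketch of frequency-domain pulse compression (matched filtering) with cuFFT.
// Assumes d_signal and d_refSpectrum (conjugated reference, same length) already
// reside on the device; names are illustrative.
#include <cufft.h>
#include <cuda_runtime.h>

__global__ void complexMultiply(cufftComplex* sig, const cufftComplex* ref, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cufftComplex a = sig[i], b = ref[i];
        sig[i].x = a.x * b.x - a.y * b.y;   // complex product: signal * conj(ref)
        sig[i].y = a.x * b.y + a.y * b.x;   // (reference spectrum assumed pre-conjugated)
    }
}

void pulseCompress(cufftComplex* d_signal, const cufftComplex* d_refSpectrum, int n) {
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);
    cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);           // to frequency domain
    complexMultiply<<<(n + 255) / 256, 256>>>(d_signal, d_refSpectrum, n);
    cufftExecC2C(plan, d_signal, d_signal, CUFFT_INVERSE);           // back to time domain
    cufftDestroy(plan);                                              // result is unnormalized (scale by 1/n if needed)
}
```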
 
Keywords:
Signal & Audio Processing, Developer - Performance Optimization, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5270
Streaming:
Download:
 
Memory-Efficient Heterogeneous Speech Recognition Hybrid in the GPU-Equipped Mobile Devices
Alexei V. Ivanov (Verbumware Inc.)
Weighted Finite State Transducer (WFST)-based speech recognition systems permit their efficient implementation within the GPU computational paradigm. Our previous research has shown that speech recognition with GPUs can be done in a fast, accurate an ...Read More
Weighted Finite State Transducer (WFST)-based speech recognition systems permit their efficient implementation within the GPU computational paradigm. Our previous research has shown that speech recognition with GPUs can be done in a fast, accurate and power-efficient manner. However, completely compiled non-trivial WFSTs are too bulky to fit into a memory footprint of a typical mobile device. This problem represents the most fundamental obstacle in front of proliferation of the autonomous mobile-based speech recognition technology. In this presentation we're going to demonstrate a way to overcome this difficulty. A Tegra K1 device equipped with 2 GB of RAM will do autonomous recognition of English speech in a mid-sized vocabulary (20K words) task, defined by a tri-gram language model.  Back
 
Keywords:
Signal & Audio Processing, Machine Learning & Deep Learning, GTC 2015 - ID S5296
Streaming:
Download:
 
GPU-Based GPS Signal Generator: Low Cost and High Bandwidth Alternative
Iva Bartunkova (University of Federal Armed Forces Munich)
GPS, Galileo and other GNSS signal generation on GPU can be a new low cost alternative to standard signal simulators used for verification and testing of GPS receivers. Standard simulators generate multiple narrow band signals with dedicated hardware ...Read More
GPS, Galileo, and other GNSS signal generation on the GPU can be a new low-cost alternative to standard signal simulators used for verification and testing of GPS receivers. Standard simulators generate multiple narrow-band signals with dedicated hardware for synchronization and up-conversion. This presentation introduces a new concept of broadband signal generation covering all GPS bands in one stream. The implementation was done on a gaming-level PC system with two GeForce GPUs and reached a performance of a 1 GHz sample rate for 4 GNSS service bands. Algorithms for digital signal generation of GPS and Galileo signals will be presented, together with techniques for transferring the generated data to reach real-time performance and high signal quality.  Back
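A minimal sketch of the kind of per-sample work such a generator parallelizes: each thread computes one sample of a code-modulated carrier. The chip rate, carrier frequency, and spreading-code handling below are placeholder assumptions, not the presenters' implementation.

```cpp
// Illustrative generation of a BPSK code-modulated carrier, one sample per thread.
// code[] holds +1/-1 chips; fs, fCarrier, and chipRate are assumed example parameters.
#include <cuda_runtime.h>
#include <math.h>

__global__ void generateSamples(float* out, const float* code, int numChips,
                                int numSamples, float fs, float fCarrier, float chipRate) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= numSamples) return;
    float t = n / fs;                                   // sample time
    int chip = (int)(t * chipRate) % numChips;          // which spreading-code chip applies
    out[n] = code[chip] * cosf(2.0f * 3.14159265f * fCarrier * t);
}
```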
 
Keywords:
Signal & Audio Processing, Developer - Performance Optimization, GTC 2015 - ID S5359
Streaming:
Download:
 
Real-Time Telemetry Group Variant of Shaped Offset Quadrature Phase Shift Keying (SOQPSK-TG) Communications with CUDA
Andrew McMurdie (Brigham Young University)
In this session we discuss our CUDA implementation of frame synchronization, frequency offset estimation, channel equalization, and demodulation for integrated Network Enhanced Telemetry(iNET) formatted SOQPSK-TG communications. Application is aerona ...Read More
In this session we discuss our CUDA implementation of frame synchronization, frequency offset estimation, channel equalization, and demodulation for integrated Network Enhanced Telemetry (iNET)-formatted SOQPSK-TG communications. The application is aeronautical telemetry downlinks. Algorithmic improvements yielding better parallelization allow us to receive and process samples in real time at a sample rate greater than 20 Mb/s. Multiple channel equalizers are implemented and tested to produce multiple output bit streams. Bit-error rates for tests with real data are presented, showing that the system can efficiently equalize and process the data.  Back
 
Keywords:
Signal & Audio Processing, Developer - Performance Optimization, Developer - Algorithms, GTC 2015 - ID S5448
Streaming:
Download:
 
Accelerated SWT-Based EEG Denoising Technique to Correct the Ocular Artifact
Mahesh Khadtare (I2IT, Pune, IN), Pragati Dharmale (SNHU, NH)
This session introduces algorithmic implementation of Stationary Wavelet Transform which is different than traditional wavelet transform. We showcase the fast lifting transform approach lead to data parallel implementation and boost the performance o ...Read More
This session introduces an algorithmic implementation of the Stationary Wavelet Transform (SWT), which differs from the traditional wavelet transform. We show how the fast lifting transform approach leads to a data-parallel implementation and boosts performance on the GPU. We demonstrate denoising of the EEG signal to remove the ocular artifact, and how effective EEG analysis can then be performed for BCI computation. The session focuses on the algorithm and its GPU implementation.  Back
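To make the lifting idea concrete, here is a hedged sketch of one data-parallel Haar lifting step (predict, then update) on a 1D signal; it is a generic illustration, not the presenters' SWT code.

```cpp
// One Haar lifting step: each thread handles one even/odd sample pair.
// Generic illustration of the data-parallel lifting approach.
__global__ void haarLiftStep(const float* in, float* approx, float* detail, int nPairs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPairs) return;
    float even = in[2 * i];
    float odd  = in[2 * i + 1];
    float d = odd - even;          // predict: detail coefficient
    float s = even + 0.5f * d;     // update: approximation coefficient
    detail[i] = d;
    approx[i] = s;
}
```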
 
Keywords:
Signal & Audio Processing, Developer - Algorithms, Medical Imaging, GTC 2015 - ID S5505
Streaming:
Download:
 
An Open Architecture, Server-Based Solution for Next Generation Electronic Warfare System
Gregory Gannet (Lockheed Martin, Mission Systems and Training)
Military Electronic Warfare (EW) systems include electronic surveillance subsystems that are required to instantaneously process wide bandwidths of the RF spectrum to meet their requirements of providing situational awareness to both the warfighter a ...Read More
Military Electronic Warfare (EW) systems include electronic surveillance subsystems that are required to instantaneously process wide bandwidths of the RF spectrum to meet their requirements of providing situational awareness to both the warfighter and other combat systems. GPUs provide an exciting, dynamic target processor option for the pulse detection and pulse measurement processing that is at the core of EW systems. Lockheed Martin has successfully incorporated GPU technology into a server based EW system demonstrating that GPUs provide a cost-effective hardware solution that is developer friendly and readily upgradable and scalable to mitigate the challenges associated with an ever evolving threat environment. This talk provides an overview of the challenges and the solutions of the open system architecture, as well as a look at our current performance benchmarks.  Back
 
Keywords:
Signal & Audio Processing, GTC 2015 - ID S5810
Streaming:
Download:
Supercomputing
Presentation
Media
Multi GPU Programming with MPI
Jiri Kraus (NVIDIA)
In this session you will learn how to program multie GPU systems or GPU clusters using the Message Passing Interface (MPI) and OpenACC or CUDA. The session starts by giving a quick introduction to MPI and how it can be combined with OpenACC or CUDA a ...Read More
In this session you will learn how to program multi-GPU systems or GPU clusters using the Message Passing Interface (MPI) and OpenACC or CUDA. The session starts with a quick introduction to MPI and how it can be combined with OpenACC or CUDA, and also covers advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. The latest improvements in CUDA-aware MPI, the Multi-Process Service (MPS, aka Hyper-Q for MPI), and MPI support in the NVIDIA performance analysis tools are covered.  Back
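A minimal sketch of the CUDA-aware MPI idea covered here: device pointers are passed straight to MPI calls, and a CUDA-aware MPI build (e.g. MVAPICH2 or OpenMPI with CUDA support) handles the staging or GPUDirect transfer. The GPU-per-node count and the neighbor exchange are illustrative assumptions.

```cpp
// Sketch: exchanging GPU buffers directly through a CUDA-aware MPI library.
// Requires an MPI build with CUDA support; the ring exchange is illustrative.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    cudaSetDevice(rank % 4);                    // assumed 4 GPUs per node

    const int n = 1 << 20;
    float *d_send, *d_recv;
    cudaMalloc(&d_send, n * sizeof(float));
    cudaMalloc(&d_recv, n * sizeof(float));

    int right = (rank + 1) % size, left = (rank - 1 + size) % size;
    // Device pointers go directly into MPI calls with a CUDA-aware MPI.
    MPI_Sendrecv(d_send, n, MPI_FLOAT, right, 0,
                 d_recv, n, MPI_FLOAT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_send); cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}
```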
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5117
Streaming:
Download:
 
Breakthrough Science on GPU Clusters
John Taylor (CSIRO), Tomasz Bednarz (CSIRO)
This presentation will outline CSIRO's accelerated computing strategy, its development and its achievements over the past 5 years. We will provide a detailed description of the accelerated computing facility. Experiences with implementing and managi ...Read More
This presentation will outline CSIRO's accelerated computing strategy, its development and its achievements over the past 5 years. We will provide a detailed description of the accelerated computing facility. Experiences with implementing and managing the facility will be discussed. Examples of the accelerated computing program's projects, which partner computational scientists with science teams, will be presented. Finally, we will consider the future directions of CSIRO's accelerated computing strategy, including the accelerated computing facility and its associated programs. Steve McMahon, Solution Architect and Senior Systems Administrator at CSIRO, is a co-author of this talk.  Back
 
Keywords:
Supercomputing, Big Data Analytics, Life & Material Science, GTC 2015 - ID S5120
Streaming:
 
A CUDA Implementation of the High Performance Conjugate Gradient (HPCG) Benchmark
Everett Phillips (NVIDIA)
This talk will present the details of a CUDA implementation of the HPCG benchmark, including key optimization strategies and performance results on a wide range of GPU systems: from the smallest CUDA capable platform - the Jetson TK1, to the largest ...Read More
This talk will present the details of a CUDA implementation of the HPCG benchmark, including key optimization strategies and performance results on a wide range of GPU systems: from the smallest CUDA capable platform - the Jetson TK1, to the largest GPU supercomputers - Titan (Cray XK7 at ORNL) and Piz Daint (Cray XC30 at CSCS). HPCG was recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. HPCG solves a large sparse linear system of equations using a multigrid preconditioned conjugate gradient algorithm, and is designed to represent modern application workloads.  Back
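For context on the workload, the core of the preconditioned conjugate gradient iteration is a sparse matrix-vector product; a simple, untuned CSR SpMV kernel looks like the sketch below. This is a generic baseline for illustration, not the optimized implementation described in the talk.

```cpp
// Naive CSR sparse matrix-vector product, one row per thread.
// Generic baseline for the SpMV at the heart of CG; not the talk's tuned code.
__global__ void spmvCsr(int numRows, const int* rowPtr, const int* colIdx,
                        const double* vals, const double* x, double* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= numRows) return;
    double sum = 0.0;
    for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
        sum += vals[j] * x[colIdx[j]];
    y[row] = sum;
}
```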
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, Developer - Algorithms, GTC 2015 - ID S5185
Streaming:
 
StarPU: Programming for Heterogeneous MultiGPU Systems
Joao Gazolla (Universidade Federal Fluminense), Esteban Clua (Universidade Federal Fluminense)
Learn implementation techniques for using Heterogenous MultiGPU Systems. We will show you an overview of the framework and teach you how to exploit the starPU power a unified run-time system for heterogeneous multi-core architectures that gives a uni ...Read More
Learn implementation techniques for using heterogeneous multi-GPU systems. We will give an overview of the framework and teach you how to exploit the power of StarPU, a unified run-time system for heterogeneous multi-core architectures that gives a unified view of the computational resources. Attendees will learn how to use the framework through a strategy that includes code examples and programming demonstrations.  Back
 
Keywords:
Supercomputing, Developer - Performance Optimization, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5252
Streaming:
Download:
 
A Simulation of Global Atmosphere Model NICAM on TSUBAME2.5 Using OpenACC
Hisashi Yashiro (RIKEN Advanced Institute for Computational Science)
OpenACC was applied to the a global high-resolution atmosphere model named Nonhydrostatic ICosahedral Atmospheric Model (NICAM). We succeed the execution of the dynamical core test without re-writing any specific kernel subroutines for GPU execution. ...Read More
OpenACC was applied to a global high-resolution atmosphere model named the Nonhydrostatic ICosahedral Atmospheric Model (NICAM). We succeeded in executing the dynamical core test without rewriting any specific kernel subroutines for GPU execution. Only 5% of the lines of source code were modified, demonstrating good portability. The performance and scalability were evaluated using the TSUBAME2.5 supercomputer. The results showed that the kernels generated by OpenACC achieved good performance, appropriate to the memory performance of the GPU, as well as weak scalability. A large-scale simulation was carried out using 2560 GPUs, achieving 60 TFLOPS.  Back
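To illustrate the directive-based approach (shown in C++ here rather than the model's Fortran), an existing loop can be offloaded with a single OpenACC directive and compiled with an OpenACC-capable compiler; the loop body is a placeholder, not a NICAM kernel.

```cpp
// Hedged illustration of OpenACC offloading: annotate an existing loop and let
// the compiler (e.g. nvc++ -acc) generate the GPU kernel. Placeholder computation.
void scaleAndShift(int n, const double* in, double* out, double a, double b) {
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = a * in[i] + b;
    }
}
```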
 
Keywords:
Supercomputing, OpenACC, Computational Physics, GTC 2015 - ID S5297
Streaming:
Download:
 
Recovering Structural Information about Nanoparticle Systems
Abhinav Sarje (Lawrence Berkeley National Laboratory)
The inverse modeling problem of recovering nanostructures from X-ray scattering data obtained through experiments at light-source synchrotrons is an ideal example of a Big Data and Big Compute application. This session will give an introduction and o ...Read More
The inverse modeling problem of recovering nanostructures from X-ray scattering data obtained through experiments at light-source synchrotrons is an ideal example of a Big Data and Big Compute application. This session will give an introduction to and overview of this problem and its solutions as developed at Berkeley Lab. X-ray-scattering-based extraction of structural information from material samples is an important tool applicable to numerous applications, such as the design of energy-relevant nano-devices. We exploit the parallelism available in clusters of GPUs to gain efficiency in the reconstruction process. To develop a solution, we apply Particle Swarm Optimization (PSO) in a massively parallel fashion, develop high-performance codes, and analyze their performance.  Back
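As a hedged sketch of how a PSO update maps onto the GPU, each thread can update one particle-dimension entry from precomputed random draws; the coefficient names, array layout, and use of pre-generated randoms (e.g. from cuRAND) are assumptions, not the Berkeley Lab code.

```cpp
// One PSO velocity/position update, one thread per (particle, dimension) entry.
// r1/r2 hold uniform random numbers generated beforehand; layout is illustrative.
__global__ void psoUpdate(float* pos, float* vel, const float* pbest, const float* gbest,
                          const float* r1, const float* r2, int nParticles, int dim,
                          float w, float c1, float c2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;       // flat (particle, dimension) index
    if (i >= nParticles * dim) return;
    int d = i % dim;                                     // dimension index into the global best
    vel[i] = w * vel[i]
           + c1 * r1[i] * (pbest[i] - pos[i])
           + c2 * r2[i] * (gbest[d] - pos[i]);
    pos[i] += vel[i];
}
```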
 
Keywords:
Supercomputing, Computational Physics, Life & Material Science, GTC 2015 - ID S5326
Streaming:
Download:
 
Porting CloverLeaf to CUDA Fortran
Greg Ruetsch (NVIDIA)
This talk will discuss aspects of porting the CloverLeaf hydrodynamics code to CUDA Fortran. In particular, the use of unified or managed memory in CUDA Fortran is discussed in the context of the CloverLeaf code as well as in general code development ...Read More
This talk will discuss aspects of porting the CloverLeaf hydrodynamics code to CUDA Fortran. In particular, the use of unified or managed memory in CUDA Fortran is discussed in the context of the CloverLeaf code as well as in general code development. The use of the read-only data cache from CUDA Fortran is also discussed, as well as the use of new reduction intrinsics.  Back
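The talk itself uses CUDA Fortran; as a rough analogue in CUDA C++, managed (unified) memory lets the same allocation be touched from host and device without explicit copies, as in this hedged sketch.

```cpp
// CUDA C++ analogue of managed/unified memory: one allocation visible to host and device.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(double* a, int n, double s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

int main() {
    const int n = 1 << 20;
    double* a = nullptr;
    cudaMallocManaged(&a, n * sizeof(double));   // accessible from CPU and GPU
    for (int i = 0; i < n; ++i) a[i] = 1.0;      // initialize on the host, no cudaMemcpy
    scale<<<(n + 255) / 256, 256>>>(a, n, 2.0);
    cudaDeviceSynchronize();                     // required before the host reads results
    printf("a[0] = %f\n", a[0]);
    cudaFree(a);
    return 0;
}
```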
 
Keywords:
Supercomputing, GTC 2015 - ID S5379
Streaming:
Download:
 
GPUDirect: Integrating the GPU with a Network Interface
Davide Rossetti (NVIDIA)
In the GPU off-loading programming model, the CPU is the initiator, e.g. it prepares and orchestrates work for the GPU. In GPU-accelerated multi-node programs, the CPU has to do the same for the network interface as well. But the truth is that both t ...Read More
In the GPU off-loading programming model, the CPU is the initiator, i.e. it prepares and orchestrates work for the GPU. In GPU-accelerated multi-node programs, the CPU has to do the same for the network interface as well. But the truth is that both the GPU and the network have sophisticated hardware resources, and these can be effectively short-circuited so as to get rid of the CPU altogether. Meet PeerSync, a set of CUDA-InfiniBand Verbs interoperability APIs that opens an unlimited number of possibilities. It also provides a scheme to go beyond the GPU-network duo, i.e. effectively applying the same ideas to other third-party devices.  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5412
Streaming:
Download:
 
Lesson Learned Using GPU Direct over RDMA on Production Heterogeneous Clusters
Filippo Spiga (High Performance Computing Service, University of Cambridge)
GPUDirect over RDMA is a technology that provides a direct data path between the GPU memory directly to/from the Infiniband card. One year ago we deployed Wilkes, a system designed to exploit GPU Direct from each of the two GPU cards in a node. The s ...Read More
GPUDirect over RDMA is a technology that provides a direct data path between GPU memory and the InfiniBand card. One year ago we deployed Wilkes, a system designed to exploit GPUDirect from each of the two GPU cards in a node. The software stack has evolved to a point where application owners can start being productive and leverage this functionality without bothering with architectural details. The aim of this talk is to present the experience gained in deploying and running a system based on commodity hardware that can take full advantage of GPUDirect. We will show examples used for training purposes, some best practices for running GDR-ready applications, and a few examples of improvements in performance and scalability for applications co-developed on Wilkes.  Back
 
Keywords:
Supercomputing, Developer - Tools & Libraries, GTC 2015 - ID S5426
Streaming:
 
High-Performance Molecular Simulation With GROMACS: Heterogeneous Acceleration on x86, ARM & Power
Erik Lindahl (KTH Royal Institute of Technology)
This session will showcase how the latest CUDA devices have expanded beyond x86 in high performance computing (HPC), and are enabling new combinations with power-efficient ARM or extreme-performance Power processors. In particular we will describe th ...Read More
This session will showcase how the latest CUDA devices have expanded beyond x86 in high performance computing (HPC), and are enabling new combinations with power-efficient ARM or extreme-performance Power processors. In particular we will describe the challenges in accelerating our molecular simulation code GROMACS, combined with general HPC conclusions. We will cover challenges and advantages compared to x86 and discuss strategies for scheduling and partitioning work over wide ranges of GPU & CPU hardware, in particular for heterogeneous acceleration, large-scale parallelization, and achieving outstanding scientific code performance. The registrants should ideally have some experience from scientific computing and/or biomolecular simulation.  Back
 
Keywords:
Supercomputing, Life & Material Science, GTC 2015 - ID S5434
Streaming:
Download:
 
Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand
Dhabaleswar K. (DK) Panda (The Ohio State University), Khaled Hamidouche (The Ohio State University)
Learn about the latest developments in MVAPICH2 library that simplifies the task of porting Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory ...Read More
Learn about the latest developments in MVAPICH2 library that simplifies the task of porting Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit. Various optimizations are integrated transparently under standard MPI API, for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using GPUDirect RDMA, usage of fast GDRCOPY library, framework for MPI Datatype processing using CUDA kernels, and more. Performance results with micro-benchmarks and applications will be presented using MPI and CUDA/OpenACC. Impact of processor affinity to GPU and network affecting the performance will be presented.  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, Developer - Tools & Libraries, GTC 2015 - ID S5461
Streaming:
 
Enabling Efficient Use of UPC and OpenSHMEM PGAS Models on GPU Clusters
Dhabaleswar K. (DK) Panda (The Ohio State University)
Learn about extensions that enable efficient use of Partitioned Global Address Space (PGAS) Models like OpenSHMEM and UPC on supercomputing clusters with NVIDIA GPUs. PGAS models are gaining attention for providing shared memory abstractions that mak ...Read More
Learn about extensions that enable efficient use of Partitioned Global Address Space (PGAS) models like OpenSHMEM and UPC on supercomputing clusters with NVIDIA GPUs. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop applications with dynamic and irregular communication patterns. However, the existing UPC and OpenSHMEM standards do not allow communication calls to be made directly on GPU device memory. This talk discusses simple extensions to the OpenSHMEM and UPC models to address this issue. Runtimes that support these extensions, optimize data movement using features like CUDA IPC and GPUDirect RDMA, and exploit overlap are presented. We demonstrate the use of the extensions and the performance impact of the runtime designs.  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, Developer - Tools & Libraries, GTC 2015 - ID S5470
Streaming:
Download:
 
Tackling Performance Bottlenecks in the Diversifying CUDA HPC Ecosystem: A Molecular Dynamics Perspective
Szilárd Páll (KTH Royal Institute of Technology)
The rapid evolution of CUDA GPU architecture and the new heterogenous platforms that break the hegemony of x86 offer opportunities for performance optimizations, but also pose challenges for scalable heterogeneous parallelization of the GROMACS molec ...Read More
The rapid evolution of the CUDA GPU architecture and the new heterogeneous platforms that break the hegemony of x86 offer opportunities for performance optimization, but also pose challenges for scalable heterogeneous parallelization of the GROMACS molecular simulation package. This session will present our latest efforts to harness recent CUDA architectures to improve the algorithmic efficiency and performance of our molecular dynamics kernels. We will also discuss load-balancing and latency-hiding challenges emphasized by the expansion of GPU-accelerated platforms with CPUs ranging from power-optimized ARM architectures to extreme-performance, highly multi-threaded Power and Xeon CPUs. Come learn about our experiences in developing portable, heterogeneous, high-performance code!  Back
 
Keywords:
Supercomputing, Computational Physics, Life & Material Science, GTC 2015 - ID S5504
Streaming:
Download:
 
Tightly Coupled Accelerators with Proprietary Interconnect and Its Programming and Applications
Toshihiro Hanawa (The University of Tokyo), Taisuke Boku (University of Tsukuba)
Get the latest information on an our developed Tightly Coupled Accelerators (TCA) architectures and learn its programming environment and applications. We built up an experimental system HA-PACS/TCA at the Center for Computational Sciences, Universit ...Read More
Get the latest information on our Tightly Coupled Accelerators (TCA) architecture and learn about its programming environment and applications. We built an experimental system, HA-PACS/TCA, at the Center for Computational Sciences, University of Tsukuba, based on the TCA architecture: a proprietary interconnect realizing direct connections among GPUs across nodes using "GPU Direct Support for RDMA" technology. We are currently developing a high-level parallel programming language and several application programs. In this session, we introduce the concept of the TCA architecture and show the performance of applications using the TCA cluster. We also describe an original directive-based PGAS language, "Xcalable ACC", which utilizes the TCA architecture effectively with high productivity.  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5519
Streaming:
Download:
 
Learn How to Create Petascale Computer Simulations With CUDA and LibGeoDecomp
Andreas Schäfer (Friedrich-Alexander-Universität Erlangen-Nürnberg)
Computer simulations have become a major workhorse for many scientific and engineering disciplines. GPUs allow us to build supercomputers at a fraction of the cost of traditional designs. Yet, many developers fear the vendor lock-in caused by porting ...Read More
Computer simulations have become a major workhorse for many scientific and engineering disciplines. GPUs allow us to build supercomputers at a fraction of the cost of traditional designs. Yet, many developers fear the vendor lock-in caused by porting their code to CUDA. LibGeoDecomp is an auto-parallelizing library for computer simulations. In this tutorial we will show how it enables developers to reap the benefits of GPU-equipped machines while keeping their simulation models portable. To explore the design space, we will implement two different simulation models: one stencil code, based on a regular grid, and a particle method. I will give a brief review of the most beneficial optimization techniques and how to solve typical challenges like in-situ visualization and remote steering.  Back
 
Keywords:
Supercomputing, Computational Physics, Developer - Tools & Libraries, GTC 2015 - ID S5528
Streaming:
Download:
 
FMM Goes GPU: Smooth Trip or Bumpy Ride?
Bartosz Kohnke (Max Planck Institute for Biophysical Chemistry)
The N-body problem provides a very simple, yet scientific algorithm to utilize modern GPUs. However, the computational complexity is O(N^2). An algorithm reducing runtime and complexity to optimal O(N) for any required precision is the Fast Multipole ...Read More
The N-body problem provides a very simple, yet scientifically relevant, algorithm for utilizing modern GPUs. However, its computational complexity is O(N^2). An algorithm that reduces runtime and complexity to an optimal O(N) for any required precision is the Fast Multipole Method (FMM). In this talk, we present our CUDA-enabled, templated C++ implementation. The algorithm requires several operators, partly depending upon each other, to exchange information in a tree-like data structure. We especially focus on the use of unified memory to minimize the porting effort and the use of dynamic parallelism to achieve a better computational workload. We will present timings and scalings for all FMM operators and discuss remaining bottlenecks, like tree dependencies or redundancies in the kernel setup.  Back
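For reference, the O(N^2) direct-sum baseline that FMM replaces is itself a textbook GPU kernel; the hedged sketch below computes pairwise gravitational-style accelerations and is illustrative only, not the presenters' FMM code.

```cpp
// Direct O(N^2) N-body force summation, one target body per thread.
// Baseline the FMM replaces; the softening eps2 avoids the r = 0 singularity.
__global__ void directForces(int n, const float4* pos /* xyz + mass */, float3* acc, float eps2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 pi = pos[i];
    float3 a = {0.f, 0.f, 0.f};
    for (int j = 0; j < n; ++j) {
        float4 pj = pos[j];
        float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        float r2 = dx * dx + dy * dy + dz * dz + eps2;
        float invR = rsqrtf(r2);
        float s = pj.w * invR * invR * invR;     // m_j / r^3
        a.x += dx * s; a.y += dy * s; a.z += dz * s;
    }
    acc[i] = a;
}
```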
 
Keywords:
Supercomputing, Developer - Algorithms, Computational Physics, GTC 2015 - ID S5548
Streaming:
Download:
 
GPU Errors on HPC Systems: Characterization, Quantification and Implications for Architects and Operations
James Rogers (Oak Ridge National Laboratory (ORNL))
Titan, the world's #1 Open Science Supercomputer, consists of more than 18,000 GPUs that scientists from various domains such as astrophysics, fusion, climate, and combustion use routinely to run large-scale simulations. Unfortunately, while the per ...Read More
Titan, the world's #1 Open Science Supercomputer, consists of more than 18,000 GPUs that scientists from various domains such as astrophysics, fusion, climate, and combustion use routinely to run large-scale simulations. Unfortunately, while the performance efficiency of GPUs is well understood, their resilience characteristics in a large-scale computing system have not been fully evaluated. We present a detailed study to provide a thorough understanding of GPU errors on a large-scale GPU-enabled system. Our data spans more than 18 months, gathered on the Titan supercomputer at the Oak Ridge Leadership Computing Facility. We present several findings from our field data and discuss the implications of our results for future GPU architects, current and future HPC centers.  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5566
Streaming:
Download:
 
Improving GPU Utilization with the Multi-Process Service (MPS)
Priyanka Sah (NVIDIA)
Heterogeneous clusters with multi-core CPUs and one or many GPUs per node have gained wide popularity in scientific computing. While MPI-based distributed memory applications commonly assign multiple MPI ranks to each node, sharing GPUs amongst multi ...Read More
Heterogeneous clusters with multi-core CPUs and one or many GPUs per node have gained wide popularity in scientific computing. While MPI-based distributed memory applications commonly assign multiple MPI ranks to each node, sharing GPUs amongst multiple processes can incur significant context switching overhead or lead to under-utilization of the GPU resources, especially in the strong scaling regime. Using the Multi-Process Service (MPS) allows efficient sharing of GPUs among multiple CPU processes, leading to better utilization of the GPU resources and higher performance. This talk will focus on legacy MPI applications and demonstrate how to efficiently overlap work from multiple processes on the GPU and how to profile code under MPS on a node, using newly released tools in CUDA 6.5.  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, Developer - Performance Optimization, GTC 2015 - ID S5584
Streaming:
Download:
 
Heterogeneous HPC, Architectural Optimization, and NVLink
Steve Oberlin (NVIDIA)
The emergence of heterogeneous computing has demonstrated that the highest performance and efficiency can be achieved in a general way by tightly coupling compute engines optimized for latency-sensitive and throughput-oriented operations. This talk w ...Read More
The emergence of heterogeneous computing has demonstrated that the highest performance and efficiency can be achieved in a general way by tightly coupling compute engines optimized for latency-sensitive and throughput-oriented operations. This talk will explore heterogeneous node design and architecture and how NVLink, a new scalable node integration channel, enables uncompromising performance on the most demanding applications, using the next-generation DoE CORAL Summit and Sierra supercomputer systems as a case in point.  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5649
Streaming:
Download:
 
GPU Enhanced Molecular Dynamics of Lipid Membrane Systems
Russell Devane (Procter & Gamble)
Lipid membrane systems show up in a broad range of industrial relevant applications. From human skin to fabric enhancers, the complex behavior of lipid systems presents challenges to product designers. At Procter & Gamble we are using GPU enhance ...Read More
Lipid membrane systems show up in a broad range of industrially relevant applications. From human skin to fabric enhancers, the complex behavior of lipid systems presents challenges to product designers. At Procter & Gamble we are using GPU-enhanced molecular dynamics to probe these complexities, not only to help interpret experimental measurements but also to drive future experiments and refine our mechanistic understanding of processes. Aided by high performance computing resources provided through the DOE INCITE program, the level of complexity that we can capture in a simulation has progressed significantly. ***This talk is part of the "Accelerating Industrial Competitiveness through Extreme-Scale Computing" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.***  Back
 
Keywords:
Supercomputing, Life & Material Science, GTC 2015 - ID S5718
Streaming:
 
Hybrid Simulations Using CPU-GPU Paradigm for Reacting Flows
Jeremiah Lee (United Technologies Research Center)
GPU technology is attractive to computation intensive simulations such as Computational Fluid Dynamics (CFD) of Reacting Flows. A hybrid CPU-GPU paradigm was benchmarked by simulating a canonical CFD problem. A complex turbulent reactive flow was sim ...Read More
GPU technology is attractive to computation intensive simulations such as Computational Fluid Dynamics (CFD) of Reacting Flows. A hybrid CPU-GPU paradigm was benchmarked by simulating a canonical CFD problem. A complex turbulent reactive flow was simulated including detailed chemistry that is typically burdensome for CPU based calculations. We achieved 2-5X overall speed-up using CPU-GPU simulations compared to CPU-only simulations. Further details of the CFD problem, hybrid methodology, performance metrics definition and benchmarking results will be presented. This promising technology, if exploited properly, could quickly enable accurate predictions of finite rate chemistry effects, such as pollutant emissions from combustors. ***This talk is part of the "Accelerating Industrial Competitiveness through Extreme-Scale Computing" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.***  Back
 
Keywords:
Supercomputing, Developer - Performance Optimization, Computational Fluid Dynamics, GTC 2015 - ID S5719
Streaming:
Download:
 
Turbomachinery R&D Acceleration Using Titan
Ravi Srinivasan (Dresser-Rand)
Dresser-Rand (D-R) is an industrial partner of Oak Ridge Leadership Computing Facility (OLCF) and utilizes the Titan platform to accelerate turbomachinery research and development. In order to take advantage of computing infrastructure at OLCF, D-R h ...Read More
Dresser-Rand (D-R) is an industrial partner of the Oak Ridge Leadership Computing Facility (OLCF) and utilizes the Titan platform to accelerate turbomachinery research and development. In order to take advantage of the computing infrastructure at OLCF, D-R has engaged with a third-party CFD software provider to add and modify computational fluid dynamics (CFD) solver modules. The developments include enhancing the scalability of the flow solver by performing better grid partitioning, implementing GPU-based acceleration, and significantly improving I/O performance. Turbomachinery design at D-R is complemented by employing an optimization process. Titan is the enabling technology that accelerates this process by significantly reducing database generation time and has made it possible to consider implementing optimization as part of R&D. Successful compressor component designs derived from optimization have been experimentally tested by D-R. The steps undertaken for optimization will be presented.  Back
 
Keywords:
Supercomputing, Developer - Performance Optimization, Computational Fluid Dynamics, GTC 2015 - ID S5753
Streaming:
Download:
 
Realizing GPU Computation at Scale (Presented by Cray)
John Lee (Cray Cluster Solutions, Inc.), Maria Iordache (Cray Inc.)
GPUs deliver compelling performance in a very energy efficient and compact package. However, a number of requirements have to be satisfied in order to extract this performance. This talk will discuss Cray's experience with scaling behavior on system ...Read More
GPUs deliver compelling performance in a very energy efficient and compact package. However, a number of requirements have to be satisfied in order to extract this performance. This talk will discuss Cray's experience with scaling behavior on systems built from 8 GPU servers, using K40 or K80 GPUs, as a function of algorithm, data flow and interconnect. Some of these examples were validated on systems ranging up to the 10th fastest supercomputer in the world (#10 on the Top500 list) and the 4th greenest supercomputer (#4 on the Green500).  Back
 
Keywords:
Supercomputing, Data Center, Cloud Computing & HPC, GTC 2015 - ID S5871
Streaming:
Video & Image Processing
Presentation
Media
Streaming FFTs on Large 3D Microscope Images
Peter Steinbach (Max Planck Institute of Molecular Cell Biology and Genetics)
Dive deep into efficient and fast memory transfers of multi-gigabyte image data to perform swift iterative deconvolutions of 3D microscope imagery. Through the creation of an open-source GPU deconvolution implementation (github.com/psteinb/libmultivi ...Read More
Dive deep into efficient and fast memory transfers of multi-gigabyte image data to perform swift iterative deconvolutions of 3D microscope imagery. Through the creation of an open-source GPU deconvolution implementation (github.com/psteinb/libmultiviewnative), I studied various techniques to orchestrate memory copies of multi-dimensional images. I will present concepts, available options and details of efficient memory transfers from host to device memory. I will showcase CUDA/C++ code and discuss my experiences with various CUDA versions on NVIDIA hardware that led to 2-3x greater performance than just performing the calculations on the device. This work will enable the scientific community to push the limits of processing and handling data gathered by imaging living tissue.  Back
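A minimal sketch of the copy/compute overlap the talk examines, using pinned host memory, two CUDA streams and cuFFT; the chunked 1D layout and sizes are illustrative assumptions rather than the libmultiviewnative implementation:

#include <cuda_runtime.h>
#include <cufft.h>
#include <cstring>

// Overlap host-to-device transfers of data chunks with cuFFT execution using
// two streams and pinned host memory, so copies hide behind compute.
void streamedFFT(const cufftComplex* hostChunks, int numChunks, int chunkLen) {
    cudaStream_t streams[2];
    cufftHandle plans[2];
    cufftComplex* devBuf[2];
    cufftComplex* pinned = nullptr;
    size_t total = (size_t)numChunks * chunkLen * sizeof(cufftComplex);
    cudaMallocHost(&pinned, total);          // pinned memory enables async copies
    std::memcpy(pinned, hostChunks, total);

    for (int s = 0; s < 2; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&devBuf[s], (size_t)chunkLen * sizeof(cufftComplex));
        cufftPlan1d(&plans[s], chunkLen, CUFFT_C2C, 1);
        cufftSetStream(plans[s], streams[s]);
    }
    for (int c = 0; c < numChunks; ++c) {
        int s = c % 2;                       // ping-pong between the two streams
        cudaMemcpyAsync(devBuf[s], pinned + (size_t)c * chunkLen,
                        (size_t)chunkLen * sizeof(cufftComplex),
                        cudaMemcpyHostToDevice, streams[s]);
        cufftExecC2C(plans[s], devBuf[s], devBuf[s], CUFFT_FORWARD);
    }
    for (int s = 0; s < 2; ++s) {
        cudaStreamSynchronize(streams[s]);
        cufftDestroy(plans[s]);
        cudaFree(devBuf[s]);
        cudaStreamDestroy(streams[s]);
    }
    cudaFreeHost(pinned);
}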
 
Keywords:
Video & Image Processing, Data Center, Cloud Computing & HPC, Computer Vision & Machine Vision, Life & Material Science, GTC 2015 - ID S5208
Streaming:
 
A 2D Convolution Framework for Extreme Performance Tuning
Alan Wang (NVIDIA)
We propose a 2D convolution framework that (1) maintains a unified abstraction incorporating a series of optimization techniques and (2) can auto-tune the performance on different GPUs. We quantify and analyze the performance impact of using a single ...Read More
We propose a 2D convolution framework that (1) maintains a unified abstraction incorporating a series of optimization techniques and (2) can auto-tune the performance on different GPUs. We quantify and analyze the performance impact of each strategy in isolation, which reveals its potential when applied to other applications. The experiments show that the algorithm tuned by our framework can reach a GFLOPS utilization of nearly 80% when targeting GM107.  Back
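One concrete strategy such a framework can generate and tune is a shared-memory tiled convolution; in the minimal sketch below the tile size and filter radius are exactly the kind of parameters an auto-tuner would sweep (values are illustrative, and the launch is assumed to use TILE x TILE blocks):

#include <cuda_runtime.h>

#define TILE   16
#define RADIUS 2   // 5x5 filter, illustrative
__constant__ float d_filter[(2 * RADIUS + 1) * (2 * RADIUS + 1)];

// Each block stages a (TILE + 2*RADIUS)^2 input tile in shared memory, then
// every thread computes one output pixel from the cached tile.
__global__ void conv2dTiled(const float* in, float* out, int width, int height) {
    __shared__ float tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];
    int baseX = blockIdx.x * TILE;
    int baseY = blockIdx.y * TILE;

    // Cooperative load of the tile plus halo, clamping at the image borders.
    for (int dy = threadIdx.y; dy < TILE + 2 * RADIUS; dy += TILE) {
        for (int dx = threadIdx.x; dx < TILE + 2 * RADIUS; dx += TILE) {
            int sx = min(max(baseX + dx - RADIUS, 0), width - 1);
            int sy = min(max(baseY + dy - RADIUS, 0), height - 1);
            tile[dy][dx] = in[sy * width + sx];
        }
    }
    __syncthreads();

    int gx = baseX + threadIdx.x;
    int gy = baseY + threadIdx.y;
    if (gx < width && gy < height) {
        float acc = 0.0f;
        for (int fy = 0; fy < 2 * RADIUS + 1; ++fy)
            for (int fx = 0; fx < 2 * RADIUS + 1; ++fx)
                acc += d_filter[fy * (2 * RADIUS + 1) + fx] *
                       tile[threadIdx.y + fy][threadIdx.x + fx];
        out[gy * width + gx] = acc;
    }
}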
 
Keywords:
Video & Image Processing, Developer - Performance Optimization, Computer Vision & Machine Vision, GTC 2015 - ID S5305
Streaming:
 
Real-Time Image Enhancement Using Multi-Frame Technique
Eric Kelmelis (EM Photonics)
Learn how GPUs can be applied to real-time, real-world image processing applications. Images and videos recorded at long distances (greater than 1 mile) often suffer degradation due to the atmospheric turbulence between the subject and camera, which ...Read More
Learn how GPUs can be applied to real-time, real-world image processing applications. Images and videos recorded at long distances (greater than 1 mile) often suffer degradation due to the atmospheric turbulence between the subject and camera, which severely limits the quality of data that is captured by high-end imaging systems. We will discuss the practical considerations of keeping up with real-time video; tuning kernel performance; architecting complex, asynchronous, multi-stage processing pipelines; and effectively using multiple GPUs in a real-time context.  Back
 
Keywords:
Video & Image Processing, Developer - Performance Optimization, Defense, GTC 2015 - ID S5352
Streaming:
 
High Capability Multidimensional Data Compression on GPUs
Sergio Zarantonello (Santa Clara University and Algorithmica LLC), Ed Karrels (Santa Clara University)
In this talk we present a CUDA implementation of a wavelet-based compression utility for multidimensional data, and give examples of its application in earth science and medical imaging. Key features of our codec are efficiency and speed. A special f ...Read More
In this talk we present a CUDA implementation of a wavelet-based compression utility for multidimensional data, and give examples of its application in earth science and medical imaging. Key features of our codec are efficiency and speed. A special feature is the ability to guarantee compression errors no larger than an a priori set tolerance in a user-prescribed metric. Since this feature requires multiple passes of the compress-decompress process, the hardware acceleration offered by GPUs is critical. This paper was written in collaboration with S. E. Zarantonello, D. Concha, D. Fabris, A. Goyal, E. Karrels, B. Smithson, Q. Wang from the School of Engineering at Santa Clara University.  Back
 
Keywords:
Video & Image Processing, Developer - Algorithms, Energy Exploration, GTC 2015 - ID S5455
Streaming:
Download:
 
GPU-Based, Real-Time HEVC Decoder, UHD Solution on Automotive Infotainment Platforms
Rama Mohana Reddy (PathPartner Technology Consulting Pvt Ltd.)
In this session we present GPUs for HEVC decoding. By using GPU for HEVC decoder, we save significant CPU time and power, which can be used for other critical tasks. Our use of GPU for motion compensation module of HEVC decoder, made it possible to a ...Read More
In this session we present GPU-accelerated HEVC decoding. By using the GPU for the HEVC decoder, we save significant CPU time and power, which can be used for other critical tasks. Offloading the motion compensation module of the HEVC decoder to the GPU made it possible to achieve a real-time HEVC decoding solution at UHD resolution on automotive infotainment platforms. By porting the motion compensation module to the GPU, we achieved 40% CPU time savings and good scalability.  Back
 
Keywords:
Video & Image Processing, Embedded Systems, Automotive, Media & Entertainment, GTC 2015 - ID S5491
Streaming:
 
Fast ANN for High-Quality Collaborative Filtering
Yun-Ta Tsai (Google, Inc.)
Collaborative filtering collects similar patches, jointly filters them, and scatters the output back to input patches; each pixel gets a contribution from each patch that overlaps with it, allowing signal reconstruction from highly corrupted data. Ex ...Read More
Collaborative filtering collects similar patches, jointly filters them, and scatters the output back to input patches; each pixel gets a contribution from each patch that overlaps with it, allowing signal reconstruction from highly corrupted data. Exploiting self-similarity, however, requires finding matching image patches, which is an expensive operation. We propose a GPU-friendly approximate-nearest-neighbor algorithm that produces high-quality results for any type of collaborative filter. We evaluate our ANN search against state-of-the-art ANN algorithms in several application domains. Our method is orders of magnitude faster, yet provides similar or higher-quality results than previous work.  Back
 
Keywords:
Video & Image Processing, GTC 2015 - ID S5562
Streaming:
 
FlexISP: A Flexible Camera Image Processing Framework
Dawid Pajak (NVIDIA)
Conventional pipelines for capturing, displaying, and storing images are usually defined as a series of cascaded modules, each responsible for addressing a particular problem. While this divide-and-conquer approach offers many benefits, it also intro ...Read More
Conventional pipelines for capturing, displaying, and storing images are usually defined as a series of cascaded modules, each responsible for addressing a particular problem. While this divide-and-conquer approach offers many benefits, it also introduces a cumulative error, as each step in the pipeline only considers the output of the previous step, not the original sensor data. We propose an end-to-end system that is aware of the camera and image model, enforces natural-image priors, while jointly accounting for common image processing steps like demosaicking, denoising, deconvolution, and so forth, all directly in a given output representation (e.g., YUV, DCT). Our system is flexible and we demonstrate it on regular Bayer images as well as images from custom sensors. In all cases, we ac  Back
 
Keywords:
Video & Image Processing, GTC 2015 - ID S5563
Streaming:
 
Cascaded Displays: Spatiotemporal Superresolution Using Offset Pixel Layers
Dikpal Reddy (NVIDIA)
We describe a new approach to quadruple the effective pixel count and double the refresh rate of existing displays. Our approach, termed cascaded displays, achieve high resolution by stacking two or more spatial light modulators, such as LCDs, on top ...Read More
We describe a new approach to quadruple the effective pixel count and double the refresh rate of existing displays. Our approach, termed cascaded displays, achieves high resolution by stacking two or more spatial light modulators, such as LCDs, on top of one another and offsetting them by half a pixel or less both horizontally and vertically. The same concept can also be applied temporally to increase the effective frame rate. We use a real-time, GPU-based non-negative matrix factorization to decompose the desired images, videos, or real-time content into appropriate multi-layered attenuation patterns. We have prototyped this technology with a dual-layer LCD, a digital projector containing a pair of LCoS microdisplays, and multi-layer stacks of printed films.  Back
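The decomposition step relies on non-negative matrix factorization; in the generic NMF setting with V ≈ WH, the factor W is refreshed with the multiplicative update W <- W * (V H^T) / (W H H^T), which keeps every entry non-negative. A minimal sketch of the element-wise part of one such update (the matrix products would typically come from cuBLAS GEMMs, and the display-specific factorization in the talk has its own structure):

#include <cuda_runtime.h>

// Element-wise multiplicative NMF update: W <- W * numer / (denom + eps),
// where numer = V*H^T and denom = W*H*H^T were computed beforehand (e.g. with cuBLAS).
// The epsilon keeps the update stable when a denominator entry is near zero.
__global__ void nmfMultiplicativeUpdate(float* W, const float* numer,
                                        const float* denom, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) W[i] *= numer[i] / (denom[i] + 1e-9f);
}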
 
Keywords:
Video & Image Processing, GTC 2015 - ID S5567
Streaming:
 
Image Learning and Computer Vision in CUDA (Presented by ArrayFire)
Peter Andreas Entschev (ArrayFire)
Analyzing a massive data set? Need fast results? Need computer vision algorithms? Not sure when and where to start? The answer is here and now! In this tutorial we will give you the tools to bring your favorite computer vision algorithm to life. In t ...Read More
Analyzing a massive data set? Need fast results? Need computer vision algorithms? Not sure when and where to start? The answer is here and now! In this tutorial we will give you the tools to bring your favorite computer vision algorithm to life. We will go over key challenges for implementing computer vision and machine learning algorithms on the GPU, walk you through several computer vision algorithms for the GPU (ORB, FAST, SIFT), and give you the hands-on experience to implement your own algorithms.  Back
 
Keywords:
Video & Image Processing, Computer Vision & Machine Vision, GTC 2015 - ID S5796
Streaming:
Visualization - In-Situ & Scientific
Presentation
Media
Simulating What is Measured - Closing the Loop Between Experiment and Simulation
Michael Bussmann (Helmholtz-Zentrum Dresden - Rossendorf), Axel Huebl (Helmholtz-Zentrum Dresden - Rossendorf)
With GPU-accelerated simulations, frames-per-second, in-situ visualization and visual analytics becoming a reality, it increases scalability of codes which allows to reduce the time to obtain a solution significantly. This also makes it possible to r ...Read More
With GPU-accelerated simulations, frames-per-second in-situ visualization and visual analytics becoming a reality, the increased scalability of codes allows the time to solution to be reduced significantly. This also makes it possible to run large-scale parameter surveys for optimization. We will present recent activities on integrating complex particle accelerator simulations into a reconstruction loop for matching experimental measurements to simulation. This requires putting simulations in a loop with large-scale data analysis, synthetic diagnostics, image reconstruction techniques and interactive in-situ visualization. We will show how the different building blocks of such a tool chain can be accelerated using GPUs and discuss the combination of these tools.  Back
 
Keywords:
Visualization - In-Situ & Scientific, Computational Physics, Machine Learning & Deep Learning, Medical Imaging, GTC 2015 - ID S5199
Streaming:
Download:
 
High Performance Computing on Mobile Devices through Distributed Shared CUDA
Edgar Josafat Martinez Noriega (The University of Electro-Communications, Tokyo)
Through a GPU virtualization tool, (DS-CUDA), we remotely use an NVIDIA GPU from our local network to accelerate a molecular dynamics (MD) simulation inside an Android device (NVIDIA SHIELD). We implement a NaCl MD simulation on Android. We accelera ...Read More
Through a GPU virtualization tool (DS-CUDA), we remotely use an NVIDIA GPU on our local network to accelerate a molecular dynamics (MD) simulation running on an Android device (NVIDIA SHIELD). We implement a NaCl MD simulation on Android and accelerate the computation of forces, velocities and coordinates using CUDA through the DS-CUDA tool. We use a laptop equipped with a GeForce GTX 680M (server) connected to our LAN using Gigabit Ethernet; the Android device (client) is connected to the same LAN over 802.11n Wi-Fi, and server and client communicate over a TCP socket. We reached up to 420 Gflops in the force computation on a simulation with 5832 ions, 5700 times faster than the 0.073 Gflops delivered by the CPU implementation on the NVIDIA SHIELD.  Back
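A minimal sketch of the kind of all-pairs force kernel that dominates such a small ionic MD step; the actual NaCl simulation also includes short-range repulsion terms, and the units and constants here are illustrative. Under DS-CUDA, a launch like this issued on the client device is forwarded over the network to the server GPU transparently.

#include <cuda_runtime.h>

// All-pairs Coulomb force accumulation for a small ion system.
// pos.w carries the ion charge; force.w is unused.
__global__ void coulombForces(const float4* pos, float4* force, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float3 acc = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float4 pj = pos[j];
        float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
        float r2 = dx * dx + dy * dy + dz * dz + 1e-12f;  // guard against r = 0
        float invR = rsqrtf(r2);
        float f = pi.w * pj.w * invR * invR * invR;       // q_i q_j / r^3
        acc.x += f * dx; acc.y += f * dy; acc.z += f * dz;
    }
    force[i] = make_float4(acc.x, acc.y, acc.z, 0.0f);
}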
 
Keywords:
Visualization - In-Situ & Scientific, Life & Material Science, GTC 2015 - ID S5290
Streaming:
Download:
 
NVIDIA IndeX the In-Situ Visualization Software Merges Compute Cycles with Graphics Cycles
Tom-Michael Thamm (NVIDIA), Marc Nienhaus (NVIDIA), Mahendra Roopa (NVIDIA)
A technical overview about NVIDIA IndeX as an in-situ technology will be presented. In addition, we describe an interactive workflow between compute and graphic cycles within NVIDIA IndeX. A real-time live demo on the NVIDIA VCA cluster using more th ...Read More
A technical overview of NVIDIA IndeX as an in-situ technology will be presented. In addition, we describe an interactive workflow between compute and graphics cycles within NVIDIA IndeX. A real-time live demo on the NVIDIA VCA cluster using more than 1 TB of scientific data underlines the power of the in-situ technology.  Back
 
Keywords:
Visualization - In-Situ & Scientific, Data Center, Cloud Computing & HPC, Graphics Virtualization, GTC 2015 - ID S5307
Streaming:
 
Interactive Visual Exploration of Peridynamic-Based Fracture Simulation
Chakrit Watcharopas (School of Computing, Clemson University and Dept. of Computer Science, Kasetsart University)
Simulating fracture has been an area of interest in graphics for many years. Beyond the computational expense needed to achieve realistic fracture, even the simplest of techniques often requires repeated iteration to fine-tune the many parameters tha ...Read More
Simulating fracture has been an area of interest in graphics for many years. Beyond the computational expense needed to achieve realistic fracture, even the simplest of techniques often requires repeated iteration to fine-tune the many parameters that control the simulation. In this session, we will focus on one particular technique, fracture using spring-mass systems, with the goal of better understanding the capabilities of a variant of spring-mass fracture known as peridynamics. Coupled with a framework for visualization, our method allows users to simultaneously compare multiple fracture simulation runs across different parameter settings. We present experimental results and report new extensions to our peridynamic-based fracture simulation, implemented in CUDA on Tesla K40s.  Back
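A minimal sketch of a bond-based peridynamic force step, in which any bond whose stretch exceeds a critical value is permanently broken, which is what produces fracture; the names and the fixed-width neighbor-list layout are illustrative assumptions, not the authors' data structures.

#include <cuda_runtime.h>

// One particle per thread: accumulate bond forces and sever bonds whose
// stretch exceeds the critical value (this is what models crack formation).
__global__ void peridynamicForces(const float3* refPos, const float3* curPos,
                                  const int* neighbors, unsigned char* bondIntact,
                                  int maxNeighbors, float stiffness,
                                  float criticalStretch, float3* force, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 f = make_float3(0.0f, 0.0f, 0.0f);
    for (int k = 0; k < maxNeighbors; ++k) {
        int idx = i * maxNeighbors + k;
        int j = neighbors[idx];
        if (j < 0 || !bondIntact[idx]) continue;          // empty slot or broken bond

        float ex0 = refPos[j].x - refPos[i].x, ey0 = refPos[j].y - refPos[i].y, ez0 = refPos[j].z - refPos[i].z;
        float ex  = curPos[j].x - curPos[i].x, ey  = curPos[j].y - curPos[i].y, ez  = curPos[j].z - curPos[i].z;
        float restLen = sqrtf(ex0 * ex0 + ey0 * ey0 + ez0 * ez0);
        float curLen  = sqrtf(ex * ex + ey * ey + ez * ez);
        float stretch = (curLen - restLen) / restLen;

        if (stretch > criticalStretch) { bondIntact[idx] = 0; continue; }  // break the bond

        float mag = stiffness * stretch / curLen;         // spring-like bond force
        f.x += mag * ex; f.y += mag * ey; f.z += mag * ez;
    }
    force[i] = f;
}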
 
Keywords:
Visualization - In-Situ & Scientific, Computational Physics, Game Development, GTC 2015 - ID S5324
Streaming:
Download:
 
Roadmap for Many-Core Visualization Software in DOE
Jeremy Meredith (Oak Ridge National Laboratory)
Visualization and data analysis is an important part of the US DOE investment in HPC to solve challenging scientific problems. As HPC systems become more reliant on many-core technology, three DOE projects are addressing various aspects of this chall ...Read More
Visualization and data analysis is an important part of the US DOE investment in HPC to solve challenging scientific problems. As HPC systems become more reliant on many-core technology, three DOE projects are addressing various aspects of this challenge. PISTON provides cross-platform algorithms, EAVL provides advanced data models, and Dax provides execution models. This talk will briefly review these projects and highlight some of the successes each project has had. We then discuss our roadmap to consolidate the features of these three frameworks into a unified system called VTK-m.  Back
 
Keywords:
Visualization - In-Situ & Scientific, Data Center, Cloud Computing & HPC, Visualization - Large Scale & Multi-Display, GTC 2015 - ID S5363
Streaming:
Download:
 
VMD: Visualization and Analysis of Biomolecular Complexes with GPU Computing
John Stone (University of Illinois at Urbana-Champaign)
This talk will showcase recent successes in the use of GPUs to accelerate challenging molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray supercomputers. This presentation will ...Read More
This talk will showcase recent successes in the use of GPUs to accelerate challenging molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray supercomputers. This presentation will highlight the use of in-place OpenGL rendering and GPU ray tracing for interactive and batch mode rendering of images and movies, CUDA just-in-time (JIT) compilation for increasing the performance of data-driven visualization and analysis algorithms, and GPU accelerated analysis of results of hybrid structure determination methods that combine data from cryo-electron microscopy and X-ray crystallography with all-atom molecular dynamics simulations.  Back
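The just-in-time compilation mentioned above follows the general NVRTC pattern of compiling a generated source string to PTX at runtime and launching it through the CUDA driver API; the sketch below is a generic illustration of that pattern (error checking omitted, device data left uninitialized for brevity), not VMD's actual code.

#include <cuda.h>
#include <nvrtc.h>
#include <cstdio>
#include <vector>

int main() {
    // Source generated at runtime, e.g. from a user-selected analysis expression.
    const char* src =
        "extern \"C\" __global__ void apply(float* x, int n) {\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) x[i] = x[i] * 2.0f + 1.0f;\n"
        "}\n";

    // Compile the source string to PTX with NVRTC.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "apply.cu", 0, nullptr, nullptr);
    nvrtcCompileProgram(prog, 0, nullptr);
    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // Load the PTX with the driver API and launch the freshly built kernel.
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;   cuModuleLoadData(&mod, ptx.data());
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "apply");

    int n = 1024;
    CUdeviceptr dX;
    cuMemAlloc(&dX, n * sizeof(float));
    void* args[] = { &dX, &n };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
    cuCtxSynchronize();

    cuMemFree(dX);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    printf("JIT-compiled kernel launched on %d elements\n", n);
    return 0;
}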
 
Keywords:
Visualization - In-Situ & Scientific, Big Data Analytics, Life & Material Science, GTC 2015 - ID S5371
Streaming:
Download:
 
OmpSuperscalar: Task-Parallel Simulation and Visualization of Crowds with Several CPUs and GPUs
Hugo Pérez (Barcelona Supercomputing Center - CUDA Center of Excellence), Benjamin Hernandez (Barcelona Supercomputing Center - CUDA Center of Excellence), Isaac Rudomin (Barcelona Supercomputing Center - CUDA Center of Excellence)
Industry trends in the coming years in the race to exascale imply the availability of cluster computing with hundreds to thousands of cores per chip. Programming presents a challenge due to the heterogeneous architecture. Using novel programming mode ...Read More
Industry trends in the race to exascale imply the availability of cluster computing with hundreds to thousands of cores per chip in the coming years. Programming presents a challenge due to the heterogeneous architecture, so novel programming models that facilitate this process are necessary. In this talk we present the case of simulation and visualization of crowds. We analyze and compare the use of two programming models, OmpSs and CUDA, and show that OmpSs allows us to exploit all the resources by combining the use of CPU and GPU while taking care of memory management, scheduling, communication and synchronization automatically. We will present experimental results obtained on the Barcelona Supercomputing Center GPU cluster as well as describe several modes used for visualizing the results.  Back
 
Keywords:
Visualization - In-Situ & Scientific, Data Center, Cloud Computing & HPC, Real-Time Graphics, Supercomputing, GTC 2015 - ID S5381
Streaming:
Download:
 
Visualization Toolkit: Faster, Better, Open Scientific Rendering and Compute
Robert Maynard (Kitware, Inc.), Marcus Hanwell (Kitware, Inc.)
The Visualization Toolkit (VTK) is an open source scientific visualization framework. We will describe the new VTK rendering backend which targets modern GPUs, taking advantage of the flexible programmable pipeline. This has resulted in significant i ...Read More
The Visualization Toolkit (VTK) is an open source scientific visualization framework. We will describe the new VTK rendering backend, which targets modern GPUs and takes advantage of the flexible programmable pipeline. This has resulted in significant improvements in rendering performance, especially with large geometries (20 million+ triangles), which are rendered over 100 times faster without significant API changes and with near-identical output. This offers a drop-in replacement for existing applications, and a turn-key open source visualization framework for new applications. VTK-m offers highly parallel and efficient algorithms for scientific data. Its architecture, and how it will interact with VTK, will be discussed.  Back
 
Keywords:
Visualization - In-Situ & Scientific, GTC 2015 - ID S5604
Streaming:
Download:
 
High Performance In-Situ Visualization with Thousands of GPUs
Evghenii Gaburov (SURFsara (CUDA Research Center)), Jeroen Bédorf (CWI)
In-situ visualization is one of the major themes in HPC. The ability to attach a massively parallel visualization tool to a live simulation can be valuable to the researches, whose simulations may last days or even weeks on a supercomputer. High Perf ...Read More
In-situ visualization is one of the major themes in HPC. The ability to attach a massively parallel visualization tool to a live simulation can be valuable to researchers whose simulations may last days or even weeks on a supercomputer. High-performance in-situ visualization allows researchers to visualize, interact with, and analyze data in real time, thereby enabling an efficient and intuitive discovery process. The ability of Tesla GPUs to compute and render simultaneously enables a wide range of high-performance in-situ analysis scenarios with little overhead. In this talk we'll present our first attempt at this using the US Titan and Swiss Piz Daint supercomputers. We will present our solutions to the rendering pipeline that allowed us to achieve ~10fps on 1024 GPUs.  Back
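Computing and rendering on the same GPU typically goes through CUDA-OpenGL interop: a simulation kernel writes directly into a registered GL buffer, which is then drawn without any host round-trip. A minimal sketch, assuming the vertex buffer was registered once with cudaGraphicsGLRegisterBuffer and that a GL context and vertex setup already exist (the talk's distributed renderer is considerably more elaborate):

#include <GL/gl.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Write placeholder particle positions straight into the mapped GL buffer.
__global__ void writeParticles(float4* verts, int n, float t) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float a = t + i * 0.001f;
        verts[i] = make_float4(cosf(a), sinf(a), 0.0f, 1.0f);
    }
}

// Per frame: map the GL buffer into CUDA, fill it with the simulation kernel,
// unmap, and draw it, all without copying data back to the host.
void updateAndDraw(GLuint vbo, cudaGraphicsResource_t vboRes, int n, float t) {
    float4* dVerts = nullptr;
    size_t bytes = 0;
    cudaGraphicsMapResources(1, &vboRes, 0);
    cudaGraphicsResourceGetMappedPointer((void**)&dVerts, &bytes, vboRes);

    writeParticles<<<(n + 255) / 256, 256>>>(dVerts, n, t);

    cudaGraphicsUnmapResources(1, &vboRes, 0);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glDrawArrays(GL_POINTS, 0, n);
}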
 
Keywords:
Visualization - In-Situ & Scientific, Astronomy & Astrophysics, Supercomputing, GTC 2015 - ID S5610
Streaming:
 
Scientific Visualization on GPU Clusters
Peter Messmer (NVIDIA)
Learn how to visualize your data on GPU accelerated supercomputers. In this presentation, we will give an overview of data analysis and visualization on GPU accelerated supercomputers and clusters. In a first part, we will describe the steps necessar ...Read More
Learn how to visualize your data on GPU-accelerated supercomputers. In this presentation, we will give an overview of data analysis and visualization on GPU-accelerated supercomputers and clusters. In the first part, we will describe the steps necessary to use the GPUs in a remote supercomputer for visualization. We will then provide a brief overview of ParaView, one of the most widely used visualization applications, touching on topics like parallel compositing and in-situ visualization of GPU-resident data.  Back
 
Keywords:
Visualization - In-Situ & Scientific, Graphics Virtualization, Supercomputing, GTC 2015 - ID S5660
Streaming:
Download:
 
Programming Pointers to Optimize your In-Situ Visualization Pipeline
Shalini Venkataraman (NVIDIA)
We will show programming tips and techniques to maximize performance in your in-situ visualization pipeline. Specific topics include creating OpenGL contexts offscreen, using hardware-based encoding for rendered images and optimizing the readback perf ...Read More
We will show programming tips and techniques to maximize performance in your in-situ visualization pipeline. Specific topics include creating OpenGL contexts offscreen, using hardware-based encoding for rendered images, and optimizing readback performance using multi-context, multi-threaded OpenGL.  Back
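One common way to create an OpenGL context offscreen on a headless node is through EGL with a small pbuffer surface; a minimal sketch follows, omitting error handling and the EGL_EXT_platform_device path that production code would use to select a specific GPU.

#include <EGL/egl.h>

// Create a headless desktop-OpenGL context so rendering can run on a node
// with no X server or attached display.
bool createOffscreenContext(int width, int height) {
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, nullptr, nullptr)) return false;

    const EGLint cfgAttribs[] = {
        EGL_SURFACE_TYPE,    EGL_PBUFFER_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
        EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
        EGL_NONE
    };
    EGLConfig cfg;
    EGLint numCfg = 0;
    eglChooseConfig(dpy, cfgAttribs, &cfg, 1, &numCfg);

    const EGLint pbufAttribs[] = { EGL_WIDTH, width, EGL_HEIGHT, height, EGL_NONE };
    EGLSurface surf = eglCreatePbufferSurface(dpy, cfg, pbufAttribs);

    eglBindAPI(EGL_OPENGL_API);               // desktop OpenGL, not GLES
    EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, nullptr);
    return eglMakeCurrent(dpy, surf, surf, ctx) == EGL_TRUE;
}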
 
Keywords:
Visualization - In-Situ & Scientific, Real-Time Graphics, GTC 2015 - ID S5815
Streaming:
Visualization - Large Scale & Multi-Display
Presentation
Media
See the Big Picture: Scalable Visualization Solutions for High Resolution Displays
Doug Traill (NVIDIA)
Large format high resolution displays are being utilized everywhere from corporate conference rooms to Supercomputing facilities. NVIDIA Quadro SVS solutions provide many features to make it easier to install and utilize these large scale displays. A ...Read More
Large-format, high-resolution displays are being utilized everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features to make it easier to install and utilize these large-scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.  Back
 
Keywords:
Visualization - Large Scale & Multi-Display, GTC 2015 - ID S5142
Streaming:
Download:
 
Architectural Display Walls Using NVAPI
Doug Traill (NVIDIA)
This session is aimed at developers who want to utilize NVIDIA's Warp + Intensity APIs. We will demonstrate how to use NVIDIA's Warp API to create arbitrary screen layouts, mixing landscape and portrait displays with MOSAIC. Using these simple cons ...Read More
This session is aimed at developers who want to utilize NVIDIA's Warp + Intensity APIs. We will demonstrate how to use NVIDIA's Warp API to create arbitrary screen layouts, mixing landscape and portrait displays with Mosaic. Using these simple constructs, it is possible to create complex architectural display layouts. NVIDIA's Warp + Intensity APIs can be utilized for a variety of different functions, including: (1) projector warping and edge-blending; (2) projection mapping; (3) architectural displays; and (4) watermark and image overlays.  Back
 
Keywords:
Visualization - Large Scale & Multi-Display, Developer - Tools & Libraries, GTC 2015 - ID S5143
Streaming:
Download:
 
Applying The Big Picture, Multi-Screen Monitoring Center for Hydroelectric Power Plants
Clint Pearson (HDR Inc.)
Inspired by a GTC13 session entitled "See The Big Picture", this session presents HDR's work with the Tennessee Valley Authority (TVA) on designing and installing a new "Instrumentation Monitoring Center." The IMC is a custom dev ...Read More
Inspired by a GTC13 session entitled "See The Big Picture", this session presents HDR's work with the Tennessee Valley Authority (TVA) on designing and installing a new "Instrumentation Monitoring Center." The IMC is a custom-developed multi-monitor wall, intended to display dam conditions and water levels at lakes and rivers critical to the TVA hydroelectric power plants, which provide power for many states in their region. Phase 1 of the Monitoring Center was designed to simultaneously display GIS information with a custom overlay of real-time instrument data for 12 remote sites on the same 8-monitor video wall. This multi-display video wall had to be able to instantly expand to a one-site view over the entire wall. The solution applied four NVIDIA Quadro K5000s in a BOXX system.  Back
 
Keywords:
Visualization - Large Scale & Multi-Display, Real-Time Graphics, GTC 2015 - ID S5486
Streaming:
Download:
 
Building a Life-Size Automultiscopic Display Using Consumer Hardware
Andrew Jones (USC Institute for Creative Technologies)
Automultiscopic displays allow multiple users to experience 3D content without the hassle of special glasses or head gear. Such displays generate many simultaneous images with high-angular density, so that each eye perceives a distinct and different ...Read More
Automultiscopic displays allow multiple users to experience 3D content without the hassle of special glasses or head gear. Such displays generate many simultaneous images with high angular density, so that each eye perceives a distinct view. This presents a unique challenge for content acquisition and rendering. In this talk, we explain how to build an automultiscopic display using off-the-shelf projectors, video splitters, and graphics cards. We also present a GPU-based algorithm for rendering a large number of views from a sparse array of video cameras.  Back
 
Keywords:
Visualization - Large Scale & Multi-Display, Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, GTC 2015 - ID S5540
Streaming:
Download:
Web Acceleration
Presentation
Media
Web-Based Distributed Voluntary Computing for Large Scale Scientific Computations Using GPU
Ibrahim Demir (University of Iowa)
The goal of this session is to learn about experimental and upcoming web technologies to improve speed of web-based scientific computing (eg. WebGL, WebCL, Web Workers, ASM.js, SIMD.js, and etc). Distributed volunteer computing can enable researchers ...Read More
The goal of this session is to learn about experimental and upcoming web technologies (e.g., WebGL, WebCL, Web Workers, asm.js, SIMD.js) that improve the speed of web-based scientific computing. Distributed volunteer computing can enable researchers to form parallel computing environments that utilize the computing power of millions of computers on the web and use them to run large-scale scientific simulations and models. Recent developments in web technologies allow client-side scripting languages to run at speeds close to native applications and to utilize the power of Graphics Processing Units (GPUs). Using a client-side scripting language like JavaScript, we have developed a distributed computing framework for researchers to run their scientific models on volunteer computers.  Back
 
Keywords:
Web Acceleration, Data Center, Cloud Computing & HPC, Big Data Analytics, Developer - Programming Languages, GTC 2015 - ID S5313
Streaming:
 
Using the Power of the GPU to Connect the Web to the Real World
Rob Manson (buildAR.com)
This session will take a detailed look at the various media stream processing pipelines available on the Web Platform and how the optimization of these will be critical in the near future. We will look specifically at how you can use GPUs directly fr ...Read More
This session will take a detailed look at the various media stream processing pipelines available on the Web Platform and how the optimization of these will be critical in the near future. We will look specifically at how you can use GPUs directly from Javascript for vision and sensor processing. One specific example will explore how Depth Cameras can now be used to extend the web and the influence this may have on the other pipelines too. These streams of sensor and image data now make it possible to connect the web to the real world. GPUs are a key asset for taming this growing flood of data.  Back
 
Keywords:
Web Acceleration, Augmented Reality & Virtual Reality, Computer Vision & Machine Vision, GTC 2015 - ID S5435
Streaming:
 
 