GTC ON-DEMAND

AEC & Manufacturing
Abstract:
The aim of this project is to create designer, 3D, shaped polymer particles for engineering applications by combining masked photo-polymerization and flow sculpting in orthogonal directions. The final particle shape is the intersection of these 2D orthogonal cross-sections. We use GPUs to generate a voxelized representation of the particle shape. We then use constrained optimization with GPU acceleration to solve the inverse problem of generating the required cross-sections that will create the desired 3D particle shape.
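The intersection construction lends itself to a compact sketch: extrude one cross-section along y and the other along x, and keep only voxels covered by both. The toy masks below are hypothetical; the session's constrained-optimization inverse solver is not shown.

```python
# Toy sketch: a 3D voxel shape as the intersection of two 2D cross-sections
# extruded along orthogonal axes. Masks are 0/1 lists of lists.

def intersect_cross_sections(mask_xz, mask_yz):
    """mask_xz[x][z] is extruded along y; mask_yz[y][z] is extruded along x.
    Voxel (x, y, z) is solid iff both extrusions cover it."""
    nx, nz = len(mask_xz), len(mask_xz[0])
    ny = len(mask_yz)
    return [[[mask_xz[x][z] and mask_yz[y][z]
              for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]

# 2x2x2 example: a full square intersected with an L-shaped section
square = [[1, 1], [1, 1]]
ell = [[1, 0], [1, 1]]
voxels = intersect_cross_sections(square, ell)  # voxels[x][y][z]
```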
 
Topics:
AEC & Manufacturing, Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5197
 
Abstract:
Advances in GPU technology have opened the door to significant performance gains for applications willing to use the modern OpenGL APIs. This talk will provide details of how the Direct Model Scene Graph and Rendering Engine has adapted its rendering architecture to handle not only today's advances but tomorrow's, and how the use of these technologies has significantly increased rendering performance.
 
Topics:
AEC & Manufacturing, Performance Optimization, Rendering & Ray Tracing, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5387
 
Abstract:
SpaceX is designing a new, methane-fueled engine powerful enough to lift the equipment and personnel needed to colonize Mars. A vital aspect of this effort involves the creation of a multi-physics code to accurately model a running rocket engine. The scale and complexity of turbulent non-premixed combustion has so far made it impractical to simulate, even on today's largest supercomputers. We present a novel approach using wavelets on GPUs, capable of capturing physics down to the finest turbulent scales.
 
Topics:
AEC & Manufacturing, Developer - Algorithms, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5398
 
Abstract:
To reduce the gap between a physical mock-up and a virtual mock-up, a combination of real-time rendering and simulation enables better decision making. Leveraging NVIDIA OptiX to develop specific automotive tools, we are able to run simulations and visualize solutions to a wide range of problems, such as finding the best vehicle geometry to minimize gravel impact on the door. In addition, tools such as RTT DeltaGen enable photorealistic results that help us experiment with and visualize changing vehicle designs; for example, when changing the slope of the windshield, how are elements inside the car affected by the reflective properties of the glass?
 
Topics:
AEC & Manufacturing, Autonomous Vehicles, Rendering & Ray Tracing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5628
 
Abstract:
The audience will learn how GPUs can accelerate cloud-based finite element analysis and design optimization. The computational challenges underlying such tasks will be discussed, followed by their solution through fast GPU linear solvers. A case study involving integration of massively parallel GPU computing with modern browser technology will demonstrate and identify new frontiers in engineering.
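As a point of reference for the linear solvers mentioned above: FEA stiffness systems are typically symmetric positive-definite, where conjugate gradient is the standard iterative method. A minimal pure-Python sketch, not the session's GPU solver:

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A (dense lists)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual b - A x, with x = 0 initially
    p = r[:]
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:          # converged: residual is tiny
            break
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)      # exact solution: [1/11, 7/11]
```

On the GPU, the dominant costs are the sparse matrix-vector product and the dot products, both of which parallelize well.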
 
Topics:
AEC & Manufacturing, Data Center & Cloud Infrastructure, Product Design & Styling, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5330
 
Abstract:
Using a concrete example with an actual CAD model running in CATIA, CATIA Live Rendering breaks down the barrier between industrial modeling and realistic rendering for design. Powered by Iray and coupled with the NVIDIA VCA, it delivers real-time photorealistic rendering and unprecedented batching speed for all of your marketing assets. Follow a live creation workflow from ideation to marketing assets using the 3DEXPERIENCE platform.
 
Topics:
AEC & Manufacturing, Product Design & Styling, Visualization - Large Scale & Multi-Display, Rendering & Ray Tracing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5541
 
Abstract:
Santa Cruz Bicycles is an industry-leading manufacturer of high-end, high-performance mountain bikes. Join Product Design Manager Geoff Casey as he demonstrates his team's approach to creating bikes that are at the forefront of engineering. With color and graphic design such a critical aspect of bike design, the company leverages visual computing tools to gain an advantage in a highly competitive industry. Harnessing the power of the GPU in conjunction with Bunkspeed's 3D visualization software, Santa Cruz's design team rapidly realizes their vision in real time, making on-the-fly design decisions that cut both time and cost out of the product development lifecycle.
 
Topics:
AEC & Manufacturing, Product Design & Styling, Rendering & Ray Tracing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5659
 
Abstract:
Yaskawa Motoman successfully improved the speed and quality of rendering processes to promote its latest robotic and automation solutions by leveraging the strengths of WebGL visualization applications (CL3VER) and NVIDIA's Quadro GPU technology. Gain insight into how Yaskawa's Sales & Marketing Group provides interactive 3D marketing experiences to enhance the promotion of next-generation robotic solutions.
 
Topics:
AEC & Manufacturing, Product Design & Styling
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5673
 
Abstract:
After a year of 500 users working with NVIDIA GRID in a virtualized CAD environment at PSA Peugeot Citroen, we will present the who, what, where, why, and how of the PSA IT department enabling CAD workstation end users to work almost anywhere. Learn how virtualization helps us handle our business challenges, and the benefits and improvements virtualization brought to our business.
 
Topics:
AEC & Manufacturing, Autonomous Vehicles, GPU Virtualization, Product Design & Styling
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5625
Artificial Intelligence and Deep Learning
Abstract:
We propose a GPU parallelization of the Modified Fuzzy Hyper Line Segment Neural Network (MFHLSNN) [1][2] using NVIDIA's CUDA. The training, classification, and testing phases of MFHLSNN are data-parallel tasks and are parallelized on the GPU. The skin data set [3] available in the UCI repository is used in this work. We obtained 2.5x, 10.75x, and 10.71x speedups for training, classification, and recognition, respectively, using a single NVIDIA Tesla K20 GPU, with 99.7% recognition accuracy.
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5121
 
Abstract:
Controlling fluid flow shape provides a fundamental tool for numerous material science and biological applications. We have recently discovered and demonstrated the flow sculpting concept in a micro-channel using a set of pillars. However, creating user-defined flow shapes for practical applications requires laborious and time-consuming design iterations. To develop a practical design tool, we explore the applicability of deep learning models to serve as a map between user-defined flow shapes and the corresponding sequences of pillars.
 
Topics:
Artificial Intelligence and Deep Learning, Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5173
 
Abstract:
Optimal performance of decision tree construction on the GPU varies with the data under analysis. Significant acceleration is generally available, however, by using parallel scan to partition ranked predictor sets, as well as by exploiting the predictor-level concurrency offered by argmax computations. Additional predictor concurrency can be exposed by training many trees at once, essentially unrolling the tree-construction loop.
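The scan-based partitioning step can be illustrated sequentially: an exclusive prefix scan over the left/right flags gives every element its destination slot, which is exactly what makes the operation data-parallel. A CPU sketch, not the poster's implementation:

```python
def scan_partition(values, go_left):
    """Stable partition via an exclusive prefix scan, mirroring the
    data-parallel formulation: each element computes its own slot."""
    n = len(values)
    # exclusive scan of the left flags -> slot among the lefts
    left_scan, total_left = [], 0
    for flag in go_left:
        left_scan.append(total_left)
        total_left += flag
    out = [None] * n
    for i in range(n):
        if go_left[i]:
            out[left_scan[i]] = values[i]
        else:
            # rights go after all lefts, keeping relative order
            out[total_left + (i - left_scan[i])] = values[i]
    return out

vals = [5, 2, 8, 1, 9]
flags = [v < 5 for v in vals]          # split on a threshold of 5
result = scan_partition(vals, flags)   # [2, 1, 5, 8, 9]
```

On a GPU, the scan itself is a parallel primitive, so every element's destination can be computed and written concurrently.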
 
Topics:
Artificial Intelligence and Deep Learning, Life & Material Science
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5282
 
Abstract:
In the Purine framework, a deep network is expressed as a bipartite graph (bi-graph) composed of operators and data tensors. Different parallelism schemes over GPUs and/or CPUs on single or multiple PCs can be universally implemented by graph composition, freeing researchers from coding for various parallelization schemes. Scheduled by the task dispatcher, memory transfers are fully overlapped with other computations, which greatly reduces communication overhead and helps us achieve approximately linear acceleration.
 
Topics:
Artificial Intelligence and Deep Learning, Tools & Libraries
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5285
 
Abstract:
The Support Vector Machine (SVM) is a fundamental machine learning algorithm, effective for many classification problems but with a high computational cost. Moreover, to obtain the best results for a given problem, the SVM meta-parameters need to be tuned, leading to numerous SVM executions and a huge execution time. We have developed a semi-automatic solution based on OpenACC that allows the use of multiple GPUs for fast and efficient SVM meta-parameter tuning. We present our results on several handwritten digit classification problems.
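Meta-parameter tuning is embarrassingly parallel across the (C, gamma) grid, which is what makes multi-GPU distribution natural. A sequential sketch with a hypothetical stand-in scoring function — a real run would substitute cross-validated SVM accuracy:

```python
import itertools

def tune(param_grid, score):
    """Exhaustive meta-parameter search. Each (C, gamma) evaluation is
    independent, so evaluations can be farmed out to different GPUs."""
    best_params, best_score = None, float("-inf")
    for C, gamma in itertools.product(param_grid["C"], param_grid["gamma"]):
        s = score(C, gamma)
        if s > best_score:
            best_params, best_score = (C, gamma), s
    return best_params, best_score

# Hypothetical stand-in for cross-validated SVM accuracy,
# with a smooth peak at C = 1.0, gamma = 0.1.
fake_score = lambda C, gamma: -((C - 1.0) ** 2 + (gamma - 0.1) ** 2)
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
best, _ = tune(grid, fake_score)
print(best)  # (1.0, 0.1)
```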
 
Topics:
Artificial Intelligence and Deep Learning, Tools & Libraries
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5299
 
Abstract:
This poster presents a server for feature selection built around a computational engine developed on the GPU. The server can be used to find informative features in data sets described by millions of features. A very fast CUDA-based engine allows exhaustive multidimensional searches over all combinations of features. Models in 2, 3, 4, and 5 dimensions can be built. The GPU engine's performance is 1.35×10^9 model evaluations per second. We describe the algorithm and CUDA implementation using the example of a 2-dimensional GWAS.
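The exhaustive 2-dimensional search can be sketched on the CPU: enumerate every feature pair and score it, keeping the best. Toy data and a hypothetical scoring function below; the poster's CUDA engine evaluates the same combinations massively in parallel:

```python
from itertools import combinations

def best_feature_pair(features, labels, score):
    """Enumerate every 2-feature combination and keep the best score —
    the search the GPU engine performs at ~1.35e9 models/second."""
    best, best_s = None, float("-inf")
    for i, j in combinations(range(len(features)), 2):
        s = score(features[i], features[j], labels)
        if s > best_s:
            best, best_s = (i, j), s
    return best, best_s

# Hypothetical score: fraction of samples where (f_i XOR f_j) matches the label.
def xor_match(fi, fj, labels):
    return sum((a ^ b) == y for a, b, y in zip(fi, fj, labels)) / len(labels)

feats = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]]
ys = [0, 1, 1, 0]                      # equals feats[0] XOR feats[1]
pair, s = best_feature_pair(feats, ys, xor_match)
print(pair, s)  # (0, 1) 1.0
```

Note how the feature-pair interaction is invisible to any single-feature filter — the reason exhaustive multidimensional search is worth its cost.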
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5301
 
Abstract:
This poster presents the results of applying Association Analysis to two widely differing domains: the world of Foreign Exchange numbers and the world of News texts. In both cases the results provide insights that demonstrably correspond to real-world events, and in both cases the analysis avoids user bias to the maximum extent possible. The challenges of the approach are met by the observation that patterns are parallel, just like the hardware. Combinations can be checked in parallel, so the GPU is simply the right tool for the job.
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5311
 
Abstract:
Learn how to accelerate the training process of convolutional neural networks (CNNs) for image recognition on the GPU with separable filters. The approach uses Singular Value Decomposition (SVD) to approximate 2D filters by the product of two 1D filters, and performs two 1D convolutions consecutively. The GPU implementation consists of two kernels. The first is a batched SVD routine for GPUs that can compute multiple small matrices simultaneously. The second is the computation of convolution, which combines three methods using different memory spaces for various filter sizes. Experimental results show that the implementation can achieve a 1.35x~2.66x speedup in the forward and backward passes compared to state-of-the-art GPU implementations of CNNs.
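The separable-filter identity behind the method is easy to verify: for a rank-1 kernel (an outer product of two 1D filters), a row pass followed by a column pass reproduces the full 2D convolution. A pure-Python check, not the talk's GPU kernels:

```python
def conv2d_valid(img, ker):
    """Direct 2D 'valid' convolution (correlation form) on lists of lists."""
    kh, kw = len(ker), len(ker[0])
    return [[sum(ker[a][b] * img[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def conv_rows(img, k):   # 1D convolution along each row
    return [[sum(k[b] * r[j + b] for b in range(len(k)))
             for j in range(len(r) - len(k) + 1)] for r in img]

def conv_cols(img, k):   # 1D convolution along each column
    return [[sum(k[a] * img[i + a][j] for a in range(len(k)))
             for j in range(len(img[0]))]
            for i in range(len(img) - len(k) + 1)]

# Separable 3x3 filter: outer product of u and v (a Sobel-style kernel)
u, v = [1, 2, 1], [1, 0, -1]
ker = [[ui * vj for vj in v] for ui in u]

img = [[(i * 5 + j) % 7 for j in range(5)] for i in range(5)]
full = conv2d_valid(img, ker)
sep = conv_cols(conv_rows(img, v), u)
assert full == sep   # two 1D passes reproduce the 2D convolution
```

For a KxK kernel this replaces K² multiplies per output pixel with 2K, which is where the speedup comes from; the SVD supplies the best rank-1 (u, v) pair when the kernel is only approximately separable.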
 
Topics:
Artificial Intelligence and Deep Learning, Performance Optimization, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5147
 
Abstract:
With the explosive growth of data, data mining has become a significant research domain. Recommendation systems, which automatically push knowledge from massive data collections to users, are a hot topic. Collaborative filtering (CF) is one of the essential algorithms in recommendation systems. The goal of this session is to show how to accelerate the computation in CF by using a multi-GPU platform. First, we identify the computational kernel: similarity matrix calculation. Then, we present a CUDA multi-thread model in which the data elements are processed in a data-parallel fashion. Next, we propose a workload partitioning scheme so that a balanced workload can be distributed to different GPUs. Experiments on a real-world dataset demonstrate its performance on a platform with 4 Tesla K10 cards.
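The similarity-matrix kernel identified above can be sketched as item-item cosine similarity; each output row is independent, which is what makes the workload partitionable across GPUs. A toy pure-Python version:

```python
import math

def cosine_similarity_matrix(ratings):
    """Item-item cosine similarities — the computational kernel of CF.
    Rows are items, columns are users; each output row is independent,
    so rows can be assigned to different GPUs for balanced workloads."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    norms = [math.sqrt(dot(r, r)) for r in ratings]
    n = len(ratings)
    return [[dot(ratings[i], ratings[j]) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]

R = [[5, 3, 0],     # toy item-by-user rating matrix
     [5, 3, 0],
     [0, 0, 4]]
S = cosine_similarity_matrix(R)
# identically-rated items -> similarity 1; disjoint items -> similarity 0
```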
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5158
 
Abstract:
Convolutional neural networks (CNNs) have achieved an impressive suite of results on image classification. Industry adoption, for instance by Alibaba, also indicates bright prospects. In this talk we will present several methods to optimize and accelerate GPU implementations of convolutional neural networks. An optimized implementation is given as an example, which has a smaller memory footprint and performs 1.4 to 3 times faster than Caffe.
 
Topics:
Artificial Intelligence and Deep Learning, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5159
 
Abstract:
This presentation describes GPU acceleration of convolutional neural networks for speech processing workloads. We compare three alternatives for implementing core computational kernels: hand-coded, using cuBLAS, and using cuDNN. We describe the impact of each approach on the algorithmic design and discuss how each approach affects performance and result accuracy.
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5231
 
Abstract:
This presentation proposes a novel multi-GPU parallel strategy to speed up deep neural network training. The strategy proposed here, RSPS, is a data parallelism method. With RSPS there is no need for a center node; multiple GPUs form a ring and work asynchronously. Every GPU transmits its model information to the next GPU directly. Theoretical analysis of the speedup and the model latency of RSPS is presented, and the speedup limit is given. The proposed strategy can be extended easily to multi-GPU and even multi-server architectures. Experimental results show that the proposed strategy achieves an approximately linear speedup without loss in recognition performance from 3 GPUs to 8 GPUs. The proposed strategy is an efficient and effective GPU parallel strategy for DNN training.
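The ring exchange can be simulated in a few lines. The update rule below (averaging with the predecessor's model) is a stand-in — the actual RSPS rule is not given in the abstract — but it shows how repeated ring passes drive the replicas toward consensus with no center node:

```python
def ring_step(models):
    """One synchronous ring exchange: worker i receives the model from its
    predecessor and averages it with its own (stand-in update rule)."""
    n = len(models)
    received = [models[(i - 1) % n] for i in range(n)]  # pass along the ring
    return [[(m + r) / 2 for m, r in zip(models[i], received[i])]
            for i in range(n)]

# three "GPUs", each holding a 1-parameter model replica
models = [[0.0], [3.0], [6.0]]
for _ in range(20):
    models = ring_step(models)

# repeated exchanges shrink the disagreement between replicas
spread = max(m[0] for m in models) - min(m[0] for m in models)
```

Because each averaging step preserves the mean, all replicas converge toward the initial average (3.0 here) while only ever talking to a single neighbor.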
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5277
 
Abstract:
We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images. On one of the most challenging computer vision benchmarks, the ImageNet classification challenge, our system has achieved the best result to date, with a top-5 error rate of 5.98% - a relative 10.2% improvement over the previous best result.
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5280
 
Abstract:
Big-data problems emerge in the analysis of biological samples. Advanced acquisition methods that provide 3D mass spectrometry information, along with sophisticated learning algorithms, call for fast computation methods. GPUs are an enabling technology for analyzing the ever-growing volume of mass spectrometry data. Come hear about the machine learning algorithms migrated to the GPU environment, including Probabilistic Latent Semantic Analysis and Hierarchical Clustering Distance Calculation, with accelerations of one to two orders of magnitude. This work was carried out under the framework of 3D Massomics, a European FP7-funded project that includes partners with expertise in imaging mass spectrometry, analytical chemistry, medicine, statistics, bioinformatics, and parallel computing.
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics, Life & Material Science
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5311
 
Abstract:
This session will concentrate on the topic of mission planning for search and rescue personnel and how CUDA can help in this task. Urban Search and Rescue is a challenging and important activity in current society. The ICARUS project (Integrated Components for Assisted Rescue and Unmanned Search operations) concentrates on aiding these efforts by providing robotic components for rescue teams. Adding mobile robots into the fray raises the need for additional planning effort, which would consume a lot of time using a classical approach. We will present how this can be prevented by using CUDA-based mission planners for solving tasks like path planning, patrolling, communication relay placement, etc. A number of CUDA-implemented algorithms will be shown along with example results.
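As a baseline for the path-planning task, a grid BFS planner is the serial algorithm such CUDA planners parallelize. A toy occupancy-grid sketch, not ICARUS code:

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest path on a 4-connected occupancy grid (1 = obstacle).
    Returns the path as a list of (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    q = deque([start])
    while q:
        r, c = q.popleft()
        if (r, c) == goal:
            path, node = [], goal          # walk predecessors back to start
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = (r, c)
                q.append((nr, nc))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],      # a wall forces a detour around the right side
        [0, 0, 0]]
path = bfs_path(grid, (0, 0), (2, 0))
```

A GPU formulation typically expands entire frontiers in parallel (one thread per frontier cell), which is what makes large maps and many simultaneous queries tractable.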
 
Topics:
Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5319
 
Abstract:
We describe cuDNN, NVIDIA's in-house CUDA library of deep learning primitives. Addressing the demand from engineers and data scientists, we created a library similar in intent to BLAS, with optimized routines for deep learning workloads. Previously, neural network framework developers had to implement these low-level routines for GPUs on an ad-hoc basis, optimizing individual computational kernels by hand and repeating this work as new parallel processors emerged. cuDNN alleviates this burden by providing tuned black-box implementations of these functions. The library is easy to integrate into existing frameworks, and provides optimized performance and memory usage across GPU generations. We discuss supported functionality, algorithmic implementation details, and performance achieved.
 
Topics:
Artificial Intelligence and Deep Learning, Tools & Libraries, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5331
 
Abstract:
Learn how to combine deep neural networks with probabilistic models to build classification algorithms that jointly reason about multiple variables. Typically, deep neural networks reason about a single object of interest within the observed data, e.g., a single object within an image. We show how to enrich deep learning to jointly predict a set of random variables while leveraging learned variable correlations. To this end we present an efficient GPU-driven algorithm based on neural networks that is able to jointly capture nonlinearities for multiple variables and their correlations.
 
Topics:
Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5368
 
Abstract:
In past years we have seen many projects that build and maintain highly optimized implementations of Support Vector Machines (SVM) that leverage GPUs. Up to this point, no comparable effort has been made to parallelize the Elastic Net, despite its popularity in many high impact applications, including genetics, neuroscience and systems biology. Rather than crafting a new GPU implementation for the Elastic Net, we introduce a novel reduction from the Elastic Net to the SVM, two seemingly disparate algorithms. This allows us to implement the Elastic Net in a way that spends almost all of its time in an SVM solver. As a result, we can leverage already existing GPU implementations of SVM solvers, and achieve in 11 lines of MATLAB code the fastest Elastic Net by multiple orders of magnitude.
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5543
 
Abstract:
Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library, allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture, implementing asynchronous SGD without increasing Caffe's complexity.
 
Topics:
Artificial Intelligence and Deep Learning, Performance Optimization, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5552
 
Abstract:
We at Nuance use DNNs in ASR systems across multiple languages and tasks. We train DNNs with large amounts of data. DNNs are trained with gradient descent methods that are slow and difficult to parallelize across machines. In order to speed up the training process, we have developed training algorithms/recipes which can be used to train a DNN in parallel on multiple GPU devices. This can significantly reduce DNN training time. We will present benchmark results that include the basic computational operations involved in DNN training (SGEMM, memory copy throughput, etc.) as well as the end-to-end training time on different GPU-based hardware configurations. In particular, we will benchmark systems based on the K10 versus systems based on the K80, with the number of GPUs varying from 1 to 16.
 
Topics:
Artificial Intelligence and Deep Learning, Signal and Audio Processing, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5571
 
Abstract:
In this talk, we discuss the latest techniques for solving image classification, localization, and detection problems on a multi-GPU architecture. We will cover issues and algorithms associated with training convolutional neural networks, as well as other network architectures, on small clusters of GPUs.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5580
 
Abstract:
This talk will describe recent progress in object recognition using deep convolutional networks. Over the last few years, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g. Google, Facebook, Microsoft, Baidu). Rob Fergus will outline how these models work, and describe architectures that produce state-of-the-art results on the leading recognition benchmarks. GPUs are an essential component to training these models. The talk will conclude with a live demo.
 
Topics:
Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5581
Streaming:
Share:
 
Abstract:
Despite the great progress of deep learning models (deep convolutional neural networks) in the field of visual recognition in the past few years, one of the greatest bottlenecks is the extremely long training time (from several weeks to months) needed to handle tens of millions of training images. The goal of this session is to share the results we achieved using multiple GPUs installed in one server to speed up the training process. By configuring 16 GPUs (8 Titan Zs) and optimizing the parallel implementation of CNN training, a speedup of up to 14x is achieved without compromising the model's accuracy, which in some cases even improves. Comprehensive experimental results demonstrate the linear scalability of the proposed multi-GPU training process.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Computer Vision, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5585
Streaming:
Download:
Share:
 
Abstract:
GPU-optimized Deep Neural Networks (DNNs) excel on visual pattern recognition tasks. They are successfully used for automotive problems like pedestrian and traffic sign detection. DNNs are fast and extremely accurate. They help the field of connectomics by making it possible to segment and reconstruct the neuronal connections in large sections of brain tissue for the first time. This will bring a new understanding of how biological brains work. DNNs power automatic navigation of a quadcopter in the forest.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5590
Streaming:
Share:
 
Abstract:
BIDMach is a rich, extensible machine learning toolkit that fully exploits GPU acceleration. On a single machine with an NVIDIA GPU, it holds records for most common ML tasks, outperforming cluster systems. This tutorial will walk through BIDMach, from its matrix layer through to defining new learning algorithms. The tutorial is interactive, and we will provide an EC2 image for participants to follow along. Specifically, we will cover: a hardware-agnostic matrix library (BIDMat), in-memory learning, scaling up to terabyte sources, parameter tuning, custom learners, creating new models, and interactive machine learning.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics, Visualization - In-Situ & Scientific
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5621
Streaming:
Download:
Share:
 
Abstract:
In this talk we introduce Reconstruction Networks, a novel neural-network structure that enables extremely efficient object detection in images. Reconstruction Networks directly reconstruct the regions of interest of one or more objects within an image without explicitly performing image segmentation or generating key-point descriptors. We show that Reconstruction Networks can learn the structure of faces and facial landmarks automatically, even under varying poses and illumination conditions, and surpass state-of-the-art performance in face detection and facial landmark localization while requiring only a fraction of the computational cost.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Autonomous Vehicles, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5629
Streaming:
Share:
 
Abstract:
Deep learning is a promising machine learning technique with a high barrier to entry. In this talk, we provide an easy entry into this field via "deep features" from pre-trained models. These features can be trained on one data set for one task and then used to obtain good predictions for a different task, on a different data set. No prior experience is necessary. Real-time demos will be given using GraphLab Create, a popular open-source-based software package. GraphLab Create utilizes NVIDIA GPUs for significant performance speedups.
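As a minimal illustration of the "deep features" idea (the extractor and all names below are hypothetical stand-ins, not GraphLab Create's API), a frozen feature extractor maps raw inputs to feature vectors, and a simple classifier is trained on top of those features for the new task:

```python
# The "deep features" transfer pattern: a frozen extractor (in practice a
# pretrained CNN's penultimate layer; here a trivial stand-in function)
# produces features, and a simple classifier is trained on top of them.

def deep_features(x):
    # stand-in for a pretrained network's activations
    return [x[0] + x[1], x[0] - x[1]]

def nearest_centroid_fit(data):
    # train a trivial classifier (per-class centroid) on the frozen features
    by_label = {}
    for x, label in data:
        by_label.setdefault(label, []).append(deep_features(x))
    return {lbl: [sum(dim) / len(vs) for dim in zip(*vs)]
            for lbl, vs in by_label.items()}

def predict(centroids, x):
    f = deep_features(x)
    return min(centroids,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(f, centroids[l])))

train = [((1.0, 1.0), "cat"), ((0.9, 1.1), "cat"),
         ((-1.0, -1.0), "dog"), ((-1.1, -0.9), "dog")]
model = nearest_centroid_fit(train)
print(predict(model, (0.8, 1.0)))   # → cat
```

The point is that the feature extractor is never retrained; only the small classifier on top is fit to the new task's data.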

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Big Data Analytics, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5630
Streaming:
Download:
Share:
 
Abstract:
Speech is the user interface of the future, but today's implementations often fail when we need them the most, such as in noisy environments or when the microphone isn't close at hand. At Baidu, an increasing fraction of our users employ speech interfaces to find what they are looking for. In this talk, I will show how next generation deep learning models can provide state-of-the-art speech recognition performance. We train these models using clusters of GPUs using CUDA, MPI and Infiniband.

  Back
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5631
Streaming:
Download:
Share:
 
Abstract:
Deep Learning methods including Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have demonstrated powerful acoustic modeling capabilities for Automatic Speech Recognition (ASR). However, these methods often need large volumes of training data and consequently long training times. In this GTC talk, we will describe our distributed asynchronous training platform for training CNNs and RNNs across an array of GPUs.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5632
Streaming:
Download:
Share:
 
Abstract:
Recent advances in Deep Learning have resulted in significant improvements in speech recognition, natural language processing, and related tasks. In this talk, I will give an overview of the state of the art in Deep Learning for Speech and Language Processing and present recent work at CMU on GPU-accelerated methods for real-time speech and language processing, joint optimization for spoken language understanding, and continuous online learning methods.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5634
Streaming:
Share:
 
Abstract:
In previous work, we developed a GPU-accelerated speech recognition engine optimized for faster-than-real-time speech recognition on a heterogeneous CPU-GPU architecture. Here, we extend that work to focus on a scalable server-client speech recognition solution specifically optimized for simultaneous decoding of multiple users in real time. To efficiently support real-time speech recognition for multiple users, we applied a producer-consumer multi-threaded model, in which a single producer thread accepts work items and passes them to consumer threads via a work queue. We divide the entire speech recognition process into three consumer classes, which are pipelined and connected via task queues to achieve maximum hardware utilization.
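A minimal sketch of that producer-consumer pipeline, with hypothetical stage names standing in for the three consumer classes (the abstract does not name them):

```python
import queue
import threading

# One producer feeds a work queue; three pipelined consumer stages are
# connected by task queues, so different utterances can occupy different
# stages concurrently. A None sentinel shuts each stage down in turn.

def make_stage(fn, in_q, out_q):
    def worker():
        while True:
            item = in_q.get()
            if item is None:            # sentinel: propagate and stop
                out_q.put(None)
                return
            out_q.put(fn(item))
    t = threading.Thread(target=worker)
    t.start()
    return t

q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
stages = [
    make_stage(lambda u: u + "|features", q0, q1),   # e.g. feature extraction
    make_stage(lambda u: u + "|scores",   q1, q2),   # e.g. acoustic scoring
    make_stage(lambda u: u + "|words",    q2, q3),   # e.g. graph search
]

for utterance in ["user1", "user2"]:    # producer accepts work items
    q0.put(utterance)
q0.put(None)

results = []
while (item := q3.get()) is not None:
    results.append(item)
for t in stages:
    t.join()
print(results)   # each utterance carries the trace of all three stages
```

With one thread per stage and FIFO queues, output order matches input order while the stages overlap in time, which is the hardware-utilization point the abstract makes.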

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5635
Streaming:
Download:
Share:
 
Abstract:
Occlusion edge detection is a very challenging task in robotics and unmanned autonomous vehicles. Occlusion edges in images correspond to range discontinuities in the scene from the point of view of the observer, and extracting these edges from raw images and videos is very computationally intensive. Deep learning techniques have largely replaced existing methods for extracting information in similar applications by mapping the problem to large multi-layer neural networks. These techniques rely on deep convolutional neural networks (DCNNs) with multiple hidden layers to capture the local spatial correlations that help identify occlusion edges in images and videos.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Intelligent Machines, IoT & Robotics, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5646
Streaming:
Download:
Share:
 
Abstract:
This session will provide an introduction to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), analyze the results of the 2014 challenge and provide a glimpse into what the 2015 challenge has in store. Highlights include a discussion of large-scale image recognition, a history of the ILSVRC and an overview of current techniques and trends in image classification and object detection, as well as the role that GPUs have played in this challenge.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5665
Streaming:
Share:
 
Abstract:
Image feature representation plays an essential role in image recognition. The current state-of-the-art feature learning paradigm is supervised learning from labeled data. However, this paradigm requires large-scale category labels, which limits its applicability to domains where labels are hard to obtain. I will present a new data-driven feature learning paradigm which does not rely on category labels. Instead, we learn from user behavior data collected on social media. We use the image relationship discovered in the latent space from the user behavior data to guide the image feature learning. Also presented is a new large-scale image and user behavior dataset collected on Behance.net. The dataset consists of 1.9 million images and over 300 million view records from 1.9 million users.

  Back
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5713
Streaming:
Download:
Share:
 
Opening Keynote (Keynote Talk)
Abstract:
Don't miss GTC's opening keynote address from NVIDIA CEO and co-founder Jensen Huang. He'll discuss the latest breakthroughs in visual computing, including how NVIDIA is fueling the revolution in deep learning.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Keynote
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S2000
Streaming:
Download:
Share:
 
Abstract:
This work addresses the problem of recognizing the font style of text in an image. Our algorithm is based on a carefully designed deep convolutional neural network. Since collecting real-world training text images for font recognition is extremely difficult, we have to resort to synthetic training data, which unfortunately has a large domain mismatch with real-world test examples. Besides data augmentation techniques that add synthetic degradations, we also present a domain adaptation framework to bridge the gap between synthetic training and real-world testing. In particular, we introduce a convolutional neural network decomposition approach, based on stacked convolutional autoencoders, to obtain effective features for classification. Millions of images are used to train the model, which could not have been trained without GPUs and CUDA. The proposed DeepFont system achieves top-5 accuracy of over 80% on a large labeled real-world test set we collected.

  Back
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5720
Streaming:
Download:
Share:
 
Abstract:
The field of artificial intelligence and machine learning is in a period of rapid, revolutionary improvement. Across many application areas, deep learning is proving to outperform techniques that were considered state of the art only a few years ago. Although the fundamental techniques were developed in the 1980s and 1990s, it was only recently that they were applied at large scale, due to the advent of general-purpose GPU computing and the availability of internet-scale datasets. The deep learning experts at Clarifai have spent years working alongside pioneers of the field and form a team with vast experience developing new deep learning techniques and building state-of-the-art systems that solve real problems. In this talk we will present some of the latest technologies we have developed and show how they can be applied to power a new generation of intelligent applications.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Tools & Libraries, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5740
Streaming:
Share:
 
Abstract:
Twitter is a unique source of real-time information, offering amazing opportunities for automatic content understanding. The format of this content is diverse (tweets, photos, videos, music, hyperlinks, follow graph, ...), the distribution of topics ever-changing (on a weekly, daily, or sometimes hourly basis), and the volume ever-growing, making it very challenging to automatically and continuously expose relevant content. Manually defining features to represent this data is showing its limits. In this talk, I provide an overview of how automated, content-driven representations, enabled by modern deep-learning algorithms, allow us to build adaptive systems that capture the richness of this content. Specifically, the presentation focuses on deep representations for images and for images plus text.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5760
Streaming:
Share:
 
Abstract:
Since 2009, Microsoft has engaged with academic pioneers of deep learning and has created industry-scale successes in speech recognition as well as in speech translation, object recognition, automatic image captioning, natural language, multimodal processing, semantic modeling, web search, contextual entity search, ad selection, and big data analytics. Many of these successes are attributable to the availability of big datasets for training deep models, powerful general-purpose GPU computing, and innovations in deep learning architectures and algorithms. In this talk, a selected overview will be given to highlight our work in some of these exciting applications, as well as the lessons we have learned along the way about which tasks are best solved by deep learning methods.

  Back
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5788
Streaming:
Share:
 
Abstract:
We have witnessed many ground-breaking results in computer vision research using deep learning techniques. In this talk, we introduce recent achievements in our group (http://sensetime.com/) which we believe will bridge the gap between research and product development and will bring about many computer-vision-enabled smart products. We show that our unified deep CNN framework, accelerated using modern GPU architectures, can be easily applied to various vision tasks, including image processing, pedestrian detection, object localization, and face recognition, while achieving state-of-the-art performance.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Media and Entertainment, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5800
Streaming:
Download:
Share:
 
Abstract:
Looking for a simplified way to program machine learning algorithms? This tutorial will give you hands on experience implementing Deep Belief Networks using ArrayFire and other CUDA tools. Learn the best practices for implementing parallel versions of popular algorithms on GPUs. Instead of reinventing the wheel, you will learn where to find and how to use excellent versions of these algorithms already available in CUDA and ArrayFire libraries. You will walk away equipped with the best tools and knowledge for implementing accelerated machine learning algorithms.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Tools & Libraries, Developer - Algorithms
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5803
Streaming:
Download:
Share:
 
Abstract:
Preferred Networks, Inc. (PFN) specializes in distributed machine learning technology, with a focus on deep learning, for the Internet of Things (IoT). In this session, we will first introduce PFN's goal: the realization of Distributed Deep Intelligence using GPU technology, that is, the synergistic implementation and integration of deep learning intelligence throughout IoT networks. We will show our current deep learning projects for IoT, including surveillance cameras, retail solutions, automobiles, and bio/healthcare. In particular, we will discuss the development of distributed deep neural networks for drug discovery using the entire PubChem database via GPU technologies.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Emerging Companies Summit, Life & Material Science
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5813
Streaming:
Download:
Share:
 
Abstract:
Over the past few years, we have built large-scale computer systems for training neural networks, and then applied these systems to a wide variety of problems that have traditionally been very difficult for computers. We have made significant improvements in the state-of-the-art in many of these areas, and our software systems and algorithms have been used by dozens of different groups at Google to train state-of-the-art models for speech recognition, image recognition, various visual detection tasks, language modeling, language translation, and many other tasks. In this talk, I'll highlight some of the distributed systems and algorithms that we use in order to train large models quickly. I'll then discuss ways in which we have applied this work to a variety of problems in Google's products, usually in close collaboration with other teams. This talk describes joint work with many people at Google.

  Back
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Keynote
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5817
Streaming:
Download:
Share:
 
Abstract:
Deep Learning has transformed many important tasks, including speech and image recognition. Deep Learning systems scale well by absorbing huge amounts of data to create accurate models. The computational resources afforded by GPUs have been instrumental to this scaling. However, as Deep Learning has become more mainstream, it has generated some hype, and has been linked to everything from world peace to evil killer robots. In this talk, Dr. Ng will help separate hype from reality, and discuss potential ways that Deep Learning technologies can benefit society in the short and long term.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Computer Vision
Type:
Keynote
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5818
Streaming:
Download:
Share:
 
Abstract:
Deep learning has recently achieved great success in domains such as images, speech, and text. These gains have been made possible by efficient GPU implementations such as cuDNN. We show optimizations at the assembly level that result in significant performance improvements over existing methods. In particular, we show how operations such as convolutions and dense matrix multiply can be efficiently implemented using a custom assembler to attain state-of-the-art performance on the NVIDIA Maxwell GPU architecture. Additionally, we can significantly reduce memory bandwidth and run much larger models by using limited precision with a minimal tradeoff in model accuracy.

  Back
 
Topics:
Artificial Intelligence and Deep Learning, Performance Optimization, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5873
Streaming:
Download:
Share:
 
Abstract:
DIGITS provides a user-friendly interface for training and classification that can be used to train DNNs with a few clicks. DIGITS gives users easy access to existing databases and previously trained network models, as well as training activities in progress. Modifying your network configuration to maximize accuracy is easily accomplished with this platform too. The network configuration process is intuitive, making it easy to use for both experienced DL experts and researchers who are just getting started. The main console helps users keep track of their changes. The tool runs as a web application, making it easy to share results and collaborate. The workflow for using DIGITS will be presented and discussed in this session.

  Back
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5924
Streaming:
Share:
Astronomy & Astrophysics
Presentation
Media
Abstract:
This poster presents the Weather Research and Forecasting (WRF) model, a next-generation mesoscale numerical weather prediction system designed to serve both the operational forecasting and atmospheric research communities. WRF offers multiple physics options, one of which is the long-wave Rapid Radiative Transfer Model (RRTM). Even with the advent of large-scale parallelism in weather models, much of the performance increase has come from increasing processor speed rather than increased parallelism. We present an alternative method of scaling model performance.  Back
 
Topics:
Astronomy & Astrophysics, Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5144
Download:
Share:
 
Abstract:
There are more than 170 billion galaxies in the observable universe, and we humans have captured image data covering more than a quarter of the whole sky with our powerful telescopes and ambitious sky surveys like SDSS. This vast amount of information is not meant for humans to process, and CPUs and traditional algorithms both hit their limits in processing it. With the help of recent deep learning technologies and powerful implementations on NVIDIA GPUs, the developed models can classify galaxies with competitive accuracy.

  Back
 
Topics:
Astronomy & Astrophysics, Artificial Intelligence and Deep Learning, Computer Vision
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5176
Download:
Share:
 
Abstract:
Dynamics in the solar atmosphere, including solar flares, coronal mass ejections, micro-flares, and different types of jets, are powered by the evolution of the sun's intense magnetic field. 3D Radiative Magnetohydrodynamics (MHD) computer simulations have furthered our understanding of the processes involved. Detailed analysis of this evolution entails tracing magnetic field lines, an operation which is not time-efficient on a single processor. By utilizing a GPU to trace lines in parallel, conducting such analysis is made feasible.  Back
 
Topics:
Astronomy & Astrophysics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5196
Download:
Share:
 
Abstract:
Current and future NASA planetary missions are generating ever-increasing volumes of terrain data from orbital and surface-based assets at vastly different resolutions. We have applied an alternative technology, subdivision surfaces, coupled with a novel volumetric reconstruction process, to help manage and present high-fidelity mesh representations of the disparate range of terrain data collected by rovers and satellites. Applications include terrain data visualization, autonomous navigation, and other localization and mapping problems.  Back
 
Topics:
Astronomy & Astrophysics, Visualization - In-Situ & Scientific
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5307
Download:
Share:
 
Abstract:
We have implemented an extensive package of detector modeling and image reconstruction algorithms for Swift's BAT telescope on a Tesla K20 processor. Individual reconstructed images with 2 million pixels are reprocessed with a compute intensive noise reduction algorithm which has been modified to run under CUDA 6.5. Methods employed to port existing code to a GPU implementation with a minimum of code development are presented.  Back
 
Topics:
Astronomy & Astrophysics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5316
Download:
Share:
 
Abstract:
Stellar systems come in many shapes and sizes. We present two new GPU-accelerated N-body codes focusing on two kinds of systems: dwarf spheroidal galaxies and globular clusters. ETICS is based on a series expansion of the Poisson equation and is ideal for diffuse objects such as dwarf galaxies. Since close stellar encounters and binaries play very important roles in the dynamics of globular clusters, a much more accurate integrator is needed; NBODY6++ is a direct-summation N-body code which can provide this kind of accuracy.  Back
 
Topics:
Astronomy & Astrophysics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5323
Download:
Share:
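As an illustrative aside (not code from the session itself), the core kernel that direct-summation codes like NBODY6++ parallelize on the GPU can be sketched in plain Python; the softening parameter and G = 1 units here are assumptions for the sketch:

```python
import math

def accelerations(pos, mass, soft=1e-3):
    """Direct-summation O(N^2) gravitational accelerations (G = 1).

    Each body's acceleration sums contributions from all other bodies;
    on a GPU this outer loop maps naturally to one thread per body.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + soft * soft  # softened distance
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc
```

For two equal masses the accelerations come out equal and opposite, which is a quick sanity check on the pairwise symmetry.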
 
Abstract:
The estimation of the maximum likelihood (MLE) is the most robust algorithm used in gamma-ray astronomy but, particularly when used in conjunction with "unbinned" analysis, it consumes a huge amount of computing resources. Typically, the estimation of the maximum is left to a single-thread minimizer, like MINUIT, running on a CPU while providing a callback function that may estimate the likelihood on the GPU. We propose an alternative to the MINUIT package that leverages Dynamic Parallelism and runs entirely on GPUs.
 
Topics:
Astronomy & Astrophysics, Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5327
Download:
Share:
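To make the unbinned-likelihood idea concrete, here is a minimal sketch (my own illustration, not the session's code): each event contributes an independent term to the negative log-likelihood, which is what a GPU evaluates in parallel before a reduction, while a simple 1-D search stands in for a minimizer like MINUIT. The exponential-decay model and golden-section search are assumptions for the sketch:

```python
import math

def nll(tau, times):
    # Unbinned negative log-likelihood for an exponential decay p(t|tau).
    # Each event's term is independent, so on a GPU the terms are computed
    # in parallel and summed with a reduction.
    return sum(math.log(tau) + t / tau for t in times)

def fit_tau(times, lo=0.1, hi=10.0, iters=60):
    # Golden-section search over tau, standing in for a real minimizer.
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if nll(c, times) < nll(d, times):
            b = d
        else:
            a = c
    return (a + b) / 2
```

For this model the MLE has a closed form (the sample mean), so the numerical fit can be checked directly against it.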
 
Abstract:
Detectors for gamma-ray astronomy are the prototypes for distributed experiments. Single detectors may be scattered over an area of a few square kilometres, and the capability of each unit to process, at least partially, its own data before sending it to the central data acquisition provides a key advantage. We aim at developing and testing algorithms and techniques to implement this kind of local data sparsification at the detector level. To reach this goal, we leverage and compare the parallel capabilities of Kayla and Jetson TK1.
 
Topics:
Astronomy & Astrophysics, Intelligent Machines, IoT & Robotics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5328
Download:
Share:
 
Abstract:
Come and learn how GPUs can help discover the most distant galaxies by performing close to real-time simulations, at an unprecedented scale, of the multi-object adaptive optics (MOAO) technique. The European Southern Observatory (ESO) is leading the construction of the European Extremely Large Telescope (E-ELT), a 39m diameter telescope, to provide Europe with the biggest eye on the universe ever built. MOAO is the most complex adaptive optics concept proposed for the E-ELT, and simulating the instrument at full scale is extremely compute-intensive. The tomographic reconstructor (TR) is one of the core components of both the design simulations and, eventually, system operations, and it requires the inversion of a large dense covariance matrix.
 
Topics:
Astronomy & Astrophysics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5122
Streaming:
Share:
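As a small illustration of the dense linear algebra behind the abstract above (a generic sketch, not the E-ELT pipeline): rather than explicitly inverting the symmetric positive-definite covariance matrix, the reconstructor can be obtained from a Cholesky factorization and triangular solves, the standard pattern that GPU dense-linear-algebra libraries accelerate. The variable names are my own:

```python
import numpy as np

def tomographic_reconstructor(C_mm, C_tm):
    # Solve R @ C_mm = C_tm for a dense, symmetric positive-definite
    # measurement-covariance matrix C_mm, avoiding an explicit inverse.
    L = np.linalg.cholesky(C_mm)          # C_mm = L @ L.T
    y = np.linalg.solve(L, C_tm.T)        # forward substitution
    x = np.linalg.solve(L.T, y)           # back substitution
    return x.T                            # R = C_tm @ inv(C_mm)
```

Verifying `R @ C_mm == C_tm` confirms the solve without ever forming the inverse.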
 
Abstract:
Since its launch in 2010, NASA's Solar Dynamics Observatory (SDO) has continuously monitored the Sun's changes in magnetic activity. Both the Atmospheric Imaging Assembly (AIA) and Helioseismic & Magnetic Imager (HMI) instruments onboard SDO deliver 4096x4096 pixel images at a cadence of more than one image per second. Although SDO images are free from distortion by absorption and scattering in the Earth's atmosphere, images are still blurred by the intrinsic point spread functions of the telescopes. In this presentation, we show how the instrument teams have deployed CUDA-enabled GPUs to perform deconvolution of SDO images. The presentation will demonstrate how we leveraged cuFFT and Thrust to implement an efficient image processing pipeline.
 
Topics:
Astronomy & Astrophysics, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5209
Streaming:
Download:
Share:
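To illustrate the kind of FFT-based deconvolution the abstract describes (a minimal Wiener-regularized sketch of my own; the production pipeline uses cuFFT on the GPU, with NumPy's FFT standing in here):

```python
import numpy as np

def wiener_deconvolve(image, psf, eps=1e-3):
    """Frequency-domain deconvolution of an image blurred by a PSF.

    Division by the PSF's transfer function is regularized by eps to
    avoid amplifying noise where the PSF has little power.
    """
    H = np.fft.fft2(np.fft.ifftshift(psf), s=image.shape)  # center PSF at origin
    G = np.fft.fft2(image)
    F = G * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft2(F))
```

With a delta-function PSF the operation is (up to the small regularization) the identity, which is an easy correctness check.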
 
Abstract:
Radio astronomy imaging is a complex, compute- and memory-intensive problem that dominates the cost of next-generation radio facilities. Using the MeerKAT telescope, currently under construction in South Africa, as an example, we describe the development of a highly parallel, low-power, low-cost imager using System on Chip devices. In particular, NVIDIA's TK1 and its successors are considered. The talk will also briefly describe the opportunities and solutions presented by the forthcoming Square Kilometre Array, whose processing costs require game-changing technology shifts to become achievable.
 
Topics:
Astronomy & Astrophysics, Intelligent Machines, IoT & Robotics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5222
Streaming:
Download:
Share:
 
Abstract:
We introduce Taranis, a library for performing ray-traced radiation transport in smoothed particle hydrodynamics (SPH) entirely on the GPU. We discuss the design, algorithm, and key optimizations (such as ray packets) for our use-case. Taranis is motivated by the current intractability of coupled radiation-hydrodynamics simulations. This talk focuses on Taranis' tracing component, which has been influenced by recent work in computer graphics. It outperforms a 32-core CPU code on a single GPU. Our scheme allows particles to be updated independently and requires fewer rays than a typical 'long characteristics' method. Taranis' radiation transport solver is also implemented on the GPU, and targets large-scale simulations of reionization. However, the tracing API exists as a standalone entity.
 
Topics:
Astronomy & Astrophysics, Computational Physics, Rendering & Ray Tracing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5266
Streaming:
Download:
Share:
 
Abstract:
We present a summary of optimization work on a GPU-based correlator pipeline code. This is an ongoing joint effort between the National Centre for Radio Astrophysics (NCRA) and NVIDIA. The central goal of the effort is to upgrade the Giant Metrewave Radio Telescope (GMRT) receiver with a wide-band GPU-based back-end and to extend this design as a proposed back-end for the low-frequency array of the SKA telescope. We look at the various processing stages involved in the pipeline to explore optimization possibilities, with some interesting results already achieved.
 
Topics:
Astronomy & Astrophysics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5302
Streaming:
Download:
Share:
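For readers unfamiliar with what a correlator pipeline computes, here is a toy FX-correlator stage (my own illustration, not the GMRT code): each antenna's voltage stream is channelized with an FFT (the "F" stage), then every antenna pair is cross-multiplied per frequency channel (the "X" stage):

```python
import numpy as np

def fx_correlate(v):
    # v has shape (n_antennas, n_samples); returns visibilities of
    # shape (n_antennas, n_antennas, n_channels).
    spec = np.fft.rfft(v, axis=1)                        # F stage: channelize
    vis = np.einsum('ic,jc->ijc', spec, np.conj(spec))   # X stage: cross-multiply
    return vis
```

Feeding two antennas the identical signal makes the cross-correlation equal the (real, non-negative) auto-correlation, a quick way to sanity-check the X stage.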
 
Abstract:
Learn how to use GPUs on the desktop to study the structure and evolution of the Universe: how galaxies are pulled together by gravity, and how space expands under the influence of Dark Energy. Metrics used to describe this structure are the two- and three-point correlation functions, which quantify the clustering of galaxies. Cosmological datasets can number in the millions (and soon billions) of galaxies, making these O(N^2) and O(N^3) metrics computationally challenging. This talk will detail how we have ported solutions to the GPU. In particular, we focus on the novel histogramming bottlenecks inherent in these calculations, and how they can be mitigated. Throughout, we will emphasise how GPUs and heterogeneous computing can be used for everyday data analysis with large datasets.
 
Topics:
Astronomy & Astrophysics, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5509
Streaming:
Download:
Share:
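The histogramming bottleneck the abstract mentions can be seen in miniature below (my own 2-D sketch, not the talk's code): the two-point correlation function is built from a histogram of all O(N^2) pair separations, which on a GPU is computed with per-pair threads binning into atomic or privatized histograms:

```python
import numpy as np

def pair_counts(x, y, edges):
    # Brute-force O(N^2) pair-separation histogram over 2-D points.
    pts = np.column_stack([x, y])
    n = len(pts)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(n, k=1)          # each unordered pair once
    hist, _ = np.histogram(d[iu], bins=edges)
    return hist
```

Three points with known separations (1, 1, and sqrt(2)) make the binning easy to verify by hand.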
 
Abstract:
The evolution of the universe is an extraordinarily fascinating and, of course, complex problem. Scientists use the most advanced simulation codes to try to describe and understand the origin and the behavior of the incredible variety of objects that populate it: stars, galaxies, black holes. The most powerful computing systems are required to pursue such goals, and GPUs represent an outstanding opportunity. In this talk, we present one of these codes, Ramses, and the ongoing work to enable it to efficiently exploit GPUs through the adoption of the OpenACC programming model. The most recent achievements will be shown, together with some of the scientific challenges GPUs can help address.
 
Topics:
Astronomy & Astrophysics, OpenACC, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5531
Streaming:
Download:
Share:
 
Abstract:
In this talk I will give an introduction to the biggest of the upcoming big-data science facilities, the Square Kilometre Array radio telescope (SKA), and will look at how GPUs will enable this instrument to discover exotic, rapidly spinning radio pulsars. Radio pulsars provide us with phenomenal tools with which we may probe the most extreme environments in the Universe. More massive than our Sun, yet spinning faster than a kitchen blender and sending jets of radio waves out from their magnetic poles, these exotic cosmic lighthouses are key to understanding gravity and allow us to ask the question: was Einstein right? To answer this question we must use the SKA to scour the Galaxy in search of exotic pulsar binary systems. This task is extremely computationally expensive, requiring the execution of many billions of Fourier transforms. Here I will review the work being done to leverage the power of GPUs to solve the SKA's pulsar-searching challenge.
 
Topics:
Astronomy & Astrophysics, Big Data Analytics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5875
Streaming:
Download:
Share:
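The Fourier-based search the abstract refers to can be illustrated with a toy periodicity finder (my own sketch; real pulsar searches add harmonic summing, acceleration trials, and de-dispersion): find the dominant period of a time series from the peak of its power spectrum.

```python
import numpy as np

def strongest_period(series, dt):
    """Return the period of the strongest spectral peak in a time series.

    A toy version of FFT-based pulsar searching: compute the power
    spectrum and locate its maximum, skipping the DC bin.
    """
    power = np.abs(np.fft.rfft(series - np.mean(series))) ** 2
    freqs = np.fft.rfftfreq(len(series), dt)
    k = np.argmax(power[1:]) + 1   # ignore the zero-frequency bin
    return 1.0 / freqs[k]
```

A pure sinusoid whose frequency lands exactly on an FFT bin is recovered exactly, which makes for a clean check.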
Augmented Reality and Virtual Reality
Presentation
Media
Abstract:
We present a near-eye light field display design that supports accommodation and high spatial resolution while using the same bandwidth as a conventional display. A light source array reflects light in multiple directions off a high-speed binary display, creating a light field over the eye. The display bandwidth conventionally used for color gradations is instead used to create a high angular resolution binary light field; color gradations are partially recovered when the light field is collected by the eye and focused on the retina.
 
Topics:
Augmented Reality and Virtual Reality
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5248
Download:
Share:
 
Abstract:
One of the main objectives of virtual reality based surgical simulation systems is the removal of pathologic tissues. Cutting imposes many challenges in the development of a robust, interactive surgery simulation, not only because of the nonlinear material behavior exhibited by soft tissue but also due to the complexity of introducing the cutting-induced discontinuity. We propose a high-performance cutting algorithm for complex tetrahedral meshes. As a proof of concept, we integrated our algorithm in a craniotomy simulation.
 
Topics:
Augmented Reality and Virtual Reality, Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5254
Download:
Share:
 
Abstract:
We developed a web-based 3D interactive learning environment for teaching hydrological concepts. The system provides a visually striking platform with realistic terrain information and water simulation. Students can create scenarios, control parameters, and evaluate mitigation alternatives. The system utilizes web technologies and the GPU for water simulation and object collisions on the terrain. The system supports virtual, augmented, and immersive reality modes, and enables interaction using gesture, body movement, and portable devices.
 
Topics:
Augmented Reality and Virtual Reality
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5255
Download:
Share:
 
Abstract:
Loss of vision can result from an enormous number of visual disorders, a small subset of which can be addressed using traditional corrective lenses, i.e. by transforming light in accordance with Snell's law of refraction. In principle, a more general class of transformations might help address a broader range of disorders. Discover how GPUs are being used in augmented reality applications to correct or alleviate vision deterioration in real-time, as well as personalize vision in novel ways.
 
Topics:
Augmented Reality and Virtual Reality, Computer Vision, Medical Imaging & Radiology, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5182
Streaming:
Download:
Share:
 
Abstract:
It is no secret that augmented reality is a computationally intensive endeavor. While human sight is taken for granted by the average person, getting a computer to "see" in a way that remotely resembles our expectations is an extremely complex challenge. Tasks such as extracting significant features from a camera feed, estimating a camera pose, and finally rendering digital content appropriately demand a huge amount of processing power from the CPU of today's mobile devices. Compounding this problem is the increasing interest in 3D object tracking and depth-sensing cameras. Metaio CEO Dr. Thomas Alt will illustrate the current "CV bottlenecks" and how GPU-based solutions can significantly improve the increasingly important mobile computer vision and augmented reality apps coming to market.
 
Topics:
Augmented Reality and Virtual Reality, Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5626
Streaming:
Download:
Share:
 
Abstract:
Virtual reality is the next frontier of gaming, and NVIDIA is leading the way by introducing VR Direct, a set of hardware and software technologies we're creating to cut down graphics latency and accelerate stereo rendering performance. In this talk, we'll show how developers can use NVIDIA GPUs and VR Direct to improve the gaming experience on the Oculus Rift and other VR headsets.
 
Topics:
Augmented Reality and Virtual Reality, Gaming and AI, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5668
Streaming:
Download:
Share:
 
Abstract:
This talk presents a system for the visualization of professional graphics, such as ray tracing, on a low-latency device, such as a head-mounted display or tablet. I will describe the issues encountered, and the algorithms used. The example I will demonstrate showcases the NVIDIA® VCA cluster for cloud-based rendering, NVENC for low-latency video encoding, and Google's Project Tango with the Tegra K1 processor for pose tracking and video decoding. The demo system presented can also serve graphics to multiple low-latency devices, such as a Virtual Reality HMD, at a rate much faster than the graphics are rendered.
 
Topics:
Augmented Reality and Virtual Reality, Media and Entertainment, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5733
Streaming:
Share:
 
Abstract:
Virtual Reality has taken the computer industry by storm. Developers, artists, end users, educators, advertisers and retailers are flocking by the thousands to realize the decades-long dream of virtual reality for the masses. The combination of GPU acceleration and cheap sensors has enabled low-cost consumer-grade VR, and the rapid adoption of software development kits is paving the way for creating virtual reality apps on platforms from desktops to smartphones, and even running in your web browser using WebGL. Join VR pioneer and WebGL developer Tony Parisi as he explores this exciting frontier. This session will take a look at the latest VR hardware devices, supported operating systems and software development kits, and the wide range of applications already being deployed.
 
Topics:
Augmented Reality and Virtual Reality, Tools & Libraries, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5737
Streaming:
Download:
Share:
Autonomous Vehicles
Presentation
Media
Abstract:
Computer vision algorithms are widely used in the automotive field for ADAS. Many computing architectures can be used to embed those algorithms: ARM, DSP, GPU, and heterogeneous ones like the Tegra K1. But the choice of computing architecture remains a problem for the car manufacturer. We propose a method to predict the performance of computer vision algorithms on multiple, heterogeneous architectures in order to help choose the best algorithm/architecture pairing. The approach is illustrated with a lane detection algorithm embedded on the K1.
 
Topics:
Autonomous Vehicles, Computer Vision
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5158
Download:
Share:
 
Abstract:
GPU acceleration of robotic services, focused on 3D point cloud processing of robotic depth sensors, to approach real-time performance for use in self-driving automobiles.
 
Topics:
Autonomous Vehicles, Intelligent Machines, IoT & Robotics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5192
Download:
Share:
 
Abstract:
This market introduction to vision-based solutions in advanced driver assistance systems will highlight the regions, applications, and vehicle sectors that are driving the growth. Current and likely future architectures will be explored, and the implications for both traditional and non-traditional automotive suppliers will be highlighted. Finally, the role and implications of automated driving will be investigated and analyzed.
 
Topics:
Autonomous Vehicles, Computer Vision, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5108
Streaming:
Download:
Share:
 
Abstract:
Learn how the GPU's real-time graphics capabilities can be used to interactively visualize and enhance the camera system of modern cars. The GPU simplifies design, interactive calibration and testing of the car's computer vision systems, and even allows for creating simulated environments where the behavior of the car's computer vision can be tested to pass standard safety tests or handle navigational street situations.
 
Topics:
Autonomous Vehicles, Computer Vision, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5123
Streaming:
Download:
Share:
 
Abstract:
Learn how Jaguar Land Rover is using the power of the GPU to design, create and test next-generation user interfaces for cars.
 
Topics:
Autonomous Vehicles, Intelligent Machines, IoT & Robotics, AEC & Manufacturing, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5137
Streaming:
Share:
 
Abstract:
A robust proof-of-concept Surround-Vision and Top-View system for cars includes four car-mounted cameras as inputs and the Jetson Pro platform as the computation and display unit, relying on CUDA and OpenGL for both GPGPU and rendering of the final views. Topics covered will include the placement and calibration of the cameras, color correction, and data preprocessing. A technical deep dive on the common pitfalls will highlight common visual artefacts in Top-View visualizations, and will present the algorithmic building blocks to correct those errors.
 
Topics:
Autonomous Vehicles, Computer Vision, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5295
Streaming:
Download:
Share:
 
Abstract:
Tapping into in-vehicle architectures for infotainment and driver information applications is a huge challenge. We will examine several production cars as examples and provide insight into how NVIDIA automotive Tegra processors can be retrofitted into these cars as a proof-of-concept for next-generation digital clusters and infotainment systems.
 
Topics:
Autonomous Vehicles, Intelligent Machines, IoT & Robotics, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5396
Streaming:
Share:
 
Abstract:
The evolution of GPU-accelerated computing is enabling us to rethink vehicle architecture in ways previously believed not feasible. We will see how Delphi's signature Integrated Cockpit and Multi-domain Controller projects now leverage parallel computing to up-integrate traditionally disparate vehicle systems. We will also discuss the advantages and challenges involved in this process.
 
Topics:
Autonomous Vehicles
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5469
Streaming:
Download:
Share:
 
Abstract:
Virtualization is playing an increasingly important role in the development of in-vehicle systems. Users of the NVIDIA Vibrante SDK/PDK can use OpenSynergy's integrated automotive solution to realize CAN communication and AUTOSAR compliance within the timing and safety constraints required by the automotive industry. In addition, learn how the solution allows controlled communication between virtualized operating systems and the vehicle networks while maintaining the isolation between both.
 
Topics:
Autonomous Vehicles
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5532
Streaming:
Download:
Share:
 
Abstract:
Learn how to perform a critical use case analysis to ensure your high-end embedded system provides the required application-specific performance. Typical GPU and CPU benchmarks return performance values under optimized conditions, but real-world applications, such as infotainment systems, will find the bottlenecks in your system. Find them before the project fails, or find options to transfer tasks to the GPU (e.g. using CUDA). Attendees will see how to transform a system architecture into a "System Resource Model", then find the "Critical Use Cases" of the application and match them with this model. This practical approach will show how to set up benchmarks in parallel to emulate use cases under reproducible conditions, based on an example for an automotive infotainment system.
 
Topics:
Autonomous Vehicles, Intelligent Machines, IoT & Robotics, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5587
Streaming:
Download:
Share:
 
Abstract:
Highly connected vehicles have clear implications for various aspects of the driver-vehicle interaction. The HMI design will be influenced by the high load of information that will be put on the driver. How can information best be presented? How can it be selected? Is the idea of a workload manager still relevant? On the other hand, autonomous driving brings new challenges for the vigilance and distraction of the driver. How can the driver be pulled back into the loop when required? When is it required? How can drivers be informed about the limits of the machine? We will also discuss methods on how to "measure" HMI and driving performance in automation, such as steering wheel reversal rate, standard deviation of lane position, speed keeping, and more.
 
Topics:
Autonomous Vehicles, Augmented Reality and Virtual Reality
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5588
Streaming:
Download:
Share:
 
Abstract:
For accurate and power-efficient in-vehicle hand-gesture recognition, a novel multi-sensor system can be composed of a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions. The radar and depth sensors are jointly calibrated, and a deep convolutional neural network fuses data from the multiple sensors to classify the gestures. This algorithm accurately recognizes 10 different gestures acquired indoors and outdoors in a car during the day and at night, consuming significantly less power than purely vision-based systems.
 
Topics:
Autonomous Vehicles, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5599
Streaming:
Download:
Share:
 
Abstract:
One aspect of speech recognition work at Carnegie Mellon University is specifically focused on noise-robust speech recognition for automotive environments. By combining state-of-the-art methods in deep learning with GPU-accelerated embedded hardware, we are able to significantly improve the performance of speech recognition, even in challenging noise conditions.
 
Topics:
Autonomous Vehicles, Artificial Intelligence and Deep Learning, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5633
Streaming:
Share:
 
Abstract:
During the last several years, Audi has developed with partners a platform that enables piloted driving and piloted parking. At CES 2015 it was shown that the system can drive piloted on the highway from Silicon Valley to Las Vegas. The computat ...Read More
Abstract:

During the last several years, Audi has developed with partners a platform that enables piloted driving and piloted parking. At CES 2015 it was shown that the system can drive piloted on the highway from Silicon Valley to Las Vegas. The computational platform or brain of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and the motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.

 
Topics:
Autonomous Vehicles, Computer Vision, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5637
Streaming:
Share:
 
Abstract:

Learn how the BMW Group Technology Office in Silicon Valley integrates with the automaker's world-wide research and development departments, with a specific focus on an active safety system running on NVIDIA hardware, recently developed for the i3. As one of the first automakers to open a research and development office in Silicon Valley, BMW has a long history of innovation in the Bay Area. Projects range from series vehicles to Formula 1, and from research to pre-development including the iDrive interface, Apps4Automotive, the all-electric Mini-E, and Head-Up Displays.

 
Topics:
Autonomous Vehicles, Intelligent Machines, IoT & Robotics, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5789
Streaming:
Download:
Share:
 
Abstract:

On the eve of CES 2015, Audi, ERL, and VW Group Research accomplished the most dynamic automated driving road test yet, with non-engineers behind the wheel for more than 550 miles on public freeways. With the advanced Highway Pilot technology built into a car nicknamed "Jack", Audi demonstrated how far automated driving technology has matured within the last decade. What enabled such complex technology is the massive growth in processing power, a field in which NVIDIA processors will play a central role in the future.

 
Topics:
Autonomous Vehicles, Computer Vision, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5870
Streaming:
Download:
Share:
 
Abstract:

Innovations in perceptive smart sensors comprising solid state 3D LiDARs and GPUs with artificial intelligence software have reached a cost level that allows them to be deployed ubiquitously, supporting a smart Internet of Things (IoT). These smart sensors provide real-time information on billions of 'Things' and their surroundings (through 3D object detection, tracking, and classification) and, when needed, the ability to control them. The 'Things' include vehicles, infrastructure, buildings, homes, appliances, light controls, thermostats, medical devices, computers, and handheld devices. Growth of personal devices (phones, tablets, laptops, game consoles) is limited by the number of people in the world. The largest growth will come from connected devices in areas such as smart energy, home automation, and transportation.

 
Topics:
Autonomous Vehicles, Artificial Intelligence and Deep Learning, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5918
Streaming:
Share:
 
Abstract:

As the automotive industry relies on electronics and software for more and more active safety capabilities, how does a software or electronics company deliver its exciting value while ensuring that what it delivers doesn't "break" the vehicle? Drawing heavily on the Vehicle Dynamics Program, the Specialty Equipment Market Association ("SEMA") has developed the Vehicle Electronics Program to ensure that the next generation of in-car electronics realizes its full potential. Learn about this new program, including the newly proposed federal motor vehicle safety standard, FMVSS 150. In addition, we'll cover the resources and opportunities available to developers for designing and customizing vehicles.

 
Topics:
Autonomous Vehicles, Product Design & Styling
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5545
Streaming:
Download:
Share:
Big Data Analytics
Presentation
Media
Abstract:
GTRI has implemented and tested a CUDA upgrade to a ground-to-air Hardware-in-the-Loop missile simulator. By breaking a single-threaded thermal integrator loop into multiple independent kernels, a speedup of 20X is achieved for complex targets. This speed increase reduces computation time from days to hours, and preliminary results show that multiple GPUs may allow additional speedup by removing stream concurrency limits.
 
Topics:
Big Data Analytics, Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5136
Download:
Share:
 
Abstract:
Because of the large volume of data and the complexity of map projection algorithms, transforming geo-referenced data among various projections poses a major computational challenge, especially for big spatial data. To overcome this challenge, we present a cloud-based parallel computing framework for accelerating the map projection of vector-based big spatial data. GPU-enabled parallel map projection algorithms were developed on the CUDA platform for our framework.
 
Topics:
Big Data Analytics, HPC and Supercomputing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5161
Download:
Share:
 
Abstract:
We are leveraging the abilities of relational databases for scalable storage and retrieval, and massively parallelized computation on the GPU, to perform large-scale pattern recognition tasks. We have successfully integrated these two technologies, providing the database with the means to do high-performance computation on massive stored datasets. Internalizing this capability within the database facilitates blending advanced relational and spatial operations into pattern matching tasks, which is applicable in a variety of fields.
 
Topics:
Big Data Analytics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5233
Download:
Share:
 
Abstract:

Topology provides a mathematical framework for applying a complete range of statistical, geometric, and machine learning methods, revealing insights from the geometry of your data. Ayasdi utilizes topological data analysis (TDA) in its advanced analytics software to simplify the analysis of complex, multi-variate datasets. In this poster, we illustrate how GPGPUs can be leveraged to accelerate key operations in TDA by over 14X.

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5239
Download:
Share:
 
Abstract:
Free electron lasers make it possible to study non-linear multi-photon processes within a single FEL shot, which is less than 50 ft long and has a repetition rate of 120 Hz at LCLS. As a result, a huge amount of data is created in a very short acquisition time. Analyzing this data at the single-shot level requires a lot of computing power but can be massively parallelized. To decrease the evaluation time, we created GPU-based evaluation software for our electron time-of-flight spectrometer setup.
 
Topics:
Big Data Analytics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5276
Download:
Share:
 
Abstract:
People and things have become mobile, GPS-enabled sensors. These self-quantified technologies generate humongous amounts of raw data, which can turn into valuable information for users and businesses. We present GalacticaDB, a massively parallel SQL-like engine with extended geo-spatial capabilities. It accelerates analytic computation by optimizing query processing and exploiting NVIDIA Tesla GPUs. Our results indicate that the GPU is an effective and energy-efficient co-processor for executing database query operations.
 
Topics:
Big Data Analytics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5277
Download:
Share:
 
Abstract:
This poster analyzes a GPU implementation of the Levenshtein distance function for fuzzy string matching of a Vehicle Identification Number (VIN) against a dataset of known VINs. Our solution gives a 13x-15x speedup over similar CPU solutions. Our work aims to help correct human error that occurs during data entry and return meaningful information to the user, which they can then use to inform their decisions.
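The Levenshtein distance the poster accelerates is a standard dynamic-programming edit distance. A minimal CPU-side sketch, not the authors' GPU implementation, with sample VINs invented for illustration:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))          # DP row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

# Fuzzy-match a mistyped VIN against a (hypothetical) dataset of known VINs.
known_vins = ["1HGBH41JXMN109186", "5YJSA1E26HF000337"]
query = "1HGBH41JXMN109l86"  # 'l' typed instead of '1'
best = min(known_vins, key=lambda v: levenshtein(query, v))
```

On the GPU, one thread block would typically compute one such distance, so the whole dataset is scored in parallel per query.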
 
Topics:
Big Data Analytics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5305
Download:
Share:
 
Abstract:

Today's malware ecosystem produces hundreds of thousands of distinct samples per day. To leverage similarities between the samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are directly used to generate rules to identify similar samples on our customers' machines.

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5313
Download:
Share:
 
Abstract:
This poster introduces how to run general join algorithms on the GPU to solve an important graph problem: clique listing. In particular, two different join algorithms are presented for the GPU. The first is an implementation of Leapfrog-Triejoin (LFTJ), a recently presented worst-case optimal multi-predicate join algorithm. The second is a novel approach, inspired by the first but more suitable for GPU architectures. Performance benchmarks show that both approaches use GPUs efficiently.
 
Topics:
Big Data Analytics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5319
Download:
Share:
 
Abstract:
For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system, uses a high-level bulk-synchronous abstraction with traversal and computation steps, designed specifically for the GPU. It is a framework that is general, straightforward to program, and fast (on par with hardwired primitives and faster than any other programmable GPU library).
 
Topics:
Big Data Analytics, Tools & Libraries
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5326
Download:
Share:
 
Abstract:
Complex systems are prevalent throughout various levels of biology from the molecular and cellular scales through to populations of interacting individuals. This talk discusses how formal state based representation of agents within a complex system can be simulated and visualized at large scales using the open source FLAME GPU framework. Methods of code generation from XML documents and use of CUDA streams for heterogeneous state execution are presented. Examples include cellular tissue modelling and large scale crowd dynamics.
 
Topics:
Big Data Analytics, Tools & Libraries, Life & Material Science
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5133
Streaming:
Download:
Share:
 
Abstract:
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near linear speedup and performance exceeding tens of GTEPS when running BC on 192 GPUs.
 
Topics:
Big Data Analytics, Developer - Algorithms, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5156
Streaming:
Download:
Share:
 
Abstract:
In this session we will explore a new approach for counting triangles in networks that partitions the work at multiple parallel granularities. This new approach is highly scalable and is appropriate for both sparse and dense networks.
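The two levels of parallel granularity the session describes can be illustrated with a sequential sketch: triangles are counted per edge (coarse granularity), and each per-edge count is a set intersection whose elements could themselves be processed in parallel (fine granularity). This is an illustrative CPU version, not the speakers' implementation:

```python
def count_triangles(edges):
    """Count triangles by intersecting the adjacency sets of each edge's endpoints."""
    adj = {}
    for u, v in edges:                      # build undirected adjacency sets
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Coarse granularity: one unit of work per edge.
    # Fine granularity: the set intersection inside each unit.
    # Each triangle {a, b, c} is seen once per edge, i.e. 3 times in total.
    total = sum(len(adj[u] & adj[v]) for u, v in edges)
    return total // 3

# A 4-clique: every one of the 4 vertex triples forms a triangle.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

On a GPU, the coarse level would typically map to thread blocks and the fine level to threads within a block.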
 
Topics:
Big Data Analytics, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5176
Streaming:
Download:
Share:
 
Abstract:

The attendee will take away an appreciation for the nuances involved in performing large scale graph analytics on a budget. The discussion will center around utilizing graphics processing hardware in a limited memory environment. This will include insights into data storage structures for I/O efficient processing as well as the application of the massive parallelism of the GPU to real world graph data.

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5200
Streaming:
Download:
Share:
 
Abstract:

Learn how to use multi-GPU and CUDA to speed up text analysis, indexing, and searching of textual data. We present a new framework to index large data sets of heterogeneous data. Our approach is based on a combination of HPC techniques aimed at improving the efficiency and reliability of the indexing process. The solution we propose is scalable and exploits in-memory computing to minimize I/O operations and enhance performance. Moreover, we describe the CUDA-based parallelization of the most compute-intensive tasks involved in the indexing process. The integration of the CUDA components within an architecture that is mostly Java-based led us to develop a technique for Java-CUDA interoperability that can be applied to other applications. Some visualization results will also be presented.

 
Topics:
Big Data Analytics, Data Center & Cloud Infrastructure, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5212
Streaming:
Download:
Share:
 
Abstract:

Get the best out of your processing cluster by equipping nodes with a GPU. Many distributed processing models have emerged in recent years, driven by the need to scale out applications and by the affordability of clusters running on commodity hardware. Among these, the dataflow graph processing model is the most general, representing jobs as distributed operators (nodes) connected by data channels (edges). In this talk, we explain how we have extended an existing dataflow graph processing framework to fully take into account GPU resources in the cluster. We show how this paradigm fully exploits the batch and streaming features of the GPU in a distributed job. Finally, we present our model for scheduling on this heterogeneous processing framework.

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning, Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5215
Streaming:
Download:
Share:
 
Abstract:

GPUs have demonstrated regime-changing performance in a wide variety of applications. But there remain many engineering challenges to the adoption of GPUs in the mainstream, especially when operating at scale. We present a new infrastructure that provides a suite of GPU-driven machine learning and graph algorithms as a web service. The effortless usability of an HTTP API unlocks the power of GPU computing with none of the attendant complexities. As examples, we will show interactive analytics on web-scale graphs and deep learning on large data sets using nothing more than a modern web browser.

 
Topics:
Big Data Analytics, Data Center & Cloud Infrastructure, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5224
Streaming:
Download:
Share:
 
Abstract:

Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines; IBM DB2-BLU is an example of such a database. To improve query processing performance in such databases, GPUs can be used as fast, high-bandwidth co-processors. As part of our work, we integrate NVIDIA GPUs into DB2-BLU by changing the infrastructure of DB2-BLU and developing GPU kernels. We have a hybrid design in which we use some DB2-BLU features on IBM's POWER8 processor together with NVIDIA's GPU accelerator technology for fast query processing. This work was done in collaboration with Peter Kokosielis.

 
Topics:
Big Data Analytics, Data Center & Cloud Infrastructure, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5229
Streaming:
Download:
Share:
 
Abstract:
This session will introduce how we integrated GPU acceleration into the PostgreSQL database while keeping 100% compatibility with the application landscape. The RDBMS is a long-standing and widely used technology that remains at the core of business activities; however, growing data sizes raise performance concerns. PG-Strom is an extension of the PostgreSQL database designed to off-load several CPU-intensive query workloads (currently scan, join, and aggregation) to the GPGPU, running up to 10x faster than the existing SQL implementation. Its characteristics fit the usual workloads of BI (business intelligence) tools in a cost-effective way, though not all workloads. The PG-Strom extension is released under the GPLv2 terms and will support PostgreSQL v9.5.
 
Topics:
Big Data Analytics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5276
Streaming:
Download:
Share:
 
Abstract:

Learn how to use GPUs as a computing platform to solve problems with irregular memory access patterns and low arithmetic intensity. We have shown that a proper data-to-threads mapping, combined with techniques to reduce data traffic, achieves excellent performance in the traversal, via a level-synchronous Breadth First Search (BFS), of large-scale graphs (i.e., millions of nodes and billions of edges) on multi-GPU systems. We will present our recent activities in GPU-based graph processing: a new BFS implementation based on a 2D partitioning that exploits the atomic operations of the Kepler architecture, two solutions to the st-connectivity problem, and all-pairs shortest path. Some of these can be of immediate use in the analysis of large data sets.
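The level-synchronous BFS mentioned above can be sketched sequentially; on the GPU, every vertex in the current frontier would be expanded in parallel before the next level begins. A minimal illustration, not the authors' multi-GPU code:

```python
from collections import defaultdict

def bfs_levels(adj, source):
    """Level-synchronous BFS: expand the whole frontier each iteration."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:            # on a GPU: one thread (or warp) per vertex
            for v in adj[u]:
                if v not in level:    # an atomic visited-check in a CUDA version
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

# Small undirected example graph.
adj = defaultdict(list)
for u, v in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]:
    adj[u].append(v)
    adj[v].append(u)
levels = bfs_levels(adj, 0)
```

The 2D partitioning in the talk distributes both the frontier and the adjacency data across GPUs, but the level-by-level structure is the same.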

 
Topics:
Big Data Analytics, Data Center & Cloud Infrastructure, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5337
Streaming:
Download:
Share:
 
Abstract:
This session is about how to efficiently compute shortest-path-based network centrality metrics on the GPU. Performing shortest path computation on a GPU is an expensive operation because of the many idempotent operations (computation and memory accesses) that must be performed to ensure the computation is correct. We will show how to interleave shortest-path-based computation in the context of a network centrality metric to reduce the number of memory accesses and to maximize their coalescing. We will also see how the in-memory representation of the network is key to balancing thread divergence against the number of atomic operations.
 
Topics:
Big Data Analytics, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5425
Streaming:
Download:
Share:
 
Abstract:

In this session you will learn about how IBM is exploiting GPUs in its new IBM OpenPOWER platform for acceleration of Big Data Analytics and Cognitive Computing solutions. The Hardware Acceleration Lab in IBM's Software Group is partnering with IBM Research to develop optimized heterogeneous computing solutions. With the creation of the OpenPOWER consortium last year, IBM has created an open ecosystem along with heterogeneous computing platforms that include NVIDIA's Tesla GPUs. GPUs are gaining traction in the Enterprise as accelerators for Big Data Analytics and Cognitive Computing workloads. This session will focus on Industrial case studies and exploitation of GPUs. Some early results will also be shared.

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5459
Streaming:
Download:
Share:
 
Abstract:
Learn how in-GPU-memory databases can change the way real-world Big Data sets, such as social media entries, webpage hits, or business data, are analyzed. Analytical queries in databases often involve calculations over extremely large areas of aggregated values as input for further processing, such as conditional calculation (if-then-else) or top-k evaluation, and therefore often run into memory problems. We present the design of optimized condition-based processors for large data sets, combined with a floating-frame approach to stream through these data areas. Conditional calculations are especially useful for splitting large value sets into clusters for further analysis or aggregation, and we will provide examples on real-world social media data, including localized Twitter trends and Wikipedia page hits.
 
Topics:
Big Data Analytics, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5481
Streaming:
Download:
Share:
 
Abstract:

Geo-referenced spatial (or geospatial) data volumes are increasing. Traditional data management techniques, such as Geographical Information Systems (GIS) and spatial databases, do not work well for big spatial data, while existing Big Data systems do not support geospatial data. In addition to our work on managing spatial data on single-node GPUs, we have integrated our parallel designs with an open-source big data system called Impala to support both efficient and scalable distributed spatial query processing in an interactive SQL environment. We present the system architecture, data-parallel designs for spatial indexing and query processing, and performance on real datasets for point-in-polygon-test-based spatial joins.

 
Topics:
Big Data Analytics, Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5489
Streaming:
Share:
 
Abstract:

As people wish to interactively explore increasingly larger datasets, existing tools are unable to deliver acceptable performance. The distributed nature of systems like Spark leads to latencies detrimental to interactive data exploration, while single-node visualization solutions like Tableau and Qlikview are not powerful enough to deliver sub-second response times for even intermediate-sized datasets. In this talk, we will argue that dense GPU servers, containing 4-16 GPUs each, can provide analytics query throughput exceeding what can be achieved on even large clusters, while avoiding the latencies and complications associated with running over a network. We will look at MapD, which can query and visualize multi-billion-row datasets in milliseconds, as an example of such a system. Finally, we will show how the significantly higher performance achievable with a GPU system translates into new modes and paradigms of data analysis.

 
Topics:
Big Data Analytics, Data Center & Cloud Infrastructure, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5544
Streaming:
Share:
 
Abstract:

GPUs are ushering in a new era of data visualization. Today, shoving one hundred thousand query results into a chart makes an illegible mess and kills interactivity. The good news is that infovis researchers invented smarter layouts that maximize visibility. The bad news is that these layouts and basic interactions are computationally intensive enough that analysts can no longer simply slide a slider, drag a graph cluster, etc. With the availability of GPUs, however, the rules have changed. This talk shows examples of smarter designs and how we use GPUs to turn them into interactive tools. For experts, we will discuss how running in browsers and even phones led to Graphistry's tiered GPU visualization engine approach, and touch on our use of WebGL, WebCL, and our own in-house libraries.

 
Topics:
Big Data Analytics, Web Acceleration, Visualization - In-Situ & Scientific
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5589
Streaming:
Download:
Share:
 
Abstract:

Dive deep into the problem of protecting electronic devices such as PCs, smartphones, and tablets against malicious software. In this talk we will show you how we handle the ever-increasing number of malware samples produced by the malware ecosystem every day. To leverage similarities between samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even when using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are used directly to generate rules that identify similar samples on our customers' machines.

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5612
Streaming:
Download:
Share:
 
Abstract:

The time-synchronous Viterbi search algorithm for automatic speech recognition is implemented using a counter-intuitive single-CUDA-block approach. Decoding of a single utterance is carried out on a single streaming multiprocessor (SM), and multiple utterances are decoded simultaneously using CUDA streams. The single-block approach is shown to be substantially more efficient and enables overlapping of CPU and GPU computation by merging tens of thousands of separate CUDA kernel calls for each utterance. The proposed approach has the disadvantage of a large GPU global-memory requirement because of the simultaneous decoding. However, the latest GPU cards with up to 12 GB of global memory fulfill this requirement, and full utilization of the card is possible using all available SMs.
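The recursion being parallelized here is the classic time-synchronous Viterbi search. A minimal sketch in plain Python (all model values hypothetical; in the session's CUDA version, the threads of one block score the transition arcs of one utterance in parallel):

```python
def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely state path for an observation sequence.

    log_init[s]     -- log P(state s at t=0)
    log_trans[p][s] -- log P(p -> s)
    log_emit[s][o]  -- log P(observation o | state s)
    """
    # scores[s] = best log-probability of any path ending in state s
    scores = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of backpointers per frame
    for o in obs[1:]:
        prev, new, bp = scores, {}, {}
        for s in states:
            # The CUDA version evaluates these predecessor arcs in parallel;
            # here we just take the max sequentially.
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            bp[s] = best
            new[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
        scores = new
        back.append(bp)
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return list(reversed(path))
```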

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5658
Streaming:
Share:
 
Abstract:
We develop new approaches and algorithms for high-throughput, systematic identification of chromatin loops between genomic regulatory elements, utilizing Tesla GPUs to search the space of possible chromatin interactions for true chromatin loops efficiently and in parallel. The team is working with IBM POWER8 and NVIDIA Tesla GPU technologies to create customized algorithms that enable genomics scientists to see fine details of genome folding and learn more about genetic regulation. The resulting maps of looping revealed thousands of previously unknown regulatory switches. For genes that cause diseases or cancers, locating these switches is essential. GPUs speed up these algorithms by up to 200x, reducing the time to process a single chromosome from a week to less than a coffee break.
 
Topics:
Big Data Analytics, Developer - Algorithms, Life & Material Science
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5821
Streaming:
Download:
Share:
 
Abstract:
This session will introduce SenSen's proprietary Video, Sensor and Actuator Analytics Platform (SenDISA), which is used by some of the world's most prestigious and trusted organizations, including the Abu Dhabi airport, the Singapore police, Roads & Maritime Services Australia, the Westgate Bridge in Melbourne, Australia, the City of Trondheim, Norway, and the cities of Brisbane, Ipswich and Manly. We will present how our innovative algorithms, powered by the GPGPU-based SenDISA platform, enable Big Data analytic applications by fusing data from video, sensor and IoT devices and combining them with other transaction data to deliver smart-city solutions across the globe. We will provide insights into the architecture of SenDISA and the market-specific Big Data solutions serving different verticals.
 
Topics:
Big Data Analytics, Computer Vision, Video & Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5869
Streaming:
Download:
Share:
Computational Fluid Dynamics
Presentation
Media
Abstract:
Learn how to use your GPU for massive-scale particle-based fluid simulations that require more memory than the GPU's video memory. We introduce a novel GPU-based neighbor search algorithm for particle-based fluid simulations such as SPH. With the proposed method, we can efficiently handle a massive-scale particle-based fluid simulation with limited GPU video memory in an out-of-core manner. We have demonstrated that our method robustly handles massive-scale benchmark scenes consisting of up to 65 million particles, requiring up to 16 GB of memory, on a GPU with only 3 GB of memory. It shows up to 26 times higher performance than NVIDIA's mapped-memory technique and 51 times higher performance than a CPU core.
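The baseline being accelerated is the standard uniform-grid ("cell list") neighbor search used by SPH codes. A small in-memory Python sketch of that idea (the session's contribution — doing this out-of-core on the GPU — is not attempted here):

```python
from collections import defaultdict

def build_grid(points, h):
    """Hash each 3-D point into a cubic cell of edge h (the support radius)."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        grid[(int(x // h), int(y // h), int(z // h))].append(i)
    return grid

def neighbors(points, grid, h, i):
    """Indices j != i within distance h of point i (checks the 27 adjacent cells)."""
    px, py, pz = points[i]
    cx, cy, cz = int(px // h), int(py // h), int(pz // h)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j != i:
                        qx, qy, qz = points[j]
                        if (px - qx) ** 2 + (py - qy) ** 2 + (pz - qz) ** 2 <= h * h:
                            out.append(j)
    return out
```

Because each query touches only 27 cells rather than all particles, the cost per particle stays constant at fixed density — which is also why the working set can be streamed in blocks when it exceeds video memory.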
 
Topics:
Computational Fluid Dynamics, Developer - Algorithms, Computational Physics, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5116
Streaming:
Download:
Share:
 
Abstract:

The presentation shows the potential of GPU acceleration for reducing turn-around times of industrial CFD applications. FluiDyna is addressing this issue with a modular approach: the library "Culises" was developed to accelerate matrix operations arising from arbitrary problems. This approach can be complemented by a second module that generates the linear system directly on the GPU; the resulting code is less general but allows higher speed-ups. The code aeroFluidX is a finite-volume solver dedicated to incompressible aerodynamics, combining a SIMPLE algorithm for unstructured grids with state-of-the-art RANS turbulence modelling. MPI parallelization allows calculations to be split across multiple GPU-enabled nodes, leading to speed-ups of 2.5-3x for industrial-scale problems.

 
Topics:
Computational Fluid Dynamics, Autonomous Vehicles, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5189
Streaming:
Download:
Share:
 
Abstract:
Learn how run-time code generation can be used to generate high-performance matrix-multiplication kernels for GPUs. In this talk, I will introduce GiMMiK, an open-source framework for generating bespoke kernels for block-by-panel type matrix-matrix multiplications. The techniques employed by GiMMiK will be described in detail. Benchmarks comparing GiMMiK to cuBLAS will be presented, with demonstrated speed-ups of up to 10x. Specific applications of GiMMiK in the field of high-order computational fluid dynamics will also be highlighted.
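The run-time code-generation idea can be shown in miniature: when the operator matrix is fixed and sparse, emit a specialized function with its nonzeros baked in as literals, so zero entries cost nothing. This toy emits Python source (GiMMiK itself emits CUDA; the function names here are illustrative, not GiMMiK's API):

```python
def generate_matmul(A):
    """Return f(b) computing A @ b for a fixed matrix A (list of rows),
    with A's nonzero entries embedded as constants in the generated code."""
    lines = ["def f(b):", "    return ["]
    for row in A:
        terms = [f"{a!r}*b[{j}]" for j, a in enumerate(row) if a != 0]
        lines.append("        " + (" + ".join(terms) or "0") + ",")
    lines.append("    ]")
    ns = {}
    exec("\n".join(lines), ns)  # "compile" the specialized kernel
    return ns["f"]
```

For a row [1, 0] the generated line is simply `1*b[0]` — the zero column is skipped entirely, which is where the speed-up over a general GEMM comes from.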
 
Topics:
Computational Fluid Dynamics, Performance Optimization, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5207
Streaming:
Download:
Share:
 
Abstract:
We present a GPU-based framework for the fully resolved simulation of interacting rigid and deformable solid objects moving in fluid flow. The fluid dynamics is based on a meshless approach: moving Lagrangian markers, distributed in the fluid domain as well as on the solid surfaces, capture the fluid dynamics, fluid-solid, and solid-solid interactions. Mass and momentum exchange between neighboring markers is determined in a parallel spatial-subdivision algorithm. The distributed forces on solid objects are reduced in parallel via Thrust reduction algorithms and later used for the temporal update via lightweight GPU kernels. Scenarios containing tens of thousands of floating rigid and flexible objects were exercised on several GPU architectures, and linear scalability was shown.
 
Topics:
Computational Fluid Dynamics, Tools & Libraries, Developer - Algorithms, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5238
Streaming:
Download:
Share:
 
Abstract:

Dive deep into the fascinating world of real-time computational fluid dynamics. We present details of our CUDA-accelerated flow solver for the simulation of non-linear violent flows in marine and coastal engineering. The solver, the efficient lattice Boltzmann environment elbe, is accelerated with recent NVIDIA graphics hardware and allows for three-dimensional simulations of complex flows in or near real time. Details of the very efficient numerical back end, the pre- and post-processing tools, and the integrated OpenGL visualizer will be presented. Join us to learn about a prototype for next-generation CFD tools for simulation-based design (SBD) and interactive flow-field monitoring on commodity hardware.

 
Topics:
Computational Fluid Dynamics, Visualization - In-Situ & Scientific, Computational Physics, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5304
Streaming:
Download:
Share:
 
Abstract:
Learn how a Domain Specific Language can be used to accelerate a full-scale industrial CFD application. With OP2, you can describe your computational problem at a high level and then generate CUDA code. We show how parallelization on an unstructured mesh is handled over a cluster of GPUs, and how a range of optimizations can be applied automatically during code generation for GPUs, such as conversion from Array-of-Structures to Structure-of-Arrays and the use of shared memory or caches to improve data reuse. We demonstrate that a 4x performance increase can be achieved with a K40 GPU over a server CPU, and present scaling up to 16 GPUs.
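The Array-of-Structures to Structure-of-Arrays conversion mentioned above, in miniature: the same data regrouped so that each field is contiguous, which is what lets neighboring GPU threads issue coalesced loads. Field names here are illustrative:

```python
def aos_to_soa(particles):
    """Array-of-Structures -> Structure-of-Arrays.

    [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]  ->  {'x': [1, 3], 'y': [2, 4]}
    After the transform, thread i reading field 'x' of element i touches
    consecutive memory locations across the warp.
    """
    if not particles:
        return {}
    return {k: [p[k] for p in particles] for k in particles[0]}
```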
 
Topics:
Computational Fluid Dynamics, Programming Languages, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5318
Streaming:
Download:
Share:
 
Abstract:
Learn how to utilize compute shaders and write your own efficient fluid flow solver accelerated with a single GPU. First, I will introduce the basics of the Lattice Boltzmann method, including additional turbulence modelling. Then, an implementation in modern OpenGL will be discussed. I will investigate the efficiency of the code and discuss its potential applications in games, medicine and other end-user tools.
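The stream-and-collide structure at the heart of any Lattice Boltzmann solver fits in a few lines. A deliberately tiny 1-D toy (D1Q3 lattice, BGK collision) in Python — not the talk's 3-D compute-shader implementation, and all parameters arbitrary:

```python
W = (4/6, 1/6, 1/6)   # D1Q3 weights for velocities c = 0, +1, -1
C = (0, 1, -1)

def step(f, tau=1.0):
    """One LBM update. f[q][x] are populations on a periodic 1-D lattice."""
    n = len(f[0])
    rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]          # density
    u = [(f[1][x] - f[2][x]) / rho[x] for x in range(n)]           # velocity
    # Collide: relax toward equilibrium
    #   f_eq = w * rho * (1 + 3cu + 4.5(cu)^2 - 1.5u^2)
    post = [[0.0] * n for _ in range(3)]
    for q in range(3):
        for x in range(n):
            cu = C[q] * u[x]
            feq = W[q] * rho[x] * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * u[x] * u[x])
            post[q][x] = f[q][x] + (feq - f[q][x]) / tau
    # Stream: push each population one cell along its velocity (periodic)
    return [[post[q][(x - C[q]) % n] for x in range(n)] for q in range(3)]
```

Both sub-steps are purely local (collide) or fixed-stencil (stream), which is why the method maps so naturally onto one GPU thread per lattice cell.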
 
Topics:
Computational Fluid Dynamics, Visualization - In-Situ & Scientific, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5343
Streaming:
Download:
Share:
 
Abstract:
Learn how to simulate and render game-ready, high-resolution fluid in real time on the GPU using DirectX. We'll present a new method for sparsely simulating and rendering traditional grid-based fluid systems. By utilizing a simple CPU prediction algorithm, we can update the GPU's virtual memory table to reflect only the active areas of a simulation volume, providing compressed memory storage and hardware-level memory translation for region look-ups. This CPU prediction mechanism has a much wider use case than fluid simulation alone, and is a must-know for anyone planning to use tiled resources in the future.
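A toy version of the CPU-side prediction step, under assumed mechanics (the talk's actual algorithm is not public here): dilate the set of active tiles by however far the fastest fluid could travel in one step, and map only those tiles for the next frame. Tile size and velocity units are made up for illustration:

```python
def predict_active(active, max_speed, dt, tile=1):
    """Given a set of active 2-D tile coords, return every tile that may
    be active after one step: each active tile plus all neighbors within
    the distance the fluid can cross in time dt."""
    reach = int((max_speed * dt) // tile) + 1   # tiles the flow can cross
    out = set()
    for (i, j) in active:
        for di in range(-reach, reach + 1):
            for dj in range(-reach, reach + 1):
                out.add((i + di, j + dj))
    return out
```

Only the predicted tiles need backing pages in the virtual-memory table; everything else stays unmapped, which is the "compressed storage" the abstract refers to.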
 
Topics:
Computational Fluid Dynamics, Developer - Algorithms, Real-Time Graphics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5756
Streaming:
Download:
Share:
Computational Physics
Presentation
Media
Abstract:
We present an out-of-core proximity computation method, commonly used in particle-based fluid simulations, to handle massive-scale simulations requiring more memory than a GPU has. We have demonstrated that our method robustly handles up to 65 million particles, requiring up to 16 GB of memory, on a GPU with only 3 GB of memory. It achieves up to 51x higher performance than a CPU core. This high performance with a limited video-memory space is achieved mainly thanks to the high accuracy of our memory-estimation method.
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5113
Download:
Share:
 
Abstract:
Large-scale direct numerical simulation (DNS) of gas-solid suspension flows is traditionally limited by its huge demand for compute resources, but is urgently needed to study the interaction between gas and particles. Usually, empirical formulas are employed to estimate this interaction. With GPU acceleration, however, large-scale DNS of gas-solid flows becomes feasible, so we can expect to obtain full knowledge of the gas-particle interaction. In this study, we develop a GPU-accelerated DNS program to study the interaction.
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5114
Download:
Share:
 
Abstract:
We present our open-source code HASEonGPU for computing the amplified spontaneous emission (ASE) in laser gain media using a Monte Carlo approach. With multi-GPU acceleration and optimized sampling techniques, ASE can now be computed at high resolution within minutes instead of days. This speed-up allows realistic ASE computations to be integrated into the design tool chain for the development of high-power lasers.
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5122
Download:
Share:
 
Abstract:
In this work, real-world large-scale magnetic simulations running on embedded platforms (Jetson TK1) are presented for the first time. Performance optimization methods for the GPUs of embedded systems are discussed, followed by a performance comparison against desktop CPUs and desktop GPUs. With low cost, low power consumption and rapidly increasing performance, embedded systems are a promising candidate for building computational clusters for physical simulations.
 
Topics:
Computational Physics, Intelligent Machines, IoT & Robotics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5123
Download:
Share:
 
Abstract:
Micromagnetic simulation is an indispensable tool for studying spintronics devices, such as MRAMs and spin-torque nano-oscillators. For large-scale computations of 3D micromagnetic models beyond the limits of a single GPU, we have implemented a multi-node, multi-GPU code using MPI+CUDA. The parallel performance of the code has been improved by optimizing the FFT-based calculations of the magnetic dipole field, which occupy most of the computing time. The developed code achieves typical scaling efficiency and reproduces the correct physics.
 
Topics:
Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5125
Download:
Share:
 
Abstract:
A GPU parallel implementation of dissipative particle dynamics (DPD) based on CUDA was carried out. The issues involved, such as thread mapping, parallel cell-list updating, generating pseudo-random numbers on the GPU, memory-access optimization and load balancing, are discussed in detail. Furthermore, Poiseuille flow and suddenly contracting-and-expanding flow were simulated to verify the correctness of the GPU parallel computation. The results of the GPU parallel computation of DPD show a speedup of up to 49x compared with serial CPU computation.
 
Topics:
Computational Physics, Life & Material Science
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5131
Download:
Share:
 
Abstract:
We present an algorithm that enables the study of chaotic behavior via the full complex Lyapunov spectrum of SU(2) Yang-Mills fields, and of the entropy-energy relation via the Kolmogorov-Sinai entropy, extrapolated to the large-size limit. CUDA is applied to calculate the eigenvalues of the monodromy matrix, an fxf sparse matrix (f=24N). We use a hybrid block Hessenberg reduction scheme to compute the required eigenvalues, achieving 2-3 times higher performance than the CPU-only version.
 
Topics:
Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5140
Download:
Share:
 
Abstract:
We present a framework for the efficient coupling of a GPU-based Discrete Element Method (DEM) solver for real-world particle problems with a CPU-based Computational Fluid Dynamics (CFD) solver provided by one of our partners, supporting multi-sphere particles, heat transfer and liquid transfer.
 
Topics:
Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5152
Download:
Share:
 
Abstract:
This poster presents the further development of a program complex for solving CFD problems, oriented toward heterogeneous GPU-based computer systems. Based on the finite-volume method, a difference scheme is constructed for the Quasi Gas Dynamic equation system in 3D formulation on arbitrary non-orthogonal multiblock structured index grids. The algorithm's efficiency was verified on a set of test problems. A detailed investigation of speed-up and scaling was carried out, up to a very large number of parallel GPUs.
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5155
Download:
Share:
 
Abstract:
Numerical simulations of multiphase gas-liquid flows are challenging. For the sake of performance, we are looking for new paradigms of numerical modeling, and we investigate two complementary ways: first, a conventional numerical solver, namely the remapped Lagrange scheme, which shows performance issues on GPUs; and secondly, an attempt to derive Lattice Boltzmann-like (LB) solvers for such complex fluids. As preliminary results, we obtained strong GPU speedups for the remapped Lagrange scheme and were able to derive a compressible LB solver with interesting features.
 
Topics:
Computational Physics, Real-Time Graphics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5180
Download:
Share:
 
Abstract:
Octane is a fluid simulation library specifically tailored to solving the incompressible Navier-Stokes equations for smoke and subsonic combustion simulations. This CUDA-based tool borrows a number of techniques from leading computer-graphics literature on fluid simulation, including the use of the level-set method to support a thin-flame combustion model. Simulations are created with user-specified initial conditions, and results are subsequently visualized to better understand the propagation of heat through an environment.
 
Topics:
Computational Physics, Tools & Libraries
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5202
Download:
Share:
 
Abstract:
The bi-conjugate gradient stabilized algorithm (BiCGStab) has often been used as an iterative solver to compute approximate solutions of the large, sparse non-symmetric systems of linear equations derived from Poisson equations in fluid simulation. Recently, the GPBiCGSafe algorithm proposed by Fujino et al. has been shown to have very good convergence behavior. In this presentation, we describe our GPU-parallel implementation of a solver for the 3D Navier-Stokes equations using the GPBiCGSafe algorithm.
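For reference, the classic BiCGStab scheme mentioned above can be sketched compactly; GPBiCGSafe is a more robust relative, and this dense, unpreconditioned toy (plain Python, not the poster's GPU code) only illustrates the structure:

```python
def bicgstab(A, b, tol=1e-10, maxiter=100):
    """Solve A x = b for a small dense non-symmetric A (list of rows)."""
    n = len(b)
    mv = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = [bi - axi for bi, axi in zip(b, mv(A, x))]
    r0 = r[:]                       # fixed "shadow" residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(maxiter):
        rho, rho_old = dot(r0, r), rho
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = mv(A, p)
        alpha = rho / dot(r0, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:  # converged at the half step
            return [xi + alpha * pi for xi, pi in zip(x, p)]
        t = mv(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            return x
    return x
```

The building blocks — matrix-vector products, dot products, and axpy-style vector updates — are exactly the operations a GPU implementation maps onto parallel kernels.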
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5206
Download:
Share:
 
Abstract:
We report the acceleration of double-precision spintronic simulations, based on an explicit finite-differences solver, by factors of 1.6 to 13x. The smaller factor was observed when comparing a single-thread implementation running on an Intel Xeon E5620 @ 2.4 GHz against an NVIDIA GeForce GTX 670M. The highest value was observed when comparing an Intel i7-2760QM @ 2.4 GHz against an NVIDIA Tesla M2070. Optimizations consisted of reducing accesses to global device memory through increased use of registers and shared memory.
 
Topics:
Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5226
Download:
Share:
 
Abstract:
We present the state-of-the-art simulation of lattice QCD with dynamical (u,d,s,c) quarks at National Taiwan University. Using a unit of two GTX-TITANs, lattice QCD with (1+1+1+1) flavors of domain-wall quarks can be simulated on a 32^3*64 lattice, attaining a sustained 780 Gflops/s. This study is vital for understanding QCD (quantum chromodynamics), the fundamental theory of the interaction between quarks and gluons, which manifests as the strong interaction inside the nucleus and plays an important role in the evolution of the universe.
 
Topics:
Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5258
Download:
Share:
 
Abstract:
Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs with either CUDA or OpenCL. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in d+1 dimensions, related to the Asymmetric Simple Exclusion Process. Speedups of 100-400x can be achieved with respect to serial code running on an i5 core. This permits studying disorder and aging behavior in these systems.
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5259
Download:
Share:
 
Abstract:
Nanopatterning of surfaces and bulk materials is very important in fields from molecular electronics to photovoltaics. But in order to understand the underlying physics of self-organization, large-scale atomistic simulations are crucial. Only stochastic models can bridge the gap from nano to micro, enabling simulations of micron-sized volumes and billions of atoms, and the study of long-time evolution. Random site selection is essential but can be harmed by domain decomposition on GPGPUs. We present solutions using the example of a dimer model for KPZ surface growth.
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5266
Download:
Share:
 
Abstract:
We present a CUDA realization of highly efficient algorithms (named LRnLA DiamondTile) for the 3D FDTD (Finite-Difference Time-Domain) method for the Maxwell equations, with the following capabilities: Perfectly Matched Layer (PML) boundary conditions; fourth-order accuracy in space, which allows coarser discretization; different materials, including dispersive ones; and special sources such as TF/SF (Total Field/Scattered Field).
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5275
Download:
Share:
 
Abstract:
Cell dynamic simulation (CDS) is a well-known technique for demonstrating ordered structures in block copolymer systems and for studying the effects of different simulation parameters. In addition, cell dynamic simulation helps in understanding different aspects of morphological topographies. Because it is expensive to run a complex technique such as cell dynamic simulation as a sequential algorithm on a central processing unit (CPU), a parallel algorithm and programming model are needed to address the time-consuming computation.
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5278
Download:
Share:
 
Abstract:
We present a method using the NVIDIA Thrust library to calculate multiple Mie scattering on a GPU. Mie scattering describes the scattering of electromagnetic waves by spheroidal particles whose diameter is close to the wavelength of the incident radiation. At its current state, our implementation shows speedups of 19.5x compared to a quad-core CPU. This enables us to simulate complex optical scenarios with our software. Additionally, we have verified the simulation results using a high-precision measurement device in the laboratory.
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5281
Download:
Share:
 
Abstract:
We present how real-time fluid simulation can be useful for many engineering applications, in particular the understanding, prediction, and control of thermal loads in data centers. We will show how our framework achieves real-time simulation of complex indoor air flows, including thermal and turbulent behavior, using the Lattice Boltzmann Method optimized for execution on GPUs with CUDA.
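For readers new to the method, the Lattice Boltzmann update is a purely local stream-and-collide rule, which is why it maps so well onto GPUs. Below is a minimal NumPy sketch of a D2Q9 BGK step on a periodic grid; it is an illustrative CPU reference, not the presenters' CUDA code, and the relaxation time and grid size are arbitrary choices.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities and their weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, ux, uy):
    cu = 3.0 * (c[:, 0, None, None] * ux + c[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return rho * w[:, None, None] * (1 + cu + 0.5 * cu**2 - usq)

def lbm_step(f, tau=0.6):
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f = f + (equilibrium(rho, ux, uy) - f) / tau      # BGK collision
    for i, (cx, cy) in enumerate(c):                  # periodic streaming
        f[i] = np.roll(np.roll(f[i], cx, axis=1), cy, axis=0)
    return f

ny, nx = 32, 32
rho0 = np.ones((ny, nx))
rho0[16, 16] = 1.1                                    # small density bump
f = equilibrium(rho0, np.zeros((ny, nx)), np.zeros((ny, nx)))
for _ in range(100):
    f = lbm_step(f)
```

Each cell reads only its immediate neighbors, so in CUDA the same update becomes one thread per cell with no synchronization beyond the streaming exchange.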
 
Topics:
Computational Physics, Real-Time Graphics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5286
Download:
Share:
 
Abstract:
In this work we introduce a computationally improved vacancy-recognition technique based on a previously developed algorithm. The procedure uses a graphics processing unit (GPU) instead of a central processing unit (CPU), improving the spatial mapping of the sample and the speed of the atomic-vacancy identification process. The results show that this technique improves efficiency and reduces the number of required parameters in comparison with the original algorithm.
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5289
Download:
Share:
 
Abstract:
We present an OptiX-based GPU pipeline for the simulation of X-ray imaging. The purpose of the pipeline is to enable ultra-rapid imaging simulation so that high-quality statistical analysis can be performed on image ensembles to investigate explosive threat detection in the context of airline baggage inspection. Approximately 5000 simulated images can be generated in one hour on a machine equipped with a GTX770 GPU, and 1 million simulated images can be generated in 2 hours using 100 GPU instances (Amazon G2) in the cloud.
 
Topics:
Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5297
Download:
Share:
 
Abstract:
Diblock copolymers possess fascinating self-assembly properties that can be leveraged for a variety of industrial applications, most notably nanolithography. However, such efforts are often impeded by the formation of metastable defect structures. We present a GPU-accelerated method to quantify the difficulty of defect removal, guiding experiments toward optimal polymer chemistry. We demonstrate that this problem is ideally suited to NVIDIA GPUs' massively parallel architecture.
 
Topics:
Computational Physics, Life & Material Science
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5308
Download:
Share:
 
Abstract:
Propel, a hybrid-parallel reacting flow code developed at the Naval Research Laboratory, is being used to study advanced combustion concepts and detailed chemical reaction models. This poster describes Propel's core design and feature set, current research being conducted with Propel on Rotating Detonation Engines, and its implementation of detailed thermochemistry and chemical kinetics.
 
Topics:
Computational Physics
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5318
Download:
Share:
 
Abstract:
HFSS Transient is a 3-D full-wave time domain electromagnetic field solver based on the discontinuous Galerkin time domain (DGTD) method. It is equipped with local time stepping for efficient simulation on hp-adaptive tetrahedral meshes. The presentation will demonstrate how GPUs can benefit the solution of radiation and scattering problems involving multiscale geometry and complex materials. When there are multiple HPC tasks for parametric sweeps or network analyses with multiple excitations, the speedup with GPU acceleration scales linearly with respect to the number of GPUs. Concepts will be explained for increasing parallel efficiency on mixed-order meshes dominated by low-order elements. This work was done in collaboration with Stylianos Dosopoulos, Senior R&D Engineer, ANSYS Inc., and Rickard Petersson, Senior R&D Manager, ANSYS Inc.
 
Topics:
Computational Physics, AEC & Manufacturing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5183
Streaming:
Download:
Share:
 
Abstract:
Radiation therapy with ion beams precisely targets the tumor, leaving surrounding healthy tissue unharmed. Usually, ion accelerators are huge in size and thus found in only a few facilities worldwide. Using high-power laser systems to accelerate the ions could reduce the size and cost of such systems, potentially increasing the number of treatment facilities and thus giving more patients access to this promising therapy method. In order to bring laser acceleration of ions to application, realistic simulations of the acceleration process are needed. We present PIConGPU, a relativistic particle-in-cell plasma simulation code implemented on GPUs that is ideal for optimizing laser ion acceleration.
 
Topics:
Computational Physics, Life & Material Science, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5193
Streaming:
Download:
Share:
 
Abstract:
The cubic stencil is one of the most common data layouts for on-lattice algorithms, and high-quality random numbers are useful in many areas. Based on the lessons we learned while developing a highly tuned implementation of a Monte Carlo (MC) simulator for the three-dimensional Ising spin glass, we present solutions for an efficient memory-access pattern for the cubic stencil and for lagged-Fibonacci-like PRNGs, in particular the well-known Mersenne Twister MT19937. We will show both single- and multi-GPU results highlighting the advantages of our approach, also in multi-GPU settings, along with a comparison of the performance of our PRNG implementations against that of the cuRAND library.
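As background, a lagged-Fibonacci-like PRNG of the kind discussed above combines two delayed entries of its state to produce each output. Here is a toy additive LFG(55, 24) in Python; the lags, modulus, and bootstrap seeding are common textbook choices for illustration, not the session's actual generator.

```python
import numpy as np

def lagged_fibonacci(seed, n, short_lag=24, long_lag=55, m=2**32):
    # additive lagged Fibonacci: x[i] = (x[i-24] + x[i-55]) mod 2^32
    rng = np.random.default_rng(seed)          # bootstrap the initial state
    state = list(rng.integers(0, m, size=long_lag, dtype=np.uint64))
    out = []
    for _ in range(n):
        x = (state[-short_lag] + state[-long_lag]) % m
        state.append(x)                        # slide the 55-word window
        state.pop(0)
        out.append(int(x))
    return out
```

Because each output depends only on entries 24 and 55 steps back, the sequence can be split into blocks that GPU threads advance independently, which is what makes this family attractive for parallel Monte Carlo.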
 
Topics:
Computational Physics, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5220
Streaming:
Download:
Share:
 
Abstract:
This talk describes in detail DEM algorithms and heuristics optimized for the parallel NVIDIA® Kepler GPU architecture, including a novel collision-detection algorithm for convex polyhedra based on the separating plane (SP) method. Our algorithms have minimal memory requirements, which enables us to store data in the limited but high-bandwidth constant memory on the GPU. We demonstrate computational scaling on two large-scale simulations and then systematically verify the DEM implementation.
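The separating plane test named above declares two convex polyhedra disjoint as soon as one candidate plane separates the projections of their vertices. A NumPy sketch for unit cubes using only the three face normals follows; a full polyhedron test would also include edge-pair cross products as candidate axes, and all names here are illustrative, not the presenters' code.

```python
import numpy as np

def project(verts, axis):
    d = verts @ axis                       # scalar projection of each vertex
    return d.min(), d.max()

def separated_on(axis, A, B):
    amin, amax = project(A, axis)
    bmin, bmax = project(B, axis)
    return amax < bmin or bmax < amin      # projection intervals disjoint?

def convex_overlap(A, B, candidate_normals):
    # bodies are disjoint iff some candidate plane normal separates them
    return not any(separated_on(n, A, B) for n in candidate_normals)

def cube(center):
    corners = np.array([[x, y, z] for x in (0, 1)
                        for y in (0, 1) for z in (0, 1)], dtype=float)
    return corners + center

axes = [np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([0, 0, 1.0])]
touching = convex_overlap(cube(np.zeros(3)), cube(np.array([0.5, 0.5, 0.5])), axes)
apart = convex_overlap(cube(np.zeros(3)), cube(np.array([2.5, 0, 0])), axes)
```

Each candidate axis test is independent, so on a GPU the axes (or the contact pairs themselves) can be checked in parallel with an early-out on the first separating plane found.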
 
Topics:
Computational Physics, Astronomy & Astrophysics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5244
Streaming:
Download:
Share:
 
Abstract:
Explore new algorithms and techniques to solve large-scale Maxwell eigenvalue problems arising in simulations for bandgap engineering. Using the proposed algorithms and implementations, we have successfully computed the desired multiple interior eigenvalues of eigensystems with dimensions as large as 4.2 million within 100 seconds on a single GPU. The techniques extend to multiple GPUs to solve eigenvalue problems with different wave vectors simultaneously, shortening the time to plot a complete band-structure diagram from days to minutes. The codes also achieve almost linear scalability on parallel computers ranging from a workstation with multiple GPUs to a cluster with homogeneous or heterogeneous CPUs and GPUs.
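Interior eigenvalues like those described above are typically reached through a shift-invert transformation, which maps the spectrum near a chosen shift to the dominant, easy-to-find end. The following NumPy sketch illustrates the idea with inverse iteration on a small dense 1D Laplacian stand-in; the actual Maxwell operators are vastly larger, sparse, and solved with far more sophisticated methods.

```python
import numpy as np

# dense 1D Laplacian as a tiny stand-in for the large sparse Maxwell operator
n = 200
L = (np.diag([2.0] * n)
     + np.diag([-1.0] * (n - 1), 1)
     + np.diag([-1.0] * (n - 1), -1))

# shift-invert: eigenvalues of (L - sigma*I)^-1 are 1/(lambda - sigma), so
# inverse iteration homes in on an *interior* eigenvalue near sigma
sigma = 2.0
M = np.linalg.inv(L - sigma * np.eye(n))
v = np.random.default_rng(0).standard_normal(n)
for _ in range(200):
    v = M @ v
    v /= np.linalg.norm(v)
lam = v @ L @ v        # Rayleigh quotient: an eigenvalue close to sigma
```

In practice the explicit inverse is replaced by a sparse factorization or an iterative linear solve per step, which is where the GPU acceleration pays off.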
 
Topics:
Computational Physics, Developer - Algorithms, Life & Material Science, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5254
Streaming:
Download:
Share:
 
Abstract:
We will explain how to construct Locally Recursive non-Locally Asynchronous (LRnLA) algorithms that make it possible to reach peak performance on big-data, memory-bound problems. The DiamondTile algorithm is presented for explicit stencil-based modeling on GPGPUs. It is implemented for finite-difference simulation of the acoustic wave equation, elastic seismic media, FDTD electromagnetics, and an RKDG method for gas dynamics. The resulting performance in 2nd-order wave simulation exceeds that of the optimized CUDA example codes and reaches more than 50 billion cells per second on one GPGPU device.
 
Topics:
Computational Physics, Seismic & Geosciences, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5315
Streaming:
Download:
Share:
 
Abstract:
CICE is a sea ice model that is part of the Los Alamos National Laboratory's Climate, Ocean and Sea Ice Modeling Group. CICE can be used in a fully coupled atmosphere-ice-ocean-land global climate model. It can also be used as a stand-alone application. This talk presents the effort currently under way to accelerate CICE on the GPU.

 
Topics:
Computational Physics, OpenACC
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5322
Streaming:
Download:
Share:
 
Abstract:
Discover how GPUs are being used to accelerate high-fidelity computational fluid dynamics (CFD) simulations on unstructured grids. In this talk I will (i) introduce the flux reconstruction approach to high-order methods, a discretization that is particularly well suited to many-core architectures; (ii) introduce our massively parallel implementation PyFR (www.pyfr.org), which through run-time code generation is able to target NVIDIA GPU hardware; and (iii) showcase some of the high-fidelity, unsteady flow simulations undertaken using PyFR on both desktop and HPC systems.
 
Topics:
Computational Physics, Developer - Algorithms, Computational Fluid Dynamics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5372
Streaming:
Download:
Share:
 
Abstract:
The dynamic simulation of systems involving contacts between bodies is complicated by the nonsmooth nature of frictional constraints. When the number of contacts between bodies increases to millions, as in the case of granular flows in silos or in soil dynamics, the computational efficiency of traditional methods can become an issue even on supercomputers. A second-order primal-dual interior point (PDIP) method is used to solve a nonlinear optimization problem entirely on the GPU. The method displays faster convergence than traditional first-order methods and requires significantly fewer iterations. To alleviate the computational bottleneck of solving large linear systems, this work uses the parallel sparse solver SPIKE::GPU to accelerate the PDIP solution.
 
Topics:
Computational Physics, Developer - Algorithms, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5400
Streaming:
Download:
Share:
 
Abstract:
Accelerators have become a key ingredient in HPC. GPUs had a head start and are already widely used in HPC applications but now are facing competition from Intel's Xeon Phi accelerators. The latter promise comparable performance and easier portability and even feature a higher memory bandwidth - key to good performance for a wide range of bandwidth-bound HPC applications. In this session we compare their performance using a Lattice QCD application as a case study. We give a short overview of the relevant features of the architectures and discuss some implementation details. Learn about the effort it takes to achieve great performance on both architectures. See which accelerator is more energy efficient and which one takes the performance crown at about 500 GFlop/s.

 
Topics:
Computational Physics, Performance Optimization, Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5447
Streaming:
Download:
Share:
 
Abstract:
The computational efficiency of GPU technology for computing the physics of discrete particle interactions is presented. The physical problems that require this type of computation are numerous; in particular, one such area is modeling blast shields designed to provide underbody protection of military vehicles from explosive mines. The solver technology, called the Discrete Particle Method (DPM), has by itself proven to be an accurate and predictive tool for simulating the blast event. Combined with the parallel processing of the GPU, it is an efficient and cost-effective tool as well.
 
Topics:
Computational Physics, AEC & Manufacturing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5449
Streaming:
Download:
Share:
 
Abstract:
Find out how we transformed our algorithms for combustion kinetics to exploit parallelism on GPUs to enable ~10x speedup of combustion simulations with detailed chemistry. The necessary data access patterns and code organization required CUDA native implementations of the thermodynamic and kinetic functions. We also exploit CUDA libraries for sparse and dense matrix factorization. The results are shown as improvements in overall simulation speedup along with cost breakdowns for the various portions of the simulation.

 
Topics:
Computational Physics, Autonomous Vehicles, AEC & Manufacturing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5468
Streaming:
Download:
Share:
 
Abstract:
We present our continued efforts to produce a high performance Monte Carlo simulator for radiation therapy using CUDA and NVIDIA GPUs. The code is based on the main algorithm used in Geant4, a particle physics simulation toolkit. Our work has progressed on two fronts. First, we have improved the accuracy of the simulation predictions against computational benchmarks and some experimental data. Second, we have improved the run-time performance using CUB sort and reduce routines to mitigate thread divergence. The technique involves sorting particles into threads based on the selected physics process in each iteration of the simulation algorithm.
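The sorting step described in the abstract can be pictured as a sort-by-key over particle indices: group particles by the physics process each one selected so that GPU threads in a warp take the same branch. A NumPy sketch (process names and counts are invented for illustration; the session uses CUB's GPU sort rather than NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
process_id = rng.integers(0, 4, size=n)   # e.g. 0=photoelectric, 1=Compton, ...
energy = rng.random(n)                    # toy per-particle payload

order = np.argsort(process_id, kind='stable')   # sort-by-key step
process_sorted = process_id[order]
energy_sorted = energy[order]

# each process now owns one contiguous slice of the particle arrays,
# so a kernel launched per slice executes without branch divergence
bounds = np.searchsorted(process_sorted, np.arange(5))
```

After sorting, `bounds[k]:bounds[k+1]` delimits the particles awaiting process `k`, so consecutive threads run identical physics code.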
 
Topics:
Computational Physics, Performance Optimization, Life & Material Science
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5471
Streaming:
Download:
Share:
 
Abstract:
Learn how to build a 3D solver for the groundwater flow equation that is accelerated by the GPU. The underlying mathematical model treats the entire subsurface, both saturated and unsaturated, as a whole. The governing nonlinear, time-dependent, parabolic partial differential equation is discretized into 19 million nodes. The resulting K20-based GPU solver is 20 times faster than the original single-CPU Fortran code.
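As a much-simplified picture of the implicit time stepping such a solver performs, here is a backward-Euler step for a 1D linear diffusion stand-in of the groundwater equation. Everything below is a toy under stated assumptions: the talk's model is 3D, nonlinear, and 19 million nodes, and its linear solves run on the GPU rather than through a dense NumPy solve.

```python
import numpy as np

# backward-Euler step for h_t = D * h_xx with fixed-head (Dirichlet) ends
def implicit_step(h, D, dt, dx):
    n = len(h)
    r = D * dt / dx**2
    A = (np.eye(n) * (1 + 2 * r)
         + np.diag([-r] * (n - 1), 1)
         + np.diag([-r] * (n - 1), -1))
    A[0, :] = 0.0
    A[0, 0] = 1.0          # hold the boundary heads fixed
    A[-1, :] = 0.0
    A[-1, -1] = 1.0
    return np.linalg.solve(A, h)   # the per-step linear solve

h = np.zeros(51)
h[0] = 1.0                         # head held at 1 on the left boundary
for _ in range(200):
    h = implicit_step(h, D=1.0, dt=0.1, dx=0.1)
```

The implicit discretization is unconditionally stable, which is what lets such solvers take large time steps at the cost of a sparse linear solve per step, the part the GPU accelerates.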

 
Topics:
Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5503
Streaming:
Download:
Share:
 
Abstract:
Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable particle-in-cell (PIC) code for studying microturbulence in magnetically confined plasmas. As a representative PIC code, GTC-P includes algorithmic-level "scatter" and "gather" operations, which feature random memory access, potential fine-grained synchronization, and low computational intensity. It is challenging to optimize this class of irregular codes on current HPC architectures. In this talk, we will present our efforts in porting and optimizing the GTC-P code on NVIDIA GPUs. In particular, we will discuss the redesign of the "shift" kernel for the Kepler architecture. The performance of the code will be demonstrated on the top 7 supercomputers worldwide.
 
Topics:
Computational Physics, Performance Optimization, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5650
Streaming:
Download:
Share:
Computer Vision
Presentation
Media
Abstract:
We have developed a fully automated de-weathering system that improves visibility and stability during bad weather conditions, much needed for surveillance, automotive infotainment, and defense applications. Fog and haze during day and night are handled with real-time performance using CUDA-accelerated algorithms. Videos from fixed cameras are processed with no special hardware other than a CUDA-capable NVIDIA GPU.
 
Topics:
Computer Vision, Video & Image Processing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5112
Download:
Share:
 
Abstract:
This work presents a real-time, GPU-enabled, CUDA-aware (unified memory model) implementation of a novel incremental Principal Component Pursuit (PCP) algorithm for video background modeling on the Jetson TK1 platform. Our implementation has an extremely low memory footprint and a computational complexity that allows, on the Jetson TK1, a processing throughput of 27.8 and 9.4 FPS for grayscale videos of 640x480 and 1920x1088, respectively.
 
Topics:
Computer Vision
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5135
Download:
Share:
 
Abstract:
The goal of this work is to solve object recognition under varying lighting conditions. Several issues present in real scenes compromise performance, such as noise, object distortion, and incident light sources. We consider global illumination by accurately approximating the real-world physics of light and matter interactions in a 3D scene. The system is able to adapt to input light conditions to improve recognition performance. The results of the proposed system are given in terms of recognition metrics and computational efficiency.
 
Topics:
Computer Vision, Rendering & Ray Tracing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5139
Download:
Share:
 
Abstract:
We address the problem of real-time tracking from high resolution video streams. We present a GPU implementation of the TLD algorithm of Kalal et al. (2012), one of the most robust tracking algorithms in the literature. Its high computational cost has restricted its use to low-resolution videos. We ported the algorithm from a CPU-optimized version to a 2-SMX NVIDIA Kepler architecture. We achieved an average speedup of 3x on different 1080p videos, obtaining a system capable of real-time tracking on FullHD streams using very low cost hardware.
 
Topics:
Computer Vision
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5141
Download:
Share:
 
Abstract:
This paper presents a new procedure to reconstruct and shade automatically, with artistic interference if needed, an area captured by a sequence of photos or a high-definition video (e.g., from 4K cameras). In the first CPU version, which uses V-Ray, the render time for the Race Track scene was about 15 minutes per frame, an 8x reduction in total render time compared with classic scene modeling. We reduce the render time significantly further when using OptiX as our ray-tracing engine.
 
Topics:
Computer Vision, Media and Entertainment
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5156
Download:
Share:
 
Abstract:
There is a technique for presenting stereograms where the full information for the two eyes is contained in a single image. These images are known as "autostereograms"; they may contain a wide variety of forms of depth, with some limitations. The images are generated in multiple planes, in front of or behind the physical plane. In order to perceive 3D shapes in autostereograms, it is necessary to separate the visual processes of focusing and convergence, which are linked under normal vision. This work uses a supercomputing platform with 128 GPUs.
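The single-image encoding works by repeating a random-dot pattern with a depth-dependent horizontal period, and rows are independent of one another, which is what makes generation amenable to the massive parallelism mentioned above. A toy NumPy generator follows; the pattern width and disparity amplitude are illustrative choices, not the authors' parameters.

```python
import numpy as np

def autostereogram(depth, pattern_width=60, amplitude=20):
    # each output pixel repeats the pixel `shift` columns to its left;
    # closer points get a smaller shift, which the two eyes fuse as depth
    rng = np.random.default_rng(1)
    h, w = depth.shape
    img = rng.random((h, w + pattern_width))   # leftmost strip = random dots
    for y in range(h):
        for x in range(w):
            shift = pattern_width - int(amplitude * depth[y, x])
            img[y, x + pattern_width] = img[y, x + pattern_width - shift]
    return img

# depth map: a raised square floating in front of the background plane
depth = np.zeros((100, 100))
depth[30:70, 30:70] = 0.5
img = autostereogram(depth)
```

Within a row the copies are sequential, but every row is independent, so a GPU assigns one thread (or warp) per row and scales trivially to large images.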
 
Topics:
Computer Vision, Visualization - In-Situ & Scientific
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5160
Download:
Share:
 
Abstract:
Computer vision techniques for 3D reconstruction and kinematic modeling are positioned to bring about a major advance in the field of behavioral neuroscience. Integrating GPUs into the software pipeline has qualitatively improved our ability to fit, inspect, and refine complex kinematic models. Our custom markerless motion capture system, in conjunction with our use of high-density silicon neural implants (~100 channels), provides an unprecedented glimpse into the relationship between the brain, memory, and behavior.
 
Topics:
Computer Vision, Life & Material Science
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5172
Download:
Share:
 
Abstract:
Pornography detection is a significant subtask of online content filtering. One of the biggest problems for many social networks and video-sharing websites is preventing the distribution of pornography. In this work we describe a combined porn detector based on deep neural networks. Our detector works with several types of porn features, such as porn film studio logos, warning text, and sexually explicit scenes. In addition, we show the results of a speed comparison between CPU and GPU implementations of the neural networks used for this task.

 
Topics:
Computer Vision, Artificial Intelligence and Deep Learning
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5178
Download:
Share:
 
Abstract:
In this poster, we present an efficient and highly parallelizable algorithm for unwrapping omnidirectional images that benefits from the CUDA platform (NVIDIA Jetson TK1). The implementation achieves real-time performance, with a range of 1697 to 34 FPS for unwrapping omni-images (512x512 to 4096x4096) into pano-images (128x512 to 1024x4096).
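Unwrapping is an inverse per-pixel coordinate mapping, which is why it parallelizes so well: every panorama pixel looks up one source location independently. A NumPy sketch with nearest-neighbor sampling is below; the radius range and image sizes are illustrative assumptions, and a production kernel would typically interpolate rather than round.

```python
import numpy as np

def unwrap_omni(omni, pano_h, pano_w, r_min_frac=0.2):
    # for every panorama pixel (row = radius, column = angle), look up the
    # nearest source pixel in the circular omnidirectional image
    h, w = omni.shape
    cy, cx = h / 2.0, w / 2.0
    r_max = min(cy, cx) - 1
    r_min = r_min_frac * r_max
    rows = np.arange(pano_h).reshape(-1, 1)
    cols = np.arange(pano_w).reshape(1, -1)
    r = r_min + (r_max - r_min) * rows / max(pano_h - 1, 1)
    theta = 2 * np.pi * cols / pano_w
    src_y = np.clip(np.round(cy + r * np.sin(theta)).astype(int), 0, h - 1)
    src_x = np.clip(np.round(cx + r * np.cos(theta)).astype(int), 0, w - 1)
    return omni[src_y, src_x]

omni = np.random.default_rng(0).random((512, 512))
pano = unwrap_omni(omni, 128, 512)
```

On the GPU the same mapping becomes one thread per output pixel, with the omni image bound to a texture so the hardware handles sampling and caching.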
 
Topics:
Computer Vision
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5183
Download:
Share:
 
Abstract:
We propose a GPU parallelized algorithm using deep convolutional neural network for single image super-resolution (SR). Unlike the traditional sparse coding based method, our deep learning approach parallelizes all the computation steps from end to end, without sacrificing the performance of the state-of-the-art SR methods both qualitatively and quantitatively. The GPU parallelization accelerates our algorithm significantly, and enables us to build up a real-time video SR system for real-time applications.
 
Topics:
Computer Vision
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5195
Download:
Share:
 
Abstract:
GPU optimizations already improved projection speed of Wide-Area Motion Imaging (WAMI) maps by 100x. An Air Force-led team developed novel GPU-optimized algorithms that merge projection with stabilization and automated real-time tracking of items such as vehicles and people. The resulting systems will ultimately be deployed on small on-board processors in low-cost drones to replace multi-million dollar systems deployed on turbine aircraft. The imagery has military and civil applications such as security, traffic management and firefighting.
 
Topics:
Computer Vision, Video & Image Processing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5205
Download:
Share:
 
Abstract:
Toxic haze has become a major air-pollution threat in China, affecting not only public health but also outdoor computer vision systems. By adapting the dark channel prior method to the dehazing process, very good results are achieved. However, the huge processing requirements pose big challenges. We refined the parallel algorithm and performed deep optimization on the Tegra K1 Jetson platform. Compared to the ARM CPU, experiments show a 156x speedup. The results show the Tegra K1 has great potential for embedded real-time computer vision processing.
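For reference, the dark channel prior pipeline estimates a per-pixel transmission map from the minimum color channel over a local patch, then inverts the haze imaging model. A compact NumPy sketch follows; the patch size, omega, and atmospheric-light heuristic follow common choices from the literature and are not necessarily the poster's exact parameters.

```python
import numpy as np

def dark_channel(img, patch=15):
    # per-pixel minimum over the color channels, then a local min filter
    dc = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(dc, pad, mode='edge')
    out = np.empty_like(dc)
    for y in range(dc.shape[0]):
        for x in range(dc.shape[1]):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def dehaze(img, omega=0.95, t0=0.1):
    dc = dark_channel(img)
    # atmospheric light: mean color of the brightest dark-channel pixels
    k = max(1, dc.size // 1000)
    idx = np.unravel_index(np.argsort(dc, axis=None)[-k:], dc.shape)
    A = img[idx].mean(axis=0)
    t = 1.0 - omega * dark_channel(img / A)   # transmission estimate
    t = np.maximum(t, t0)[..., None]          # clamp to avoid noise blow-up
    return (img - A) / t + A                  # recovered scene radiance

hazy = np.full((16, 16, 3), 0.8)              # a uniformly hazy gray patch
restored = dehaze(hazy)
```

The min filter and the per-pixel recovery are both embarrassingly parallel, which is what makes the pipeline a good fit for a Tegra-class embedded GPU.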
 
Topics:
Computer Vision, Video & Image Processing
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5213
Download:
Share:
 
Abstract:
Image representation is a critical step in computer vision. However, it remains one of the most challenging topics, partly because of the lack of sufficiently discriminative and robust representations and the high computational cost of state-of-the-art methods. In our work, we propose an image representation method that integrates low- and middle-level features extracted from images with high-level cognitive representations; it is accelerated with GPUs and improved by introducing high-level semantic knowledge representations.
 
Topics:
Computer Vision
Type:
Poster
Event:
GTC Silicon Valley
Year:
2015
Session ID:
P5216
Download:
Share: