GTC On-Demand

Big Data Analytics
Evaluation of Parallel Hashing Techniques
Rajesh Bordawekar (IBM T. J. Watson Research Center)
This presentation covers techniques for implementing hashing functions on the GPU. We describe various parallel implementations of hashing techniques, e.g., cuckoo hashing, partitioned hashing, Bin-Hash, and Bloom filters, and then present different ways of implementing these functions on the GPU, with emphasis on data structures that exploit the GPU's data-parallel features as well as its memory constraints.
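Cuckoo hashing, one of the schemes the talk surveys, resolves collisions by giving every key two candidate slots and evicting occupants between two tables. A minimal CPU sketch (table sizes, hash choices, and eviction limit are illustrative assumptions, not from the presentation):

```python
class CuckooHash:
    """Illustrative two-table cuckoo hash set; not the talk's GPU code."""

    def __init__(self, size=16, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks          # bound on eviction chain length
        self.t1 = [None] * size             # table addressed by h1
        self.t2 = [None] * size             # table addressed by h2

    def _h1(self, key):
        return hash(key) % self.size

    def _h2(self, key):
        return (hash(key) // self.size) % self.size

    def insert(self, key):
        for _ in range(self.max_kicks):
            i = self._h1(key)
            if self.t1[i] is None:
                self.t1[i] = key
                return True
            key, self.t1[i] = self.t1[i], key   # evict occupant of t1, carry it on
            j = self._h2(key)
            if self.t2[j] is None:
                self.t2[j] = key
                return True
            key, self.t2[j] = self.t2[j], key   # evict occupant of t2, loop again
        return False                            # cycle: a real table would rehash/grow

    def contains(self, key):
        # Lookup probes exactly two slots, which is what makes the
        # scheme attractive for data-parallel (one-thread-per-query) lookup.
        return self.t1[self._h1(key)] == key or self.t2[self._h2(key)] == key
```

The constant-time, two-probe lookup is the property that maps well onto GPU threads; the serial eviction loop in `insert` is the part the parallel variants in the talk must rework.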
 
Keywords: Big Data Analytics, Developer - Programming Languages, GTC 2014 - ID S4507
 
Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Computing
Rajesh Bordawekar (IBM T. J. Watson Research Center), Ruchir Puri (IBM Research)
In this session you will learn how IBM is exploiting GPUs in its new OpenPOWER platform to accelerate Big Data Analytics and Cognitive Computing solutions. The Hardware Acceleration Lab in IBM's Software Group is partnering with IBM Research to develop optimized heterogeneous computing solutions. With the creation of the OpenPOWER consortium last year, IBM has created an open ecosystem of heterogeneous computing platforms that include NVIDIA's Tesla GPUs. GPUs are gaining traction in the enterprise as accelerators for Big Data Analytics and Cognitive Computing workloads. This session focuses on industrial case studies and the exploitation of GPUs; some early results will also be shared.
 
Keywords: Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID S5459
 
Accelerating Spark Workloads Using GPUs
Rajesh Bordawekar (IBM Research)
The Apache Spark engine is increasingly used for implementing large-scale distributed analytics workloads. These workloads cover a wide array of analytics models, including predictive analytics, optimization, and graph analytics. We'll discuss opportunities for exploiting GPUs to accelerate different Spark components such as MLlib. The talk first gives an overview of the Spark programming and execution model and then describes key issues in integrating GPUs into the Spark infrastructure. We then describe our approach for enabling Spark to use multiple GPUs in a distributed manner and provide details of accelerating key MLlib kernels without changing the source Spark program.
 
Keywords: Big Data Analytics, Deep Learning and AI, Algorithms, GTC 2016 - ID S6280
Deep Learning and AI
Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLINK
Rajesh Bordawekar (IBM T. J. Watson Research Center)
We'll discuss approaches for accelerating out-of-core nearest neighbor computation on multi-GPU systems using system features such as NVLink. Nearest neighbor calculations operate over a set of high-dimensional vectors and compute pair-wise distances using similarity metrics such as cosine or maxNorm distance. In practice, the number of vectors can be very large and of very high dimension (for example, 5 million 1,000-dimensional vectors for the Wikipedia corpus). In such cases, the data cannot fit in the GPU device memory and needs to be fetched from host memory. We'll present GPU implementations of key nearest neighbor algorithms (for example, locality-sensitive hashing) for these scenarios and demonstrate how NVLink can be used to optimize them.
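The core kernel being accelerated is a pair-wise cosine-similarity scan. A hypothetical CPU sketch, with a chunked loop standing in for the out-of-core batching (streaming one chunk of vectors at a time, as host-to-device transfers would):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(query, vectors, chunk=1024):
    """Brute-force nearest neighbor by cosine similarity.

    The chunked outer loop mimics out-of-core processing: each chunk plays
    the role of a batch copied from host memory into device memory.
    """
    best_i, best_s = -1, -2.0
    for start in range(0, len(vectors), chunk):
        for i, v in enumerate(vectors[start:start + chunk], start):
            s = cosine(query, v)
            if s > best_s:
                best_i, best_s = i, s
    return best_i, best_s
```

Locality-sensitive hashing, mentioned in the abstract, would replace the exhaustive inner scan with lookups into hash buckets of likely neighbors; the chunked structure for out-of-core data stays the same.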
 
Keywords: Deep Learning and AI, Algorithms, GTC 2017 - ID S7112
Finance
Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGAs
Rajesh Bordawekar (IBM T. J. Watson Research Center)
We experimentally implement key financial risk modeling algorithms (e.g., Monte Carlo pricing) on the NVIDIA TK1 and compare its performance against an FPGA implementation. We compute both FLOPS/dollar and FLOPS/watt, and describe the pros and cons of using the two architectures for implementing financial risk modeling algorithms.
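A Monte Carlo pricing kernel of the kind benchmarked here is embarrassingly parallel: each simulated price path is independent. A hedged sketch for a European call under geometric Brownian motion (the parameters and path count below are illustrative, not the experiment's configuration):

```python
import math
import random

def mc_european_call(s0, strike, rate, vol, t, paths, seed=0):
    """Monte Carlo estimate of a European call price.

    Each path draws one standard normal and evolves the spot price under
    geometric Brownian motion; on a GPU or FPGA, one path maps naturally
    to one thread or pipeline stage.
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * t
    diffusion = vol * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)                    # one normal draw per path
        s_t = s0 * math.exp(drift + diffusion * z)  # terminal spot price
        payoff_sum += max(s_t - strike, 0.0)        # call payoff
    return math.exp(-rate * t) * payoff_sum / paths  # discounted mean payoff
```

Because the per-path work is a handful of floating-point operations, this kernel is a natural probe for FLOPS/watt and FLOPS/dollar comparisons across accelerators.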
 
Keywords: Finance, Embedded, Developer - Algorithms, GTC 2015 - ID S5227
Machine Learning & Deep Learning
Accelerating Deep Convolution Neural Networks For Large-Scale Speech Tasks Using GPUs
Rajesh Bordawekar (IBM T. J. Watson Research Center)
This presentation describes GPU acceleration of convolutional neural networks for speech processing workloads. We compare three alternatives for implementing the core computational kernels: hand-coded, using cuBLAS, and using cuDNN. We describe how each approach affects the algorithmic design and discuss its impact on performance and result accuracy.
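The cuBLAS route typically lowers convolution to matrix multiplication via the well-known im2col transform: input patches are unrolled into rows so that the whole convolution becomes a single GEMM. A minimal single-channel, stride-1, no-padding sketch (shapes and the pure-Python "GEMM" are illustrative):

```python
def im2col(x, kh, kw):
    """Unroll each kh-by-kw patch of 2-D input x into one row."""
    h, w = len(x), len(x[0])
    cols = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = [x[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            cols.append(patch)                 # one row per output pixel
    return cols

def conv2d_gemm(x, k):
    """2-D convolution (cross-correlation) expressed as patch-matrix x kernel."""
    kh, kw = len(k), len(k[0])
    flat_k = [v for row in k for v in row]
    # The GEMM step: each output value is the dot product of an unrolled
    # patch with the flattened kernel; cuBLAS would do all rows at once.
    return [sum(a * b for a, b in zip(patch, flat_k))
            for patch in im2col(x, kh, kw)]
```

The trade-off the talk weighs is visible even in this sketch: im2col duplicates input data to buy a single highly optimized GEMM, whereas hand-coded or cuDNN kernels can avoid that memory blow-up.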
 
Keywords: Machine Learning & Deep Learning, Big Data Analytics, Developer - Algorithms, GTC 2015 - ID S5231
 
 
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2018 NVIDIA Corporation