GTC ON-DEMAND

 

Accelerated Data Science
Building a Distributed GPU DataFrame with Python
Abstract:
We'll discuss the GPU Open Analytics Initiative, an effort to develop a GPU data frame that can handle a large-scale data-analytics workflow and support out-of-core cases in which the data is larger than GPU memory. We'll describe how we divided the problem into two parts, developing an elementary single-GPU data frame to handle in-memory use cases, and then combining multiple single-GPU data frames into a distributed multi-GPU data frame for out-of-core use cases. We'll briefly introduce our distributed GPU data frame and its capabilities. We'll then explain how we scaled out by using Dask, a distributed computation framework in Python, to orchestrate the single-GPU data frames and achieve out-of-core capability with minimal effort. Our idea can be generalized to build custom distributed GPU computation by composing single-GPU libraries.
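
The divide-and-compose idea in the abstract can be sketched as a partition-wise computation: split the data into partitions, run a single-device operation on each, then merge the partial results. The following is a minimal illustration under stated assumptions, not the speakers' implementation. It uses only the Python standard library, with a thread pool standing in for Dask and plain lists standing in for single-GPU data frames; the names `partitions`, `partial_groupby`, `combine`, and `distributed_mean` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]                  # (group key, value)
Partial = Dict[str, Tuple[float, int]]   # key -> (running sum, count)


def partitions(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Split rows into fixed-size partitions (the last one may be shorter)."""
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def partial_groupby(chunk: List[Row]) -> Partial:
    """Stand-in for a single-device data frame operation on one partition."""
    out: Partial = {}
    for key, value in chunk:
        s, n = out.get(key, (0.0, 0))
        out[key] = (s + value, n + 1)
    return out


def combine(partials: Iterable[Partial]) -> Dict[str, float]:
    """Merge per-partition (sum, count) pairs into per-key means."""
    totals: Partial = {}
    for partial in partials:
        for key, (s, n) in partial.items():
            ts, tn = totals.get(key, (0.0, 0))
            totals[key] = (ts + s, tn + n)
    return {key: s / n for key, (s, n) in totals.items()}


def distributed_mean(rows: Iterable[Row], partition_size: int = 2,
                     workers: int = 2) -> Dict[str, float]:
    """Group-by mean computed partition by partition, then combined."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return combine(pool.map(partial_groupby,
                                partitions(rows, partition_size)))


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]
result = distributed_mean(rows)
```

The mean is carried as a (sum, count) pair per partition because per-partition means cannot be averaged directly when partitions hold different numbers of rows. In the setting the abstract describes, the per-partition step would run on a GPU data frame and a scheduler such as Dask would handle orchestration.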
 
Topics: Accelerated Data Science, Tools & Libraries
Type: Talk
Event: GTC Silicon Valley
Year: 2019
Session ID: S9449