Learn about BlazingSQL, our new, free GPU SQL engine built on RAPIDS open-source software. We will show multiple demo workflows that use BlazingSQL to connect data lakes to RAPIDS tools. We'll explain how we dramatically accelerated our engine and made it substantially more lightweight by adopting Apache Arrow in GPU memory and integrating with cuDF, the RAPIDS DataFrame library. That made it possible to install and deploy BlazingSQL + RAPIDS in a matter of minutes. More importantly, we built a robust framework that helps users bring data from data lakes into GPU-accelerated workloads without having to ETL on CPU memory or separate GPU clusters. We'll discuss how that makes it possible to keep everything in the GPU while BlazingSQL manages the SQL ETL; RAPIDS can then take these results to continue machine learning, deep learning, and visualization workloads.
BlazingDB, the distributed SQL engine on GPUs, will show how we contribute to the Apache GPU Data Frame (GDF) project and how we have begun to leverage it inside BlazingDB. Through the integration of the GDF we have dramatically accelerated our data engine, achieving over 10x performance improvements. More importantly, we have built a robust framework that helps users bring data from their data lake into GPU-accelerated workloads without having to ETL on CPU memory or separate CPU clusters. Everything stays in the GPU: BlazingDB handles the SQL ETL, and pyGDF and DaskGDF can then take these results to continue machine learning workloads. With the GDF, customer workloads can keep data in the GPU, reduce network and PCIe I/O, dramatically improve ETL-heavy GPU workloads, and enable data scientists to run end-to-end data pipelines from the comfort of one GPU server or cluster.
Extract analytical value from your enterprise data lake with a state-of-the-art GPU SQL analytics engine. As businesses consolidate massive datasets into data lake technologies (HDFS, AWS S3, Azure Blob, etc.), they find themselves unable to fully leverage the value these lakes hold. Data engineering departments must produce unique, costly ETL processes for every dataset and every tool that hopes to interact with it. At BlazingDB we've built an analytics engine that runs SQL directly on open-source file formats inside data lakes, currently BlazingDB's Simpatico and Apache Parquet. These file formats can be easily accessed from a variety of tools, limit duplication of large volumes of data, and support improved data governance. Learn sound practices for keeping your data lake from turning into a swamp and for extracting the full value of your data lake investment.