Learn about BlazingSQL, our new, free GPU SQL engine built on RAPIDS open-source software. We will show multiple demo workflows that use BlazingSQL to connect data lakes to RAPIDS tools. We'll explain how we dramatically accelerated our engine and made it substantially more lightweight by adopting Apache Arrow as our in-GPU-memory format and integrating cuDF from RAPIDS. As a result, BlazingSQL + RAPIDS can be installed and deployed in a matter of minutes. More importantly, we built a robust framework to help users bring data from data lakes into GPU-accelerated workloads without having to run ETL in CPU memory or on separate CPU clusters. We'll discuss how that makes it possible to keep everything on the GPU while BlazingSQL manages the SQL ETL. RAPIDS can then take these results to continue machine learning, deep learning, and visualization workloads.
BlazingDB, the distributed SQL engine on GPUs, will show how we contribute to the GPU Data Frame (GDF) project, which builds on Apache Arrow, and how we have begun to leverage it inside BlazingDB. Integrating the GDF has let us dramatically accelerate our data engine, with performance improvements of more than 10x. More importantly, we have built a robust framework to help users bring data from their data lake into GPU-accelerated workloads without having to run ETL in CPU memory or on separate CPU clusters. Everything stays on the GPU: BlazingDB handles the SQL ETL, and then pyGDF and DaskGDF can take these results to continue machine learning workloads. With the GDF, customer workloads can keep data on the GPU, reduce network and PCIe I/O, dramatically speed up ETL-heavy GPU workloads, and let data scientists run end-to-end data pipelines from the comfort of a single GPU server or cluster.
Learn strategies for efficiently employing various cascaded compression algorithms on the GPU. Many database input fields are amenable to compression because they follow repeating or gradually increasing patterns, as dates and quantities often do. We'll present fast implementations of decompression algorithms such as RLE-Delta. By using compression, we can achieve 10 times greater effective read bandwidth than the interconnect allows for raw data transfers. However, I/O bottlenecks still play a big role in overall performance, and data has to be moved efficiently into and out of the GPU to sustain the optimal decompression rate. After a deep dive into the implementation, we'll show a real-world example of how BlazingDB leverages these compression strategies to accelerate database operations.
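To illustrate the RLE-Delta cascade mentioned above, here is a minimal sketch. It is not BlazingDB's implementation. It assumes that "RLE-Delta" means delta encoding a column and then run-length encoding the resulting deltas, so a column of gradually increasing dates collapses into a few (delta, count) runs. Decoding is written as three data-parallel steps that a GPU would typically map to a scan, a per-thread lookup, and a second scan. Here those steps run sequentially in plain Python. The function names `rle_delta_encode` and `rle_delta_decode` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Run = Tuple[int, int]  # (delta value, run length); hypothetical on-disk layout


def rle_delta_encode(values: List[int]) -> List[Run]:
    """Delta-encode the column (first delta is relative to 0), then RLE the deltas."""
    runs: List[Run] = []
    prev = 0
    for v in values:
        d = v - prev
        prev = v
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((d, 1))  # start a new run
    return runs


def rle_delta_decode(runs: List[Run]) -> List[int]:
    """Decode with scan-based, data-parallel-style steps."""
    lengths = [n for _, n in runs]
    # Step 1: exclusive prefix sum of run lengths gives each run's output offset.
    starts = [0] + list(accumulate(lengths))[:-1]
    total = sum(lengths)
    # Step 2: every output index finds its run independently via binary search,
    # the way one GPU thread per output element could.
    deltas = [runs[bisect_right(starts, i) - 1][0] for i in range(total)]
    # Step 3: inclusive prefix sum of the deltas restores the original values.
    return list(accumulate(deltas))
```

For example, `rle_delta_encode([20, 21, 22, 23, 30, 30, 30])` yields four runs, `[(20, 1), (1, 3), (7, 1), (0, 2)]`, instead of seven values. `rle_delta_decode` recovers the original column. The binary search in step 2 is one possible design choice. It avoids a sequential "expand each run" loop, so every output element can be produced without waiting on earlier runs.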