It is common knowledge that GPUs can dramatically accelerate HPC and machine learning/AI workloads, but can they do the same for general-purpose analytics? In this talk, Todd Mostak, CEO of MapD, will provide real-world examples of how a new generation of GPU-powered analytics platforms can enable enterprises from a range of verticals to dramatically accelerate the process of insight generation at scale. In particular, he will focus on how the key technical differentiators of GPUs (their massive computational bandwidth, fast memory, and native rendering pipeline) make them uniquely suited to allow analysts and data scientists to query, visualize, and power machine learning over large, often high-velocity, datasets. Using the open source MapD analytics platform as an example, Todd will detail the technical approaches his team took to leverage the full parallelism of GPUs and demo how the platform allows analysts to interactively explore datasets containing tens of billions of records.
As people wish to interactively explore increasingly large datasets, existing tools are unable to deliver acceptable performance. The distributed nature of systems like Spark leads to latencies detrimental to interactive data exploration, while single-node visualization solutions like Tableau and QlikView are not powerful enough to deliver sub-second response times for even intermediate-sized datasets. In this talk, we will argue that dense GPU servers, containing 4-16 GPUs each, can provide analytics query throughput exceeding what can be achieved on even large clusters, while avoiding the latencies and complications associated with running over a network. We will look at MapD, which can query and visualize multi-billion row datasets in milliseconds, as an example of such a system. Finally, we will show how the significantly higher performance achievable with a GPU system translates into new modes and paradigms of data analysis.
Most of what claims to be interactive visualization of big datasets relies on one of two strategies: pre-canning and sampling. However, both of these techniques have well-known limitations. Enter Map-D, a distributed end-to-end data analytics and visualization platform that can run on any number of GPUs, allowing millisecond query latencies over multi-terabyte datasets. In addition to supporting ultra-fast relational table and array querying, Map-D uses the native graphics pipeline of the GPU to render 2D and 3D visualizations of the results. By streaming these visualizations to a user's browser via interactive 30fps H.264 video, it can appear as if billions of data points are in the DOM, even on low-powered mobile clients.
map-D makes big data interactive for anyone! map-D is a super-fast GPU database that allows anyone to interact with and visualize streaming big data in real time. Its unique architecture runs 70-1,000x faster than other in-memory databases or big data analytics platforms. To boot, it works with any size or kind of dataset; works with data that is streaming live onto the system; uses cheap, off-the-shelf hardware; and is easily scalable. map-D is focused on learning from big data. At the moment, the map-D team is working on projects with MIT CSAIL, the Harvard Center for Geographic Analysis and the Harvard-Smithsonian Center for Astrophysics. Join Todd Mostak and Tom Graham, key members of the map-D team, as they demonstrate the speed and agility of map-D and describe the live processing, search and mapping of over 1 billion tweets.
Map-D (Massively Parallel Database) uses multiple NVIDIA GPUs to interactively query and visualize big data in real time. Map-D is an SQL-enabled column store that generates 70-400X speedups over other in-memory databases. This talk discusses the basic architecture of the system, the advantages and challenges of running queries on the GPU, and the implications of interactive and real-time big data analysis in the social sciences and beyond.