Abstract:
As people wish to interactively explore increasingly larger datasets, existing tools are unable to deliver acceptable performance. The distributed-nature of systems like Spark leads to latencies detrimental to interactive data exploration, while single-node visualization solutions like Tableau and Qlikview are not powerful enough to deliver sub-second response times for even intermediate-sized datasets. In this talk, we will argue that dense GPU servers, containing 4-16 GPUs each, can provide analytics query throughput exceeding what can be achieved on even large clusters, while avoiding the latencies and complications associated with running over a network. We will look at MapD, which can query and visualize multi-billion row datasets in milliseconds, as an example of such a system. Finally, we will show how the significantly higher performance achievable with a GPU system translates into new modes and paradigms of data analysis.