Based on a comprehensive performance study of Watson workloads, we take a deep dive into optimizing the critical retrieve and rank functions using GPU acceleration. The performance of cognitive applications like answering natural language questions heavily depends on quickly selecting the relevant documents needed to generate a correct answer. While analyzing the question to determine appropriate search terms, weights, and relationships is relatively quick, retrieving and ranking a relevant subset from millions of documents is time-consuming. Only after this step is complete can advanced natural language processing algorithms be applied effectively.
Starting with a conventional CPU implementation, we identify the most time-consuming operations in processing SQL queries and show how they can be efficiently offloaded to the GPU. Using queries from a variant of the TPC-H benchmark, we show in detail how to optimally map complex database operations such as join to the GPU hardware, achieving up to 90% hardware efficiency and a throughput of over 100 million records per second. Because the data sets are orders of magnitude larger than GPU memory, the talk focuses on efficient data layout and movement.