It is not always easy to accelerate a complex serial algorithm with CUDA parallelization. A case in point is that of aligning bisulfite-treated DNA (bsDNA) sequences to a reference genome. A simple CUDA adaptation of a CPU-based implementation can improve the speed of this particular kind of sequence alignment, but it's possible to achieve order-of-magnitude improvements in throughput by organizing the implementation so as to ensure that the most compute-intensive parts of the algorithm execute on GPU threads.
The challenge: do interactive similarity searching in a SQL database that contains billions of short DNA sequences. The response: this database query is amenable to GPU acceleration because efficient numerical computation can be carried out in parallel on large numbers of independent data items. Implementation details and performance will be discussed, with emphasis on the integration of GPU computation with the database server environment.
The Department of Physics and Astronomy at Johns Hopkins University is currently constructing a new computer cluster to facilitate high-throughput data-intensive computation on terabyte-scale data, including the analysis of genomic sequence data. Compute nodes in the cluster contain multiple CPU cores, 100GB or more of system RAM, and one or more GPUs; a prototype node is implemented with 12 CPU cores (24 hyperthreads), 144GB of RAM, and four NVIDIA C2070s. In this session we will describe the design of a genomic sequence-alignment application that targets the cluster compute-node hardware. We will discuss the algorithms we use and how they are implemented as CUDA kernels, point out the key optimizations in the implementation, and look at the performance of the software.