GTC ON-DEMAND

 
Abstract:
We'll discuss the computational challenge of aligning short DNA reads to very large reference genomes, a problem that tests the limits of computing hardware. We'll explain how adapting a CUDA-accelerated short-read aligner to handle these genomes resulted in a tenfold reduction in execution time.
 
Topics:
Genomics & Bioinformatics, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9350
 
Abstract:

It is not always easy to accelerate a complex serial algorithm with CUDA parallelization. A case in point is that of aligning bisulfite-treated DNA (bsDNA) sequences to a reference genome. A simple CUDA adaptation of a CPU-based implementation can improve the speed of this particular kind of sequence alignment, but it's possible to achieve order-of-magnitude improvements in throughput by organizing the implementation so as to ensure that the most compute-intensive parts of the algorithm execute on GPU threads.

 
Topics:
AI in Healthcare, Genomics & Bioinformatics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8130
 
Abstract:

The challenge: do interactive similarity searching in a SQL database that contains billions of short DNA sequences. The response: this database query is amenable to GPU acceleration because efficient numerical computation can be carried out in parallel on large numbers of independent data items. Implementation details and performance will be discussed, with emphasis on the integration of GPU computation with the database server environment.

 
Topics:
Computational Biology & Chemistry, Algorithms & Numerical Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7367
 
Abstract:
Learn how to use the Performance Monitor tool ("PerfMon") in Microsoft Windows to do non-invasive real-time visualization of the performance of a CUDA application. This approach lets you aggregate performance data from the host operating system and hardware along with GPU performance metrics, and makes it possible to examine the interactions between GPU components (CUDA compute and memory activity) and non-GPU components (CPU activity, disk I/O, and host memory) throughout the execution lifetime of a complex CUDA application. Examples will be provided from the performance analysis of a pipelined CUDA application that runs kernels on multiple GPUs and that makes intensive concurrent use of CPU threads and host memory.
 
Topics:
Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6287
 
Abstract:
Most research into the use of GPUs for biological sequence alignment has focused on the choice and implementation of appropriate parallel algorithms for sequence matching. This strategy has yielded a number of GPU-based implementations with speeds 5 to 10 times faster than CPU implementations with comparable sensitivity and mapping quality. We have taken a different approach to the use of GPUs by implementing a series of CUDA kernels that filter the set of reference locations at which to compute seed-and-extend alignments, thereby decreasing the amount of parallel sequence-matching computation and improving the overall throughput of the GPU/CPU pipeline. Even without extreme CUDA code optimization, we observe increased sensitivity (i.e., a larger number of reported valid mappings) with throughput as good as or better than existing GPU-based sequence aligners.
 
Topics:
Genomics & Bioinformatics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4248
 
Abstract:

The Department of Physics and Astronomy at Johns Hopkins University is currently constructing a new computer cluster to facilitate high-throughput data-intensive computation on terabyte-scale data, including the analysis of genomic sequence data. Compute nodes in the cluster contain multiple CPU cores, 100GB or more of system RAM, and one or more GPUs; a prototype node is implemented with 12 CPU cores (24 hyperthreads), 144GB of RAM, and four NVIDIA C2070s. In this session we will describe the design of a genomic sequence-alignment application that targets the cluster compute-node hardware. We will discuss the algorithms we use and how they are implemented as CUDA kernels, point out the key optimizations in the implementation, and look at the performance of the software.

 
Topics:
Genomics & Bioinformatics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3092
 
Speakers:
Richard Wilton, The Johns Hopkins University
Abstract:
It is axiomatic that computational throughput can be increased by exploiting the parallelism of GPU hardware -- but what if the computational algorithm is not easy to implement in parallel? We have modified one such algorithm -- the Smith-Waterman-Gotoh dynamic programming algorithm for local sequence alignment -- so as to make it more amenable to data-parallel computation. The result is a successful CUDA implementation that fully exploits GPU parallelism.
 
Topics:
Life & Material Science, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102115
 
 