GTC On-Demand

Application Design & Porting Techniques
Presentation
Media
CUDA-Based GPU Computing Framework for GNU Octave
John Melonakos (AccelerEyes)
This poster presents the design of a CUDA GPU-based parallel processing framework for GNU Octave. Octave is a high-level interpreted language, primarily intended for numerical computations. Being an open-source alternative to MATLAB, GNU Octave is widely used in academic and research institutes. The GPU framework allows Octave users to accelerate software written in Octave's high-level M language on GPUs with minimal code modifications. To my knowledge, this is the first attempt to build a GPU framework for Octave, in contrast to previous attempts that provide GPU variants for a set of Octave functions.
 
Keywords:
Application Design & Porting Techniques, GTC 2012 - ID P2213
Download:
Astronomy & Astrophysics
Directing Experiments in the International Space Station With GPU-Assisted Image Analysis
Peter Lu
We implement image correlation, a fundamental component of many real-time imaging and tracking systems, on a graphics processing unit (GPU) using NVIDIA's CUDA. We use our code to analyze images of liquid-gas phase separation in a model colloid-polymer system, photographed in the absence of gravity aboard the International Space Station (ISS). Our GPU code is 4,000 times faster than simple MATLAB code performing the same calculation on a central processing unit (CPU), 130 times faster than simple C code, and 30 times faster than optimized C++ code using single-instruction, multiple-data (SIMD) extensions. The speed increases from these parallel algorithms enable us to analyze images downlinked from the ISS rapidly and send feedback to astronauts on orbit while the experiments are still being run.
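Image correlation, the core operation this talk accelerates, is typically computed via FFTs. A minimal CPU-only sketch in Python/NumPy (illustrative only, not the author's CUDA code) that locates the shift between two images:

```python
import numpy as np

def cross_correlate(a, b):
    """Circular cross-correlation of two equal-shape 2D images via FFT.
    The FFT formulation is also what maps efficiently onto a GPU."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    return np.real(np.fft.ifft2(A * np.conj(B)))

# A shifted copy of an image produces a correlation peak at the shift offset.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, 12), axis=(0, 1))
corr = cross_correlate(shifted, img)
peak = np.unravel_index(np.argmax(corr), corr.shape)
print(peak)  # peak at the applied shift (5, 12)
```

Each pixel of the correlation surface is independent, which is why a GPU version of this computation parallelizes so well.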

 
Keywords:
Astronomy & Astrophysics, GTC 2009 - ID S09437
Download:
Big Data Analytics
Extreme Machine Learning with GPUs
John Canny (UC Berkeley)
BIDMach is an open-source library for GPU-accelerated machine learning. On a single GPU node, BIDMach exceeds the performance of all other tools (including cluster systems on hundreds of nodes) for the most common machine learning tasks. BIDMach is an easy-to-use, interactive environment similar to SciPy/MATLAB, but with qualitatively higher performance. The session will discuss: Performance: BIDMach follows a "LAPACK" philosophy of building high-level algorithms on fast low-level routines (like BLAS). It exploits the unique hardware features of GPUs to provide more than order-of-magnitude gains over alternatives. Accuracy: Monte Carlo methods (MCMC) are the most general way to derive models, but are slow. We have developed a new approach to MCMC which provides two orders of magnitude speedup beyond hardware gains. Our "cooled" MCMC is fast and improves model accuracy. Interactivity: We are developing interactive modeling and visualization capabilities in BIDMach to allow analysts to guide, correct, and improve models in real time.
 
Keywords:
Big Data Analytics, Bioinformatics & Genomics, Machine Learning & Deep Learning, Scientific Visualization, GTC 2014 - ID S4811
Streaming:
Download:
Climate, Weather, Ocean Modeling
Global High Resolution Estimation of Evapotranspiration - SEBS on GPU using CUDA-C
Mohammad Abouali (Computational Science Research Center - San Diego State University)
This poster introduces a new implementation of the Surface Energy Balance System (SEBS) algorithm harnessing the many cores available on graphics processing units (GPUs), using the Compute Unified Device Architecture C (CUDA-C) programming model. The output of the new implementation is compared to a MATLAB code that has already been fully tested in the Water Cycle Multimission Observation Strategy (WACMOS) project. The code is timed against both the MATLAB and a purely high-performance C implementation of the same algorithm, and has been tested on several different NVIDIA cards with different compute capabilities.
 
Keywords:
Climate, Weather, Ocean Modeling, GTC 2013 - ID P3225
Download:
 
Accelerating Shallow Water Flow and Mass Transport Using Lattice Boltzmann Methods on GPUs
Kevin Tubbs (Dell, Inc.)
A lattice Boltzmann method (LBM) for solving the shallow water equations and the advection-dispersion equation is developed and implemented on graphics processing unit (GPU)-based architectures. The proposed LBM runs on NVIDIA computing processors; GPU computing is performed using the Jacket GPU engine for MATLAB and ArrayFire. Mass transport with velocity-dependent dispersion in shallow water flow is simulated by combining the MRT-LBM model and the TRT-LBM model. This talk will demonstrate the GPU parallel performance for modeling mass transport phenomena in shallow water flows.
 
Keywords:
Climate, Weather, Ocean Modeling, Developer - Algorithms, GTC 2013 - ID S3324
Streaming:
Download:
Cloud Visualization
MatCloud: Accelerating Matrix Math GPU Operations with SaaS
Frank Mueller, Xing Wu
We present MatCloud (www.mat-cloud.com), a cloud infrastructure and service for scientific computing using state-of-the-art GPU clusters. MatCloud is a service infrastructure exposed through a simple web terminal interface for running MATLAB-like commands and scripts. Join us to see how GPU technology can not only be applied to the cloud computing community, but also boost the adoption of cloud computing through its dramatic performance gains over traditional cloud infrastructures. MatCloud is an in-progress academic project and is under active development.
 
Keywords:
Cloud Visualization, Developer - Tools & Libraries, GTC 2010 - ID S1020260
Streaming:
Download:
Computational Physics
NLSEmagic: A GPU-accelerated Matlab-based code for integrating the Multidimensional Nonlinear Schrodinger Equation
Ron Caplan (Predictive Science Inc.)
NLSEmagic is a freely distributed package of C and MATLAB script codes which simulate the nonlinear Schrodinger equation in one, two, and three dimensions. The package includes MEX integrators in C, as well as CUDA-enabled GPU-accelerated MEX files in C. The MATLAB script files call the compiled MEX codes, forming an easy-to-use, highly efficient program. The codes use a fourth-order (in time) explicit Runge-Kutta scheme combined with a choice of standard second-order or compact fourth-order (in space) finite differencing.
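As a rough illustration of the time/space discretization described above (a Python/NumPy sketch, not NLSEmagic's actual code), one RK4 step of the 1D nonlinear Schrodinger equation with second-order central differences might look like:

```python
import numpy as np

def nlse_rk4_step(psi, dt, h, s=1.0):
    """One classical RK4 step of i*psi_t = -psi_xx + s*|psi|^2*psi on a
    periodic grid, pairing fourth-order time stepping with standard
    second-order central differences in space."""
    def rhs(p):
        # Second-order central difference Laplacian with periodic wrap.
        lap = (np.roll(p, 1) - 2.0 * p + np.roll(p, -1)) / h**2
        return 1j * (lap - s * np.abs(p)**2 * p)
    k1 = rhs(psi)
    k2 = rhs(psi + 0.5 * dt * k1)
    k3 = rhs(psi + 0.5 * dt * k2)
    k4 = rhs(psi + dt * k3)
    return psi + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# A plane wave should keep (near-)unit magnitude under the integrator.
N = 64
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
h = x[1] - x[0]
psi = np.exp(1j * x)
for _ in range(100):
    psi = nlse_rk4_step(psi, dt=1e-3, h=h)
print(np.max(np.abs(np.abs(psi) - 1.0)))  # tiny: magnitude preserved to near roundoff
```

Every grid point's update depends only on its immediate neighbors, which is what makes this stencil computation a natural fit for CUDA MEX kernels.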
 
Keywords:
Computational Physics, GTC 2013 - ID P3138
Download:
Computer Vision & Machine Vision
Enhanced Human Computer Interaction Using Hand Gesture Analysis on GPU
Pragati Dharmale (SNHU, NH)
This poster presents a very active research topic in human-computer interaction (HCI): automatic hand gesture recognition using NVIDIA GPUs. In this work, video gestures are processed with a neural network and finger counts are recognized. Due to real-time requirements, the algorithm needs to be optimized and computationally efficient. Our MATLAB implementation performs slowly once neural network processing starts; implementing it in a parallel programming model such as CUDA on the GPU provided the necessary gain in processing speed.
 
Keywords:
Computer Vision & Machine Vision, Video & Image Processing, GTC 2015 - ID P5235
Download:
 
GPU Implementation of Particle Filter Based Object Tracking Algorithm
Pinalkumar Engineer (Indian Institute of Technology Bombay, INDIA)
This poster presents a GPU implementation of a particle filter-based object tracking algorithm for video. We compared MATLAB and OpenCV implementations with the GPU implementation. A speedup of more than 100x is achieved for 1,024 particles for the CUDA implementation compared to the pure MATLAB implementation, while maintaining a frame rate of ~56 fps. We also compared the pure OpenCV implementation with the GPU implementation, achieving a 20x speedup.
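The bootstrap particle filter this poster accelerates can be sketched in a few lines of Python/NumPy; the 1D tracker below is illustrative only, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_1d(observations, n_particles=1024, proc_std=0.5, obs_std=1.0):
    """Minimal bootstrap particle filter: predict, weight, resample.
    Each particle evolves independently, which is why the algorithm
    maps well onto one-GPU-thread-per-particle implementations."""
    particles = rng.normal(0.0, 5.0, n_particles)
    estimates = []
    for z in observations:
        # Predict: propagate each particle through the motion model.
        particles = particles + rng.normal(0.0, proc_std, n_particles)
        # Weight: likelihood of the observation under each particle.
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
        w /= w.sum()
        estimates.append(np.sum(w * particles))
        # Resample: multinomial resampling by weight.
        particles = particles[rng.choice(n_particles, n_particles, p=w)]
    return np.array(estimates)

# Track a slowly drifting state observed in noise.
true_state = np.cumsum(rng.normal(0.0, 0.3, 50))
obs = true_state + rng.normal(0.0, 1.0, 50)
est = particle_filter_1d(obs)
print(np.mean(np.abs(est - true_state)))  # well under the observation noise
```

The resampling step is the main serialization point; the predict and weight steps are embarrassingly parallel across particles.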
 
Keywords:
Computer Vision & Machine Vision, Real-Time Graphics, GTC 2015 - ID P5322
Download:
Developer - Algorithms
Deriving Parallelism and GPU Acceleration of Algorithms with Inter-Dependent Data Fields
James Malcolm
- AccelerEyes
This poster presents an approach to derive parallelism in algorithms that involve building a sparse matrix representing relationships between inter-dependent data fields, and to enhance their performance on the GPU. This work compares the algorithm's performance on the GPU to its CPU variant, which employs the traditional sparse matrix-vector multiplication (SpMV) approach. We have also compared our algorithm's performance with CUSP SpMV on the GPU. The software used in this work is MATLAB with Jacket, a GPU engine for MATLAB.
 
Keywords:
Developer - Algorithms, GTC 2010 - ID P10A10
Download:
 
An Accelerated Weeks Method for Numerical Laplace Transform Inversion
Patrick Kano (Acunum Algorithms and Simulations, LLC)
Mathematical methods based on the Laplace transform are a standard component of undergraduate education. Real-world problems, however, often yield Laplace-space solutions that are too complex to be analytically inverted to expressions in physically meaningful variables. A robust numerical inversion approach is thus desirable. In this talk, I present one approach to computing an approximate inverse, the Weeks method. I will also discuss the difficulties in performing numerical inversion. Finally, I will show how we have been able to use Jacket from AccelerEyes in MATLAB to implement the Weeks method more efficiently and robustly.
 
Keywords:
Developer - Algorithms, GTC 2012 - ID S2415
Streaming:
Download:
 
Fast Reflectarray Antenna Analysis and Synthesis on GPUs
Angelo Liseno (Universita di Napoli Federico II), Amedeo Capozzoli (Universita di Napoli Federico II)
Illustrated is a computationally demanding problem in a hot topic of applied electromagnetics: the fast analysis and synthesis of electrically large, high-performance reflectarray antennas intended for satellite applications. Such antennas have thousands of control parameters to be optimized, entailing a high computational burden, so the use of fast numerical techniques and parallel processing is mandatory. The key points of accurate and reliable reflectarray antenna synthesis are detailed, also with reference to the concept of aperiodic reflectarrays introduced by the authors in a world patent recently acquired by the European Space Agency (ESA) and to research projects funded by ESA. The computationally critical steps are discussed, with particular reference to the fast evaluation of the radiation operator and of the functional gradient, as well as the fast implementation of the optimization algorithms on GPUs. The use of non-uniform FFTs (NUFFTs) and of the parallel processing capabilities of GPUs via the CUDA language and the AccelerEyes tools is highlighted. As MATLAB has become a common platform for technical computing, the interfacing of these procedures to standard MATLAB scripts is also detailed. Authors: A. Capozzoli, C. Curcio, A. Liseno and G. Toso.
 
Keywords:
Developer - Algorithms, GTC 2013 - ID S3139
Streaming:
Download:
 
A Scalable, Numerically Stable, High-performance Tridiagonal Solver Using GPUs
Li-Wen Chang (University of Illinois at Urbana-Champaign), Wen-Mei Hwu (University of Illinois at Urbana-Champaign)
Attend this session to learn new techniques to build a scalable and numerically stable tridiagonal solver for GPUs. Numerical stability has been missing from existing GPU-based tridiagonal solvers. In this work, a scalable, numerically stable, high-performance tridiagonal solver is presented. The solver provides a quality of stable solutions comparable to Intel MKL and MATLAB, at speeds comparable to the GPU tridiagonal solvers in existing packages like cuSPARSE. Two key optimization strategies for the solver are presented and analyzed: a high-throughput data layout transformation for memory efficiency, and a dynamic tiling approach for reducing the memory access footprint caused by branch divergence. Several applications are shown to benefit greatly from this solver. In a case study, Empirical Mode Decomposition, a critical method in time-frequency analysis, is used to demonstrate the solver's usability.
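For reference, the classic serial method that GPU tridiagonal solvers must reformulate is the Thomas algorithm. A NumPy sketch (illustrative only, not the session's solver) with a check against a dense solve:

```python
import numpy as np

def thomas_solve(a, b, c, d):
    """Serial Thomas algorithm for a tridiagonal system:
    a = sub-diagonal (a[0] unused), b = diagonal, c = super-diagonal
    (c[-1] unused), d = right-hand side. O(n) work, but the forward
    recurrence is inherently sequential -- GPU solvers replace it with
    parallel schemes such as cyclic reduction, which is where the
    stability questions discussed in this session arise."""
    n = len(b)
    cp = np.zeros(n); dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Verify against a dense solve on a random diagonally dominant system.
rng = np.random.default_rng(0)
n = 100
a = rng.random(n); b = 4.0 + rng.random(n); c = rng.random(n)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
d = rng.random(n)
print(np.allclose(thomas_solve(a, b, c, d), np.linalg.solve(A, d)))  # True
```

Note that the forward sweep is stable here because the test system is diagonally dominant; without pivoting or dominance, both this recurrence and naive cyclic reduction can lose accuracy.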

 
Keywords:
Developer - Algorithms, GTC 2013 - ID S3191
Streaming:
Download:
 
Fast In-place Transposition and Layout Conversion on GPUs
I-Jui (Ray) Sung (University of Illinois at Urbana-Champaign)
Learn how to perform efficient dense rectangular matrix transposition in place on the GPU. Transposition (full and tiled) has been an important building block in many GPU-accelerated algorithms for better memory access patterns. However, out-of-place transposition, while simple in nature, may be prohibitive due to its large space overhead for applications with large datasets. This talk presents techniques for in-place matrix transposition, and the audience will also learn how to use our in-place transposition library, with examples given in CUDA, MATLAB, and Mathematica CUDA bindings.
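The cycle-following idea behind in-place transposition of a rectangular matrix can be sketched in Python; this is an illustrative CPU version, not the presented library:

```python
import numpy as np

def transpose_inplace(flat, rows, cols):
    """In-place transpose of a rows x cols row-major matrix stored in a
    flat array, by following the cycles of the index permutation
    j = (i * rows) mod (rows*cols - 1). Out-of-place transposition
    needs a full second buffer; cycle-following needs only O(1) extra
    values (plus a visited bitmap here, kept for clarity)."""
    n = rows * cols
    visited = np.zeros(n, dtype=bool)
    for start in range(1, n - 1):      # indices 0 and n-1 are fixed points
        if visited[start]:
            continue
        i = start
        val = flat[i]
        while True:
            j = (i * rows) % (n - 1)   # destination of element i
            flat[j], val = val, flat[j]
            visited[j] = True
            i = j
            if j == start:
                break
    return flat

# Matches an out-of-place transpose of the same data.
m = np.arange(12)
transpose_inplace(m, 3, 4)
print(m.reshape(4, 3))  # 4x3 transpose of the original 3x4 matrix
```

The cycles of this permutation have irregular, data-independent lengths, which is exactly why balancing them across GPU threads is the hard part the talk addresses.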

 
Keywords:
Developer - Algorithms, Developer - Tools & Libraries, GTC 2013 - ID S3307
Streaming:
Download:
Developer - Programming Languages
A Speech Recognition Application Framework for Highly Parallel Implementations on the GPU
Jike Chong
- Parasians, LLC
Data layout, data placement, and synchronization processes are not usually part of a speech application expert's daily concerns. Yet failure to carefully take these concerns into account in a highly parallel implementation on graphics processing units (GPUs) could mean an order of magnitude loss in application performance. We present an application framework for parallel programming of automatic speech recognition (ASR) applications that allows a speech application expert to effectively implement speech applications on the GPU, and demonstrate how the ASR application framework has enabled a Matlab/Java programmer to achieve a 20x speedup in application performance on a GPU.
 
Keywords:
Developer - Programming Languages, GTC 2010 - ID P10R01
Download:
 
CUDA & GPU computing for Matlab
Roy Fahn
- Systematics
 
Keywords:
Developer - Programming Languages, High Performance Computing, GTS Israel 2011 - ID GTSI1108
Download:
 
GPU Computing with MATLAB
Andy The (MathWorks)
Learn how to use NVIDIA GPUs to accelerate computationally intensive MATLAB applications in areas such as image processing, signal processing, and computational finance. We will use an image processing example to demonstrate how you can speed up your MATLAB code by using built-in GPU-enabled functionality or by replacing key computations with CUDA kernels. We will also illustrate how MATLAB can be used as a development environment and test framework for CUDA kernel evaluation, visualization, and validation.
 
Keywords:
Developer - Programming Languages, Medical Imaging, Video & Image Processing, GTC 2014 - ID S4421
Streaming:
Download:
Developer - Tools & Libraries
PyCUDA: Even Simpler GPU Programming with Python
Andreas Kloeckner
Explore PyCUDA, a robust, open-source toolkit that lets you control your GPU from the comfort of Python, a MATLAB-like scripting language. Learn about Fermi tuning with PyCUDA, the new interfaces for CUBLAS and CUFFT, the ecosystem of third-party libraries built on PyCUDA, and examples illustrating PyCUDA's benefits to large-scale applications.
 
Keywords:
Developer - Tools & Libraries, Computational Fluid Dynamics, Physics Simulation, GTC 2010 - ID S10041
Streaming:
Download:
 
Loren Dean
- MathWorks
MATLAB is a widely used tool for scientific, engineering, and financial applications. As the popularity of GPUs has grown, there is strong interest from engineers and scientists who solve computationally intensive problems in being able to leverage GPUs within MATLAB and other products from MathWorks. This talk will discuss how MathWorks tools can help engineers and scientists take advantage of GPU resources while continuing to work in the familiar MATLAB environment. A range of capabilities will be discussed and demonstrated.
 
Keywords:
Developer - Tools & Libraries, GTC 2010 - ID S102267
Streaming:
Download:
 
MATLAB GPU Computing Essentials Tutorial
John Melonakos
In this tutorial, we will discuss AccelerEyes' Jacket software, which connects MATLAB to the graphics processing unit (GPU). With the GPU as a backend computation engine, Jacket brings together the best of three important computational worlds: computational speed, visualization, and the user-friendliness of MATLAB programming. Jacket enables developers to write and run code on the GPU in the native M-Language used in MATLAB. Jacket accomplishes this by automatically wrapping the M-Language into a GPU-compatible form. By simply casting input data to Jacket's GPU data structure, MATLAB functions are transformed into GPU functions. Jacket also preserves the interpretive nature of the M-Language by providing real-time, transparent access to the GPU compiler. The tutorial will provide examples of running MATLAB code on the GPU for image and signal processing, life science, finance, and other applications. A Q&A session will enable audience members to ask specific questions about the Jacket project and MATLAB GPU computing.
 
Keywords:
Developer - Tools & Libraries, GTC 2009 - ID S09016
Streaming:
Download:
 
Median Filtering: A Case Study in CUDA Optimization
James Malcolm
In this tutorial we present a new approach to integrating CUDA code into MATLAB by leveraging Jacket's Developer SDK, which makes integrating custom CUDA code into Jacket's runtime very easy. With a few simple jkt functions (which mimic standard MEX API functions), you can integrate custom CUDA kernels into Jacket. This enables your CUDA code to inherit Jacket's optimized memory management and kernel execution runtime system. In this tutorial, we share examples of using the Developer SDK to write a median filter in CUDA and integrate it into Jacket. We will start with the naive approach, then show how to optimize by using shared memory, and finally show the impact of using texture memory on this problem. In each case, we will integrate the code into Jacket and MATLAB via the Developer SDK and benchmark it in MATLAB. A Q&A session will enable audience members to ask specific questions about the Developer SDK, CUDA programming, and MATLAB-based GPU computing.
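The naive median filter the tutorial starts from can be sketched in Python/NumPy (an illustrative CPU version; the tutorial's kernels are written in CUDA):

```python
import numpy as np

def median_filter_naive(img, radius=1):
    """Naive 2D median filter: each output pixel is the median of its
    (2*radius+1)^2 neighborhood. This mirrors the starting-point CUDA
    kernel in which each GPU thread computes one output pixel; shared
    and texture memory then optimize the repeated neighborhood reads."""
    padded = np.pad(img, radius, mode='edge')
    h, w = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            out[y, x] = np.median(window)
    return out

# A single salt-noise pixel is removed entirely by the filter.
img = np.zeros((5, 5))
img[2, 2] = 255.0
print(median_filter_naive(img).max())  # 0.0
```

Because neighboring output pixels read overlapping windows, staging a tile of the image in shared memory (or reading it through the texture cache) removes most of the redundant global-memory traffic, which is the optimization arc the tutorial follows.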

 
Keywords:
Developer - Tools & Libraries, GTC 2009 - ID S09455
Download:
 
Multi-GPU Support for MATLAB using Jacket
 
Keywords:
Developer - Tools & Libraries, GTC 2009 - ID P0981
Download:
 
Jacket for Multidimensional Scaling in Genomics
Chris McClanahan (AccelerEyes)
In this tutorial, we will present AccelerEyes' Jacket software, which enables GPU computing in MATLAB, through a user case study entitled "Multidimensional Scaling for Genomics". We show how Jacket enables developers to write and run code on the GPU in the native M-Language used in MATLAB. By simply casting data to Jacket's GPU data structure, MATLAB functions are transformed into GPU functions. Additionally, we will include demos of running MATLAB code on the GPU for image and signal processing, life science, finance, and other applications. A Q&A session will enable audience members to ask specific questions about Jacket.
 
Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2287
Streaming:
Download:
 
GPU Computing with MATLAB
Dan Doherty (MathWorks)
In this webinar, we will show how to use NVIDIA GPUs to accelerate computationally intensive MATLAB applications in areas such as image processing, signal processing, and radar. We will demonstrate how you can speed up your MATLAB code by using built-in GPU-enabled functionality or by replacing key computations with CUDA kernels. We will also illustrate how MATLAB can be used for CUDA kernel evaluation, visualization, and validation.
 
Keywords:
Developer - Tools & Libraries, Audio, Image and Video Processing, Signal & Audio Processing, GTC Webinars 2014 - ID GTCE090
Streaming:
Download:
High Performance Computing
Hybrid GPU/Multicore Solutions for Large Linear Algebra Problems
Nolan Davis
- SAIC
Large linear algebra problems may be solved using recursive block decomposition, in which GPUs efficiently compute the sub-blocks and multicore CPUs put the sub-blocks back together within a large shared memory space. This talk will present benchmark results for such a hybrid approach, implemented in MATLAB® and using Jacket® to access the GPU's compute power.
 
Keywords:
High Performance Computing, Developer - Algorithms, Signal & Audio Processing, GTC 2010 - ID S102100
Streaming:
Download:
Life & Material Science
Generalized Linear Model (GLM) Based Quantitative Trait Locus (QTL) Analysis
Ali Akoglu
- University of Arizona
Relating genotype to phenotype in complex environments has been identified as one of the grand challenges of plant sciences. Under the umbrella of the iPlant Collaborative, funded by the Plant Science Cyberinfrastructure Collaborative program of the NSF, our goal is to develop a GPU implementation of the General Linear Model (GLM) to statistically link genotype to phenotype and dramatically decrease the execution time of GLM analyses. The GPU-based, highly parallelized Forward Regression stage of the GLM achieved a 177x speedup over the MATLAB-based serial version. Results of this study will enable larger, more intensive genetic mapping analyses to be conducted.
 
Keywords:
Life & Material Science, GTC 2010 - ID P10K01
Download:
Machine Learning & Deep Learning
Improving Mars Rover Image Compression Via GPUs And Genetic Algorithms
Brendan Babb (University of Alaska Anchorage)
Learn how to use Jacket to accelerate genetic algorithm (GA) image compression. Our research uses a GA to optimize lossy compression transforms that outperform state-of-the-art wavelet-based approaches for a variety of image classes, including fingerprints, satellite, medical, and images transmitted from the Mars Exploration Rovers. A typical training run evolves a population of transforms over many generations; since each transform must be applied to each image from the training set, each run entails thousands of independent, parallelizable fitness evaluations. By using MATLAB and Jacket to perform 2D convolution on the GPU, we have greatly reduced the total computation time needed.
 
Keywords:
Machine Learning & Deep Learning, GTC 2012 - ID S2133
Streaming:
Download:
 
A Reduction of the Elastic Net to Support Vector Machines Leveraging GPU Computing
Jacob Gardner (Washington University in St. Louis)
In past years we have seen many projects that build and maintain highly optimized GPU implementations of Support Vector Machines (SVMs). Up to this point, no comparable effort has been made to parallelize the Elastic Net, despite its popularity in many high-impact applications, including genetics, neuroscience, and systems biology. Rather than crafting a new GPU implementation for the Elastic Net, we introduce a novel reduction from the Elastic Net to the SVM, two seemingly disparate algorithms. This allows us to implement the Elastic Net in a way that spends almost all of its time in an SVM solver. As a result, we can leverage existing GPU implementations of SVM solvers, and achieve in 11 lines of MATLAB code the fastest Elastic Net by multiple orders of magnitude.
 
Keywords:
Machine Learning & Deep Learning, Big Data Analytics, GTC 2015 - ID S5543
Streaming:
Download:
Medical Imaging
GPUs Open New Avenues in Medical MRI
Chris A. Cocosco (University Medical Center Freiburg, Dept. of Radiology, Medical Physics.)
See how GPUs enable exciting new developments in medical magnetic resonance imaging (MRI). Their computational power now makes practical new MRI techniques that can bring shorter imaging sessions, better images, and more insight into human physiology. Learn about the characteristics of the general computational approach for obtaining the final image, and how it can be implemented using an iterative conjugate gradient algorithm. The algorithm exhibits massive parallelism and fits the GPU architecture well. Learn about its CUDA implementation details and MATLAB integration. See throughput measurements of Tesla GPUs compared to top-of-the-line many-core and large-RAM CPU systems.
 
Keywords:
Medical Imaging, GTC 2012 - ID S2348
Streaming:
Download:
 
Accelerating 3D CT Reconstruction Using GPUs
Saoni Mukherjee (Northeastern University)
We implement 3D cone-beam computed tomography on a GPU. The implementation takes slices of the target, weights the projection data, and then filters the weighted data to backproject it and create the final three-dimensional reconstruction. This is implemented on a CPU using either MATLAB or C, and on a heterogeneous system combining CPU and GPU. The relative performance of backprojection is tested and evaluated on a mathematical phantom as well as on mouse data. Speedups of over thirty times using the GPU are seen for phantom data, and close to fifty times for the larger mouse datasets.
 
Keywords:
Medical Imaging, GTC 2013 - ID P3121
Download:
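The weight-filter-backproject pipeline in the abstract can be sketched in miniature. The following Python/NumPy code is a simplified 2D parallel-beam backprojector, not the cone-beam geometry of the poster, and with the weighting and filtering steps omitted; the function name and sizes are illustrative only.

```python
import numpy as np

def backproject(sinogram, angles, size):
    """Unfiltered backprojection of a parallel-beam sinogram.

    Each detector row is smeared back across the image along its
    projection angle and the contributions are summed. (A full
    cone-beam method would first weight and ramp-filter each
    projection before this step.)
    """
    img = np.zeros((size, size))
    c = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    xs, ys = xs - c, ys - c
    for proj, theta in zip(sinogram, angles):
        # detector coordinate of each pixel for this view
        t = xs * np.cos(theta) + ys * np.sin(theta) + c
        idx = np.clip(np.round(t).astype(int), 0, size - 1)
        img += proj[idx]
    return img / len(angles)
```

Because every pixel in every view is independent, the double loop over pixels hidden in the vectorized lines above maps directly onto one GPU thread per pixel.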
 
Real-Time Functional Brain Imaging: How GPU Acceleration Redefines Each Stage
Adam Gazzaley (UCSF), Tim Mullen (Swartz Center for Computational Neuroscience, Institute for Neural Computation, UC San Diego), Christian Kothe (UCSD), Oleg Konings (Gazzaley Lab at UCSF)
Learn how massively parallel CPU-GPU architectures and distributed optimization algorithms are advancing the state of the art in real-time non-invasive electroencephalography (EEG) and brain-computer interfaces (BCI), offering new perspectives in how ...Read More
Learn how massively parallel CPU-GPU architectures and distributed optimization algorithms are advancing the state of the art in real-time non-invasive electroencephalography (EEG) and brain-computer interfaces (BCI), offering new perspectives in how we study and interface with the human brain. Specifically, we will discuss recent efforts to accelerate key computationally intensive inference problems. These include accurate neuronal source reconstruction, large-scale dynamical system identification, graph-theoretic connectivity analysis, and statistical machine learning for improved neuronal and cognitive state inference. We will examine distributed implementations of Alternating Direction Method of Multipliers (ADMM) convex optimization, using cuBLAS and custom CUDA kernels. Among these, a CUDA implementation of sum-of-norms regularization (group lasso) will be discussed and compared with a serial C++ implementation and an optimized multi-core CPU MATLAB implementation.  Back
 
Keywords:
Medical Imaging, Developer - Performance Optimization, Real-Time Graphics, GTC 2014 - ID S4633
Streaming:
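The sum-of-norms (group lasso) problem mentioned in the session has a compact serial reference form. This Python/NumPy sketch shows a plain ADMM loop for it, assuming a dense design matrix; it is a CPU reference for the idea, not the talk's CUDA/cuBLAS implementation, and the parameter choices are ours.

```python
import numpy as np

def group_lasso_admm(A, b, groups, lam, rho=1.0, n_iter=200):
    """ADMM for the group lasso (sum-of-norms regularization):

        minimize 0.5*||A x - b||^2 + lam * sum_g ||x_g||_2

    The dense solves and products below are the operations a GPU
    version would hand to cuBLAS and custom kernels.
    """
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    # (A^T A + rho I) is fixed across iterations
    AtA = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA, Atb + rho * (z - u))
        v = x + u
        # block soft-thresholding: proximal operator of the group norm
        for g in groups:
            v_g = v[g]
            norm = np.linalg.norm(v_g)
            shrink = max(0.0, 1.0 - lam / (rho * norm)) if norm > 0 else 0.0
            z[g] = shrink * v_g
        u += x - z
    return z
```

The z-update zeroes out whole groups at once, which is what makes the regularizer select or discard entire blocks of variables.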
 
CUDA-Accelerated MATLAB without Parallel Computing Toolbox for 3D Medical Image Segmentation
Jung W. Suh (KLA-Tencor)
Learn how to accelerate your MATLAB code using CUDA without the Parallel Computing Toolbox. Although the Parallel Computing Toolbox is useful for speedups, it may not be accessible to every MATLAB user and may have limitations in fully expl ...Read More
Learn how to accelerate your MATLAB code using CUDA without the Parallel Computing Toolbox. Although the Parallel Computing Toolbox is useful for speedups, it may not be accessible to every MATLAB user and may have limitations in fully exploiting the power of both MATLAB and CUDA. For general acceleration of MATLAB applications, GPU utilization through c-mex provides more flexibility and power in many situations. This session will walk through the MATLAB implementation of atlas-based 3D hippocampus segmentation of MRI images as an example. Atlas-based segmentation is widely used in neuroimage analysis because it gives reliable segmentation results even for challenging target objects with ambiguous and complicated boundaries. However, it requires high computational power because 3D image registration is used during the segmentation process. This session will show each step of CUDA optimization of our atlas-based segmentation MATLAB code, from profiling to CUDA conversion through c-mex.  Back
 
Keywords:
Medical Imaging, Computer Vision, Video & Image Processing, GTC 2014 - ID S4342
Streaming:
Download:
 
Accelerated FFT-JVIE Solvers for Electromagnetic Analysis in MRI
Antonio S. Montemayor (Universidad Rey Juan Carlos)
We present a progressively accelerated framework for solving the FFT-based Volume Integral Equation for the electric current (JVIE). This method is used in magnetic resonance imaging to compute the volumetric current distribution produced in the head prod ...Read More
We present a progressively accelerated framework for solving the FFT-based Volume Integral Equation for the electric current (JVIE). This method is used in magnetic resonance imaging to compute the volumetric current distribution produced in the head by an external loop of uniform current. The problem is computationally intensive because of the 3D data involved, but it is also highly parallelizable. We begin with a MATLAB implementation, then add calls to the GPU versions of the MATLAB functions, then customize the GPU functions with CUDA kernels, and finally create a standalone application in CUDA C.  Back
 
Keywords:
Medical Imaging, GTC 2014 - ID P4193
Download:
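The reason an FFT accelerates the JVIE matrix-vector product can be shown in 1D: a circulant (discrete-convolution) operator applied via the FFT matches the explicit dense product at a fraction of the cost. This Python/NumPy sketch is a toy analogue of the 3D block-Toeplitz case; the function names are ours.

```python
import numpy as np

def circulant_matvec_fft(c, x):
    """Apply an N x N circulant matrix (first column c) to x via FFT.

    The FFT-JVIE solver exploits the same idea in 3D: the Green's
    function matvec inside the iterative solve is a (block-)Toeplitz
    convolution, so it costs O(N log N) instead of O(N^2).
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_matvec_dense(c, x):
    """Reference O(N^2) version: build the circulant matrix explicitly."""
    n = len(c)
    C = np.array([np.roll(c, i) for i in range(n)]).T  # column j = roll(c, j)
    return C @ x
```

Embedding a Toeplitz operator into a larger circulant one (zero-padding) extends this trick to the non-periodic convolutions that actually arise in the integral equation.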
Robotics & Autonomous Machines
Presentation
Media
Kernel Support Vector Machines on GPUs
Stephen Tyree (Washington University in St. Louis)
We revisit optimization of Kernel Support Vector Machines on modern parallel hardware. Most parallel SVM solvers implement dual decomposition methods which, while fast on single CPUs, are not naturally suited to GPUs. We focus on a primal optimiza ...Read More
We revisit optimization of Kernel Support Vector Machines on modern parallel hardware. Most parallel SVM solvers implement dual decomposition methods which, while fast on single CPUs, are not naturally suited to GPUs. We focus on a primal optimization in which the most computationally intensive operations can be written as large, dense linear algebra operations, computable with existing highly optimized packages. We prototype a sparse primal optimization solver in MATLAB/Jacket. This method outperforms LibSVM, an optimized and popular single-CPU dual decomposition solver, by orders of magnitude, and is up to 10x faster than GTSVM, the fastest existing kernelized GPU SVM solver.  Back
 
Keywords:
Robotics & Autonomous Machines, Databases, Data Mining, Business Intelligence, GTC 2013 - ID P3266
Download:
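A primal kernel SVM of the kind the poster describes reduces to dense linear algebra on the kernel matrix. The following Python/NumPy sketch uses the squared hinge loss with plain gradient descent, which is one common primal formulation; the poster's exact objective and solver may differ, and all hyperparameters here are illustrative.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two point sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_primal_ksvm(K, y, lam=1e-2, lr=0.05, n_iter=500):
    """Primal kernel SVM with the squared hinge loss:

        minimize lam * a^T K a + sum_i max(0, 1 - y_i (K a)_i)^2

    Plain gradient descent; the bottleneck is the dense K @ v
    product, exactly the kind of large dense linear algebra that
    maps well onto the GPU.
    """
    a = np.zeros(len(y))
    for _ in range(n_iter):
        f = K @ a
        h = np.maximum(0.0, 1.0 - y * f)          # hinge slack per sample
        grad = 2.0 * lam * f - 2.0 * K @ (y * h)  # gradient w.r.t. a
        a -= lr * grad
    return a
```

Prediction on the training set is just sign(K @ a); for new points, the kernel matrix between test and training points takes the place of K.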
 
Acceleration of a Pseudo-Bacterial Potential Field Algorithm for Path Planning
Ulises Orozco-Rosas (Instituto Politecnico Nacional)
Path planning of a mobile robot -- determining an optimal path from a universe of possible solutions -- is one of the most computationally intensive tasks and a challenge in dynamically changing environments. Using GPUs, it is possible to process dat ...Read More
Path planning of a mobile robot -- determining an optimal path from a universe of possible solutions -- is one of the most computationally intensive tasks and a challenge in dynamically changing environments. Using GPUs, it is possible to process data-intensive tasks efficiently. This work presents the acceleration of a Pseudo-Bacterial Potential Field (PBPF) algorithm for path planning. The MATLAB-CUDA implementation of the PBPF algorithm shows how to find an optimal collision-free path for a mobile robot and how to speed up the path-planning computation through the use of GPUs. The simulation results demonstrate the efficiency of the PBPF implementation in solving the path planning problem in both offline and online modes.  Back
 
Keywords:
Robotics & Autonomous Machines, Self-Driving Cars & Automotive, IoT, GTC 2016 - ID P6288
Download:
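The potential-field half of the PBPF algorithm can be sketched as gradient descent on an attractive-plus-repulsive field. This Python/NumPy toy fixes the field gains by hand, whereas the pseudo-bacterial (evolutionary) part of the poster's method tunes such parameters; all constants and names here are illustrative.

```python
import numpy as np

def potential_field_step(pos, goal, obstacles,
                         k_att=1.0, k_rep=0.5, d0=1.0, step=0.05):
    """One gradient-descent step on an artificial potential field.

    Attractive term: 0.5 * k_att * ||pos - goal||^2.
    Repulsive term: active only within distance d0 of each obstacle,
    with the standard 0.5 * k_rep * (1/d - 1/d0)^2 form.
    """
    force = -k_att * (pos - goal)                 # attractive force
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0 < d < d0:                            # repulsion only nearby
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return pos + step * force

def plan_path(start, goal, obstacles, max_steps=2000, tol=0.05):
    """Follow the negative field gradient from start until near goal."""
    pos = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    path = [pos.copy()]
    for _ in range(max_steps):
        pos = potential_field_step(pos, goal, obstacles)
        path.append(pos.copy())
        if np.linalg.norm(pos - goal) < tol:
            break
    return np.array(path)
```

Evaluating many candidate gain sets over many field evaluations is the data-parallel workload the poster offloads to the GPU.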
Signal & Audio Processing
Presentation
Media
UnLocBox - How To Develop a GPU Library for Advanced Algorithms, Accessible from C and Matlab
Eri Rubin (SagivTech LTD)
This talk will describe the UNLocBox, a GPU based library containing advanced approaches in the solution of inverse problems. This work is a joint effort of SagivTech Ltd, Ecole Polytechnique Federale de Lausanne and the University of Bremen wit ...Read More

This talk will describe the UNLocBox, a GPU-based library containing advanced approaches to the solution of inverse problems. This work is a joint effort of SagivTech Ltd, Ecole Polytechnique Federale de Lausanne, and the University of Bremen within a Future and Emerging Technologies project sponsored by the European Union. The project, UNLocX, aims at developing a framework for constructing problem-adapted, ultra-efficient algorithms for coding, decoding, analyzing, and synthesizing signals and images. Although these newly developed algorithms produce promising results, their computational complexity prevents their application to real-life problems. In this project, SagivTech's main task was to provide a GPU library for these algorithms that can easily be called by MATLAB and C developers who have no knowledge of GPU computing. The GPU UNLocBox library allows researchers to tackle complex applications in medical image processing that could not be solved otherwise. This research has been (partially) supported by EU FET Open grant UNLocX (255931).

  Back
 
Keywords:
Signal & Audio Processing, Developer - Algorithms, Developer - Tools & Libraries, Medical Imaging, GTC 2013 - ID S3087
Streaming:
Download:
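Inverse problems of the kind the UNLocBox targets are commonly solved by proximal splitting. As a minimal example of that idea, this Python/NumPy sketch runs ISTA (proximal gradient) on an l1-regularized least-squares problem; it stands in for the library's algorithms rather than reproducing any specific one, and the problem sizes are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    """Proximal-gradient (ISTA) for min 0.5*||Ax - b||^2 + lam*||x||_1.

    Each iteration alternates a gradient step on the smooth data
    term with the proximal step on the nonsmooth regularizer; the
    A @ x and A.T @ r products dominate the cost, which is what a
    GPU back end accelerates.
    """
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

Swapping the regularizer only changes the proximal operator, which is why a library built around this splitting can cover many inverse problems with one solver skeleton.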
 
 
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2016 NVIDIA Corporation Legal Info | Privacy Policy