Learn how to use the hidden computation capability of GPU texture units for general-purpose computation. We describe GRASSY, a system for stellar spectral synthesis where the core problem is interpolation between pre-computed intensity values. We map these pre-computed tables to the GPU's texture memory. Interpolation then becomes a texture lookup where the hardware automatically performs the interpolation, albeit at very low precision. Our mathematical framework reasons about the impact of this precision, and our performance results show 500X speedups. This work generalizes the GPU texture units as computation engines and opens up new problems for GPU acceleration.
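To make the precision trade-off concrete, here is a minimal CPU-side sketch in Python of linear interpolation with a quantized blend weight. It is not GRASSY's code. The function names and the sample table are hypothetical, and the 8 fractional bits are an assumption taken from CUDA's documented linear filtering behavior; the abstract itself does not state the precision.

```python
import math


def lerp_exact(table, x):
    """Linear interpolation in a 1D table at fractional index x (full precision)."""
    i = max(0, min(int(math.floor(x)), len(table) - 2))
    frac = x - i
    return (1.0 - frac) * table[i] + frac * table[i + 1]


def lerp_quantized(table, x, frac_bits=8):
    """Same lookup, but the blend weight is rounded to frac_bits fractional bits.

    This mimics a low-precision hardware filter (assumed: 8 fractional bits).
    """
    i = max(0, min(int(math.floor(x)), len(table) - 2))
    scale = 1 << frac_bits
    frac = round((x - i) * scale) / scale
    return (1.0 - frac) * table[i] + frac * table[i + 1]


if __name__ == "__main__":
    intensities = [0.0, 1.0, 4.0, 9.0]  # hypothetical pre-computed table
    x = 1.3
    exact = lerp_exact(intensities, x)
    approx = lerp_quantized(intensities, x)
    print(exact, approx, abs(exact - approx))
```

Under these assumptions the weight is off by at most 2^-(frac_bits+1), so the interpolated value is off by at most that factor times the difference between the two neighboring table entries. Finer table spacing therefore reduces the effect of the quantization.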
Have you heard about the largest ground-based telescope ever built? Are you interested in the newest NVIDIA DGX-1 hardware accelerator? Come and learn how the DGX-1 architecture helps the computational astronomy community leap forward in designing major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, we'll explain how the resulting matrix computations associated with an efficient task-based programming model help design the next generation of telescope instruments.
This talk will demonstrate an implementation of the Total Lagrangian Explicit Dynamic (TLED) finite element formulation in CUDA. As this method was originally developed for very soft tissues, it fully accommodates geometric, material, and loading nonlinearities and achieves significant speedups over industry-proven solutions while retaining accuracy. Intricacies of parallel TLED implementation and comparisons to conventional FE codes are provided, as well as a short introduction to the background of this numerical tool. Learn the details and benefits of mathematically reformulating an FE problem towards parallelization and subsequently implementing and optimizing the algorithm in CUDA.
RISE (Risky Intervention and Surveillance Environment) is a very demanding task. This presentation covers three areas of research: 3D data registration, robot navigation, and 3D point cloud processing. We show an approach based on robust k-nearest-neighbor (KNN) search that improves the ICP algorithm, a parallel path planning approach based on the wave propagation method, and online segmentation of 3D point clouds based on normal vector computation. The proposed algorithms were tested on an NVIDIA CUDA GF 580 GPGPU, with satisfactory results.
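For readers unfamiliar with wave propagation path planning, the following is a minimal sequential sketch in Python, assuming a 2D occupancy grid with 4-connected moves. It is not the presenters' parallel CUDA implementation, and the grid layout and function names are hypothetical. A wave expands from the goal and labels each free cell with its step distance; a path is then read off by walking downhill from the start.

```python
from collections import deque

# 4-connected neighborhood (assumption: no diagonal moves)
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def wavefront_distances(grid, goal):
    """Label every reachable free cell (0) with its step distance to goal.

    Cells marked 1 are obstacles; unreachable cells keep the value None.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                frontier.append((nr, nc))
    return dist


def extract_path(dist, start):
    """Follow strictly decreasing distance labels from start to the goal."""
    r, c = start
    if dist[r][c] is None:
        return None  # start cannot reach the goal
    path = [(r, c)]
    while dist[r][c] != 0:
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(dist) and 0 <= nc < len(dist[0])
                    and dist[nr][nc] == dist[r][c] - 1):
                r, c = nr, nc
                break
        path.append((r, c))
    return path


if __name__ == "__main__":
    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    dist = wavefront_distances(grid, goal=(0, 0))
    print(extract_path(dist, start=(2, 0)))
```

All cells on one wavefront depend only on the previous wavefront, so each expansion step can be processed concurrently, which is what makes this family of methods a natural fit for a GPU.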
Zaxtar is the highest-performance video server, able to feed up to 4GB/sec of video data to 4K projectors or 4K displays. This performance is accomplished by utilizing the CPU and GPU and by synchronizing multiple computers. The most important feature is mathematically lossless compression, which can compress video or graphics data at a ratio of 3 to 1 without losing any information. Mathematically lossless compression has been achieved on the CPU up until now, but we have ported the algorithm to the Quadro FX 5800.
A fast way of developing, prototyping, and deploying numerical algorithms that can take advantage of CUDA-capable systems is available in Mathematica 8. Over the past year, educators, scientists, and business users have taken advantage of the benefits of GPU programming support in Mathematica. By integrating and implementing CUDA/OpenCL in their programs, users take a hybrid approach, combining the speed-up that GPUs offer with a powerful numerical development system. This presentation covers several examples of numerical applications, ranging from deconvolution of MRI images and linear solvers for FEM to systems of ODEs and line integral convolution visualization.
Mathematical methods based on the use of the Laplace transform are a standard component of undergraduate education. Real-world problems, however, often yield Laplace-space solutions that are too complex to be analytically inverted to expressions in physically meaningful variables. A robust numerical inversion approach is thus desirable. In this talk, I present one of the approaches to computing an approximate inverse, the Weeks method. I will also discuss the difficulties in performing numerical inversion. Finally, I will show how we have been able to utilize Jacket from AccelerEyes in MATLAB to implement the Weeks method more efficiently and robustly.
We present an investigation into the emergent mathematical behavior of computational operations which are performed efficiently on massively multi-core architectures. This novel perspective for computationally solving mathematical equations is designed from the ground up for efficient implementation on massively multi-core architectures. In this session, we present one of our algorithms, which randomly generates a large number of sparse domain discretizations of a partial differential equation. The statistical moments of the ultra-sparse-grid solutions suggest optimal locations for gridpoints. We will apply this algorithm to Poisson and Hamilton-Jacobi steady-state equations and provide preliminary analytical results.
Learn about the fast multipole method (FMM) and its optimization on NVIDIA GPUs. The FMM is a well-known algorithm with a variety of applications in areas like galaxy simulation, electrostatic potential calculations, boundary element methods, integral equations, dislocation dynamics, etc. The FMM poses several difficulties when running on parallel heterogeneous platforms such as multicore processors with GPUs. Some parts of the calculation suffer from limited concurrency, and load balancing can be very uneven for certain distributions of particles. We will present a new API and runtime system, called StarPU, that allows expressing a calculation as a graph of tasks with dependencies and that can optimally schedule those tasks on a parallel machine. StarPU supports conventional multicore processors as well as NVIDIA GPUs. Authors: Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Matthias Messner (INRIA Bordeaux - Sud-Ouest / LaBRI, Talence, France); Eric Darve (Stanford Institute for Computational and Mathematical Engineering); Toru Takahashi (Department of Mechanical Science and Engineering, Nagoya University, Nagoya, Japan).
Learn how to perform efficient dense rectangular matrix transposition in place on the GPU. Transposition (full and tiled) has been an important building block in many GPU-accelerated algorithms for better memory access patterns. However, out-of-place transposition, while simple in nature, may be prohibitive due to its large spatial overhead for applications with large datasets. This talk presents techniques for in-place matrix transposition, and the audience will also learn how to use our in-place transposition library, with examples given in CUDA, MATLAB, and Mathematica CUDA bindings.
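As background, in-place transposition of a rectangular matrix is a permutation of the flat storage array, and the classical sequential way to apply it is cycle following. The Python sketch below illustrates that textbook idea only. It is not the talk's GPU technique or library, the function name is hypothetical, and for simplicity it keeps one visited flag per element, which a production implementation would try to avoid.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a row-major rows x cols matrix stored in the flat list a.

    Afterwards a holds the cols x rows transpose, also in row-major order.
    The element at flat index k moves to (k * rows) mod (rows * cols - 1);
    the first and last elements never move.
    """
    n = rows * cols
    if len(a) != n:
        raise ValueError("length of a must equal rows * cols")
    last = n - 1
    visited = bytearray(n)  # simplification: one flag per element
    for start in range(1, last):
        if visited[start]:
            continue
        k = start
        carried = a[start]
        while True:
            dest = (k * rows) % last
            # Drop the carried value at its destination, pick up what was there.
            a[dest], carried = carried, a[dest]
            visited[dest] = 1
            k = dest
            if k == start:
                break


if __name__ == "__main__":
    m = [1, 2, 3,
         4, 5, 6]          # 2 x 3, row-major
    transpose_in_place(m, rows=2, cols=3)
    print(m)               # now 3 x 2, row-major
```

The cycles have irregular lengths, which hints at why mapping this permutation onto thousands of GPU threads takes more care than the out-of-place version.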
Mathematica comes with many extremely optimized numerical libraries integrated into the application, but they don't yet take advantage of the GPU. Thankfully, Mathematica provides an easy-to-use API, called MathLink, for communicating with a large variety of external resources. This tutorial will provide a hands-on introduction to using CUDA within Mathematica, an introduction to the CUDA Mathematica plugin, and an overview of the issues one has to keep in mind when writing MathLink applications using the CUDA Toolkit. Finally, we will showcase a few real-world examples.
Since version 8, Mathematica has offered advanced support for GPU acceleration with optimized CUDA functions and a built-in framework for developing scientific CUDA kernel code. In this session, the Wolfram development team will share their experience developing their next-generation CUDA support in Mathematica. From the unique ability of Parallel Nsight to attach its CUDA debugger to a running process, to the new parallel Warp Watch for warp-wide variable views and expression evaluation, to the latest runtime CUDA profiling experiments, they will demonstrate how they were able to take advantage of Parallel Nsight to get the most out of CUDA and the GPU.
Visuvi develops targeted visual search engine solutions for a wide range of vertical applications in medicine, ecommerce, and general-purpose visual search, and maintains an index of images on the Internet. Visuvi has patent-pending technology for its search engine that examines the content and patterns within an image, categorizes that information via mathematical indexing, and delivers search results based on the image itself - no text or meta-tags required. Visuvi Inc. is a privately held company based in Redwood City, CA and is managed by a seasoned executive team consisting of Christopher Boone, President and CEO, Alexander Valenica, Chief Scientist and Co-Founder, Florian Brody, VP Marketing and Yuri Drozd, VP Product Management.
With the introduction of GPU support in version 8, Mathematica has become an excellent environment for integrating CUDA with high-level code for interpretation or visualization. In this presentation, we will show the usefulness of Mathematica in the field of computational finance. In addition to demonstrating the GPU-accelerated financial computations which can be readily performed within Mathematica, we will show that these calculations can easily be integrated with third-party data sources including Microsoft Excel and databases. Furthermore, we will cover the UnRisk Mathematica package written by MathConsult, which seamlessly adds GPU-accelerated complex model calibration algorithms to Mathematica's repertoire.
Get the latest developments in Next Generation Knowledge Based Engineering (KBE) software, which provides real results over the traditional design approach. Today there exist numerous KBE applications in the fields of vehicle ergonomics, suspension, NVH, safety, regulations, etc., which deal with a huge number of iterations and mathematical algorithms. With GPU computing and CUDA, the KBE kernel is restructured to incorporate a parallel programming model, which helps the applications run faster and reduces run times from hours to seconds. The KBE geometry kernel also benefits from enabling CUDA in topology-based operations, which take a lot of time when performed on the CPU.
This talk describes the recent simulation of ~18,000 proteins in suspension, reproducing the crowding conditions of the cell interior. The simulations were obtained with MUPHY, a computational platform for multi-scale simulations of real-life biofluidic problems. The same software has been used in the past to simulate blood flows through the human coronary arteries and DNA translocation across nanopores. The simulations were performed on the Titan system at the Oak Ridge National Laboratory and exhibit excellent scalability up to 18,000 K20X NVIDIA GPUs, reaching 20 Petaflops of aggregate sustained performance with a peak performance of 27.5 Petaflops for the most intensive computing component. In this talk I will describe how the combination of novel mathematical models, computational algorithms, hardware technology, and parallelization techniques allowed reproducing such a massive number of proteins for the first time.
ACM Gordon Bell Finalist
Learn how GPUs help design major, multimillion-dollar optical instruments for the European Extremely Large Telescope. From the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, this talk will explain how the resulting dense linear algebra operations associated with an efficient task-based programming model help design the next generation of telescope instruments.
The analysis of structure in three-dimensional images is increasingly important for biomedical research and computational science. In this poster, we outline ongoing work developing Diderot, a parallel domain-specific language for three-dimensional image visualization and analysis algorithms, such as volume rendering, fiber tractography, and particle systems. Diderot supports a high-level mathematical computation model coupled with a batch-synchronous parallelism model. The poster further describes Diderot's GPU implementation and its high performance measurements on GPUs versus other sequential and parallel platforms.