GTC On-Demand

Astronomy & Astrophysics
GRASSY: Leveraging GPU Texture Units for Asteroseismic Data Analysis
Matt Sinclair
Learn how to use the hidden computation capability of GPU texture units for general-purpose computation. We describe GRASSY, a system for stellar spectral synthesis where the core problem is interpolation between pre-computed intensity values. We map these pre-computed tables to the GPU's texture memory. Interpolation then becomes a texture lookup where the hardware automatically performs the interpolation, albeit at very low precision. Our mathematical framework reasons about the impact of this precision and our performance results show 500X speedups. This work generalizes the GPU texture units as computation engines and opens up new problems for GPU acceleration.
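The precision trade-off described above can be explored on the CPU with a small sketch. This is not the GRASSY implementation: the function `lerp_table`, the sine-valued table, and the choice of 8 fractional bits for the interpolation weight are all assumptions made for illustration. The sketch compares full-precision linear interpolation against the same interpolation with a quantized weight, which is roughly what a hardware texture lookup does.

```python
import math

def lerp_table(table, x, frac_bits=None):
    """Linearly interpolate `table` at fractional index `x`.

    When `frac_bits` is given, the interpolation weight is rounded to that
    many fractional bits to mimic a fixed-point hardware weight (assumption).
    """
    i = max(0, min(int(math.floor(x)), len(table) - 2))  # left sample, clamped
    w = x - i                                            # weight of right sample
    if frac_bits is not None:
        scale = 1 << frac_bits
        w = round(w * scale) / scale                     # quantize the weight
    return (1.0 - w) * table[i] + w * table[i + 1]

# Hypothetical pre-computed table: samples of a smooth function.
table = [math.sin(0.1 * k) for k in range(64)]
xs = [0.37 * k for k in range(1, 160)]
max_err = max(
    abs(lerp_table(table, x) - lerp_table(table, x, frac_bits=8)) for x in xs
)
print(f"largest difference caused by weight quantization: {max_err:.2e}")
```

The difference is bounded by the weight rounding error times the gap between neighbouring table entries, so a finer table reduces the impact of the low-precision weight.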

Keywords:
Astronomy & Astrophysics, High Performance Computing, GTC 2010 - ID S10044
Big Data Analytics
Fast Vertical Data Classification Using GPUs
Arjun G. Roy (NDSU)
Massive amounts of data are being generated in recent times. Current classification methods are quite accurate but extremely slow on big data. We propose a two-pronged approach: (a) treat data vertically instead of the conventional horizontal treatment and use our vertical-data-specific classification algorithm, and (b) exploit the GPU's fast mathematical computation to process vertical data quickly, which benefits significantly from our data structure called the P-Tree. Our classification algorithm is O(k), where k is the number of attributes, and achieves high accuracy.
 
Keywords:
Big Data Analytics, GTC 2014 - ID P4263
 
Accelerating Topological Data Analysis Using GPUs
Ryan Hsu (Ayasdi)
Topology provides a mathematical framework for applying a complete range of statistical, geometric, and machine learning methods, revealing insights from the geometry of your data. Ayasdi utilizes topological data analysis (TDA) in its advanced analytics software to simplify the analysis of complex, multi-variate datasets. In this poster, we illustrate how GPGPUs can be leveraged to accelerate key operations in TDA by over 14X.
 
Keywords:
Big Data Analytics, Machine Learning & Deep Learning, GTC 2015 - ID P5239
Computational Fluid Dynamics
A Practical Introduction to Computational Fluid Dynamics on GPUs
Con Caris, John Taylor, Tomasz Bednarz (CSIRO)
Learn step-by-step procedures to write an explicit CFD solver based on finite difference methods with staggered grid allocations and boundary fitted coordinates. We will discuss the derivation of the mathematical model, discretization of the model equations, development of the algorithms, and parallelization and visualization of the computed data using OpenCL and OpenGL. We compare case studies of natural convection, driven cavity, scaling analysis, and magneto-thermal convection computed using CSIRO's CPU/GPU supercomputer cluster to known analytical and experimental solutions.
 
Keywords:
Computational Fluid Dynamics, Developer - Algorithms, High Performance Computing, Physics Simulation, GTC 2010 - ID 2058
Computational Physics
Optimization of a CUDA-based Monte Carlo Code for Radiation Therapy
Nick Henderson (Stanford University, Institute for Computational and Mathematical Engineering)
Learn about optimization efforts in G4CU, a CUDA Monte Carlo code for radiation therapy. G4CU is based on the core algorithm and physics processes in Geant4, a toolkit for simulating particles traveling through and interacting with matter. The techniques covered will include the use of texture references for look-up tables, device configuration for different simulation components, and scheduling of work for different particle types.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, Medical Imaging, GTC 2014 - ID S4259
 
Hierarchical Algorithms on Heterogeneous Architectures: Adaptive Multigrid Solvers for LQCD on GPUs
M Clark (NVIDIA)
Graphics Processing Units (GPUs) are an increasingly popular platform upon which to deploy lattice quantum chromodynamics calculations. While there has been much progress to date in developing solver algorithms to improve strong scaling on such platforms, there has been less focus on deploying 'mathematically optimal' algorithms. Good examples are hierarchical solver algorithms such as adaptive multigrid, which are known to solve the Dirac operator with optimal O(N) complexity. We describe progress to date in deploying adaptive multigrid solver algorithms to NVIDIA GPU architectures and discuss in general the suitability of heterogeneous architectures for hierarchical algorithms.
 
Keywords:
Computational Physics, Numerical Algorithms & Libraries, GTC 2014 - ID S4327
 
GPU-Based "ab-initio" Properties Investigations for Binary Compounds
Sergey Seriy (Komsomolsk-on-Amur State Technical University)
GPU calculations are a modern, effective, and fast way to perform "ab-initio" mathematical modeling of nanostructures and, in particular, of their properties. This article presents some results of ab-initio research on the hardness properties of several stable binary compounds: combinations of Al, Si, Mg, Cu, and Fe in "fcc", "nacl", "cu2mg", "mgcu2", "zns", "caf2", "cscl", "alfe3", and other structures (B1, B2, B3, C1, C15, A15, D03, L12, ...). We find the equilibrium volumes, elastic moduli, and total energy per atom for the most stable compounds and compare these ab-initio simulations with experimental data. We also estimate the performance gain of GPU calculations with CUDA technology. All calculations were performed with the GPAW and Abinit software, based on DFT, and their GPU versions.
 
Keywords:
Computational Physics, GTC 2014 - ID P4121
 
GPU-Accelerated Solver for the 3D Groundwater Flow Equation
Bob Zigon (Beckman Coulter)
Learn how to build a 3D solver for the groundwater flow equation that is accelerated by the GPU. The underlying mathematical model treats the entire subsurface, both saturated and unsaturated, as a whole. The governing nonlinear, time dependent, parabolic partial differential equation is discretized into 19 million nodes. The resulting K20-based GPU solver is 20 times faster than the original single CPU Fortran code.
 
Keywords:
Computational Physics, GTC 2015 - ID S5503
Computational Structural Mechanics
Nonlinear Real-time Finite Element Analysis Using CUDA
Vukasin Strbac (KULeuven University, Belgium)
This talk will demonstrate an implementation of the Total Lagrangian Explicit Dynamic (TLED) finite element formulation in CUDA. As this method was originally developed for very soft tissues, it fully accommodates geometric, material, and loading nonlinearities and achieves significant speedups over industry-proven solutions while retaining accuracy. Intricacies of parallel TLED implementation and comparisons to conventional FE codes are provided, as well as a short introduction to the background of this numerical tool. Learn the details and benefits of mathematically reformulating an FE problem towards parallelization and subsequently implementing/optimizing the algorithm in CUDA.

Keywords:
Computational Structural Mechanics, Developer - Algorithms, Manufacturing Technical, GTC 2013 - ID S3093
Computer Aided Design
No More NURBS: Try PSPS
Qingde Li (University of Hull)
The goal of this session is to show how to create geometric shapes on GPUs, by taking advantage of the GPU's tessellation feature, using the state-of-the-art spline technique called PSP splines (PSPS). PSPS are simpler than B-splines in their mathematical form, but are much more powerful than NURBS in geometric design. Compared with Bezier, B-spline, and NURBS techniques, designing a geometric shape using PSPS is much more efficient, flexible, and intuitive. In this session we will describe what PSPS are and demonstrate how to directly implement PSPS in GLSL or HLSL in the tessellation stages to create new geometries.
 
Keywords:
Computer Aided Design, Digital Product Design & Styling, Game Development, GTC 2014 - ID S4240
Computer Vision
Terrestrial 3D Mapping with Parallel Computing Approach
Janusz Bedkowski (Institute of Mathematical Machines)
This work concerns the parallel implementation of a 3D mapping algorithm. Several nearest-neighbor search strategies are compared. The accuracy of the final 3D map is evaluated with geodetic precision. This work can be used in several applications such as mobile robotics and spatial design. Attendees will learn how to choose a proper nearest-neighbor search strategy for 3D data registration, how to build accurate 3D maps, how to evaluate a 3D mapping system with geodetic precision, and how parallel programming influences performance and accuracy.
 
Keywords:
Computer Vision, Big Data Analytics, GTC 2014 - ID S4353
Computer Vision & Machine Vision
Parallel Computing In Mobile Robotics for RISE
Janusz Bedkowski (Institute of Mathematical Machines, Warsaw, Poland)
RISE (Risky Intervention and Surveillance Environment) is a very demanding task. This presentation covers three areas of research: 3D data registration, robot navigation, and 3D point cloud processing. An approach based on robust k-nearest-neighbor (KNN) search, applied to improve the ICP algorithm, is shown. A parallel path-planning approach based on the wave propagation method is shown. Online segmentation of 3D point clouds based on normal vector computation is given. The proposed algorithms were tested on GPGPU NVIDIA CUDA GF 580, with satisfying results.

Keywords:
Computer Vision & Machine Vision, GTC 2012 - ID S2081
Defense
Using NVIDIA GPUs for Real-time Data Processing in a Holographic Radar System
Peter Wurmsdobler (Aveillant)
**Please note that no recording of this webinar is available. Please contact Peter Wurmsdobler directly at peter dot wurmsdobler at aveillant dot com for more information on the subject matter.** In this webinar, Peter Wurmsdobler, Lead Software Architect, Aveillant, will give a short introduction to Aveillant's Holographic Radar systems, the principles of Holographic radars, as opposed to scanning radar systems, as well as its computational requirements. Peter will go on to explore the technical challenges faced in the implementation of the mathematical algorithms needed, how they were solved, and why NVIDIA GPUs proved to be a good fit to meet the computational needs. Finally, Peter will present performance charts that reveal the amount of processing needed in real-time for a real radar system.

Keywords:
Defense, Signal & Audio Processing, GTC Webinars 2014 - ID GTCE089
Developer - Algorithms
GPU Accelerated Solvers for ODEs Describing Cardiac Membrane Equations
Fred Lionetti
Mathematical models describing cellular membranes form the basis of whole tissue models to describe the electrical activity of entire organs, such as the heart. Numerical simulations based on these models are useful for both basic science and increasingly for clinical diagnostic and therapeutic applications such as targeting ablation therapy for atrial arrhythmias, defibrillator design and cardiac resynchronization therapy. A common bottleneck in such simulations arises from solving large stiff systems of ordinary differential equations (ODEs) thousands of times for numerous integration points (representing cells) throughout a three-dimensional tissue or organ model. For some electrophysiology simulations, over 80% of the time is spent solving these systems of ODEs. While a cluster provides the required interactive response time to solve the ODEs, a desktop-sized platform would enhance usability of the software in a laboratory setting. The audience will benefit by learning how a real-world, complex, HPC application can directly benefit from the use of CUDA technology. Participants will learn which optimization techniques yielded the best performance results on an actual application. We will also explore the benefits and limits of the use of single precision in certain scientific applications.
 
Keywords:
Developer - Algorithms, Life & Material Science, Medical Imaging, Visualization, GTC 2009 - ID S09036
 
Exploiting the GPU in Ultra High-End 4K Video Servers
Mark Marrin
Zaxtar is the highest-performance video server, feeding up to 4GB/sec of video data to 4K projectors or 4K displays. This performance is accomplished by utilizing the CPU, the GPU, and synchronization of multiple computers. The most important feature is mathematically lossless compression, which can compress video or graphics data at a ratio of 3 to 1 without losing any information. Mathematically lossless compression has been achieved on the CPU up until now, but we have ported the algorithm to the Quadro FX 5800.

Keywords:
Developer - Algorithms, Video & Image Processing, Visualization, GTC 2009 - ID S09108
 
GPU Based Numerical Methods in Mathematica
Ulises Cervantes-Pimentel (Wolfram Research), Abdul Dakkak (Wolfram Research)
A fast way of developing, prototyping and deploying numerical algorithms that can take advantage of CUDA capable systems is available in Mathematica 8. Over the past year, educators, scientists, and business users have taken advantage of the benefits of GPU programming support in Mathematica. By integrating and implementing CUDA/OpenCL in their programs, users make use of a hybrid approach, combining the speed-up that GPUs offer and a powerful numerical development system. In this presentation, several examples of numerical applications are presented, ranging from deconvolution of MRI images and linear solvers for FEM to systems of ODEs and line integral convolution visualization.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2106
 
An Accelerated Weeks Method for Numerical Laplace Transform Inversion
Patrick Kano (Acunum Algorithms and Simulations, LLC)
Mathematical methods based on the use of the Laplace transform are a standard component of undergraduate education. Real-world problems, however, often yield Laplace space solutions which are too complex to be analytically inverted to expressions in physically meaningful variables. A robust numerical inversion approach is thus desirable. In this talk, I present one of the approaches to compute an approximate inverse, the Weeks method. I will also discuss the difficulties in performing numerical inversion. Finally, I will show how we have been able to utilize Jacket from AccelerEyes in MATLAB to more efficiently and robustly implement the Weeks method.

Keywords:
Developer - Algorithms, GTC 2012 - ID S2415
 
Emergent Numerical Algorithms
David Bortz (University of Colorado, Boulder)
We present an investigation into the emergent mathematical behavior of computational operations which are performed efficiently on massively multi-core architectures. This novel perspective for computationally solving mathematical equations is designed from the ground up for efficient implementation on massively multi-core architectures. In this session, we present one of our algorithms which randomly generates a large number of sparse domain discretizations of a Partial Differential Equation. The statistical moments of the ultra-sparse-grid solutions suggest optimal locations for gridpoints. We will apply this algorithm to Poisson and Hamilton-Jacobi steady state equations and provide preliminary analytical results.

Keywords:
Developer - Algorithms, GTC 2013 - ID S3151
 
Task-based Parallelization of the Fast Multipole Method on NVIDIA GPUs and Multicore Processors
Eric Darve (Stanford, Institute for Computational and Mathematical Engineering)
Learn about the fast multipole method (FMM) and its optimization on NVIDIA GPUs. The FMM is a well-known algorithm with a variety of applications in areas like galaxy simulation, electrostatic potential calculations, boundary element methods, integral equations, dislocation dynamics, etc. The FMM presents several difficulties when running on parallel heterogeneous platforms such as multicore processors with GPUs. Some parts of the calculation suffer from limited concurrency, and load-balancing can be very uneven for certain distributions of particles. We will present a new API and runtime system, called StarPU, that allows expressing a calculation as a graph of tasks, with dependencies, and contains a runtime system that can optimally schedule those tasks on a parallel machine. StarPU supports conventional multicore processors as well as NVIDIA GPUs. Authors: Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Matthias Messner (INRIA Bordeaux - Sud-Ouest / LaBRI, Talence, France); Eric Darve (Stanford Institute for Computational and Mathematical Engineering); Toru Takahashi (Department of Mechanical Science and Engineering, Nagoya University, Nagoya, Japan).

Keywords:
Developer - Algorithms, GTC 2013 - ID S3192
 
Fast In-place Transposition and Layout Conversion on GPUs
I-Jui (Ray) Sung (University of Illinois at Urbana-Champaign)
Learn how to perform efficient dense rectangular matrix transposition in place on the GPU. Transposition (full and tiled) has been an important building block in many GPU-accelerated algorithms for better memory access patterns. However, out-of-place transposition, while simple in nature, may be prohibitive due to large spatial overhead for applications with large datasets. This talk presents techniques for in-place matrix transposition, and the audience will also learn how to use our in-place transposition library with examples given in CUDA, MATLAB, and Mathematica CUDA bindings.
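To make the problem concrete, here is a minimal sequential sketch of in-place rectangular transposition by cycle following, a classic textbook approach. It is not the speakers' GPU library or algorithm; the function name `transpose_in_place` is hypothetical, and the `visited` flag list is a simplification that itself costs one flag per element, which production implementations work to avoid.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a rows x cols row-major matrix stored in the flat list `a`."""
    n = rows * cols
    if n != len(a):
        raise ValueError("len(a) must equal rows * cols")
    if n < 2:
        return
    last = n - 1
    visited = [False] * n  # simplification: one flag per element
    for start in range(1, last):  # first and last elements never move
        if visited[start]:
            continue
        i = start
        val = a[start]
        while True:
            # The element at index i = r * cols + c belongs at c * rows + r,
            # which equals (i * rows) mod (n - 1) for every i < n - 1.
            dest = (i * rows) % last
            a[dest], val = val, a[dest]  # drop val, pick up the displaced item
            visited[dest] = True
            i = dest
            if i == start:
                break

m = [1, 2, 3,
     4, 5, 6]          # 2 x 3, row-major
transpose_in_place(m, 2, 3)
print(m)               # [1, 4, 2, 5, 3, 6], i.e. 3 x 2 row-major
```

The cycles have irregular lengths, which hints at why a parallel, load-balanced version of this operation is harder than the out-of-place one.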

Keywords:
Developer - Algorithms, Developer - Tools & Libraries, GTC 2013 - ID S3307
 
Numerical Reproducibility Challenges on Extreme Scale Multi-Threading GPUs
Michela Taufer (University of Delaware)
Learn how to mitigate rounding errors that can hamper result reproducibility when concurrent executions burst and workflow determinism vanishes. This talk unveils the power of mathematical methods to model rounding errors in scientific applications and illustrates how these methods can mitigate error drift on new-generation, many-core GPUs. We will discuss performance and accuracy issues for a diverse set of scientific applications that rely on floating point arithmetic. In particular, our experimental study will cover the following exploration space: floating point format and precision (e.g., single, double, and composite precision), numerical range used by the computation, degree of multi-threading, thread scheduling scheme, and algorithmic variant.
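The root cause is that floating point addition is not associative, so a sum depends on the order in which threads happen to combine their terms. The sketch below illustrates this in double precision on the CPU. The helper `naive_sum` and the sample values are assumptions for illustration, and Python's exactly rounded `math.fsum` stands in for an order-independent reduction; it is not the composite-precision method the talk refers to.

```python
import math

def naive_sum(values):
    """Accumulate left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same four numbers in two different orders (think: two thread schedules).
forward = [1e16, 1.0, -1e16, 1.0]
reordered = [1.0, 1.0, 1e16, -1e16]

print(naive_sum(forward))     # 1.0 (one of the small terms is rounded away)
print(naive_sum(reordered))   # 2.0
print(math.fsum(forward))     # 2.0
print(math.fsum(reordered))   # 2.0

# Even three ordinary-looking values expose the effect.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```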
 
Keywords:
Developer - Algorithms, Computational Physics, Supercomputing, GTC 2015 - ID S5245
Developer - Performance Optimization
Featured Talk: Memory Management Tips, Tricks and Techniques
Stephen Jones (SpaceX)
GPUs can push teraflops of mathematical power, but feeding the SMs with data can often be harder than optimising your algorithm. A well-designed program must take into account both access of data from within the GPU as well as allocation and transfer of data between CPU and GPU. This talk will cover techniques including sub-allocation, shared memory management, and parallel memory structures such as stacks, queues and ring-buffers which can greatly improve the throughput of your algorithms. 75% of programs are limited by memory bandwidth and not compute power, so careful memory management is critical to a high-performance program.
 
Keywords:
Developer - Performance Optimization, Developer - Algorithms, Developer - Programming Languages, GTC 2015 - ID S5530
Developer - Programming Languages
Mathematica for GPU Programming
Ulises Cervantes-Pimentel (Wolfram Research)
Mathematica is widely used in scientific, engineering, and mathematical fields and in education. In this session, new tools for general GPU programming in the next release of Mathematica are presented. These tools build on top of Mathematica's technology, which provides a simple yet powerful interface to the large base of compiling tools. Applications of CUDA and OpenCL from within Mathematica will be presented. These examples will provide a general overview of the powerful development environment for GPU programming that Mathematica can offer not just for researchers but for anybody with basic knowledge of Mathematica and GPU programming.
 
Keywords:
Developer - Programming Languages, Developer - Algorithms, Developer - Tools & Libraries, Video & Image Processing, GTC 2010 - ID 2028
Developer - Tools & Libraries
Integrating CUDA BLAS with IMSL Fortran
Chris Gottbrath (TotalView Technologies, Inc., a Rogue Wave Software company)
As GPU hardware becomes more prevalent in both research and commercial institutions, software that takes advantage of this specialized hardware is growing in demand. In many cases, it is infeasible or impossible to rewrite an existing program to run entirely on the GPU, so the goal is often to offload as much work as possible. As the IMSL Library team at Rogue Wave Software considers how best to tackle the GPU realm with a general mathematical library, the IMSL Fortran Library takes an initial step where the CUDA BLAS library is utilized to offload CPU work to GPU hardware. This presentation will discuss the approach and architecture of the solution. Benchmark results will show where success has been found. Plans for future products will also be covered.
 
Keywords:
Developer - Tools & Libraries, GTC 2010 - ID S102299
 
Using CUDA within Mathematica
Kashif Rasul
Mathematica comes with many extremely optimized numerical libraries integrated into the application, but they don't yet take advantage of the GPU. Thankfully, Mathematica provides an easy-to-use API for communicating between a large variety of external resources, called MathLink. This tutorial will provide a hands-on introduction to start using CUDA within Mathematica, an introduction to the CUDA Mathematica plugin, as well as the different issues one has to keep in mind when writing MathLink applications using the CUDA Toolkit. Finally we will showcase a few real-world examples.

Keywords:
Developer - Tools & Libraries, GTC 2009 - ID S09033
 
Developing Next-Generation CUDA Acceleration in Wolfram's Mathematica with Parallel Nsight
Since version 8, Mathematica has offered advanced support for GPU acceleration with optimized CUDA functions and a built-in framework for developing scientific CUDA kernel code. In this session, the Wolfram development team will share their experience developing their next-generation CUDA support in Mathematica. From the unique ability of Parallel Nsight to attach its CUDA debugger to a running process, to the new parallel Warp Watch for warp-wide variable views and expression evaluation, to the latest runtime CUDA profiling experiments, they will demonstrate how they were able to take advantage of Parallel Nsight to get the most out of CUDA and the GPU.

Keywords:
Developer - Tools & Libraries, GTC 2012 - ID S2430
Embedded Systems
Mobile 3D Mapping With Tegra K1
Karol Majek (Institute of Mathematical Machines)
This work presents a 3D mapping algorithm implemented on a Tegra K1 device. The data processing pipeline is implemented in parallel using CUDA. The performance and accuracy of the final model were compared to the mobile and desktop GPU results. This work shows how to replace traditional CUDA-enabled laptops with an embedded Tegra K1. Attendees will learn about the problems and challenges of embedding a parallel 3D mapping algorithm and how to improve its speed.
 
Keywords:
Embedded Systems, Computer Vision & Machine Vision, GTC 2015 - ID S5383
Emerging Companies Summit
Visuvi, Inc. Startup Presentation
Christopher Boone
Visuvi develops targeted visual search engine solutions for a wide range of vertical applications in medicine, ecommerce and general-purpose visual search and maintains an index of images on the Internet. Visuvi has patent-pending technology for its search engine that examines the content and patterns within an image, categorizes that information via mathematical indexing and delivers search results based on the image itself, with no text or meta-tags required. Visuvi Inc. is a privately held company based in Redwood City, CA and is managed by a seasoned executive team consisting of Christopher Boone, President and CEO, Alexander Valenica, Chief Scientist and Co-Founder, Florian Brody, VP Marketing and Yuri Drozd, VP Product Management.

  Back
 
Keywords:
Emerging Companies Summit, GTC 2009 - ID S102052
Finance
Mathematica as a Practical Platform for GPU-Accelerated Finance
Dylan Roeh (Wolfram Research Inc), Abdul Dakkak (Wolfram Research Inc)
With the introduction of GPU support in version 8, Mathematica has become an excellent environment for integrating CUDA with high-level code for interpretation or visualization. In this presentation, we will show the usefulness of Mathematica in the field of computational finance. In addition to demonstrating the GPU-accelerated financial computations which can be readily performed within Mathematica, we will show that these calculations can easily be integrated with third-party data sources including Microsoft Excel and databases. Furthermore, we will cover the UnRisk Mathematica package written by MathConsult, which seamlessly adds GPU-accelerated complex model calibration algorithms to Mathematica's repertoire.
 
Keywords:
Finance, GTC 2012 - ID S2100
 
GPU Computing in .NET for Financial Risk Analytics
Ryan Deering (Chatham Financial)
Learn how a rapidly growing mid-sized financial company incorporated GPU computing into its quantitative finance models. Our quantitative development team faced two major obstacles in adopting GPU computing. The first obstacle is the large cost of switching away from our mature .NET development process. The other obstacle arises from the difficulty of synchronizing a slow hardware purchasing cycle with a fast software delivery cycle. We addressed these concerns by creating a hybrid linear algebra library in .NET that dynamically switches to GPU computing when CUDA hardware is available. This library allows our developers to code in .NET and focus on the mathematical and financial models without worrying about CUDA syntax. In this session we will describe how we built the library in .NET using CUBLAS, CURAND, and CUDA Runtime libraries. We will also show the performance gains from switching to GPU computing in pricing Bermudan swaptions using the Libor Market Model.
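The "dynamically switches to GPU computing when CUDA hardware is available" idea in the abstract above can be illustrated with a minimal sketch. This is not the speaker's .NET library: it is written in Python, uses CuPy as a stand-in for the CUBLAS-backed path, and the names `cuda_backend`, `matmul_cpu` and `matmul` are hypothetical. The point is only the dispatch pattern, where callers use one function and the backend is chosen at run time.

```python
import importlib


def cuda_backend():
    """Return the cupy module if it imports and sees a CUDA device, else None."""
    try:
        cupy = importlib.import_module("cupy")  # assumption: CuPy as the GPU library
        if cupy.cuda.runtime.getDeviceCount() > 0:
            return cupy
    except Exception:
        # CuPy missing, driver missing, or no device: fall through to the CPU path.
        pass
    return None


def matmul_cpu(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matmul(a, b, backend=None):
    """Multiply a and b on the GPU when a backend is given, else on the CPU."""
    if backend is not None:
        return (backend.asarray(a) @ backend.asarray(b)).tolist()
    return matmul_cpu(a, b)


gpu = cuda_backend()
product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], backend=gpu)
print("GPU" if gpu is not None else "CPU", product)
```

Because the backend is detected once and passed in, model code that calls `matmul` does not change when the hardware does, which is the property the abstract describes for developers who stay in .NET.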
 
Keywords:
Finance, GTC 2014 - ID S4451
General Interest
Introducing CUDA in KBE Applications for Digital Vehicle Development Programs
Avijit Santra (Tata Motors Limited)
Get the latest developments in next-generation Knowledge Based Engineering (KBE) software, which provides real results over the traditional design approach. Today there exist numerous KBE applications in the fields of vehicle ergonomics, suspension, NVH, safety, regulations, etc., which deal with a huge number of iterations and mathematical algorithms. With GPU computing and CUDA, the KBE kernel is restructured to incorporate a parallel programming model, which helps the applications run faster and reduces run time from hours to seconds. The KBE geometry kernel also benefits from enabling CUDA in topology-based operations, which take a lot of time when performed on the CPU.
 
Keywords:
General Interest, GTC 2012 - ID S2040
Graphics Virtualization
Training and Support System in the Cloud for Search and Rescue Missions
Pawel Musialik (Institute of Mathematical Machines)
This work concerns the development of a training and support system for SAR missions based on NVIDIA GRID technology. The architecture of the cloud system will be discussed. This system can be deployed in the disaster zone as a Mobile Data Centre and in a typical Data Centre. We developed software tools for registering and gathering robotic data (3D point clouds) into a common coordinate system. The rendering of 3D data is accessible via a SaaS (Software as a Service) model. This software is dedicated to SAR teams working with modern UAVs (Unmanned Aerial Vehicles) and UGVs (Unmanned Ground Vehicles). GRID technology helps with the integration of many data sources and visualisation over Ethernet. The training system uses these 3D maps as a reference training area for rigid body simulation of robots.
 
Keywords:
Graphics Virtualization, Data Center, Cloud Computing & HPC, Defense, Signal & Audio Processing, GTC 2015 - ID S5374
High Performance Computing
GPU Acceleration of Cube Calculus Operations
Vamsi Parasa (Portland State University)
In our current work, we present the first massively parallel, GPU-accelerated implementation of the Cube Calculus operations for multivalued and binary logic, also called the Cube Calculus Machine (CCM). Substantial speedups of up to the order of 85x are achieved using a CUDA-enabled NVIDIA Tesla GPU compared to the CPU implementation on a sequential processor. Cube Calculus (CC) is a very efficient and convenient mathematical formalism for the representation, processing and synthesis of binary and multivalued logic, which has significant applications in logic synthesis, image processing and machine learning. Thus, the massive speedups achieved using GPUs are very encouraging for building future parallel VLSI EDA systems.
 
Keywords:
High Performance Computing, GTC 2010 - ID P10I13
Machine Learning & Deep Learning
CUDA in Urban Search and Rescue: Mission Planning Module for Icarus Project
Pawel Musialik (Institute of Mathematical Machines)
This session will concentrate on the topic of mission planning for search and rescue personnel and how CUDA can help in this task. Urban Search and Rescue is a challenging and important activity in current society. The ICARUS project (Integrated Components for Assisted Rescue and Unmanned Search operations) concentrates on aiding these efforts by providing robotic components for rescue teams. Adding mobile robots into the fray raises the need for additional planning effort, which would consume a lot of time using a classical approach. We will present how this can be prevented by using CUDA-based mission planners for solving tasks like path planning, patrol communication relay location, etc. A number of CUDA-implemented algorithms will be shown along with example results.
 
Keywords:
Machine Learning & Deep Learning, Computer Vision & Machine Vision, GTC 2015 - ID S5319
Media & Entertainment
JPEG2000 on GPU: A Fast 4K Video Mastering, Archiving, and Contribution
Jiri Matela (Comprimato)
JPEG2000 is state-of-the-art video compression adopted by all digital cinemas. Besides that, it has become the format of choice for long-term archiving, mainly because it saves significant disk space, provides superior image quality, and allows for mathematically lossless compression. Recent developments in the standardization of master video formats (IMF) make JPEG2000 the emerging video compression for 4K delivery, and because of its very high image quality it is being used for broadcast contribution as well. The talk will cover various applications of JPEG2000 in digital video production workflows and explain how NVIDIA GPUs enable such workflows with speed sufficient for 4K video processing.
 
Keywords:
Media & Entertainment, Defense, Medical Imaging, Video & Image Processing, GTC 2015 - ID S5602
 
BLINK: A GPU-Enabled Image Processing Framework
Mark Davey (The Foundry)
We present BLINK, a language and framework for developing image processing algorithms across a range of computation devices. BLINK-based algorithms are automatically translated to optimised code for both GPUs and CPUs. This "write-once" approach enables us to target both existing and new GPU hardware with minimal extra effort. Many algorithms produce visibly different results if mathematical operations are allowed to differ across platforms. Therefore BLINK has been designed to ensure numerically identical results between NVIDIA GPUs and CPUs. BLINK is at the heart of a number of key Foundry plug-ins and applications. An overview of this work and performance profiles will be presented, highlighting the speed gains achieved by using NVIDIA GPUs.
 
Keywords:
Media & Entertainment, Video & Image Processing, GTC 2015 - ID S5619
Medical Imaging
Accelerating 3D CT Reconstruction Using GPUs
Saoni Mukherjee (Northeastern University)
We implement 3D cone-beam computed tomography on a GPU. The implementation takes slices of the target, weights the projection data and then filters the weighted data to backproject the data and create the final three-dimensional reconstruction. This is implemented on a CPU using either Matlab or C, and on a heterogeneous system combining CPU and GPU. The relative performance of backprojection is tested and evaluated on a mathematical phantom as well as on mouse data. Speedups of over thirty times using the GPU are seen for phantom data and close to fifty times for the larger mouse datasets.
 
Keywords:
Medical Imaging, GTC 2013 - ID P3121
Numerical Algorithms & Libraries
Fast N-body Methods as a Compute-Bound Preconditioner for Sparse Solvers on GPUs
Rio Yokota (KAUST)
Learn how to unleash the full power of GPUs on one of the more difficult problems -- preconditioning in sparse solvers -- by using fast N-body methods as a preconditioner. Fast N-body methods have been able to achieve a high percentage of peak performance since the early days of GPU computing. However, their successful applications have been limited to astrophysics and molecular dynamics, where the physics itself is naturally described by a collection of discrete points. Mathematically, there is nothing that prevents the use of fast N-body methods as a solver for a more general class of PDEs. This would not have been a good idea back when Flops were expensive, since it essentially turns the sparse matrix into a dense matrix of the same size, before hierarchically grouping the off-diagonal blocks. But now that Flops are becoming comparatively cheap, the notion of a "compute-bound preconditioner" sounds more attractive than ever. We will demonstrate how competitive such a preconditioner actually is on Kepler.
 
Keywords:
Numerical Algorithms & Libraries, Computational Physics, GTC 2014 - ID S4228
Scientific Visualization
Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale
Kenneth Moreland (Sandia National Laboratories)
Visualization on today's GPU technology and at extreme scale requires massive concurrency. The Dax Toolkit is a development framework for designing and using such devices. Learn how to use Dax to execute classic visualization and analysis algorithms on a variety of mesh data structures and adapt the templated toolkit to your own data structures. Also design your own massively-threaded visualization algorithms in a simplified development environment that allows you to focus on the mathematical and algorithmic design. Dax's automatic concept and scheduling mechanisms automatically build parallel scheduling and communication code from signatures using C++.
 
Keywords:
Scientific Visualization, Large Scale Data Visualization & In-Situ Graphics, GTC 2014 - ID S4620
Supercomputing
20 Petaflops Simulation of Protein Suspensions in Crowding Conditions
Simone Melchionna (National Research Council of Italy)
This talk describes the recent simulation of ~18,000 proteins in suspension, reproducing the crowding conditions of the cell interior. The simulations were obtained with MUPHY, a computational platform for multi-scale simulations of real-life biofluidic problems. The same software has been used in the past to simulate blood flows through the human coronary arteries and DNA translocation across nanopores. The simulations were performed on the Titan system at the Oak Ridge National Laboratory, and exhibit excellent scalability up to 18,000 K20X NVIDIA GPUs, reaching 20 Petaflops of aggregate sustained performance with a peak performance of 27.5 Petaflops for the most intensive computing component. In this talk I will describe how the combination of novel mathematical models, computational algorithms, hardware technology and parallelization techniques made it possible to reproduce, for the first time, such a massive number of proteins.

 

ACM Gordon Bell Finalist
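
As a rough reading of the figures quoted in this abstract, the arithmetic below derives the average sustained throughput per GPU and the ratio of the sustained figure to the quoted peak. It assumes the 20 Petaflops figure was measured across all 18,000 GPUs, which the abstract implies but does not state outright; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
SUSTAINED_PFLOPS = 20.0   # aggregate sustained performance
PEAK_PFLOPS = 27.5        # peak for the most intensive computing component
GPU_COUNT = 18_000        # K20X GPUs on Titan used in the run

# Assumption: the sustained figure was reached on all 18,000 GPUs.
per_gpu_tflops = SUSTAINED_PFLOPS * 1000.0 / GPU_COUNT
sustained_to_peak = SUSTAINED_PFLOPS / PEAK_PFLOPS

print(f"{per_gpu_tflops:.2f} TFLOPS sustained per GPU")
print(f"{sustained_to_peak:.1%} of the quoted peak")
```

Under that assumption the run averages about 1.11 TFLOPS per GPU, and the sustained figure is roughly 73% of the quoted 27.5 Petaflops peak.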

 
Keywords:
Supercomputing, Supercomputing 2013 - ID SC3117
Visualization
Diderot: A Parallel DSL for Image Analysis and Visualization
Lamont Samuels (University of Chicago)
The analysis of structure in three-dimensional images is increasingly important for biomedical research and computational science. In this poster, we outline ongoing work developing Diderot, a parallel domain-specific language for three-dimensional image visualization and analysis algorithms, such as volume rendering, fiber tractography, and particle systems. Diderot supports a high-level mathematical computation model coupled with a batch-synchronous parallelism model. The poster further describes Diderot's GPU implementation and its high performance measurements on GPUs versus other sequential and parallel platforms.
 
Keywords:
Visualization, GTC 2012 - ID P2493
 
NVIDIA - World Leader in Visual Computing Technologies
Copyright © 2016 NVIDIA Corporation