In this talk we will introduce the basic concepts behind the Simulation Program with Integrated Circuit Emphasis (SPICE) and discuss in detail the two most time-consuming parts of circuit simulation: device model evaluation and the solution of large sparse linear systems. In particular, we focus on evaluating the basic models, such as the resistor, capacitor, and inductor, as well as the more complex BSIM4v7 transistor model, on the GPU. We also discuss the solution of the sets of linear systems that arise throughout the simulation. We take advantage of the fact that the coefficient matrices of these linear systems share the same sparsity pattern (and often end up with the same pivoting strategy) and show how to obtain their solution using a direct method on the GPU. Finally, we present numerical experiments and discuss future work. Co-authors: Francesco Lannutti, Sharanyan Chetlur, Lung Sheng Chien, Philippe Vandermersch.
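Device evaluation parallelizes well because each device contributes to the matrix independently, and because the circuit topology is fixed, the positions those contributions land in can be computed once and reused. The following is a minimal Python sketch of that idea for linear resistors in modified nodal analysis, assuming node 0 is ground and is eliminated from the system. The names `Resistor`, `build_pattern` and `load_values` are hypothetical and are not part of SPICE or of the implementation described in the talk. The sequential loop in `load_values` stands in for what a GPU version might do with one thread per device and atomic additions on shared matrix entries.

```python
from dataclasses import dataclass


@dataclass
class Resistor:
    n1: int   # first node (0 = ground)
    n2: int   # second node (0 = ground)
    r: float  # resistance in ohms


def build_pattern(devices, n):
    """Symbolic step, done once: build the CSR pattern of the n x n conductance
    matrix (ground removed) and record which value slots each device updates."""
    entries = set()
    for d in devices:
        nodes = [k - 1 for k in (d.n1, d.n2) if k != 0]  # drop the ground node
        for i in nodes:
            for j in nodes:
                entries.add((i, j))
    row_ptr = [0] * (n + 1)
    col_idx = []
    slot = {}
    for pos, (i, j) in enumerate(sorted(entries)):  # row-major order = CSR order
        row_ptr[i + 1] += 1
        col_idx.append(j)
        slot[(i, j)] = pos
    for i in range(n):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum turns row counts into offsets
    stamps = []
    for d in devices:
        a, b = d.n1 - 1, d.n2 - 1  # -1 means the terminal is grounded
        s = []
        if a >= 0:
            s.append((slot[(a, a)], 1.0))
        if b >= 0:
            s.append((slot[(b, b)], 1.0))
        if a >= 0 and b >= 0:
            s.append((slot[(a, b)], -1.0))
            s.append((slot[(b, a)], -1.0))
        stamps.append(s)
    return row_ptr, col_idx, stamps


def load_values(devices, stamps, nnz):
    """Numeric step, repeated whenever device values change: each device's
    stamp is independent, so this loop is the part a GPU kernel would spread
    across threads."""
    vals = [0.0] * nnz
    for d, s in zip(devices, stamps):
        g = 1.0 / d.r  # conductance of the resistor
        for pos, sign in s:
            vals[pos] += sign * g
    return vals
```

For example, a 2 Ω resistor between nodes 1 and 2 and a 4 Ω resistor from node 2 to ground give `row_ptr = [0, 2, 4]`, `col_idx = [0, 1, 0, 1]` and values `[0.5, -0.5, -0.5, 0.75]`. Only `load_values` needs to run again when device values change (for nonlinear devices, in every Newton iteration), which mirrors the reuse of a fixed sparsity pattern across the linear systems.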
The libraries distributed in the CUDA SDK and offered by third parties provide a wealth of functions commonly encountered in GPU acceleration projects. Using these libraries can often significantly shorten the development time of a GPU project while producing high-performance, high-quality software. In the CUDA 5.0 release, NVIDIA introduced enhancements across many libraries to improve performance and take advantage of new features available in Kepler-series GPUs. In this tutorial, we will provide an overview of the libraries in the CUDA SDK, including cuBLAS, cuRAND, cuSPARSE, cuFFT, NPP, and Thrust, as well as libraries provided by third parties. The audience will learn not only about the strengths of the individual libraries but also about the decision-making process for selecting the library best suited to their project.
A parallel algorithm for solving a sparse triangular linear system on the GPU is proposed. It computes the solution in two phases. The analysis phase builds a dependency graph from the matrix sparsity pattern and groups independent rows into levels. The solve phase then iterates sequentially over the constructed levels, computing the solution elements belonging to each level in parallel. Numerical experiments show that incomplete-LU and incomplete-Cholesky preconditioned iterative methods can achieve a 2x speedup on the GPU over their CPU implementations.
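The two phases can be sketched for a lower-triangular matrix stored in CSR format. This is a minimal sequential Python illustration of level scheduling, not the GPU implementation evaluated in the work: it assumes the matrix is lower triangular with every diagonal entry stored and nonzero, and the function names `analyze` and `solve` are hypothetical. The level of row i is one more than the highest level among the rows j < i it depends on, so every row in a level depends only on earlier levels and can be solved concurrently.

```python
def analyze(row_ptr, col_idx, n):
    """Analysis phase: assign each row a level from the sparsity pattern alone.
    Row i depends on every row j < i with a stored entry L[i, j]."""
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # row j already has its level, since rows go in order
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i in range(n):
        levels[level[i]].append(i)
    return levels


def solve(row_ptr, col_idx, vals, b, levels):
    """Solve phase: walk the levels in order. Rows inside one level only read
    x values from earlier levels, so a GPU version could solve them in parallel."""
    x = [0.0] * len(b)
    for rows in levels:          # sequential across levels
        for i in rows:           # independent within a level
            s = b[i]
            diag = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j < i:
                    s -= vals[k] * x[j]
                elif j == i:
                    diag = vals[k]
            x[i] = s / diag      # assumes a stored, nonzero diagonal
    return x
```

For the 4 x 4 matrix with nonzeros L[0,0] = 2, L[1,1] = 4, L[2,0] = L[2,2] = 1, L[3,1] = L[3,2] = 1 and L[3,3] = 2, the analysis yields the levels `[[0, 1], [2], [3]]`, and solving with b = (2, 4, 2, 4) gives x = (1, 1, 1, 1). Because the analysis depends only on the sparsity pattern, it can be performed once and reused for every solve with the same pattern, as happens when a preconditioner is applied at each iteration of an iterative method.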