Attend this session to learn new techniques for building a scalable, numerically stable tridiagonal solver for GPUs. Numerical stability appears to have been missing from all existing GPU-based tridiagonal solvers. This work presents a scalable, numerically stable, high-performance tridiagonal solver. The solver provides solutions of comparable quality to those of Intel MKL and Matlab, at speeds comparable to the GPU tridiagonal solvers in existing packages such as CUSPARSE. We present and analyze two key optimization strategies for our solver: a high-throughput data layout transformation for memory efficiency, and a dynamic tiling approach for reducing the memory access footprint caused by branch divergence. Several applications are shown to benefit substantially from this solver. As a case study, Empirical Mode Decomposition, a critical method in time-frequency analysis, is used to demonstrate the usability of our solver.
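To illustrate why numerical stability matters for tridiagonal solvers, the following minimal sketch implements the classic Thomas algorithm, which performs no pivoting. It is not the solver presented in this session, and the abstract does not specify which stabilization scheme that solver uses; the function name `thomas_solve` and the sample systems are assumptions chosen for illustration. The sketch shows that a non-pivoting algorithm can fail on a well-posed, nonsingular system simply because a pivot is zero.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    Illustrative only: this is a textbook method, not the session's solver.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination; every step divides by a pivot that is never swapped.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Diagonally dominant system: the non-pivoting method works.
    print(thomas_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                       [1.0, 1.0, 0.0], [4.0, 8.0, 8.0]))

    # Nonsingular system (determinant -1) with a zero leading pivot:
    # the non-pivoting method divides by zero.
    try:
        thomas_solve([0.0, 1.0, 1.0], [0.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0], [1.0, 2.0, 2.0])
    except ZeroDivisionError:
        print("non-pivoting elimination failed on a nonsingular system")
```

A stable solver has to detect and work around such pivots, which introduces data-dependent branching. That is one plausible reading of why branch divergence becomes a concern on GPUs, though the abstract itself does not spell out the connection.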