Radio astronomy is a real-time signal processing application that requires extreme supercomputing. While today''s radio telescopes require 10-100 Tflops of computational power, by the end of the decade this will increase into the Exaflops regime, driven by the Hydrogen Epoch of Reionization Array (HERA) and the Square Kilometer Array (SKA). The most compute intensive part of this problem is the so-called cross-correlation algorithm, which can be recast as a linear-algebra problem similar in spirit to DGEMM. In this session we describe the cross-correlation engine that powers the pathfinder LEDA radio telescope and has been (re)optimized for the Kepler GK110 architecture to achieve over 2.5 Tflops in sustained performance. This level of efficiency is critical to meeting strict power and space constraints imposed by the instrument''s remote location.