We'll discuss approaches for accelerating out-of-core nearest neighbor computation on multi-GPU systems using system features such as NVLink. Nearest neighbor calculations operate over a set of high-dimensional vectors and compute pairwise distances under a chosen metric, such as cosine or max-norm distance. In practice, both the number of vectors and their dimension can be very large (for example, 5 million 1,000-dimensional vectors for the Wikipedia corpus). In such cases, the data cannot fit in GPU device memory and must be fetched from host memory. We'll present GPU implementations of key nearest neighbor algorithms (for example, locality-sensitive hashing) for these scenarios and demonstrate how NVLink can be used to optimize them.
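To make the out-of-core pattern concrete, here is a minimal CPU-only sketch, not the speakers' GPU implementation. It streams the dataset in fixed-size chunks, as a GPU kernel would receive batches copied from host memory. For each vector in a chunk it computes either a cosine or a max-norm (Chebyshev) distance to the query, and it keeps only a running top-k across chunks. The names `host_chunks`, `knn_out_of_core`, and `chunk_size` are hypothetical: `chunk_size` stands in for the device memory budget, and the search is exact brute force rather than locality-sensitive hashing.

```python
import heapq
import math
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def max_norm_distance(a: Vector, b: Vector) -> float:
    """Max-norm (Chebyshev) distance: the largest per-coordinate gap."""
    return max(abs(x - y) for x, y in zip(a, b))


def host_chunks(data: Sequence[Vector], chunk_size: int) -> Iterator[Tuple[int, Sequence[Vector]]]:
    """Hypothetical stand-in for host-to-device transfers: yield (offset, chunk)."""
    for start in range(0, len(data), chunk_size):
        yield start, data[start:start + chunk_size]


def knn_out_of_core(
    query: Vector,
    data: Sequence[Vector],
    k: int,
    chunk_size: int,
    metric: Callable[[Vector, Vector], float] = cosine_distance,
) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, index) pairs, visiting data chunk by chunk."""
    # Max-heap of the best k seen so far, stored as (-distance, index),
    # so best[0] always holds the current worst of the k candidates.
    best: List[Tuple[float, int]] = []
    for offset, chunk in host_chunks(data, chunk_size):
        for i, vec in enumerate(chunk):
            d = metric(query, vec)
            if len(best) < k:
                heapq.heappush(best, (-d, offset + i))
            elif -d > best[0][0]:
                # Strictly closer than the current worst candidate: replace it.
                heapq.heapreplace(best, (-d, offset + i))
    return sorted((-neg_d, idx) for neg_d, idx in best)
```

Only one chunk plus a k-sized heap is live at any time, which is what lets the working set stay within a fixed memory budget. A real multi-GPU version would overlap chunk transfers with distance computation and merge per-device top-k lists at the end.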