We'll describe new algorithms for training very deep networks in half-precision floating point (float16). Float16 has two major potential benefits: faster training and a smaller memory footprint. But float16 has a very narrow numerical range: normalized positive values span only about 0.00006 to 65504. This narrow range can cause either overflow (the "inf/nan" problem) or underflow ("vanishing gradients") during training of deep networks. We'll describe the new scaling algorithm, implemented in nvcaffe, which prevents these negative effects. With this algorithm, we successfully trained networks such as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.
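
To make the range problem concrete, here is a minimal NumPy sketch of float16 overflow and underflow, and of how multiplying by a constant before the cast can preserve a small value. It is only an illustration of the general idea of scaling, not the nvcaffe algorithm; the helper `to_fp16_and_back`, the sample values, and the scale factor 65536 are assumptions chosen for the example.

```python
import numpy as np

def to_fp16_and_back(values, scale=1.0):
    """Scale in float32, round to float16, then undo the scale in float32.

    Hypothetical helper for illustration only.
    """
    scaled = np.asarray(values, dtype=np.float32) * np.float32(scale)
    as_half = scaled.astype(np.float16)          # the lossy step
    return as_half.astype(np.float32) / np.float32(scale)

info = np.finfo(np.float16)
largest = float(info.max)     # largest finite float16 value
smallest = float(info.tiny)   # smallest positive normalized float16 value

# Overflow: a value above the float16 maximum becomes inf.
with np.errstate(over="ignore"):
    big = np.float32(70000.0).astype(np.float16)

# Underflow: a tiny "gradient" rounds to zero when cast directly...
tiny_grad = np.array([1e-8], dtype=np.float32)
unscaled = to_fp16_and_back(tiny_grad)

# ...but survives if it is scaled up before the cast (assumed factor 2**16).
rescaled = to_fp16_and_back(tiny_grad, scale=65536.0)

print("float16 max:", largest, "smallest normal:", smallest)
print("70000 as float16:", big)
print("1e-8 without scaling:", unscaled[0])
print("1e-8 with scaling:", rescaled[0])
```

A power-of-two factor is used here so that the scaling itself introduces no rounding error; the only loss comes from the cast to float16.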