GTC ON-DEMAND

 

Deep Learning & AI Frameworks
Training ImageNet in Four Minutes
Abstract:
We'll discuss how we built a highly scalable deep learning training system and used it to train ImageNet in four minutes. For dense GPU clusters, we optimize the training system with a mixed-precision training method that significantly improves single-GPU training throughput without losing accuracy. We also propose an optimization approach for extremely large mini-batch sizes (up to 64k) that can train CNN models on the ImageNet dataset without losing accuracy. In addition, we propose highly optimized all-reduce algorithms that achieve speedups of up to 3x on AlexNet and 11x on ResNet-50 over NCCL-based training on a cluster with 1024 Tesla P40 GPUs. Our training system achieves 75.8% top-1 test accuracy in only 6.6 minutes using 2048 Tesla P40 GPUs. When training AlexNet for 95 epochs, our system achieves 58.7% top-1 test accuracy within 4 minutes using 1024 Tesla P40 GPUs, which also outperforms all other existing systems.
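
For readers unfamiliar with the all-reduce step mentioned in the abstract, the sketch below simulates a plain ring all-reduce in pure Python. It is only an illustration of what the collective computes (every worker ends up with the element-wise sum of all workers' gradients); it is not the optimized algorithm presented in the session, which the abstract does not describe. The function name `ring_all_reduce` and the list-of-lists representation of per-worker gradients are assumptions made for this example.

```python
def ring_all_reduce(worker_grads):
    """Simulate a ring all-reduce (sum) over equal-length gradient lists.

    Illustrative only: a textbook ring all-reduce, not the session's method.
    worker_grads[i] is the (hypothetical) gradient buffer held by worker i.
    Returns the buffers held by each worker after the collective completes.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    # Chunk c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [(c * length) // n for c in range(n + 1)]
    bufs = [list(g) for g in worker_grads]

    def chunk(c):
        c %= n
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: after n - 1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all messages first so the sends behave as simultaneous.
        msgs = [(i, i - step, [bufs[i][k] for k in chunk(i - step)])
                for i in range(n)]
        for src, c, data in msgs:
            dst = (src + 1) % n
            for k, v in zip(chunk(c), data):
                bufs[dst][k] += v

    # Phase 2, all-gather: circulate the completed chunks around the ring,
    # overwriting the partial values on each receiver.
    for step in range(n - 1):
        msgs = [(i, i + 1 - step, [bufs[i][k] for k in chunk(i + 1 - step)])
                for i in range(n)]
        for src, c, data in msgs:
            dst = (src + 1) % n
            for k, v in zip(chunk(c), data):
                bufs[dst][k] = v

    return bufs


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, buf in enumerate(ring_all_reduce(grads)):
        print(rank, buf)
```

In this scheme each worker sends and receives roughly 2(n - 1)/n times the buffer size in total, regardless of the number of workers n, which is why ring-style collectives are a common baseline for gradient synchronization.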
 
Topics:
Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9146