GTC On-Demand

Abstract:
Although neural network training is typically done in 32- or 16-bit floating-point formats, inference can run at even lower precisions that reduce memory footprint and elapsed time. We'll describe quantizing neural network models for various image tasks (classification, detection, segmentation) and natural language processing tasks. In addition to convolutional feed-forward networks, we will cover quantization of recurrent models. The discussion will examine both floating-point and integer quantization, targeting features in Volta and Turing GPUs.
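As a rough illustration of the integer quantization the abstract mentions (a generic sketch, not the speakers' actual method or any specific GPU feature), a symmetric per-tensor INT8 quantizer maps the largest absolute value in a tensor to 127 and represents each element as a signed 8-bit integer plus a shared floating-point scale:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map max(|x|) to 127.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original floating-point values.
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
```

The round-trip error of this scheme is bounded by half a quantization step (scale / 2) per element, which is why inference often tolerates INT8 weights and activations with little accuracy loss.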
Topics:
AI Application Deployment and Inference, AI and DL Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9659