GTC ON-DEMAND

Presentation
Abstract:
Although neural network training is typically done in either 32- or 16-bit floating-point formats, inference can be run at even lower precisions that reduce memory footprint and elapsed time. We'll describe quantizing neural network models for various image (classification, detection, segmentation) and natural language processing tasks. In addition to convolutional feed-forward networks, we will cover quantization of recurrent models. The discussion will examine both floating-point and integer quantizations, targeting features in Volta and Turing GPUs.
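For context on the integer case the abstract mentions, the sketch below illustrates symmetric per-tensor INT8 quantization of a weight tensor in plain NumPy. This is a minimal, illustrative example only; it is not code from the talk and does not reflect the specific Volta/Turing features or TensorRT APIs discussed in the session.

# Minimal sketch of symmetric per-tensor INT8 quantization (illustrative only).
import numpy as np

def quantize_int8(x: np.ndarray):
    """Quantize a float tensor to int8 using a single symmetric scale."""
    scale = np.abs(x).max() / 127.0              # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 values."""
    return q.astype(np.float32) * scale

# Example: quantize random weights and measure the reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print("max abs error:", np.abs(w - w_hat).max())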
 
Topics:
AI Application, Deployment & Inference, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9659
 
 