GTC ON-DEMAND

AI Application, Deployment & Inference
TensorCore Optimized DNN for Efficient Low Latency Inference for 5G Networks
Abstract:

We'll examine the challenges telecommunications companies face in harvesting the considerable computational capacity of modern GPU architectures. One issue is that low-latency inference requires small batch sizes, which are inherently detrimental to Tensor Core performance. Another involves efficient coefficient reuse, which demands very large matrix-matrix multiplications, whereas the feedforward DNNs typically used for telecommunications ML perform relatively small vector-matrix multiplications. We'll discuss our approach, which aims to deliver low latency with significantly higher throughput by improving utilization of the compute capacity available in Tensor Cores.
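The core trade-off the abstract describes can be illustrated with a small sketch. The layer sizes, batch size, and NumPy stand-in below are illustrative assumptions, not the session's actual implementation: stacking B small inference requests into one matrix turns B vector-matrix multiplies (GEMVs, poor weight reuse) into a single matrix-matrix multiply (GEMM), the shape Tensor Cores accelerate.

```python
import numpy as np

# Hypothetical sizes for one feedforward DNN layer (illustrative only)
K, N = 256, 256          # input and output feature dimensions
B = 32                   # number of concurrent inference requests

rng = np.random.default_rng(0)
W = rng.standard_normal((K, N)).astype(np.float32)
inputs = [rng.standard_normal(K).astype(np.float32) for _ in range(B)]

# Low-latency path: one vector-matrix multiply (GEMV) per request.
# The weight matrix W is streamed from memory once per request,
# so there is no coefficient reuse across requests.
per_request = [x @ W for x in inputs]

# Batched path: stack the requests into a (B, K) matrix and issue a
# single matrix-matrix multiply (GEMM). W is loaded once and reused
# across all B rows -- the access pattern Tensor Cores are built for.
X = np.stack(inputs)      # shape (B, K)
batched = X @ W           # shape (B, N)

# Both paths compute the same results; only the arithmetic shape differs.
assert all(np.allclose(batched[i], per_request[i], atol=1e-4)
           for i in range(B))
```

The tension the talk addresses is that waiting to accumulate a large B increases latency, so the batch dimension available within a 5G latency budget stays small.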

 
Topics:
AI Application, Deployment & Inference, 5G & Edge, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9769