GTC On-Demand

Optimizing Runtime Performance of Neural Net Architectures for High Scalability
Abstract:
Learn about the advantages and pitfalls of venturing away from off-the-shelf libraries to implement neural network inference algorithms from the ground up. We'll discuss the challenges of building large-vocabulary speech recognition engines that can decode more than 1,000 simultaneous conversations per NVIDIA V100 card while still down-porting to low-memory embedded configurations such as the Tegra TK1. We'll cover which characteristics of the popular neural network types used in speech recognition scale almost perfectly, and which resist scaling or even scale negatively. Learn what profiling reveals about the silent, looming cost of kernel synchronization and what to do about it.
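The kernel-synchronization cost the abstract alludes to is easy to reproduce outside the talk's engine. The sketch below is not from the session; the kernel, sizes, and step count are illustrative placeholders. It contrasts an inference loop that calls cudaDeviceSynchronize() after every launch with one that queues the same launches on a CUDA stream and synchronizes once.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void infer_step(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = tanhf(x[i]);   // stand-in for one small layer of work
}

int main() {
    const int n = 1 << 20, steps = 64;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    // Pattern 1: synchronize after every launch. Each cudaDeviceSynchronize()
    // stalls the host until the GPU drains, so launch gaps accumulate silently
    // across the many small kernels a decoding pass issues.
    for (int s = 0; s < steps; ++s) {
        infer_step<<<(n + 255) / 256, 256>>>(d, n);
        cudaDeviceSynchronize();
    }

    // Pattern 2: queue the launches on a stream and synchronize once. The host
    // keeps launching while the GPU works; independent conversations can run
    // on separate streams to overlap with each other as well.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    for (int s = 0; s < steps; ++s) {
        infer_step<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    }
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d);
    printf("done\n");
    return 0;
}

Profiling both loops on a timeline shows the difference as idle gaps between kernels in the first pattern, which is the kind of silent overhead the talk examines.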
 
Topics: AI Application Deployment and Inference, Speech and Language Processing
Type: Talk
Event: GTC Silicon Valley
Year: 2019
Session ID: S9535