GTC ON-DEMAND

 
SEARCH SESSIONS
SEARCH SESSIONS

Search All
 
Refine Results:
 
Year(s)

SOCIAL MEDIA

EMAIL SUBSCRIPTION

 
 

GTC ON-DEMAND

Big Data Analytics
Presentation
Media
Single CUDA Block Implementation of Time Synchronous Viterbi Search for Speech Recognition
Abstract:

Time synchronous Viterbi search algorithm for automatic speech recognition is implemented using a counter-intuitive single CUDA block approach. Decoding of a single utterance is carried out on a single stream multiprocessor (SM) and multiple utterances are decoded simultaneously using CUDA streams. The single CUDA block approach is shown to be substantially more efficient and enables overlapping of CPU and GPU computation by merging ten thousands of separate CUDA kernel calls for each utterance. The proposed approach has the disadvantage of large GPU global memory requirement because of the simultaneous decoding feature. However, the latest GPU cards with up to 12GB of global memory fulfill this requirement and the full utilization of the GPU card is possible using all available SMs.

 
Topics:
Big Data Analytics, Artificial Intelligence and Deep Learning, Signal and Audio Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5658
Streaming:
Share: