GTC ON-DEMAND

 

Data Center & Cloud Infrastructure
GPU Resource Pooling and the benefit of deploying CUPTI
Abstract:
We'll describe GPUPerf, a lightweight GPU counter monitoring tool that our Alibaba team developed with NVIDIA. It tracks GPU context creation and destruction and records GPU internal counter values, such as active and elapsed cycles, IPC, and memory access bandwidth, with little overhead. We'll discuss how we deployed this tool in one of our lab clusters for real-time monitoring. Combined with information collected from nvidia-smi, it gives us a much better understanding of our GPU server workloads. We'll also explain how GPUPerf helps improve GPU cluster orchestration and scheduling algorithms.
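
To make the counters named in the abstract concrete, here is a minimal sketch of how raw readings might be turned into the ratios a monitoring dashboard would show. It is not GPUPerf's code or the CUPTI API: the `CounterSample` type, its field names, and `derive_metrics` are hypothetical, and the definitions used (busy fraction as active over elapsed cycles, IPC as instructions per active cycle, bandwidth as bytes per second) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One hypothetical reading of raw GPU counters over a sampling window."""
    active_cycles: int      # cycles in which the GPU had work to do
    elapsed_cycles: int     # total cycles in the window
    instructions: int       # instructions executed in the window
    dram_bytes: int         # bytes moved to or from device memory
    window_seconds: float   # wall-clock length of the window


def derive_metrics(s: CounterSample) -> dict:
    """Convert raw counters into ratios, returning 0.0 for empty windows."""
    busy = s.active_cycles / s.elapsed_cycles if s.elapsed_cycles else 0.0
    ipc = s.instructions / s.active_cycles if s.active_cycles else 0.0
    gbps = s.dram_bytes / s.window_seconds / 1e9 if s.window_seconds else 0.0
    return {"busy_fraction": busy, "ipc": ipc, "dram_gb_per_s": gbps}
```

A scheduler could, under these assumptions, treat a low `busy_fraction` on a GPU that nvidia-smi reports as allocated as a sign that the device is a candidate for sharing.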
 
Topics:
Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9285