Abstract:
We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on the aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU implementation of A3C, based on TensorFlow, achieves a significant speed-up compared to a CPU implementation, and it is publicly available to other researchers.
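
To make the queueing idea concrete, below is a minimal Python sketch, not the authors' implementation: agent threads push states onto a shared prediction queue, and a single predictor thread drains the queue into one batch so that a single batched model call can serve many agents at once. All names here (model_predict, NUM_AGENTS, MAX_BATCH, STATE_DIM) are illustrative assumptions, and the model call is stubbed out rather than a real TensorFlow forward pass.

import queue
import threading
import numpy as np

NUM_AGENTS = 4   # assumed number of agent threads
MAX_BATCH = 32   # assumed maximum batch size per GPU call
STATE_DIM = 8    # assumed state dimensionality

prediction_queue = queue.Queue()

def model_predict(batch):
    # Placeholder for a batched forward pass on the GPU (e.g., one
    # TensorFlow session run); returns random "policies" for illustration.
    return np.random.rand(len(batch), 2)

def predictor():
    # Drain the queue into one batch, run a single batched prediction,
    # and hand each result back to the agent that requested it.
    while True:
        requests = [prediction_queue.get()]        # block for the first item
        while len(requests) < MAX_BATCH:
            try:
                requests.append(prediction_queue.get_nowait())
            except queue.Empty:
                break
        states = np.stack([state for state, _ in requests])
        policies = model_predict(states)
        for (_, reply), policy in zip(requests, policies):
            reply.put(policy)                      # return result to its agent

def agent(agent_id):
    reply = queue.Queue(maxsize=1)                 # per-agent reply channel
    for _ in range(3):                             # a few steps for the demo
        state = np.random.rand(STATE_DIM)
        prediction_queue.put((state, reply))
        policy = reply.get()                       # wait for the batched result
        print(f"agent {agent_id} got policy {policy.round(2)}")

threads = [threading.Thread(target=predictor, daemon=True)]
threads += [threading.Thread(target=agent, args=(i,)) for i in range(NUM_AGENTS)]
for t in threads:
    t.start()
for t in threads[1:]:
    t.join()

Dynamic batching of this kind is what lets the GPU stay utilized: the larger the batch the predictor can assemble per call, the fewer round trips each prediction costs on average.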