GTC On-Demand

AI and DL Research
GUNREAL: GPU-Accelerated Unsupervised Reinforcement and Auxiliary Learning
We'll introduce a GPU-accelerated version of the unsupervised reinforcement and auxiliary learning (UNREAL) algorithm. Recent state-of-the-art deep reinforcement learning algorithms, such as A3C and UNREAL, are designed to train on a single device with only CPUs. Naively adding GPU acceleration to these algorithms results in low GPU utilization, so the full performance of the GPU is never reached. Motivated by the architectural changes that GA3C made to give A3C better GPU utilization, and by the high learning efficiency of UNREAL, we extend GA3C with UNREAL's auxiliary tasks to create GUNREAL. We show that our GUNREAL system finishes training faster than UNREAL and reaches higher scores than GA3C.
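The key idea GUNREAL inherits from GA3C is replacing per-agent GPU calls with queues that batch many agents' requests into single GPU passes. Below is a minimal, runnable sketch of that batching predictor, with a dummy network standing in for the GPU model; all names (DummyModel, prediction_q, and so on) are hypothetical illustrations, not the authors' code.

```python
# Sketch of GA3C-style batched prediction (the architecture GUNREAL extends).
import queue
import threading
import numpy as np

class DummyModel:
    """Stand-in for the policy/value network that would run on the GPU."""
    def predict(self, states):
        policies = np.full((len(states), 4), 0.25)  # uniform policy over 4 actions
        values = np.zeros(len(states))
        return policies, values

prediction_q = queue.Queue()  # (agent_id, state) requests from all agents

def predictor(model, replies, batch_size=8):
    """Batch queued states into one forward pass -- GA3C's key change to A3C."""
    while True:
        agent_id, state = prediction_q.get()
        if agent_id is None:                          # shutdown sentinel
            return
        batch = [(agent_id, state)]
        while len(batch) < batch_size and not prediction_q.empty():
            batch.append(prediction_q.get())          # opportunistically fill batch
        ids, states = zip(*batch)
        policies, values = model.predict(states)      # one batched "GPU" call
        for i, pi, v in zip(ids, policies, values):
            replies[i].put((pi, v))                   # return result to agent i

replies = {i: queue.Queue() for i in range(4)}
t = threading.Thread(target=predictor, args=(DummyModel(), replies))
t.start()
for i in range(4):                                    # four "agents" request actions
    prediction_q.put((i, np.zeros(84)))
for i in range(4):
    print(f"agent {i} value: {replies[i].get()[1]}")
prediction_q.put((None, None))                        # safe: all replies received
t.join()
```

GUNREAL would add UNREAL's auxiliary tasks (such as pixel control and reward prediction) as extra loss terms in the batched training step, which this sketch omits.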
 
Keywords:
AI and DL Research, Performance Optimization, GTC Silicon Valley 2018 - ID S8219
Big Data Analytics
Preliminary I/O Performance Evaluation on GPU Accelerator and External Memory
Recent supercomputers deploy not only many-core accelerators such as GPUs but also non-volatile memory (NVM) such as flash as external memory, in order to handle large-scale data processing for a wide range of applications. However, it is not yet clear how to build large-capacity, low-cost NVM local disks for heterogeneous supercomputers. To clarify the I/O characteristics between GPU and NVM, we comparatively investigate I/O strategies on a GPU and multiple mSATA SSDs. Our preliminary results show 3.06 GB/s of throughput from 8 mSATA SSDs to the GPU using RAID 0 with an appropriate stripe size.
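A measurement in this style times a large sequential read from the striped device and reports GB/s. The sketch below shows the storage side only, under assumed names (the /dev/md0 path is hypothetical); the host-to-GPU copy stage is omitted.

```python
# Sketch: sequential read throughput from a (hypothetical) RAID 0 device.
import os
import time

def read_throughput(path, total_bytes=1 << 30, block=1 << 22):
    """Read `total_bytes` sequentially in `block`-sized chunks; return GB/s."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        remaining = total_bytes
        while remaining > 0:
            data = os.read(fd, min(block, remaining))
            if not data:                      # hit end of device/file early
                break
            remaining -= len(data)
        elapsed = time.perf_counter() - start
        return (total_bytes - remaining) / elapsed / 1e9
    finally:
        os.close(fd)

# e.g., an 8-SSD RAID 0 array assembled as /dev/md0 (hypothetical path):
# print(f"{read_throughput('/dev/md0'):.2f} GB/s")
```

Note that without O_DIRECT (omitted here because it requires aligned buffers), the OS page cache can inflate the numbers; a stripe-size comparison like the poster's would need direct or cold-cache reads.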
 
Keywords:
Big Data Analytics, GTC Silicon Valley 2014 - ID P4251
HPC and Supercomputing
A Scalable Implementation of a MapReduce-based Graph Algorithm for Large-scale Heterogeneous Supercomputers
Fast processing of extremely large-scale graphs is becoming increasingly important. Whether GPU acceleration, and which optimization techniques, can be applied effectively to the MapReduce-based GIM-V graph processing algorithm is an open problem. We implemented a multi-GPU GIM-V application with load-balance optimization between GPU devices. Our experiments on the TSUBAME2.0 supercomputer using 256 nodes (6,144 hyper-threaded CPU cores, 768 GPUs) showed that our GPU-based implementation achieved 87.04 ME/s (million edges per second) on a graph with 2^30 (1.07 billion) vertices and 2^34 (17.2 billion) edges, and ran 1.52 times faster than a naive CPU-based implementation on a graph with 2^29 vertices and 2^33 edges.
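GIM-V (generalized iterative matrix-vector multiplication, from the PEGASUS framework) expresses a graph algorithm as three user-defined operations: combine2 pairs an edge with the neighbor's value, combineAll folds the partial results per vertex, and assign merges the result into the vertex's value. The runnable sketch below illustrates the abstraction with connected components; the multi-GPU partitioning and load balancing from the poster are out of scope.

```python
# Sketch of one GIM-V iteration: v'_i = assign(v_i, combineAll_j(combine2(m_ij, v_j))).
def gimv_step(edges, v, combine2, combine_all, assign):
    partial = {}
    for i, j in edges:                        # "map": one partial result per edge
        partial.setdefault(i, []).append(combine2(1, v[j]))
    return {i: assign(v[i], combine_all(partial.get(i, [v[i]])))
            for i in v}                       # "reduce": fold partials per vertex

# Connected components: propagate the minimum component id along edges.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]
v = {i: i for i in range(5)}                  # each vertex starts as its own component
for _ in range(5):
    v = gimv_step(edges, v,
                  combine2=lambda m, vj: vj,  # pass the neighbor's id along the edge
                  combine_all=min,            # keep the smallest incoming id
                  assign=min)                 # never increase a vertex's id
print(v)                                      # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

Because every step is an independent per-edge map followed by a per-vertex reduce, the pattern maps naturally onto MapReduce, and, with suitable edge partitioning, onto many GPUs.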
 
Keywords:
HPC and Supercomputing, Clusters & GPU Management, GTC Silicon Valley 2013 - ID P3133
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
The APGAS programming model abstracts deep memory hierarchies, such as distributed memory and GPU device memory, behind a global view of data and asynchronous operations on massively parallel computing environments. However, how much GPUs accelerate applications written in the APGAS model remains unclear. We give a comparative performance analysis of the APGAS model in X10 on GPUs against MPI on multi-core CPUs, using lattice QCD as the benchmark. Our experimental results on TSUBAME2.5 show that our X10 implementation on 32 GPUs achieves an 11.0x speedup over MPI on multi-core CPUs.
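The APGAS idioms in play here are X10's "at (place) async" (spawn an activity at a specific place, such as a node or GPU) and "finish" (wait for all spawned activities). The sketch below mimics those idioms with a thread pool per place; it illustrates the programming model only, under invented names, and is not the authors' X10 lattice QCD code.

```python
# Sketch of APGAS-style "async at a place" with a per-place thread pool.
from concurrent.futures import ThreadPoolExecutor, wait

class Place:
    """One 'place' (e.g., a node or GPU) running its own local activities."""
    def __init__(self, ident):
        self.id = ident
        self.pool = ThreadPoolExecutor(max_workers=2)

    def async_(self, fn, *args):
        """Spawn an asynchronous activity here (X10's `at (p) async`)."""
        return self.pool.submit(fn, *args)

def local_stencil_update(place_id):
    # In lattice QCD, each place would update its sub-lattice here,
    # exchanging halo sites with neighboring places between sweeps.
    return f"place {place_id}: sub-lattice updated"

places = [Place(i) for i in range(4)]

# Roughly "finish { for (p in places) at (p) async { ... } }" in X10 terms:
futures = [p.async_(local_stencil_update, p.id) for p in places]
wait(futures)                                  # X10's `finish` barrier
for f in futures:
    print(f.result())
```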
 
Keywords:
HPC and Supercomputing, Computational Physics, GTC Silicon Valley 2015 - ID P5237