Modern AI has been enabled by the GPU acceleration of deep learning. We are now entering an era of ever more complex deep learning tasks, involving sophisticated algorithms, deeper networks, and rapidly growing data sets, for which a handful of GPUs is proving insufficient. We are overcoming these challenges by designing and building large-scale HPC machines with extensive GPU-based vector/tensor processing capabilities, such as Tsubame3, ABCI, and Post-K, as well as by devising new scalable learning algorithms. In particular, the ABCI grand challenge enabled three research groups, including ours at Tokyo Tech, to scale ImageNet training to over 4,000 GPUs and reduce training times to minutes. This paves the way for a new era of "scalable AI," much as scalability has defined traditional HPC.
As machine learning advances toward industrial applications, there will be an increasing need for it to scale productively. As in high performance computing, methods will be applied, or eventually invented, to allow for productive weak scaling in capacity and throughput. This will be a goal for the whole industry; some areas will see fast progress, while others may take years. In the meantime, a strong-scaling, capability-class platform with tiers of ultra-high-bandwidth, low-latency interconnects and memory/storage classes will be needed. We will discuss such a system, Tokyo Tech's latest TSUBAME3.0 supercomputer, which is designed to achieve strong scaling to over 2,000 Pascal P100 processors and which, together with its predecessor TSUBAME2.5, will provide the largest machine learning / AI capability in Japan.
TSUBAME 2.5 succeeded TSUBAME 2.0 by upgrading all 4,224 Tesla M2050 GPUs to Kepler K20X GPUs, achieving 5.76 / 17.1 petaflops peak in double / single precision respectively, the latter the fastest in Japan. By overcoming several technical challenges, TSUBAME 2.5 achieves 2-3x speedups and multi-petaflops performance on many applications, paving the way to TSUBAME 3.0 in 2015-16.
TSUBAME 2.0 has been in successful production for the last two years, producing numerous research results and accolades. With a possible upgrade of its GPUs to second-generation Kepler parts, it will have the capability to surpass 10-petaflops-class supercomputers in single-precision applications, without any increase in its average power consumption of 1 MW.
In the global exascale race, hardware often takes center stage, but the race may ultimately be won or lost on how well the industry optimizes new and existing applications for extreme parallelism. Today's applications will not simply run on tomorrow's systems, so we must think strategically and creatively about how to design applications that take maximum advantage of the first power-efficient, accelerator-driven exascale systems. This panel of HPC, software, and computer science experts will discuss what we can, and should, be doing, including a review of new scientific and commercial HPC requirements, programming model options, and how best to align architecture and software design processes.
To highlight and reward the excellent research taking place at our CCOEs, we hosted an event during GTC 2012 to showcase four of their top achievements. Each of our 18 CCOEs was asked to submit an abstract describing what it considered its top achievement in GPU computing over the past 18 months. An NVIDIA panel selected four exemplars from these submissions to represent their work on GPU computing research. Each of our CCOEs has made amazing contributions, but the four CCOEs selected to showcase their work were:
Each of the four CCOE finalists was awarded an HP ProLiant SL250 Gen8 GPU system configured with dual NVIDIA Tesla K10 GPU accelerators in recognition of this accomplishment. After the four presentations, the CCOE representatives were asked to vote for their favorite presentation and achievement. Tokyo Tech was voted the audience favorite, and thus won the extra bragging rights of being honored by its peers as the inaugural recipient of the CUDA Achievement Award 2012.