GTC ON-DEMAND

 
Abstract:

Modern-day AI has been enabled by GPU acceleration of deep learning; we are now entering the realm of ever more complex deep learning tasks involving complicated algorithms, deeper and more sophisticated network layers, and rapidly growing data sets, for which a handful of GPUs is proving insufficient. By designing and building large-scale HPC machines with extensive GPU-based vector/tensor processing capabilities, such as Tsubame3, ABCI, and Post-K, as well as designing new scalable learning algorithms, we are overcoming such challenges. In particular, the ABCI grand challenge enabled three research groups, including ours at Tokyo Tech, to scale ImageNet training to over 4,000 GPUs and reduce training times to minutes. This paves the way for a new era of "scalable AI", much as scalability has driven traditional HPC.

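The data-parallel pattern behind such ImageNet-scale training can be sketched as follows. This is an illustrative sketch, not the speakers' actual code: the model is a toy 1-D least-squares fit, and `allreduce_mean` is a hypothetical stand-in for a real MPI/NCCL allreduce.

```python
# Synchronous data-parallel SGD: each "worker" computes a gradient on
# its own data shard; an allreduce averages the gradients so every
# worker applies the identical update. Toy model: loss = mean((w*x - y)^2).

def local_gradient(w, shard):
    # dL/dw of the mean squared error on this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    # Stand-in for an MPI/NCCL allreduce: average across workers
    return sum(values) / len(values)

def train(shards, w=0.0, lr=0.005, steps=100):
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # parallel on a real cluster
        w -= lr * allreduce_mean(grads)                 # identical update everywhere
    return w

# Data with true slope 3.0, split round-robin across 4 simulated workers
data = [(x, 3.0 * x) for x in range(1, 17)]
shards = [data[i::4] for i in range(4)]
w = train(shards)
print(round(w, 3))  # converges to the true slope 3.0
```

Because every worker applies the same averaged gradient, the result is identical to single-node SGD over the full batch; scaling to thousands of GPUs then becomes a question of allreduce bandwidth and large-batch optimization.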
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1807
 
Abstract:
The TSUBAME3 supercomputer at Tokyo Institute of Technology came online in August 2017 and became the greenest supercomputer in the world on the Green500 at 14.11 GFlops/W. The other aspect of TSUBAME3 is that it embodies various BYTES-oriented features to allow for HPC-to-BD/AI convergence at scale, including significant scalable horizontal bandwidth, support for a deep memory hierarchy and capacity, and high flops in low-precision arithmetic for deep learning.
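The trade-off behind "high flops in low-precision arithmetic" can be illustrated in a few lines. This is a hedged demonstration, not from the talk: Python's `struct` format `'e'` gives IEEE 754 half precision (FP16), the format GPUs accelerate for deep learning, and naive FP16 accumulation shows the accuracy cost of that speed.

```python
# FP16 rounding: each value and each partial sum is rounded to the
# nearest representable half-precision number.
import struct

def to_fp16(x):
    # Round-trip through IEEE 754 half precision ('e' format)
    return struct.unpack('e', struct.pack('e', x))[0]

def fp16_sum(xs):
    # Naive FP16 accumulation: the running sum is rounded at every step
    acc = to_fp16(0.0)
    for x in xs:
        acc = to_fp16(acc + to_fp16(x))
    return acc

xs = [0.001] * 4096
print(fp16_sum(xs))  # drifts visibly from the exact total 4.096
print(sum(xs))       # float64 reference, essentially exact here
```

This is why deep learning systems typically combine FP16 multiplies with higher-precision accumulation: the reduced format maximizes arithmetic throughput and memory bandwidth, while a wider accumulator contains the rounding drift.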
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1720
 
Abstract:

As machine learning advances into industrial applications, there will be an increasing need for it to scale productively. As in high performance computing, methods will be applied or eventually invented to allow for productive weak scaling in capacity and throughput. This will be the goal of our industry; some will see fast implementation, while others may take years. In the meantime, therefore, a strong-scaling, capability-class platform with tiers of ultra-high-bandwidth, low-latency interconnects and memory/storage classes will be needed. We will discuss one such system, Tokyo Tech's latest TSUBAME3.0 supercomputer, which is designed to achieve such strong scaling to over 2,000 Pascal P100 processors and, together with its predecessor TSUBAME2.5, will provide the largest machine learning / AI capability in Japan.

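The weak- versus strong-scaling distinction the abstract draws can be made concrete with the two classic scaling laws. A minimal sketch, not from the talk, assuming a serial fraction `s` of the work and `p` processors: Amdahl's law bounds strong scaling (fixed problem size), while Gustafson's law describes weak scaling (problem size grows with `p`).

```python
# Amdahl vs. Gustafson: why strong scaling is the harder, "capability"
# problem that machines like TSUBAME3.0 are designed to attack.

def amdahl_speedup(s, p):
    # Strong scaling: total work fixed, so the serial part never shrinks
    return 1.0 / (s + (1.0 - s) / p)

def gustafson_speedup(s, p):
    # Weak scaling: parallel work grows with p, serial part stays fixed
    return s + (1.0 - s) * p

for p in (1, 16, 2000):
    print(p, round(amdahl_speedup(0.05, p), 1),
             round(gustafson_speedup(0.05, p), 1))
```

With even a 5% serial fraction, strong-scaling speedup saturates below 20x regardless of processor count, while weak scaling keeps growing with `p`; hence the abstract's emphasis on ultra-high-bandwidth, low-latency interconnects to drive that serial/communication fraction down.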
 
Topics:
Artificial Intelligence and Deep Learning, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7813
 
Abstract:

TSUBAME 2.5 succeeded TSUBAME 2.0 by upgrading all 4,224 Tesla M2050 GPUs to Kepler K20X GPUs, achieving 5.76 / 17.1 petaflops peak in double / single precision respectively, the latter being the fastest in Japan. By overcoming several technical challenges, TSUBAME 2.5 exhibits a 2-3x speedup and multi-petaflops performance for many applications, leading to TSUBAME 3.0 in 2015-16.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2013
Session ID:
SC3105
 
Abstract:

Tsubame2.0 has been in successful production for the last two years, producing numerous research results and accolades. With a possible upgrade of its GPUs to Kepler 2s, it will have the capability to surpass 10-petaflops-class supercomputers in single-precision applications, without any increase in its average power consumption of 1MW.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2012
Session ID:
SC2031
 
Abstract:

In the global exascale race, hardware often takes center stage. But the race might ultimately be won or lost based on how well the industry optimizes new and existing applications for extreme parallelism. Today's apps will not just run on tomorrow's systems, so we must think strategically and creatively about how to design applications that take maximum advantage of the first power-efficient, accelerator-driven exascale systems. This panel of HPC, software, and computer science experts will discuss what we can, and should, be doing, including a review of new scientific and commercial HPC requirements, programming model options, and how best to align architecture and software design processes.

 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2531
 
Abstract:

To highlight and reward the excellent research taking place at our CCOEs, we hosted an event during GTC 2012 to showcase four of their top achievements. Each of our 18 CCOEs was asked to submit an abstract describing what they considered to be their top achievement in GPU computing over the past 18 months. An NVIDIA panel selected four exemplars from these submissions to represent their work on GPU computing research. Each of our CCOEs has made amazing contributions, but the four CCOEs selected to showcase their work were:

  • Barcelona Supercomputing Center, OmpSs: Leveraging CUDA for Productive Programming in Clusters of Multi-GPU Systems
  • Harvard University, Massive Cross-Correlation in Radio Astronomy with Graphics Processing Units
  • Tokyo Tech, TSUBAME 2.0
  • University of Tennessee, MAGMA: A Breakthrough in Solvers for Eigenvalue Problems

Each of the four CCOE finalists was awarded an HP ProLiant SL250 Gen8 GPU system configured with dual NVIDIA Tesla K10 GPU accelerators in recognition of this accomplishment. After the four presentations, the CCOE representatives were asked to vote for their favorite presentation and achievement. Tokyo Tech was voted the audience favorite, and thus wins the extra bragging rights of being honored by its peers as the inaugural recipient of the CUDA Achievement Award 2012.

 
Topics:
General Interest
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S4000
 
Speakers:
Satoshi Matsuoka
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2011
Session ID:
SC137
 
Speakers:
Satoshi Matsuoka
- Global Scientific Information and Computing Center (GSIC) of Tokyo Institute of Technology (Tokyo Tech)
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2010
Session ID:
SC1023
 
Speakers:
Satoshi Matsuoka
- Tokyo Institute of Technology
Abstract:
Tsubame2.0 is the next-generation multi-petaflops supercomputer that has been designed and built at Tokyo Tech, with more than 4,000 NVIDIA Fermi GPUs, as a successor to the highly successful Tsubame1. Deep design considerations were made based on experiences with Tsubame1, retrofitted with the previous-generation Tesla, to maximize the versatility and competitiveness of the system across a considerable number of application domains, as well as to accommodate as much strong scaling as possible. This resulted in a totally new custom system design in collaboration with HP and NEC, rather than a machine with retrofitted GPUs. The resulting supercomputer will hopefully become a design template for future large-scale GPU systems.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102280
 
Speakers:
Paul Calleja, Satoshi Matsuoka, Ting-Wai Chiu
- University of Cambridge, National Taiwan University, Tokyo Institute of Technology
Abstract:
Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering. In this session we will hear from Professor Ting-Wai Chiu at National Taiwan University, Dr. Satoshi Matsuoka at Tokyo Tech, and Dr. Paul Calleja at the University of Cambridge.
 
Topics:
General Interest
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
S102265
 
Speakers:
Professor Satoshi Matsuoka
- Global Scientific Information and Computing Center (GSIC) of Tokyo Institute of Technology (Tokyo Tech)
 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2009
Session ID:
SC0908
Streaming:
Download:
Share:
 
Abstract:
GPU computing is transforming the extreme high-end realms of supercomputing. NVIDIA Tesla GPUs already power several of the world's sixty fastest supercomputers, and this trend is accelerating. This three-hour "super session" will feature some of the world's premier supercomputing experts, who will discuss their experience building and deploying GPU-based supercomputing clusters, and present case studies of designing and porting codes for "big iron" GPU supercomputers.
 
Topics:
HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2009
Session ID:
S09049
 
 