GTC ON-DEMAND
Abstract:
On the road to exascale supercomputers, new challenges need to be solved. This requires new system architectures, moving from homogeneous, CPU-centered systems to heterogeneous ones. New processing engines such as GPUs enable greater processing power, and at the same time the network becomes a more important part of the system. With better co-design, the network is required to perform smarter and more complex operations beyond traditional data movement. In this talk we will present how BlueField smart networking devices can change the boundaries between CPU and network, and between software and hardware, to enable greater scalability and performance for your supercomputer.
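To make the offload idea concrete, here is a minimal sketch (our own illustration, not material from the talk) of the communication pattern that smart networking devices accelerate: a non-blocking MPI allreduce that an offload-capable adapter can progress while the host keeps computing. It assumes mpi4py and NumPy are available and an MPI build with non-blocking collectives.

    # Sketch (assumption, not from the talk): overlapping host computation with
    # a collective that offload engines can progress without the CPU.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    grads = np.random.rand(1 << 20)       # local gradients (hypothetical payload)
    summed = np.empty_like(grads)

    # Start a non-blocking allreduce; an offload-capable HCA can drive the
    # transfer/reduction while the host CPU does other useful work.
    req = comm.Iallreduce(grads, summed, op=MPI.SUM)

    next_batch = np.random.rand(1 << 20)  # stand-in for useful host-side compute
    req.Wait()                            # collective result is now valid
    summed /= comm.Get_size()             # average the gradients across ranks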
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1904
 
Abstract:
AI techniques have been around for more than five decades, but only in the 2000s did neural networks see commercial use, with machine learning techniques starting to surpass traditional methods in image recognition, natural language processing, and other tasks. Probably the most important piece enabling AI was the use of GPUs to train models, providing great speedups compared to CPUs. Running distributed machine learning on a large number of GPUs requires moving large amounts of data between the GPUs, or between GPUs and the parameter server, imposing a heavy load on the interconnect, which now becomes the new bottleneck. Creating an efficient system for distributed machine learning requires not only the best processing engines and the latest GPU model, but also an efficient, high-performance interconnect technology that enables efficient utilization of the GPUs and near-linear scaling. Mellanox focuses on CPU offload technologies designed to process data as it moves through the network, either by the Host Channel Adapter or by the switch. This frees up CPU and GPU cycles for computation, reduces the amount of data transferred over the network, allows efficient pipelining of network and computation, and provides very low communication latencies and overheads. We will present the special requirements that distributed machine learning applications impose on the interconnect, and describe the latest interconnect technologies that allow efficient data transfer and processing.
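As a hedged illustration of the network/compute pipelining the abstract describes (our sketch, not the session's code), the snippet below overlaps an asynchronous gradient allreduce with further GPU work using torch.distributed; the name `bucket` is hypothetical, and the script assumes a torchrun-style launch so the process-group environment is already set.

    # Sketch (assumption): overlapping gradient reduction with ongoing
    # computation, the software-side counterpart of network/compute pipelining.
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")  # NCCL uses RDMA/GPUDirect when available
    device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())

    bucket = torch.randn(1 << 20, device=device)  # hypothetical gradient bucket

    # Launch the reduction without blocking the stream doing backprop.
    work = dist.all_reduce(bucket, op=dist.ReduceOp.SUM, async_op=True)

    other = torch.randn(1 << 20, device=device)   # stand-in for more backprop work
    partial = other.square().sum()

    work.wait()                                   # reduced gradients now usable
    bucket /= dist.get_world_size()               # average across workers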
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1929
 
Abstract:
We'll demonstrate how to build a scalable, high-performance, data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, low-latency network interconnects for both InfiniBand and Ethernet. We'll present state-of-the-art techniques for distributed machine learning, and explain what special requirements they impose on the system. There will be an overview of interconnect technologies used to scale and accelerate distributed machine learning. These will include RDMA, NVIDIA's GPUDirect technology, and the in-network computing platform used to accelerate large-scale deployments in HPC and artificial intelligence.
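The abstract names RDMA and GPUDirect; as a small, hedged example of how these surface in practice (ours, assuming a cluster launched with torchrun and an NCCL build with InfiniBand support), the snippet below enables NCCL's transport logging so you can verify that traffic takes the IB/RDMA path rather than falling back to TCP sockets.

    # Sketch (assumption): verifying the interconnect path NCCL selected.
    # NCCL_DEBUG=INFO makes NCCL log whether it chose IB/RDMA (and GPUDirect
    # RDMA) or fell back to sockets.
    import os
    os.environ.setdefault("NCCL_DEBUG", "INFO")   # print transport selection
    # os.environ["NCCL_IB_HCA"] = "mlx5_0"        # hypothetical: pin a specific HCA

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")       # expects torchrun-style env vars
    t = torch.ones(1, device=f"cuda:{dist.get_rank() % torch.cuda.device_count()}")
    dist.all_reduce(t)                            # check the log for an IB transport line
    print(f"rank {dist.get_rank()}: sum = {t.item()}")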
 
Topics:
AI & Deep Learning Research, HPC and AI
Type:
Talk
Event:
GTC Washington D.C.
Year:
2019
Session ID:
DC91167
 
Abstract:
Learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and examine what special requirements these techniques impose on the system. We'll also give an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing that accelerates large-scale deployments in HPC and artificial intelligence.
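As a hedged, minimal sketch of the "build a cluster" theme (our illustration, not the session's material): a toy DistributedDataParallel training step, assuming each node runs the script under torchrun so the process-group environment is already populated. DDP is a natural fit here because it overlaps its gradient allreduce with the backward pass automatically.

    # Sketch (assumption): a minimal multi-GPU training step with PyTorch DDP.
    # Launch per node with e.g.:  torchrun --nproc_per_node=8 train.py
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(32, 1024, device="cuda")  # stand-in batch
    loss = model(x).square().mean()
    loss.backward()        # DDP overlaps the gradient allreduce with backprop here
    opt.step()
    dist.destroy_process_group()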
 
Topics:
Data Center & Cloud Infrastructure, Deep Learning & AI Frameworks, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9268
 
Abstract:
Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and what special requirements they impose on the system, followed by an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing used to accelerate large-scale deployments in HPC and artificial intelligence.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Special Event
Event:
GTC Israel
Year:
2018
Session ID:
SIL8145
 
Abstract:
Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We'll present state-of-the-art techniques for distributed machine learning and discuss what special requirements they impose on the system, followed by an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA and NVIDIA's GPUDirect technology, with a special focus on the in-network computing SHARP technology used to accelerate large-scale deployments in artificial intelligence and high-performance computing.
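Since this session highlights SHARP, here is a hedged configuration sketch (ours, under the assumption that the Mellanox SHARP plugin for NCCL, nccl-rdma-sharp-plugins, is installed and the fabric is SHARP-enabled): NCCL exposes in-network reductions through its CollNet path, which the plugin implements. Without the plugin and a capable fabric, NCCL silently falls back to its ring/tree algorithms.

    # Sketch (assumption): asking NCCL to use in-network (SHARP/CollNet) reductions.
    import os
    os.environ.setdefault("NCCL_COLLNET_ENABLE", "1")  # enable the CollNet path
    os.environ.setdefault("NCCL_DEBUG", "INFO")        # confirm CollNet use in the log

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")            # expects torchrun-style env vars
    g = torch.randn(1 << 22,
                    device=f"cuda:{dist.get_rank() % torch.cuda.device_count()}")
    dist.all_reduce(g)  # with SHARP active, the switch aggregates data in-network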
 
Topics:
AI Application, Deployment & Inference, Advanced AI Learning Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8635
 
Abstract:

Come join us and learn how to build a data-centric GPU cluster for artificial intelligence. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. We will briefly present state-of-the-art techniques for distributed machine learning and what special requirements they impose on the system, followed by an overview of interconnect technologies used to scale and accelerate distributed machine learning, including RDMA, NVIDIA's GPUDirect technology, and in-network computing used to accelerate large-scale deployments in HPC and artificial intelligence.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Israel
Year:
2017
Session ID:
SIL7120
 
Abstract:

Come join us and learn how to build data-centric GPU clusters for artificial intelligence. We will briefly present the state-of-the-art techniques for distributed machine learning and the special requirements they impose on the GPU cluster. Additionally, we will present an overview of interconnect technologies used to scale and accelerate distributed machine learning. During the session we will cover RDMA, NVIDIA's GPUDirect RDMA and GPUDirect Async, as well as in-network computing, and how the use of these technologies enables new levels of scalability and performance in large-scale deployments in artificial intelligence and high-performance computing.
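To ground the GPUDirect RDMA mention, a hedged sketch of our own: with a CUDA-aware MPI build, mpi4py can reduce GPU-resident CuPy buffers directly, letting the network adapter read and write GPU memory over GPUDirect RDMA without a host staging copy. The availability of CuPy and a CUDA-aware MPI are assumptions here.

    # Sketch (assumption): allreduce on GPU-resident buffers with CUDA-aware MPI.
    # With GPUDirect RDMA, the NIC moves data to/from GPU memory directly.
    import cupy as cp
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    cp.cuda.Device(comm.Get_rank() % cp.cuda.runtime.getDeviceCount()).use()

    grads = cp.random.rand(1 << 20, dtype=cp.float32)  # gradients on the GPU
    summed = cp.empty_like(grads)

    # mpi4py hands the device pointer straight to MPI (CUDA-aware build required).
    comm.Allreduce(grads, summed, op=MPI.SUM)
    summed /= comm.Get_size()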
 
Topics:
Data Center & Cloud Infrastructure, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23200
 
 