GTC ON-DEMAND

 
Abstract:

We present the Alexa shock hydrodynamics code, built using Kokkos, and its performance on hardware including Intel KNL and NVIDIA P100 (the latter being twice as fast). Alexa performs 3D simulations of multiple materials undergoing large deformation at high energies. One goal of Alexa is to let users run complex simulations on laptops.

 
Topics:
Performance Optimization
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1727
 
Abstract:
The Kokkos library enables development of HPC scientific applications that are performance portable across disparate manycore devices such as NVIDIA Kepler and Intel Xeon Phi. Portable programming models such as OpenMP, OpenACC, OpenCL, and Thrust focus on parallel execution but fail to address the memory access patterns critical for best performance. In contrast, Kokkos integrates compile-time polymorphic data layout with parallel execution policies to portably manage memory access patterns. This year we present recently added Kokkos concepts and capabilities, with an emphasis on improvements to usability, such as C++11 lambda support, now available in CUDA 7.0, for portable nested parallelism. We will also present a new application of Kokkos to Sandia's multithreaded graph library (MTGL).
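As a rough illustration of what "compile-time polymorphic data layout" can mean, the following is a minimal plain C++ sketch, not the Kokkos API. The names `RowMajor`, `ColMajor`, `Array2D`, and `fill_and_sum` are hypothetical and chosen only for this example. The kernel is written once against `a(i, j)`, and the layout policy selected at compile time decides where each element lands in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative names, not the Kokkos API).
// Each maps a 2D index (i, j) to an offset in flat storage.
struct RowMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t /*rows*/, std::size_t cols) {
        return i * cols + j;  // consecutive j are adjacent in memory
    }
};

struct ColMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t /*cols*/) {
        return i + j * rows;  // consecutive i are adjacent in memory
    }
};

// A 2D array whose memory layout is chosen at compile time via Layout.
template <class Layout>
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[Layout::index(i, j, rows_, cols_)];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Exposes the flat storage so the effect of the layout can be inspected.
    const std::vector<double>& storage() const { return data_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Layout-agnostic kernel: the same source works for any layout policy.
template <class Layout>
double fill_and_sum(Array2D<Layout>& a) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.rows(); ++i) {
        for (std::size_t j = 0; j < a.cols(); ++j) {
            a(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);
            sum += a(i, j);
        }
    }
    return sum;
}
```

Calling `fill_and_sum` on an `Array2D<RowMajor>` and on an `Array2D<ColMajor>` of the same shape gives the same sum, while the flat storage order differs. A real library would additionally pick the layout per device and run the loop in parallel; this sketch does neither.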
 
Topics:
Tools & Libraries, Developer - Algorithms, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5166
 
Abstract:
Discover how the Kokkos library enables you to develop HPC scientific applications that are performance portable across disparate manycore devices such as NVIDIA Kepler and Intel Xeon Phi. Portable programming models such as OpenMP, OpenACC, OpenCL, and Thrust focus on parallel execution but fail to address the memory access patterns critical for achieving best performance. Thus, codes must be extensively rewritten to meet device-specific memory access pattern requirements; e.g., data structures and loops are transformed from array-of-structures patterns to structure-of-arrays patterns. We address this issue by integrating compile-time polymorphic data layout with parallel execution. We will present manycore performance portability of the LAMMPS molecular dynamics code and Trilinos/Tpetra linear solvers, implemented with MPI+Kokkos and run on clusters with Intel Xeon Phi and NVIDIA Kepler devices.
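The array-of-structures to structure-of-arrays rewrite mentioned above can be sketched in plain C++ as follows. This is an illustrative example only: `ParticleAoS`, `ParticlesSoA`, `to_soa`, `sum_x_aos`, and `sum_x_soa` are hypothetical names, and the code does not use Kokkos. It shows the kind of manual transformation of both the data structure and the loop that a layout abstraction is meant to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: each particle's x, y, z are adjacent in memory.
struct ParticleAoS {
    double x, y, z;
};

// Structure-of-arrays: all x values are contiguous, then all y, then all z.
struct ParticlesSoA {
    std::vector<double> x, y, z;
};

// Manual conversion from AoS to SoA (hypothetical helper for illustration).
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    for (const ParticleAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
    }
    assert(soa.x.size() == aos.size());
    return soa;
}

// The same reduction written twice, once per layout. The AoS loop strides
// over whole structs; the SoA loop reads one contiguous array.
double sum_x_aos(const std::vector<ParticleAoS>& particles) {
    double sum = 0.0;
    for (std::size_t i = 0; i < particles.size(); ++i) {
        sum += particles[i].x;
    }
    return sum;
}

double sum_x_soa(const ParticlesSoA& particles) {
    double sum = 0.0;
    for (std::size_t i = 0; i < particles.x.size(); ++i) {
        sum += particles.x[i];
    }
    return sum;
}
```

Both reductions return the same value, but the loop body had to be rewritten to match the storage. Which layout performs better depends on the device and on how threads traverse the data.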
 
Topics:
HPC and Supercomputing, Numerical Algorithms & Libraries, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4213
 
Abstract:

Performance on manycore devices depends on data access patterns, and different devices (NVIDIA, Intel-Phi, NUMA) require different data access patterns. A performance-portable programming model does not force a false choice between arrays-of-structures and structures-of-arrays; instead, it defines abstractions to transparently adapt data structures to meet device requirements. The KokkosArray library implements this strategy through simple and intuitive multidimensional array abstractions. Usability and performance portability are demonstrated with proxy applications for finite element and molecular dynamics codes. MiniMD, a proxy application for the LAMMPS molecular dynamics code, has implementations in OpenMP, OpenCL, CUDA, and now KokkosArray. A comparison of miniMD's KokkosArray implementation with the previous three versions demonstrates the relative strengths and weaknesses of KokkosArray, and shows that the portable version retains about 95% of the performance of the "native" versions. Multiphysics applications with heterogeneous finite element discretizations have complex and highly irregular data structures. A KokkosArray-based prototype unstructured heterogeneous finite element mesh library and its support for heterogeneous manycore parallel computations will be presented.

 
Topics:
Tools & Libraries, Programming Languages, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3426