GTC ON-DEMAND
Abstract:

For job allocation decisions, current batch schedulers have access to, and use, only the number of nodes and the runtime, because this information is readily available at submission time from user job scripts. User-provided runtimes are typically inaccurate because users overestimate them or lack understanding of their jobs' resource requirements. Beyond node counts and runtime, other system resources, including I/O and the network, play a key role in system performance but are not available to the scheduler. In this talk, we tackle the need for automatic, general, and scalable tools that provide accurate resource usage information to schedulers with our tool for Predicting Runtime and IO using Neural Networks and GPUs (PRIONN). PRIONN automates the prediction of per-job runtime and I/O resource usage, enabling I/O-aware scheduling on HPC systems. The novelty of our tool is the input of whole job scripts into deep learning models, which allows complete automation of runtime and I/O resource predictions.
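PRIONN's actual deep-learning architecture is not detailed in this abstract; as a minimal sketch of the core idea, treating the raw job script text itself as model input, the toy code below encodes a script as a character-frequency vector and predicts runtime from the most similar historical job. The encoder, the nearest-neighbor stand-in for the neural model, and all script contents are hypothetical illustrations, not PRIONN's implementation.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789#=-/. \n"

def script_to_vector(script):
    """Encode a raw job script as a normalized character-frequency vector.
    This stands in for the image-like whole-script encoding a deep model
    would consume."""
    counts = Counter(script.lower())
    total = max(len(script), 1)
    return [counts[c] / total for c in ALPHABET]

def predict_runtime(new_script, history):
    """Nearest-neighbor stand-in for the neural model: return the runtime
    of the most similar historical script (squared-distance metric)."""
    v = script_to_vector(new_script)
    def dist(pair):
        w = script_to_vector(pair[0])
        return sum((a - b) ** 2 for a, b in zip(v, w))
    return min(history, key=dist)[1]

# Hypothetical history of (job script, observed runtime in seconds) pairs.
history = [
    ("#SBATCH --nodes=4\nsrun ./cfd_solver input.dat", 3600.0),
    ("#SBATCH --nodes=1\npython postprocess.py", 300.0),
]
print(predict_runtime("#SBATCH --nodes=4\nsrun ./cfd_solver big.dat", history))
```

The point of the sketch is the interface: prediction requires nothing beyond what the user already submits, which is what makes the approach fully automatic.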

 
Topics:
HPC and AI
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1810
 
Abstract:
We'll present graphs as powerful tools for analyzing complex relationships between entities. We'll share how many structures commonly found in computer science, like social networks, computer networks, and the World Wide Web, can be modeled as graphs. Since many real graphs are very large and complex, the associated analysis algorithms must be very efficient and highly parallel. We present two implementations of a key graph-based analysis, triangle enumeration, for two different parallel paradigms: GPU programming and Apache Spark. We'll reveal how the performance of the two implementations compares as the characteristics of the graph change.
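The talk's GPU and Spark implementations are not reproduced here; as a sketch of the underlying algorithm, the code below enumerates triangles via set intersection of edge endpoints' neighbor lists, with a vertex ordering that avoids duplicate reports — the same per-edge decomposition that both parallel paradigms distribute across workers.

```python
def enumerate_triangles(edges):
    """List every triangle (u, v, w), u < v < w, in an undirected graph.

    For each edge (u, v), every common neighbor w of u and v closes a
    triangle; requiring w > v reports each triangle exactly once. The
    per-edge intersections are independent, hence easy to parallelize.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = []
    for u, v in {tuple(sorted(e)) for e in edges}:
        for w in adj[u] & adj[v]:
            if w > v:
                triangles.append((u, v, w))
    return triangles

print(enumerate_triangles([(0, 1), (1, 2), (0, 2), (2, 3)]))  # [(0, 1, 2)]
```

On a GPU, each edge's intersection maps naturally to a thread or warp; in Spark, the same decomposition becomes a join over edge and adjacency datasets.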
 
Topics:
Algorithms & Numerical Techniques, Tools & Libraries, Big Data Analytics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2016
Session ID:
S6424
 
Abstract:
Learn how to mitigate rounding errors that can hamper result reproducibility when concurrent executions burst and workflow determinism vanishes. This talk unveils the power of mathematical methods to model rounding errors in scientific applications and illustrates how these methods can mitigate error drift on new-generation, many-core GPUs. We will discuss performance and accuracy issues for a diverse set of scientific applications that rely on floating-point arithmetic. In particular, our experimental study will cover the following exploration space: floating-point format and precision (e.g., single, double, and composite precision), the numerical range used by the computation, degree of multithreading, thread scheduling scheme, and algorithmic variant.
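The precision dimension of that exploration space is easy to demonstrate on the CPU. The sketch below (not from the talk) shows that a naive left-to-right double-precision sum of ten copies of 0.1 already misses 1.0, that compensated summation (`math.fsum`) recovers the exact result, and that the same drift is far larger in single precision, simulated here with `ctypes.c_float` rounding.

```python
import ctypes
import math

def to_f32(x):
    """Round a Python float to IEEE 754 single precision."""
    return ctypes.c_float(x).value

values = [0.1] * 10

naive = sum(values)        # left-to-right double-precision accumulation
exact = math.fsum(values)  # error-free compensated summation

print(naive == 1.0)  # False: rounding error accumulates
print(exact == 1.0)  # True

# The same accumulation drifts much further in single precision:
s32 = 0.0
for v in values:
    s32 = to_f32(s32 + to_f32(v))
print(abs(s32 - 1.0) > abs(naive - 1.0))  # True
```

On a GPU the situation is worse: thread scheduling changes the summation *order* between runs, so even the double-precision result can vary, which is exactly the reproducibility problem the talk's mathematical error models address.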
 
Topics:
Developer - Algorithms, Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2015
Session ID:
S5245
 
Abstract:
Discover and quantify the performance gains of dynamic parallelism for clustering algorithms on GPUs. Dynamic parallelism effectively eliminates superfluous back-and-forth communication between the GPU and CPU through nested kernel computations. The change in performance is measured using two well-known clustering algorithms that exhibit data dependencies: K-means clustering and hierarchical clustering. K-means has a sequential data dependence wherein iterations occur in a linear fashion, while hierarchical clustering has a tree-like dependence that produces split tasks. Analyzing the performance of these data-dependent algorithms gives us a better understanding of the benefits and potential drawbacks of CUDA 5's new dynamic parallelism feature.
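To make the K-means dependence structure concrete, here is a plain CPU sketch of Lloyd's algorithm (not the talk's CUDA code): each iteration depends on the centers produced by the previous one — the sequential dependence the abstract mentions — while the assignment step inside an iteration is embarrassingly parallel, which is the part a parent kernel would launch as nested child kernels under dynamic parallelism instead of returning to the CPU.

```python
def kmeans(points, centers, iters=10):
    """1-D Lloyd's K-means. Iterations are strictly sequential;
    the per-point assignment inside each iteration is parallelizable."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:  # parallel-friendly: points are independent
            i = min(range(len(centers)), key=lambda c: (p - centers[c]) ** 2)
            clusters[i].append(p)
        # Update step: new centers feed the *next* iteration (the dependence)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(kmeans([1.0, 1.2, 9.0, 9.2], [0.0, 5.0]))  # ≈ [1.1, 9.1]
```

Without dynamic parallelism, each iteration boundary forces a kernel-launch round trip through the CPU; with it, the update kernel can launch the next assignment kernel directly on the device.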
 
Topics:
Numerical Algorithms & Libraries, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2014
Session ID:
S4318
 
Abstract:

With the plethora of future applications of carbon nanotube materials rapidly being realized and exploited, we are pursuing fundamental studies of the structural, dynamic, and energetic properties of model single-walled carbon nanotubes in pure water and in aqueous solutions of simple inorganic salts, sodium chloride (NaCl) and sodium iodide (NaI). Our transformative research is supported and made possible by a hybrid combination of resources at Oak Ridge National Laboratory: the GPU cluster Keeneland for FEN ZI GPU molecular dynamics simulations of mean-force calculations and the data-intensive cluster Nautilus for the analysis of the GPU-computed potentials of mean force. In this talk, we dive deep into the key aspects of CNT simulations on hybrid resources. Come learn some of the underlying challenges and the latest solutions devised to tackle both the algorithmic and scientific challenges of CNT simulations and their heterogeneous workflows with GPUs.

 
Topics:
Quantum Chemistry, Developer - Algorithms
Type:
Talk
Event:
GTC Silicon Valley
Year:
2013
Session ID:
S3199
 
Abstract:

GPU-enabled simulation of fully atomistic macromolecules is rapidly gaining momentum, enabled by massive parallelism and by the parallelizability of various components of the underlying algorithms and methodologies. The massive parallelism, on the order of several hundred to a few thousand cores, presents opportunities as well as implementation challenges. In this talk, we dive deep into the key aspects of simulation methodologies for macromolecular systems specifically adapted to GPUs. Learn some of the underlying challenges and the latest solutions devised to tackle them in the FEN ZI code for fully atomistic macromolecular simulations.

 
Topics:
Molecular Dynamics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2012
Session ID:
S2207
 
Speakers:
Michela Taufer, Narayan Ganesan, Sandeep Patel
- University of Delaware
Abstract:
Learn how to study membrane-bound protein receptors by moving beyond current state-of-the-art simulations that consider only small patches of physiological membranes. Toward this end, this session presents how to apply large-scale GPU-enabled computations of extended phospholipid bilayer membranes using a GPU code based on the CHARMM force field for MD simulations. Our code enables fast simulations of large membrane regions in the NVT and NVE ensembles and includes different methods for representing the electrostatic interactions, i.e., reaction-field and Ewald summation (PME) methods. Performance and scientific results for dimyristoylphosphatidylcholine (DMPC)-based lipid bilayers are presented.
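The CHARMM force-field and PME details are beyond a snippet, but the NVE time stepping such a code performs can be sketched. The velocity Verlet integrator below (an illustrative stand-in, not the session's GPU code) advances a 1-D harmonic oscillator, a toy model of a single bond, and shows the property that defines the NVE ensemble: total energy stays essentially constant.

```python
def velocity_verlet(x, v, force, dt, steps, m=1.0):
    """NVE integration via velocity Verlet, the standard MD time stepper:
    one force evaluation per step, positions then velocities updated."""
    f = force(x)
    for _ in range(steps):
        x += v * dt + 0.5 * (f / m) * dt * dt   # position update
        f_new = force(x)                         # force at new position
        v += 0.5 * (f + f_new) / m * dt          # velocity update
        f = f_new
    return x, v

k = 1.0                       # toy harmonic "bond" constant
force = lambda x: -k * x
x, v = velocity_verlet(1.0, 0.0, force, dt=0.01, steps=1000)

e0 = 0.5 * k * 1.0 ** 2                 # initial total energy
e = 0.5 * v * v + 0.5 * k * x * x       # energy after 1000 steps
print(abs(e - e0) < 1e-4)  # True: energy is (nearly) conserved in NVE
```

In a production membrane simulation the force routine is where GPUs earn their keep: bonded terms, Lennard-Jones, and the PME or reaction-field electrostatics all parallelize over atoms.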
 
Topics:
Molecular Dynamics, HPC and AI, Physics Simulation
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
2035
 
Speakers:
Michela Taufer, Narayan Ganesan
- University of Delaware
Abstract:
Important applications in signal processing, data processing, and bioinformatics that use dynamic programming are difficult to parallelize due to intrinsic data dependencies. We demonstrate a novel technique to extract parallelism from data-dependent algorithms and reformulate them for GPUs. This simple technique breaks the dependencies and resolves them at an optimal point later in time, obtaining remarkable speedups on GPUs. We present a case study from computational biology: protein motif finding. We also show how the same technique can be extended and applied to other relevant problems such as gene prediction and phylogenetics.
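The talk's specific dependency-breaking technique is not spelled out in the abstract; a classic illustration of extracting parallelism from a dependence-laden dynamic program is the wavefront sweep below (my sketch, using edit distance rather than motif finding). Every cell on one anti-diagonal depends only on earlier diagonals, so all cells of a diagonal can be computed simultaneously, e.g., one GPU thread per cell.

```python
def edit_distance_wavefront(a, b):
    """Levenshtein distance filled along anti-diagonals i + j = d.
    Cell (i, j) reads (i-1, j), (i, j-1) on diagonal d-1 and (i-1, j-1)
    on d-2, so every cell within a diagonal is independent."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):                         # sequential sweep
        for i in range(max(0, d - m), min(n, d) + 1):  # parallelizable cells
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                              D[i - 1][j - 1] + cost)
    return D[n][m]

print(edit_distance_wavefront("kitten", "sitting"))  # 3
```

The sequential outer loop over diagonals is the residual dependence; the inner loop is the parallelism the reformulation exposes.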
 
Topics:
Life & Material Science, Developer - Algorithms, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2010
Session ID:
2034
 
Abstract:

GPU-enabled simulation of fully atomistic macromolecules is rapidly gaining momentum, enabled by massive parallelism and by the parallelizability of various components of the underlying algorithms and methodologies. The massive parallelism, on the order of several hundred to a few thousand cores, presents opportunities as well as implementation challenges. In this webinar, Michela Taufer, Assistant Professor, Department of Computer and Information Sciences, University of Delaware, discusses key aspects of simulation methodologies for macromolecular systems specifically adapted to GPUs. She will also visit some of the underlying challenges and the solutions devised to tackle them.

 
Topics:
Molecular Dynamics
Type:
Webinar
Event:
GTC Webinars
Year:
2011
Session ID:
GTCE007
 