GTC ON-DEMAND

AI & Deep Learning Research
Abstract:
Learn about the latest research on improvements to text-to-speech models and workflows using Tacotron2 and WaveGlow, produced by NVIDIA's applied deep learning research team. In partnership with our deep learning algorithm development team, learn more about how Tensor Cores have made fast mixed-precision training and faster-than-real-time inference performance available. We'll also show a demo and review accuracy and performance metrics through the open-source implementation available on GitHub.
 
Topics:
AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91022
 
Abstract:
In order to obtain peak performance and energy efficiency on modern deep learning architectures, such as GPUs and TPUs, it is critical to use half-precision arithmetic. Compared to single precision, half precision reduces memory traffic, allowing 2x better use of the available DRAM bandwidth. Smaller memory footprints for half-precision layer activations also allow larger batch sizes and deeper network architectures to fit in the accelerator's memory during training. Finally, architectural features, such as Volta's Tensor Cores, boost the raw math throughput of half-precision operations by up to 8x compared to single precision. We describe two new streamlined implementations of mixed-precision training being built into TensorFlow. The first is provided through extensions to the tf.keras API and will be available in the upcoming months. The second is based on a Grappler graph optimization pass and will work with TF 1.x graph-based models as well as future TensorFlow 2.0 models that make use of tf.function decorators. Each method is enabled with a one- or two-line tweak to the training script. Empirical results show that result accuracy matches that of a model trained in single precision, while training speedup is similar to what can be achieved with hand-coded mixed-precision strategies.
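For reference, the Grappler-based rewrite described above was exposed roughly as follows in the TF 1.14/2.0 era (a minimal sketch; exact symbol names and availability vary across TensorFlow releases):

    import tensorflow as tf

    # Wrap an existing optimizer so Grappler inserts fp16 casts and dynamic
    # loss scaling automatically (a one-line change to the training script).
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3)
    optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(
        optimizer, loss_scale="dynamic")

    # The tf.keras path is similarly small (experimental API of that era):
    # policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
    # tf.keras.mixed_precision.experimental.set_policy(policy)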
 
Topics:
AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91029
 
Abstract:
We'll present our study on GPU optimization for deep learning with limited computational resources and share our tips and tricks for building a state-of-the-art Visual Question Answering (VQA) system. Learn about technical implementations of deep learning algorithms with GPU hardware utilization, including delayed updates and mixed-precision training, to deal with limited hardware resources while reducing training time and memory usage. We'll describe our experience designing a winning architecture for the VQA Challenge 2018 by applying deep learning tactics such as multi-level multi-modal fusion, parameter-interaction learning, and end-to-end optimization. Our techniques are all compute-heavy tasks, so GPU programming plays an important role in advancing our work. We'll also provide convincing empirical proofs and a practical demonstration of a VQA application.
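As a rough illustration of the "delayed updates" idea mentioned above (accumulating gradients over several mini-batches to emulate a larger batch on limited hardware), a minimal PyTorch-style sketch could look like this; the tiny model and synthetic data are hypothetical stand-ins:

    import torch
    from torch import nn

    # Hypothetical toy setup just to make the sketch runnable.
    model = nn.Linear(16, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    data = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(8)]

    accum_steps = 4                      # "delayed update": step every 4 batches
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(data):
        loss = criterion(model(inputs), targets)
        (loss / accum_steps).backward()  # accumulate averaged gradients
        if (step + 1) % accum_steps == 0:
            optimizer.step()             # one parameter update per accumulation window
            optimizer.zero_grad()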
 
Topics:
AI & Deep Learning Research, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9824
 
Abstract:
We will cover the techniques for training DNNs with Tensor Cores described in "S8923 - Training Neural Networks with Mixed Precision: Theory and Practice". These methods were introduced for AI processing with the Volta GPU architecture. Tensor Cores provide up to 120 TFlops throughput, mixing operations on IEEE half- and single-precision floats. Techniques used will include loss scaling, a master weights copy, and choosing the proper precision for a given operation. For each of TensorFlow and PyTorch we will describe an fp32 network definition and then demonstrate the same network using mixed-precision techniques.
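As a minimal illustration of the loss-scaling technique named above, the manual pattern in PyTorch looks roughly like this (a sketch with a hypothetical toy model; it assumes a CUDA device, since fp16 layers generally are not supported on CPU):

    import torch
    from torch import nn

    model = nn.Linear(32, 4).cuda().half()             # fp16 working copy of the model
    inputs = torch.randn(8, 32, device="cuda").half()
    targets = torch.randint(0, 4, (8,), device="cuda")

    loss_scale = 1024.0
    loss = nn.functional.cross_entropy(model(inputs), targets)
    (loss * loss_scale).backward()                     # scale up so small gradients survive fp16
    for p in model.parameters():
        p.grad.div_(loss_scale)                        # unscale before the optimizer step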
 
Topics:
AI & Deep Learning Research, Algorithms & Numerical Techniques
Type:
Tutorial
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81012
 
Abstract:
We'll discuss training techniques and deep learning architectures for high-precision landmark localization. In the first part of the session, we'll talk about ReCombinator Networks, which aims at maintaining pixel-level image information for high-accuracy landmark localization. This model combines coarse-to-fine features to first observe global (coarse) image information and then recombines local (fine) information. By using this model, we report state-of-the-art results on three facial landmark datasets. This model can be used for other tasks that require pixel-level accuracy (for example, image segmentation and image-to-image translation). In the second part, we'll talk about improving landmark localization in a semi-supervised setting, where less labeled data is provided. Specifically, we consider a scenario where few labeled landmarks are given during training, but lots of weaker labels (for example, face emotions or hand gestures) that are easier to obtain are provided. We'll describe training techniques and model architectures that can leverage weaker labels to improve landmark localization.
 
Topics:
AI & Deep Learning Research, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8406
 
Abstract:
Driver monitoring systems are used to detect many driver attributes like gaze, head pose, eye openness, and other features pertaining to attention and assistance. We'll present a synthetic method of generating data for training DNNs, which caters to the above-mentioned features of the subject. We use Blender for generating synthetic images, powered by NVIDIA GPUs, which can be scaled to match training needs. Synthetic data generation allows precise control over data points that are difficult to control in a real environment, like pupil dilation. This approach avoids noisy measurements and results in high accuracy without the need for a high-precision 3D sensor.
 
Topics:
AI & Deep Learning Research, Autonomous Vehicles, Advanced AI Learning Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8324
 
Abstract:
Many scientific and engineering fields increasingly rely on complex and time-consuming computational simulation as part of the modern scientific workflow. In many applications, such as high energy particle physics, cosmology, geophysics, and others, simulations are the computational bottleneck for producing and testing results. We introduce the usage of generative adversarial networks (GANs) as a potential tool for speeding up expensive theoretical models and simulations in scientific and engineering applications, ushering in a new era of deep learning-powered scientific discovery. We will show that using a GAN-based high energy physics fast simulator on GPUs can provide speedups of up to 100,000x compared to traditional simulation software, while retaining high levels of precision. Finally, we will discuss modeling and architectural considerations in this domain with the hope of directly empowering scientists and engineers in other fields to experiment with generative adversarial networks to speed up simulation across scientific domains.
 
Topics:
AI & Deep Learning Research, Advanced AI Learning Techniques, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81001
 
Abstract:
We'll cover the theory and practice for training DNNs with Tensor Cores, introduced for AI processing with the Volta GPU architecture. Tensor Cores provide up to 120 TFlops throughput, mixing operations on IEEE half- and single-precision floats. In the theory portion of the talk, we'll review the half-precision format, the values that arise in DNN computations, and techniques that maximize utilization of the fp16 format by these values. Techniques include loss scaling, master weights, and choosing the proper precision for a given operation. In the practice portion of this talk, we'll survey various models that have been trained in mixed precision, matching the accuracy of fp32 training sessions while using the same hyperparameters. Models include various architectures (feed-forward, recurrent, generative) as well as diverse tasks (image, speech, and language processing). We'll also provide network design and training guidelines to maximize speed when using Tensor Cores.
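A minimal sketch of the "master weights" technique named above: keep an fp32 copy of the parameters that receives the updates, while an fp16 working copy is used for the forward and backward passes. The toy model is hypothetical and the snippet assumes a CUDA device:

    import torch
    from torch import nn

    model = nn.Linear(32, 4).cuda().half()                               # fp16 working copy
    master = [p.detach().clone().float() for p in model.parameters()]    # fp32 master weights

    x = torch.randn(8, 32, device="cuda").half()
    y = torch.randint(0, 4, (8,), device="cuda")
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()

    lr = 0.01
    with torch.no_grad():
        for p, m in zip(model.parameters(), master):
            m -= lr * p.grad.float()       # the update itself happens in fp32
            p.copy_(m.half())              # refresh the fp16 working copy
            p.grad = None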
 
Topics:
AI & Deep Learning Research, Algorithms & Numerical Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8923
 
Abstract:
The Department of Energy (DOE) entered into a partnership with the National Cancer Institute (NCI) of the National Institutes of Health (NIH) to accelerate cancer research. This "Cancer Moonshot" aims to tackle three main objectives: better understand the mechanisms of cancer, use large amounts of diverse medical data for predictive models, and enable precision medicine by providing guidance for treatment to individual patients. Leveraging the compute expertise of DOE in high performance computing (HPC) and new methods for deep learning in artificial intelligence, this HPC+AI approach aims to create a single scalable deep neural network code called CANDLE (CANcer Distributed Learning Environment) that will be used to address all three challenges. This talk gives an overview of the project and highlights how GPU-accelerated systems in the DOE ecosystem, Summit and Sierra, have contributed to the project.
 
Topics:
AI & Deep Learning Research, HPC and AI, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81033
 
Abstract:
In this session, you will learn about the latest IBM PowerAI solution and IBM Cloud GPU offerings, and see a price-performance comparison, with supporting data, on the number of CPUs required to optimize GPU performance. We've also aggregated extensive test data to determine general best practices, such as the advantages of half-precision deep learning on the Tesla V100 and the implications of neural-network model variable distribution and gradient aggregation techniques for your performance results. Join us to see why NVIDIA GPUs on IBM Cloud offer superior results.
 
Topics:
AI & Deep Learning Research, Accelerated Data Science
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81013
AI Application, Deployment & Inference
Abstract:
We'll present a fast, highly accurate, and customizable object-detection network optimized for training and inference on GPUs. After describing the network architecture, we'll dive into how different stages of the training workflow are accelerated. Our techniques include data ingestion and augmentation, mixed precision, and multi-GPU training. We'll demonstrate how we optimized our network for deployment without loss of accuracy using ONNX and NVIDIA TensorRT. We'll also show how to create TensorRT plugins for post-processing to perform inference entirely on the GPU. This session will be a combination of lecture and demos.
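As a rough sketch of the ONNX-to-TensorRT deployment path mentioned above (the session used its own detection network; here a torchvision classifier stands in, and the TensorRT calls follow the common Python builder/parser pattern of the TensorRT 7 era, so details may differ by version):

    import torch
    import torchvision
    import tensorrt as trt

    # 1) Export a stand-in PyTorch model to ONNX.
    model = torchvision.models.resnet18(pretrained=True).eval()
    dummy = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy, "model.onnx", opset_version=11)

    # 2) Parse the ONNX file and build a TensorRT engine with FP16 enabled.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:
        parser.parse(f.read())
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # mixed-precision inference
    engine = builder.build_engine(network, config)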
 
Topics:
AI Application, Deployment & Inference, Deep Learning & AI Frameworks, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9243
 
Abstract:
We'll discuss our work at Esri to reconstruct 3D building models from aerial LiDAR data with the help of deep neural networks. The value of accurate 3D building models for cities is hard to overestimate, but collecting and maintaining this data is labor-intensive, error-prone, and expensive. We teamed up with Miami-Dade County and NVIDIA to see if we could streamline this data-acquisition workflow, or at least make it more cost-effective. We used a Mask R-CNN model trained to detect and report instances of roof segments of various types. Our talk will cover data preparation, Mask R-CNN training, and the precision achieved. We'll also outline the inference architecture, the integration of TensorFlow and ArcGIS Pro 2.3, and the steps we used to reconstruct 3D building models from the predictions.
 
Topics:
AI Application, Deployment & Inference, Seismic & Geosciences, Product & Building Design
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9255
 
Abstract:
We'll talk about how we built a GPU-accelerated system for real-time information retrieval from large datasets in life sciences. Unstructured textual data is full of phrases and words that have multiple meanings, making it difficult for current information-retrieval algorithms to find relevant documents. We'll describe our knowledge graph-based filtering mechanism for more precise real-time information retrieval. We'll outline how we accelerated the embedding generation process, treating it as an optimization problem and running it on NVIDIA Tesla V100 GPUs. We'll also cover how we reduced the latency in distance computation using TensorRT.
 
Topics:
AI Application, Deployment & Inference, Speech & Language Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9359
 
Abstract:
Although neural network training is typically done in either 32- or 16-bit floating point formats, inference can be run at even lower precisions that reduce memory footprint and elapsed time. We'll describe quantizing neural network models for various image tasks (classification, detection, segmentation) and natural language processing tasks. In addition to convolutional feed-forward networks, we will cover quantization of recurrent models. The discussion will examine both floating point and integer quantizations, targeting features in Volta and Turing GPUs.
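To make the integer case concrete, here is a small NumPy illustration of symmetric INT8 quantization of a weight tensor (a generic sketch of the arithmetic, not the specific calibration scheme covered in the talk):

    import numpy as np

    w = np.random.randn(4, 4).astype(np.float32)      # fp32 weights
    scale = np.abs(w).max() / 127.0                   # symmetric scale from the max magnitude
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    w_dequant = w_int8.astype(np.float32) * scale     # what inference effectively computes with
    print("max abs quantization error:", np.abs(w - w_dequant).max())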
 
Topics:
AI Application, Deployment & Inference, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9659
 
Abstract:
The average human brain has about 100 billion nerve cells. We therefore investigate the question of whether there are algorithms for artificial neural networks that are linear in the number of neurons, while the number of connections incident to a neuron is bounded by a constant. We offer two approaches to answer this question: First, we derive an algorithm that quantizes a trained artificial neural network such that the resulting complexity is linear. Second, we demonstrate that training networks whose connections are determined by uniform sampling can achieve precision similar to fully connected layers. Due to sparsity upfront, these networks can be trained much faster. Both approaches are made plausible by relating artificial neural units to Monte Carlo integration. We'll demonstrate the results on classic test datasets.
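A minimal sketch of the "connections determined by uniform sampling" idea as stated above: fix a random binary mask with a constant number of incoming connections per unit and apply it to the weights on every forward pass (illustrative PyTorch, not the authors' code):

    import torch
    from torch import nn

    class UniformSparseLinear(nn.Module):
        """Linear layer that keeps only k uniformly sampled incoming
        connections per output unit (mask fixed at construction)."""
        def __init__(self, in_features, out_features, k=8):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            mask = torch.zeros(out_features, in_features)
            for row in mask:                          # k random incoming edges per unit
                row[torch.randperm(in_features)[:k]] = 1.0
            self.register_buffer("mask", mask)

        def forward(self, x):
            return x @ (self.weight * self.mask).t()

    layer = UniformSparseLinear(128, 16, k=8)
    out = layer(torch.randn(4, 128))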
 
Topics:
AI Application, Deployment & Inference, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8780
 
Abstract:
Attendees will learn and understand why AI techniques are so powerful, why developing and deploying optimal AI solutions is complex, why using AI techniques effectively is still difficult, and what Dell Technologies is doing to remove these difficulties and bring easier, effective AI to everyone. Dell Technologies includes seven companies with a comprehensive portfolio of technology products, services, and solutions for global industry, government, and education markets, and aims to be the leader in designing and delivering the best AI solutions for every customer, of every type and scale. From Dell Precision workstations for developers and Gateways for edge sensors, to Dell EMC GPU-optimized PowerEdge Servers and Ready Solutions for Deep Learning and hybrid cloud offerings, Dell is leveraging its leadership in technology and in enterprise relationships to design a world-class portfolio of AI solutions for diverse customer workloads, requirements, and objectives. This presentation will cover AI and deep learning in an enterprise context, including customer challenges and needs, and then discuss Dell AI solutions and strategy to empower people to use AI rapidly and effectively.
 
Topics:
AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81046
 
Abstract:
TensorFlow is an open source software library for numerical computation using data flow graphs. NVIDIA TensorRT is an inference optimizer and runtime for production deployment. TensorRT provides optimizations for deep neural networks and uses reduced precision to increase throughput and reduce latency while maintaining accuracy. Today we announced tighter integration of TensorRT in TensorFlow through new TensorFlow APIs, sub-graph optimizations, and INT8 calibration to automatically leverage Tensor Cores on Volta GPUs. TensorRT delivers 2.5x faster inference throughput compared to inference without TensorRT. In this session, NVIDIA developers will use an example-based workflow to show how to use this new capability.
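The TF 1.x-era integration mentioned above looked roughly like the following sketch (the tiny stand-in graph is hypothetical, and the contrib module name and arguments changed across releases, so treat this as illustrative rather than exact):

    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt          # TF 1.x contrib-era integration

    # Tiny stand-in graph so the sketch is self-contained.
    g = tf.Graph()
    with g.as_default():
        x = tf.placeholder(tf.float32, [None, 224, 224, 3], name="input")
        logits = tf.layers.dense(tf.layers.flatten(x), 10, name="logits")
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            frozen = tf.graph_util.convert_variables_to_constants(
                sess, g.as_graph_def(), ["logits/BiasAdd"])

    # Rewrite TensorRT-compatible subgraphs inside the frozen TensorFlow graph.
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen,
        outputs=["logits/BiasAdd"],
        max_batch_size=8,
        max_workspace_size_bytes=1 << 30,
        precision_mode="FP16")                          # "INT8" additionally needs calibration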
 
Topics:
AI Application, Deployment & Inference, Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S81009
 
Abstract:
Come learn how you can optimize the deployment of your trained neural networks using the GPU-accelerated inferencing library called TensorRT. TensorRT is a high-performance tool for low-latency, high-throughput deep neural network (DNN) inference that runs on NVIDIA GPUs. The latest release of TensorRT introduces a novel, framework-agnostic network definition format called universal framework format, allowing TensorRT to support and optimize DNN models trained in multiple deep learning frameworks like Caffe and TensorFlow. It also provides the capability to run inference at reduced precision, giving developers the ability to take advantage of new GPU hardware features like the Volta Tensor Core architecture. This session will be a combination of lecture and live demos.
 
Topics:
AI Application, Deployment & Inference, Tools & Libraries, Performance Optimization, Data Center & Cloud Infrastructure
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8496
 
Abstract:
NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. This lab provides hands-on experience using TensorRT to convert a neural network model to INT8 precision, then calibrate, validate, and deploy it for inference in a self-driving car application.
 
Topics:
AI Application, Deployment & Inference
Type:
Instructor-Led Lab
Event:
GTC Europe
Year:
2017
Session ID:
53021
AI in Healthcare
Abstract:
We'll discuss how GPUs are playing a central role in making advances in Ion Torrent's targeted sequencing workflow, and talk about the S5 DNA sequencer from Ion Torrent, which is enabling the democratization of the sequencing market and accelerating research in precision medicine at a breathtaking pace with the help of GPUs. We'll highlight our work in liquid biopsy and non-invasive prenatal testing, and how the breadth of our semiconductor chip offerings gives us the scale of sequencing from small panels to exomes. We'll discuss our analysis pipeline and the latest in algorithm development and acceleration on GPUs, as well as our experiences ranging from the Fermi to the Pascal GPU architectures.
 
Topics:
AI in Healthcare, Genomics & Bioinformatics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8419
 
Abstract:
Machine Learning in Precision Medicine: Patient-Specific Treatment Enabled by Quantitative Medical Imaging, Artificial Intelligence, and GPU Efficiency
Attendees will learn about the need for and use of machine learning in today's patient-centered healthcare. The talk will focus on general approaches requiring machine learning to obtain image-based quantitative features, reach patient diagnoses, predict disease outcomes, and identify proper precision-treatment strategies. While the presented methods are general in nature, examples from cardiovascular disease management will be used to demonstrate the need for and power of machine learning enabled by the performance advantages of GPU computation.
 
Topics:
AI in Healthcare, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8892
 
Abstract:
This talk will overview the fields of Personalised Computational Medicine and In Silico Clinical Trials, which are revolutionizing medicine and medical product development. We'll introduce these concepts, provide examples of how they can transform healthcare, and emphasize why artificial intelligence and machine learning are relevant to them. We will also explain the limitations of these approaches and why it is paramount to engage in both phenomenological (data-driven) and mechanistic (principle-driven) modelling. Both areas are in desperate need of better infrastructures, software and hardware, giving access to computational and storage resources. The talk will be thought-provoking and eye-opening as to opportunities in this space for researchers and industries alike.
 
Topics:
AI in Healthcare, Artificial Intelligence and Deep Learning, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8887
 
Abstract:
The Role of Data in Achieving Precision and Value in Healthcare
The goal of healthcare is to provide the most effective treatment to every patient in the most efficient way. Data plays a key role in every aspect of this process, from decision support systems that provide a clinician with the right information at the right time, to scheduling algorithms that predict patient flow and schedule accordingly, to analytics to coach and support patients in achieving or maintaining a healthy lifestyle. Achieving the vision of a data-informed healthcare system will require fundamental advances in many areas including causal inference, inference on complex, high-dimensional and heterogeneous data, missing data, process modeling, bias reduction, statistical validation, and model adaptation, to name a few. In this talk, I will illustrate some of these challenges through concrete examples within the Malone Center.
 
Topics:
AI in Healthcare, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8891
 
Abstract:
Images and sensors provide crucial information needed to make treatment decisions, and machine learning methods are increasingly employed to supplement subjective human image interpretations and to integrate heterogeneous collections of information. We'll describe the rapidly changing landscape of medical images and sensors from computing, data, and medical points of view. We'll then do a deep dive in the area of pathology image analytics, along with contributions made by deep learning methods to precision medicine and clinical diagnostics. Finally, we'll address the pivotal role of GPUs in supporting all of these computations and describe the roles of GPU-related tools, languages, and libraries in medical image and sensor analytics.
 
Topics:
AI in Healthcare, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7248
Accelerated Data Science
Abstract:
The TSUBAME3 supercomputer at Tokyo Institute of Technology came online in August 2017 and became the greenest supercomputer in the world on the Green 500 at 14.11 GFlops/W. The other aspect of TSUBAME3 is that it embodies various BYTES-oriented features to allow for HPC-to-BD/AI convergence at scale, including significant scalable horizontal bandwidth as well as support for deep memory hierarchy and capacity, along with high flops in low-precision arithmetic for deep learning.
 
Topics:
Accelerated Data Science
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1720
Advanced AI Learning Techniques
Abstract:
We'll discuss monitoring and visualizing a deep neural network in MXNet and explain how to improve training performance. We'll also talk about coding best practices, data pre-processing, making effective use of CPUs, hybridization, efficient batch size, low precision training, and other tips and tricks that can improve training performance by orders of magnitude.
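As one concrete example of the tricks listed above, hybridization in MXNet Gluon is a one-line change that compiles the imperative network into a static graph (a minimal sketch with a hypothetical toy network):

    import mxnet as mx
    from mxnet import gluon, nd

    net = gluon.nn.HybridSequential()
    net.add(gluon.nn.Dense(128, activation="relu"),
            gluon.nn.Dense(10))
    net.initialize(mx.init.Xavier())
    net.hybridize()                      # cache a symbolic graph after the first call

    out = net(nd.random.normal(shape=(32, 64)))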
 
Topics:
Advanced AI Learning Techniques
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9370
 
Abstract:
We'll show how generative adversarial networks (GANs) running on GPUs are about to revolutionize mass customization of patient-specific products at Glidewell Dental. Every day, our labs produce thousands of patient-specific items, such as dental restorations, implants, and appliances. To deliver functional and aesthetic products, high levels of precision and consistency are essential. Traditionally, the dental restoration design and manufacturing process was very labor-intensive and required highly skilled dental professionals. Today, with the advances in CAD/CAM, the amount of manual labor has been significantly reduced; however, there are still many aspects of the process that require human intervention, because some of these aspects are hard to formalize and therefore impossible to automate with traditional tools. The convergence of several technologies, such as deep learning, GPGPU, and cloud computing, has allowed us to effectively train generative models on historical data. These models are now capable of automatically generating high-quality patient-specific designs.
 
Topics:
Advanced AI Learning Techniques, Consumer Engagement & Personalization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8155
Algorithms & Numerical Techniques
Abstract:
Learn about using Tensor Cores to perform very fast matrix multiply-accumulate steps like those required in AI training. The key to Tensor Core performance is the use of 16-bit floating point arithmetic, but that causes significant rounding errors. Although algorithms like binomial correction or Karatsuba can reduce rounding errors considerably, they require additional calculations. We'll detail the performance of these algorithms based on the Warp Matrix Multiply Accumulate API.
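As a small numerical illustration of the correction idea described above (split each fp32 operand into a high and a low fp16 part and accumulate the significant cross terms in fp32), here is a NumPy sketch; it mimics the arithmetic only, not the actual WMMA kernels:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal(1024).astype(np.float32)
    b = rng.standard_normal(1024).astype(np.float32)

    # Naive: round both operands to fp16 once, accumulate in fp32.
    naive = np.dot(a.astype(np.float16).astype(np.float32),
                   b.astype(np.float16).astype(np.float32))

    # Corrected: a ~= a_hi + a_lo with both parts representable in fp16,
    # then sum the three significant cross products in fp32.
    a_hi = a.astype(np.float16).astype(np.float32)
    a_lo = (a - a_hi).astype(np.float16).astype(np.float32)
    b_hi = b.astype(np.float16).astype(np.float32)
    b_lo = (b - b_hi).astype(np.float16).astype(np.float32)
    corrected = np.dot(a_hi, b_hi) + np.dot(a_hi, b_lo) + np.dot(a_lo, b_hi)

    exact = np.dot(a.astype(np.float64), b.astype(np.float64))
    print(abs(naive - exact), abs(corrected - exact))   # corrected error is far smaller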
 
Topics:
Algorithms & Numerical Techniques, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9176
 
Abstract:
Audience members will learn how to implement efficient deep learning computations using CUDA C++ in the context of CUTLASS. CUTLASS is an open-source collection of C++ template abstractions for implementing high-performance matrix multiplication (GEMM) at all levels of the CUDA thread hierarchy. We will describe many of the algorithmic strategies used by cuBLAS and cuDNN, and how they can be implemented using C++ templates to cover an extensive space of problem sizes, data layouts, and data types. In particular, we will emphasize how to support alternative and mixed-precision math operations such as Pascal's integer DP4A operation and Volta's Tensor Cores. Finally, we will illustrate how CUTLASS primitives can be combined with custom functionality to implement related algorithms such as convolution. Although this talk highlights CUTLASS, the architecture concepts and algorithm details are relevant to any CUDA programmer focused on deep learning.
 
Topics:
Algorithms & Numerical Techniques, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8854
 
Abstract:
Road identification and route prediction in near real time remains a challenging problem for many geographic regions, particularly in the case of natural disasters or crisis situations. Existing methods such as manual road labeling or aggregation of mobile GPS track data are currently insufficient in dynamic scenarios. The frequent revisits of satellite imaging constellations may accelerate efforts to rapidly update road networks and optimal path prediction, provided routing information can be extracted from imaging pixels. We'll demonstrate deep learning segmentation methods for identifying road center lines and intersections from satellite imagery, and for inferring networks from these road segments. We'll also explore data quality requirements by comparing open source labels with high-precision labels created as part of the SpaceNet Roads challenge.
 
Topics:
Algorithms & Numerical Techniques, HD Mapping, Federal
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8384
 
Abstract:
Learn how to develop fast and energy-efficient linear solvers using GPUs. Hybrid CPU-GPU techniques achieve high performance at the cost of extra power consumption. The new advancements in GPU architectures enable full GPU solutions that are high-performance, energy-efficient, and CPU-independent. In addition, new technologies such as half-precision arithmetic (FP16) help the design of new solvers that are significantly faster and even more energy efficient. While FP16 arithmetic has been a powerful tool for deep learning applications, our designs show that it is also very useful for boosting the performance and energy efficiency of linear solvers. The new developments complement the hybrid algorithms in the MAGMA library, and provide users with a wide variety of designs that fit different requirements of performance, energy efficiency, and numerical accuracy.
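One standard pattern behind such solvers (not necessarily the exact MAGMA design) is mixed-precision iterative refinement: factor and solve in low precision, then correct the residual in high precision. A NumPy toy using float32 as the "low" precision and float64 as the "high" one, since NumPy's dense solver does not run in fp16:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    A = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned test matrix
    b = rng.standard_normal(n)

    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    x = np.linalg.solve(A32, b32).astype(np.float64)   # cheap low-precision solve

    for _ in range(3):                                 # refinement in high precision
        r = b - A @ x                                  # residual computed in float64
        dx = np.linalg.solve(A32, r.astype(np.float32))
        x = x + dx.astype(np.float64)

    print(np.linalg.norm(b - A @ x))                   # residual shrinks with each sweep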
 
Topics:
Algorithms & Numerical Techniques, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8478
 
Abstract:
With the Tegra X1 and the Pascal-architecture Tesla P100 GPUs, NVIDIA introduced hardware-based computation on FP16 numbers, also called half-precision arithmetic. We'll introduce the steps required to build a viable benchmark for this new arithmetic format. This will include the connections to established IEEE floating point standards and existing HPC benchmarks. The discussion will focus on performance and numerical stability issues that are important for this kind of benchmarking and how they relate to NVIDIA platforms.
 
Topics:
Algorithms & Numerical Techniques, Artificial Intelligence and Deep Learning, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7676
Artificial Intelligence and Deep Learning
Abstract:
Tensor Cores, introduced with the Volta GPU architecture, provide up to 125 teraflops of throughput for operations on IEEE half-precision floats. In the theory portion of this talk we will review the half-precision format, the features of Tensor Cores, and principles for building mixed precision neural networks in any framework. The practice portion will review these principles with examples in PyTorch and show how tools like Apex can automatically convert existing neural networks to use Tensor Cores. This conversion requires no change in model architecture or hyperparameters and has been successfully applied to visual, auditory, and linguistic tasks on multiple frameworks.
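The Apex-based conversion mentioned above is typically a two-line change; a minimal sketch (hypothetical toy model, and it assumes Apex is installed and a CUDA device is present):

    import torch
    from torch import nn
    from apex import amp                       # NVIDIA Apex mixed-precision utilities

    model = nn.Linear(64, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # "O1" patches whitelisted ops to fp16 and handles master weights / loss scaling.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    x = torch.randn(32, 64, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    loss = nn.functional.cross_entropy(model(x), y)
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()                 # loss scaling applied automatically
    optimizer.step()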
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8171
 
Abstract:
OpenSeq2Seq is an open-source, TensorFlow-based toolkit which supports a wide range of off-the-shelf models for natural language translation (GNMT, Transformer, ConvS2S), speech recognition (Wave2Letter, DeepSpeech2), speech synthesis (Tacotron 2), language modeling, and transfer learning for NLP tasks. OpenSeq2Seq is optimized for the latest GPUs and supports multi-GPU and mixed-precision training. Benchmarks on machine translation and speech recognition tasks show that models built using OpenSeq2Seq give state-of-the-art performance with 1.5-3x faster training.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Israel
Year:
2018
Session ID:
SIL8152
 
Abstract:
In this session, CEO Ofir Schlam will talk about the huge opportunity in agriculture, a $4 trillion industry that saw little disruption in recent decades. Another unbelievable number is 35%: the share of crops every farmer in the world loses annually to weeds, diseases, insects, and fertilizer deficiencies. Ofir will explain Taranis' unique AI challenges in developing AgroBrain, our AI agronomist, and AgroSet, our proprietary "ImageNet for Ag" dataset. Taranis, a precision agriculture intelligence platform, helps ag retailers, input manufacturers, and large farms increase their yields and cut costs by giving them a way to effectively monitor their fields, make informed decisions, and then act on them. The system uses sophisticated computer vision, data science, and deep learning algorithms to detect early symptoms of weeds, uneven emergence, nutrient deficiencies, disease and insect infestations, water damage, equipment problems, and more, so that farmers can address issues quickly and understand the impact on yield and cost of production.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Israel
Year:
2018
Session ID:
SIL8133
 
Abstract:
Mixed precision training of deep neural networks provides tremendous benefits: it requires half the storage and data movement of single-precision values, and, starting with the Volta GPU's Tensor Cores, provides up to 120 TFLOPS of math throughput, an 8x speedup over FP32. In this talk, we first present the considerations and techniques for training with reduced precision, including master weights and automatic loss scaling. After that, we discuss real-world training in mixed precision with a particular focus on the PyTorch and TensorFlow frameworks.
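The "automatic loss scaling" mentioned above is usually a backoff scheme: grow the scale while gradients stay finite, and cut it (skipping the step) when an overflow appears. A framework-agnostic sketch of that control logic, illustrative rather than any framework's exact implementation:

    import torch

    class DynamicLossScaler:
        """Backoff loss scaling: halve on overflow, double after a run of good steps."""
        def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
            self.scale = init_scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, grads):
            """Return True if this optimizer step should be applied."""
            if any(not torch.isfinite(g).all() for g in grads):
                self.scale /= 2.0          # overflow: back off and skip the step
                self.good_steps = 0
                return False
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2.0          # stable for a while: try a larger scale
            return True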
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8494
 
Abstract:
Learn how NVIDIA is enabling a new wave of cancer treatment powered by AI, immunotherapy, and genomics. A collaboration between a large-scale sequencing company, world-leading researchers, and AI machine learning teams is tackling the problem of whole-genome precision medicine by predicting whether a patient will respond to new cancer drugs. NVIDIA supports these world-leading efforts, giving the collaborators the ability to overcome the limitations. This innovative project is a partnership between two SMEs, one of Australia's largest medical research institutes, and a global sequencing corporation. Each partner brings complementary strengths: genomiQa specialise in the analysis of genomic data; Max Kelsen use AI to mine big data and explore novel insights; QIMR Berghofer are leaders in immunology and cancer genomic research; and BGI are a major supplier of genomic sequencing. We will use sophisticated artificial intelligence approaches to integrate genomic, transcriptomic, and patient clinical information to identify a predictor and develop a test of treatment response. The classifier will be developed using genomic data from a large melanoma project. We will then validate and refine the classifier in a second cohort of 400 lung cancer patients collected through routine practice within the Australian health system.
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
AI Conference Australia
Year:
2018
Session ID:
AUS80030
 
Abstract:
We'll describe training of very deep networks with mixed-precision float ("float16") using Volta Tensor Cores. Float16 has two major potential benefits: high training speed and reduced memory footprint. But float16 has a smaller numerical range than regular single-precision float, which can result in overflow or underflow ("vanishing gradient") during training. We'll describe a simple rescaling mechanism which solves these potential issues. With this rescaling algorithm, we successfully used mixed precision training for such networks as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.
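The underflow problem and the rescaling fix can be seen directly with NumPy's float16 type; this is a toy demonstration of the motivation, not the framework code from the talk:

    import numpy as np

    grad = np.float32(1e-8)                      # a very small gradient value
    print(np.float16(grad))                      # rounds to 0.0: the update is lost in fp16

    scale = np.float32(65536.0)                  # rescale the loss so gradients shift upward
    scaled = np.float16(grad * scale)
    print(scaled, np.float32(scaled) / scale)    # survives fp16 and unscales back to ~1e-08

    print(np.finfo(np.float16).tiny, np.finfo(np.float16).max)   # ~6.1e-05 and 65504.0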
 
Topics:
Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Israel
Year:
2017
Session ID:
SIL7116
 
Abstract:
3D DeepObject achieves mapping-level positional accuracy. In the geospatial intelligence space, positional accuracy is as important as precision and recall. Unfortunately, convolutional networks in deep learning are invariant to translation; in other words, the positional accuracy of deep learning object detection is inherently poor. Combining deep learning and 3D model fitting, our 3D DeepObject has the best of both worlds. Deep learning can detect an object (a bounding box) with close to human-level accuracy, while 3D model fitting can achieve pixel-level positional accuracy. The outputs (bounding boxes) from deep learning are the inputs for 3D model fitting, and a bounding box from deep learning can significantly reduce the search space for 3D model fitting. Our latest test indicates that 3D DeepObject can achieve much higher positional accuracy than deep learning or 3D model fitting alone can achieve.
 
Topics:
Artificial Intelligence and Deep Learning, Federal, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7149
 
Abstract:
We'll describe new algorithms used to train very deep networks with half-precision float. Float16 has two major potential benefits: better training speed and reduced memory footprint. But float16 has a very narrow numerical range (0.00006 to 65504). This narrow numerical range can result in either overflow (the "inf/nan" problem) or underflow ("vanishing gradient") during training of deep networks. We'll describe the new scaling algorithm, implemented in nvcaffe, which prevents these negative effects. With this algorithm, we successfully trained such networks as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.
 
Topics:
Artificial Intelligence and Deep Learning, Algorithms & Numerical Techniques, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7218
 
Abstract:
We'll present new techniques for training machine learning models using low-precision computation and communication. We'll start by briefly outlining new theoretical results proving that, surprisingly, many fundamental machine learning tools, such as dense generalized linear models, can be trained end-to-end (samples, model, and gradients) using low precision (as little as one bit per value), while still guaranteeing convergence. We'll then explore the implications of these techniques with respect to two key practical applications: multi-GPU training of deep neural networks, and compressed sensing for medical and astronomical data.
 
Topics:
Artificial Intelligence and Deep Learning, Performance Optimization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7580
 
Abstract:
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting when training deep learning models. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. We propose a novel neural network parameterization, which we call Diet Networks, that considerably reduces the number of free parameters in the model. The Diet Networks parametrization is based on the idea that we can first learn or provide an embedding for each input feature and then learn how to map a feature's representation to the parameters linking the value of the feature to each of the hidden units of the classifier network. We experiment on a population stratification task of interest to medical studies and show that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier. This work was accepted at ICLR 2017.
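A minimal sketch of the parameterization as described above: instead of learning the (n_features x n_hidden) input weight matrix directly, learn (or precompute) one embedding per feature and a small auxiliary network that maps each embedding to that feature's column of weights. This is an illustrative PyTorch reading of the idea, not the authors' code; all sizes are arbitrary:

    import torch
    from torch import nn

    n_features, emb_dim, n_hidden, n_classes = 10000, 64, 128, 26

    feature_emb = nn.Parameter(torch.randn(n_features, emb_dim) * 0.01)  # one vector per feature
    aux_net = nn.Linear(emb_dim, n_hidden)        # maps a feature embedding to its fan-out weights
    classifier = nn.Linear(n_hidden, n_classes)   # ordinary small classification head

    def forward(x):
        W = aux_net(feature_emb)                  # (n_features, n_hidden), generated on the fly
        h = torch.relu(x @ W)                     # acts like a dense layer with far fewer free parameters
        return classifier(h)

    logits = forward(torch.randn(4, n_features))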
 
Topics:
Artificial Intelligence and Deep Learning, Computational Biology & Chemistry
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7643
 
Abstract:
Deep learning tools present a tremendous opportunity to improve healthcare. By increasing the efficiency and accuracy of diagnostic testing, and elevating meaning from vast troves of clinical data, deep learning provides a pathway to true precision care. However, there are challenges in the translation of this technology to the clinic: model performance, infrastructure development, data privacy, hospital policy, and vendor relationships are all critical components of this effort. We'll discuss the early experience of the MGH & BWH Center for Clinical Data Science in supporting the translation of deep learning technologies in medicine, touching upon many of the existing and emerging technical, clinical, and cultural challenges that this work presents.
 
Topics:
Artificial Intelligence and Deep Learning, AI in Healthcare, Healthcare and Life Sciences, Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7722
Astronomy & Astrophysics
Abstract:
After the Big Bang, the Universe contained hydrogen, helium, and a bit of lithium. Every other element on the periodic table is produced in stars and is disseminated into interstellar space via supernova explosions. Simulations of supernovae are among the most compute-intensive multi-physics applications on the world's largest modern supercomputers. We will discuss recent development of the FLASH code intended to make these simulations even more physically meaningful. In particular, we'll describe how our work on FLASH, as part of the OLCF CAAR program, allowed us to increase the number of tracked nuclear species from about a dozen to hundreds, making possible precision predictions that can be compared to observations.
 
Topics:
Astronomy & Astrophysics, Computational Physics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91009
 
Abstract:
Learn how GPUs are pushing the limits of the largest astronomical telescopes on Earth and how they'll be used to image life-bearing planets outside our solar system. Thanks to hardware features such as Tensor Cores and mixed-precision support, plus optimized AI frameworks, GPU technology is changing how large data streams from optical sensors are digested in real time. We'll discuss how real-time AI made possible by GPUs opens up new means to optimally control the system and calibrate images, which will help scientists get the most out of the largest optical telescopes. GPUs will also benefit future extreme-size facilities like the European Extremely Large Telescope, because the complexity of maintaining exquisite image quality increases with the square of the telescope's diameter. We'll present on-sky results obtained on the 8.2-meter Subaru Telescope and explain why these techniques will be essential to future giant telescopes.
 
Topics:
Astronomy & Astrophysics, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9634
 
Abstract:
We''ll present a summary of ongoing work that targets the use of newer GPU architecture (Pascal and Volta) features in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this ex ...Read More
Abstract:

We''ll present a summary of ongoing work that targets the use of newer GPU architecture (Pascal and Volta) features in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this exciting new application of GPUs. With Pascal and Volta architectures, we''ll discuss the advantage of using higher memory bandwidth, half-single precision, and integer arithmetic in existing GPU-based correlator pipeline code. This is an ongoing effort between the National Centre for Radio Astrophysics and NVIDIA. We''ll look at various processing stages involved in the pipeline for exploring optimization possibilities, and highlight interesting results that were achieved. We''ll address in detail the effect of using half precision with respect to accuracy of performance and required library changes.

  Back
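As a rough illustration of the half-precision accuracy question raised above, the following NumPy sketch accumulates a single-baseline visibility in float16 and compares it against a float64 reference. The data and baseline setup are hypothetical and are not the NCRA/NVIDIA correlator pipeline.

    # Toy comparison: accumulate <x * conj(y)> in float16 vs. float64.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 4096
    # Simulated complex voltage streams from two antennas (hypothetical data).
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = 0.7 * x + 0.3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

    def visibility(a, b, dtype):
        """Accumulate the correlation in the requested real dtype."""
        ar, ai = a.real.astype(dtype), a.imag.astype(dtype)
        br, bi = b.real.astype(dtype), b.imag.astype(dtype)
        re = (ar * br + ai * bi).sum(dtype=dtype)
        im = (ai * br - ar * bi).sum(dtype=dtype)
        return complex(re, im) / len(a)

    ref = visibility(x, y, np.float64)
    half = visibility(x, y, np.float16)
    print("relative error of float16 accumulation:", abs(half - ref) / abs(ref))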
 
Topics:
Astronomy & Astrophysics
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8339
Streaming:
Download:
Share:
Autonomous Vehicles
Presentation
Media
Abstract:

Leaders from the mapping technology companies will discuss the advantages of various algorithms to create and maintain maps, followed by a short Q&A session.
HERE: Vladimir Shestak, Lead Software Engineer, Automated Driving - Edge Perception for HD Map Maintenance: We start this talk by presenting a brief overview of the HD Live Map created by HERE and its use for connected ADAS and automated driving solutions. Building such a map with the required centimeter-level precision is technically hard, and the instant the HD Live Map is built, changes in the real world can occur that cause the map to no longer reflect reality. Hence, a proper maintenance strategy must be in place, with the goal of identifying discrepancies between the HD Live Map and the real world and healing the HD Live Map as quickly as possible. We discuss a spectrum of techniques developed by HERE to address the map-healing process and then focus on our low-cost solutions for in-vehicle change detection. The example system employs a consumer-grade Android-based sensing system streaming imagery and telemetry in real time into the HERE Edge Perception software stack. We present the high-level software architecture of the stack and its main components, i.e., feature detection, object tracking and triangulation, RWO and Maplet generation, as well as in-vehicle deployment options. A real-time performance evaluation of the system concludes our talk.
NavInfo Europe: Geetank Raipuria, Computer Vision Engineer - Real-Time Object Detection and Semantic Segmentation: This session will discuss how NavInfo uses computer vision and deep learning to build high-definition maps that cover China's highways and large city streets. This involves performing object detection and semantic segmentation on visual imagery collected from vehicle sensors. The NavInfo Europe Advanced Research Lab creates processes that extract information from this data, both in real time onboard vehicles using the NVIDIA DRIVE platform, and faster than real time when processing offline-gathered video material through NVIDIA DeepStream.

  Back
 
Topics:
Autonomous Vehicles, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9351
Streaming:
Download:
Share:
 
Abstract:
Moderator: Dr. Justyna Zander, Global Head of Mapping, NVIDIA. This session will discuss NVIDIA DRIVE Mapping, a platform that enables vehicle manufacturers to use maps from various global providers for highly accurate navigation and localization. DRIVE Mapping products integrate a scalable sensor suite, software development kits, and co-integrated high-definition maps from leading mapping companies. These end-to-end technologies help collect environmental data to create and update HD maps. We'll explain how the platform makes it possible for a self-driving vehicle to localize itself with precision, discern potential hazards, and determine exactly where it can safely drive. Leaders from the mapping technology companies will discuss the advantages of various modalities of maps and the benefit they provide to autonomous vehicles, followed by a short Q&A session.
TomTom: Willem Strijbosch, Head of Autonomous Driving - Mapping Progress on the Car-to-Cloud-to-Car Cycle: The talk will discuss the latest on map creation and using crowdsourced data for map updates at TomTom.
3DMapping: Dr. Gunnar Gräfe, CEO and Founder - Precise Ultra HD Map Data as Basis for Virtual Testing and Simulation: Digital road data is the basis for virtual testing and simulation. Artificially designed digital roads may help case by case, but for various applications a precise digitalization and digital as-built representation of real-world roads is needed. The typical requirement is that the roads used for virtual testing and simulation serve as digital twins of the real-world roads, which is a prerequisite for comparable testing in reality and in the virtual environment. The technical solution for digitizing test tracks, race tracks, and public roads with sufficient accuracy and resolution is high-end mobile surveying using high-resolution scanners and multiple cameras. 3D Mapping has been developing the necessary technology for more than 20 years and today deploys van-based survey systems worldwide. The technology is used, for example, to generate high-resolution digital road surface models in OpenCRG format or to produce precise high-definition reference maps in OpenDRIVE format, which are used either for virtual simulation and testing or as a reference map in the car for autonomous driving development. 3D Mapping is a member of the OpenDRIVE core team, has been working intensively on standardization and updates of the OpenDRIVE and OpenCRG formats for several years, and is fully engaged in the ongoing ASAM format standardizations. These developments are leading to new standards that combine the 3D environment with scenario elements.  Back
 
Topics:
Autonomous Vehicles
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9771
Streaming:
Download:
Share:
 
Abstract:
Autonomous driving systems use various neural network models that require extremely accurate and efficient computation on GPUs. This session will outline how Zoox employs two strategies to improve inference performance (i.e., latency) of trained neural network models without loss of accuracy: (1) inference with NVIDIA TensorRT, and (2) inference with lower precision (i.e., FP16 and INT8). We will share lessons learned about neural network deployment with TensorRT and our current conversion workflow for working around its limitations.  Back
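As a hedged sketch of the kind of lower-precision TensorRT deployment described above (not Zoox's actual conversion workflow), the snippet below builds an FP16-enabled engine from an ONNX model with the TensorRT Python API; the model path is a placeholder, and INT8 would additionally require a calibrator.

    # Minimal sketch: build a TensorRT engine with FP16 kernels allowed.
    import tensorrt as trt

    LOGGER = trt.Logger(trt.Logger.WARNING)

    def build_fp16_engine(onnx_path):
        builder = trt.Builder(LOGGER)
        network = builder.create_network(
            1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, LOGGER)
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError("ONNX parse failed")
        config = builder.create_builder_config()
        config.set_flag(trt.BuilderFlag.FP16)   # allow FP16 layer implementations
        # INT8 additionally requires a calibrator: config.set_flag(trt.BuilderFlag.INT8)
        return builder.build_engine(network, config)

    engine = build_fp16_engine("model.onnx")    # placeholder path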
 
Topics:
Autonomous Vehicles
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9895
Streaming:
Download:
Share:
Computational Biology & Chemistry
Presentation
Media
Abstract:

We'll focus on one of the three pilots of the DOE and NCI partnership on precision oncology and the Cancer Moonshot, namely predicting tumor cell response to drug treatments with deep learning. Predicting tumor cell response to drug treatments is a critical challenge for accomplishing the promise of precision medicine in oncology. As part of a joint project between DOE and NCI to develop advanced computing solutions for cancer, we are developing a deep learning-based framework for modeling tumor-drug interaction and predicting dose response in pre-clinical screening.

  Back
 
Topics:
Computational Biology & Chemistry, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7788
Download:
Share:
Computational Fluid Dynamics
Presentation
Media
Abstract:
In this talk we will look at advances in the simulation of particulate systems in Computer Aided Engineering (CAE) applications. We will focus in particular on the Discrete Element Method (DEM) and the strides made in terms of the number of particles and particle shape using the GPU-based code Blaze-DEM. A variety of industrial applications ranging from mining, agriculture, and civil engineering to pharmaceuticals will be discussed. We will also touch on how we can leverage the next wave of GPU computing, namely half precision and Tensor Cores, in scientific computing, which is still predominantly double-precision based. Finally, we look at the work being done by various groups to create a multi-physics GPU-based platform using Blaze-DEM.  Back
 
Topics:
Computational Fluid Dynamics, Computer Aided Engineering
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8348
Streaming:
Share:
 
Abstract:

Learn how one of the leading institutes for global weather predictions, the European Centre for Medium-Range Weather Forecasts (ECMWF), is preparing for exascale supercomputing and the efficient use of future HPC computing hardware. I will name the main reasons why it is difficult to design efficient weather and climate models and provide an overview of the ongoing community effort to achieve the best possible model performance on existing and future HPC architectures. I will present the EU H2020 projects ESCAPE and ESiWACE and discuss recent approaches to increase computing performance in weather and climate modelling, such as the use of reduced numerical precision and deep learning.

  Back
 
Topics:
Computational Fluid Dynamics, HPC and AI, HPC and Supercomputing
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23348
Download:
Share:
Computational Physics
Presentation
Media
Abstract:
The direct solution of the N-body problem is a simple, yet scientifically important and ubiquitous showcase algorithm for modern GPUs. However, its computational complexity is O(N^2). The fast multipole method (FMM) is an algorithm that reduces runtime and complexity to an optimal O(N) for any required precision. We'll present an optimized, fully NVIDIA CUDA-enabled, templated C++ implementation of the FMM, which considers all stages of the method, from particle input to force extraction. We compare different parallelization approaches and show the performance improvement when going from a dynamic parallelization to a presorted, list-based approach that fits particular system constraints such as periodic boundary conditions. We'll discuss how to exploit the FMM operators such that both memory access overhead and the number of complex multiplications are minimized. This pushes the kernels into the compute-bound range and increases performance.  Back
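For contrast with the FMM, here is a minimal NumPy version of the direct O(N^2) summation that the method replaces; it is a toy baseline only, not the CUDA/C++ implementation discussed in the session.

    # Direct-sum gravitational forces: O(N^2) pairwise interactions (G = 1).
    import numpy as np

    def direct_forces(pos, mass, eps=1e-3):
        diff = pos[None, :, :] - pos[:, None, :]        # (N, N, 3) separations
        dist2 = (diff ** 2).sum(-1) + eps ** 2          # softened squared distances
        inv_r3 = dist2 ** -1.5
        np.fill_diagonal(inv_r3, 0.0)                   # exclude self-interaction
        return (mass[None, :, None] * diff * inv_r3[:, :, None]).sum(axis=1)

    rng = np.random.default_rng(1)
    pos = rng.random((1024, 3))
    mass = rng.random(1024)
    forces = direct_forces(pos, mass)                   # O(N^2) work and memory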
 
Topics:
Computational Physics, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7196
Download:
Share:
Computer Vision
Presentation
Media
Abstract:

This presentation will provide an overview of Blue River Technology's use of GPUs in developing their See and Spray technology for Precision Agriculture. We will motivate the use of Deep Learning in detection and classification of crops and weeds in production environments, and highlight the ways in which NVIDIA GPUs have provided the tools and platform for training powerful models. NVIDIA GPUs have also helped us perform real-time inference on working machines in the field. This talk will show how these systems perform and provide videos of the machines in operation.

  Back
 
Topics:
Computer Vision, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8160
Streaming:
Share:
 
Abstract:

Proving that a system as complex as an autonomous car is safe cannot be done using existing standards. A new method needs to be invented that is much more data-driven and probability-based. Traditional redundant solutions don't apply when trying to optimize a precision-recall curve. Getting acceptance from the regulatory bodies and the public will be much easier if the industry converges on what this new method shall be.

  Back
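For readers unfamiliar with the metric referred to above, the sketch below computes a precision-recall curve for a binary detector with scikit-learn; the labels and scores are synthetic placeholders, not data from the talk.

    # Precision-recall curve on synthetic detector scores (illustration only).
    import numpy as np
    from sklearn.metrics import precision_recall_curve, average_precision_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=1000)             # ground-truth labels
    y_score = 0.6 * y_true + 0.4 * rng.random(1000)    # hypothetical detector scores

    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    print("average precision:", average_precision_score(y_true, y_score))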
 
Topics:
Computer Vision
Type:
Talk
Event:
GTC Europe
Year:
2017
Session ID:
23166
Download:
Share:
Consumer Engagement & Personalization
Presentation
Media
Abstract:

We'll discuss the GPU-accelerated Monte Carlo compute at JP Morgan, which was architected for C1060 cards and revamped a few times as new architectures were released. The key features of the code are exclusive use of double precision, data caching, and a code structure in which a significant amount of CPU pre-compute is followed by running multiple GPU kernels. On the latest devices, memory per flop is a throughput-limiting factor for a class of our GPU-accelerated models. As the byte/flop ratio continues to fall from one generation of GPU to the next, we are exploring ways to re-architect the Monte Carlo simulation code to decrease memory requirements and improve the TCO of the GPU-enabled compute. Obvious next steps are to store less, recalculate more, and use unified memory.

  Back
 
Topics:
Consumer Engagement & Personalization, Finance - Quantitative Risk & Derivative Calculations
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8802
Download:
Share:
Deep Learning & AI Frameworks
Presentation
Media
Abstract:
Learn more about using the most popular computer vision and natural language processing models with state-of-the-art accuracy in MXNet, accelerated for NVIDIA Tensor Cores, to reduce training time. The session will explore the MXNet Gluon CV and NLP toolkits with a demo showing how to achieve out-of-the-box acceleration on Tensor Cores. We'll also review and demo a new tool for MXNet, automatic mixed precision, which shows that with only a few lines of code, any MXNet Gluon model can be accelerated on NVIDIA Tensor Cores. In addition, we'll discuss the MXNet ResNet-50 MLPerf submission on NVIDIA DGX systems and share how MXNet was enhanced with additions such as Horovod and small-batch support to set a new benchmark record. Beyond training, we'll also cover improvements to the existing experimental MXNet-TRT integration, going further than FP32 and ResNets.  Back
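The "few lines of code" enablement mentioned above might look roughly like the sketch below, assuming the mxnet.contrib.amp API available in MXNet 1.5 and later; the network, context, and trainer settings are placeholders rather than the session's demo.

    # Sketch: enable automatic mixed precision for a Gluon model.
    import mxnet as mx
    from mxnet import gluon
    from mxnet.contrib import amp

    amp.init()                                    # patch operators for mixed precision

    net = gluon.model_zoo.vision.resnet50_v1(pretrained=False)
    net.initialize(mx.init.Xavier(), ctx=mx.gpu())
    trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})
    amp.init_trainer(trainer)                     # enable dynamic loss scaling

    # Inside the training loop, scale the loss before calling backward:
    #     with amp.scale_loss(loss, trainer) as scaled_loss:
    #         mx.autograd.backward(scaled_loss)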
 
Topics:
Deep Learning & AI Frameworks, Tools & Libraries
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S91003
Streaming:
Download:
Share:
 
Abstract:
Mixed precision training of deep neural networks provides tremendous benefit. It requires half the storage and data movement of single-precision values, and starting with the Volta GPU's Tensor Cores, provides up to 120 TFLOPS of math throughput, an 8X speedup over FP32. In this tutorial we'll first present the considerations and techniques when training with reduced precision, including master weights and automatic loss scaling. After, we'll discuss real-world training in mixed precision with a particular focus on the PyTorch and TensorFlow frameworks.  Back
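To make the two techniques named above concrete, here is a hand-written PyTorch sketch of FP32 master weights plus static loss scaling; it is a conceptual illustration under assumed shapes and a CUDA device, not the tutorial's code, and in practice frameworks automate these steps.

    # Manual mixed precision: FP16 compute copy, FP32 master weights, loss scaling.
    import torch

    model = torch.nn.Linear(1024, 1024).cuda().half()                  # FP16 compute copy
    master = [p.detach().clone().float() for p in model.parameters()]  # FP32 master weights
    opt = torch.optim.SGD(master, lr=0.01)
    loss_scale = 1024.0                                                # static scale for the sketch

    x = torch.randn(32, 1024, device="cuda", dtype=torch.float16)
    loss = model(x).float().pow(2).mean()

    (loss * loss_scale).backward()                  # scale so FP16 gradients stay representable
    for p, m in zip(model.parameters(), master):
        m.grad = p.grad.float() / loss_scale        # unscale into FP32 gradients
    opt.step()                                      # update master weights in FP32
    for p, m in zip(model.parameters(), master):
        p.data.copy_(m.data.half())                 # copy updated weights back to FP16
        p.grad = None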
 
Topics:
Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9143
Streaming:
Download:
Share:
 
Abstract:
We'll discuss how we built a highly scalable deep learning training system and trained ImageNet in four minutes. For dense GPU clusters, we optimize the training system by proposing a mixed-precision training method that significantly improves the training throughput of a single GPU without losing accuracy. We also propose an optimization approach for extremely large mini-batch sizes (up to 64K) that can train CNN models on the ImageNet dataset without losing accuracy. And we propose highly optimized all-reduce algorithms that achieve up to 3x and 11x speedups on AlexNet and ResNet-50, respectively, over NCCL-based training on a cluster with 1024 Tesla P40 GPUs. Our training system can achieve 75.8% top-1 test accuracy in only 6.6 minutes using 2048 Tesla P40 GPUs. When training AlexNet with 95 epochs, our system can achieve 58.7% top-1 test accuracy within 4 minutes using 1024 Tesla P40 GPUs, which also outperforms all other existing systems.  Back
 
Topics:
Deep Learning & AI Frameworks, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9146
Streaming:
Download:
Share:
 
Abstract:
With deep learning being largely invariant to operator precision, there is potential for significant gains in performance and memory usage when training and serving deep learning models. Learn more about how you can take advantage of mixed precision training in PyTorch to realize performance gains.  Back
 
Topics:
Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9832
Streaming:
Download:
Share:
 
Abstract:
We'll describe NVIDIA's Automatic Mixed Precision (AMP) for PyTorch, a tool to enable mixed precision training for neural networks in just three lines of Python. Mixed precision training combines memory savings and Tensor Core-accelerated throughput of FP16 (16-bit) arithmetic for compute-intensive operations with traditional FP32 arithmetic for a few selected operations. In practice, mixed precision delivers end-to-end speedups between 2 and 4X for many bellwether networks. We'll briefly review mixed precision benefits, concepts, and best practices, then walk through implementing AMP in several example models.  Back
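The "three lines of Python" pattern described above looks roughly like the following, assuming NVIDIA's Apex library (github.com/NVIDIA/apex); the toy model, optimizer, and data are placeholders, not the session's example models.

    # Sketch: enabling AMP with Apex for a PyTorch model.
    import torch
    from apex import amp

    model = torch.nn.Linear(512, 512).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")   # line 1

    x = torch.randn(64, 512, device="cuda")
    loss = model(x).pow(2).mean()

    with amp.scale_loss(loss, optimizer) as scaled_loss:                  # line 2
        scaled_loss.backward()                                            # line 3
    optimizer.step()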
 
Topics:
Deep Learning & AI Frameworks
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9998
Streaming:
Download:
Share:
Finance
Presentation
Media
Abstract:
Learn about recent progress in accelerating Monte Carlo simulation on the GPU in applications for pricing financial instruments and risk management. We'll focus on the forward Monte Carlo simulation, which allows for a natural parallelization across CUDA cores, and present a recent extension of our implementation to a broad selection of industry-standard valuation models for different asset classes, including hybrid models that can be used to price multi-currency and multi-asset portfolios. Even with increasing complexity and dimensionality of valuation models, our benchmarks show stable GPU speedup factors in the range of 20x and 30x for calculations with double-precision (FP64) and single-precision (FP32) floating point, respectively. We also briefly summarize a recent research project on the more complex backward (American/least-squares) Monte Carlo simulation method, based on regression algorithms used to price general financial instruments with optionality. The latter method relies heavily on matrix calculations and benefits from using GPU-accelerated libraries: cuBLAS for linear algebra and cuSOLVER for solvers.  Back
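As a toy counterpart to the forward Monte Carlo approach described above (not the talk's valuation library), the NumPy sketch below prices a European call under geometric Brownian motion and compares FP32 and FP64 estimates; all parameters are made up.

    # Forward Monte Carlo pricing of a European call, in FP32 vs. FP64.
    import numpy as np

    def mc_call_price(s0, k, r, sigma, t, n_paths, dtype):
        rng = np.random.default_rng(42)
        z = rng.standard_normal(n_paths).astype(dtype)
        st = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
        payoff = np.maximum(st - k, 0)
        return np.exp(-r * t) * payoff.mean(dtype=dtype)

    p64 = mc_call_price(100.0, 105.0, 0.01, 0.2, 1.0, 1_000_000, np.float64)
    p32 = mc_call_price(100.0, 105.0, 0.01, 0.2, 1.0, 1_000_000, np.float32)
    print("FP64:", p64, " FP32:", p32, " difference:", abs(p64 - p32))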
 
Topics:
Finance, Finance - Quantitative Risk & Derivative Calculations
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8587
Streaming:
Share:
GPU Virtualization
Presentation
Media
Abstract:
End-to-end machine learning workloads perform well using NVIDIA virtual GPUs in VMware vSphere. We'll discuss how to combine the performance of NVIDIA GPUs with manageability and scalability features and maximize GPU utilization for machine learning workloads using VMware and NVIDIA technology. We will outline end-to-end machine learning, including training, deploying for inferencing, and managing a production environment using VMware vSphere and VMware's Pivotal Kubernetes Service. The NVIDIA Turing architecture is positioned for mixed-precision training and inferencing workloads. We'll describe ways to deploy GPU-based workloads developed with machine learning frameworks like TensorFlow and Caffe2 by using VMware DirectPath I/O and NVIDIA virtual GPU (vGPU). We'll also provide case studies that leverage vGPU scheduling options such as Equal Share, Fixed Share, and Best Effort, and illustrate their benefits with our performance study.  Back
 
Topics:
GPU Virtualization
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9815
Streaming:
Download:
Share:
Genomics & Bioinformatics
Presentation
Media
Abstract:
Learn about the importance of genomics in precision medicine and understand how researchers are decoding genomic information by building deep learning models. We'll show how the Kipoi model zoo for genomics (kipoi.org) can help in this endeavor and discuss several Kipoi use cases that demonstrate how it facilitates using, sharing, archiving, and building deep learning models in genomics. In addition, we'll highlight some recent successes of deep learning in genomics. Session participants can expect to gain appreciation for sharing end-to-end processing pipelines (not just models) and gain insight into how deep learning and GPU hardware accelerators are changing genomics.  Back
 
Topics:
Genomics & Bioinformatics, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9705
Streaming:
Download:
Share:
 
Abstract:

The MULTI-X platform simplifies the logistical challenges of deploying AI and ML solutions by providing pre-configured environments with ad-hoc scalable computing resources to quickly build, test, share, and reproduce scientific applications. Its comprehensible, modular framework accelerates development and reduces the burden and cost of implementing AI solutions. Developing and deploying AI solutions for clinical research use cases can be complex and resource intensive, and is therefore expensive and challenging to implement for many researchers, groups, and healthcare organisations. In the era of big data and the IoT, the most critical problems are related to the secure access and management of large heterogeneous datasets, the deployment of GPU-accelerated massively parallel processing systems, and the setup of development environments encompassing complex ML tools and applications. Two exemplary use cases of the implementation of GPU-enabled AI solutions in the area of Cardiac Image Analysis, both developed and deployed in MULTI-X, will be presented together with the outcome of the analysis of 5000 subjects of the UK Biobank database.

  Back
 
Topics:
Genomics & Bioinformatics, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8274
Streaming:
Download:
Share:
HPC and AI
Presentation
Media
Abstract:
Tensor Cores, introduced with Volta GPU architecture, achieve up to 125 TFlops throughput by mixing half- and single-precision floating point operations. We'll show how to take advantage of Tensor Cores for applications in deep learning and HPC. We will discuss how to use mixed precision to decrease memory use during deep learning training and deployment, a technique that allows for larger model sizes. We will also demonstrate programming matrix-multiply-and-accumulate on Tensor Cores for HPC applications. Using NVIDIA Nsight, we will profile an application to understand use of Tensor Cores.  Back
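The accuracy point behind mixed-precision accumulation can be emulated on the CPU: the NumPy sketch below multiplies FP16-rounded matrices while accumulating either in FP32 (as Tensor Cores do) or entirely in FP16, and compares both against an FP64 reference. It models only the numerics, not the WMMA/CUDA programming covered in the session.

    # Emulating FP16-input matrix multiply with FP32 vs. FP16 accumulation.
    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal((128, 128)).astype(np.float16)
    b = rng.standard_normal((128, 128)).astype(np.float16)

    def matmul_fp16_accumulate(a, b):
        """Accumulate partial sums in float16, rounding after every addition."""
        acc = np.zeros((a.shape[0], b.shape[1]), dtype=np.float16)
        for k in range(a.shape[1]):
            acc = (acc + np.outer(a[:, k], b[k, :]).astype(np.float16)).astype(np.float16)
        return acc

    ref = a.astype(np.float64) @ b.astype(np.float64)         # FP64 reference
    fp32_acc = a.astype(np.float32) @ b.astype(np.float32)    # FP16 data, FP32 accumulate
    fp16_acc = matmul_fp16_accumulate(a, b)                   # FP16 accumulate throughout

    print("max error, FP32 accumulation:", np.abs(fp32_acc - ref).max())
    print("max error, FP16 accumulation:", np.abs(fp16_acc.astype(np.float64) - ref).max())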
 
Topics:
HPC and AI, AI Application, Deployment & Inference
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9542
Streaming:
Download:
Share:
HPC and Supercomputing
Presentation
Media
Abstract:
Supercomputing is playing a key role in our efforts to understand complex biological systems. To date we have performed calculations on the Summit supercomputer at OLCF with two different algorithms, achieving 2.41 exaflops and 2.32 exaflops of mixed-precision performance. The larger of these calculations required 22 zetta floating-point operations. The cost of generating biological data is dropping exponentially, resulting in data growth that has far outstripped the growth in computational power predicted by Moore's Law. This flood of data has opened a new era of systems biology in which there are unprecedented opportunities to gain insights into complex biological systems. Integrated biological models need to capture the higher-order complexity of the interactions among cellular components. Solving such complex combinatorial problems will give us extraordinary levels of understanding of biological systems. Paradoxically, understanding higher-order sets of relationships among biological objects leads to a combinatorial explosion in the search space of biological data. These exponentially increasing volumes of data, combined with the desire to model more and more sophisticated sets of relationships within a cell, across an organism, and up to ecosystems and, in fact, climatological scales, have led to a need for computational resources and sophisticated algorithms that can make use of such datasets. The result is a comprehensive systems biology model of an organism and how it has adapted to and responds to its abiotic and biotic environment, which has applications in bioenergy, precision agriculture, and ecosystem studies, among other disciplines.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1910
Streaming:
Share:
 
Abstract:
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem Ax=b, where A is a large dense matrix and a double-precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16 and FP64) iterative refinement, and we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4x speedup. This is due to the performance boost that the FP16-TC provide, as well as to the improved accuracy over classical FP16 arithmetic that is obtained because the GEMM accumulation occurs in FP32 arithmetic.  Back
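A small SciPy sketch of the refinement loop described above is shown below; it substitutes a float32 LU factorization for the FP16 Tensor Core factorization used in the work, so it only illustrates the structure of the algorithm, with residuals and corrections carried in float64.

    # Mixed-precision iterative refinement for Ax = b (float32 factorization stand-in).
    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)     # well-conditioned test matrix
    b = rng.standard_normal(n)

    lu, piv = lu_factor(A.astype(np.float32))           # low-precision factorization
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)

    for _ in range(10):                                 # refinement iterations
        r = b - A @ x                                   # residual in FP64
        x = x + lu_solve((lu, piv), r.astype(np.float32))  # cheap low-precision correction
        if np.linalg.norm(b - A @ x) <= 1e-12 * np.linalg.norm(b):
            break
    print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))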
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1930
Streaming:
Share:
 
Abstract:
Over the past year, numerous updates to the CUDA platform have been released for libraries, language, and system software. These target a range of diverse features, from mixed-precision solvers to scalable programming models to memory management to applications of ray tracing in numerical methods. This talk will present a tour of all that's new and how to take advantage of it.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2019
Session ID:
SC1931
Streaming:
Download:
Share:
 
Abstract:
Come hear the latest PGI news and learn about what we'll develop in the year ahead. We'll talk about the latest PGI OpenACC Fortran/C++ and CUDA Fortran compilers and tools, which are supported on x64 and OpenPOWER systems with NVIDIA GPUs. We'll discuss new CUDA Fortran features, including Tensor Core support and cooperative groups, and we'll cover our current work on half precision. We'll explain new OpenACC 2.7 features, along with beta true deep-copy directives and support for OpenACC programs on unified memory systems. The PGI compiler-assisted software testing feature helps determine where differences arise between CPU and GPU versions of a program or when porting to a new system. Learn about upcoming projects, which include a high-performance PGI subset of OpenMP for NVIDIA GPUs, support for GPU programming with standard C++17 parallel STL and Fortran, and incorporating GPU-accelerated math libraries to support porting and optimization of HPC applications on NVIDIA GPUs.  Back
 
Topics:
HPC and Supercomputing, Programming Languages
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9289
Streaming:
Download:
Share:
 
Abstract:

The use of low-precision arithmetic in computing methods has been a powerful tool to accelerate numerous scientific computing applications, including artificial intelligence. We present an investigation showing that other HPC applications can harness this power too, and in particular the general HPC problem of solving Ax = b, where A is a large dense matrix and the solution is needed in FP64 accuracy. Our approach is based on the mixed-precision (FP16->FP64) iterative refinement technique: we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations, where we show how the use of FP16-TC (Tensor Core) arithmetic can provide up to a 4x speedup and improve energy consumption by a factor of 5, achieving 74 Gflops/Watt. This is due to the performance boost that the FP16 Tensor Cores provide and to their better accuracy, which outperforms classical FP16.

  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
Supercomputing
Year:
2018
Session ID:
SC1826
Download:
Share:
 
Abstract:
The 2015 Nobel Prize in Physics was awarded for the discovery of neutrino oscillations, which indicates that neutrinos have mass. This phenomenon was unexpected and is one of the clearest signs of new physics beyond the Standard Model. The NOvA experiment aims to deepen our understanding of neutrino oscillations by measuring the properties of a muon neutrino beam produced at Fermi National Accelerator Laboratory at a Near Detector close to the beam source, and measuring the rate at which muon neutrinos oscillate into electron neutrinos over an 810 km trip to a 14,000-ton Far Detector in Ash River, MN. Understanding this process may explain why the universe is made of matter instead of antimatter. Performing this measurement requires a high-precision method for classifying neutrino interactions. To this end, we developed a convolutional neural network that gave a 30 percent improvement in electron neutrino selection over previous methods, equivalent to increasing the Far Detector mass by 4,000 tons.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7230
Download:
Share:
 
Abstract:
Recent advances in the deployment of deep learning recurrent nets have been demonstrated in scaling studies of Princeton's new deep learning code, FRNN (Fusion Recurrent Neural Net), on modern GPU systems. This is a big-data project in that it has access to the huge EUROFUSION/JET disruption database of over half a petabyte to drive these studies. FRNN implements a distributed data-parallel synchronous stochastic gradient approach with TensorFlow and Theano libraries at the backend and MPI for communication. This deep learning software has recently demonstrated excellent scaling up to 6,000 GPUs on Titan at Oak Ridge National Lab. The associated accomplishments exhibit clear progress toward the goal of establishing the practical feasibility of using leadership-class supercomputers to greatly enhance training of neural nets for transformational impact on key discovery science application domains such as fusion energy science. Powerful systems expected to be engaged for near-future deployment of this deep learning software include: (1) NVIDIA's SATURN V, featuring its nearly 1,000 Pascal P100 GPUs; (2) Switzerland's Piz Daint Cray XC50 system with 4,500 P100 GPUs; (3) Japan's Tsubame 3 system with 3,000 P100 GPUs; and (4) OLCF's Summit-Dev system. In summary, deep learning software trained on large scientific datasets holds exciting promise for delivering much-needed predictive tools capable of accelerating knowledge discovery. The creative methods being developed, including a new half-precision capability, also have significant potential for cross-cutting benefit to a number of important application areas in science and industry.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7243
Download:
Share:
 
Abstract:
The Cancer Moonshot was established in 2016 with the goal to double the rate of progress in cancer research -- to do in five years what normally would take 10. A major area for the acceleration of progress is the strategy to use modeling, simulation, and machine learning to advance our understanding of cancer biology and to integrate what is known into predictive models that can inform research and guide therapeutic developments. In 2015, the U.S. Department of Energy formed a collaboration with the National Cancer Institute for the joint development of advanced computing solutions for cancer.  Back
 
Topics:
HPC and Supercomputing
Type:
Talk
Event:
SIGGRAPH
Year:
2017
Session ID:
SC1711
Share:
Healthcare and Life Sciences
Presentation
Media
Abstract:
Precision medicine initiatives bring tremendous opportunities to speed up scientific discovery and promote quality improvement in medicine. However, it also raises big challenges in dealing with massive data from heterogeneous sources, such as electronic health records (EHRs), -omics, and wearables. Traditional data mining and statistical learning methods tend to favor clean and structured data, which may not be able to effectively utilize the rich information embedded in biomedical data. The latest breakthrough in deep learning technologies provides a unique opportunity to retrieve information from complex and heterogeneous sources. We'll review advances in deep learning applied to precision medicine and next-generation healthcare, with a special focus on Deep Patient, a general-purpose patient representation from EHRs that facilitates clinical predictive modeling and medical analysis.  Back
 
Topics:
Healthcare and Life Sciences, AI in Healthcare, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7563
Download:
Share:
 
Abstract:

The Cancer Moonshot was established in 2016 with the goal to double the rate of progress in cancer research -- to do in five years what normally would take 10. A major area for the acceleration of progress is the strategy to use modeling, simulation, and machine learning to advance our understanding of cancer biology and to integrate what is known into predictive models that can inform research and guide therapeutic developments. In 2015, the U.S. Department of Energy formed a collaboration with the National Cancer Institute for the joint development of advanced computing solutions for cancer.

  Back
 
Topics:
Healthcare and Life Sciences, Artificial Intelligence and Deep Learning, HPC and Supercomputing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7782
Download:
Share:
Intelligent Machines, IoT & Robotics
Presentation
Media
Abstract:

An introduction to how the Jetson TX2 can be used for selective harvesting and precision agriculture. The Jetson TX2 is a surprisingly robust platform, and using some of the module's unique basic features, such as unified memory and dynamic execution, can create robust applications for the sensing, robot control, and data handling required for selective harvesting, or the harvesting of fruit over several cycles.

  Back
 
Topics:
Intelligent Machines, IoT & Robotics, Computer Vision
Type:
Talk
Event:
GTC Washington D.C.
Year:
2018
Session ID:
DC8244
Streaming:
Share:
 
Abstract:
We'll discuss the latest updates and features to the autonomous precision landing and GPS-denied navigation capabilities of the Joint Tactical Aerial Resupply Vehicle (JTARV) platform. These capabilities are enabled by our high-performance computer vision libraries, Sentinel and HawkEye, both of which capitalize on NVIDIA's mobile GPUs and optimized deep learning frameworks. Autonomous navigation for aerial vehicles demands that core algorithms provide not only relevant, actionable information, but that they do so in a timely manner -- that is, the algorithms must operate in real time. We'll discuss how Sentinel object detection networks limit processing requirements for the autonomous precision landing capability. The requirement for high performance dictates optimization at every level, which is the focus of our ongoing research and development efforts.  Back
 
Topics:
Intelligent Machines, IoT & Robotics, Computer Vision
Type:
Talk
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7143
Download:
Share:
Intelligent Video Analytics
Presentation
Media
Abstract:

Learn how to develop an Artificial Intelligence system to localize and recognize food on trays to generate a purchase ticket in a checkout process.
(1) Solving a real business problem using Deep Learning advanced technology based on object detection and localization.
(2) Combining a pipeline of models to improve accuracy and precision while maintaining reasonable recall levels.
(3) Discovering how to develop and train a model in the cloud to be used embedded in an NVIDIA Jetson TX1 device.

  Back
 
Topics:
Intelligent Video Analytics
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8236
Streaming:
Download:
Share:
 
Abstract:

Detecting road users in real time is key to enabling safe autonomous driving applications in crowded urban environments. The talk presents a distributed sensor infrastructure being deployed in the city of Modena (Italy), at the heart of the Italian 'Motor Valley'. Modena's Automotive Smart Area (MASA) connects hundreds of smart cameras, supporting embedded GPU modules for edge-side real-time detection, with higher-performance GPU (fog) nodes at block level and low-latency wireless V2X communication. A distributed deep learning paradigm balances precision and response time to give autonomous vehicles the required sensing support in a densely populated urban environment. The infrastructure will exploit a novel software architecture to help programmers and big data practitioners combine data-in-motion and data-at-rest analysis while providing real-time guarantees. MASA, funded under the European project CLASS, is an open testbench where interested partners may deploy and test next-generation AD applications in a tightly connected setting.

  Back
 
Topics:
Intelligent Video Analytics
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8254
Streaming:
Download:
Share:
Leadership and Policy in AI
Presentation
Media
Abstract:
As industry and government collect massive amounts of data to help provide faster and more accurate clinical care, the gap in managing and exchanging these data types remains a challenge for the industry. Key highlights of this panel discussion will include how AI can advance treatment and prevention, what scientific and regulatory hurdles remain for industry success, and how Congress can best address possible privacy and security issues.  Back
 
Topics:
Leadership and Policy in AI, AI in Healthcare, Medical Imaging & Radiology
Type:
Panel
Event:
GTC Washington D.C.
Year:
2017
Session ID:
DC7159
Download:
Share:
Medical Imaging & Radiology
Presentation
Media
Abstract:
We'll discuss how the latest advances in GPU technologies have made it possible to reduce MRI scan time and increase reconstruction accuracy. These advances have added previously unseen capabilities to nuclear medical imaging by simulating photon trajectories with excellent precision and speed. They've also accelerated work in cancer therapy with real-time simulation of the physics of thermal tumor ablation. Learn how we've unleashed the potential of high performance computing and deep learning on GPUs to drive medical imaging innovation.  Back
 
Topics:
Medical Imaging & Radiology, AI in Healthcare, HPC and AI
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9993
Streaming:
Download:
Share:
 
Abstract:

Recent developments in artificial intelligence, advances in GPU computing hardware, and the availability of large-scale medical imaging datasets allow us to learn what the human brain truly looks like from a biological, physiological, anatomical, and pathological point of view. This learning process can be augmented by electronic healthcare record data, cognitive examinations, and diagnostic/radiological report data, thus providing an integrated view of the human interpretation of neurological diseases. This talk will present how AI models can learn from big and unstructured neurological and neuroradiological data and be used as tools for precision medicine, with the aim of translating advanced imaging technologies and biomarkers to clinical practice, streamlining the clinical workflow, and improving the quality of care. It will also explore the technological translation process, which requires full clinical support, deep algorithmic integration into the radiological workflow, and the deployment of a high-throughput, hospital-integrated GPU computational platform.

  Back
 
Topics:
Medical Imaging & Radiology
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8101
Streaming:
Download:
Share:
 
Abstract:

We believe that medicine will be more precise and affordable. Physicians will integrate relevant patient data and insights at the point of decision for precise diagnostics. Therapy will be tailored to the characteristics of both the patient and disease, resulting in the right treatment for the right patient at the right time. AI-powered decision support could help to balance the need for personalization when it matters and standardization to reduce unwarranted variations.

  Back
 
Topics:
Medical Imaging & Radiology, Genomics & Bioinformatics
Type:
Talk
Event:
GTC Europe
Year:
2018
Session ID:
E8290
Streaming:
Download:
Share:
 
Abstract:
We'll demonstrate how to use deep learning (DL) approaches to translate big data from routine clinical care into medical innovation that directly improves routine clinical care. Typically, large healthcare institutions have sufficient quantities of clinical data to facilitate precision medicine through a DL paradigm. However, this clinical data is hardly ever translated into direct clinical innovation because computer algorithms cannot readily ingest or reason over it. Using routine mammographic screening data for breast cancer as an example, we first downloaded over 30,000 free-text pathology reports and used long short-term memory (LSTM) DL algorithms to infer cancer outcomes for individual patients. We then labeled over 700,000 mammographic views of breast imaging with our inferred pathology outcomes. Finally, we trained convolutional neural network DL algorithms to directly predict pathology outcomes from breast imaging. With our approach, we demonstrate how to leverage DL to realize precision oncology and significantly improve the interpretation of routine screening mammography for millions of women using routine clinical big data.  Back
 
Topics:
Medical Imaging & Radiology
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8471
Streaming:
Share:
Speech & Language Processing
Presentation
Media
Abstract:
We'll discuss OpenSeq2Seq, a TensorFlow-based toolkit for training deep learning models optimized for NVIDIA GPUs. The main features of our toolkit are ease of use, modularity, and support for fast distributed and mixed-precision training. OpenSeq2Seq provides a large set of state-of-the-art models and building blocks for neural machine translation (GNMT, Transformer, ConvS2S, etc.), automatic speech recognition (DeepSpeech2, Wave2Letter, etc.), speech synthesis (Tacotron2, etc.), and language modeling. All models have been optimized for mixed-precision training with GPU Tensor Cores, and they achieve a 1.5-3x training speedup compared to FP32.  Back
 
Topics:
Speech & Language Processing, AI & Deep Learning Research
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9187
Streaming:
Download:
Share:
Tools & Libraries
Presentation
Media
Abstract:
We'll demonstrate how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at deep learning training and inference and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the convolutional layers: the computational heart of all deep-learning frameworks (TensorFlow, Caffe, etc.). CLBlast has three main advantages over other BLAS libraries: 1) it can be explicitly tuned for specific matrix-sizes and hardware platforms, 2) it runs on less common devices (and it is fast), such as embedded and low-power GPUs, and 3) it can perform operations in half-precision FP16 format, saving precious bandwidth, time, and power.  Back
 
Topics:
Tools & Libraries, Artificial Intelligence and Deep Learning
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7280
Download:
Share:
Virtual Reality & Augmented Reality
Presentation
Media
Abstract:
Because Tilt Brush and Quill are not voxel based, we present a new VR-based voxel painting system with a large (40 km^3) and detailed (0.3 mm^3) canvas. We develop an array-based octree of depth 24, using 5 indices per cell (parent, child, and 3 neighbors) to accelerate ray traversal. We adaptively refine or coarsen the octree on the CPU and sync it with the GPU, then ray cast front to back. To accelerate rendering, we develop a foveated rendering algorithm: we design a quadtree render target whose resolution is dynamically adjusted according to a heat map, traverse rays, and then interpolate the color in screen space. We traverse rays through upper-level cells as the ray cone widens. We analyze floating-point error propagation to thoroughly understand precision problems in deep cells and ray intersections.  Back
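A plain-Python sketch of the cell layout described above is shown below; the actual system keeps this structure in GPU memory and uses it for ray traversal, so the snippet only illustrates the five-index scheme (parent, first child, and three neighbors) with hypothetical field names.

    # Flat array of octree cells with parent / first-child / neighbor indices.
    from dataclasses import dataclass, field
    from typing import List

    NULL = -1  # marker for "no link"

    @dataclass
    class OctreeCell:
        parent: int = NULL
        first_child: int = NULL                   # 8 children stored in consecutive slots
        neighbors: List[int] = field(default_factory=lambda: [NULL, NULL, NULL])
        color: int = 0                            # packed payload for painted voxels

    cells: List[OctreeCell] = [OctreeCell()]      # index 0 is the root

    def subdivide(index):
        """Refine a cell by appending 8 children at the end of the array."""
        cells[index].first_child = len(cells)
        for _ in range(8):
            cells.append(OctreeCell(parent=index))

    subdivide(0)                                  # root children now at indices 1..8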
 
Topics:
Virtual Reality & Augmented Reality
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7698
Download:
Share: