We'll discuss training techniques and deep learning architectures for high-precision landmark localization. In the first part of the session, we'll talk about ReCombinator Networks, which aim to maintain pixel-level image information for high-accuracy landmark localization. This model combines coarse-to-fine features: it first observes global (coarse) image information and then recombines it with local (fine) information. Using this model, we report state-of-the-art results on three facial landmark datasets. The model can also be used for other tasks that require pixel-level accuracy (for example, image segmentation and image-to-image translation). In the second part, we'll talk about improving landmark localization in a semi-supervised setting, where less labeled data is provided. Specifically, we consider a scenario where few labeled landmarks are given during training, but many weaker labels (for example, face emotion or hand gesture labels) that are easier to obtain are provided. We'll describe training techniques and model architectures that can leverage weaker labels to improve landmark localization.
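As a hedged illustration of the recombination idea, the following PyTorch sketch upsamples a coarse feature map and concatenates it with a fine one before mixing; the layer sizes are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecombinationBlock(nn.Module):
    """Illustrative coarse-to-fine recombination: upsample a coarse
    (global) feature map and concatenate it with a fine (local) one,
    so later layers see both contexts at full resolution."""
    def __init__(self, coarse_ch, fine_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(coarse_ch + fine_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, coarse, fine):
        # Bring the coarse map up to the fine map's spatial resolution.
        coarse_up = F.interpolate(coarse, size=fine.shape[-2:],
                                  mode='bilinear', align_corners=False)
        # Recombine: concatenate along channels, then mix with a conv.
        return F.relu(self.conv(torch.cat([coarse_up, fine], dim=1)))

# Toy usage: a 16x16 coarse map recombined with a 64x64 fine map.
block = RecombinationBlock(coarse_ch=128, fine_ch=64, out_ch=64)
out = block(torch.randn(1, 128, 16, 16), torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```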
NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. This lab provides hands-on experience using TensorRT to convert a neural network model to INT8 precision, then calibrate, validate, and deploy it for inference in a self-driving car application.
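TensorRT's calibrator interfaces vary by version, so rather than reproduce them, here is a framework-agnostic NumPy sketch of the core idea behind simple max calibration: derive a per-tensor scale from representative activations, then map FP32 values to INT8. (TensorRT's default entropy calibration is more sophisticated than this.)

```python
import numpy as np

def max_calibrate(activations):
    """Simple max calibration: choose a symmetric scale so the largest
    observed activation magnitude maps to the INT8 limit (127)."""
    return np.abs(activations).max() / 127.0

def quantize_int8(x, scale):
    """Quantize FP32 values to INT8 using the calibrated scale."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Calibration pass over representative data, then quantize new inputs.
calib_data = np.random.randn(10000).astype(np.float32)
scale = max_calibrate(calib_data)
x = np.random.randn(5).astype(np.float32)
x_q = quantize_int8(x, scale)
print(x, dequantize(x_q, scale))  # values agree up to quantization error
```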
We'll discuss how GPUs are playing a central role in advancing Ion Torrent's targeted sequencing workflow, and talk about Ion Torrent's S5 DNA sequencer, which, with the help of GPUs, is enabling the democratization of the sequencing market and accelerating research in precision medicine at a breathtaking pace. We'll highlight our work in liquid biopsy and non-invasive prenatal testing, and how the breadth of our semiconductor chip offerings lets us scale sequencing from small panels to exomes. We'll discuss our analysis pipeline and the latest in algorithm development and acceleration on GPUs, as well as our experiences ranging from the Fermi to the Pascal GPU architectures.
Machine Learning in Precision Medicine: Patient-Specific Treatment Enabled by Quantitative Medical Imaging, Artificial Intelligence, and GPU Efficiency
Attendees will learn about the need for and use of machine learning in today's patient-centered healthcare. The talk will focus on general approaches requiring machine learning to obtain image-based quantitative features, reach patient diagnoses, predict disease outcomes, and identify proper precision-treatment strategies. While the presented methods are general in nature, examples from cardiovascular disease management will be used to demonstrate the need for and power of machine learning enabled by the performance advantages of GPU computation.
This talk will overview the fields of Personalised Computational Medicine and In Silico Clinical Trials, which are revolutionizing medicine and medical product development. It will introduce these concepts, provide examples of how they can transform healthcare, and emphasize why artificial intelligence and machine learning are relevant to them. We will also explain the limitations of these approaches and why it is paramount to engage in both phenomenological (data-driven) and mechanistic (principle-driven) modelling. Both areas are in desperate need of better infrastructure (software and hardware) giving access to computational and storage resources. The talk will be thought-provoking and eye-opening as to the opportunities in this space for researchers and industry alike.
The Role of Data in Achieving Precision and Value in Healthcare
The goal of healthcare is to provide the most effective treatment to every patient in the most efficient way. Data plays a key role in every aspect of this process: from decision support systems that provide a clinician with the right information at the right time, to scheduling algorithms that predict patient flow and schedule accordingly, to analytics that coach and support patients in achieving or maintaining a healthy lifestyle. Achieving the vision of a data-informed healthcare system will require fundamental advances in many areas including causal inference, inference on complex, high-dimensional, and heterogeneous data, missing data, process modeling, bias reduction, statistical validation, and model adaptation, to name a few. In this talk, I will illustrate some of these challenges through concrete examples within the Malone Center.
Learn about using Tensor Cores to perform the very fast matrix multiply-accumulate steps required in AI training. The key to Tensor Core performance is the use of 16-bit floating-point arithmetic, but that causes significant rounding errors. Although algorithms like binomial correction or Karatsuba can reduce rounding errors considerably, they require additional calculations. We'll detail the performance of these algorithms based on the Warp Matrix Multiply-Accumulate (WMMA) API.
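The talk's exact algorithms aren't reproduced here, but the following NumPy sketch conveys the flavor of such compensation schemes: split each FP32 value into an FP16 value plus an FP16 residual and accumulate the cross terms in FP32, trading extra multiplications for accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1024).astype(np.float32)
b = rng.standard_normal(1024).astype(np.float32)
exact = np.dot(a.astype(np.float64), b.astype(np.float64))

# Naive FP16 inputs with FP32 accumulation (what Tensor Cores do natively).
a16, b16 = a.astype(np.float16), b.astype(np.float16)
naive = np.dot(a16.astype(np.float32), b16.astype(np.float32))

# Compensated version: carry the FP16 representation error in a second
# term, so a ~ a16 + da and a.b ~ a16.b16 + a16.db + da.b16.
da = (a - a16.astype(np.float32)).astype(np.float16)
db = (b - b16.astype(np.float32)).astype(np.float16)
compensated = (np.dot(a16.astype(np.float32), b16.astype(np.float32))
               + np.dot(a16.astype(np.float32), db.astype(np.float32))
               + np.dot(da.astype(np.float32), b16.astype(np.float32)))

print(abs(naive - exact), abs(compensated - exact))  # error drops sharply
```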
Road identification and route prediction in near real time remain challenging for many geographic regions, particularly during natural disasters or crisis situations. Existing methods such as manual road labeling or aggregation of mobile GPS track data are insufficient in dynamic scenarios. The frequent revisits of satellite imaging constellations may accelerate efforts to rapidly update road networks and optimal path predictions, provided routing information can be extracted from imagery pixels. We'll demonstrate deep learning segmentation methods for identifying road center lines and intersections from satellite imagery, and for inferring networks from these road segments. We'll also explore data quality requirements by comparing open source labels with high-precision labels created as part of the SpaceNet Roads challenge.
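As a hedged sketch of one way road segments can be turned into network nodes (not the SpaceNet pipeline itself), the snippet below skeletonizes a binary road mask and flags probable intersections and endpoints by counting skeleton neighbors; it assumes scikit-image and SciPy are available.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def road_graph_nodes(mask):
    """Reduce a binary road mask to 1-pixel center lines, then flag
    probable intersections (skeleton pixels with 3+ skeleton neighbors)
    and endpoints (exactly 1 neighbor)."""
    skel = skeletonize(mask.astype(bool))
    # Count 8-connected skeleton neighbors of each skeleton pixel.
    kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
    neighbors = ndimage.convolve(skel.astype(int), kernel, mode='constant')
    intersections = skel & (neighbors >= 3)
    endpoints = skel & (neighbors == 1)
    return skel, intersections, endpoints

# Toy example: a cross of two "roads".
mask = np.zeros((64, 64), dtype=np.uint8)
mask[30:34, :] = 1   # horizontal road
mask[:, 30:34] = 1   # vertical road
skel, inter, ends = road_graph_nodes(mask)
print(inter.sum(), ends.sum())  # intersection pixels near the center, border endpoints
```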
OpenSeq2Seq is an open-source, TensorFlow-based toolkit that supports a wide range of off-the-shelf models for Natural Language Translation (GNMT, Transformer, ConvS2S), Speech Recognition (Wave2Letter, DeepSpeech2), Speech Synthesis (Tacotron 2), Language Modeling, and transfer learning for NLP tasks. OpenSeq2Seq is optimized for the latest GPUs and supports multi-GPU and mixed-precision training. Benchmarks on machine translation and speech recognition tasks show that models built using OpenSeq2Seq achieve state-of-the-art performance with 1.5-3x faster training.
Mixed-precision training of deep neural networks provides tremendous benefits: it requires half the storage and data movement of single-precision values, and starting with the Volta GPU's Tensor Cores, provides up to 120 TFLOPS of math throughput, an 8x speedup over FP32. In this talk, we first present the considerations and techniques involved in training with reduced precision, including master weights and automatic loss scaling. Afterwards, we discuss real-world training in mixed precision, with a particular focus on the PyTorch and TensorFlow frameworks.
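A minimal sketch of both techniques using PyTorch's torch.cuda.amp utilities, one concrete realization of the ideas above (it assumes a CUDA device): the optimizer updates FP32 parameters, which act as master weights, while GradScaler implements automatic loss scaling.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(1024, 10).cuda()   # weights stay FP32 ("master weights")
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()                      # automatic loss scaling

for step in range(10):
    x = torch.randn(32, 1024, device='cuda')
    y = torch.randint(0, 10, (32,), device='cuda')
    optimizer.zero_grad()
    with autocast():                       # forward runs in FP16 where safe
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()          # scale loss so small grads survive FP16
    scaler.step(optimizer)                 # unscales grads, skips step on inf/nan
    scaler.update()                        # adjusts the scale factor dynamically
```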
We'll describe training of very deep networks with mixed-precision float ("float16") using Volta Tensor Cores. Float16 has two major potential benefits: higher training speed and a reduced memory footprint. But float16 has a smaller numerical range than regular single-precision float, which can result in overflow or underflow ("vanishing gradient") during training. We'll describe a simple rescaling mechanism that solves these potential issues. With this rescaling algorithm, we successfully used mixed-precision training for networks such as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.
We'll describe new algorithms used to train very deep networks with half-precision float. Float16 has two major potential benefits: better training speed and a reduced memory footprint. But float16 has a very narrow numerical range (roughly 0.00006 to 65504 for normal values), which can result in overflow ("inf/nan" problem) or underflow ("vanishing gradient") during training of deep networks. We'll describe the new scaling algorithm, implemented in nvcaffe, which prevents these negative effects. With this algorithm, we successfully trained networks such as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.
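The underflow problem and the scaling fix are easy to demonstrate in NumPy. This toy sketch uses a fixed scale factor; the nvcaffe algorithm described in the talk adapts the scale dynamically.

```python
import numpy as np

# A gradient value well inside FP32's range but below FP16's smallest
# subnormal (~6e-8) simply flushes to zero in half precision.
grad = np.float32(1e-8)
print(np.float16(grad))        # 0.0 -> "vanishing gradient"

# Scaling the loss by S scales every gradient by S (chain rule), moving
# it into FP16's representable range; divide by S again before the
# weight update, which is kept in FP32.
S = np.float32(1024.0)
scaled = np.float16(grad * S)
print(scaled)                  # ~1.02e-05, survives in FP16
recovered = np.float32(scaled) / S
print(recovered)               # ~1e-8 recovered for the FP32 update
```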
We'll present new techniques for training machine learning models using low-precision computation and communication. We'll start by briefly outlining new theoretical results proving that, surprisingly, many fundamental machine learning tools, such as dense generalized linear models, can be trained end-to-end (samples, model, and gradients) using low precision (as little as one bit per value), while still guaranteeing convergence. We'll then explore the implications of these techniques with respect to two key practical applications: multi-GPU training of deep neural networks, and compressed sensing for medical and astronomical data.
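One standard ingredient in such end-to-end low-precision schemes is unbiased stochastic rounding; the following NumPy sketch (an illustration, not the talk's specific algorithms) shows why coarse quantization can still be correct in expectation.

```python
import numpy as np

def stochastic_round(x, levels):
    """Round x onto a uniform low-precision grid with probability
    proportional to proximity, so the quantization is unbiased:
    E[q(x)] = x. Unbiasedness is key to low-precision SGD convergence."""
    lo = np.floor(x * levels) / levels
    hi = lo + 1.0 / levels
    p_hi = (x - lo) * levels              # probability of rounding up
    return np.where(np.random.rand(*x.shape) < p_hi, hi, lo)

x = np.full(100000, 0.30)
q = stochastic_round(x, levels=4)         # coarse grid {0, 0.25, 0.5, ...}
print(q.mean())                            # ~0.30 on average despite the coarse grid
```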
Deep learning tools present a tremendous opportunity to improve healthcare. By increasing the efficiency and accuracy of diagnostic testing, and extracting meaning from vast troves of clinical data, deep learning provides a pathway to true precision care. However, there are challenges in the translation of this technology to the clinic: model performance, infrastructure development, data privacy, hospital policy, and vendor relationships are all critical components of this effort. We'll discuss the early experience of the MGH & BWH Center for Clinical Data Science in supporting the translation of deep learning technologies in medicine, touching upon many of the existing and emerging technical, clinical, and cultural challenges that this work presents.
Learn how GPUs are pushing the limits of the largest astronomical telescopes on Earth and how they'll be used to image life-bearing planets outside our solar system. Thanks to hardware features such as Tensor Cores and mixed-precision support, plus optimized AI frameworks, GPU technology is changing how large data streams from optical sensors are digested in real time. We'll discuss how real-time AI made possible by GPUs opens up new means to optimally control the system and calibrate images, which will help scientists get the most out of the largest optical telescopes. GPUs will also benefit future extreme-size facilities like the European Extremely Large Telescope, because the complexity of maintaining exquisite image quality increases with the square of the telescope's diameter. We'll present on-sky results obtained on the 8.2-meter Subaru Telescope and explain why these techniques will be essential to future giant telescopes.
We'll present a summary of ongoing work that targets the use of newer GPU architecture features (Pascal and Volta) in real-time signal processing applications in radio astronomy telescopes, and outline the future growth path for this exciting new application of GPUs. For the Pascal and Volta architectures, we'll discuss the advantages of using higher memory bandwidth, half precision, and integer arithmetic in existing GPU-based correlator pipeline code. This is an ongoing effort between the National Centre for Radio Astrophysics and NVIDIA. We'll look at the various processing stages in the pipeline to explore optimization possibilities, and highlight interesting results that were achieved. We'll address in detail the effect of using half precision on accuracy and performance, and the required library changes.
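As a toy illustration (not the NCRA pipeline itself), this NumPy sketch shows where half precision enters an FX-style correlator, quantizing antenna voltages to FP16 before channelization and cross-multiplication, and how its accuracy impact can be checked against a double-precision reference.

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated voltage streams from two antennas sharing a common signal.
common = rng.standard_normal(4096)
v1 = common + 0.5 * rng.standard_normal(4096)
v2 = common + 0.5 * rng.standard_normal(4096)

def fx_correlate(a, b, nchan=256, dtype=np.float64):
    """Toy FX correlator stage: channelize with an FFT, cross-multiply,
    and accumulate over spectra. dtype controls input precision."""
    a = a.astype(dtype).reshape(-1, nchan)   # quantization happens here
    b = b.astype(dtype).reshape(-1, nchan)
    A = np.fft.rfft(a.astype(np.float64), axis=1)
    B = np.fft.rfft(b.astype(np.float64), axis=1)
    return (A * np.conj(B)).mean(axis=0)     # accumulated cross-spectrum

ref = fx_correlate(v1, v2, dtype=np.float64)
half = fx_correlate(v1, v2, dtype=np.float16)  # inputs quantized to FP16
rel_err = np.abs(half - ref).max() / np.abs(ref).max()
print(rel_err)  # small for well-scaled inputs
```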
Leaders from the mapping technology companies will discuss the advantages of various algorithms used to create and maintain maps, followed by a short Q&A session.
HERE: Vladimir Shestak, Lead Software Engineer. Automated Driving Edge Perception for HD Map Maintenance: We start this talk by presenting a brief overview of the HD Live Map created by HERE and its use for connected ADAS and automated driving solutions. Although building such a map with the required centimeter-level precision is technically hard, the instant the HD Live Map is built, changes in the real world can occur, causing the map to no longer reflect reality. Hence, a proper maintenance strategy must be in place, with the goal of identifying discrepancies between the HD Live Map and the real world and healing the map as quickly as possible. We discuss a spectrum of techniques developed by HERE to address the map-healing process, and then focus on our low-cost solutions for in-vehicle change detection. The example system employs a consumer-grade, Android-based sensing system streaming imagery and telemetry in real time into the HERE Edge Perception software stack. We present the high-level software architecture of the stack and its main components (feature detection, object tracking and triangulation, RWO and Maplet generation), as well as in-vehicle deployment options. A real-time performance evaluation of the system concludes our talk.
NavInfo Europe: Geetank Raipuria, Computer Vision Engineer. Real-Time Object Detection and Semantic Segmentation: This session will discuss how NavInfo uses computer vision and deep learning to build high-definition maps that cover China's highways and large city streets. This involves performing object detection and semantic segmentation on visual imagery collected from vehicle sensors. The NavInfo Europe Advanced Research Lab creates processes that extract information from this data, both in real time onboard vehicles using the NVIDIA DRIVE platform, and faster than real time when processing offline-gathered video material through NVIDIA DeepStream.
We'll focus on one of the three pilots of the DOE and NCI partnership on precision oncology and the Cancer Moonshot: predicting tumor cell response to drug treatments with deep learning. This prediction task is a critical challenge for fulfilling the promise of precision medicine in oncology. As part of a joint project between DOE and NCI to develop advanced computing solutions for cancer, we are developing a deep learning-based framework for modeling tumor-drug interaction and predicting dose response in pre-clinical screening.
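A hypothetical PyTorch sketch of such a tumor-drug interaction model follows; the two-branch structure and all dimensions are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class DoseResponseNet(nn.Module):
    """Hypothetical two-branch network: one encoder for tumor/cell-line
    features (e.g. expression profiles), one for drug descriptors; the
    fused representation predicts a dose-response value."""
    def __init__(self, cell_dim=900, drug_dim=1024, hidden=256):
        super().__init__()
        self.cell_enc = nn.Sequential(nn.Linear(cell_dim, hidden), nn.ReLU())
        self.drug_enc = nn.Sequential(nn.Linear(drug_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))       # e.g. a growth-inhibition proxy

    def forward(self, cell, drug):
        z = torch.cat([self.cell_enc(cell), self.drug_enc(drug)], dim=1)
        return self.head(z)

# Toy usage with random stand-in features.
model = DoseResponseNet()
pred = model(torch.randn(8, 900), torch.randn(8, 1024))
print(pred.shape)  # torch.Size([8, 1])
```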
Learn how one of the leading institutes for global weather prediction, the European Centre for Medium-Range Weather Forecasts (ECMWF), is preparing for exascale supercomputing and the efficient use of future HPC hardware. I will outline the main reasons why it is difficult to design efficient weather and climate models and provide an overview of the ongoing community effort to achieve the best possible model performance on existing and future HPC architectures. I will present the EU H2020 projects ESCAPE and ESiWACE and discuss recent approaches to increasing computing performance in weather and climate modelling, such as the use of reduced numerical precision and deep learning.
This presentation will provide an overview of Blue River Technology's use of GPUs in developing their See and Spray technology for Precision Agriculture. We will motivate the use of Deep Learning in detection and classification of crops and weeds in production environments, and highlight the ways in which NVIDIA GPUs have provided the tools and platform for training powerful models. NVIDIA GPUs have also helped us perform real-time inference on working machines in the field. This talk will show how these systems perform and provide videos of the machines in operation.
Proving that a system as complex as an autonomous car is safe cannot be done using existing standards. A new method needs to be invented that is much more data-driven and probability-based. Traditional redundant solutions don't apply when trying to optimize a precision-recall curve. Getting acceptance from regulatory bodies and the public will be much easier if the industry converges on what this new method should be.
We'll discuss the GPU-accelerated Monte Carlo compute at JP Morgan, which was architected for C1060 cards and revamped several times as new architectures were released. The key features of the code are the exclusive use of double precision, data caching, and a structure in which a significant amount of CPU pre-compute is followed by running multiple GPU kernels. On the latest devices, memory per flop is a throughput-limiting factor for a class of our GPU-accelerated models. As the byte/flop ratio continues to fall from one GPU generation to the next, we are exploring ways to re-architect the Monte Carlo simulation code to decrease memory requirements and improve the TCO of GPU-enabled compute. Obvious next steps are to store less, recalculate more, and use unified memory.
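A toy NumPy sketch of "store less, recalculate more" (illustrative only; real GPU implementations typically use counter-based RNGs, and discounting and variance reduction are omitted here): each path is regenerated on demand from a deterministic per-path seed, so no path storage is needed and later passes can reproduce the same paths exactly.

```python
import numpy as np

def path_values(path_id, n_steps, s0=100.0, mu=0.02, sigma=0.2, dt=1/252):
    """Regenerate one GBM path on demand from a per-path seed instead of
    keeping all paths resident: trading flops for bytes."""
    rng = np.random.default_rng(path_id)   # deterministic per path
    z = rng.standard_normal(n_steps)
    log_s = np.log(s0) + np.cumsum((mu - 0.5 * sigma**2) * dt
                                   + sigma * np.sqrt(dt) * z)
    return np.exp(log_s)

# First pass: accumulate a call payoff without storing any path.
n_paths, n_steps, strike = 20000, 252, 105.0
payoff = 0.0
for pid in range(n_paths):
    payoff += max(path_values(pid, n_steps)[-1] - strike, 0.0)
print(payoff / n_paths)

# Later passes (e.g. sensitivities) recompute the same paths exactly,
# so memory stays O(n_steps) rather than O(n_paths * n_steps).
```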
Developing and deploying AI solutions for clinical research use cases can be complex and resource-intensive, and therefore expensive and challenging to implement for many researchers, groups, and healthcare organisations. In the era of Big Data and the IoT, the most critical problems relate to the secure access and management of large heterogeneous datasets, the deployment of GPU-accelerated massively parallel processing systems, and the setup of development environments encompassing complex ML tools and applications. The MULTI-X platform simplifies these logistical challenges by providing pre-configured environments with ad-hoc scalable computing resources to quickly build, test, share, and reproduce scientific applications. Its comprehensible modular framework accelerates development and reduces the burden and cost of implementing AI solutions. Two exemplary use cases of GPU-enabled AI solutions in the area of Cardiac Image Analysis, both developed and deployed in MULTI-X, will be presented together with the outcome of the analysis of 5,000 subjects from the UK Biobank database.
The use of low-precision arithmetic has been a powerful tool for accelerating numerous scientific computing applications, including artificial intelligence. We present an investigation showing that other HPC applications can harness this power too, in particular the general HPC problem of solving Ax = b, where A is a large dense matrix and the solution is needed in FP64 accuracy. Our approach is based on the mixed-precision (FP16->FP64) iterative refinement technique: we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations, and we show how FP16-TC (Tensor Core) arithmetic can provide up to a 4x speedup and reduce energy consumption by a factor of 5, achieving 74 Gflops/Watt. This is due to the performance boost that the FP16 Tensor Cores provide and to their better accuracy, which outperforms classical FP16.
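A minimal NumPy/SciPy sketch of the iterative refinement idea follows, with an FP32 LU factorization standing in for the FP16-TC factorization (NumPy/SciPy have no half-precision LU); the residual and solution updates stay in FP64.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned FP64 system
b = rng.standard_normal(n)

# Factor once in low precision (FP32 stands in for FP16-TC here).
lu, piv = lu_factor(A.astype(np.float32))

# Initial low-precision solve, then refine toward FP64 accuracy.
x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
for it in range(10):
    r = b - A @ x                                   # residual in FP64
    if np.linalg.norm(r) / np.linalg.norm(b) < 1e-14:
        break
    d = lu_solve((lu, piv), r.astype(np.float32))   # cheap low-precision correction
    x = x + d.astype(np.float64)                    # update in FP64
print(it, np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # FP64-level accuracy
```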
The Cancer Moonshot was established in 2016 with the goal of doubling the rate of progress in cancer research -- to do in five years what would normally take ten. A major area for accelerating progress is the strategy of using modeling, simulation, and machine learning to advance our understanding of cancer biology and to integrate what is known into predictive models that can inform research and guide therapeutic development. In 2015, the U.S. Department of Energy formed a collaboration with the National Cancer Institute for the joint development of advanced computing solutions for cancer.
An introduction to how the Jetson TX2 can be used for selective harvesting and precision agriculture. The Jetson TX2 is a surprisingly robust platform; unique, basic features of the module such as unified memory and dynamic execution can support robust applications for the sensing, robot control, and data handling required for selective harvesting, that is, the harvesting of fruit over several cycles.
Learn how to develop an Artificial Intelligence system that localizes and recognizes food on trays to generate a purchase ticket in a checkout process.
(1) Solving a real business problem using advanced deep learning technology based on object detection and localization.
(2) Combining a pipeline of models to improve accuracy and precision while maintaining reasonable recall levels (see the sketch after this list).
(3) Discovering how to develop and train a model in the cloud and deploy it embedded in an NVIDIA Jetson TX1 device.
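As a hedged sketch of point (2), the snippet below chains a generic pretrained torchvision detector with a separate classifier that re-scores each detected crop; the models, thresholds, and agreement rule are illustrative stand-ins for the production pipeline, and a real system would fine-tune both stages on tray imagery. It assumes a recent torchvision.

```python
import torch
import torchvision

# Stage 1: a generic pretrained detector proposes candidate food-item boxes.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

# Stage 2: a separate classifier re-scores each crop; a real system would
# fine-tune it on tray/food images and normalize inputs properly.
classifier = torchvision.models.resnet18(weights="DEFAULT")
classifier.eval()

image = torch.rand(3, 480, 640)                  # stand-in for a tray photo
with torch.no_grad():
    det = detector([image])[0]
    for box, score in zip(det["boxes"], det["scores"]):
        if score < 0.7:                           # high threshold favors precision
            continue
        x0, y0, x1, y1 = box.int().tolist()
        if x1 <= x0 or y1 <= y0:                  # skip degenerate boxes
            continue
        crop = image[:, y0:y1, x0:x1][None]
        crop = torch.nn.functional.interpolate(crop, size=(224, 224))
        cls_conf = classifier(crop).softmax(dim=1).max()
        if cls_conf > 0.5:                        # keep boxes both stages agree on
            print(box.tolist(), float(score), float(cls_conf))
```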
Detecting road users in real time is key to enabling safe autonomous driving applications in crowded urban environments. This talk presents a distributed sensor infrastructure being deployed in the city of Modena (Italy), at the heart of the Italian 'Motor Valley'. Modena's Automotive Smart Area (MASA) connects hundreds of smart cameras, equipped with embedded GPU modules for edge-side real-time detection, with higher-performance GPU (fog) nodes at the block level and low-latency wireless V2X communication. A distributed deep learning paradigm balances precision and response time to give autonomous vehicles the required sensing support in a densely populated urban environment. The infrastructure will exploit a novel software architecture to help programmers and big data practitioners combine data-in-motion and data-at-rest analysis while providing real-time guarantees. MASA, funded under the European project CLASS, is an open testbench where interested partners may deploy and test next-generation AD applications in a tightly connected setting.
Recent developments in artificial intelligence, advances in GPU computing hardware, and the availability of large-scale medical imaging datasets allow us to learn what the human brain truly looks like from a biological, physiological, anatomical, and pathological point of view. This learning process can be augmented by electronic healthcare record data, cognitive examinations, and diagnostic/radiological report data, providing an integrated view of the human interpretation of neurological diseases. This talk will present how AI models can learn from big, unstructured neurological and neuroradiological data and be used as tools for precision medicine, with the aim of translating advanced imaging technologies and biomarkers to clinical practice, streamlining the clinical workflow, and improving the quality of care. It will also explore the technological translation process, which requires full clinical support, deep algorithmic integration into the radiological workflow, and the deployment of a high-throughput, hospital-integrated GPU computational platform.
We believe that medicine will be more precise and affordable. Physicians will integrate relevant patient data and insights at the point of decision for precise diagnostics. Therapy will be tailored to the characteristics of both the patient and disease, resulting in the right treatment for the right patient at the right time. AI-powered decision support could help to balance the need for personalization when it matters and standardization to reduce unwarranted variations.