The 2018 GTC opening keynote is delivered by the NVIDIA Founder and CEO, Jensen Huang, speaking on the future of computing.
Don't miss this keynote from NVIDIA Founder & CEO, Jensen Huang, as he speaks on the future of computing.
Opening Keynote Speech
The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. With an annual budget of $7.5 billion, NSF awards grants to nearly 2,000 colleges, universities and other institutions in all 50 states. Hear how NSF is advancing discovery and technological innovation in all fields, including artificial intelligence, to keep the United States at the forefront of global science and engineering leadership.
In this keynote, we'll show how the Cancer Moonshot Task Force under Vice President Biden is unleashing the power of data to help end cancer as we know it. We'll discuss global efforts inspired by the Cancer Moonshot that will empower A.I. and deep learning for oncology with larger and more accessible datasets.
What New Results in Visual Question Answering Have to Say about Old AI
Don't miss GTC's opening keynote address from NVIDIA CEO and co-founder Jensen Huang. He'll discuss the latest breakthroughs in visual computing, including how NVIDIA is fueling the revolution in deep learning.
Over the past few years, we have built large-scale computer systems for training neural networks, and then applied these systems to a wide variety of problems that have traditionally been very difficult for computers. We have made significant improvements in the state-of-the-art in many of these areas, and our software systems and algorithms have been used by dozens of different groups at Google to train state-of-the-art models for speech recognition, image recognition, various visual detection tasks, language modeling, language translation, and many other tasks. In this talk, I'll highlight some of the distributed systems and algorithms that we use in order to train large models quickly. I'll then discuss ways in which we have applied this work to a variety of problems in Google's products, usually in close collaboration with other teams. This talk describes joint work with many people at Google.
Don't miss the opening keynote featuring Jensen Huang, Co-Founder, President, and CEO of NVIDIA. Hear about what's next in visual computing, and preview disruptive technologies and exciting demonstrations across industries.
A fundamental challenge of modern society is the development of effective approaches to enhance brain function and cognition in both healthy and impaired individuals. For the healthy, this serves as a core mission of our educational system, and for the cognitively impaired, this is a critical goal of our medical system. Unfortunately, there are serious and growing concerns about the ability of either system to meet this challenge. I will describe an approach developed in our lab that uses custom-designed video games to achieve meaningful and sustainable cognitive enhancement (e.g., Anguera, et al. Nature 2013), as well as the next stage of our research program, which uses video games integrated with technological innovations in software (e.g., brain computer interface algorithms, GPU computing) and hardware (e.g., virtual reality headsets, mobile EEG, transcranial electrical brain stimulation) to create a novel personalized closed loop system. I will share with you a vision of the future in which high-tech is used as an engine to enhance our brain's information processing systems, thus reducing our reliance on non-specific drugs to treat neurological and psychiatric conditions and allowing us to better target our educational efforts. This keynote will be preceded by naming the winner of the CUDA Center of Excellence Achievement Award, winner for Best Poster, and the new CUDA Fellows, followed by the launch announcement of the Global Impact Award. (Award ceremony duration approximately 15 minutes).
This presentation will show how Pixar uses GPU technology to empower artists in the animation and lighting departments. By providing our artists with high-quality, interactive visual feedback, we enable them to spend more time making creative decisions. Animators interactively pose characters in order to create a performance. When features like displacement, fur, and shadows become critical for communicating the story, it is vital to be able to represent these visual elements in motion at interactive frame rates. We will show Presto, Pixar's proprietary animation system, which uses GPU acceleration to deliver real-time feedback during the character animation process, using examples from Pixar's recent films. Lighting artists place and adjust virtual lights to create the mood and tone of the scene as well as guide the audience's attention. A physically-based illumination model allows these artists to create visually-rich imagery using simpler and more direct controls. We will demonstrate our interactive lighting preview tool, based on this model, built on NVIDIA's OptiX framework, and fully integrated into our new Katana-based production workflow.
Don't miss the opening keynote featuring Jensen Huang, Co-Founder, President, and CEO of NVIDIA. Hear about what's next in computing and graphics, and preview disruptive technologies and exciting demonstrations across industries.
The human genome is a sequence of 3 billion chemical letters inscribed in a molecule called DNA. Famously, short stretches (~10 letters, or base pairs) of DNA fold into a double helix. But what about longer pieces? How does a 2 meter long macromolecule, the genome, fold up inside a 6 micrometer wide nucleus? And, once packed, how does the information contained in this ultra-dense structure remain accessible to the cell? This talk will discuss how the human genome folds in three dimensions, a folding that enables the cell to access and process massive quantities of information in parallel. To probe how genomes fold, we developed Hi-C, together with collaborators at the Broad Institute and UMass Medical School. Hi-C couples proximity-dependent DNA ligation and massively parallel sequencing. To analyze our data and reconstruct the underlying folds, we, too, must engage in massively parallel computation. I will describe how we use NVIDIA's CUDA technology to analyze our results and simulate the physical processes of genome folding and unfolding.
Ralph Gilles, senior vice president of Product Design and president and CEO of SRT (Street and Racing Technology) Brand and Motorsports at Chrysler Group LLC and the mind behind some of the company's most innovative products, will provide a behind-the-scenes look at the auto industry. Gilles will review how GPUs are used to advance every step of the automobile development process from the initial conceptual designs and engineering phases through product assembly and marketing. He will also discuss how Chrysler Group utilizes GPUs and the latest technologies to build better, safer cars and reduce time to market.
Do not miss the opening keynote, featuring Jensen Huang, CEO and Co-Founder of NVIDIA. Hear about what's next in computing and graphics, and preview disruptive technologies and exciting demonstrations from across industries. Jen-Hsun co-founded NVIDIA in 1993 and has served since its inception as president, chief executive officer and a member of the board of directors.
Collective behavior is one of the most pervasive features of the natural world. Our brains are composed of billions of interconnected cells communicating with chemical and electrical signals. We are integrated in our own human society. Elsewhere in the natural world a fish school convulses, as if one entity, when being attacked by a predator. How does individual behavior produce dynamic group-level properties? Do animal groups, or even cells in a tumor, function as some form of collective mind? How does socially contagious behavior spread through natural human crowds? In his keynote address, Prof. Iain D. Couzin, Professor of Ecology and Evolutionary Biology at Princeton University, will demonstrate how GPU computing has been pivotal in the study of collective behavior, helping reveal how collective action emerges in a wide range of groups from plague locusts to human crowds, and the critical role that uninformed, or weakly-opinionated, individuals play in democratic consensus decision-making.
Do not miss the day 3 keynote, featuring Part-Time Scientists Robert Boehme and Wes Faler. Boehme and Faler are part of a team of international scientists and engineers who want to send a rover to the moon before the end of the year 2013. In this presentation, they will discuss their goals, recent accomplishments and milestones, and how GPUs have helped in unexpected ways.
Do not miss this opening keynote, featuring Jensen Huang, CEO and Co-Founder of NVIDIA, and special guests. Hear about what’s next in GPU computing, and preview disruptive technologies and exciting demonstrations from across industries. Jensen Huang co-founded NVIDIA in 1993 and has served since its inception as president, chief executive officer and a member of the board of directors.
The opening keynote features Jensen Huang, CEO and Co-Founder of NVIDIA, and special guests. Hear about what's next in computing and graphics, and preview disruptive technologies and exciting demonstrations from across industries.
How does the H1N1 "Swine Flu" virus avoid drugs while attacking our cells? What can we learn about solar energy by studying biological photosynthesis? How do our cells read the genetic code? What comes next in computational biology? Computational biology is approaching a new and exciting frontier: the ability to simulate structures and processes in living cells. Come learn about the "computational microscope," a new research instrument that scientists can use to simulate biomolecules at nearly infinite resolution. The computational microscope complements the most advanced physical microscopes to guide today's biomedical research. In this keynote address, computational biology pioneer Dr. Klaus Schulten of the University of Illinois, Urbana-Champaign, will introduce the computational microscope, showcase the widely used software underlying it, and highlight major discoveries made with the aid of the computational microscope, ranging from viewing protein folding to translating the genetic code in cells and harvesting solar energy in photosynthesis. He will also look towards a future when cell tomography and computing will establish atom-by-atom views of entire life forms.
What really causes accidents and congestion on our roadways? How close are we to fully autonomous cars? In his keynote address, Stanford Professor and Google Distinguished Engineer, Dr. Sebastian Thrun, will show how his two autonomous vehicles, Stanley (DARPA Grand Challenge winner), and Junior (2nd Place in the DARPA Urban Challenge) demonstrate how close yet how far away we are to fully autonomous cars. Using computer vision combined with lasers, radars, GPS sensors, gyros, accelerometers, and wheel velocity, the vehicle control systems are able to perceive and plan the routes to safely navigate Stanley and Junior through the courses. However, these closed courses are a far cry from everyday driving. Find out what the team will do next to get one step closer to the "holy grail" of computer vision, and a huge leap forward toward the concept of fully autonomous vehicles.
High-Throughput Science
How did the universe start? How is the brain wired? How does matter interact at the quantum level? These are some of the great scientific challenges of our times, and answering them requires bigger scientific instruments, increasingly precise imaging equipment and ever-more complex computer simulations. In his keynote address, Harvard professor, researcher and computing visionary Hanspeter Pfister will discuss the computational obstacles scientists face and how commodity high-throughput computing can enable high-throughput science, in which massive data streams are processed and analyzed rapidly, from the instrument through to the desktop. Finally, Professor Pfister will survey several groundbreaking projects at Harvard that leverage GPUs for high-throughput science, ranging from radio astronomy and neuroscience to quantum chemistry and physics.
Games and interactive media have long been the beneficiaries of cutting-edge GPU technology, and it has not gone unnoticed in the world of feature film production. To date the visual effects industry had been a sideline observer of these advances while awaiting technology to reach maturity. At Lucasfilm, research and development has been ongoing for some time, and this past summer Industrial Light & Magic employed this technology in two of its summer blockbuster films. Lucasfilm CTO, Richard Kerris, will show a brief history of their computer graphics for film, and will then pull back the curtain on how they are now using GPU technology to advance the state of the art in visual effects and provide a glimpse of what's on the horizon for GPUs in the future and how it will impact filmmaking.
Learn how Gensler is using the latest technology in virtual reality across all aspects of the design process for the AEC industry. We'll cover how VR has added value to the process when using different kinds of VR solutions. Plus we'll talk about some of the challenges Gensler has faced with VR in terms of hardware, software, and workflows. Along with all of this, NVIDIA's latest VR visualization tools are helping with the overall process and realism of our designs.
This customer panel brings together A.I. implementers who have deployed deep learning at scale using NVIDIA DGX Systems. We'll focus on specific technical challenges we faced, solution design considerations, and best practices learned from implementing our respective solutions. Attendees will gain insights such as: 1) how to set up your deep learning project for success by matching the right hardware and software platform options to your use case and operational needs; 2) how to design your architecture to overcome unnecessary bottlenecks that inhibit scalable training performance; and 3) how to build an end-to-end deep learning workflow that enables productive experimentation, training at scale, and model refinement.
Businesses of all sizes are increasingly recognizing the potential value of AI, but few are sure how to prepare for the transformational change it is sure to bring to their organizations. Danny Lange rolled out company-wide AI platforms at Uber and Amazon; now, through Unity Technologies, he's making AI available to the rest of us. He'll also share his thoughts on the most exciting advances that AI will bring over the next year. His insights will help you understand the true potential of AI, regardless of your role or industry.
What is Deep Learning? In what fields is it useful? How does it relate to artificial intelligence? We'll discuss deep learning and why this powerful new technology is getting so much attention, how deep neural networks are trained to perform tasks with super-human accuracy, and the challenges organizations face in adopting this new approach. We'll also cover some of the best practices, software, hardware, and training resources that many organizations are using to overcome these challenges and deliver breakthrough results.
We'll introduce deep learning infrastructure for building and maintaining autonomous vehicles, including techniques for managing the lifecycle of deep learning models, from definition, training and deployment to reloading and life-long learning. DNN autocurates and pre-labels data in the loop. Given data, it finds the best run-time optimized deep learning models. Training scales with data size beyond multi-nodes. With these methodologies, one takes only data from the application and feeds DL predictors to it. This infrastructure is divided into multiple tiers and is modular, with each of the modules containerized to lower infrastructures like GPU-based cloud infrastructure.
Join our presentation on the first application of deep learning to cybersecurity. Deep learning is inspired by the brain's ability to learn: once a brain learns to identify an object, its identification becomes second nature. Similarly, as a deep learning-based artificial brain learns to detect any type of cyber threat, its prediction capabilities become instinctive. As a result, the most evasive and unknown cyber-attacks are immediately detected and prevented. We'll cover the evolution of artificial intelligence, from old rule-based systems to conventional machine learning models to current state-of-the-art deep learning models.
We'll introduce a novel approach to digital pathology analytics, which brings together a powerful image server and deep learning-based image analysis on a cloud platform. Recent advances in AI, and deep learning in particular, show great promise in several fields of medicine, including pathology. Human expert judgement augmented by deep learning algorithms has the potential to speed up the diagnostic process and to make diagnostic assessments more reproducible. One of the major advantages of the novel AI-based algorithms is the ability to train classifiers for morphologies that exhibit a high level of complexity. We will present examples of context-intelligent image analysis applications, including a fully automated epithelial cell proliferation assay and tumor grading. We will also present other examples of complex image analysis algorithms, which all run on demand on whole-slide images in the cloud computing environment. Our WebMicroscope® Cloud is sold as a service (SaaS), an approach that is extremely easy to set up from a user perspective, as the need for local software and hardware installation is removed and the solution can immediately be scaled to projects of any size.
The long-term goal of any financial institution is to give its users the best possible experience within the limits of its resources. That is possible only when financial institutions adopt intelligent systems, and the success of such systems depends heavily on their intelligence. Deep learning has given financial institutions a huge opportunity to start planning and building large-scale intelligent systems that are multi-functional and adaptive. In this talk, we will discuss how we used deep learning, with Vega as the platform and GPUs, to build high-scale automation use cases ranging from fraud detection to complex process automation in both banking and insurance.
Innovation can take many forms and be led by varying stakeholders across an organization. One successful model is utilizing AI for Social Good to drive a proof-of-concept that will advance a critical strategic goal. The Data Science Bowl (DSB) is an ideal example. Launched by Booz Allen Hamilton in 2014, it galvanizes thousands of data scientists to participate in competitions that will have far-reaching impact across key industries such as healthcare. This session will explore the DSB model, as well as look at other ways organizations are utilizing AI for Social Good to create business and industry transformation.
From healthcare to financial services to retail, businesses are seeing unprecedented levels of efficiency and productivity, which will only continue to rise and transform how companies operate. This session will look at how Accenture as an enterprise is optimizing itself in the age of AI, as well as how it guides its customers to success. We'll look at best practices, insights, and measurement to help the audience inform their AI roadmap and journey.
For enterprises daunted by the prospect of AI and investing in a new technology platform, the reality is that AI can leverage already-in-place big data and cloud strategies. This session will explore AI and deep learning use cases that are designed for ROI, and look at how success is being measured and optimized.
We'll introduce new concepts and algorithms that apply deep learning to radio frequency (RF) data to advance the state of the art in signal processing and digital communications. With the ubiquity of wireless devices, the crowded RF spectrum poses challenges for cognitive radio and spectral monitoring applications. Furthermore, the RF modality presents unique processing challenges due to the complex-valued data representation, large data rates, and unique temporal structure. We'll present innovative deep learning architectures to address these challenges, which are informed by the latest academic research and our extensive experience building RF processing solutions. We'll also outline various strategies for pre-processing RF data to create feature-rich representations that can significantly improve performance of deep learning approaches in this domain. We'll discuss various use-cases for RF processing engines powered by deep learning that have direct applications to telecommunications, spectral monitoring, and the Internet of Things.
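As a rough illustration of the pre-processing idea in the abstract above, the following sketch turns complex-valued IQ samples into real-valued representations that a typical deep learning framework can consume. It is a minimal sketch under stated assumptions, not the presenters' pipeline: the function names (`iq_to_channels`, `iq_to_polar`, `normalize_power`) and the example tone are hypothetical.

```python
import cmath
import math


def normalize_power(samples):
    """Scale complex samples so their average power (mean |s|^2) is 1."""
    power = sum(abs(s) ** 2 for s in samples) / len(samples)
    scale = 1.0 / math.sqrt(power)
    return [s * scale for s in samples]


def iq_to_channels(samples):
    """Split complex IQ samples into two real-valued channels: I and Q."""
    i_channel = [s.real for s in samples]
    q_channel = [s.imag for s in samples]
    return [i_channel, q_channel]


def iq_to_polar(samples):
    """Alternative representation: per-sample magnitude and phase (radians)."""
    magnitude = [abs(s) for s in samples]
    phase = [cmath.phase(s) for s in samples]
    return [magnitude, phase]


# Hypothetical input: four samples of a unit-amplitude complex tone.
tone = [cmath.exp(1j * (math.pi / 2) * n) for n in range(4)]
channels = iq_to_channels(normalize_power(tone))
polar = iq_to_polar(tone)
```

Either representation gives a two-row real array per capture; which one works better for a given task is an empirical question that this sketch does not answer.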
We'll discuss training techniques and deep learning architectures for high-precision landmark localization. In the first part of the session, we'll talk about ReCombinator Networks, which aim at maintaining pixel-level image information for high-accuracy landmark localization. This model combines coarse-to-fine features to first observe global (coarse) image information and then recombines local (fine) information. By using this model, we report state-of-the-art results on three facial landmark datasets. This model can be used for other tasks that require pixel-level accuracy (for example, image segmentation, image-to-image translation). In the second part, we'll talk about improving landmark localization in a semi-supervised setting, where less labeled data is provided. Specifically, we consider a scenario where few labeled landmarks are given during training, but lots of weaker labels (for example, face emotions, hand gestures) that are easier to obtain are provided. We'll describe training techniques and model architectures that can leverage weaker labels to improve landmark localization.
Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, motion, and change over time. A tracker must be able to modify its underlying model and adapt to new observations. We present Re3, a real-time deep object tracker capable of incorporating temporal information into its model. Rather than focusing on a limited set of objects or training a model at test-time to track a specific instance, we pretrain our generic tracker on a large variety of objects and efficiently update on the fly; Re3 simultaneously tracks and updates the appearance model with a single forward pass. This lightweight model is capable of tracking objects at 150 FPS, while attaining competitive results on challenging benchmarks. We also show that our method handles temporary occlusion better than other comparable trackers using experiments that directly measure performance on sequences with occlusion.
We'll explore how deep learning approaches can be used for perceiving and interpreting the driver's state and behavior during manual, semi-autonomous, and fully-autonomous driving. We'll cover how convolutional, recurrent, and generative neural networks can be used for applications of glance classification, face recognition, cognitive load estimation, emotion recognition, drowsiness detection, body pose estimation, natural language processing, and activity recognition in a mixture of audio and video data.
In this talk, we will survey how Deep Learning methods can be applied to personalization and recommendations. We will cover why standard Deep Learning approaches don't perform better than typical collaborative filtering techniques. Then we will go over recently published research at the intersection of Deep Learning and recommender systems, looking at how they integrate new types of data, explore new models, or change the recommendation problem statement. We will also highlight some of the ways that neural networks are used at Netflix and how we can use GPUs to train recommender systems. Finally, we will highlight promising new directions in this space.
The growth in density of housing in cities like London and New York has resulted in higher demand for efficient smaller apartments. These designs challenge the use of space and function while trying to ensure the occupants have the perception of a larger space than provided. The process of designing these spaces has always been the responsibility and perception of a handful of designers using 2D and 3D static platforms as part of the overall building design and evaluation, typically constrained by a prescriptive program and functional requirement. A combination of human- and AI-based agents creating and testing these spaces through design and virtual immersive environments (NVIDIA Holodeck) will attempt to ensure the final results are efficient and best fit for human occupancy prior to construction.
Go beyond working with a single sensor and enter the realm of Intelligent Multi-Sensor Analytics (IMSA). We'll introduce concepts and methods for using deep learning with multi-sensor, or heterogeneous, data. There are many resources and examples available for learning how to leverage deep learning with public imagery datasets. However, few resources exist to demonstrate how to combine and use these techniques to process multi-sensor data. As an example, we'll introduce some basic methods for using deep learning to process radio frequency (RF) signals and make it a part of your intelligent video analytics solutions. We'll also introduce methods for adapting existing deep learning frameworks for multiple sensor signal types (for example, RF, acoustic, and radar). We'll share multiple use cases and examples for leveraging IMSA in smart city, telecommunications, and security applications.
As the race to full autonomy accelerates, the in-cab transportation experience is also being redefined. Future vehicles will sense the passengers' identities and activities, as well as their cognitive and emotional states, to adapt and optimize their experience. AI capable of interpreting what we call "people analytics" captured through their facial and vocal expressions, and aspects of the context that surrounds them, will power these advances. We'll give an overview of our Emotion AI solution, and describe how we employ techniques like deep learning-based spatio-temporal modeling. By combining these techniques with a large-scale dataset, we can develop AI capable of redefining the in-cab experience.
Deep residual networks (ResNets) made a recent breakthrough in deep learning. The core idea of ResNets is to have shortcut connections between layers that allow the network to be much deeper while still being easy to optimize, avoiding vanishing gradients. These shortcut connections have interesting properties that make ResNets behave differently from other typical network architectures. In this talk we will use these properties to design a network based on a ResNet but with parameter sharing and adaptive computation time, which we call IamNN. The resulting network is much smaller than the original network and can adapt the computational cost to the complexity of the input image. During this talk we will provide an overview of ways to design compact networks, give an overview of ResNet properties, and discuss how they can be used to design a compact dense network with only 5M parameters for ImageNet classification.
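The shortcut connection and parameter sharing described in this abstract can be sketched in a few lines. This is a toy illustration on plain Python lists, not the IamNN architecture: the block computes y = x + f(x) with f a single linear layer followed by ReLU, the same weights are reused at every step, and the stopping rule in `adaptive_stack` is a simplified stand-in (an assumption) for adaptive computation time. All function names are hypothetical.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(weights, bias, v):
    """Shortcut connection: output = input + f(input)."""
    fx = relu(linear(weights, bias, v))
    return [x + f for x, f in zip(v, fx)]


def shared_residual_stack(weights, bias, v, steps):
    """Apply the same block (shared parameters) a fixed number of times."""
    for _ in range(steps):
        v = residual_block(weights, bias, v)
    return v


def adaptive_stack(weights, bias, v, max_steps, tol=1e-6):
    """Stop early once an extra step barely changes the features.

    Simplified stand-in for adaptive computation time (an assumption).
    Returns the features and the number of steps actually used.
    """
    for step in range(1, max_steps + 1):
        new_v = residual_block(weights, bias, v)
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change < tol:
            return v, step
    return v, max_steps


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]
```

Because the block's parameters are shared, the parameter count stays constant no matter how many steps are unrolled, which is one way depth and model size can be decoupled.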
Want to get started using TensorFlow together with GPUs? Then come to this session, where we will cover the TensorFlow APIs you should use to define and train your models, and the best practices for distributing the training workloads to multiple GPUs. We will also look at the underlying reasons why GPUs are so well suited to machine learning workloads.
We'll present an overview of the StarCraft II machine learning environment, including some basic API examples using C++ and Python.
The artistic manpower needed to create a video game has been increasing exponentially over the years. Thanks to the computational power of NVIDIA GPUs, new AI-accelerated workflows are poised to solve this problem, saving artists and studios time and money, and driving greater creativity. Artomatix is the leading pioneer in this space; its AI-based approach to content creation helps automate many of the mundane, tedious and repetitive tasks artists and designers face every day. This talk introduces the academic theory and history behind Creative AI and then delves into specific use cases and applications such as texture synthesis, material enhancement, hybridization and style transfer. Finally, this talk presents the next generation of tools for the creative industries, powered by AI, and gives case studies on how they've been solving some of the game industry's largest problems over the past year. Join this session to gain an insight into the future of game creation.
Learn how to use GPUs to run 3D and camera deep learning fusion applications for autonomous driving. Cameras provide high resolution 2D information, while lidar has relatively low resolution but provides 3D data. Smart fusing of both RGB and 3D information, in combination with AI software, enables the building of ultra-high reliability classifiers. This facilitates the required cognition application for semi-autonomous and fully autonomous driving.
We'll present achievements in the field of automated truck driving, specifically the use case of lane keeping in platooning scenarios based on mirror cameras. Lane detection, generating control parameters, controller, and arbitration functions all run on the NVIDIA DRIVE PX 2 with three cameras attached to it.
Vahana started in early 2016 as one of the first projects at A³, the advanced projects outpost of Airbus Group in Silicon Valley. The aircraft we're building doesn't need a runway, is self-piloted, and can automatically detect and avoid obstacles and other aircraft. Designed to carry a single passenger or cargo, Vahana is meant to be the first certified passenger aircraft without a pilot. We'll discuss the key challenges to develop the autonomous systems of a self-piloted air taxi that can be operated in urban environments.
Self-driving vehicles will transform the transportation industry, yet must overcome challenges that go far beyond just technology. We'll discuss both the challenges and opportunities of autonomous mobility and highlight the recent work on autonomous vehicle systems by Optimus Ride Inc., an MIT spinoff company based in Boston. The company develops self-driving technologies and is designing a fully autonomous system for electric vehicle fleets.
We'll show how to program energy-efficient automotive-oriented NVIDIA GPUs to run computationally intensive camera-based perception algorithms in real time. The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. We'll present a fully GPU-accelerated implementation of stixel estimation that produces reliable results in real time (26 frames per second) on the Drive PX 2 platform.
More and more traditional industries are beginning to use AI and are facing challenges in computing platforms, system management, model optimization, and other areas. In this session, we build a GPU-based, end-to-end AI solution based on a comparative analysis of the computing and communication behavior of Caffe and TensorFlow.
This talk will describe the process of developing autonomous driving directly from the virtual environment TRONIS, a high-resolution virtual environment for prototyping and safeguarding highly automated and autonomous driving functions, exploiting a state-of-the-art gaming engine as introduced by UNREAL. We'll showcase this process on a real RC model with high-end NVIDIA hardware, targeting self-driving capabilities on a real-world truck. With the help of TRONIS we make early decisions on sensor configurations, e.g., camera, sensor positions, and deployed algorithms. The development team works on independent instances of the virtual car, which build the foundation for multiple experimental setups.
Learn how to adopt a MATLAB-centric workflow to design, develop, and deploy computer vision and deep learning applications onto GPUs, whether on your desktop, a cluster, or embedded Tegra platforms. The workflow starts with algorithm design in MATLAB. The deep learning network is defined in MATLAB and is trained using MATLAB's GPU and parallel computing support. Then, the trained network is augmented with traditional computer vision techniques and the application can be verified in MATLAB. Finally, a compiler auto-generates portable and optimized CUDA code from the MATLAB algorithm, which can be cross-compiled to Tegra. A performance benchmark for AlexNet inference shows that the auto-generated CUDA code is ~2.5x faster than MXNet, ~5x faster than Caffe2, and ~7x faster than TensorFlow.
A key technology challenge in computer vision for autonomous driving is semantic segmentation of images in a video stream, for which fully convolutional neural networks (FCNN) are the state of the art. In this research, we explore the functional and non-functional performance of using a hierarchical classifier head for the FCNN versus using a single flat classifier head. Our experiments are conducted and evaluated on the Cityscapes dataset. On the basis of the results, we argue that using a hierarchical classifier head for the FCNN can have specific advantages for autonomous driving. Furthermore, we show real-time usage of our network on the DRIVE PX 2 platform.
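As a rough illustration of the difference between a flat and a hierarchical classifier head, the sketch below shows one common way to combine a coarse group prediction with a per-group fine prediction for a single pixel. It is an assumption for illustration only, not the architecture used in the talk; the class hierarchy, the function names, and the logits are all hypothetical.

```python
import math

# Hypothetical two-level class hierarchy (not taken from the talk).
HIERARCHY = {
    "vehicle": ["car", "truck"],
    "human": ["pedestrian", "rider"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_probs(group_logits, child_logits):
    """Leaf probability = P(group) * P(leaf | group) for one pixel.

    group_logits: one logit per top-level group, in HIERARCHY order.
    child_logits: dict mapping each group name to the logits of its leaves.
    """
    probs = {}
    p_groups = softmax(group_logits)
    for group, p_group in zip(HIERARCHY, p_groups):
        p_children = softmax(child_logits[group])
        for leaf, p_child in zip(HIERARCHY[group], p_children):
            probs[leaf] = p_group * p_child
    return probs


# Example: the head is unsure between groups but sure within each group.
example = hierarchical_probs(
    [0.0, 0.0],
    {"vehicle": [2.0, 0.0], "human": [0.0, 0.0]},
)
```

A flat head would instead apply a single softmax over all four leaf classes. With the hierarchical form, a confident coarse decision (for example, "this pixel is a vehicle") remains available even when the fine-grained decision is uncertain.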
Learn how combining machine learning and computer vision with GPU computing helps to create a next-generation informational ADAS experience. This talk will present a real-time software solution that encompasses a set of advanced algorithms to create an augmented reality for the driver, utilizing vehicle sensors, map data, telematics, and navigation guidance. The broad range of features includes augmented navigation, visualization for advanced parking assistance, adaptive cruise control and lane keeping, driver infographics, driver health monitoring, and support in low visibility. Our approach augments the driver's visual reality with supplementary objects in real time, and works with various output devices such as head unit displays, digital clusters, and head-up displays.
The growing range of functions of ADAS and automated systems in vehicles, as well as the progressive change towards agile development processes, requires efficient testing. Testing and validation within simulation are indispensable for this, as real prototypes are not available at all times and the test catalog can be driven repeatedly and reproducibly. This paper presents different approaches to be used in simulation in order to increase the efficiency of development and testing for different areas of application. This comprises the use of virtual prototypes, the utilization of sensor models and the reuse of test scenarios throughout the entire development process, which may also be applied to train artificial intelligence.
This talk details how a team of 17 Udacity Self-Driving Car students attempted to apply deep learning algorithms to win an autonomous vehicle race. At the 2017 Self Racing Cars event held at Thunderhill Raceway in California, the team received a car and had two days before the start of the event to work on it. In this time, we developed a neural network using Keras and TensorFlow which steered the car based on the input from just one front-facing camera in order to navigate all turns on the racetrack. We will discuss the events leading up to the race, development methods used, and future plans including the use of ROS and semantic segmentation.
This presentation shows how driving simulators together with DNN algorithms can be used to streamline and facilitate the development of ADAS and autonomous vehicle systems. Driving simulators provide an excellent tool to develop, test and validate control systems for the automotive industry. Testing ADAS systems on a driving simulator makes it safer, more affordable and repeatable. This session will focus on a special application in which NVIDIA DRIVE PX 2 has been interfaced with a camera and put in the loop on a driving simulator. Object recognition algorithms have been developed in order to develop and test a Lane Keeping Assist (LKA) system. The robustness of the system can be tested on the simulator by altering the environmental conditions and vehicle parameters.
Thanks to recent breakthroughs in AI, vehicles will learn and collaborate with humans. There will be a steering wheel in the majority of vehicles for a long time. Therefore, a human-centric approach, that is, a safe combination of AI and UI, is needed in order to save more lives in traffic.
TomTom is leading in HD Maps in coverage and number of OEMs working with our HD Map. Our multi-source, multi-sensor approach leads to HD maps that have greater coverage, are more richly attributed, and have higher quality than single-source, single-sensor maps. Hear how we're weaving in more and more sources, such as AI-intensive video processing, into our map making to accelerate towards our goal of real-time and highly precise maps for safer and more comfortable driving.
We present our experience of running computationally intensive camera-based perception algorithms on NVIDIA GPUs. Geometric (depth) and semantic (classification) information is fused in the form of semantic stixels, which provide a rich and compact representation of the traffic scene. We present some strategies to reduce the computational complexity of the algorithms. Using synthetic data generated by the SYNTHIA tool, including slanted roads from a simulation of San Francisco city, we evaluate performance latencies and frame rates on a DrivePX2-based platform.
Learn how deep learning is used to process video streams to analyse human behaviour in real time. We will detail our solution for recognising fine-grained movement patterns of people as they perform everyday actions such as walking, eating, shaking hands, or talking to each other. The novelty of our technical solution is that our system learns these capabilities from watching lots of video snippets showing such actions. This is exciting because very different applications can be realised with the same algorithms as we follow a purely data-driven, machine learning approach. We will explain what new types of deep neural networks we created and how we employ our Crowd Acting (tm) platform to cost-efficiently acquire hundreds of thousands of videos for that.
2017 is the year when the first driver monitoring systems go into series production with global automotive OEMs. Driver monitoring will be a mainstay as a vital part of most level 3 automated cars, but it also has unique standalone applications such as drowsiness and attention monitoring, functions that address approximately half of all traffic accidents. Starting in 2019, more advanced systems will come to market based on improvements in hardware such as high-resolution cameras and GPUs. Around 2022, a third generation of in-car AI is to be expected, as the hardware will consist of multiple HD cameras running on the latest GPUs.
In our NVIDIA lab in New Jersey we taught a deep convolutional neural network (DNN) to drive a car by observing human drivers and emulating their behavior. We found that these networks can learn more aspects of the driving task than is commonly learned today. We present examples of learned lane keeping, lane changes, and turns. We also introduce tools to visualize the internal information processing of the neural network and discuss the results.
GPUs can significantly enhance the capabilities of military ground vehicles. In this session, we will discuss the challenges facing the integrator of real-time vision systems in military applications, from video streaming and military streaming protocols through to deploying vision systems for 360-degree situational awareness with AI capabilities. GPUs are being used for enhanced autonomy across the defence sector, from ground vehicles through to naval and air applications, with each application space presenting its own challenges through to deployment. Come and find out how the defence industry is addressing these challenges and where the future potential of GPU-enabled platforms lies.
The autonomous electric car revolution is here and a bright, clean future awaits. Yet as we shift to this fundamentally different technology, it becomes clear that perhaps the entire vehicle deserves a rethink. This means not just adding powerful computers to outdated vehicle platforms, but instead redesigning the agile device for this very different future. This process doesn't start with the mechanical structure of yesteryear; instead, it starts with the GPU.
Deep Learning has emerged as the most successful field of machine learning, with overwhelming success in industrial speech, language and vision benchmarks. Consequently, it evolved into the central field of research for IT giants like Google, Facebook, Microsoft, Baidu, and Amazon. Deep Learning is founded on novel neural network techniques, the recent availability of very fast computers, and massive data sets. At its core, Deep Learning discovers multiple levels of abstract representations of the input. Currently, the development of self-driving cars is one of the major technological challenges across automotive companies. We apply Deep Learning to improve real-time video data analysis for autonomous vehicles, in particular, semantic segmentation.
We will describe a fast and accurate AI-based GPU accelerated Vehicle inspection system which scans the underside of moving vehicles to identify threatening objects or unlawful substances (bombs, unexposed weapons and drugs), vehicle leaks, wear and tear, and any damages that would previously go unnoticed.
We'll introduce the RADLogics Virtual Resident, which uses machine learning image analysis to process the enormous amount of imaging data associated with CTs, MRIs and X-rays, and introduces, within minutes, a draft report with key images into the reporting system. We'll present several examples of automated analysis using deep learning tools, in applications of Chest CT and Chest X-ray. We'll show the algorithmic solutions used, and quantitative evaluation of the results, along with actual output into the report. It is our goal to provide many such automated applications, to automatically detect and quantify findings, thus enabling efficient and augmented reporting.
As computers outperform humans at complex cognitive tasks, disruptive innovation will increasingly remap the familiar with waves of creative destruction. And in healthcare, nowhere is this more apparent or imminent than at the crossroads of Radiology and the emerging field of Clinical Data Science. As leaders in our field, we must shepherd the innovations of cognitive computing by defining its role within diagnostic imaging, while first and foremost ensuring the continued safety of our patients. If we are dismissive, defensive or self-motivated, then industry, payers and provider entities will innovate around us, achieving different forms of disruption, optimized to serve their own needs. To maintain our leadership position as we enter the era of machine learning, it is essential that we serve our patients by directly managing the use of clinical data science towards the improvement of care, a position which will only strengthen our relevance in the care process as well as in future federal, commercial and accountable care discussions. We'll explore the state of clinical data science in medical imaging and its potential to improve the quality and relevance of radiology as well as the lives of our patients.
You'll learn how Triage is using deep learning to diagnose skin cancer from any smartphone. 1 in 3 cancer diagnoses is skin cancer and 1 in 5 Americans will develop skin cancer in their lifetime. The average wait time to see a dermatologist in the United States is 1 month and even greater in other parts of the world. In that time skin disorders can worsen or become life threatening. Triage's Co-Founder and CEO, Tory Jarmain, will demonstrate how they trained a Convolutional Neural Network to instantly detect 9 in 10 cancer cases with beyond dermatologist-level accuracy. Tory will also show how Triage's technology can identify skin disorders across 23 different categories including acne, eczema, warts and more using Deep Residual Networks.
The need to help elderly individuals or couples remain in their homes is increasing as our global population ages. Cognitive processing offers opportunities to assist the elderly by processing information to identify opportunities for caregivers to offer assistance and support. This project seeks to demonstrate means to improve the elderly's ability to age at home through understanding of daily activities inferred from passive sensor analysis. This project is an exploration of the IBM Watson Cloud and Edge Docker-based Blue Horizon platforms for high-fidelity, low-latency, private sensing and responding at the edge using a Raspberry Pi, including deep learning using NVIDIA DIGITS software, K80 GPU servers in the IBM Cloud, and Jetson TX2 edge computing.
The majority of healthcare data is stored in healthcare workflows, electronic health records, and consumer devices. This data is largely untouched. CloudMedx has built a clinical framework that uses advanced algorithms and AI to look at this data in both structured and unstructured formats, using Natural Language Processing and Machine Learning to bring insights such as patient risks, outcomes, and action items to the point of care. The goal of the company is to save lives and improve clinical workflows.
Learn how doctors aided in the design process to create authentic VR trauma room scenarios; how expert content and simulation devs crafted a VR experience that would have impact in a world where there's no room for error and why Oculus supports the program. Experiential learning is among the best ways to practice for pediatric emergencies. However, hospitals are spending millions on expensive and inefficient mannequin-based training that does not consistently offer an authentic experience for med students or offer convenient repeatability. Join us for a case study on a groundbreaking pilot program that brought together Children's Hospital Los Angeles with two unique VR and AI dev teams to deliver VR training simulations for the most high stakes emergencies hospitals see: pediatric trauma.
Health systems worldwide need greater availability and intelligent integrated use of data and information technology. Clalit has been leading innovative interventions using clinical data to drive people-centered targeted and effective care models, for chronic disease prevention and control. Clalit actively pursues a paradigm shift to properly deal with these challenges, using IT, data and advanced analytics to transform its healthcare system to one which can bridge the silos of care provision in a patient-centered approach, and move from reactive therapeutic to proactive preventive care. In the presentation we will detail specific examples that allowed for reducing healthcare disparities, preventing avoidable readmissions, and improving control of key chronic diseases.
In this talk, FDNA will present how deep learning is used to build an applicable framework that aids in the identification of hundreds of genetic disorders and helps kids all over the world. Genetic disorders affect one in every ten people. Many of these diseases are characterized by observable traits of the affected individuals - a 'phenotype'. In many cases, this phenotype is especially noticeable in the facial features of the patients, Down syndrome for example. But most such conditions have subtle facial patterns and are harder to diagnose. FDNA will describe their solution, its ability to generalize well for hundreds of disorders while learning from a small number of images per class, and its application for genetic clinicians and researchers.
The increasing availability of large medical imaging data resources with associated clinical data, combined with the advances in the field of machine learning, holds great promise for disease diagnosis, prognosis, therapy planning and therapy monitoring. As a result, the number of researchers and companies active in this field has grown exponentially, resulting in a similar increase in the number of papers and algorithms. A number of issues need to be addressed to increase the clinical impact of the machine learning revolution in radiology. First, it is essential that machine learning algorithms can be seamlessly integrated in the clinical workflow. Second, the algorithm should be sufficiently robust and accurate, especially in view of data heterogeneity in clinical practice. Third, the additional clinical value of the algorithm needs to be evaluated. Fourth, it requires considerable resources to obtain regulatory approval for machine learning based algorithms. In this workshop, the ACR and MICCAI Society will bring together expertise from radiology, medical image computing and machine learning, to start a joint effort to address the issues above.
Learn how to apply deep learning for detecting and segmenting suspicious breast masses from ultrasound images. Ultrasound images are challenging to work with due to the lack of standardization of image formation. Learn the appropriate data augmentation techniques, which do not violate the physics of ultrasound imaging. Explore the possibilities of using raw ultrasound data to increase performance. Ultrasound images collected from two different commercial machines are used to train an algorithm to segment suspicious breast masses with a mean Dice coefficient of 0.82. The algorithm is shown to perform on par with a conventional seeded algorithm. However, a drastic reduction in computation time is observed, enabling real-time segmentation and detection of breast masses.
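For readers unfamiliar with the metric quoted above, the Dice coefficient of a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|), and the mean Dice averages this over a set of images. The following is a minimal sketch with made-up toy masks; the function names are illustrative, and it is not the evaluation code used by the presenters.

```python
def dice_coefficient(pred, truth):
    """Dice = 2*|A and B| / (|A| + |B|) for two binary masks.

    Masks are lists of rows containing 0/1 values and must have the
    same number of pixels.
    """
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # By convention, two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pairs):
    """Average Dice over an iterable of (pred, truth) mask pairs."""
    scores = [dice_coefficient(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


# Toy example: 3 predicted pixels, 3 reference pixels, 2 in common.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)  # 2*2 / (3+3) = 0.666...
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all.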
It is not always easy to accelerate a complex serial algorithm with CUDA parallelization. A case in point is that of aligning bisulfite-treated DNA (bsDNA) sequences to a reference genome. A simple CUDA adaptation of a CPU-based implementation can improve the speed of this particular kind of sequence alignment, but it's possible to achieve order-of-magnitude improvements in throughput by organizing the implementation so as to ensure that the most compute-intensive parts of the algorithm execute on GPU threads.
Fast, inexpensive and safe, ultrasound imaging is the modality of choice for the first level of medical diagnostics. The emerging solutions of portable and hand-held 2D/3D scanners, advanced imaging algorithms, and deep learning promise further democratization of this technology. During the session, we will present an overview of ultrasound imaging techniques in medical diagnostics, explore the future of ultrasound imaging enabled by GPU processing, and set out the path to the conception of a portable 3D scanner. We will also demonstrate our hardware developments in ultrasound platforms with GPU-based processing. Having started with one large research scanner, we have begun our migration towards more commercially viable solutions with a small hand-held unit built on the mobile GPU NVIDIA Tegra X1.
We'll discuss how GPUs are playing a central role in making advances in Ion Torrent's targeted sequencing workflow and talk about the S5 DNA sequencer from Ion Torrent, which is enabling democratization of the sequencing market and accelerating research in precision medicine at a breathtaking pace with the help of GPUs. We'll highlight our work in liquid biopsy and non-invasive prenatal testing and how the breadth of technology offerings in semiconductor chips gives us the scale of sequencing from small panels to exomes. We'll discuss our analysis pipeline and the latest in algorithm development and acceleration on GPUs, as well as our experiences ranging from Fermi to Pascal GPU architectures.
How can we train medical deep learning models at a petabyte scale and how can these models impact clinical practice? We will discuss possible answers to these questions in the field of Computational Pathology. Pathology is in the midst of a revolution from a qualitative to a quantitative discipline. This transformation is fundamentally driven by machine learning in general and computer vision and deep learning in particular. With the help of PAIGE.AI we are building a clinical-grade AI at Memorial Sloan Kettering Cancer Center. The models are trained based on petabytes of image and clinical data on top of the largest DGX-1 V100 cluster in pathology. The goal is not only to automate cumbersome and repetitive tasks, but to impact diagnosis and treatment decisions in the clinic. This talk will focus on our recent advances in deep learning for tumor detection and segmentation, on how we train these high capacity models with annotations collected from pathologists, and how the resulting systems are implemented in the clinic.
Machine Learning in Precision Medicine: Patient-Specific Treatment Enabled by Quantitative Medical Imaging, Artificial Intelligence, and GPU Efficiency
Attendees will learn about the need for and use of machine learning in today's patient-centered healthcare. The talk will focus on general approaches requiring machine learning to obtain image-based quantitative features, reach patient diagnoses, predict disease outcomes, and identify proper precision-treatment strategies. While the presented methods are general in nature, examples from cardiovascular disease management will be used to demonstrate the need for and power of machine learning enabled by the performance advantages of GPU computation.
AI in medical imaging has the potential to provide radiology with an array of new tools that will significantly improve patient care. To realize this potential, AI algorithm developers must engage with physician experts and navigate domains such as radiology workflow and regulatory compliance. This session will discuss a pathway for clinical implementation, and cover ACR's efforts in areas such as use case development, validation, workflow integration, and monitoring.
In this talk, I will describe the research and development work on medical imaging done at PingAn Technology and Google Cloud, covering five different tasks. I'll present the technical details of the deep learning approaches we have developed, and share with the audience the research direction and scope in the medical fields at PingAn Technology and PingAn USA Lab.
Deep learning models give state-of-the-art results on diverse problems, but their lack of interpretability is a major problem. Consider a model trained to predict which DNA mutations cause disease: if the model performs well, it has likely identified patterns that biologists would like to understand. However, this is difficult if the model is a black box. We present algorithms that provide detailed explanations for individual predictions made by a deep learning model and discover recurring patterns across the entire dataset. Our algorithms address significant limitations of existing interpretability methods. We show examples from genomics where the use of deep learning in conjunction with our interpretability algorithms leads to novel biological insights.
Learn how to apply recent advances in GPUs and open data to unravel the mysteries of biology and the etiology of disease. Our team has built data-driven simulated neurons using CUDA and open data, and is using this platform to identify new therapeutics for Parkinson's disease with funding from the Michael J. Fox Foundation. In this session, I'll discuss the open data which enables our approach, and how we are using NVIDIA Tesla cards on Microsoft Azure to dynamically scale to more than 100,000 GPU cores while managing technology costs.
Radiological diagnosis and interpretation should not take place in a vacuum -- but today, it does. One of the greatest challenges the radiologist faces when interpreting studies is understanding the individual patient in the context of the millions of patients who have come previously. Without access to historical data, radiologists must make clinical decisions based only on their memory of recent cases and literature. Arterys is working to empower the radiologist with an intelligent lung nodule reference library that automatically retrieves historical cases that are relevant to the current case. The intelligent lung nodule reference library is built on top of our state-of-the-art deep learning-based lung nodule detection, segmentation and characterization system.
As deep learning techniques have been applied to the field of healthcare, more and more AI-based medical systems continue to come forth, accompanied by new heterogeneity, complexity and security risks. In the real world, we've seen this sort of situation lead to demand constraints, hindering the development of AI applications in China's hospitals. First, we'll share our experience in building a unified GPU-accelerated AI engine system to feed component-based functionality into the existing workflow of clinical routine and medical imaging. Then, we'll demonstrate a pipeline that integrates different types of AI applications (detecting lung cancer, predicting childhood respiratory disease and estimating bone age) as microservices into medical stations, CDSS, PACS and HIS systems to support the medical decision-making of local clinicians. On this basis, we'll describe the purpose of establishing an open, unified, standardized, legal cooperation framework to help AI participants enter the market in China and build a collaborative ecosystem.
This talk will give an overview of the fields of Personalised Computational Medicine and In Silico Clinical Trials, which are revolutionizing Medicine and Medical Product Development. This talk will introduce these concepts, provide examples of how they can transform healthcare, and emphasize why artificial intelligence and machine learning are relevant to them. We will also explain the limitations of these approaches and why it is paramount to engage in both phenomenological (data-driven) and mechanistic (principle-driven) modelling. Both areas are in desperate need of better infrastructures, both software and hardware, giving access to computational and storage resources. The talk will be thought-provoking and eye-opening as to opportunities in this space for researchers and industries alike.
The transformation towards value-based healthcare needs inventive ways to lower cost and increase patient health outcomes. Artificial intelligence is vital for realizing value-based care. Turning medical images into biomarkers helps to increase effectiveness of care.
We will introduce deep learning applications in clinical neuroimaging (using MRI, CT, PET, etc.) and recent breakthrough results from Stanford and Subtle Medical. Perspectives and feedback on applying AI technologies in neuroimaging are shared by expert radiologists and deep learning experts.
How Deep Learning/AI is changing clinical neuroimaging practice
* How will deep learning be applied in radiology workflow right now and in the future
* Practical concerns and perspectives from radiologists
How Deep Learning assists smarter neuroimaging decision making
* Multi-scale 3D network enables lesion outcome prediction for stroke
* More accurate lesion segmentation in neuroimaging
How Deep Learning enables safer and cheaper neuroimaging screening
* Deep Learning and GAN enable >95% reduction in radiation for functional medical imaging
* Deep Learning enables 90% reduction in chemical (Gadolinium) contrast agent usage in contrast-enhanced MRI
How Deep Learning accelerates neuroimaging
* Further acceleration and improved MRI reconstruction using deep learning
* Deep Generative Adversarial Network for Compressed Sensing
iFLYTEK Health's mission is to use the most advanced artificial intelligence technologies to revolutionize the healthcare industry and help doctors provide quality care to more patients with higher efficiency. Built upon iFLYTEK's world-class hardware/software technologies in voice recognition and voice synthesis, iFLYTEK's products can help reduce doctors' burden in writing medical records and free their time to focus more on caring for patients. These technologies can also reduce errors and improve the completeness and accuracy of medical records, and therefore support advanced intelligence applications based on complete patient data. Automated image analysis tools can help doctors find abnormalities in images with confidence, especially inexperienced doctors from lower-tier hospitals. The Clinical Decision Support (CDS) system is based on authoritative medical literature, a large amount of expert knowledge, and real cases, and improves primary doctors' ability to make accurate diagnoses using complete and accurate patient information.
We'll discuss the difficulties in digital mammography and the computational challenges we encountered while adapting deep learning algorithms, including GANs, to digital mammography. Learn how we address those computational issues, and see our benchmarking results using both consumer- and enterprise-grade GPUs.
There is great promise in machine learning methods for the automated analysis of medical imaging data for supporting disease detection, diagnosis and prognosis. Examples include the extraction of quantitative imaging biomarkers that are related to the presence and stage of disease, radiomics approaches for tumor classification and therapy selection, and deep learning methods for directly linking imaging data to clinically relevant outcomes. However, the translation of such approaches requires methods for objective validation in clinically realistic settings or clinical practice. In this talk, I will discuss the role of next-generation challenges for this domain.
Learn about the key types of clinical use cases for AI methods in medical imaging beyond simple image classification that will ultimately improve medical practice, as well as the critical challenges and progress in applying AI to these applications. We'll first describe the types of medical imaging and the key clinical applications for deep learning for improving image interpretation. Next, we'll describe recent developments of word-embedding methods to leverage narrative radiology reports associated with images to automatically generate rich labels for training deep learning models, and a recent AI project that pushes beyond image classification and tackles the challenging problem of clinical prediction. We'll also describe emerging methods to leverage multi-institutional data for creating AI models that do not require data sharing, and recent innovative approaches to explaining AI model predictions to improve clinician acceptance.
Dive into recent work in medical imaging, where TensorFlow is used to spot cancerous cells in gigapixel images and helps physicians to diagnose disease. During this talk, we'll introduce concepts in Deep Learning, and show concrete code examples you can use to train your own models. In addition to the technology, we'll cover the problem-solving process of thoughtfully applying it to solve a meaningful problem. We'll close with our favorite educational resources you can use to learn more about TensorFlow.
Protecting crew health is a critical concern for NASA in preparation for long-duration, deep-space missions such as those to Mars. Spaceflight is known to affect immune cells. Splenic B-cells decrease during spaceflight and in ground-based physiological models. The key technical innovation presented by our work is end-to-end computation on the GPU with the GPU Data Frame (GDF), running on the DGX Station, to accelerate the integration of immunoglobulin gene segments, junctional regions, and modifications that contribute to cellular specificity and diversity. Study results are applicable to understanding processes that induce immunosuppression, like cancer therapy, AIDS, and stressful environments here on Earth.
Learn how researchers at Stanford University are leveraging the power of GPUs to improve medical ultrasound imaging. Ultrasound imaging is a powerful diagnostic tool that can provide clinicians with feedback in real time. Until recently, ultrasound beamforming and image reconstruction has been performed using dedicated hardware in order to achieve the high frame rates necessary for real-time diagnostic imaging. Though many sophisticated techniques have been proposed to further enhance the diagnostic utility of ultrasound images, computational and hardware constraints have made translation to the clinic difficult. We have developed a GPU-accelerated software beamforming toolbox that enables researchers to implement custom real-time beamforming on any computer with a CUDA-capable GPU, including commercial ultrasound scanners. In this session, we will: 1) briefly introduce the basics of ultrasound beamforming, 2) present our software beamforming toolbox, and 3) show videos demonstrating its capabilities from a clinical study of echocardiography, as well as an implementation of a novel speckle removing beamformer that utilizes deep fully convolutional neural networks.
Learn CAIDE Systems'' unique diagnosis system with highly accurate prediction and delineation of brain stroke lesion. We''ll present how we increase sensitivity in medical diagnosis system and how we develop a state-of-the-art generative deep learning model for acquiring segmented stroke lesion CT images, and demonstrate our market-ready product: a diagnostic tool as well as a medical deep learning platform. We trained our diagnostic system using CT image data from thousands of patients with brain stroke and tested to see commercial feasibility of use for hospitals and mobile ambulances.
In medical imaging, acquisition procedures and imaging signals vary across different modalities and, thus, researchers often treat them independently, introducing different models for each imaging modality. To mitigate the number of modality-specific designs, we introduced a simple yet powerful pipeline for medical image segmentation that combines fully convolutional networks (FCNs) with fully convolutional residual networks (FC-ResNets). FCNs are used to obtain normalized images, which are then iteratively refined by means of a FC-ResNet to generate a segmentation prediction. We''ll show results that highlight the potential of the proposed pipeline, by matching state-of-the-art performance on a variety of medical imaging modalities, including electron microscopy, computed tomography, and magnetic resonance imaging.
The NVIDIA Genomics Group has developed a deep learning platform to transform noisy, low-quality DNA sequencing data into clean, high-quality data. Hundreds of DNA sequencing protocols are used to profile phenomena such as protein-DNA binding and DNA accessibility. For example, the ATAC-seq protocol identifies open genomic sites by sequencing open DNA fragments; genome-wide fragment counts provide a profile of DNA accessibility. Recent advances enable profiling from smaller patient samples than previously possible. To reduce sequencing cost, we developed a convolutional neural network that denoises data from a small number of DNA fragments, making the data suitable for various downstream tasks. Our platform aims to accelerate adoption of DNA sequencers by minimizing data requirements.
Nanopore sequencing is a breakthrough technology that marries cutting-edge semiconductor processes together with biochemistry, achieving fast, scalable, single-molecule DNA sequencing. The challenge is real-time processing of gigabytes of data per second in a compact benchtop instrument. GPUDirect, together with the cuDNN library, enables Roche to maximize the effectiveness of Tesla V100 GPUs in their next-generation sequencing instrument. Attendees will learn how these pieces come together to build a streaming AI inference engine to solve a signal processing workflow. Analysis and performance comparisons of the new Tensor Core units, available on Volta hardware, will be included.
Learn how to use one or more GPUs and CUDA to speed up the process of stitching very large images (up to terabytes in size). Image stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Image stitching is widely used in many important fields, such as high-resolution photo mosaics in digital maps and satellite photos, or medical images. Motivated by the need to combine images produced in the study of the brain, we developed and released for free the TeraStitcher tool, which we recently enhanced with a CUDA plugin that greatly speeds up the most compute-intensive part of the procedure. The code can be easily adapted to compute different kinds of convolution. We describe how we leverage shuffle operations to balance the load among the threads, and CUDA streams to hide the overhead of moving images back and forth between the CPU and the GPU when their size exceeds the amount of available memory. The speedup we obtain is such that jobs that took several hours are now completed in a few minutes.
This talk will present the challenges and opportunities in developing a deep learning program for use in medical imaging. It will present a hands-on approach to the challenges that need to be overcome and the need for a multidisciplinary approach to help define the problems and potential solutions. The role of highly curated data for training the algorithms and the challenges in creating such datasets are addressed. The annotation of data becomes a key point in training and testing the algorithms. The role of experts in computer vision and radiology will be addressed, as will how this project can serve as a roadmap for others planning collaborative efforts. Finally, I will discuss the early results of the Felix project, whose goal is nothing short of the early detection of pancreatic cancer to ultimately improve patient outcomes.
Motion tracking with motion compensation is an important component of modern advanced diagnostic ultrasonic medical imaging with microbubble contrast agents. Search based on the sum of absolute differences, a well-known technique for motion estimation, is very amenable to efficient implementations that exploit the fine-grained parallelism inherent in GPUs. We'll demonstrate a real-world application for motion estimation and compensation in the generation of real-time maximum intensity projections over time to create vascular roadmaps in medical images of organs, such as the liver, with ultrasound contrast agents. We'll provide CUDA kernel code examples that make this application possible, as well as performance measurements demonstrating the value of instruction-level parallelism and careful control of memory access patterns for kernel performance improvement. We hope to provide insight to CUDA developers interested in motion estimation and compensation, as well as general insight into kernel performance optimization relevant for any CUDA developer.
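As a rough illustration of the search described in the abstract above, the following is a minimal pure-Python sketch of exhaustive block matching with the sum of absolute differences (SAD). It is not the presenters' CUDA kernel; the function names, the block size, and the search radius are assumptions chosen for the example. Each candidate offset is scored independently, which is the property that makes the search a natural fit for GPU threads.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def estimate_motion(ref, cur, y, x, size, radius):
    """Search `cur` for the block of `ref` anchored at (y, x).

    Returns ((dy, dx), best_sad). Every candidate offset is scored
    independently of the others, so the two outer loops could be
    distributed across parallel threads.
    """
    height, width = len(cur), len(cur[0])
    best_offset, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cy, cx = y + dy, x + dx
            if cy < 0 or cx < 0 or cy + size > height or cx + size > width:
                continue  # candidate block falls outside the frame
            cost = sad(ref, cur, y, x, cy, cx, size)
            if best_cost is None or cost < best_cost:
                best_offset, best_cost = (dy, dx), cost
    return best_offset, best_cost


def make_frame(height, width, top, left, size, value=9):
    """Hypothetical test frame: zeros with one bright square patch."""
    frame = [[0] * width for _ in range(height)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            frame[r][c] = value
    return frame


ref = make_frame(8, 8, 2, 2, 2)
cur = make_frame(8, 8, 3, 4, 2)  # same patch moved down 1 and right 2
offset, cost = estimate_motion(ref, cur, 2, 2, 2, 3)
```

Here `offset` is the displacement that would be undone during motion compensation before frames are combined.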
Clinical laboratories play a crucial role in the healthcare ecosystem: they act as a screening sub-system by providing early inference in disease and abnormality diagnosis. An estimated 70% of clinical decisions regarding prevention, diagnosis, and treatment involve lab tests. Surprisingly, 60% of the inferencing done at a clinical laboratory can be performed by one "wonder-tool": the microscope. Microscopy has helped pathologists assess and analyse patients for several centuries. The key hurdles in microscopic examination are the amount of time that pathologists have to spend in manual analysis and the need for pathologists to be co-located with the specimen. In this talk, we introduce SigTuple's AI-powered smart microscope, which can automatically learn, analyse, and summarize the inferences of several hundred abnormalities across different biological specimens (blood, urine, and semen). It also utilizes the power of GPU computing in the cloud to provide higher-order analysis of the samples and acts as a tele-pathology enabler by giving pathologists the power to view or review any analysis or report from any part of the world.
For more than a decade, GE has partnered with NVIDIA in healthcare to power our most advanced modality equipment, from CT to ultrasound. Part 1 of this session will offer an introduction to the deep learning efforts at GEHC and the platform we're building on top of NGC to accelerate new algorithm development, followed by a deep dive into a case study of the evolution of our cardiovascular ultrasound scanner and the underlying extensible software stack. It will contain three main parts: (a) cardiovascular ultrasound imaging from a user perspective: which problems we need to solve for our customers, and the impact of cardiovascular disease in a global perspective; (b) an introduction to the Vivid E95 and the cSound platform, with GPU-based real-time image reconstruction and visualization: how GPU performance can be translated to customer value and outcomes, and how this has evolved the platform during the last 2½ years; (c) the role of deep learning in cardiovascular ultrasound imaging: how we are integrating deep learning inference into our imaging system, and preliminary results from automatic cardiac view detection.
Hear about how GPU technology is disrupting the way your eye doctor works and how ophthalmic research is performed today. The rise of Electronic Medical Records in medicine has created mountains of Big Data particularly in ophthalmology where many discrete quantitative clinical elements like visual acuity can be tied to rich imaging datasets. In this session, we will explore the transformative nature that GPU acceleration has played in accelerating clinical research and show real-life examples of deep learning applications to ophthalmology in creating new steps forward in automated diagnoses, image segmentation, and computer aided diagnoses.
We'll show how recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. FCNs can be trained to automatically segment 3D medical images, such as computed tomography (CT) scans based on manually annotated anatomies like organs and vessels. The presented methods achieve competitive segmentation results while avoiding the need for handcrafting features or training class-specific models, in a clinical setting. We'll explain a two-stage, coarse-to-fine approach that will first use a 3D FCN based on the 3D U-Net architecture to roughly define a candidate region. This candidate region will then serve as input to a second 3D FCN to do a fine prediction. This cascaded approach reduces the number of voxels the second FCN has to classify to around 10 percent of the original 3D medical image, and therefore allows it to focus on more detailed segmentation of the organs and vessels. Our experiments will illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results on many datasets. Code and trained models will be made available.
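To make the coarse-to-fine idea in the abstract above concrete, here is a small pure-Python sketch of the hand-off between the two stages: a coarse binary mask (standing in for the thresholded output of a first network) is turned into a padded bounding box, and only that crop would be passed to the second network. The networks themselves are not modeled; the helper names, the margin, and the toy volume are assumptions for illustration only.

```python
def candidate_box(mask, margin=1):
    """Padded bounding box around the nonzero voxels of a 3D mask.

    Returns ((z0, z1), (y0, y1), (x0, x1)) with exclusive upper bounds,
    or None when the mask is empty.
    """
    dims = (len(mask), len(mask[0]), len(mask[0][0]))
    coords = [(z, y, x)
              for z in range(dims[0])
              for y in range(dims[1])
              for x in range(dims[2])
              if mask[z][y][x]]
    if not coords:
        return None
    box = []
    for axis in range(3):
        lo = max(min(c[axis] for c in coords) - margin, 0)
        hi = min(max(c[axis] for c in coords) + 1 + margin, dims[axis])
        box.append((lo, hi))
    return tuple(box)


def crop(volume, box):
    """Cut the candidate region out of the full volume."""
    (z0, z1), (y0, y1), (x0, x1) = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]


def voxel_fraction(box, dims):
    """Share of the original voxels that the second stage has to classify."""
    kept, total = 1, 1
    for (lo, hi), size in zip(box, dims):
        kept *= hi - lo
        total *= size
    return kept / total


# Toy 10x10x10 coarse mask with a 2x2x2 foreground blob in the middle.
mask = [[[1 if 4 <= z <= 5 and 4 <= y <= 5 and 4 <= x <= 5 else 0
          for x in range(10)] for y in range(10)] for z in range(10)]
box = candidate_box(mask, margin=1)
region = crop(mask, box)
fraction = voxel_fraction(box, (10, 10, 10))
```

In this toy case the second stage would see 64 of the 1,000 voxels; the roughly 10 percent figure in the abstract depends on the real anatomy and margins, which this sketch does not attempt to reproduce.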
Explore how parallelized programming and DL can radically impact medical ultrasound imaging. In this session, we will describe how the processing of ultrasound signals can be implemented to provide not only real-time capabilities but also a flexible environment for research and innovative new products. In this context, we will i) demonstrate 2D and 3D real-time imaging using open hardware platforms, and ii) provide an overview of how both radical parallelization and DL can be integrated within processing pipelines, providing new applications and improved image quality at unprecedented speed.
In this session, attendees will learn how to develop an AI learning platform for healthcare, develop initial (imaging) AI applications in specific care areas, and embed AI into devices, creating "intelligent imaging systems".
Learn about the importance of clinical domain expertise in AI algorithm/model development and incorporation into clinical workflow, specifically in medical imaging, from a radiologist. With growing media attention, there is much fear, hype, and hope when it comes to using DL in radiology. We will present through examples why it is essential to incorporate clinical domain expertise when developing DL models. We will demonstrate various ways AI can augment the radiologists both in image interpretation as well as beyond within the overall workflow. In the second portion of this talk, we will present the gap between developing a great AI model in isolation and having it become part of daily medical practice. From integration and hospital connectivity to algorithm serving at scale to meet growing demand, we will show how an AI Marketplace can create the ecosystem that allows AI to flourish.
The Role of Data in Achieving Precision and Value in Healthcare: The goal of healthcare is to provide the most effective treatment to every patient in the most efficient way. Data plays a key role in every aspect of this process, from decision support systems that provide a clinician with the right information at the right time, to scheduling algorithms that predict patient flow and schedule accordingly, to analytics that coach and support patients in achieving or maintaining a healthy lifestyle. Achieving the vision of a data-informed healthcare system will require fundamental advances in many areas, including causal inference, inference on complex, high-dimensional, and heterogeneous data, missing data, process modeling, bias reduction, statistical validation, and model adaptation, to name a few. In this talk, I will illustrate some of these challenges through concrete examples within the Malone Center.
Diabetic retinopathy (DR), also known as diabetic eye disease, is a major complication of diabetes in which diabetes mellitus damages the retina, and it is a leading cause of blindness. Emma Xu, AirDoc's Product Director, and Professor You Li of Shanghai Changzheng Hospital will share how AirDoc, the leading intelligent medical startup in China, leverages NVIDIA GPUs and deep learning to improve DR diagnosis with automatic left/right eye recognition, automatic detection of the location and numbers, automatic DR staging, fast recognition speed, and patient information management for real-time screening statistics and usage management.
Learn how to better utilize GPUs to accelerate cross-validation in Spark, which is widely used in many big data analytics and machine learning applications.
Matrix factorization (MF) has been widely used in recommender systems, topic modeling, word embedding, and more. Stochastic gradient descent (SGD) for MF is memory bound. Meanwhile, single-node CPU systems with caching perform well only for small datasets. Distributed systems have higher aggregated memory bandwidth but suffer from relatively slow network connections. This observation inspires us to accelerate MF by utilizing GPUs' high memory bandwidth and fast intra-node connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF problems. On a single GPU, we design two workload scheduling schemes, batch-Hogwild! and wavefront-update, that fully exploit the massive number of cores. In particular, batch-Hogwild!, a vectorized version of Hogwild!, overcomes the issue of memory discontinuity. On three datasets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1 to 28.2x as fast as state-of-the-art CPU solutions on 1 to 64 CPU nodes.
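For readers unfamiliar with the update rule that the abstract above accelerates, the following is a minimal serial sketch of SGD for matrix factorization in plain Python. It is not cuMF_SGD and does not implement batch-Hogwild! or wavefront-update; the ratings, learning rate, regularization weight, and function names are all assumptions made for the example. Note that each update touches two short factor rows and performs only a handful of multiply-adds, which is the access pattern behind the "memory bound" remark.

```python
import math
import random


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on observed ratings."""
    err = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        err += (r - pred) ** 2
    return math.sqrt(err / len(ratings))


def sgd_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
           epochs=300, seed=0):
    """Factor a sparse rating list into user factors P and item factors Q."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    before = rmse(ratings, P, Q)
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit observed ratings in random order
        for u, i, r in samples:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q, before, rmse(ratings, P, Q)


# Hypothetical (user, item, rating) triples for a 4 x 4 problem.
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 2, 2), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 0, 1), (3, 1, 1), (3, 3, 4)]
P, Q, rmse_before, rmse_after = sgd_mf(ratings, n_users=4, n_items=4)
```

Parallel schemes such as Hogwild! run many of these updates concurrently without locks, relying on the fact that two randomly chosen ratings rarely share a user or an item.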
We'll describe how deep learning can be applied to detect anomalies, such as network intrusions, in a production environment. In part one of the talk, we'll build an end-to-end data pipeline using Hadoop for storage, Streamsets for data flow, Spark for distributed GPUs, and Deeplearning for anomaly detection. In part two, we'll showcase a demo environment that demonstrates how a deep net uncovers anomalies. This visualization will illustrate how system administrators can view malicious behavior and prioritize efforts to stop attacks. It's assumed that registrants are familiar with popular big data frameworks on the JVM.
Based on a comprehensive performance study of Watson workloads, we'll deep dive into optimizing critical retrieve and rank functions using GPU acceleration. The performance of cognitive applications like answering natural language questions heavily depends on quickly selecting the relevant documents needed to generate a correct answer. While analyzing the question to determine appropriate search terms, weights, and relationships is relatively quick, retrieving and ranking a relevant subset from millions of documents is a time-consuming task. Only after completing it can any advanced natural language processing algorithms be effective.
It is estimated that 85% of worldwide data is held in unstructured/unlabelled formats - increasing at a rate of roughly 7 million digital pages per day. Exploiting these large datasets can open the door for providing policy makers, corporations, and end-users with unprecedented knowledge for better planning, decision making, and new services. Deep learning and probabilistic topic modeling have shown great potential for analysing such datasets. This analysis helps in: discovering anomalies within these datasets, unravelling underlying patterns/trends, or finding similar texts within a dataset. We'll illustrate how we can use a combined unsupervised deep learning and topic modeling approach for sentiment analysis requiring minimal feature engineering or prior assumptions, and outperforming the state of the art approaches to sentiment analysis.
Scaling visual investigations is a tough problem. Analysts in areas like cyber security, anti-fraud, ML model tuning, and network operations are struggling to see their data and how it connects. We'll discuss where visual graph analytics gets used and how Graphistry is dramatically streamlining the analyst experience. For example, when using visual graph models for exploring security event logs, we can load events around an incident and quickly determine the root cause, scope, and progression. We'll demonstrate how we solve three technical aspects of scaling visual graph analysis: streamlining investigation workflows, visualizing millions of events in the browser, and fast analytics. Core to our approach, our platform connects GPUs in the client to GPUs on the server. The result is an investigation experience that feels like a "Netflix for data" and can be used by anyone with a browser.
Companies of all sizes and in all industries are driven towards digital transformation. Failure to adapt to this movement places businesses at an increased risk in current and future competitive markets. Limited by slow compute, enterprises struggle to gain valuable insights fast, monetize their data, enhance customer experience, optimize operational efficiency, and prevent fraudulent attacks all at the same time. NVIDIA helps provide deeper insights, enable dynamic correlation, and deliver predictive outcomes at superhuman speed, accuracy, and scale. We'll highlight specific accelerated analytics use cases -- powered by the NVIDIA Tesla platform, DGX-1 AI supercomputer, and NVIDIA GPU-accelerated cloud computing -- in the finance, oil and gas, manufacturing, retail, and telco industries.
Predictive AI is often associated with product recommenders. We present a landscape of multi-domain behavioral models that predict multi-modal user preferences and behavior. This session will take the audience from the first principles of the new Correlated Cross-Occurrence (CCO) algorithms, showing the important innovations that lead to new ways to predict behavior, into a deep dive into a variety of different use cases: for instance, using dislikes to predict likes, using search terms to predict purchases, and using conversions to augment search indexes with behavioral data to produce behavioral search. Some of these are nearly impossible to address without this new technique. We show the tensor algebra that makes up the landscape. Next, we walk through the computation using real-world data. Finally, we show how Mahout's generalized CPU/GPU integration and recently added CUDA support bring significant reductions in the time and cost of calculating the CCO models. We expect the audience to come away with an understanding of the kinds of applications that can be built with CCO and how to build them in performant, cost-reducing ways.
IBM PowerAI provides the easiest on-ramp for enterprise deep learning. PowerAI helped users break deep learning training benchmarks AlexNet and VGGNet thanks to the world's only CPU-to-GPU NVIDIA NVLink interface. See how new feature development and performance optimizations will advance the future of deep learning in the next twelve months, including NVIDIA NVLink 2.0, leaps in distributed training, and tools that make it easier to create the next deep learning breakthrough. Learn how you can harness a faster, better and more performant experience for the future of deep learning.
Polymatica is an OLAP and data mining server with a hybrid CPU+GPU architecture that turns analytical work on data volumes of billions of records into a proactive process with no waiting. The Polymatica architecture uses multiple NVIDIA GPUs (such as those in DGX-1) for critical operations on billions of raw business data records. This eliminates pauses and accelerates analytical operations by up to a hundred times. You'll see the performance difference in an example of a real analytical process in retail on different hardware: 1) CPU-only calculations on 2*Intel Xeon, no GPU; 2) 2*Intel Xeon + single Tesla P100; 3) DGX-1: 2*Intel Xeon + 8*Tesla P100. Polymatica on DGX-1 becomes the fastest OLAP and data mining engine, allowing advanced analytics on datasets of billions of records.
New deep learning frameworks are being developed on a monthly basis. For most of them, the inventors did not have scale-out parallelisation in mind. Apache Spark and other data-parallel frameworks, on the other hand, are becoming the de facto standard for big data analysis. In this talk, we will have a look at different deep learning frameworks and their parallelisation strategies on GPUs and Apache Spark. We'll start with DeepLearning4J and Apache SystemML as first-class citizens. We will then have a look at TensorSpark and TensorFrames and finish with CaffeOnSpark to explain concepts like inter- and intra-model parallelism, distributed cross-validation, and Jeff Dean style parameter averaging.
We utilize a MapR converged data platform to serve as the data layer to provide distributed file system, key-value storage and streams to store and build the data pipeline. On top of that, we use Kubernetes as an orchestration layer to manage the containers to train and deploy deep learning models, as well as serve the deep learning models in the form of containers.
This session will present an overview of how we recently applied modern deep learning techniques to the wide area of nanoscience. We will focus on deep convolutional neural network training to classify Scanning Electron Microscope (SEM) images at the nanoscale, discussing first the issues we faced and then how we solved them by improving the standard deep learning tools. This session aims to introduce a promising and stimulating new field of research that implements deep learning techniques in the nanoscience domain, with the final aim of providing researchers with advanced and innovative tools. These will help improve scientific research in the rapidly growing field of experimental and computational nanoscience.
In the world of analytics and AI, for many, GPU-accelerated analytics is equivalent to speeding up training time. The question remains, however: how does one interpret such highly complex black-box models, and how can these models help decision-making? We'll discuss and present a GPU-based architecture that not only accelerates model training but also uses GPU-based databases and visual analytics to render billions of rows, addressing the challenge of interpreting these black-box models. With the advent of algorithms, databases, and visualization tools all based on a GPU architecture, a solution like this has become more accessible. Interactive visualization of the model, based on partial dependence analysis, is one approach to interpreting these opaque models and is our focus here.
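Partial dependence, mentioned in the abstract above, can be stated in a few lines: fix one feature to a grid value in every row, average the model's predictions, and repeat for each grid value. The sketch below is a plain-Python illustration under the assumption that the model is any callable taking a feature list; `toy_model` and the sample rows are hypothetical stand-ins, not the presenters' system.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction as one feature is swept over `grid`.

    For each grid value, the chosen feature is overwritten in every row
    while all other features keep their observed values.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            probe = list(row)        # copy so the data is left untouched
            probe[feature] = value
            total += predict(probe)
        curve.append(total / len(rows))
    return curve


def toy_model(x):
    """Hypothetical black box: linear in x[0], quadratic in x[1]."""
    return 2.0 * x[0] + x[1] * x[1]


rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
```

The cost is one model evaluation per row per grid point, so with billions of rows the computation is dominated by batched inference, which is where GPU execution helps. Plotting `curve` against the grid gives the kind of view an analyst would interact with.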
Learn how large requests on big datasets, like production or finance data, can benefit from hybrid engine approaches for calculating on in-memory databases. While hybrid architectures are state-of-the-art in specialized calculation scenarios (e.g., linear algebra), multi-GPU or even multicore usage in database servers is still far from everyday use. In general, the approach to handle requests on large datasets would be scaling the database resources by adding new hardware nodes to the compute cluster. We use intelligent request planning and load balancing to distribute the calculations to multi-GPU and multicore engines in one node. These calculation engines are specifically designed for handling hundreds of millions of cells in parallel with minimal merging overhead.
Discover how Credit Suisse has implemented Deep Learning in eCommunications Surveillance, and how moving to GPU-accelerated models has yielded significant business value. The solution works on unstructured data and leverages bleeding-edge Natural Language Processing techniques, and will be enhanced with emotion analysis running on GPU-farms. This talk will include a demo of the functionality.
Deep learning optimization in real world applications is often limited by the lack of valuable data, either due to missing labels or the sparseness of relevant events (e.g. failures, anomalies) in the dataset. We face this problem when we optimize dispatching and rerouting decisions in the Swiss railway network, where the recorded data is variable over time and only contains a few valuable events. To overcome this deficiency we use the high computational power of modern GPUs to simulate millions of physically plausible scenarios. We use this artificial data to train our deep reinforcement learning algorithms to find and evaluate novel and optimal dispatching and rerouting strategies.
A key driver for pushing high-performance computing is the enablement of new research. One of the biggest and most exciting scientific challenges requiring high-performance computing is decoding the human brain. Many of the research topics in this field require scalable compute resources or the use of advanced data analytics methods (including deep learning) for processing extreme-scale data volumes. GPUs are a key enabling technology, and we will thus focus on the opportunities for using them for computing, data analytics, and visualisation. GPU-accelerated servers based on POWER processors are of particular interest here due to the tight integration of CPU and GPU using NVLink and the enhanced data transport capabilities.
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) framework, JIT/AOT compiler, and Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow models in a GPU-based production environment. This talk is 100% demo-based with open source tools and completely reproducible through Docker on your own GPU cluster. In addition, I spin up a GPU cloud instance for every attendee in the audience. We go through the notebooks together as I demonstrate the process of continuously training, optimizing, deploying, and serving a TensorFlow model on a large, distributed cluster of NVIDIA GPUs managed by the attendees.
NVIDIA DGX Systems powered by Volta deliver breakthrough performance for today's most popular deep learning frameworks. Attend this session to hear from DGX product experts and gain insights that will help researchers, developers, and data science practitioners accelerate training and iterate faster than ever. Learn (1) best practices for deploying an end-to-end deep learning practice, (2) how the newest DGX systems including DGX Station address the bottlenecks impacting your data science, and (3) how DGX software including optimized deep learning frameworks give your environment a performance advantage over GPU hardware alone.
Caffe2 is a lightweight, modular, and scalable deep learning framework refactored from the previous Caffe. Caffe2 has been widely used at Facebook to enable new AI & AR experiences. This talk will be divided into two parts. In the first part, we will explain some framework basics, the strengths of Caffe2, large scale training support and will walk you through several product use-cases at Facebook including computer vision, machine translation, speech recognition and content ranking. The second part will explain how users benefit from Caffe2's built-in neural network model compression, fast convolution for mobile CPUs, and GPU acceleration.
Learn how to leverage GPUs for interactive audio rendering. This session will give a short overview of the architecture of current GPUs, emphasizing some key differences between GPU and CPU programming models for audio processing. We will illustrate the benefits of GPU-accelerated audio rendering with results from 3D audio processing and sound scattering simulations. Finally, we will discuss best practices for GPU implementations as well as future opportunities for audio rendering on massively parallel architectures.
Learn how to implement a commercial software library that exploits CUDA for audio applications. We focus on the overall threading architecture and the underlying math for implementing general-purpose audio processing on CUDA devices. We also cover the use of inter-process communication to make a plug-in implementation loadable in 32-bit hosts installed on 64-bit systems, distributing the GPU load across remote servers, and creating a CUDA network for high-end purposes such as a big recording facility.
Learn how a synthesis of 3D sound scenes can be achieved using a peer-to-peer music streaming environment and GPU. We will discuss the technical and cost benefits to this approach, while noting that it frees the CPU for other tasks.
We explore two contending recognition network representations for speech inference engines: the linear lexical model (LLM) and the weighted finite state transducer (WFST) on NVIDIA GTX285 and GTX480 GPUs. We demonstrate that while an inference engine using the simpler LLM representation evaluates 22x more transitions per second than the advanced WFST representation, the simple structure of the LLM representation allows 4.7-6.4x faster evaluation and 53-65x faster operands gathering for each state transition. We illustrate that the performance of a speech inference engine based on the LLM representation is competitive with the WFST representation on highly parallel GPUs.
Automatic speech recognition (ASR) technology is emerging as a critical component in data analytics for a wealth of media data being generated everyday. ASR-based applications contain fine-grained concurrency that has great potential to be exploited on the GPU. However, the state-of-art ASR algorithm involves a highly parallel graph traversal on an irregular graph with millions of states and arcs, making efficient parallel implementations highly challenging. We present four generalizable techniques including: dynamic data-gather buffer, find-unique, lock-free data structures using atomics, and hybrid global/local task queues. When used together, these techniques can effectively resolve ASR implementation challenges on an NVIDIA GPU.
HYDRA, a real-time LVCSR (Large Vocabulary Speech Recognition) engine that performs decoding on CPU, GPU or hybrid CPU/GPU platforms is presented in this talk. While prior works have demonstrated the effectiveness of manycore graphic processing units (GPU) for high-throughput, limited vocabulary speech recognition, they are unsuitable for recognition with large acoustic and language models due to the limited memory. To overcome this limitation, we have developed a novel architecture for speech recognition decoding that jointly leverages manycore graphic processing units (GPU) and multicore processors (CPU) to perform speech recognition even when large acoustic and language models are applied. The proposed architecture can perform speech recognition at up to 5x faster than real-time with a recognition vocabulary of more than 1 Million words.
Apollo Computing Unit (ACU) is a mass-production-oriented autonomous driving computing platform launched by Baidu that mainly features the Apollo Pilot system and the Intelligent Map service. As an important part of the Apollo platform, ACU is intended for mass production by Baidu's partners. Based on the computing capabilities required by different scenarios, it is divided into three product series: ACU-Basic, ACU-Advanced, and ACU-Professional.
We'll introduce the latest advances in topics such as learning-to-learn, meta-learning, deep learning for robotics, deep reinforcement learning, and AI for manufacturing and logistics.
There is a growing need for fast and power-efficient computer vision on embedded devices. This session will focus on computer vision capabilities on embedded platforms available to ADAS developers, covering the OpenCV CUDA implementation and the new computer vision standard OpenVX. In addition, Itseez traffic sign detection will be showcased. The algorithm is capable of detecting speed limit signs for both the North America and EMEA regions, as well as several other signs, delivering faster than real-time performance on an embedded platform with a mobile-grade GPU.
This talk will introduce the main challenges in the next generation of automotive infotainment applications: OEMs want to take advantage of open source solutions like Linux and Android, yet they have very strict requirements for safety, security, and boot time. In addition, to reduce costs, more functionality needs to be integrated on a single processor. An example of this is the integration of the head unit and the instrument cluster as two displays of a single device. As a solution to these requirements, we describe a software architecture that uses virtualization with a micro-kernel and that is already implemented and available on NVIDIA Tegra3. We will give a brief outlook on the next steps regarding the sharing of the GPU and hardware virtualization.
In this talk, we report algorithmic and instruction-level optimizations used in uDeviceX, a CUDA particle simulator for biomedical microfluidic devices. First, we propose an FMA-intensive random number generator (RNG) that exploits the chaotic logistic map. This RNG takes advantage of the higher FP-to-integer instruction throughput ratio of CUDA GPUs to generate a large number of high-quality random streams in situ. Second, warp votes and shared memory are used to consolidate workload from diverging warps. Last, inline PTX is used to emulate 24-bit integer arithmetic with its floating-point counterpart in order to increase throughput. An implementation using C++ templates ensures that no type-casting overhead is triggered and also guards the technique against unintentional use.
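To make the logistic-map idea concrete, here is a minimal sketch of the core recurrence x -> 4x(1 - x), which uses only floating-point multiplies and adds (on a GPU this maps naturally onto FMA instructions). It is written in plain Python rather than CUDA and is not the uDeviceX implementation: the function name `logistic_stream`, the `warmup` parameter, and the seed values are assumptions for illustration. Note that raw logistic-map output is not uniformly distributed, so a practical generator needs further processing that this sketch omits.

```python
def logistic_stream(seed, n, warmup=16):
    """Generate n values in [0, 1] by iterating x -> 4x(1 - x).

    seed must lie strictly between 0 and 1. Degenerate seeds such as
    0.5 (which collapses to 0) or 0.75 (a fixed point) should be avoided.
    The first warmup iterations are discarded so that nearby seeds
    have time to diverge.
    """
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must be in the open interval (0, 1)")
    x = seed
    out = []
    for i in range(warmup + n):
        # Floating-point only: no integer instructions in the update.
        x = 4.0 * x * (1.0 - x)
        if i >= warmup:
            out.append(x)
    return out

# Two hypothetical per-particle seeds that differ only slightly.
stream_a = logistic_stream(0.1234, 8)
stream_b = logistic_stream(0.1235, 8)
```

The map's sensitivity to initial conditions is what lets closely spaced seeds produce different streams, so each thread or particle can derive its own stream locally without a shared generator state.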