GTC On-Demand

Abstract:
We will discuss a deep-learning-based method for improving the quality of 3D reconstruction performed by time-of-flight (ToF) cameras. Scene motion, multiple reflections, and sensor noise introduce artifacts into the depth maps these sensors produce. We'll explain our proposed two-stage deep-learning approach, which addresses all of these sources of artifacts simultaneously. We'll also introduce FLAT, a synthetic dataset of 2,000 ToF measurements that captures all of these nonidealities and can be used to simulate different hardware. Using the Kinect camera as a baseline, we show lower reconstruction errors on simulated and real data compared with state-of-the-art methods.
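The two-stage structure the abstract describes can be pictured roughly as one network that corrects the raw ToF measurements and a second that refines the decoded depth. The PyTorch sketch below is a hypothetical illustration of that layout only; the channel counts, the assumed 3-frequency/3-phase raw input, and the placeholder phase-to-depth decoding are illustrative assumptions, not the presented method or the FLAT data format.

# Hypothetical sketch of a two-stage ToF correction pipeline (illustrative only).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Small convolutional stack shared by both stages.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class RawCorrectionNet(nn.Module):
    # Stage 1: clean the raw correlation measurements (9 channels assumed here,
    # e.g. 3 modulation frequencies x 3 phases for a Kinect-like sensor).
    def __init__(self, channels=9):
        super().__init__()
        self.body = conv_block(channels, 64)
        self.out = nn.Conv2d(64, channels, 3, padding=1)

    def forward(self, raw):
        return raw + self.out(self.body(raw))     # residual correction

class DepthRefinementNet(nn.Module):
    # Stage 2: refine the depth decoded from the corrected measurements.
    def __init__(self):
        super().__init__()
        self.body = conv_block(1, 64)
        self.out = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, depth):
        return depth + self.out(self.body(depth))

def decode_depth(corrected):
    # Placeholder for the sensor's phase-to-depth decoding; a real pipeline
    # would implement the camera model here.
    return corrected.mean(dim=1, keepdim=True)

raw = torch.randn(1, 9, 240, 320)                 # fake raw ToF measurements
stage1, stage2 = RawCorrectionNet(), DepthRefinementNet()
depth = stage2(decode_depth(stage1(raw)))         # end-to-end two-stage pass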
 
Topics:
AI and DL Research, Computer Vision
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9318
 
Abstract:
We'll discuss a deep-learning approach that takes as input a few images of a scene and synthesizes new views as seen from virtual cameras. This can be used to generate camera-flyby videos or simply to render the scene from a new viewpoint. With existing novel-view-synthesis approaches, the quality of the resulting images quickly degrades when the virtual camera moves significantly away from the input images, because of increasing depth uncertainty and disocclusions. We'll describe how we cast this problem as one of depth probability estimation for the novel view, image synthesis, and conditional image refinement. We'll also cover traditional and deep-learning-based depth estimation, issues with warping-based novel view synthesis methods, and how depth information can be used to refine the quality of synthesized images.
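One way to picture the depth-probability-plus-warping step mentioned above: take the expected depth under a per-pixel probability over candidate depths, reproject source pixels into the novel view, and hand the warped image to a refinement stage. The helper functions below are a hypothetical PyTorch sketch under a pinhole-camera assumption, not the speakers' implementation.

# Illustrative sketch: expected depth from a probability volume, then backward
# warping of a source image into the novel view (assumed calibrated pinhole cameras).
import torch
import torch.nn.functional as F

def expected_depth(prob_volume, depth_candidates):
    # prob_volume: (B, D, H, W), a softmax over D candidate depths per pixel.
    d = depth_candidates.view(1, -1, 1, 1)
    return (prob_volume * d).sum(dim=1, keepdim=True)          # (B, 1, H, W)

def warp_source_to_novel(src_img, depth_novel, K, K_inv, R, t):
    # Reproject each novel-view pixel into the source view and sample it there.
    B, _, H, W = depth_novel.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    rays = K_inv @ pix                                          # camera rays
    pts = rays.unsqueeze(0) * depth_novel.reshape(B, 1, -1)     # 3D points
    pts_src = R @ pts + t.view(1, 3, 1)                         # into source frame
    proj = K @ pts_src
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)              # (B, 2, H*W)
    u = uv[:, 0] / (W - 1) * 2 - 1                              # normalize for grid_sample
    v = uv[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

In a full pipeline, the output of warp_source_to_novel would typically go to a refinement network, together with masks marking disoccluded regions the warp cannot explain.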
 
Topics:
Animation and VFX, Computer Vision, Video and Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2019
Session ID:
S9576
 
Abstract:
Telling the right story with a picture requires the ability to create the right composition. Two critical parameters controlling composition are the camera position and the focal length of the lens. The traditional paradigm to capture a picture is for a photographer to mentally visualize the desired result, select the capture parameters to produce it, and finally take the photograph, thus committing to a particular composition. To break this paradigm, we introduce computational zoom, a framework that allows a photographer to manipulate several aspects of composition in post-capture. Our approach also defines a multi-perspective camera that can generate compositions that are not attainable with a physical lens. Our framework requires a high-quality estimation of the scene's depth. Existing methods to estimate 3D information generally fail to produce dense maps, or sacrifice depth uncertainty to avoid missing estimates. We propose a novel GPU-based depth estimation technique that outperforms the state of the art in terms of quality, while ensuring that each pixel is associated with a depth value.
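For context on the dense-depth requirement, a generic plane-sweep baseline assigns every pixel the candidate depth whose warped source view best matches the reference, so no pixel is left without an estimate. The PyTorch sketch below is only such a baseline under assumed calibrated inputs (intrinsics K, rotation R, translation t); it is not the GPU technique presented in the talk.

# Minimal plane-sweep sketch (illustrative): each reference pixel gets the
# candidate depth whose warped source view matches it best, giving a dense map.
import torch
import torch.nn.functional as F

def warp_at_depth(src, depth, K, K_inv, R, t):
    # Backward-warp src into the reference view assuming a constant-depth plane.
    B, _, H, W = src.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1)
    pts = (K_inv @ pix) * depth                       # 3D points on the plane
    proj = K @ (R @ pts + t.view(3, 1))               # project into source view
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                        uv[1] / (H - 1) * 2 - 1], -1).reshape(1, H, W, 2)
    return F.grid_sample(src, grid.expand(B, -1, -1, -1), align_corners=True)

def plane_sweep_depth(ref, src, depths, K, K_inv, R, t):
    # Photometric cost per candidate depth; per-pixel argmin picks the best plane.
    costs = [(ref - warp_at_depth(src, d, K, K_inv, R, t)).abs().mean(1)
             for d in depths]                         # each cost map is (B, H, W)
    idx = torch.stack(costs, 1).argmin(1)             # index of the best plane
    return depths[idx]                                # dense depth map (B, H, W)

Because the argmin is taken per pixel over the full cost volume, the map is dense by construction; quality then hinges on the cost metric and any regularization applied afterward.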
 
Topics:
Computer Vision, Video and Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2018
Session ID:
S8253
 
Abstract:
We'll show how neural networks can beat state-of-the-art methods on image restoration tasks such as denoising, demosaicking, super-resolution, and JPEG deblocking. In particular, we'll show that even a shallow network can produce good results when it is trained to evaluate images the way humans do, that is, when perceptual loss functions are used in training. We will also discuss the strengths and limitations of different perceptual loss functions.
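A standard way to train a restoration network to evaluate images the way humans do is to compare feature activations of a fixed pretrained network rather than raw pixels. The sketch below uses VGG-16 features in the style of Johnson et al., which is one common perceptual loss and not necessarily the exact formulation used in the talk.

# Sketch of a VGG-based perceptual loss (a standard formulation, for illustration).
import torch
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    # Feature-space loss: distances are measured between VGG activations rather
    # than pixels (inputs would be ImageNet-normalized in practice).
    def __init__(self, layer=16):                     # features[:16] ends at relu3_3
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = vgg.features[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)                   # the loss network stays fixed
        self.mse = nn.MSELoss()

    def forward(self, restored, target):
        return self.mse(self.features(restored), self.features(target))

loss_fn = PerceptualLoss()
restored = torch.rand(1, 3, 224, 224, requires_grad=True)
target = torch.rand(1, 3, 224, 224)
loss = loss_fn(restored, target)                      # gradients flow into `restored`
loss.backward()

In practice, losses from several VGG layers are often combined, sometimes together with a pixel-wise or adversarial term, which is where the trade-offs among perceptual losses show up.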
 
Topics:
Deep Learning and AI, Video and Image Processing
Type:
Talk
Event:
GTC Silicon Valley
Year:
2017
Session ID:
S7447