Robust and Accurate 3D Motion Estimation under Adverse Conditions

by Christoph Vogel
ISBN 978-3-03837-003-1

Binocular 3D vision and the ability to detect moving objects and estimate their position and velocity are fundamental cues that allow humans and animals to interpret and interact with their environment. These cues are equally important for machines, which, equipped with cameras, perceive the world in a similar manner. In the future, intelligent perceiving systems could become ubiquitous in everyday life. It is therefore not surprising that both stereo and motion estimation have been a focus of computer vision since its early days, and that they have recently regained attention because of their importance for key applications such as driver assistance, autonomous driving, and scene understanding. Somewhat surprisingly, though, the two tasks have traditionally been tackled separately by the vision community.
This thesis considers the estimation of scene flow: the joint computation of motion and geometry from the images of a (calibrated) stereo camera, acquired over at least two time steps. Scene flow models appear to have certain advantages over computing geometry and motion individually. Joint inference should make it possible to exploit correlations between the two quantities and, for instance, better reveal co-occurring motion and geometry discontinuities. The capability to lift the motion representation from the image plane to metric 3D space should also be beneficial, as should the availability of redundant views of the scene.
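For intuition, consider a rectified stereo rig with focal length f, baseline b, and principal point (c_x, c_y); the following relations are standard in the scene flow literature, though the notation here is generic rather than the book's own. A pixel (x, y) with disparity d backprojects to a metric 3D point, and scene flow additionally assigns it a 3D motion vector:

    % generic scene flow parameterization (illustrative notation)
    Z = \frac{f\,b}{d}, \qquad
    \mathbf{P} = \frac{Z}{f} \begin{pmatrix} x - c_x \\ y - c_y \\ f \end{pmatrix}, \qquad
    \mathbf{P}' = \mathbf{P} + \mathbf{V}

Geometry (via the disparity d) and motion (via the 3D vector V) are thus estimated jointly from the image data rather than in two separate pipelines.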
This work begins with these ideas and shows that scene flow methods can indeed outperform dedicated stereo and optical flow methods at their respective tasks. The focus of this thesis is on outdoor scenarios, where passive stereo camera systems have advantages over active systems such as time-of-flight, pattern-projection, or laser-based devices. Under uncontrolled lighting conditions, the constraints imposed by the data can be conflicting or even misleading. Especially in such challenging scenarios, however, scene flow methods can deliver significantly better reconstructions than their 2D counterparts, stereo and optical flow.
We start with a systematic evaluation of different data terms, including several prominent per-pixel and patch-based data costs. Because the data term is vital for high-quality reconstructions under adverse conditions, we conduct most of the experiments on a challenging outdoor dataset and try to minimize influences unrelated to the data cost.

Motivated by the capability to deliver metric 3D motion estimates, we develop our first scene flow method with a focus on the 3D accuracy of the reconstructed motion. Exploiting the fact that many scenes of interest consist of rigidly moving parts, we propose to locally penalize the deviation of the flow vectors from a rigid motion (a generic formalization is sketched after this summary). The model prefers locally rigid motion but is not limited to completely rigid scenes.

Building on the experience with this local rigidity assumption, we go one step further and propose to model the scene by planar regions that move rigidly over time, into which the input images are segmented. Compared to a conventional pixel-based representation, significantly fewer model parameters have to be determined; these are estimated jointly with the (over-)segmentation of the scene, leading to accurate geometry and motion boundaries.

Finally, the piecewise rigid model is extended by introducing the concept of view consistency. Here, each view holds its own representation of the scene, and these representations are encouraged to be consistent across all frames. As a result, the data of all cameras is treated equally and has to be explained. In practice, view consistency allows for more efficient occlusion handling and stabilizes the estimation, especially in the presence of imaging outliers. Because we employ a scene-space parameterization, the model can easily be extended to handle multiple frames in a temporal sliding window, which increases the redundancy and improves the quality of the reconstruction even further.

We evaluate our models on recent datasets and demonstrate results superior to the state of the art for both stereo and motion. Overall, the results support the proposition that carefully exploiting the aforementioned advantages in the underlying models can unveil the potential of scene flow.
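To make the local rigidity idea referenced above concrete, a generic formalization of such a prior could take the following form (the notation is illustrative and not taken from the book). For a pixel p with reconstructed 3D point P_p and estimated 3D motion V_p, the deviation from the best-fitting rigid body motion over a spatial neighborhood N(p) is penalized:

    % generic local rigidity prior (illustrative notation)
    % a rigid motion moves a point as P' = R P + t, i.e. V = (R - I) P + t
    E_{\text{rigid}}(p) \;=\; \min_{R,\,\mathbf{t}} \sum_{q \in \mathcal{N}(p)}
        \rho\!\left( \mathbf{V}_q - \left[ (R - I)\,\mathbf{P}_q + \mathbf{t} \right] \right)

Here R is a rotation, t a translation, and rho a robust penalty function. A regularizer of this kind prefers locally rigid 3D motion while still permitting non-rigid scenes, matching the behavior described in the summary above.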