Date of Award
Doctor of Philosophy (PhD)
The ability to quickly and accurately understand pixel-level scene semantics is a key capability required for various robotics applications such as autonomous driving. Until now, the temporal aspect of this problem has been largely overlooked. Therefore, the focus of research in this dissertation is to study the impact of temporal information in perception-related tasks and investigate whether it is useful to be included, more specifically for semantic scene understanding.
In this thesis, we first propose a set of novel techniques for unsupervised spatio-temporal segmentation in video sequences to obtain regions that are coherent in space and time. We then extend our method to exploit other strong cues present in the scene such as the depth signal or object parts to further improve the accuracy. The bottleneck in studying the temporal data is caused by both the limitations in computing resources and/or the lack of existing comprehensive labeled data. We tackle these issues by introducing a simple and efficient unsupervised label propagation algorithm that transfers the pixel-wise semantic labels from a groundtruth frame to its adjacent neighbor frames and produces auxiliary temporal groundtruth. Finally, we take a further step towards the ultimate goal of holistic scene understanding and present a deep, recurrent multi-scale network that is capable of leveraging the temporal information present in the video data. We show that our model can be easily extended to the related problem of prediction to estimate the expected semantics of the scene a small number of frames into the future. We achieve promising state-of-the-art results on various datasets and prove that our temporal approach is superior to the non-temporal baseline.
Ghafarianzadeh, Mahsa, "Towards Temporal Semantic Scene Understanding" (2017). Computer Science Graduate Theses & Dissertations. 144.