Holistic Scene Understanding: Towards an Improved Semantic and Geometric Reasoning Framework for Indoor and Outdoor Scenes
For the past few decades, the computer vision research community has focused mainly on individual tasks such as recognition, segmentation and classification. The performance of algorithms for these tasks has improved significantly, yet they still lack the ability to develop a human-like understanding of a scene. In other words, state-of-the-art computer vision algorithms do not jointly reason about objects, their semantic relationships, their geometric extent and their locations in the scene as a whole. Now that these individual tasks are well developed, online visual content is growing and new low-cost sensors are coming onto the market, there is a need to combine these components so that they complement each other in a holistic framework. We propose to contribute towards holistic scene understanding in a way that not only incorporates object recognition, semantic labeling, class existence and the spatial and geometric properties of individual classes, but also deals with scene and event categorization and 3D reconstruction from 2D images. Later, we plan to add robust text identification and recognition in natural scenes to our system. This extension is motivated by the common observation that real-world scenes contain important textual information that can significantly improve the performance of the individual tasks. Since the scene understanding problem is statistical in nature, this joint reasoning framework will benefit from machine learning tools and probabilistic graphical models.
Holistic, or total, scene understanding can be used for high-level reasoning about indoor and outdoor scenes to infer relationships among objects and their spatial layouts. This information can in turn help with intelligent and robust object recognition, semantic labeling, automatic annotation, context-aware segmentation, identification of support relationships and geometric reasoning, and can enable robots to autonomously manipulate and navigate their environments.