Large-Scale Image-Based Modeling
Jon Ventura
Computer Science
University of California, Santa Barbara
Date: Friday, June 13, 2008
Place: Harold Frank Hall, Room 1132 (CS Conference Room)
Time: 2:00 pm — 3:00 pm
Abstract:
The world around us provides an astonishing amount of visual variety. Lighting sources, surface reflectance properties, shape, location, and orientation all affect what we sense when we open our eyes. Thankfully, the human brain is extremely good at understanding what it sees. Computer vision researchers face the similarly difficult problem of making a computer understand the input a camera receives. In this major area exam we will focus specifically on image-based modeling, where one or more images of the same scene are used to produce a model of the scene itself.
Today, with massive storage and computation available at relatively little expense, the field has reached a new frontier: utilizing very large collections of images for model creation. Millions of photographs can already be found on the Internet, taken from many different locations and with varying amounts of metadata. With photo collections this large, we can apply image-based modeling techniques to build photo-realistic models of entire cities. Once an organized image set is in place, a picture from your camera phone could be matched against the database and localized, opening up new possibilities for content delivery and social interaction.
A large-scale multi-view image collection can be considered the visual equivalent of a massive text corpus such as those already used by the natural language processing community. A visual corpus would be rich with labeled examples of different scenes and their constituent parts, as well as many connections between images. By having so many exemplars, we can begin to automatically identify semantically meaningful aspects of an image such as individual objects and the overall scene layout. This breakthrough could dramatically change the current state of the art in image-based modeling, which generally relies on matching and triangulation of low-level structures such as points and lines.
However, many technical hurdles remain. These include: how to efficiently search and update large multi-view image sets; how to use these image sets for localization on a mobile device; how to compensate for variation in imaging conditions (such as illumination and dynamic scenes); and how to represent the world model to be reconstructed. State-of-the-art approaches to all of these problems will be explored in this major area exam.
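For concreteness, the low-level triangulation mentioned above can be illustrated with a minimal sketch: a feature matched in two images back-projects to a 3D viewing ray from each camera center, and the 3D point is estimated where the two rays come closest. The midpoint method below is only one simple formulation (practical systems more often use a linear DLT solve or minimize reprojection error); the function name and the example coordinates are illustrative, not from the talk.

```python
def dot(a, b):
    # Inner product of two 3-vectors given as sequences.
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(p1, d1, p2, d2):
    """Midpoint triangulation of a point seen by two cameras.

    Each ray is (origin, direction): the camera center and the
    back-projected viewing ray of the matched image feature.
    Returns the midpoint of the shortest segment between the rays.
    """
    r = [b - a for a, b in zip(p1, p2)]        # vector from camera 1 to camera 2
    a = dot(d1, d1); b = dot(d1, d2); c = dot(d2, d2)
    e = dot(d1, r); f = dot(d2, r)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel viewing rays: the point is at (or near) infinity.
        raise ValueError("rays are parallel; triangulation is degenerate")
    # Closest-approach parameters along each ray (least-squares solution).
    t1 = (c * e - b * f) / denom
    t2 = (b * e - a * f) / denom
    q1 = [p + t1 * d for p, d in zip(p1, d1)]  # closest point on ray 1
    q2 = [p + t2 * d for p, d in zip(p2, d2)]  # closest point on ray 2
    return [(x + y) / 2 for x, y in zip(q1, q2)]

# Two cameras one unit apart, both observing a point at (0.5, 0, 1):
point = triangulate_midpoint((0, 0, 0), (0.5, 0, 1),
                             (1, 0, 0), (-0.5, 0, 1))
```

With noise-free rays, as here, the two closest points coincide and the midpoint recovers the true 3D point exactly; with real, noisy feature matches the rays are skew and the midpoint is a compromise between them.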
Major Area Exam Committee: Tobias Höllerer (chair), Yuan-Fang Wang, B.S. Manjunath