Computer Vision and the Internet (09w5126)

Arriving in Banff, Alberta on Sunday, August 30 and departing on Friday, September 4, 2009


(University of Toronto)

(University of Washington)


1. Why this workshop?

We see computer vision as the point of convergence of several current
technological and scientific trends:

- The rapid public adoption of digital photography and of photo- and
video-sharing sites has made the Internet a highly dynamic and
constantly expanding repository of visual information;

- Vast collections of imagery, previously impossible to capture by an
individual or organization, are now accessible online in a matter of

- Modern processing, storage and networking systems have made analysis
of such collections possible even by individual researchers or small
research groups;

- Geometric optimization algorithms have advanced to the point where
core 3D computer vision problems such as single-view camera calibration,
exterior 3D orientation, triangulation, and bundle adjustment afford
efficient algorithmic solutions that are robust enough to handle
the variability of Internet imagery; and

- Advances in statistical machine learning are providing the core
mathematical and algorithmic tools for reasoning about very large
collections of noisy observational data (images, text, video).

Coupled with the great interest by both industry and the public in
interacting with Internet imagery at a semantic level (for image
search, image-based advertising, virtual walkthroughs, social
networking, etc.), these trends suggest that computer vision will
impact, and be impacted by, the Internet in significant ways.

From a scientific point of view, at no time in the field's history has
a visual dataset of the scale and diversity found on the Internet been
available for direct analysis. Even to this day, computer vision
algorithms tackling problems such as 3D reconstruction and object
recognition are designed to handle small, and largely homogeneous,
image collections (a few thousand images at most). A fundamental
question that this workshop will explore is whether the scale and
diversity of Internet data, rather than making core visual analysis
tasks such as reconstruction and recognition harder, actually make
them simpler once we analyze key statistical properties that persist
over very large datasets. With this in mind, the workshop will provide
a forum for (1) summarizing the current state of the art, (2)
highlighting the main mathematical and algorithmic tools necessary for
research in this fledgling field, (3) identifying long-term open
research problems, and (4) identifying open problems that are most
readily within reach.

2. Why now?

The last few years have seen an explosion of image-related Internet
sites (Flickr, Facebook, Google Images, and dozens more), to the point
that there are literally billions of images available online. At the same
time, a set of key breakthroughs in computer vision technologies over
the last several years (invariant feature matching, structure-from-motion,
MRF solvers) enable, for the first time, tools that can robustly operate on
the unstructured and diverse imagery found in these Internet collections.
The confluence of these two factors is so recent that publications on this
topic have just begun to appear in the last year (and with much fanfare).
Hence, the proposed workshop is especially timely.

We are aware of no previous workshop, conference, or symposium devoted
to this topic, in North America or elsewhere. We note that a different
group has proposed a one-day "workshop" on Internet Vision in association
with an upcoming computer vision conference, but its format will be very
different and will follow the traditional paper submission/presentation
model.

We feel that a focused 5-day meeting will help define and lay the
groundwork for this emerging field, foster collaborations, and spawn
further research.

3. Major topics

The proposed workshop will be centered around the following
main topics:

(1) Recognition, search, and semantic analysis of images and videos

Recent years have seen great progress in the area of recognition and
image retrieval. This progress has been driven in large part by the
application of core statistical learning algorithms to large
collections of visual data. Specific areas of focus will include image
and video clustering; applications of manifold learning and
dimensionality reduction to visual data; spatial data structures,
ranking algorithms, and hashing schemes for efficient image
classification and retrieval; algorithms for supervised learning of
object classes from images, text labels and metadata; and probabilistic
methods for unsupervised learning of objects and object classes.
How might the Internet transform these fields?

(2) Modeling, reconstruction, and visualization

From the standpoint of shape modeling research, Internet imagery
presents the ultimate data set, capturing the world's sites and cities
under myriad viewpoint and illumination conditions.
For example, a Google image search for "Notre Dame" returns
nearly a million hits, showing the cathedral from almost every
conceivable viewing position and angle, different times of day and
night, and changes in season, weather, and decade. Furthermore,
entire cities are now being captured from satellite, air, and street level.

With recent and future advances in calibration, matching, and shape
reconstruction, will it be possible to model the world's surface
geometry at high resolution? Similarly, these developments could
enable immersive 3D visualizations with unprecedented detail and

(3) Enabling tools (features, indexing, calibration)

Recent work on invariant features, efficient matching and indexing
techniques, and camera calibration tools has transformed
computer vision techniques to the point that they can operate
robustly and efficiently on very large and diverse datasets. However,
we have not yet seen algorithms that operate at "Internet-scale"
(on millions of photos), with extreme appearance changes. What
advances are needed to get to this stage? And what other low-level
capabilities are needed to fuel further progress in the field?

(4) Applications

The last two years have seen several remarkable applications of
the combination of computer vision technologies with Internet imagery,
in areas such as navigation (Google Street View), 3D reconstruction
and visualization (Microsoft's Photosynth), recognition (Forsyth and
others), and image enhancement (Efros).

This "first wave" of applications has fueled a great deal of excitement
both among computer vision researchers and in the public at large.
We believe these represent only the tip of the iceberg, and the next
several years will see many further innovations on the applications
side. A key role of this workshop is to help identify further opportunities
for applications and where we might be in 5-10 years' time.

4. About the organizers

Professors Kutulakos and Seitz count half a dozen best-paper awards
between them from the top international conferences in computer vision.
They have received several other prestigious awards (Sloan
Fellowships, NSF CAREER awards, an ONR Investigator Award, etc.) and
serve or have served on the Editorial Board of one of the major
journals of the field (IEEE Trans. Pattern Analysis and Machine
Intelligence). Prof. Kutulakos was a Program Co-Chair for the 2003
IEEE Computer Vision and Pattern Recognition Conference, which is the
major annual conference of the field and one of the top three computer
vision conferences internationally. Prof. Seitz has helped pioneer the
area of computer vision on Internet imagery; his influential
work on Photo Tourism (with Snavely and Szeliski) forms the basis of
Microsoft's Photosynth.