Multimedia, Mathematics & Machine Learning II (09w5056)

Arriving in Banff, Alberta on Sunday, July 5 and departing Friday, July 10, 2009


(Dialogic Research Inc.)

Li Deng (Microsoft Research)

Rabab Ward (University of British Columbia)


Following the success of the workshop on multimedia and mathematics (Meeting I), held July 23-28, 2005, Meeting II of the workshop, as proposed here, is intended to bring together the earlier participants and several prominent new researchers, with the theme expanded to include machine learning. The purpose is to advance the state of the art in multimedia processing techniques and multimedia technologies by exploring modern mathematical, pattern recognition, and machine learning methods that have cross-media generality. In particular, we will bring in prominent researchers as well as tutorial lecturers with rich working and research experience in one or more media types to share what they have learned about the commonalities and differences in the mathematical and machine learning techniques used to process different types of media content.

Multimedia technologies involve rich applications of, and interactions among, a variety of information sources, including audio/music, speech, image/graphics/animation, video, and text/documents/language. They also span a wide range of information processing tasks, including coding/compression, analysis, communication/networking/security, synthesis, user interface, perception/recognition/understanding, and retrieval/mining. Future multimedia technology development will require an increasing level of intelligence, for which mathematical representation, modeling, and learning will play an increasingly important role. Because machine learning provides a rich set of practical algorithms derived from rigorous mathematical analysis, it is important to include it as a principal element of the workshop's theme.

We can use a matrix to organize the landscape of the many multimedia research areas. Each row in the matrix represents one information processing task across all media types, and each column represents all processing tasks for one media type. An entry in the matrix thus corresponds to a specific area of multimedia research. Many researchers tend to work in isolation within a single entry of this matrix. With modern machine learning techniques, it is now timely to take a global and possibly unifying view of the previously disparate techniques applied to isolated research areas in multimedia processing. We believe this is a fruitful way of advancing the state of the art. To accomplish this goal, one necessary step for the community is to understand the similarities and differences among the successful processing and analysis techniques across the various media types. This is another specific goal of the proposed Workshop Meeting II.
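The matrix described above can be sketched in a few lines of Python. This is only an illustrative sketch: the row and column labels are taken from the lists of tasks and media types given earlier, while the helper names (`build_matrix`, `row`, `column`) and the "task of media" label format are hypothetical choices made for this example.

```python
from itertools import product

# Columns: media types, as listed in the text above.
MEDIA_TYPES = [
    "audio/music",
    "speech",
    "image/graphics/animation",
    "video",
    "text/documents/language",
]

# Rows: information processing tasks, as listed in the text above.
TASKS = [
    "coding/compression",
    "analysis",
    "communication/networking/security",
    "synthesis",
    "user interface",
    "perception/recognition/understanding",
    "retrieval/mining",
]


def build_matrix(tasks, media_types):
    """Map each (task, media type) pair to a label for one research area.

    The label format is a hypothetical convention for this sketch.
    """
    return {(t, m): f"{t} of {m}" for t, m in product(tasks, media_types)}


def row(matrix, task):
    """One processing task across all media types."""
    return [area for (t, _), area in matrix.items() if t == task]


def column(matrix, media_type):
    """All processing tasks for one media type."""
    return [area for (_, m), area in matrix.items() if m == media_type]


matrix = build_matrix(TASKS, MEDIA_TYPES)
```

Reading along a row, for example `row(matrix, "synthesis")`, gathers the same task across every media type, which is the cross-media comparison the workshop is meant to encourage; reading down a column gathers everything done within a single medium.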

Looking back at Meeting I of the workshop, held in 2005, we are glad to learn from participant feedback that it provided a timely cross-disciplinary bridge between the relatively new field of multimedia and the well-established discipline of mathematics. For many researchers working in a specific area of multimedia, the workshop provided an excellent opportunity to broaden their perspective. It was clear from the workshop's high-quality presentations that surprisingly similar mathematical approaches are applied in speech, audio, image, and video processing research. The presentations and informal discussions enabled participants to examine the variety of approaches used in the different media areas.

As an example, one presentation on image segmentation generated heated debate among the workshop participants. The need for segmentation as an intermediate step was strongly questioned, given that the final task is semantic image understanding or classification rather than segmentation itself. Researchers with speech recognition and understanding expertise have found that integrated pattern-recognition approaches that avoid an explicit speech segmentation step always provide better results than modular processing approaches that segment explicitly. The discussion of such disparities provided much-needed information and has generated new interest in cross-media research and exploration.

One of the current proposers of Meeting II attended Meeting I. At the 2006 IEEE Workshop on Multimedia Signal Processing (MMSP2006), it was decided, together with the Technical Committee, to continue these discussions and explorations. A special panel was organized at MMSP2006 on Differences and Similarities of Image/Video and Speech/Audio Processing Techniques, where seven prominent researchers with cross-media research expertise were invited to speak. This special panel turned out to be one of the most successful parts of MMSP2006, with exciting debates and discussions, and with the conclusion that researchers working on image/video can learn a great deal from those working on speech/audio, and vice versa. Cross-media research interaction of this type, an initiative inspired by the BIRS Meeting I workshop, would have a significant impact on the future of multimedia research.

The proposed Meeting II workshop aims to build on this momentum, with the ultimate goal of creating next-generation multimedia technologies that are intelligent, robust, effective, integrative, and unifying across all media types. To achieve this goal, we believe our community needs a solid underlying mathematical and machine learning foundation, which is a main theme of the proposed Workshop Meeting II.