Sparse Statistics, Optimization and Machine Learning (11w5012)

Arriving Sunday, January 16, and departing Friday, January 21, 2011

Organizers

Francis Bach (INRIA - Ecole Normale Superieure)
Alexandre d'Aspremont (Princeton University)
Martin Wainwright (UC Berkeley)

Objectives

A small revolution has recently started brewing in statistics and information theory. Over the last few years, a stream of consistency results on sparse model identification and decoding has been produced, together with efficient large-scale algorithms for identifying these models. Many intensely active research topics, such as sparse recovery in coding theory, compressed sensing and basis pursuit in signal processing, the lasso and covariance selection in statistics, and feature selection in machine learning, all revolve around the core idea that seeking sparse models is a meaningful way of simultaneously stabilizing inference procedures and highlighting structure in the underlying data.

These results address two fundamental questions in statistical learning. The first concerns variable selection: is a particular variable key to the modeling of our observations? The second concerns model structure: is the relationship between any two variables key to explaining these observations? In this spirit, the workshop will bring together researchers in statistical learning and information theory with experts in mathematical programming techniques, with the objective of producing realistic performance bounds on sparse statistical estimation and decoding. The broader objective of this research is to derive a set of scalable information extraction tools that can turn large-scale data sets into sparse, hence interpretable, models.

A number of researchers in mathematical programming have started working on problems arising in statistics or compressed sensing. Conversely, the statistics and learning communities are growing more aware of the wealth of results in mathematical programming. Our plan is to emulate and build upon the very successful workshop on "Mathematical Programming in Data Mining and Machine Learning" held at Banff in 2007, and to allow experts in both statistics and optimization to collaborate on common topics of interest during a week-long workshop at Banff.

The workshop's program will revolve around two main themes. The first is consistency: developing and studying sparse variants of classic statistical learning algorithms, and producing bounds on their error rate relative to the true sparse model. The second is computational efficiency: many recently developed statistical performance measures in variable selection rely on intractable properties of the model, which are often approximated using convex relaxation techniques. The optimization component of this workshop will therefore focus on improving the scalability of these convex programming algorithms. In addition, it will seek to produce efficient algorithms for sparse inference problems, with a special focus on methods that allow online updates of inputs or the use of streaming data, so that models can be validated efficiently through fast bootstrapping and cross-validation procedures.

We plan to bring in three groups of people:

1. Leading experts in statistics and machine learning who will describe open problems that can potentially be solved by mathematical programming techniques.

2. Leading figures in mathematical programming, who will give tutorials on recent advances in their field that are useful and accessible to the statistical community.

3. A selection of young academics, including some excellent PhD students and postdocs from different fields, who have been working on problems relevant to the topics covered in the workshop.

We are convinced that this workshop will trigger new collaborations, make optimizers aware of the exciting new opportunities in sparse statistical inference and, in return, expose statisticians to a rich collection of advanced mathematical programming tools. This workshop will also offer excellent opportunities for PhD students and young researchers to engage with challenging and exciting developments in optimization methods and their applications in statistics and machine learning, and to meet leading experts from both fields.