# Random Measures and Measure-Valued Processes (13w5007)

Arriving in Banff, Alberta on Sunday, September 8 and departing Friday, September 13, 2013

## Organizers

Jean Bertoin (Universität Zürich)

Shui Feng (McMaster University)

Paul Joyce (University of Idaho)

Ramsés Mena (Universidad Nacional Autónoma de México)

## Objectives

The time is ripe for an interdisciplinary dialogue on strategies for the future development of random measures and measure-valued processes. This workshop will bring together researchers in several closely related areas, including random partitions, stochastic analysis, measure-valued processes, stochastic models in population genetics, and Bayesian non-parametric statistics.

The overall objectives are (1) to provide a platform for the exchange of information and knowledge between different research groups and areas, (2) to tackle open problems, and (3) to discuss future directions in research on random measures and measure-valued processes.

We have received 40 positive responses to the workshop, including the organizers. The participants come from all over the world and range from recent Ph.D.s and postdoctoral fellows to research leaders in each area. Research activities will run on two levels: researchers in each area will work together to tackle area-specific open problems and challenges, while different groups will interact and share information that could pave the way for interdisciplinary collaborations. In addition to formal lectures, ample time will be allocated for discussion sessions.

## Progress and Challenges

Kingman's coalescent was the first model used to describe the genealogy of populations; it can be thought of as the dual of the neutral Fleming-Viot process. It was extended to more general exchangeable coalescent processes (also known as coalescents with multiple and possibly simultaneous collisions) by Pitman, Sagitov, Möhle and Schweinsberg around 2000; these coalescents are known to describe the genealogy of generalized Fleming-Viot processes. More recent extensions incorporate immigration and neutral mutations. All of these extensions exploit a fundamental relation of analytic duality between the population model (forward in time) and the process of its genealogy (backward in time). In this setting, the assumption that individuals are exchangeable is crucial; it implies that mutations must be neutral and that there is no selection. An important challenge is thus to develop a theory for the genealogy of population models with advantageous mutations and selection.

The Ewens Sampling Formula (ESF) for Poisson-Dirichlet random measures is undoubtedly one of the most useful and best-known mathematical results in population genetics, and it has applications in a variety of other areas of mathematics. It gives explicitly the distribution of the allelic partition (i.e. the partition of the current population into sub-populations with the same genetic type) for a sample of size $n$ from the idealized Wright-Fisher population model. The ESF was extended by Pitman to the so-called two-parameter model, and further developments were obtained jointly with Gnedin.
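To make the formula concrete: in its standard form, the ESF assigns to a sample of size $n$ in which $a_j$ allelic types are each represented exactly $j$ times (so that $\sum_j j a_j = n$) the probability

$$P(a_1,\dots,a_n) = \frac{n!}{\theta_{(n)}} \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!}, \qquad \theta_{(n)} = \theta(\theta+1)\cdots(\theta+n-1),$$

where $\theta > 0$ is the scaled mutation rate. The Python sketch below evaluates this expression and checks numerically that the probabilities sum to one over all configurations of a small sample. The function names, the sample size and the value of $\theta$ are illustrative choices and are not taken from the workshop description.

```python
from math import factorial


def rising_factorial(theta, n):
    """Return theta * (theta + 1) * ... * (theta + n - 1)."""
    result = 1.0
    for i in range(n):
        result *= theta + i
    return result


def ewens_probability(counts, theta):
    """ESF probability of an allelic configuration.

    counts[j - 1] is the number of types seen exactly j times,
    so the sample size is n = sum_j j * counts[j - 1].
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    weight = 1.0
    for j, a in enumerate(counts, start=1):
        weight *= (theta / j) ** a / factorial(a)
    return factorial(n) / rising_factorial(theta, n) * weight


def integer_partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing lists of parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - part, part):
            yield [part] + rest


def allelic_counts(partition, n):
    """Convert a list of block sizes into the vector (a_1, ..., a_n)."""
    counts = [0] * n
    for part in partition:
        counts[part - 1] += 1
    return counts


if __name__ == "__main__":
    n, theta = 6, 1.5  # illustrative values only
    total = sum(
        ewens_probability(allelic_counts(p, n), theta)
        for p in integer_partitions(n)
    )
    print(f"total probability over all configurations: {total:.12f}")
```

For $\theta = 1$ the formula reduces to the distribution of the cycle type of a uniformly random permutation, which gives a quick sanity check on small cases.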

Recently, a different class of random partitions has appeared in connection with the allelic partitions of the total population generated by (sub)critical Galton-Watson branching processes with neutral mutations. Although these allelic partitions possess simple characterizations, no explicit sampling formulas are known. In a related direction, while the original coalescent models did not take geometrical aspects into account, spatial coalescent models have recently been introduced by Limic and Sturm. These also yield new types of random partitions for which it would be interesting to obtain explicit sampling formulas.

Theoretical advances in the field of measure-valued processes have made it possible to incorporate a number of phenomena, such as competition for resources, selection, immigration and mutations, which are intended to bring stochastic models closer to reality. There is now a clear need for calibration methods that fit these more sophisticated models to data. This requires developing our knowledge of these stochastic population models. It is not sufficient to know that a certain model can be characterized, e.g., as the unique solution to some martingale problem; one also needs more specific information about its distributions and fine properties in order to develop efficient statistical methods.

Interestingly, the Bayesian non-parametric literature provides several models for random measures, as well as several ways of constructing or representing them. This variety of representations often helps to reveal distributional or other properties that are hard to obtain in other contexts.

In particular, Ferguson's Dirichlet process can be represented in at least six different ways, e.g. as a normalized gamma process, as the limit of Pólya urns, as a species sampling model with stick-breaking weights, as a Pólya tree model, etc. These representations also serve as a way to construct models more general than the Dirichlet process; for instance, Lijoi and Prünster, and Lenk, defined some general and important classes using, respectively, Kingman's completely random measures and Gaussian processes. Although the motivation for defining random measures from this point of view differs from that in other areas, these mathematical objects are widely used, applied and generalized within various statistical frameworks, which in turn supply a wide variety of estimation and inference methods. In a similar direction, but with seemingly different purposes, Bayesian non-parametric methods have also evolved to allow for more complex dependence structures, i.e. other than exchangeable samples. In particular, this has been done through what are called in this area dependent random probability measures (RPMs), namely sequences of RPMs indexed by some covariate or time index. These dependent processes connect directly to the concept of measure-valued processes; e.g. the Fleming-Viot process with diploid fertility selection can be represented simply via a generalization of the Blackwell-MacQueen Pólya urn scheme. Once more, several distributional characterizations and estimation methods for measure-valued processes have been discovered from this viewpoint. An important challenge here is to disentangle the connection between these constructions and properties and those used in areas such as superprocesses or population genetics. On the one hand, this would serve as a starting point for implementations in those areas; on the other hand, statistics would benefit from their canonical constructions and properties when defining new statistical methodologies.
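The basic Blackwell-MacQueen Pólya urn mentioned above can be sketched in a few lines. In its standard form, for a Dirichlet process with total mass $\theta$ and base distribution $H$, the $(i+1)$-th observation is a fresh draw from $H$ with probability $\theta/(\theta+i)$ and otherwise repeats one of the $i$ earlier observations chosen uniformly at random. The Python sketch below assumes a uniform base distribution on $[0,1)$. The function names and parameter values are hypothetical, and the sketch covers only the basic urn, not its generalization to diploid fertility selection.

```python
import random
from collections import Counter


def blackwell_macqueen_sample(n, theta, base_draw, rng):
    """Draw n exchangeable observations from the basic Polya urn scheme.

    theta     : total mass parameter (theta > 0)
    base_draw : function rng -> value, sampling from the base distribution
    rng       : a random.Random instance
    """
    values = []
    for i in range(n):
        # With probability theta / (theta + i) draw a new value from the
        # base distribution; this probability is 1 when i == 0.
        if rng.random() < theta / (theta + i):
            values.append(base_draw(rng))
        else:
            # Otherwise copy one of the i previous observations uniformly.
            values.append(rng.choice(values))
    return values


def block_sizes(values):
    """Sizes of the groups of tied values, in non-increasing order."""
    return sorted(Counter(values).values(), reverse=True)


if __name__ == "__main__":
    rng = random.Random(2013)  # arbitrary seed for reproducibility
    sample = blackwell_macqueen_sample(
        20, theta=2.0, base_draw=lambda r: r.random(), rng=rng
    )
    print(block_sizes(sample))
```

When the base distribution has no atoms, as assumed here, the groups of tied values form a random partition of the sample whose law is given by the ESF with parameter $\theta$. This is one concrete instance of the link between the statistical and the population-genetic viewpoints.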

In order to improve the dialogue between probabilists, statisticians and other scientific communities dealing with population genetics, it is important to further enrich the class of stochastic models for population dynamics. For instance, such models should take into account mating and sexual reproduction; in particular, the interactions between individuals and the selection process should also reflect these aspects.