# Validating and Expanding Approximate Bayesian Computation Methods (17w5025)

Arriving in Banff, Alberta on Sunday, February 19 and departing on Friday, February 24, 2017

## Organizers

(Universite Paris Dauphine)

(Simon Fraser University)

(Monash University)

(University of Helsinki)

(University of Newcastle)

(University of Sheffield)

## Objectives

There has been an explosion of interest in Approximate Bayesian Computation (ABC) from an applied perspective over the past 10 years, with some methodological innovations following rather directly from existing, advanced Monte Carlo methodology. Our theoretical understanding of these methodologies in the ABC setting, including their interplay with the characteristics of the statistical problems they are intended to solve, has only really started to improve in the last few years. These developments lead us to believe that the community is at a critical point where this understanding can be translated into sophisticated and robust methodology for solving the challenging statistical problems that motivated ABC in the first place, many of which have no existing alternatives. The proposed workshop should constitute a key step in this process.

While there have been workshops (as well as sessions of larger conferences like MCMSki) on ABC methods, those meetings have concentrated only on the growing range of applications of the methodology. We propose to gather experts on the theoretical properties of ABC methods, which are as yet little understood, and on the methodological roadblocks currently standing in the way of broad-scale adoption of ABC (e.g., high-dimensional inputs, big data, and computational constraints). We feel that such a workshop will benefit both the fields where ABC potentially applies and research in computational statistical methods, since ABC is still perceived by many as a fringe topic, a surrogate approach meant to last only until a "better" solution is discovered. In addition, we think it will increase awareness of ABC as a mainstream statistical approach across the North American statistical community.

While the ABC method naturally belongs to the class of Monte Carlo methods, being based on simulations, it also relates to the wider field of approximate models, from which too little has been drawn so far. Speeds of convergence, choice of calibration factors, and assessments of reliability are all still under exploration, and we intend the workshop to produce a more coherent, unified and appealing picture of the validity of those methods (which, we recall, have already been applied extensively across a broad range of fields).

We also expect the group of experts gathered at the workshop to discuss the stumbling block of big data in ABC settings, since the ability to operate ABC degrades with both the size of the data (since it needs to be simulated) and the dimension of the parameter driving the statistical model (since the distance between estimators automatically increases). When the size of the data or the complexity of the statistical likelihood prohibits algorithms that operate on the whole model at once, even with ABC solutions, techniques are required that split and merge partial calculations, exploiting only parts of the model in the most efficient manner; examples proposed in the recent past include pre-fetching, collaborative and median mixing, and the bag of little bootstraps. Introducing a second level of approximation via the use of insufficient statistics and other information-deprived tools may be the pathway to handling big data and large dimensions; however, such solutions must come with theoretical guarantees and implementation guidelines.
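As a point of reference for the cost issues raised above, here is a minimal sketch of a plain ABC rejection sampler. It is not taken from the proposal: the toy model (normal data with unknown mean, uniform prior), the function names, the tolerance and the number of draws are all illustrative assumptions. It only shows where the two costs arise, namely one full data simulation per proposed parameter, and a distance on summary statistics that decides acceptance.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        # One pseudo-dataset of the same size as the observed data is simulated
        # per proposal, so the cost of each iteration grows with the data size.
        fake = simulator(theta, len(observed), rng)
        if abs(summary(fake) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical toy model: data ~ Normal(theta, 1), prior theta ~ Uniform(-5, 5).
def prior_sampler(rng):
    return rng.uniform(-5.0, 5.0)


def simulator(theta, n, rng):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


rng = random.Random(0)
observed = simulator(2.0, 50, rng)  # stand-in for real data
posterior_sample = abc_rejection(
    observed, prior_sampler, simulator, statistics.fmean,
    tolerance=0.1, n_draws=5000, rng=rng,
)
abc_mean = statistics.fmean(posterior_sample) if posterior_sample else None
print(len(posterior_sample), abc_mean)
```

In this toy case the sample mean happens to be a sufficient statistic; in the settings discussed here the summaries are typically insufficient, which is the second level of approximation mentioned above. Shrinking the tolerance or adding summary dimensions lowers the acceptance rate, which is one way to see why dimension is a stumbling block.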

### Statement of objectives

The aim of the workshop is to gather an audience of experts made up of statisticians and machine-learning researchers who use and develop computational methods, and of applied mathematicians and computer scientists who study approximation techniques, in order to build a clearer picture of the challenges and research directions concerning the convergence of ABC methods.

The themes that will be central to the workshop are:

• extended theoretical analysis of ABC algorithms (e.g., big data asymptotics, optimal convergence speeds, counter examples and benchmarks);

• novel algorithms to reduce the number of simulations required by ABC (Rao-Blackwellisation, surrogate models, semi-parametric representations, summary simulations);

• novel approaches to calculating distances in ABC (non-Euclidean distances, coding distance, distance models, estimation of the ABC error);

• specifics of ABC model choice, from consistency to novel inference methods;

• methods to tackle the curse of dimensionality in ABC (partial simulations, graph decompositions, marginalisation, variational approximations);

• parallel ABC methods (stopping rules, convergence assessment, data subsampling);

• subsampling ABC representations (merging partial explorations, assessing the information loss);

• the special case of intractable constants in both statistics and physics, where parts of the likelihood result from intractable integrals, with an ever-increasing array of solutions tackling special cases that may be encompassed within a more global perspective.

A proactive effort will be made to achieve a significant presence of younger and under-represented researchers at this workshop. First, it is paramount and enriching that young researchers from our areas are exposed from the start to emerging domains of interest, to intimate research workshops, and to debates about state-of-the-art research. We believe that such low-stress interactions between generations of researchers are one of the most efficient ways to foster the future research of all participants. Furthermore, the recent developments in ABC theory and methodology have mostly been driven by young researchers, most of whom are included in the list below. Since the youngest researchers of 2017 are not all yet in visible positions, we believe that the institutions of the more senior researchers involved in this workshop will be very open to our request to involve some of their students and postdocs at the time.
