# CosmoStat2013: Statistical challenges from large data sets in cosmology and particle physics (13w5100)

Arriving in Banff, Alberta on Sunday, March 17 and departing Friday, March 22, 2013

## Organizers

Ofer Lahav (University College London)

Roberto Trotta (Imperial College London)

Ben Wandelt (Sorbonne University)

## Objectives

In 2013 both the cosmic microwave background satellite Planck and the Large Hadron Collider at CERN are expected to report first science results, which makes the proposed workshop highly timely. This conference therefore aims to leverage and combine the existing knowledge and to catalyze the building up of a new, integrated framework to help us along the path of understanding our world, what it is made of and where it came from.

The purpose of the proposed conference is to bring together cosmologists, statisticians, experts in data mining and scientists from other fields, especially from particle physics (both phenomenologists and experimentalists), in order to provide a framework for a fruitful cross-fertilization of ideas across disciplines. The main objective is to foster the exchange of ideas and concepts both within the respective research directions and between them. This will involve critical assessment of algorithms and software packages developed in the 'other camp', which may not be known to workers outside the respective specialities. Ample time will be devoted to informal discussions, in order to leave room for the development of collaborations.

One of the motivations for this workshop is the fact that cosmology and particle physics have developed two rather different approaches to the statistical analysis of their respective datasets, despite the commonalities in their goals. While particle physics mostly uses an approach based on the relative frequency of events (called 'frequentist'), which relies heavily on comparing observed data with simulations, the method of choice in cosmology is to translate data into probabilities representing 'degrees of belief' in the underlying models and their parameters ('Bayesian statistics').
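To make the contrast concrete, the following minimal sketch applies both viewpoints to the same toy counting experiment. It is only an illustration under stated assumptions: the event count, the expected background, the flat prior on the signal rate and the function names are all hypothetical choices made here, not methods prescribed by the workshop or by either community.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of observing k events when mu are expected (mu > 0)."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def frequentist_p_value(n_obs, background):
    """Frequentist view: P(N >= n_obs) under the background-only hypothesis."""
    return 1.0 - sum(poisson_pmf(k, background) for k in range(n_obs))

def bayesian_upper_limit(n_obs, background, cred=0.95, s_max=50.0, steps=50000):
    """Bayesian view: credible upper limit on the signal rate s.

    Assumes a flat prior on s >= 0 (an illustrative choice) and integrates
    the posterior numerically with a midpoint rule on [0, s_max].
    """
    ds = s_max / steps
    grid = [(i + 0.5) * ds for i in range(steps)]
    # With a flat prior the unnormalised posterior is just the likelihood.
    weights = [poisson_pmf(n_obs, s + background) for s in grid]
    total = sum(weights)
    running = 0.0
    for s, w in zip(grid, weights):
        running += w
        if running / total >= cred:
            return s
    return s_max

# Hypothetical numbers: 8 events observed, 3.0 expected from background alone.
n_obs, background = 8, 3.0
p_value = frequentist_p_value(n_obs, background)
upper = bayesian_upper_limit(n_obs, background)
print(f"frequentist p-value (background only): {p_value:.4f}")
print(f"Bayesian 95% upper limit on signal:    {upper:.2f}")
```

The two numbers answer different questions: the p-value measures how often background alone would produce at least this many events, while the credible limit is a statement about the signal rate given the data and the assumed prior. This difference is one reason why results from the two communities cannot be combined without care.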

This state of affairs has come to represent a major hurdle when trying to compare and combine statistical results coming from different fields. A case in point is the determination of neutrino properties (mass and mixing angles): present-day knowledge comes from a combination of laboratory experiments on Earth, data about astrophysical neutrino fluxes (from the Sun and nearby supernovae) and observations of the impact of neutrinos on the growth of structures in the Universe. Since the particle physics and cosmology/astrophysics communities use different approaches and statistical tools, the investigation of such cross-disciplinary questions will have to enter a new phase. Two aspects in particular need attention: on the one hand, the complementarity of the datasets might not be fully exploited, because researchers in different disciplines are working independently of each other (for example, in the combination of dark matter constraints from the LHC and direct detection); on the other hand, the underlying assumptions and possible systematic effects coming from cosmology are often not properly appreciated by particle physicists, to the point that they are often either uncritically accepted or outright dismissed (for example, in relation to cosmological constraints on the neutrino mass). Addressing these issues will become even more urgent as new observations deliver increasingly precise results, for which a new level of statistical sophistication will be required.

The eventual scientific success of the ambitious observational program in cosmology and the experimental particle physics approach at the LHC depends critically on our ability to process and reduce the data collected to make correct inferences on the underlying physical mechanisms. As far as cosmology is concerned, one of the most serious limiting factors will be our technical ability to handle the large amount of data produced by the future generation of large-scale surveys. Beyond this, however, the different nature of the experiments conducted in collider physics and cosmology has traditionally translated into two very different approaches to statistics and data reduction. Physicists from different research domains often have problems understanding the details of each other's work, and do not trust some of the results for this reason. This conference will help to develop a common language for analysing cosmological and particle physics data. The fact that both avenues are likely to be required in order to make progress on the outstanding questions outlined above is a compelling reason for comparing approaches and trying to synthesise them in a more fruitful way, learning from past experience how best to deal with future challenges.

The particular focus of this workshop will be to reflect upon the challenges in data reduction, statistical analysis and model inference posed by future cosmological observations and particle physics experiments, making best use of the experience accumulated over the years and drawing upon the expertise of leading statisticians. Examples of topics that will be addressed by the workshop include:

- sparse modeling and compressed sensing and their applications to astrophysical data analysis

- Bayesian computing and novel computational approaches for exploring large model spaces

- measuring performance of signal detection methods

- new data mining techniques for anomaly detection in extremely large data sets

We will also keep open the possibility of devoting part of the workshop to urgent statistical issues arising from data sets that will have been released between now and 2013.

We will invite high-profile speakers from the statistics community, so that their keynote talks will encourage cross-disciplinary thinking and the development of synergies between different domains. Conversely, it is expected that the extremely demanding nature of the future data sets in particle physics and cosmology will spark new avenues for research in the statistics community. Indeed, within the professional statistics community there is a growing group who have been actively engaged in astrostatistics, becoming involved for example in some of the permanent fora for astronomer-statistician interaction that now exist, such as Penn State's Center for Astrostatistics, the California-Harvard Astrostatistics Collaboration (CHASC) and the International Computational AstroStatistics Group (InCA). This workshop will therefore build on this growing activity and offer a highly timely opportunity for developing new collaborations.

## Scientific Organizing Committee (SOC)

Gianfranco Bertone (GRAPPA Amsterdam)

Mike Hobson (Cambridge)

Martin Kunz (Geneva University)

Louis Lyons (Imperial/Oxford)

Jean-Luc Starck (CEA Saclay)

David van Dyk (Imperial College London)
