Statistical Inference Problems in High Energy Physics and Astronomy

We propose a BIRS workshop on specific problems of statistical inference that are motivated by questions arising in particle physics and astronomy. We plan to invite twenty physicists and astronomers with strong interests and experience in statistical data analysis, and twenty statisticians with expertise in theoretical statistics, to develop solutions for these problems. While the specific problems have developed from particular types of experiments or data collection efforts, they have common features that are amenable to statistical analysis, and the solutions developed will be more broadly useful in physics and astronomy. We have selected three problems that arise often and for which there is ongoing development in theoretical statistics. The goal is to bring the latest methods to the attention of the scientific community, and to develop statistical theory further by considering special aspects that arise in these scientific contexts.

The first problem is the frequently encountered issue of establishing a confidence range or upper limit on a parameter of interest in the presence of nuisance parameters, often called 'systematic effects' in physics. Methods that have been studied by physicists include Bayesian techniques, profile likelihood, fully frequentist methods, and mixed methods that are frequentist for the parameter of interest and Bayesian for the nuisance parameters. This problem has been well studied in the theory of statistics, where there is a large literature on how to adjust profile likelihoods to accommodate the estimation of nuisance parameters, and on how to combine these likelihoods in Bayesian and non-Bayesian approaches. Two approaches that have been widely used in other contexts are higher-order likelihood-based approximations and Bayesian inference via MCMC sampling. There is also a body of work on the construction of priors for high-dimensional parameters. The statistical literature has been developed in fairly general contexts, but applying the methods to problems arising in particle physics would be both useful for the practical questions and of theoretical interest, in that some unique features arise. For example, it is very common that one or more parameters have a bounded range, which raises particular theoretical questions. Further, in many of these contexts there are few observations, which, for example, causes strong dependence on the prior specification if a Bayesian approach is adopted.
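As a concrete illustration of this first problem, the following is a minimal sketch of a profile likelihood upper limit in a simple counting experiment. Everything in it is an assumption made for illustration and is not taken from any particular analysis: the 'on/off' model (n events observed in a signal region with mean s + b, and m events in a background-only region with mean tau * b), the example counts, and the threshold 2.706, which corresponds to a one-sided 95% limit only under the usual large-sample chi-squared approximation. With a bounded parameter (s >= 0) and few events, that approximation is exactly what is in question.

```python
import math

def log_lik(s, b, n, m, tau):
    """Poisson log-likelihood (constant terms dropped) for the on/off model.
    Assumes n > 0 and m > 0 so that the logarithms are defined."""
    return n * math.log(s + b) - (s + b) + m * math.log(tau * b) - tau * b

def profiled_b(s, n, m, tau):
    """Background b that maximises the likelihood at fixed signal s.
    Setting d(log L)/db = 0 gives a quadratic in b; this is its positive root."""
    a = 1.0 + tau
    lin = a * s - n - m
    return (-lin + math.sqrt(lin * lin + 4.0 * a * m * s)) / (2.0 * a)

def q(s, n, m, tau):
    """Profile likelihood ratio statistic, with the signal restricted to s >= 0."""
    s_hat = max(0.0, n - m / tau)  # maximum likelihood estimate on the boundary-aware range
    best = log_lik(s_hat, profiled_b(s_hat, n, m, tau), n, m, tau)
    return 2.0 * (best - log_lik(s, profiled_b(s, n, m, tau), n, m, tau))

def upper_limit(n, m, tau, threshold=2.706, tol=1e-9):
    """Smallest s above the estimate at which q(s) reaches the threshold (bisection)."""
    lo = max(0.0, n - m / tau)
    hi = lo + 1.0
    while q(hi, n, m, tau) < threshold:  # expand until the crossing is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, n, m, tau) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical counts: 5 events in the signal region, 9 in a sideband three times larger.
n_obs, m_obs, tau = 5, 9, 3.0
limit = upper_limit(n_obs, m_obs, tau)
print("approximate 95% upper limit on s:", round(limit, 3))
```

The nuisance parameter b is eliminated by maximisation at each value of s, which is the simplest of the approaches listed above; adjusted profile likelihoods or a prior on b would replace `profiled_b` in a fuller treatment.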

The second problem, which arises in many physics analyses, is the separation of multivariate signal events from a high-dimensional background. A common approach to this discrimination is to separate the signal and background distributions in the multi-dimensional space of variables defined for each event. There are many methods of achieving this separation: simple cuts, Fisher discriminants, principal component analysis, independent component analysis, self-organising maps, kernel methods, neural networks, support vector machines, and boosted decision trees. These methods are widely studied in the literature on statistical methods in machine learning, and several invited talks at the PHYSTAT Conferences have been devoted to this topic. A systematic study and comparison of the methods would be most valuable, as would an investigation of techniques for reducing the number of input variables without significantly compromising performance. Also of practical interest is the robustness of these methods to incorrect models, as physical models are often partial and approximate.
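To make one of the listed methods concrete, the sketch below computes a Fisher discriminant for two variables per event. The toy Gaussian samples, their means, the random seed, and the midpoint cut are all assumptions chosen for illustration; a real analysis would use simulated or control-sample events and many more variables.

```python
import random

def mean2(points):
    """Mean of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, mu):
    """Elements (sxx, sxy, syy) of the 2x2 scatter matrix about mu."""
    sxx = sum((p[0] - mu[0]) ** 2 for p in points)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
    syy = sum((p[1] - mu[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(signal, background):
    """Fisher direction w = Sw^{-1} (mu_s - mu_b), with Sw the within-class scatter."""
    mu_s, mu_b = mean2(signal), mean2(background)
    pooled = [a + b for a, b in zip(scatter2(signal, mu_s), scatter2(background, mu_b))]
    sxx, sxy, syy = pooled
    det = sxx * syy - sxy * sxy
    dx, dy = mu_s[0] - mu_b[0], mu_s[1] - mu_b[1]
    # Explicit inverse of the symmetric 2x2 matrix applied to the mean difference.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    return w, mu_s, mu_b

def project(w, p):
    """One-dimensional discriminant value for event p."""
    return w[0] * p[0] + w[1] * p[1]

# Toy samples (assumed): unit-width Gaussians with different means.
random.seed(1)
signal = [(random.gauss(2.0, 1.0), random.gauss(1.0, 1.0)) for _ in range(500)]
background = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(500)]

w, mu_s, mu_b = fisher_direction(signal, background)
cut = 0.5 * (project(w, mu_s) + project(w, mu_b))  # midpoint of the projected means
correct = sum(project(w, p) > cut for p in signal)
correct += sum(project(w, p) <= cut for p in background)
accuracy = correct / (len(signal) + len(background))
print("Fisher direction:", w, "fraction correctly classified:", accuracy)
```

The accuracy here is measured on the same events used to determine the direction, so it is optimistic; a comparison of methods of the kind proposed above would use independent test samples.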

The third, closely related, problem is assessing the goodness of fit of a model. In any attempt to use data to determine parameters, it is crucial to know whether the data are indeed consistent with the model at the optimised parameter values. The simplest method is to bin the data and compute a chi-squared test. More flexible approaches have been studied in the statistical literature, as the binning required by the chi-squared method is unsuitable for sparse data in several dimensions. Practical questions include the power of these methods to discriminate against specific deviations from the tested model. This topic, too, has attracted several talks and much discussion at previous PHYSTAT Conferences. Multidimensional measures of goodness of fit are of interest not only at the final stage of analysis, in accepting or rejecting a theoretical model, but also at earlier stages, such as testing the accuracy of the complex multidimensional models used to simulate the performance of the apparatus, for inputs such as those used to train the multidimensional methods mentioned above.
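The binned chi-squared test mentioned above can be sketched as follows. The observed counts, the uniform model, and the choice of nine bins are hypothetical. The p-value uses the closed form of the chi-squared tail probability that holds only for an even number of degrees of freedom, which is a simplification adopted here to avoid external libraries.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson chi-squared statistic for binned counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even(x, dof):
    """Upper tail probability of a chi-squared variable with an even number of
    degrees of freedom: exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive, even dof")
    half = 0.5 * x
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical counts in 9 equal-width bins, tested against a uniform model
# whose normalisation is fixed to the observed total (so dof = 9 - 1 = 8).
observed = [12, 9, 11, 8, 13, 10, 7, 11, 9]
total_events = sum(observed)
expected = [total_events / len(observed)] * len(observed)

chi2 = pearson_chi2(observed, expected)
dof = len(observed) - 1
p_value = chi2_sf_even(chi2, dof)
print("chi2 =", chi2, "dof =", dof, "p-value =", round(p_value, 4))
```

The chi-squared approximation for this statistic relies on the expected count in each bin not being too small, which is why binning becomes impractical for sparse data in several dimensions, where most bins would be nearly empty.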

We will also keep open the possibility of devoting some of the Workshop to any burning issue that may arise at the PHYSTAT05 Conference in September 2005.

Details of the previous Conferences can be found in:
CERN: http://cern.web.cern.ch/CERN/Divisions/EP/Events/CLW and CERN Yellow Report 2000-005
Fermilab: http://conferences.fnal.gov/cl2k/
Durham: http://www.ippp.dur.ac.uk/Workshops/02/statistics/ and Durham report IPPP/02/39
SLAC: http://www-conf.slac.stanford.edu/phystat2003/ and SLAC-R-703 eConf:C030908