Statistical issues relevant to significance of discovery claims (10w5068)

Arriving in Banff, Alberta on Sunday, July 11 and departing Friday, July 16, 2010

Organizers

(Michigan State University)

(Simon Fraser University)

Louis Lyons (University of Oxford)

Objectives

The outcome of the 2006 Workshop was so encouraging that we have proposed another BIRS workshop for 2010. By that time new
facilities in Particle Physics and Astrophysics (e.g. the Large Hadron
Collider and the GLAST telescope for gamma rays) should be
producing a large amount of data. There is a strong hope that these will
result in exciting new discoveries. There are interesting statistical issues relating to discovery claims, and it is important to be able to give reliable, widely accepted statistical assessments of the evidence that the result is not due just to a statistical fluctuation.

A potentially disturbing example arises from an experimental Particle Physics collaboration that analyzed its data in 2003 and found that, at greater than a 5 sigma level, the data were inconsistent with the null hypothesis and instead gave evidence for a new type of particle, the penta-quark. However, a subsequent calculation of the Bayes factor comparing the null hypothesis with the alternative of a new particle was said to mildly favor the null hypothesis. This apparent sensitivity of an important conclusion to the statistical technique employed is worrying, and needs to be understood. The conflicting papers from the same authors analyzing the same data can be seen at:
http://arxiv.org/abs/hep-ex/0307018 and http://arxiv.org/abs/0709.3154

It would be extremely valuable to have in-depth discussions between
scientists and statisticians concerning the issues involved. Some of these
are:

1) Why Particle Physicists like 5 sigma as a discovery criterion; for
Statisticians, requiring a 5 standard error deviation from the null, which
corresponds to a one-sided significance level of about 3 in 10 million,
is extraordinarily stringent.

2) Allowing for multiple tests; research groups carry out many tests
on the same data.

3) Blind analysis techniques; classical frequency theory analysis
relies on probabilities computed before the data are collected or
analyzed. These probabilities are no longer relevant if the analysis technique is adjusted after seeing the data. In clinical trials such adjustment is traditionally prevented by pre-specifying a protocol for data analysis, but the proposal here, rare in statisticians' experience, is to build randomness into the fitting software so that the fitted values of parameters are hidden from experimenters as models are tuned.


4) Goodness of fit tests for comparing sparse multi-dimensional data with theory.

5) Comparison of different techniques for comparing 2 hypotheses, for example:

i) p-values (including methods for combining p-values for different tests);
ii) The so-called "$CL_s$" (ratio of p-values for null hypothesis and alternative), an approach to setting upper confidence limits which is little known in the statistical community;
iii) Likelihood ratio tests, even when null and alternative hypotheses are composite;
iv) Difference in chi-squared of 2 separate fits to the same data;
v) Model selection techniques such as AIC or BIC;
vi) Bayesian techniques such as posterior odds or Bayes factors (including the issue of choice of prior).

6) Adjusting for nuisance parameters in p-value and likelihood calculations.

7) Definitions of sensitivity of searches for new phenomena.
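
Issues 1) and 2) above can be made concrete with a small numerical sketch. The Python snippet below is an illustration only, not a method taken from the workshop or from any collaboration: it converts a significance quoted in sigma into a normal tail probability, and then shows how that probability grows when many tests are carried out. The function names are hypothetical, and the multiple-testing correction assumes that the tests are independent, which is rarely exactly true for tests on the same data.

```python
import math
from statistics import NormalDist


def sigma_to_p_value(n_sigma: float, two_sided: bool = False) -> float:
    """Tail probability of a standard normal beyond n_sigma."""
    one_sided = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided


def p_value_to_sigma(p_value: float) -> float:
    """One-sided significance, in sigma, corresponding to a p-value."""
    return -NormalDist().inv_cdf(p_value)


def global_p_value(local_p: float, n_tests: int) -> float:
    """Chance that at least one of n_tests tests gives a p-value as small
    as local_p when the null hypothesis holds in every test.

    ASSUMPTION: the tests are independent.
    """
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_tests * math.log1p(-local_p))


if __name__ == "__main__":
    local_p = sigma_to_p_value(5.0)
    print(f"5 sigma, one-sided p-value: {local_p:.3e}")
    for n_tests in (1, 100, 10_000):
        p_glob = global_p_value(local_p, n_tests)
        print(f"{n_tests:>6} tests: global p = {p_glob:.3e}, "
              f"about {p_value_to_sigma(p_glob):.2f} sigma")
```

Under these assumptions a 5 sigma local effect corresponds to a one-sided tail probability of about 2.9 in 10 million, and 100 independent tests would reduce its global significance to roughly 4 sigma. This is one way to see why a threshold that looks extraordinarily stringent for a single test may be less so once multiple testing is allowed for.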


The goal is to invite about 25 physicists and astronomers, and 15
statisticians with expertise in theoretical statistics, to develop solutions for these problems. While the specific problems have developed
from particular types of experiments or data collection efforts, they have
common features that are amenable to statistical analyses; the solutions
developed will thus be more broadly useful in physics and astronomy.
We want to bring the latest methods to the attention of the scientific
community, and to develop statistical theory further by considering
special aspects that arise in these scientific contexts.

The topics considered will also cross-fertilize the statistical community by
providing other scientific contexts in which to evaluate the statistical
ideas arising, for instance, from bioinformatics, from remote sensing, and
from climate modeling. In these areas and others the issues of multiple
testing, model assessment and validation (including goodness-of-fit with
very sparse data), appropriate use of Bayes factors, choice of prior
(including sensitivity analysis for this choice), and appropriate elimination
of nuisance parameters have stimulated a great deal of statistical
research. Evaluation of these ideas in the particle and astrophysics
contexts should have multi-way benefits: better analysis of physics
and astronomy data; better understanding by statisticians of their data
analysis suggestions in practical contexts; and perhaps new data
analysis ideas for application back to bioinformatics, climate research and
so on. At the same time, the issues surrounding comparison of hypothesis
testing techniques will cause statisticians to reflect on the classical
controversies that are at the foundations of their discipline, but this time they will be informed by experiments with solid data.

We will also keep open the possibility of devoting some of the Workshop
to any burning issues that may arise from measurements obtained with
the new detectors.

By assembling Physicists and Statisticians with direct interests in
discovery questions, we consider that this Workshop would be extremely
useful in clarifying the statistical issues involved. With high-profile results becoming available, the Workshop will be both timely and important.