Some general comments: Radford Neal

-- What is the problem?
-- What data is available?
-- How can the data answer the questions?
-- Monte Carlo simulations
-- How do we do inference with this?
-- What form does the result take? (a plot of the likelihood function)

---------------
Important Issues
---------------

-- PID variable creation
-- Robustness: how can we handle flaws in the models? How can we detect flaws?
-- How to run the MC simulation?

-----------------
Some more details
-----------------

The (a?) standard (?) setup:

    P(event | f) = f P(event | signal) + (1-f) P(event | background)    (1)

(two distributions over the PID variables; we want to know what f is and
whether it is nonzero)

Likelihood function:

    L(f) = \prod_i P(event_i | f)

Applying Bayes' rule to each term of (1):

    P(event | f) = f P(signal | event) P(event) / P(signal)
                   + (1-f) P(background | event) P(event) / P(background)

The denominators P(signal) and P(background) come from the Monte Carlo
simulation; note that P(signal) is not equal to f. There is a marginal
distribution of events (signal and background) from the Monte Carlo. Factor
out the common P(event) and ignore it; we now have a form of the problem
that lets us use the classifier we have trained. So the original likelihood
has been converted to something that depends only on the properties of the
classifier, not on extraneous parameters. This explains why multivariate
classifiers seem to be the right thing to do.

Now, can we multiply these terms together to form a likelihood for N events?
Is this wise? Possibly not, because our trained classifier is not perfect. A
robustness problem may enter at this point, and the result may also be less
robust to problems in the original formulation. This motivates a kind of
thresholding, which leads to the simplified version of the problem: Poisson
background and Poisson signal events.
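The classifier-form likelihood above can be evaluated directly once the classifier outputs are in hand. A minimal numerical sketch, where the per-event probabilities p_i = P(signal | event_i) and the Monte Carlo signal fraction are invented values for illustration (not from any real analysis):

```python
import math

# Hypothetical classifier outputs p_i = P(signal | event_i) for five events,
# and the signal fraction of the Monte Carlo training mix (note: not f).
p_events = [0.9, 0.2, 0.7, 0.1, 0.6]   # made-up values
pi_mc = 0.5                            # MC marginal P(signal)

def log_likelihood(f):
    # log L(f) = sum_i log[ f*p_i/P(signal) + (1-f)*(1-p_i)/P(background) ];
    # the common factor P(event_i) is dropped since it does not depend on f.
    return sum(math.log(f * p_i / pi_mc + (1 - f) * (1 - p_i) / (1 - pi_mc))
               for p_i in p_events)

# Scan a grid of f values -- the "plot of the likelihood function" -- and
# take the maximizer as the estimate of f.
grid = [i / 100 for i in range(1, 100)]
f_hat = max(grid, key=log_likelihood)
```

In practice one would plot `log_likelihood` over the grid rather than just take the maximum, since the whole curve (and its value at f = 0) is what answers "is f nonzero?".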
Note that this is not a statistical classification problem; it is closer to a
computational problem of finding good approximations to the probability of
signal given an event. We can get arbitrarily close to this with the Monte
Carlo. (Comment: not true, you cannot generate unlimited sample sizes; many
other comments followed this.) Still, statistically motivated classifiers may
be very useful here; boosting is an example. You should not assume that the
classifiers must use a very limited set of PID variables; with some effort,
several hundred can be accommodated. And finally, how important is it that
the classifier be quick at classifying real events? Answer: it depends on
what it is used for: for PID, very fast; for desktop analysis, it does not
matter.
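One way to see the "computation problem" framing: estimate P(signal | event) directly from Monte Carlo samples. A minimal sketch using a single invented Gaussian PID variable and a simple histogram ratio as a stand-in for a real trained classifier (the distributions, sample sizes, and binning are all assumptions for illustration):

```python
import random

random.seed(0)

# Hypothetical 1-D PID variable: signal peaks high, background peaks low.
signal_mc = [random.gauss(1.0, 0.5) for _ in range(50000)]
background_mc = [random.gauss(-1.0, 0.5) for _ in range(50000)]

# With equal MC priors, P(signal | x) in a bin is approximately
# n_sig(bin) / (n_sig(bin) + n_bkg(bin)).
LO, HI, NBINS = -3.0, 3.0, 60

def bin_index(x):
    # Clamp out-of-range values into the edge bins.
    return min(NBINS - 1, max(0, int((x - LO) / (HI - LO) * NBINS)))

n_sig = [0] * NBINS
n_bkg = [0] * NBINS
for x in signal_mc:
    n_sig[bin_index(x)] += 1
for x in background_mc:
    n_bkg[bin_index(x)] += 1

def p_signal_given_x(x):
    i = bin_index(x)
    tot = n_sig[i] + n_bkg[i]
    return n_sig[i] / tot if tot else 0.5   # fall back where the MC is empty
```

The "fall back where the MC is empty" branch is exactly where the finite-sample objection in the comment above bites: no amount of binning cleverness recovers P(signal | x) in regions the Monte Carlo never populates.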