Statistics and Nonlinear Dynamics in Biology and Medicine (14w5079)

Arriving in Banff, Alberta Sunday, July 27 and departing Friday August 1, 2014

Organizers

(Simon Fraser University)

(McMaster University)

(Cornell University)

(University of Michigan)

(Newcastle University)

Objectives

The past decade has seen considerable growth in interest in the problem of combining nonlinear dynamic models with data. This has been sparked both by an increase in computational power -- allowing hitherto infeasible methods to be employed -- and by an enormous increase in data from public health and environmental monitoring, along with new experimental protocols and medical apparatuses. These all pose great opportunities and challenges for Statistics and Applied Mathematics as well as for substantive fields within Biology and Medicine. These challenges range from purely computational problems in simulating or filtering finite-population processes to theoretical and conceptual problems in distinguishing good models, developing methods of experimental design and formulating properties such as robustness within this context. This workshop will bring together prominent and promising researchers in the fields of Statistics, Epidemiology, Applied Mathematics, Ecology and Systems Biology to consolidate recent advances in methods for inference in dynamical systems, to foster closer collaboration between these disciplines and to set priorities for future research directions.

The principal distinguishing feature of these models is that they are mechanistic; that is, they derive an idealized representation of system dynamics from first principles rather than modeling the phenomenological behavior of the system from data alone. This means that while the interpretation can differ, the models used to describe dynamics are very similar across the areas of Ecology, Immunology, Epidemiology and Systems Biology. Nevertheless, each area also has unique characteristics in terms of the dimensionality of its systems, the level of stochastic volatility, the precision with which measurements can be taken and the concerns of the relevant discipline.
Thus, this combination of fields is intended to provide a useful test-bed for the applicability of new statistical methods as well as benefiting from their development.

Despite a considerable, though recent, body of literature that has developed around the problems addressed in this workshop, there remain substantial unsolved problems. These include computational challenges in optimizing over difficult or stochastic likelihood surfaces, simulating complex stochastic processes and developing efficient Monte Carlo methods for inference. There similarly remain considerable theoretical challenges, including: understanding the relationship between system behavior (i.e. bifurcations and their stochastic counterparts) and the statistical properties of estimates based on data from these systems; problems in model selection with high-dimensional state spaces, in particular reconstructing the structure of network models; the development of approximate simulation schemes for complex models; model comparison; notions of robustness to model misspecification; and experimental design. All of these represent substantial challenges that will be addressed through the week.

We have already contacted a number of prominent researchers who have contributed to the literature in dynamical systems and statistical inference and have listed those who have expressed interest in attending below. We will, additionally, actively invite new researchers and post-docs working in the field or interested in joining it.

The workshop will be structured around four mathematical and statistical areas: deterministic modeling, stochastic modeling, simulation-based inference and a day devoted to broader statistical issues. A final day will be reserved for round-table discussions of important future problems in the field.
The intention is to use statistical and methodological issues as a means to bring out commonalities in problems across biological disciplines and facilitate further cross-disciplinary collaboration. In particular, we will include talks from several substantive disciplines each day. We will also begin the meeting with a short introductory session allowing each participant a few minutes to describe their own research interests as a means of facilitating informal discussion and collaboration.

DAY 1: Deterministic Models and Differential Equations

Deterministic models, particularly ordinary differential equations, have been the mainstay of applied mathematics modeling since Newton and there is thus a wealth of models and methods associated with them. They remain highly important due to the ease with which it is possible to analyze the qualitative behavior of their dynamics and to predict the qualitative effect of interventions in systems through the use of bifurcation methods. However, they remain challenging to employ within statistical inference for two reasons:

 - The structure of the dynamics means that objective surfaces for parameter estimation tend to be complex, multi-modal objects that are difficult to optimize.
 - The simplifying assumptions required to make equations tractable, combined with strictly deterministic evolution, tend to result in poor agreement with empirical data even when the parameters are chosen optimally.

There has thus been considerable attention paid to global optimization methods as well as to fitting qualitative aspects (i.e. summaries) of the data. Similarly, the approach of estimating the trajectory of a system through nonparametric methods and then choosing parameters to maximize the agreement of the derivative of the estimated trajectory with the value predicted from the equation has received considerable recent statistical attention.
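
As a rough illustration of the derivative-matching idea just described, the following minimal sketch smooths noisy observations, differentiates the smooth numerically, and picks the parameter that best matches the derivative to the right-hand side of the equation. Everything in it is an assumption made for illustration and does not come from the workshop material: the toy decay model dx/dt = -theta * x, the moving-average smoother, the simulated data, and the names smooth and gradient_match_decay. Methods in the literature typically use splines or other smoothers in place of the moving average.

```python
import math
import random


def smooth(values, half_width):
    """Centred moving average, returned only where the full window fits."""
    width = 2 * half_width + 1
    return [
        sum(values[i - half_width : i + half_width + 1]) / width
        for i in range(half_width, len(values) - half_width)
    ]


def gradient_match_decay(observations, dt, half_width=2):
    """Estimate theta in the assumed toy model dx/dt = -theta * x."""
    # Step 1: nonparametric estimate of the trajectory.
    x_hat = smooth(observations, half_width)
    # Step 2: derivative of the estimated trajectory (central differences).
    dx_hat = [
        (x_hat[i + 1] - x_hat[i - 1]) / (2.0 * dt)
        for i in range(1, len(x_hat) - 1)
    ]
    x_mid = x_hat[1:-1]
    # Step 3: choose theta to minimise sum (dx_hat + theta * x_mid)^2.
    # The model is linear in theta, so least squares has a closed form.
    return -sum(d * x for d, x in zip(dx_hat, x_mid)) / sum(x * x for x in x_mid)


# Hypothetical data: equally spaced noisy observations of 2 * exp(-0.5 * t).
rng = random.Random(0)
dt = 0.1
true_theta = 0.5
times = [i * dt for i in range(101)]
observations = [
    2.0 * math.exp(-true_theta * t) + rng.gauss(0.0, 0.02) for t in times
]

theta_hat = gradient_match_decay(observations, dt)
print(f"estimated theta = {theta_hat:.3f} (simulated with theta = {true_theta})")
```

Note that the equation is never integrated, which is what lets this style of estimator sidestep the multi-modal objective surfaces mentioned above; the price is a dependence on the quality of the smoother.
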
Both of these approaches can be viewed as ways to provide inference that is robust to model misspecification, including to the assumption of determinism, although these notions of robustness have yet to be formalized.

Very recent work has also seen increasing interest in model selection within ordinary differential equations. A particular problem here is in inferring the structure of network models in systems biology. Here, model selection is achieved through a sparsity penalty such as the LASSO or with model selection priors in Bayesian analysis. These ideas also result in substantial computational and theoretical challenges: in integrating high-dimensional ordinary differential equations, in the theoretical properties of estimates for such problems and, in particular, in the statistical behavior of sparsity penalties under these conditions. Substantial time will be devoted to these problems as well.

DAY 2: Stochastic Models

Alongside the deterministic systems studied in Day 1, stochastic models have attracted increasing interest in applied fields as well as in mathematics and statistics. This is in part because they retain flexibility to fit observed data directly. However, since direct observations of all the states of a dynamical system are rarely available, under this model the evolution of the state becomes a latent stochastic process. Typically, statistical inference is conducted under the framework of Partially Observed Markov Processes (POMPs). This framework facilitates estimation in these processes both through analytic expressions for the likelihood of some models and through methods based on simulating the process, which avoid calculating transition probabilities that are often analytically intractable.

These models share many of the same open problems as the deterministic models above, although frequently with greater computational or analytic challenges.
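
To make the simulation-based route to a POMP likelihood concrete, here is a minimal sketch of a bootstrap particle filter, one common sequential Monte Carlo scheme. It needs only a function that simulates one step of the latent process and a density for the observation given the state. The linear Gaussian toy model, the parameter values, and the names bootstrap_loglik, normal_logpdf and loglik_for are all assumptions made for illustration; they are not taken from the workshop material or from any particular software package.

```python
import math
import random


def normal_logpdf(y, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def bootstrap_loglik(observations, simulate_step, obs_logpdf, init_particles, rng):
    """Particle estimate of the log-likelihood of a partially observed Markov process."""
    particles = list(init_particles)
    n = len(particles)
    loglik = 0.0
    for y in observations:
        # Propagate each particle by simulating the latent process one step;
        # no transition density is ever evaluated.
        particles = [simulate_step(x, rng) for x in particles]
        # Weight each particle by the density of the observation given its state.
        logw = [obs_logpdf(y, x) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]  # subtract the max for stability
        # The mean weight estimates the conditional likelihood of y.
        loglik += m + math.log(sum(w) / n)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=w, k=n)
    return loglik


# Hypothetical toy model: x_t = a * x_{t-1} + process noise, y_t = x_t + noise.
rng = random.Random(1)
a_true, sd_proc, sd_obs = 0.8, 0.5, 0.5
x, ys = 0.0, []
for _ in range(50):
    x = a_true * x + rng.gauss(0.0, sd_proc)
    ys.append(x + rng.gauss(0.0, sd_obs))


def loglik_for(a, n_particles=500):
    """Estimated log-likelihood of the simulated data for a candidate value of a."""
    step = lambda state, r: a * state + r.gauss(0.0, sd_proc)
    obs = lambda y, state: normal_logpdf(y, state, sd_obs)
    init = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    return bootstrap_loglik(ys, step, obs, init, rng)


ll_true = loglik_for(0.8)
ll_wrong = loglik_for(-0.8)
print(f"log-likelihood at a=0.8: {ll_true:.1f}, at a=-0.8: {ll_wrong:.1f}")
```

The estimate is itself random, so optimizing or sampling over it inherits the stochastic likelihood surface difficulties noted earlier; increasing the number of particles reduces, but does not remove, that noise.
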
In addition, important challenges include:

 - Data-driven choices of stochastic models: what components of the system's evolution should be made random, and how?
 - Models for dynamics involving states of changing dimension. These occur, for example, in influenza, in which the rapid evolution of the virus leads to multiple competing strains.
 - Methods for understanding, and drawing inferences about, the qualitative behavior of stochastic systems.
 - The incorporation of spatial information and variation into stochastic models.
 - Incorporating random effects from multiple observed realizations of the process.

DAY 3: Broader Statistical Issues

This day will be devoted to broader statistical issues in modeling dynamical systems. In particular, we will consider the following topics:

 - Experimental design. For systems that can be controlled in a laboratory, how should control parameters be set? What observation regimes should be employed? If repeated experiments can be run, how should they be allocated? These are problems in both deterministic and stochastic frameworks.
 - Goodness of fit and model comparisons. How do we compare competing and potentially non-nested models? How do we test for goodness of fit, and how do we allocate poor fit between models for the evolution of the state variables and models of the observation process?
 - Nonparametric terms within dynamical systems models. These have seen some interest; what is their role and what statistical properties do they have?

Because Wednesdays are traditionally shorter at BIRS, these relatively unexplored topics are intended to stimulate discussion during the afternoon outing.

DAY 4: Simulation-based Methods

Many of the most successful methods for inference in POMPs have involved sequential Monte Carlo (SMC) methods, which require only the ability to simulate the process under consideration and, in particular, do not require the user to derive often intractable expressions for the conditional probability of a state given its value at a previous time point. These methods allow an approximation of the likelihood and have been used in both frequentist and Bayesian inference, for example in iterated filtering and in particle MCMC. Methods based on summaries, such as Approximate Bayesian Computation (ABC), also come under this framework.

Because of the important role that these methods have played in statistical methodology, we have devoted a day exclusively to issues involving them. Particular challenges here involve:

 - Fast approximate conditional simulation methods for computationally intensive processes. These include tau-leaping methods along with diffusion approximations and the linear noise approximation.
 - Efficient Markov chain Monte Carlo schemes for SMC-based methods in particle MCMC.
 - Sequential online methods for real-time monitoring, especially of epidemic processes.
 - The selection of summaries in ABC and efficient Bayesian inference.

DAY 5: Round-table Discussions

We will devote the final day to a series of round-table discussions. These will focus on the needs of substantive areas within Biology and Medicine in terms of methodological and theoretical development. Our intention is that these will assist collaboration between statisticians and those in applied fields, help to increase the penetration of statistical methods into Biology and Medicine, and focus future methodological and theoretical research.