Distributed Data for Dynamics and Manifolds (17w5070)

Arriving in Oaxaca, Mexico Sunday, September 3 and departing Friday September 8, 2017

Organizers

(Simon Fraser University)

(Cornell University)

(McGill University)

(Politecnico di Milano, Italy)

Fang Yao (University of Toronto)

Objectives

Modern scientific instruments provide unprecedented data on a wide range of phenomena, from tracking cell motion, and even currents across cell membranes, to remote sensing measurements of atmospheric processes. The complex structure and high dimensionality of these data make traditional statistical methods ineffective and sometimes even inappropriate. These systems, and many more (electrical potentials on the surface of organs, brain activity measurements, epidemics, and animal movement, to name a few), have in common that they yield data distributed over time on geometrically complex domains (the earth's surface, cell or organ surfaces, animal ranges), and models for these data must account for spatial dependence over these domains as well as describe the temporal evolution of the systems.

In applied mathematics, models for such systems generally take the form of partial differential equations (PDEs) and their stochastic versions. While this field has a long history, solving nonlinear systems, particularly on complex domains, is difficult and generally relies on computationally intensive numerical tools, with finite element methods playing a particularly important role. However, there has been very little work on combining these models with modern data and statistical methods, beyond the data assimilation approaches used in weather prediction and related applications. In particular, our research program will focus on the broader statistical problems within these models: quantifying the uncertainty of estimates and assessing and improving model fit in light of data.

To develop new tools for parameter estimation and statistical inference for complex dynamical models on non-planar domains, we will bring together experts from several fields. Our goal is the development of computationally practical, easily accessible (both in software and in understanding), mathematically sound methods for large data sets. We have deliberately added unorthodox elements to the workshop structure with this aim in mind.

The methods needed to achieve these goals vary with the type of dynamical system (ODEs, PDEs, stochastic models, etc.) and with the task at hand; direct approaches can be computationally challenging and can give poor results when the system is not specified exactly correctly. A particular set of tools derived from functional data analysis (FDA) has emerged as especially promising for flexible and practical analysis of these systems. FDA is appropriate when data can be viewed as arising from functions of continuous variables, as encountered in problems with temporal evolution and/or spatial variation. While most research in FDA has focused on data distributed over simple geometric domains, such as intervals or rectangles, the complex data schemes and model systems targeted here will drive the development of new methods that accommodate, for example, spatial dependence over nontrivial manifold structures.
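As a minimal illustration of the FDA viewpoint (a generic sketch in R, not tied to any of the workshop data sets), discrete noisy observations of a single curve can be converted into a smooth functional datum by a roughness-penalized fit:

    # Noisy discrete samples of one underlying curve
    t_obs <- seq(0, 1, length.out = 101)
    y_obs <- sin(2 * pi * t_obs) + rnorm(length(t_obs), sd = 0.1)

    # Roughness-penalized spline smooth; the penalty weight is chosen by
    # generalized cross-validation, trading fit to the data against smoothness
    fit   <- smooth.spline(t_obs, y_obs)
    x_hat <- function(t) predict(fit, x = t)$y   # the estimated function x(t)

Subsequent FDA analyses then operate on the estimated functions x(t) rather than on the raw discrete observations.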

The publication of Ramsay, Hooker, Campbell and Cao (2007) introduced a framework for parameter estimation in dynamical systems in which statistical notions of smoothing are combined with differential equation models: an estimated trajectory is defined that trades off fit to the observed data against agreement with a proposed differential equation model. This approach can then be used to estimate parameters and assess fit, with the added benefit of avoiding the need to solve an ODE explicitly. A book arising from this work and related methods is well advanced, as are a number of published papers.
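In schematic form, for observations y_i of a state x at times t_i and a candidate model dx/dt = f(x, t; \theta), the estimated trajectory is defined by a penalized criterion of the form

    \hat{x}(\cdot \mid \theta, \lambda) = \arg\min_{x} \sum_{i} \big\{ y_i - x(t_i) \big\}^2
        + \lambda \int \Big\{ \frac{dx}{dt}(t) - f\big(x(t), t; \theta\big) \Big\}^2 \, dt,

with the parameters \theta then chosen to minimize the squared error of the resulting trajectory \hat{x}(\cdot \mid \theta, \lambda); the weight \lambda controls the trade-off between fidelity to the data and fidelity to the proposed equation.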

While these methods are well developed for ODE models, their extension to PDEs and complex domains remains challenging. Some work has been done on using finite element methods together with a smoothing penalty for nonparametric function estimation (Ramsay, 2002), with extensions to complex spatial regions (Sangalli, Ramsay and Ramsay, 2013); such methods also have a long history in inverse problems (Cotter, Dashti and Stuart, 2010). These methods are not well known to statisticians and will be developed in a two-day tutorial to be held prior to the workshop; see below.
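In the simplest spatial version of this idea, a surface f over a domain \Omega is estimated from noisy observations z_i at locations p_i by minimizing

    \sum_{i=1}^{n} \big\{ z_i - f(p_i) \big\}^2 + \lambda \int_{\Omega} \big( \Delta f(p) \big)^2 \, dp,

with f expanded on a finite element basis built over a triangulation of \Omega; when \Omega is a curved surface rather than a planar region, the Laplacian \Delta is replaced by the Laplace-Beltrami operator.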

In addition to smoothing methods and functional data analysis, simulation-based methods for the statistical problem of calibrating dynamic models have seen significant development. Methods based on particle filtering, approximate Bayesian computation, numerical solution of dynamical systems, and approximation methods for stochastic systems have all been explored.
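As one illustration of the simulation-based approach, a minimal approximate Bayesian computation (ABC) rejection sampler can be sketched in R as below; rprior, sim_model and summ stand in for a user-supplied prior sampler, forward simulator of the dynamic model, and summary statistic, and are purely illustrative.

    # Minimal ABC rejection sketch: keep prior draws whose simulated
    # summaries fall within a tolerance eps of the observed summaries.
    abc_reject <- function(y_obs, n_draws, eps, rprior, sim_model, summ = identity) {
      s_obs <- summ(y_obs)
      kept  <- list()
      for (i in seq_len(n_draws)) {
        theta <- rprior()                    # candidate parameters from the prior
        y_sim <- sim_model(theta)            # forward-simulate the dynamic model
        if (sqrt(sum((summ(y_sim) - s_obs)^2)) < eps)
          kept[[length(kept) + 1]] <- theta  # accept draws close to the data
      }
      do.call(rbind, kept)                   # accepted draws approximate the posterior
    }

The accepted parameter draws form an approximate posterior sample whose quality is governed by the tolerance eps and the informativeness of the chosen summaries.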

This workshop will explore these tools along with others from data assimilation, filtering, and inverse problems. We will invite a rich cross-section of the communities involved in these endeavors with the intention of promoting new intersections between their methods. The long-term objective of our research program is to bridge the gap between models of complex systems and real data by developing sound statistical methods, equipped with user-friendly software, that provide accurate and robust parameter estimates for differential equation models from real data.

Specifically, we propose two phases to the BIRS workshop:

1. An open two-day tutorial workshop available to anyone. We propose beginning the program with a tutorial workshop on finite element methods, both for solving PDEs and for smoothing. This will ideally take place on the weekend immediately prior to the BIRS five-day workshop and will be primarily intended for a statistical audience. We would like to make use of a BIRS two-day workshop to hold this event; if BIRS is not in a position to support the tutorial, we will turn to the University of Calgary to host it, with participants filling the gap between other funding and what is required. In this tutorial workshop, the aspects of dynamical systems and of spatial and spatio-temporal modelling needed for statistical research will be presented in lectures by the workshop organizers and invited presenters, and sample analyses and exercises using Matlab and R will be provided.

2. Following this, a five-day workshop will be organized around themes, with time devoted to new methods from PDEs, stochastic models of systems over shapes, statistics on manifolds, extensions of functional data analysis, and methods for dynamic systems. Three or four sample data sets from real applications will be provided in advance; preliminary analyses will be presented in the first day or two, and break-out groups will then take these up for further exploration. Throughout, the workshop will be supported by input from participants selected from relevant research and application areas. We will challenge these groups to give an informal presentation of their progress on the final day.

A keynote lecture is scheduled for each day. The keynote speaker in each session will be given a set of challenge problems and encouraged to present methods for approaching them. Throughout the workshop, groups of presentations will be followed by a general discussion devoted to the challenge problems, with the aim of combining and applying the newly presented ideas. We hope that this will foster an organic process of cross-collaboration, facilitated by substantial breaks that allow smaller groups to informally develop and implement these ideas. The final day and a half will map out resources for further communication and dissemination of research, and the problems requiring urgent attention from the research community.

We have prepared several interesting applications, with open problems and real data, for the workshop participants to work on in groups. Two examples follow:

1. FDA on Manifolds. In anesthesia depth analysis, the observed physiological time series, including the electroencephalogram and electrocardiogram, are functional data in nature. While the underlying differential equation governing brain activity is complicated and inaccessible to data analysts, by modelling the state space of the differential equation as a manifold in a metric space we can combine FDA techniques with manifold learning techniques to study how anesthetic agents influence brain activity. For instance, the diffusion process generated by the Laplace-Beltrami operator allows one to quantify the dynamics (see the first sketch following these examples). While this combination has led to a successful preliminary analysis, more statistical input is needed to better quantify the system's properties for inference. In addition to the anesthesia depth problem, the intricate interplay between a point process model of the heart rate and the morphological characteristics of the electrocardiogram signal could also benefit from FDA approaches. To better understand the pathological nature of atrial fibrillation and to predict its prognosis, it is essential to understand atrial activity. This requires separating the atrial and ventricular activity recorded in the same electrocardiogram. Here we may hope to use the differing expected properties of these signals to extract the smooth ventricular activity from the higher-energy atrial signal.

2. Estimating SDE parameters from real data. The Cognitive Science Lab at SFU conducts experiments in which human subjects are asked to categorize a series of images based on criteria that are not initially specified to the subject. At the beginning of a trial, a subject must simply guess the correct category, but is given feedback on performance. Within an hour, most subjects learn the correct categorization rule. The data collected include the subjects' guesses, the timing of the guesses, and eye-tracking data, i.e., where on a video screen the subject's gaze is fixated at every point in time. Prof. Paul Tupper has constructed a complicated stochastic differential equation (SDE) model that reproduces many features of this data set while trying to maintain neuropsychological plausibility. This SDE model has tens of parameters, yielding new challenges in high-dimensional modelling with dynamic data (see the second sketch below).
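For the manifold-learning component of the first example, a standard discrete approximation of Laplace-Beltrami diffusion is the diffusion map: a Gaussian affinity kernel on the observed signals is normalized into a Markov operator whose leading non-trivial eigenvectors give low-dimensional coordinates on the underlying manifold. The R sketch below is generic and illustrative, not the analysis used in the preliminary work.

    # Diffusion-map sketch: rows of X are observed signal features
    # (e.g. short-window summaries of EEG recordings).
    diffusion_map <- function(X, eps, n_coords = 2) {
      D2 <- as.matrix(dist(X))^2        # squared pairwise distances
      K  <- exp(-D2 / eps)              # Gaussian affinity kernel
      P  <- K / rowSums(K)              # row-normalized diffusion operator
      ev <- eigen(P)                    # spectral decomposition
      # drop the trivial constant eigenvector; keep the next n_coords
      Re(ev$vectors[, 2:(n_coords + 1)])
    }

For the second example, simulation-based calibration of an SDE with tens of parameters rests on being able to forward-simulate the model cheaply. A generic Euler-Maruyama scheme, with a simple Ornstein-Uhlenbeck process standing in for the lab's far more elaborate model, is sketched below.

    # Euler-Maruyama simulation of dX = drift(X, theta) dt + diffusion(X, theta) dW
    euler_maruyama <- function(drift, diffusion, x0, t_end, dt, theta) {
      n <- ceiling(t_end / dt)
      X <- matrix(NA_real_, nrow = n + 1, ncol = length(x0))
      X[1, ] <- x0
      for (k in seq_len(n)) {
        dW <- rnorm(length(x0), sd = sqrt(dt))   # Brownian increments
        X[k + 1, ] <- X[k, ] + drift(X[k, ], theta) * dt +
          diffusion(X[k, ], theta) * dW
      }
      X
    }

    # Example: a two-dimensional Ornstein-Uhlenbeck process as a stand-in model
    path <- euler_maruyama(drift     = function(x, th) -th$rate * x,
                           diffusion = function(x, th) th$sigma,
                           x0 = c(1, -1), t_end = 10, dt = 0.01,
                           theta = list(rate = 0.5, sigma = 0.2))

Repeated forward simulations of this kind feed directly into the particle filtering and ABC methods described above.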