Frontiers in Functional Data Analysis (15w5096)
Debashis Paul (University of California, Davis)
Surajit Ray (University of Glasgow)
David Ruppert (Cornell University)
The main objective of this workshop is to bring together practitioners of functional data analysis (FDA) who work with data having complex characteristics, such as spatial dependence, intrinsic geometric constraints, missingness and lack of coherence, and leading theoreticians in the field, in order to enhance scientific investigations that can benefit from a population-based analysis of such complex objects. The development of statistical theory for functional data with these characteristics has not kept pace with the deluge of data, and methodological developments have often failed to incorporate insights from experts in the source fields. This underlines a genuine need for a platform where people with varied skills and knowledge can exchange ideas and openly discuss problems on this new frontier of functional data analysis. This workshop is expected to serve this role and to act as a launching pad for successful scientific collaborations among statisticians, neuroscientists, image analysts and other researchers working on functional data problems.
Involvement of young researchers is essential for moving this rapidly evolving field forward. Among our potential participants, about one-fifth are young researchers at the beginning of a promising career and a quarter are female. The workshop will feature an overview of current computational and analytical techniques for dealing with functional data, as well as expository lectures on functional data with special geometric features. Several prominent researchers in this area from around the world will lead discussions and deliver lectures or tutorials. We hope that this workshop will give young researchers a valuable opportunity for training, learning and building new collaborations. The workshop may also help chart the research focus in functional data analysis for years to come.
DAY 1: Dependent functional data and their applications
DAY 2: Complex functional data objects
DAY 3: Registration and dynamics of functional data
DAY 4: Inferential problems in functional data analysis
DAY 5: Round table discussions and collaborative activities
We will devote the final day to a series of round table discussions, intended to foster collaboration between statisticians and researchers in applied fields. These discussions will focus on developing strategies for enhancing existing methods of data analysis, building new analytical frameworks for tackling FDA problems with complex data structures, and constructing computational platforms for implementing the methodologies and integrating them with the data sources.
A brief description of the main themes is given below.
Dependent functional data and their applications:
Historically, functional data analysis techniques have been widely used to analyze traditional time series data, albeit from a different perspective. Of late, FDA techniques are increasingly being used in domains such as environmental science, where the data are spatio-temporal in nature, and hence it is natural to treat such data as functional data whose functions are correlated in time or space. An example where modeling the dependencies is crucial is the analysis of remotely sensed data observed over a number of years across the surface of the earth, where each year forms a single functional data object. One might be interested in decomposing the overall variation across space and time and attributing it to covariates of interest. Another interesting class of data with dependence structure consists of weather data on several variables collected from balloons, where the domain of the functions is a vertical strip in the atmosphere and the data are spatially correlated. One of the challenges with such data is missingness; addressing it requires appropriate smoothing techniques for spatially dependent functional data. There are also interesting experimental design issues, as well as questions of data calibration to account for the variability in sensing instruments. In spite of research initiatives in analyzing dependent functional data, several problems remain unresolved, including:
- robust statistical models for incorporating temporal and spatial dependencies in functional data
- developing reliable prediction and interpolation techniques for dependent functional data
- developing an inferential framework for testing hypotheses about simplified dependence structures
- analyzing sparsely observed functional data by borrowing information from neighbors
- visualization of data summaries associated with dependent functional data
Complex functional data objects:
In recent years there has been a deluge of population-level data arising in domains like biomedical imaging and neuroscience, which require sophisticated analytical techniques. These types of data come from far-ranging fields: images of the internal structures of a body provided by diagnostic medical scanners (e.g., angiography, tomography, magnetic resonance imaging); images of static or moving objects recorded by computer vision devices (e.g., in automatic detection of objects from video recordings); simultaneous measurements of gene expression levels from next-generation sequencing techniques; and multi-spectral data from satellite remote sensing, such as overhead geodetic collections used for geological and hydrological studies, or emission spectra of various chemicals for the analysis of chemical concentrations in the atmosphere. The samples often correspond to different individuals and/or to longitudinal measurements from the same individual. The analysis of these data poses new and challenging problems for modern statistics and requires an ever stronger interplay of statistics with pure and applied mathematics, computer science and engineering. Increasingly, researchers are adopting a functional data analysis viewpoint for dealing with such data. This viewpoint is helpful in giving a statistical description of the variations in the shapes of internal organs, of the evolution and subject-to-subject variability in the patterns of structural and functional connectivity in the brain, or of the variations in the thickness of blood vessels. Such analyses are also useful in building predictive models for health disorders such as organ failure and neurodegenerative diseases.
In spite of recent progress in the statistical analysis of such complex object data, many outstanding challenges remain, both methodological and computational. These include:
- adopting an appropriate geometric framework or coordinate system for representing functional object data
- registration of objects across different time points (for longitudinal studies) and/or subjects
- obtaining informative summary statistics that respect the geometry of the object space
- fitting statistical models linking clinical findings and covariates with the objects
- inferential questions associated with statistical models, including data summary, model selection, etc.
Registration and dynamics of functional data:
Feature extraction from functional data has been one of the principal concerns of FDA. This problem has mostly been addressed within a dimension reduction framework, using principal component analysis, independent component analysis or canonical correlation analysis. These approaches are typically successful in settings where the functions are measured on the same domain and the main features are roughly aligned in terms of their time or spatial location of occurrence. However, the presence of intrinsic phase variation, which is common in many biological processes such as growth or disease progression, can reduce the effectiveness of these dimension reduction tools by introducing additional geometric features into the observed data. Approaches for incorporating such phase variation have historically focused on formulating parametric or nonparametric registration or curve alignment schemes. More recently, approaches based on dynamical systems with random effects have also been employed to address this problem; the latter have the advantage of providing a mechanistic description of the data. Both viewpoints have been successful in extracting features from functional data observed on a time domain. But for data types such as images of anatomical objects or data obtained by various neuroimaging techniques, the domain of the functions is 2-dimensional or 3-dimensional space, or some "artificial" coordinate system (like a medial line or plane for anatomical objects). Analyzing such data with appropriate schemes that account for phase variation poses new challenges that can be broadly categorized as follows:
- mathematical descriptions of registration schemes that are identifiable and computable
- description of functional data on spatial domains through differential equations with stochastic components
- incorporation of the geometry of the space in which the functions take values when formulating registration schemes or differential-equation-based models
- development of stable computational schemes for estimating model parameters
- development of nonparametric inferential schemes such as resampling procedures
- mathematical analysis of these schemes
Inferential problems in functional data analysis:
Owing to recent advances in computational methods for functional data and the availability of FDA software, functional data analysis is now widely used for modeling complex problems arising in a host of application areas. Nonetheless, the follow-up inference, especially for complex functional data objects, is often missing. As most classical inferential techniques are inappropriate for functional data, there is an immediate need for a robust inferential framework for functional data analysis techniques. Many researchers have made important contributions, but all of them have stressed the pressing need to develop inferential techniques for existing functional data analysis approaches, especially for data where the functions are observed temporally or spatially. Many more such needs are expected to arise as FDA is applied in other areas. Some specific challenges in this area include developing methods for the following:
- inference on effects of covariates in functional data regression
- inference on the number of clusters in functional clustering
- inference for functional time series, including the appropriateness of functional autoregressive models
- inference on the magnitude and structure of spatial correlation for dependent functional data
- formulation of, and inferential problems for, dependent data