# Schedule for: 21w5107 - Foundations of Objective Bayesian Methodology

Beginning on Sunday, November 28 and ending on Friday, December 3, 2021

All times in Oaxaca, Mexico time, CST (UTC-6).

Sunday, November 28 | |
---|---|

14:00 - 23:59 | Check-in begins (Front desk at your assigned hotel) |

19:30 - 22:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

20:30 - 21:30 | Informal gathering (Hotel Hacienda Los Laureles) |

Monday, November 29 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

08:45 - 09:00 | Introduction and Welcome (Conference Room San Felipe) |

09:00 - 09:45 |
Igor Pruenster: Nonparametric priors for partially exchangeable data: dependence structure and borrowing of information ↓ Partial exchangeability is the ideal probabilistic framework for analyzing data from different, though related, sources. The implications for the induced dependence structure and the borrowing of information across groups are explored. These findings inspire a new general class of nonparametric priors, termed multivariate species sampling models, which is characterized by its partially exchangeable partition probability function. This class encompasses several popular dependent nonparametric priors and has the merit of highlighting their core distributional properties. (Conference room) |

09:45 - 10:30 |
Beatrice Franzolini: Nonparametric priors with full-range borrowing of information ↓ When data are grouped into distinct samples, they typically are homogeneous within and heterogeneous across groups. In this case, the Bayesian paradigm requires a prior law over a collection of distributions. From a modelling point of view, it is essential to study how this structure reflects on the observables, especially in nonparametric models. We introduce the notion of hyper-ties and show that they play the same role as actual ties in the exchangeable setting, driving the dependence between observations. Using hyper-ties, we can compute the correlation between observables and show how its sign depends on the joint specification. Finally, we propose a novel class of dependent nonparametric priors, which may induce either positive or negative correlation across samples. (Zoom) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Marta Catalano: A Wasserstein index of dependence for Bayesian nonparametric modeling ↓ Optimal transport (OT) methods and Wasserstein distances are flourishing in many scientific fields as an effective means for comparing and connecting different random structures. In this talk we describe the first use of an OT distance between Lévy measures with infinite mass to solve a statistical problem. Complex phenomena often yield data from different but related sources, which are ideally suited to Bayesian modeling because of its inherent borrowing of information. In a nonparametric setting, this is regulated by the dependence between random measures: we derive a general Wasserstein index for a principled quantification of the dependence gaining insight into the models’ deep structure. It also allows for an informed prior elicitation and provides a fair ground for model comparison. Our analysis unravels many key properties of the OT distance between Lévy measures, whose interest goes beyond Bayesian statistics, spanning to the theory of partial differential equations and of Lévy processes. (Conference room) |

11:45 - 12:30 |
Isadora Antoniano-Villalobos: Bayesian mixture models for the prediction of extreme observations ↓ In many applications with interest in large or extreme observations, usual inferential methods may fail to reproduce the tail behaviour of the variables involved. Recent literature has proposed the use of multivariate extreme value theory to predict an unobserved component of a random vector given large observed values of the rest. This is achieved through the estimation of the angular measure controlling the dependence structure in the tail of the distribution. The idea can be extended and used for prediction of multiple components at adequately large levels, provided the model used for the angular measure is sufficiently flexible to capture complex dependence structures. The use of Bernstein polynomials ensures such flexibility, and their interpretation as mixture models allows the use of current trans-dimensional MCMC posterior simulation methods for inference. (Zoom) |
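A Bernstein polynomial density on [0, 1] is a mixture of Beta(j, k - j + 1) kernels with mixture weights w_1, ..., w_k, which is what makes mixture-model machinery applicable. A minimal numerical sketch (illustrative only; the function names and the uniform-weight example are not from the talk):

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density, computed via log-gamma for numerical stability."""
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(logc + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def bernstein_density(x, weights):
    """Mixture of Beta(j, k - j + 1) kernels, j = 1..k; weights sum to 1."""
    k = len(weights)
    return sum(w * beta_pdf(x, j, k - j + 1)
               for j, w in enumerate(weights, start=1))

# Uniform weights recover the uniform density on (0, 1), since the k
# Bernstein kernels sum to k at every point of the interval.
print(bernstein_density(0.37, [1 / 4] * 4))  # ~ 1.0
```

Flexibility comes from the weights: any continuous density on [0, 1] can be approximated by taking k large and choosing the weights accordingly, which is why a prior on (k, w) yields a flexible model amenable to trans-dimensional MCMC.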

12:30 - 12:40 | Group Photo (Zoom/Hotel Hacienda Los Laureles) |

12:40 - 13:30 | Group work and informal discussions (Conference room, Break out rooms 1 & 2) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

15:00 - 15:45 |
Julyan Arbel: Improving MCMC convergence diagnostic with a local version of R-hat ↓ Diagnosing convergence of Markov chain Monte Carlo (MCMC) is crucial in Bayesian analysis. Among the most popular methods, the potential scale reduction factor (commonly named R-hat) is an indicator that monitors the convergence of all chains to the stationary distribution, based on a comparison of the between- and within-chain variances. Several improvements have been suggested since its introduction by Gelman & Rubin (1992). Here, we analyse some properties of the theoretical value R associated with R-hat in the case of a localized version that focuses on quantiles of the distribution. This leads to proposing a new indicator, which is shown both to localize MCMC convergence in different quantiles of the distribution and, at the same time, to handle convergence issues not detected by other R-hat versions. (Conference room) |
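As a reference point for the localized version discussed, the classical between/within-variance R-hat can be computed in a few lines; a minimal sketch (not the quantile-localized indicator of the talk):

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classical potential scale reduction factor (Gelman & Rubin, 1992):
    compares between-chain variance B with within-chain variance W."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(0)
ok = rng.normal(size=(4, 5000))  # 4 chains targeting the same N(0, 1)
stuck = ok.copy()
stuck[0] += 3.0                  # one chain trapped in another mode
print(gelman_rubin_rhat(ok) < 1.01, gelman_rubin_rhat(stuck) > 1.5)  # True True
```

A value close to 1 indicates the chains agree; the shifted chain inflates the between-chain variance and pushes R-hat well above 1. The localized indicator in the talk refines this by targeting specific quantiles rather than the mean and variance alone.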

15:45 - 16:30 |
Trevor Campbell: Parallel Tempering on Optimized Paths ↓ Parallel tempering (PT) is a class of Markov chain Monte Carlo algorithms that constructs a path of distributions annealing between a tractable reference and an intractable target, and then interchanges states along the path to improve mixing in the target. The performance of PT depends on how quickly a sample from the reference distribution makes its way to the target, which in turn depends on the particular path of annealing distributions. However, past work on PT has used only simple paths constructed from convex combinations of the reference and target log-densities. In this talk I'll show that this path performs poorly in the common setting where the reference and target are nearly mutually singular. To address this issue, I'll present an extension of the PT framework to general families of paths, formulate the choice of path as an optimization problem that admits tractable gradient estimates, and present a flexible new family of spline interpolation paths for use in practice. Theoretical and empirical results will demonstrate that the proposed methodology breaks previously-established upper performance limits for traditional paths. (Zoom) |
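The "simple path" the talk starts from is the convex combination of the reference and target log-densities; a minimal sketch of that linear path and the corresponding swap acceptance rule (illustrative; the spline-interpolation paths proposed in the talk are not reproduced here):

```python
def linear_path_logpdf(x, beta, log_ref, log_target):
    """Annealed log-density: convex combination of reference and target."""
    return (1.0 - beta) * log_ref(x) + beta * log_target(x)

def swap_accept_logprob(x_i, x_j, b_i, b_j, log_ref, log_target):
    """Log acceptance probability for exchanging states between two levels."""
    lp = (linear_path_logpdf(x_i, b_j, log_ref, log_target)
          + linear_path_logpdf(x_j, b_i, log_ref, log_target)
          - linear_path_logpdf(x_i, b_i, log_ref, log_target)
          - linear_path_logpdf(x_j, b_j, log_ref, log_target))
    return min(0.0, lp)

def log_ref(x):     # N(0, 1) reference, up to an additive constant
    return -0.5 * x * x

def log_target(x):  # N(5, 1) target, up to an additive constant
    return -0.5 * (x - 5.0) ** 2

# beta = 0 recovers the reference and beta = 1 the target; intermediate
# beta values interpolate between them along the linear path.
print(swap_accept_logprob(0.1, 4.9, 0.2, 0.8, log_ref, log_target))
```

When the reference and target are nearly mutually singular, adjacent distributions on this linear path can still be far apart, making swaps rare; that is the failure mode the optimized, more general path families are designed to avoid.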

16:30 - 17:00 | Coffee Break (Conference Room San Felipe) |

17:00 - 17:45 |
María Fernanda Gil Leyva Villa: Gibbs sampling for mixtures in order of appearance: the ordered allocation sampler ↓ Gibbs sampling methods for mixture models are based on data augmentation schemes that account for the unobserved partition in the data. They have been broadly classified into two categories: marginal and conditional samplers. Marginal samplers are termed this way because they integrate out part of the mixing distribution and model directly the partition structure. They can be used to implement mixture models with a tractable exchangeable partition probability function (EPPF) associated with the mixing distribution. However, if the EPPF is not available in closed form, marginal samplers are hard to adapt. In contrast, conditional samplers rely on allocation variables that identify each observation with a mixture component. While conditional samplers are more broadly applicable and allow direct inference on the mixing distribution, they are known to suffer from slow mixing. Moreover, for mixture models with infinitely many components some form of truncation, either deterministic or random, is required. As for mixtures with a random number of components, the exploration of parameter spaces of different dimensions can also be challenging. We tackle these issues by expressing the mixture components in the random order of appearance in an exchangeable sequence directed by the mixing distribution. We derive a sampler, called the ordered allocation sampler, that is straightforward to implement for mixing distributions with tractable size-biased ordered weights. In infinite mixtures, no form of truncation is necessary. As for finite mixtures with random dimension, a simple updating of the number of components is obtained by a blocking argument, thus easing challenges found in trans-dimensional moves via Metropolis-Hastings steps.
Although the ordered allocation sampler is a conditional sampler, sampling occurs in the space of ordered partitions with blocks labelled in the least element order. This improves mixing and promotes a consistent labelling of mixture components throughout iterations. (Zoom) |

17:45 - 18:30 |
Anirban Bhattacharya: Coupling-based convergence assessment of some Gibbs samplers for high-dimensional Bayesian regression with shrinkage priors ↓ We consider Markov chain Monte Carlo (MCMC) algorithms for Bayesian high-dimensional regression with continuous shrinkage priors. A common challenge with these algorithms is the choice of the number of iterations to perform. This is critical when each iteration is expensive, as is the case when dealing with modern data sets, such as genome-wide association studies with thousands of rows and up to hundreds of thousands of columns. We develop coupling techniques tailored to the setting of high-dimensional regression with shrinkage priors, which enable practical, non-asymptotic diagnostics of convergence without relying on traceplots or long-run asymptotics. By establishing geometric drift and minorization conditions for the algorithm under consideration, we prove that the proposed couplings have finite expected meeting time. Focusing on a class of shrinkage priors which includes the 'Horseshoe', we empirically demonstrate the scalability of the proposed couplings. A highlight of our findings is that less than 1000 iterations can be enough for a Gibbs sampler to reach stationarity in a regression on 100,000 covariates. The numerical results also illustrate the impact of the prior on the computational efficiency of the coupling, and suggest the use of priors where the local precisions are Half-t distributed with degree of freedom larger than one. (Joint work with Niloy Biswas, Pierre Jacob, and James Johndrow) (Zoom) |
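The diagnostic rests on running two coupled copies of the chain until they meet exactly, with the meeting time bounding the distance to stationarity via the coupling inequality. A toy illustration on a two-state chain with shared uniforms (not the shrinkage-prior Gibbs samplers of the talk):

```python
import random

def next_state(state, u):
    """Two-state chain: move to state 1 with prob 0.3 from 0, 0.7 from 1."""
    p_one = 0.3 if state == 0 else 0.7
    return int(u < p_one)

def meeting_time(x0=0, y0=1, seed=1, max_iter=10_000):
    """Advance two copies with the SAME uniforms until they coincide;
    the first such iteration bounds how long the chain remembers its
    start, via the coupling inequality."""
    rng = random.Random(seed)
    x, y = x0, y0
    for t in range(1, max_iter + 1):
        u = rng.random()  # shared randomness makes the copies coalesce
        x, y = next_state(x, u), next_state(y, u)
        if x == y:
            return t
    return None  # did not meet within the budget

print(meeting_time())  # a small, finite number of steps
```

Here the copies meet with probability 0.6 at each step, so the meeting time is geometric; the technical work in the talk is establishing the analogous finite expected meeting time for high-dimensional Gibbs samplers via drift and minorization conditions.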

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Tuesday, November 30 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 09:45 |
Helen Ogden: Approximate cross validation for mixture models ↓ Choosing appropriate priors and hyperparameters to control the number of components used by a mixture model is often challenging: it is typically hard to interpret such parameters directly, which makes it difficult to use subjective prior knowledge. I will focus instead on how to choose these quantities to give a model with good frequentist properties. In principle, models could be assessed by cross validation, but in practice direct calculation of a cross validation criterion is computationally expensive and numerically unstable. I will discuss methods for approximating cross validation criteria for mixture models, which aim to address both of these issues. (Zoom) |

09:45 - 10:30 |
Alexander Ly: Default Bayes Factors for Testing the (In)equality of Several Population Variances ↓ The goal of this presentation is to elaborate on the notion of objectivity in Bayesian tests. Concretely, I’ll discuss Harold Jeffreys’s desiderata for objective Bayes factors that were formalised by Bayarri, Berger, Forte and García-Donato (2012), within the context of testing the (in)equality of several population variances. I’ll also put forth the desideratum of across-sample consistency for K-sample problems, and show that for this problem such an objective Bayes factor adhering to all these desiderata (1) exists, (2) is easily calculable, and (3) has good frequentist properties. If time allows, I’ll also discuss the sequential properties of the resulting Bayes factor. (Zoom) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Luis E. Nieto-Barajas: Characterizing variation of nonparametric random probability measures using the Kullback–Leibler divergence ↓ This work characterizes the dispersion of some popular random probability measures, including the bootstrap, the Bayesian bootstrap, and the Pólya tree prior. This dispersion is measured in terms of the variation of the Kullback–Leibler divergence of a random draw from the process to that of its baseline centring measure. By providing a quantitative expression of this dispersion around the baseline distribution, our work provides insight for comparing different parameterizations of the models and for the setting of prior parameters in applied Bayesian settings. This highlights some limitations of the existing canonical choice of parameter settings in the Pólya tree process. (Conference room) |

11:45 - 12:30 |
Chris Holmes: Predictive Inference: a view towards objectivity ↓ We revisit the predictive approach to Bayesian statistics, advocated by Geisser and others, as a framework to facilitate objective inference. We explore the predictive viewpoint of Bayesian nonparametric learning as a means to improve robustness in the M-open setting, and we point to future research directions. (Zoom) |

12:30 - 13:15 |
Diana Cai: Finite mixtures are typically inconsistent for the number of components ↓ Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. A common suggestion is to use a finite mixture model (FMM) with a prior on the number of components. Past work has shown the resulting FMM component-count posterior is consistent; that is, the posterior concentrates on the true, generating number of components. But consistency requires the assumption that the component likelihoods are perfectly specified, which is unrealistic in practice. In this paper, we add rigor to data-analysis folk wisdom by proving that under even the slightest model misspecification, the FMM component-count posterior *diverges*: the posterior probability of any particular finite number of components converges to 0 in the limit of infinite data. Contrary to intuition, posterior-density consistency is not sufficient to establish this result. We develop novel sufficient conditions that are more realistic and easily checkable than those common in the asymptotics literature. We illustrate practical consequences of our theory on simulated and real data. (Joint work with Trevor Campbell and Tamara Broderick.) (Zoom) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

15:00 - 15:45 |
Judith Rousseau: Using the cut posterior in semiparametric inference, with applications to semiparametric and nonparametric Bayesian inference in hidden Markov models ↓ While the theory of Bayesian approaches in standard nonparametric or high-dimensional models is beginning to be well developed, much less is known in the context of semiparametric models outside very specific priors and models. We propose in this talk a pseudo-Bayesian approach, based on the cut posterior, which allows for the construction of a distribution on the whole parameter and is constructed so that the marginal posterior on the parameter of interest has optimal properties. We apply this approach to the setup of nonparametric hidden Markov models with finite state space and nonparametric emission distributions. Since the seminal paper of Gassiat et al. (2016), it is known that in such models the transition matrix $Q$ and the emission distributions $F_1, \dots, F_K$ are identifiable, up to label switching. We use a cut posterior to simultaneously estimate $Q$ at the rate $\sqrt{n}$ and the emission distributions at the usual nonparametric rates. To do so, we first consider a prior $\pi_1$ on $Q$ and $F_1, \dots, F_K$ which leads to a marginal posterior distribution on $Q$ which verifies the Bernstein–von Mises property, and thus to an estimator of $Q$ which is efficient. We then combine the marginal posterior on $Q$ with another posterior distribution on the emission distributions, following the cut-posterior approach, to obtain a posterior which also concentrates around the emission distributions at the minimax rates. In addition, an important intermediate result of our work is an inversion inequality which allows us to upper bound the $L_1$ norms between the emission densities by the $L_1$ norms between marginal densities of 3 consecutive observations. (Conference room) |

15:45 - 16:30 |
Sinead Williamson: Distributed, partially collapsed MCMC for Bayesian nonparametrics ↓ Bayesian nonparametric (BNP) models provide elegant methods for discovering underlying latent features within a data set, but inference in such models can be slow. We exploit the fact that completely random measures, in terms of which commonly used models such as the Dirichlet process and the beta-Bernoulli process can be expressed, are decomposable into independent sub-measures. We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components. We then select different inference algorithms for the two components: uncollapsed samplers mix well on the finite measure, while collapsed samplers mix well on the infinite, sparsely occupied tail. The resulting hybrid algorithm can be applied to a wide class of models, and can be easily distributed to allow scalable inference without sacrificing asymptotic convergence guarantees. (Joint work with Kumar Avinava Dubey, Michael Zhang, and Eric Xing) (Zoom) |

16:30 - 17:00 | Coffee Break (Conference Room San Felipe) |

17:00 - 17:45 |
Michele Guindani: A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data ↓ The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this talk, we propose a nested Common Atoms Model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. If time allows, we will also discuss an application to the analysis of time series calcium imaging experiments in awake behaving animals. We further investigate the performance of our model in capturing true distributional structures in the population by means of simulation studies. (Conference room) |

17:45 - 18:30 |
Giovanni Rebaudo: Graph-Aligned Random Partition Model ↓ Bayesian nonparametric mixtures and random partition models are effective tools to perform probabilistic clustering. However, standard independent mixture models can be restrictive in some applications, such as inference on cell lineage, due to the biological relations of the clusters. The increasing availability of large genomics data and studies requires new statistical tools to perform model-based clustering and to infer the relationship between the homogeneous subgroups of units. Motivated by single-cell RNA applications, we develop a novel dependent mixture model to jointly perform cluster analysis and align the clusters on a graph. Our flexible graph-aligned random partition model (gRPM) exploits Gibbs-type priors as building blocks, allowing us to derive analytical results on the probability mass function (pmf) of the random partition. From the pmf of the random partition, we derive a generalization of the well-known Chinese restaurant process and a related efficient MCMC algorithm to perform Bayesian inference. We perform posterior inference on real single-cell RNA data from mouse stem cells. We further investigate the performance of our model in capturing the underlying clustering structure, as well as the underlying graph, by means of a simulation study. (Conference room) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Wednesday, December 1 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 09:45 |
David Rossell: Confounder importance learning for treatment effect inference ↓ An important basic problem is to estimate the association of a set of covariates of interest (treatments) with an outcome while accounting for many potential confounders. It has been shown that standard high-dimensional Bayesian and penalized likelihood methods perform poorly in practice. The sparsity embedded in such methods leads to low power when there are strong correlations between treatments and confounders, or among confounders, which causes an under-selection (or omitted variable) bias. Current solutions encourage the inclusion of confounders to increase power, but as we show this can lead to serious over-selection problems. To address these issues, we propose an empirical Bayes framework to learn which confounders should be encouraged (or discouraged) to feature in the regression. We develop exact computations and a faster expectation-propagation strategy for the family of exponential regression models. We illustrate the applied impact of these issues in a study of the association between salary and potentially discriminatory factors such as gender, race and place of birth. (Zoom) |

09:45 - 10:30 |
Veronika Rockova: Metropolis-Hastings via Classification ↓ This paper develops a Bayesian computational platform at the interface between posterior sampling and optimization in models whose marginal likelihoods are difficult to evaluate. Inspired by contrastive learning and Generative Adversarial Networks (GANs), we reframe the likelihood function estimation problem as a classification problem. Pitting a Generator, who simulates fake data, against a Classifier, who tries to distinguish them from the real data, one obtains likelihood (ratio) estimators which can be plugged into the Metropolis-Hastings algorithm. The resulting Markov chains generate, at a steady state, samples from an approximate posterior whose asymptotic properties we characterize. Drawing upon connections with empirical Bayes and Bayesian mis-specification, we quantify the convergence rate in terms of the contraction speed of the actual posterior and the convergence rate of the Classifier. Asymptotic normality results are also provided which justify the inferential potential of our approach. We illustrate the usefulness of our approach on examples which have proved challenging for existing Bayesian likelihood-free approaches. (Zoom) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Jack Jewson: General Bayesian Loss Function Selection and the use of Improper Models ↓ Statisticians often face the choice between using probability models or a paradigm defined by minimising a loss function. Both approaches are useful and, if the loss can be re-cast into a proper probability model, there are many tools to decide which model or loss is more appropriate for the observed data, in the sense of explaining the data’s nature. However, when the loss leads to an improper model, there are no principled ways to guide this choice. We address this task by combining the Hyvärinen score, which naturally targets infinitesimal relative probabilities, and general Bayesian updating, which provides a unifying framework for inference on losses and models. Specifically, we propose the H-score, a general Bayesian selection criterion, and prove that it consistently selects the (possibly improper) model closest to the data-generating truth in Fisher’s divergence. We also prove that an associated H-posterior consistently learns optimal hyperparameters featuring in loss functions, including a challenging tempering parameter in generalised Bayesian inference. As salient examples, we consider robust regression and nonparametric density estimation, where popular loss functions define improper models for the data and hence cannot be dealt with using standard model selection tools. These examples illustrate advantages in robustness-efficiency tradeoffs and provide a Bayesian implementation for kernel density estimation, opening a new avenue for Bayesian nonparametrics. (Zoom) |
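For intuition, the Hyvärinen score of a model density p at a point x is 2 (d²/dx²) log p(x) + ((d/dx) log p(x))², which depends on p only through derivatives of log p, so normalizing constants cancel and improper models can be scored. A minimal one-dimensional sketch for a Gaussian model (illustrative; not the H-score criterion itself, which builds this idea into a general Bayesian selection rule):

```python
def hyvarinen_score_gaussian(x, mu, sigma2):
    """Hyvarinen score 2 * d2/dx2 log p + (d/dx log p)**2 for N(mu, sigma2).

    d/dx log p = -(x - mu) / sigma2 and d2/dx2 log p = -1 / sigma2, so the
    normalizing constant of p never enters the computation.
    """
    grad = -(x - mu) / sigma2
    return 2.0 * (-1.0 / sigma2) + grad ** 2

# The score would be identical if p were rescaled by any constant, which
# is what makes it usable when the loss defines an improper model.
print(hyvarinen_score_gaussian(1.0, 0.0, 1.0))  # -> -1.0
```

Averaging this score over data and comparing across candidate models (proper or improper) is the spirit of the selection criterion described in the abstract.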

11:45 - 12:30 |
Rajesh Ranganath: Where did my Bayes Go? ↓ I've spent time working on Bayesian methods, especially scalable computation. However, my recent work has developed algorithms tailored to problems in healthcare that do not easily translate to standard Bayesian computation. In this talk, I will highlight two such methods, one for survival analysis based on multiplayer games and another for building predictive models in the presence of spurious correlations. At the end, I'll highlight thoughts on how Bayesian analysis might play a role in these problems. (Zoom) |

12:30 - 13:00 | Informal discussion - questions of the day (Zoom) |

13:00 - 14:30 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

15:00 - 19:00 | Free Afternoon (Oaxaca) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Thursday, December 2 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 09:45 |
Noirrit Chandra: Bayesian Scalable Precision Factor Analysis for Massive Sparse Gaussian Graphical Models ↓ We propose a novel approach to estimating the precision matrix of multivariate Gaussian data that relies on decomposing it into a low-rank and a diagonal component. Such decompositions are very popular for modeling large covariance matrices, as they admit a latent-factor-based representation that allows easy inference. The same is, however, not true for precision matrices, due to the lack of computationally convenient representations, which restricts inference to low-to-moderate dimensional problems. We address this remarkable gap in the literature by building on a latent variable representation for such a decomposition of precision matrices. The construction leads to an efficient Gibbs sampler that scales very well to high-dimensional problems, far beyond the limits of the current state of the art. The ability to efficiently explore the full posterior space also allows the model uncertainty to be easily assessed. The decomposition additionally allows us to adapt sparsity-inducing priors to shrink the insignificant entries of the precision matrix toward zero, making the approach adaptable to high-dimensional, small-sample-size sparse settings. Exact zeros in the matrix encoding the underlying conditional independence graph are then determined via a novel posterior false discovery rate control procedure. A near minimax optimal posterior concentration rate for estimating precision matrices is attained by our method under mild regularity assumptions.
We evaluate the method's empirical performance through synthetic experiments and illustrate its practical utility in data sets from two different application domains. (Conference room) |

09:45 - 10:30 |
Daniele Durante: Advances in Bayesian inference for regression models with binary, categorical and partially-discretized data ↓ A broad class of models that routinely appear in several fields of application can be expressed as partially or fully discretized Gaussian linear regressions. Besides including the classical Gaussian response setting, this class crucially encompasses probit, multinomial probit and tobit models, among others, and further includes key extensions to dynamic, skewed and multivariate contexts. The relevance of such representations has motivated decades of research in the Bayesian field. The main reason for this active interest is that, unlike for the Gaussian response setting, the posterior distribution induced by these models does not apparently belong to a known and tractable class, under the commonly-assumed Gaussian priors. This has motivated the development of several alternative solutions for posterior inference relying either on sampling-based strategies or on deterministic approximations, which, however, still experience scalability, mixing and accuracy issues, especially in high dimension. The scope of this talk is to review, unify and extend recent advances in Bayesian inference and computation for such a class of models. To address this goal, I will prove that the likelihoods induced by all these formulations crucially share a common analytical structure which implies conjugacy with a broad class of distributions, namely the unified skew-normals (SUN), that generalize multivariate Gaussians to skewed contexts, and include these variables as a special case. 
This result unifies and extends recent conjugacy properties for specific models within the class analyzed, and opens new avenues for improved posterior inference, under a broader class of core formulations and prior distributions, via novel closed-form expressions, tractable Monte Carlo methods based on independent and identically distributed samples from the exact SUN posteriors, and more accurate and scalable approximations from variational Bayes and expectation propagation. These advantages are illustrated in extensive simulation studies and applications, and are expected to boost the routine use of these core Bayesian models, while providing a novel framework for studying general theoretical properties and developing future extensions. (Zoom) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Filippo Ascolani: Trees of random probability measures and Bayesian nonparametric modelling ↓ We introduce a way to generate trees of random probability measures, where the link between two nodes is given by a hierarchical procedure: starting from a common root, each node of the tree is endowed with a random probability measure, whose baseline distribution is again random and given by the associated node in the previous layer. The data can be observed at any node of the tree, and different branches may have different lengths: the split mechanism can also be considered random or based on covariates of interest. When the branches have the same length and the observations are linked only to the leaves, we recover the well-known family of discrete hierarchical processes. We prove that, if the distribution at each node is given by the normalization of a completely random measure (NRMI), the model is analytically tractable: conditional on a suitable latent structure, the posterior is still given by a deep NRMI. Furthermore, the asymptotic behaviour of the number of clusters is derived, when either the sample size at a particular layer diverges or the number of levels grows. Finally, the extension to kernel mixtures is discussed. (Zoom) |

11:45 - 12:30 |
Yang Ni: Individualized Causal Discovery with Latent Trajectory Embedded Bayesian Networks ↓ Bayesian networks have been widely used for generating causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, such assumption can be seriously violated causing significant biases when the underlying population is inherently heterogeneous. To explicitly account for the heterogeneity, we propose a novel Bayesian network model, termed BN-LTE, that embeds the heterogeneous data onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from population level to observation level (individualized causal models). Moreover, while Bayesian networks are in general not identifiable with purely observational, cross-sectional data due to Markov equivalence, with the blessing of heterogeneity, we prove that the proposed BN-LTE is uniquely identifiable under common causal assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in discovering causal relationships as well as inferring observation-specific gene regulatory networks from observational data. (Zoom) |

12:30 - 13:30 | Group work and informal discussions (Conference room, Break out rooms 1 & 2) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

15:00 - 15:45 |
José Antonio Perusquía: A Bayesian Approach to Anomaly Detection in Computer Systems: A Review ↓ Computer systems are vast, complex and dynamic objects that have become crucial in modern life. To ensure their correct performance, there is a need to efficiently detect vulnerabilities and anomalies that could shut them down, with potentially catastrophic consequences. Nowadays, a wide range of classical and machine learning models are used for this important task. However, these approaches lack the flexibility and the inherent probabilistic characterisation of uncertainty that Bayesian statistics offers. In recent years, Bayesian anomaly detection models applied specifically to computer systems have therefore gained considerable attention, in particular in the field of cyber security. In this talk we centre our attention on how these models have been used, the specific challenges they face, and interesting areas of opportunity. (Zoom) |

15:45 - 16:30 |
Katherine Heller: Towards Trustworthy Machine Learning in Medicine and the Role of Uncertainty ↓ As ML is increasingly used in society, we need methods we can confidently rely on, particularly in the medical domain. In this talk I discuss three pieces of work, the role uncertainty plays in understanding and combating issues with generalization and bias, and particular mitigations that we can take into consideration.
1) Sepsis Watch - I present a Gaussian Process (GP) + Recurrent Neural Network (RNN) model for predicting sepsis infections in Emergency Department patients. I will discuss the benefit of the uncertainty given by the GP, and then the social context of introducing such a system into a hospital setting.
2) Uncertainty and Electronic Health Records (EHR) - I will discuss Bayesian RNN models developed for mortality prediction, the distinction between population-level and individual-level predictive performance, and its implications for bias.
3) Underspecification and the credibility implications of hyperparameter choices in ML models - I will discuss medical imaging applications and how the uncertainty of model performance, conditioned on the choice of hyperparameters, can help identify situations in which methods may not generalize well outside the training domain. (Conference room) |

16:30 - 17:00 | Coffee Break (Conference Room San Felipe) |

17:00 - 17:45 |
Mengyang Gu: Marginalization of latent variables for correlated data ↓ We will discuss the marginalization of latent variables for correlated outcomes, such as multiple time series, spatio-temporal processes, and computer simulations. We first review the Kalman filter and its connection to Gaussian processes with Matern covariance. Then we discuss vector autoregressive models, linear models of coregionalization, and their connections to Gaussian processes with product covariance. We show that marginalizing correlated latent variables leads to efficient estimation of model parameters and predictions. As an example, we will introduce generalized probabilistic principal component analysis (GPPCA) to study the latent factor model for multiple correlated outcomes. Our method generalizes the previous probabilistic formulation of principal component analysis (PPCA) by providing the closed-form maximum marginal likelihood estimator of the factor loadings and other parameters, where each factor is modeled by a Gaussian process. Lastly, we will introduce an efficient representation of Gaussian processes with product Matern covariance and its applications to emulating massive computer simulations. We will present numerical studies of simulated and real data that confirm the good predictive accuracy and computational efficiency of the proposed approaches. (Zoom) |

17:45 - 18:30 |
Alan Riva-Palacio: Bayesian analysis of vectors of subordinators ↓ Non-decreasing additive processes, also called subordinators, have many applications throughout mathematical modelling; for instance, they have been widely used in risk and finance. Well-known examples of subordinators are the stable, gamma and compound Poisson processes with positive jumps. An extension to a multivariate setting for studying heterogeneous data, by considering vectors of subordinators, can be performed and has been studied in a frequentist setting. In this talk we will discuss the challenges of the Bayesian analysis of models based on such vectors of subordinators. (Conference room) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Friday, December 3 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 10:30 | Group work and informal discussions (Conference room, Break out rooms 1 & 2) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 13:00 | Group work and informal discussions (Conference room, Break out rooms 1 & 2) |

13:00 - 14:30 | Lunch (Restaurant Hotel Hacienda Los Laureles) |