# Schedule for: 17w5093 - Optimal Transport meets Probability, Statistics and Machine Learning

Arriving in Oaxaca, Mexico on Sunday, April 30 and departing Friday, May 5, 2017

Sunday, April 30 | |
---|---|

14:00 - 23:59 | Check-in begins (Front desk at your assigned hotel) |

19:30 - 22:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

20:30 - 21:30 |
Informal gathering ↓ A welcome drink will be served at the hotel. (Hotel Hacienda Los Laureles) |

Monday, May 1 | |
---|---|

07:30 - 08:45 | Breakfast (Restaurant at your assigned hotel) |

08:45 - 09:00 | Introduction and Welcome (Conference Room San Felipe) |

09:00 - 10:00 |
Dejan Slepcev: Consistency of objective functionals in semi-supervised learning ↓ We consider a regression problem of semi-supervised learning: given real-valued labels on a small subset of data, recover the function on the whole data set while taking into account the information provided by a large number of unlabeled data points. Objective functionals modeling this regression problem involve terms rewarding the regularity of the function estimate while enforcing agreement with the labels provided. We will discuss and prove which of these functionals make sense when the number of data points goes to infinity. Furthermore, we will discuss qualitative properties of function estimates that different regularizations lead to. In particular, we will discuss regularizations motivated by the p-Laplace equation and higher order fractional Laplacians.
The talk is based on joint work with Matthew Dunlop, Andrew Stuart, and Matthew Thorpe. (Conference Room San Felipe) |

10:00 - 10:30 |
Jörn Schrieber: Cost-Dependent Clustering: A General Multiscale Approach ↓ The recent past has seen the advent of a variety of multiscale methods to tackle large
discrete optimal transport problems. While many different algorithms have been proposed
to be used with a multiscale scheme, the scaled instances themselves are mostly generated
as simple coarsenings of the original instances.
Cost-dependent clustering is a new approach to construct small-scale approximations
of discrete optimal transport instances by coupling points with regard to similar cost
vectors. Using the solution to the clustered problem, we derive upper and lower bounds
on the optimal value of the original instance. The clustering approach is independent of
the structure of the underlying space and can be applied to general cost functions. (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 12:00 |
Youssef Marzouk: Inference via low-dimensional couplings ↓ Integration against an intractable probability measure is among the fundamental challenges of statistical inference, particularly in the Bayesian setting. A principled approach to this problem seeks a deterministic coupling of the measure of interest with a tractable "reference" measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Yet characterizing such a map---e.g., representing, constructing, and evaluating it---grows challenging in high dimensions.
We will present links between the conditional independence structure of the target measure and the existence of certain low-dimensional couplings, induced by transport maps that are sparse or decomposable. We also describe conditions, common in Bayesian inverse problems, under which transport maps have a particular low-rank structure. Our analysis not only facilitates the construction of couplings in high-dimensional settings, but also suggests new inference methodologies. For instance, in the context of nonlinear and non-Gaussian state space models, we describe new variational algorithms for filtering, smoothing, and online parameter estimation. These algorithms implicitly characterize---via a transport map---the full posterior distribution of the sequential inference problem using only local operations while avoiding importance sampling or resampling.
This is joint work with Alessio Spantini and Daniele Bigoni. (Conference Room San Felipe) |

12:00 - 13:00 |
Esteban Tabak: Explanation of variability through optimal transport ↓ An optimal-transport based methodology is proposed for the explanation of variability in data. The central idea is to estimate and simulate conditional distributions $\rho(x|z)$ by mapping them optimally to their Wasserstein barycenter $\mu(y)$, thus removing all $z$-dependence. The barycenter problem needs to be formulated and solved in a data-driven format, as the distributions are only known through samples.
A particular implementation is developed, "attributable components", in which the maps are restricted to $z$-dependent, non-parametric rigid translations. It is shown that this proposal encompasses standard methodologies, such as least-squares regression, k-means clustering and principal components, and extends them broadly to explain accurately and robustly variability driven by complex sets of explicit and latent covariates in a computationally effective way.
Applications are shown to climate science, medicine, risk estimation and variations of the Netflix problem. (Conference Room San Felipe) |

13:00 - 13:20 |
Elsa Cazelles: Regularization of Barycenters in the Wasserstein Space ↓ The concept of barycenter in the Wasserstein space corresponds to defining a notion of Fréchet mean of a set of probability measures. However, depending on the data at hand, such barycenters may be irregular. We thus introduce a convex regularization of Wasserstein barycenters for random measures supported on $\mathbb{R}^d$. We prove the existence and uniqueness of such barycenters for a large class of regularizing functions. A stability result for regularized barycenters in terms of the Bregman distance associated with the convex regularization term is also given. This allows us to compare the case of data made of $n$ probability measures $\nu_1,\ldots,\nu_n$ with the more realistic setting where we only have access to a dataset of random variables $(X_{i,j})_{1\leq i\leq n; 1\leq j\leq p_i}$ organized in the form of $n$ experimental units such that $X_{i,1},\ldots,X_{i,p_i}$ are iid observations sampled from the measure $\nu_i$ for each $1\leq i\leq n$. We also analyse the convergence of the regularized empirical barycenter of a set of $n$ iid random probability measures towards its population counterpart, and we discuss its rate of convergence. This approach is shown to be appropriate for the statistical analysis of discrete or absolutely continuous random measures. (Conference Room San Felipe) |

13:20 - 13:30 | Group Photo (Hotel Hacienda Los Laureles) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

15:00 - 16:00 |
Jan Obloj: Martingale Optimal Transport: at the crossroad of mathematical finance, probability and optimal transport ↓ An overview of Martingale Optimal Transport (MOT) and its links with probability theory and mathematical finance as well as some of its optimal transport heritage.
From an OT perspective, MOT is a version of the classical OT problem with a further constraint that the transport plan is a martingale. From a probabilistic perspective, thinking in continuous time, it is the problem of selecting a solution to the Skorokhod embedding problem with an additional optimality property. Finally, from a financial perspective, it is the problem of computing the range of no-arbitrage prices (primal) and robust hedging strategies which enforce these (dual), when given market quoted prices for co-maturing vanilla options (calls).
MOT offers an exciting interaction of the three fields. It brought tremendous new geometrical insights into the structure of (optimal) Skorokhod embeddings and it links naturally with martingale inequalities. In exchange, probabilistic methods offer explicit transport plans and the problem, when compared with OT, features an intricate structure of polar sets.
In this talk we endeavour to present a panorama of the field with emphasis on some recent contributions. (Conference Room San Felipe) |
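
For orientation, the constrained problem described in this abstract is usually written as follows. This is a sketch in standard notation; the symbols $\mu$, $\nu$, $c$ and $\Pi(\mu,\nu)$ are assumptions and do not appear in the abstract. Here $\Pi(\mu,\nu)$ denotes the couplings of the two marginals and $(X,Y)$ has law $\pi$:
\[
\inf_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y) \quad \text{subject to} \quad \mathbb{E}_\pi[\,Y \mid X\,] = X .
\]
The feasible set is non-empty exactly when $\mu$ is dominated by $\nu$ in convex order (Strassen's theorem).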

16:00 - 16:30 | Coffee Break (Conference Room San Felipe) |

16:30 - 17:00 |
Gaoyue Guo: Numerical computation of martingale optimal transport on real line ↓ We provide two numerical methods for solving the one-dimensional martingale optimal transport problem, based respectively on the primal and dual problems. The first scheme considers the approximation of marginal distributions, through which the primal problem reduces to a linear optimisation problem. The second one aims at solving a minimisation problem $\psi\mapsto J(\psi)$ over the space of continuous functions $\psi: \mathbb{R}\to \mathbb{R}$ with linear growth, where $J$ involves some concave envelope and can be computed numerically. The second approach not only solves the martingale optimal transport problem, but also yields a family of approximating dual optimisers. (Conference Room San Felipe) |

17:00 - 17:30 |
Young-Heon Kim: Optimal martingale transport ↓ The optimal martingale transport problem is a variant of the optimal transport problem with the constraint that the transport satisfies a martingale condition. A natural question is to understand the structure of the martingale transport, in particular how it splits the mass. We will explain joint work with Nassif Ghoussoub and Tongseok Lim. (Conference Room San Felipe) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Tuesday, May 2 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 10:00 |
Bruno Levy: Some algorithmic aspects of semi-discrete optimal transport. ↓ In semi-discrete optimal transport, a measure with a density is transported to a sum of Dirac masses. This setting
is very well adapted to a computer implementation, because the transport map is determined by a vector of parameters
(associated with each Dirac mass) that maximizes a concave function (Kantorovich dual). An efficient numerical solution mechanism
requires careful orchestration of the interplay of geometry and numerics. On geometry, I will present two algorithms to
efficiently compute Laguerre cells, one that uses arbitrary precision predicates, and one that uses standard double-precision
arithmetic. On numerical aspects, when implementing a Newton solver (a 3D version of [Kitagawa Merigot Thibert]), the main
difficulty is to assemble the Hessian of the objective function, a sparse matrix with a non-zero pattern that changes during the iterations.
By exploiting the relation between the Hessian and the 1-skeleton of the Laguerre diagram, it is possible to efficiently construct the Hessian.
The algorithm can be used to simulate incompressible Euler fluids [Merigot Gallouet].
I will also show applications of these algorithms in more general settings (transport between surfaces and approximation of general
Laguerre diagrams). (Conference Room San Felipe) |

10:00 - 10:30 |
Tryphon Georgiou: Optimal mass transport and density flows ↓ We will discuss certain new directions in the nexus of ideas that originate in Optimal Mass Transport (OMT) and the Schroedinger Bridge Problem (SBP). More specifically, we will discuss generalizations to the setting of matrix-valued and vector-valued distributions. Matrix-valued OMT in particular allows us to define a Wasserstein geometry on the space of density matrices of quantum mechanics and, as it turns out, the Lindblad equation of open quantum systems (quantum diffusion) is exactly the gradient flow of the von Neumann quantum entropy in this sense.
The talk is based on joint work with Yongxin Chen (MSKCC),
Michele Pavon (University of Padova), and Allen Tannenbaum (Stony Brook). (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 12:00 |
Bertram Düring: Lagrangian schemes for Wasserstein gradient flows ↓ A wide range of diffusion equations can be interpreted as gradient flows of an energy functional with respect to the Wasserstein distance. Examples include the heat equation, the porous medium equation, and the fourth-order Derrida-Lebowitz-Speer-Spohn equation. When it comes to solving equations of gradient flow type numerically, schemes that respect the equation's special structure are of particular interest. The gradient flow structure gives rise to a variational scheme by means of the minimising movement scheme (also called JKO scheme, after the seminal work of Jordan, Kinderlehrer and Otto) which constitutes a time-discrete minimization problem for the energy.
While the scheme was originally used for analytical purposes, a number of authors have explored its numerical potential. Such schemes often use a Lagrangian representation where, instead of the density, the evolution of a time-dependent homeomorphism that describes the spatial redistribution of the density is considered. (Conference Room San Felipe) |
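
As a reminder of what the time-discrete minimization looks like, one step of the JKO scheme is commonly written as below. This is a sketch in standard notation; the time step $\tau$, the energy $E$, and the iterates $\rho^k_\tau$ are symbols assumed here, not taken from the abstract:
\[
\rho^{k+1}_\tau \in \operatorname*{arg\,min}_{\rho} \left\{ \frac{1}{2\tau} W_2^2\big(\rho, \rho^k_\tau\big) + E(\rho) \right\}.
\]
For the heat equation, for instance, the energy is the entropy $E(\rho) = \int \rho \log \rho \, dx$.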

12:00 - 12:30 |
Katy Craig: A blob method for diffusion and degenerate diffusion ↓ For a range of physical and biological processes—from biological swarming to dynamics of granular media—the evolution of a large number of interacting agents can be described in terms of the competing effects of drift, diffusion, and nonlocal interaction. The resulting partial differential equations are gradient flows with respect to the Wasserstein metric, and this gradient flow structure provides a natural framework for numerical particle methods. However, developing deterministic particle methods for problems involving diffusion poses unique challenges, particularly when nonlocal interaction terms are present. In this talk, I will present new work on a blob method for diffusion and degenerate diffusion, inspired by blob methods from classical fluid mechanics. This is joint work with Francesco Patacchini and José Antonio Carrillo. (Conference Room San Felipe) |

12:30 - 13:00 |
Clarice Poon: The total variation Wasserstein gradient flow ↓ In this talk, we present a study of the JKO scheme for the total variation functional. In particular, we present a characterization of the optimizers and some of their qualitative properties (a sort of maximum principle and the regularity of level sets). Furthermore, in dimension one, we shall discuss the convergence as the time step goes to zero to a solution of a fourth-order nonlinear evolution equation. This is joint work with Guillaume Carlier. (Conference Room San Felipe) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

15:00 - 16:00 |
Codina Cotar: Density functional theory and many-marginals optimal transport with Coulomb and Riesz costs ↓ Multi-marginal optimal transport with Coulomb cost arises as a dilute
limit of density functional theory, which is a widely used electronic structure model. The number N of marginals corresponds to the number of particles. I will discuss the question of whether ''Kantorovich minimizers'' must be ''Monge minimizers'' (yes for N=2, open for N>2, no for N=infinity), and derive the surprising phenomenon that the extreme correlations of the minimizers turn into independence in the large N limit. Time permitting, I will also discuss open problems on the next order term and work in progress on the topic.
The talk is based on joint works with Gero Friesecke (TUM), Claudia
Klueppelberg (TUM), and Brendan Pass (Alberta), which appeared in CPAM (2013)
and Calc. Var. PDE (2014), and on joint work in progress with Mircea Petrache on the next order term. (Conference Room San Felipe) |

16:00 - 16:30 | Coffee Break (Conference Room San Felipe) |

16:30 - 17:00 |
Jean-Michel Loubes: Transport based kernels for Gaussian Process Modeling ↓ Monge-Kantorovich distances, otherwise known as Wasserstein distances, have
received growing attention in statistics and machine learning as a powerful discrepancy
measure for probability distributions. In this paper, we focus on forecasting a Gaussian
process indexed by probability distributions. For this, we provide a family of positive definite
kernels built using transportation based distances. We provide a probabilistic understanding
of these kernels and characterize the corresponding stochastic processes. Then we consider the asymptotic properties of the forecast process. (Conference Room San Felipe) |

17:00 - 17:30 |
Eustasio Del Barrio: CLTs for empirical cost in general dimension ↓ We consider the problem of optimal transportation with quadratic cost
between an empirical measure and a general target probability on $\mathbb{R}^d$,
with $d\geq 1$. We provide new results on the uniqueness and stability of the
associated optimal transportation potentials, namely, the minimizers in the dual
formulation of the optimal transportation problem. As a consequence, we show
that a CLT holds for the empirical transportation cost
under mild moment and smoothness requirements. The limiting distributions are
Gaussian and admit a simple description in terms of the optimal transportation
potentials. (Conference Room San Felipe) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Wednesday, May 3 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 09:30 |
Sebastian Reich: Optimal transport and its use in data assimilation and sequential Bayesian inference ↓ I will review the use of optimal coupling arguments in the design of sequential Monte Carlo
methods for data assimilation and sequential Bayesian inference. I will also address their efficient
implementation using the Sinkhorn approximation and second-order corrections. If time permits,
I will also review the mathematical structure of continuous-time filtering problems within a generalised
Kalman formulation and its link to continuous-time optimal transport. (Conference Room San Felipe) |
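
The Sinkhorn approximation mentioned in this abstract can be illustrated with a minimal sketch of the standard Sinkhorn iteration for entropy-regularized optimal transport between two discrete distributions. The function name `sinkhorn`, the parameters `eps` and `n_iter`, and the toy data are assumptions made for illustration; this is not the speaker's implementation and it omits the second-order corrections.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Entropy-regularized OT plan between histograms a and b.

    a, b  : lists of nonnegative weights, each summing to 1
    cost  : cost[i][j] is the ground cost between point i and point j
    eps   : regularization strength (hypothetical default)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The plan is diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example: two points on each side, unit cost for moving mass across.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

Smaller values of `eps` bring the plan closer to the unregularized optimum but make the kernel entries very small, so practical implementations typically work in the log domain.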

09:30 - 10:00 |
Espen Bernton: Inference in generative models using the Wasserstein distance ↓ In purely generative models, one can simulate data given parameters but not necessarily evaluate the likelihood. We use Wasserstein distances between empirical distributions of observed data and empirical distributions of synthetic data drawn from such models to estimate their parameters. Previous interest in the Wasserstein distance for statistical inference has been mainly theoretical, due to computational limitations. Thanks to recent advances in numerical transport, the computation of these distances has become feasible for relatively large data sets, up to controllable approximation errors. We leverage these advances to propose point estimators and quasi-Bayesian distributions for parameter inference, first for independent data. For dependent data, we extend the approach by using delay reconstruction, residual reconstruction, and curve matching techniques. For large data sets, we also propose an alternative distance using the Hilbert space-filling curve, whose computation scales as $n\log n$ where $n$ is the size of the data. We provide a theoretical study of the proposed estimators, and adaptive Monte Carlo algorithms to approximate them. The approach is illustrated on several examples, including a toggle switch model from systems biology, a Lotka-Volterra model for plankton population sizes, and a Lévy-driven stochastic volatility model. (Conference Room San Felipe) |
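
To make the distance between observed and synthetic samples concrete, here is a minimal sketch for the one-dimensional case only, where the optimal coupling between two equal-size samples simply matches sorted values, so the cost is dominated by an $n\log n$ sort. The function name `wasserstein_1d` and the sample values are hypothetical; this is not the Hilbert-curve distance or the estimators proposed in the talk.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size 1D samples with uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling pairs the i-th smallest values.
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    total = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted))
    return (total / len(xs)) ** (1.0 / p)

observed = [0.0, 1.0, 3.0]    # hypothetical observed data
synthetic = [2.0, 0.5, 5.0]   # hypothetical draws from a generative model
d = wasserstein_1d(observed, synthetic)
```

A parameter estimate could then be obtained by minimizing such a distance over the model parameters, using fresh synthetic draws for each candidate value.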

10:00 - 10:30 |
Sanvesh Srivastava: Scalable Bayes via Barycenter in Wasserstein Space ↓ As commonplace as big data now are, there are few statistical methods and computational algorithms that are both widely applicable and account for uncertainty in their results. Theories and applications of non-probabilistic approaches account for most of the significant advances in the statistical methodology for big data. A major limitation of non-probabilistic methods is that it is unclear how to account for uncertainty in their results. The Bayesian framework provides a probabilistic approach for analyzing big data and quantifies uncertainty using a probability measure. This flexibility comes at the cost of intractable computations in realistic modeling of massive data sets. Divide-and-conquer based Bayesian methods provide a general approach for tractable computations in massive data sets. These methods first divide the data into smaller subsets, perform computations to estimate a probability measure in parallel across all subsets, and then combine the probability measures from all the subsets to approximate the probability measure computed using the full data. In this talk, I will introduce one such approach that relies on the geometry of probability measures estimated across different subsets and combines them through their barycenter in a Wasserstein space of probability measures. The geometric method has attractive theoretical properties and has superior empirical performance on a large movie ratings database.
This presentation is based on joint work with David Dunson (Department of Statistical Science, Duke University) and Cheng Li (Department of Statistics and Applied Probability, National University of Singapore). (Conference Room San Felipe) |
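
The combination step described in this abstract can be sketched in the simplest setting: for one-dimensional measures, the 2-Wasserstein barycenter is obtained by averaging quantile functions, which for empirical measures with equal sample counts means averaging the sorted draws. The function name `barycenter_1d` and the subset draws below are hypothetical, and this is only a one-dimensional illustration, not the authors' algorithm.

```python
def barycenter_1d(samples, weights=None):
    """W2 barycenter of 1D empirical measures that all have the same number of atoms."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample counts per subset")
    if weights is None:
        weights = [1.0 / k] * k
    sorted_samples = [sorted(s) for s in samples]
    # Average the order statistics (empirical quantiles) across subsets.
    return [sum(w * s[i] for w, s in zip(weights, sorted_samples)) for i in range(n)]

# Hypothetical posterior draws for one parameter from three data subsets.
subset_draws = [[0.9, 1.1, 1.0], [2.0, 1.8, 2.2], [3.1, 2.9, 3.0]]
combined = barycenter_1d(subset_draws)
```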

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 12:00 |
Christian Léonard: Some results about entropic transport ↓ Optimal transport is a powerful tool for proving entropy-entropy production inequalities related to rates of convergence to equilibrium of heat flows. In this talk, an alternate similar approach will be introduced that relies on entropic transport rather than standard optimal transport. The optimal transport problem is replaced by the Schrödinger problem: an entropy minimization problem on the set of probability measures with marginal constraints. The large deviation principle leading to the Schrödinger problem will be introduced and a stochastic proof of the HWI inequality, based on this entropy minimization problem, will be sketched. This will be an opportunity to present several properties of entropic transport. (Conference Room San Felipe) |

12:00 - 13:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

13:00 - 19:00 | Free Afternoon (Oaxaca) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Thursday, May 4 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 10:00 |
Adam Oberman: PDE approach to regularization in deep learning ↓ Deep neural networks have achieved significant success in a number of challenging engineering problems.
There is consensus in the community that some form of smoothing of the loss function is needed, and there have been hundreds of papers and many conferences in the past three years on this topic. However, so far there has been little analysis by mathematicians.
The fundamental tool in training deep neural networks is Stochastic Gradient Descent (SGD) applied to the ``loss'' function, $f(x)$, which is high dimensional and nonconvex.
\begin{equation}\label{SGDintro}\tag{SGD}
dx_t = -\nabla f(x_t) dt + dW_t
\end{equation}
There is a consensus in the field that some form of regularization of the loss function is needed, but so far there has been little progress. This may be in part because smoothing techniques, such as convolution, which are useful in low dimensions, are computationally intractable in the high dimensional setting.
Two recent algorithms have shown promise in this direction. The first, \cite{zhang2015deep}, uses a mean field approach to perform SGD in parallel. The second, \cite{chaudhari2016entropy}, replaces $f$ in \eqref{SGDintro} with $f_\gamma(x)$, the \emph{local entropy} of $f$, which is defined using notions from statistical physics \cite{baldassi2016unreasonable}.
We interpret both algorithms as replacing $f$ with $f_\gamma$, where $f_\gamma = u(x,\gamma)$ and $u$ is the solution of the viscous Hamilton-Jacobi PDE
\[
u_t(x,t) = - \frac 1 2 |\nabla u(x,t)|^2 + \beta^{-1} \Delta u(x,t)
\]
along with $u(x,0) = f(x)$. This interpretation leads to theoretical validation for empirical results.
However, what is needed for \eqref{SGDintro} is $\nabla f_\gamma(x)$. Remarkably, for short times, this vector can be computed efficiently by solving an auxiliary \emph{convex optimization} problem, which has much better convergence properties than the original non-convex problem. Tools from optimal transportation \cite{santambrogio2016euclidean} are used to justify the fast convergence of the solution of the auxiliary problem.
In practice, this algorithm has significantly improved the training time (speed of convergence) for Deep Networks in high dimensions. The algorithm can also be applied to nonconvex minimization problems where \eqref{SGDintro} is used. (Conference Room San Felipe) |
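
The continuous-time SGD model $dx_t = -\nabla f(x_t)\,dt + dW_t$ in this abstract can be simulated with an Euler-Maruyama discretization. The following is a minimal scalar sketch; the function name `sgd_sde_path`, the noise scale `sigma` (equal to 1 in the abstract's equation), the step size, and the quadratic test function are all assumptions made for illustration, not the speaker's algorithm.

```python
import math
import random

def sgd_sde_path(grad_f, x0, dt=0.01, n_steps=1000, sigma=1.0, seed=0):
    """Euler-Maruyama path for dx_t = -grad_f(x_t) dt + sigma dW_t (scalar x)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over a step of length dt is N(0, dt).
        dw = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        x = x - grad_f(x) * dt + sigma * dw
        path.append(x)
    return path

def grad(x):
    # Gradient of the toy loss f(x) = x^2 / 2.
    return x

noiseless = sgd_sde_path(grad, x0=1.0, sigma=0.0)  # plain gradient descent
noisy = sgd_sde_path(grad, x0=1.0, sigma=1.0)      # the SDE as written above
```

Replacing `grad` with the gradient of a smoothed loss such as $f_\gamma$ would give the regularized dynamics the abstract discusses; computing that gradient is the part this sketch does not attempt.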

10:00 - 10:30 |
Rémi Flamary: Joint distribution optimal transportation for domain adaptation ↓ This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function $f$ in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domains $p_s$ and $p_t$. We propose a solution to this problem with optimal transport, which allows us to recover an estimated target $p_t^f=(X,f(X))$ by simultaneously optimizing the optimal coupling and $f$. We show that our method corresponds to the minimization of a generalization bound, and provide an efficient algorithmic solution, for which convergence is proved. The versatility of our approach, both in terms of hypothesis classes
and loss functions, is demonstrated with real-world classification and regression problems, for which we reach or surpass state-of-the-art results.
Joint work with Nicolas Courty, Amaury Habrard, and Alain Rakotomamonjy. (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:30 |
Jianbo Ye: New numerical tools for optimal transport and their machine learning applications ↓ In this talk, I plan to introduce my recent work on two numerical
tools that solve optimal transport and its related problems. The first
tool is a Bregman ADMM approach to solve optimal transport and
Wasserstein barycenter. Based on my empirical experience, I will
discuss its pros and cons in practice, and compare it with the popular
entropic regularization approach. The second tool is a simulated
annealing approach to solve Wasserstein minimization problems, in
which I will illustrate that there exists a simple Markov chain
underpinning the dual OT. This approach gives a very different
approximate solution compared to other smoothing techniques. I will
also discuss how this approach relates to solving some more
recent problems in machine learning, such as Wasserstein NMF and
out-of-sample mapping estimation. Finally, I will present several
applications in document analysis, sequence modeling, and image
analytics, using those tools which I have developed during my PhD
research. (Conference Room San Felipe) |

11:30 - 12:30 |
Jun Kitagawa: On the multi-marginal optimal partial transport and partial barycenter problems ↓ Relatively recently, there has been much activity on two particular generalizations of the classical two-marginal optimal transport problem. The first is the partial transport problem, where the total mass of the two distributions to be coupled may not match, and one is forced to choose submeasures of the constraints for coupling. The other generalization is the multi-marginal transport problem, where there are 3 or more probability distributions to be coupled together in some optimal manner. By combining the above two generalizations we obtain a natural extension: the multi-marginal optimal partial transport problem. In joint work with Brendan Pass (University of Alberta), we have obtained uniqueness of solutions (under hypotheses analogous to the two-marginal partial transport problem given by Figalli) by relating the problem to what we deem the “partial barycenter problem” for finite measures. Interestingly enough, we also observe some significantly different behavior of solutions compared to the two-marginal case. (Conference Room San Felipe) |

12:30 - 13:00 |
Christoph Brune: Combined modelling of optimal transport and segmentation ↓ For studying vascular structures in 4D biomedical imaging, it is of great importance to automatically determine the velocity of flow in video sequences, for example blood flow in vessel networks. In this work, new optimal transport models focusing on direction and segmentation are investigated to find an accurate displacement between two density distributions. By incorporating fluid dynamics constraints, one can obtain a realistic description of the displacement. With an a-priori given segmentation of the network structure, transport models can be improved. However, a segmentation is not always known beforehand. Therefore, in this work a joint segmentation-optimal transport model has been described. Other contributions are the ability of the model to allow for inflow or outflow and the incorporation of anisotropy in the displacement cost. For the problem, a convex variational method has been used and primal-dual proximal splitting algorithms have been implemented. Existence of a solution of the model has been proved. The framework has been applied to synthetic vascular structures and real data, obtained from a collaboration with the hospital in Cambridge. This is joint work with Yoeri Boink and Carola Schönlieb. (Conference Room San Felipe) |

13:00 - 13:30 |
Peyman Mohajerin Esfahani: Data-driven Optimization Using the Wasserstein Metric: Performance Guarantees and Tractable Reformulations ↓ We consider stochastic programs where the distribution of the uncertain parameters is only observable through a finite training dataset. Using the Wasserstein metric, we construct a ball in the space of probability distributions centered at the uniform distribution on the training samples, and we seek decisions that perform best in view of the worst-case distribution within this Wasserstein ball. The state-of-the-art methods for solving the resulting distributionally robust optimization (DRO) problems rely on global optimization techniques, which quickly become computationally excruciating. In this talk we demonstrate that, under mild assumptions, the DRO problems over Wasserstein balls can in fact be reformulated as finite convex programs---in many interesting cases even as tractable linear programs. We further discuss performance guarantees as well as connections to the celebrated regularization techniques in the Machine Learning literature. This talk is based on joint work with Daniel Kuhn (EPFL). (Conference Room San Felipe) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

15:00 - 16:00 |
Robert McCann: On Concavity of the Monopolist's Problem Facing Consumers with Nonlinear Price Preferences ↓ The principal-agent problem is an important paradigm in economic theory
for studying the value of private information; the nonlinear pricing problem
faced by a monopolist is a particular example. In this lecture, we identify
structural conditions on the consumers' preferences and the monopolist's
profit functions which guarantee either concavity or convexity of the monopolist's
profit maximization. Uniqueness and stability of the solution are particular consequences
of this concavity. Our conditions are similar to (but simpler than) criteria given by Trudinger and others
for prescribed Jacobian equations to have smooth solutions.
By allowing for different dimensions of agents and contracts, nonlinear dependence of agent preferences on prices,
and of the monopolist's profits on agent identities, we improve on the literature in a number of ways.
The same mathematics can also be adapted to the maximization of societal welfare by a regulated monopoly.
This is joint work with PhD student Shuangjian Zhang. (Conference Room San Felipe) |

16:00 - 16:30 | Coffee Break (Conference Room San Felipe) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Friday, May 5 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 11:00 | Free discussions (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |