# Schedule for: 24w5191 - PDE Methods in Machine Learning: from Continuum Dynamics to Algorithms

Beginning on Sunday, June 9, and ending on Friday, June 14, 2024

All times are in Granada, Spain time, CEST (UTC+2).

Sunday, June 9 | |
---|---|

16:00 - 17:30 | Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Hotel Tent Granada) |

Monday, June 10 | |
---|---|

07:00 - 09:00 | Breakfast (Restaurant - Hotel Tent Granada) |

09:00 - 09:30 | Introduction and Welcome by IMAG Staff (Main Meeting Room - Calle Rector López Argüeta) |

09:30 - 10:00 |
Sinho Chewi: Variational inference via Wasserstein gradient flows
Variational inference (VI), which seeks to approximate the Bayesian posterior by a more tractable distribution within a variational family, has been widely advocated as a scalable alternative to MCMC. However, obtaining non-asymptotic convergence guarantees has been a longstanding challenge. In this talk, I will argue that viewing this problem as optimization over the Wasserstein space of probability measures equipped with the optimal transport metric leads to the design of principled algorithms which exhibit strong practical performance and are backed by rigorous theory. In particular, we address Gaussian VI, as well as (non-parametric) mean-field VI. (Main Meeting Room - Calle Rector López Argüeta) |
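
For a concrete picture of Gaussian VI as optimization over the Wasserstein space, here is a minimal sketch of one common form of Bures-Wasserstein gradient descent. It assumes a quadratic potential V(x) = 0.5 (x - mu)^T A (x - mu), so that the expectations of grad V and hess V under the current Gaussian N(m, S) are available in closed form; for a general target they would have to be estimated, e.g. by Monte Carlo. The function name and step size are illustrative assumptions, not the algorithm presented in the talk.

```python
import numpy as np


def bures_wasserstein_gd(A, mu, m0, S0, step=0.1, n_steps=500):
    """Hypothetical sketch of Bures-Wasserstein gradient descent for Gaussian VI.

    Target: pi(x) proportional to exp(-V(x)) with the ASSUMED quadratic potential
    V(x) = 0.5 * (x - mu)^T A (x - mu). Under q = N(m, S) this gives
    E_q[grad V] = A (m - mu) and E_q[hess V] = A in closed form.
    """
    m = np.asarray(m0, dtype=float)
    S = np.asarray(S0, dtype=float)
    I = np.eye(m.size)
    for _ in range(n_steps):
        grad_mean = A @ (m - mu)        # E_q[grad V(X)]
        M = A - np.linalg.inv(S)        # E_q[hess V(X)] - S^{-1}
        m = m - step * grad_mean        # gradient step on the mean
        T = I - step * M                # symmetric linear map
        S = T @ S @ T                   # covariance step: S <- T S T
        S = 0.5 * (S + S.T)             # guard against round-off asymmetry
    return m, S


A = np.array([[2.0, 0.5], [0.5, 1.0]])
mu = np.array([1.0, -1.0])
m, S = bures_wasserstein_gd(A, mu, m0=np.zeros(2), S0=4.0 * np.eye(2))
print(m)        # close to mu
print(S @ A)    # close to the identity, i.e. S close to A^{-1}
```

Because the target here is exactly Gaussian, N(mu, A^{-1}), the iterates should converge to it. The update S <- T S T with T symmetric and invertible is a congruence, so S stays symmetric positive definite.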

10:10 - 10:40 |
Nicolas Garcia Trillos: FedCBO: Reaching Group Consensus in Clustered Federated Learning and Robustness to Backdoor Adversarial Attacks
Federated learning is an important framework in modern machine learning that seeks to integrate the training of learning models from multiple users, each with their own local data set, in a way that respects the users’ data privacy and communication cost constraints. In clustered federated learning, one assumes an additional unknown group structure among users, and the goal is to train models that are useful for each group, rather than a single global model for all users.
In the first part of this talk, I will present a novel solution to the problem of clustered federated learning that is inspired by ideas from consensus-based optimization (CBO). Our new CBO-type method is based on a system of interacting particles that is oblivious to group memberships. The algorithm is supported by theoretical justification and tested in experiments on real data.
I will then discuss an additional concern in federated learning: the vulnerability of federated learning protocols to “backdoor” adversarial attacks. This discussion will motivate the introduction of a second, improved particle system with enhanced robustness properties that, at an abstract level, can be interpreted as a bi-level optimization algorithm based on interacting particle dynamics. This talk is based on joint works with Sixu Li, Yuhua Zhu, Konstantin Riedl, and Jose Carrillo. (Main Meeting Room - Calle Rector López Argüeta) |
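
As a rough illustration of the consensus-based optimization (CBO) ingredient in this abstract, the sketch below implements plain CBO with component-wise noise. It is not FedCBO: there are no users, clusters, or communication, and the parameter values (alpha, lam, sigma, dt) are arbitrary assumptions chosen for a toy quadratic objective.

```python
import numpy as np


def consensus_point(X, f, alpha):
    """Gibbs-weighted average of the particles X (shape (N, d))."""
    fx = f(X)                                   # objective at each particle, shape (N,)
    w = np.exp(-alpha * (fx - fx.min()))        # shift by the min for numerical stability
    return w @ X / w.sum()


def cbo_minimize(f, X0, alpha=1000.0, lam=1.0, sigma=0.7, dt=0.01, n_steps=2000, seed=0):
    """Plain consensus-based optimization with component-wise (anisotropic) noise.

    Euler-Maruyama discretization of
        dX_i = -lam (X_i - v) dt + sigma * diag(X_i - v) dW_i,
    where v is the consensus point. f must map an (N, d) array to an (N,) array.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X0, dtype=float)
    for _ in range(n_steps):
        v = consensus_point(X, f, alpha)
        diff = X - v
        noise = rng.standard_normal(X.shape)
        X = X - lam * dt * diff + sigma * np.sqrt(dt) * diff * noise
    return consensus_point(X, f, alpha)


target = np.array([1.0, -2.0])
f = lambda X: np.sum((X - target) ** 2, axis=1)
X0 = np.random.default_rng(1).uniform(-3.0, 3.0, size=(100, 2))
v = cbo_minimize(f, X0)
print(v)   # expected to lie near target
```

The consensus point is a Gibbs-weighted average, so for large alpha it approaches the best particle; the noise is proportional to each particle's distance from the consensus point, so exploration fades as the ensemble collapses.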

10:40 - 11:10 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

11:10 - 12:00 |
Jianfeng Lu: Vision Lecture: Convergence analysis of classical and quantum dynamics via hypocoercivity
In this talk we will review some recent developments in the framework of hypocoercivity for obtaining quantitative convergence estimates for classical and quantum dynamics, with a focus on underdamped Langevin dynamics for sampling and Lindblad dynamics for open quantum systems. (Main Meeting Room - Calle Rector López Argüeta) |

12:10 - 12:40 |
Jonathan Niles-Weed: Central limit theorems for smooth optimal transport maps (tentative title)
Abstract TBD. (Main Meeting Room - Calle Rector López Argüeta) |

12:50 - 13:20 | Anna Korba (Main Meeting Room - Calle Rector López Argüeta) |

13:30 - 15:00 | Lunch (Restaurant - Hotel Tent Granada) |

15:00 - 15:30 | Quentin Mérigot (Main Meeting Room - Calle Rector López Argüeta) |

15:40 - 16:10 | Boris Hanin (Main Meeting Room - Calle Rector López Argüeta) |

16:10 - 16:40 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

20:00 - 21:30 | Dinner (Restaurant - Hotel Tent Granada) |

Tuesday, June 11 | |
---|---|

07:00 - 09:00 | Breakfast (Restaurant - Hotel Tent Granada) |

09:30 - 10:20 |
Gabriele Steidl: Vision Lecture: Wasserstein Gradient Flows and Generative Models for Posterior Sampling in Inverse Problems
This talk is concerned with inverse problems in imaging from a Bayesian point of view, i.e., we want to sample from the posterior given noisy measurements. We tackle the problem by studying gradient flows of particles in high dimensions. More precisely, we analyze Wasserstein gradient flows of maximum mean discrepancies defined with respect to different kernels, including non-smooth ones. In high dimensions, we propose an efficient computation of these flows via the Radon transform (slicing) and subsequent sorting. Special attention is paid to non-smooth Riesz kernels, whose Wasserstein gradient flows have a rich structure. Finally, we approximate our particle flows by conditional generative neural networks and apply them to conditional image generation and to inverse problems in image restoration such as computerized tomography. (Main Meeting Room - Calle Rector López Argüeta) |
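
The slicing-and-sorting idea can be illustrated on the MMD with the Riesz kernel K(x, y) = -|x - y| (exponent 1), which in one dimension can be evaluated in O(n log n) time after sorting. This is an assumption-laden sketch, not the authors' code: it only evaluates the squared discrepancy (no gradient flow), and the sliced version simply averages the one-dimensional value over random unit directions, without the dimension-dependent constant that relates it to the full d-dimensional MMD.

```python
import numpy as np


def pairwise_abs_sum(z):
    """Sum of |z_a - z_b| over all ordered pairs (a, b), computed in O(n log n) by sorting.

    For sorted z_1 <= ... <= z_n, sum_{a<b} (z_b - z_a) = sum_k (2k - n - 1) z_k,
    and the ordered-pair sum is twice that.
    """
    z = np.sort(np.asarray(z, dtype=float))
    n = z.size
    coeff = 2.0 * np.arange(1, n + 1) - n - 1
    return 2.0 * float(np.dot(coeff, z))


def riesz_mmd2_1d(x, y):
    """Squared MMD for K(s, t) = -|s - t| between two 1D empirical measures."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx, sy = pairwise_abs_sum(x), pairwise_abs_sum(y)
    # Pooled sample: S(x and y together) = S(x) + S(y) + 2 * (cross sum).
    cross = 0.5 * (pairwise_abs_sum(np.concatenate([x, y])) - sx - sy)
    n, m = x.size, y.size
    return 2.0 * cross / (n * m) - sx / n**2 - sy / m**2


def sliced_riesz_mmd2(X, Y, n_proj=200, seed=0):
    """Average of the 1D value over random unit directions (slices); X, Y have shape (n, d)."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return float(np.mean([riesz_mmd2_1d(X @ u, Y @ u) for u in dirs]))


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = rng.standard_normal((400, 3)) + 1.0
print(sliced_riesz_mmd2(X, Y))
```

The cross term is recovered from the pooled sample, so every sum is a single sort plus a dot product instead of an O(nm) double loop.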

10:30 - 11:00 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

11:00 - 11:30 |
Davide Carbone: Generative models as out-of-equilibrium particle systems: training of Energy-Based Models using Non-Equilibrium Thermodynamics
In recent years, generative diffusion models (GDMs) have emerged as a powerful class of models for generating high-quality data across various domains, including scientific machine learning. These models operate by gradually transforming a simple, tractable distribution into a complex data distribution through a series of diffusion steps. In this talk I will first summarize the close relation between physics and GDMs. Then I will present some results on energy-based models (EBMs), which have gained significant attention due to their ability to model complex data distributions and to provide an interpretation of energy landscapes. I will show how it is possible to leverage tools from nonequilibrium statistical physics to improve the training of EBMs, which is usually performed via contrastive divergence or modifications thereof. (Main Meeting Room - Calle Rector López Argüeta) |

11:40 - 12:10 | Loucas Pillaud-Vivien (Main Meeting Room - Calle Rector López Argüeta) |

12:20 - 12:50 |
Franca Hoffmann: Dynamics of Strategic Agents and Algorithms as PDEs
We propose a PDE framework for modeling the distribution shift of a strategic population interacting with a learning algorithm. We consider two particular settings: one where the objectives of the algorithm and the population are aligned, and one where the algorithm and the population have opposite goals. We present convergence analysis for both settings, including three different timescales for the dynamics with opposing goals. We illustrate how our framework can accurately model real-world data and show via synthetic examples how it captures sophisticated distribution changes that cannot be modeled with simpler methods. (Main Meeting Room - Calle Rector López Argüeta) |

13:00 - 13:15 | Group Photo (Main Meeting Room - Calle Rector López Argüeta) |

13:30 - 15:00 | Lunch (Restaurant - Hotel Tent Granada) |

15:00 - 15:30 |
Oliver Tse: Variational Acceleration Methods in the Space of Probability Measures
The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance. While much attention has been directed towards optimization in Euclidean space, the need to optimize over spaces of probability measures, which is prevalent in machine learning, for example, has motivated the exploration of accelerated gradient methods in this context.
In this talk, I will give a brief overview of variational acceleration methods in Euclidean space and describe one way of lifting these methods to the space of probability measures. I will then discuss their convergence rates under suitable assumptions on the functional to be minimized. (Main Meeting Room - Calle Rector López Argüeta) |

15:40 - 16:10 |
Marco Mondelli: Two Vignettes on PDE Methods for Deep Learning: Implicit Bias of Gradient Descent, and Score-Based Generative Models
In the spirit of the workshop title, I will present two vignettes in which PDE methods prove useful for the analysis of deep learning models. The first vignette comes from mean-field theory. By connecting the gradient descent dynamics to the solution of a PDE, we show a bias of two-layer ReLU networks towards a simple solution: for a univariate regression problem, the network implements at convergence a piecewise linear map of the inputs, and the number of "knot" points (i.e., points where the slope of the ReLU network estimator changes) between two consecutive training inputs is at most three. The second vignette comes from score-based generative models (SGMs), a powerful tool for sampling from complex data distributions. Their underlying idea is to (i) run a forward process for time T by adding noise to the data, (ii) estimate its score function, and (iii) use this estimate to run a reverse process. However, the existing analysis paradigm requires a diverging T, which is problematic in terms of stability, computational cost, and error propagation. We address this issue by providing convergence guarantees for a version of the popular predictor-corrector scheme: after running the forward process, we first estimate the final distribution via an inexact Langevin dynamics and then revert the process. (Main Meeting Room - Calle Rector López Argüeta) |

16:10 - 16:40 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

20:00 - 21:30 | Dinner (Restaurant - Hotel Tent Granada) |

Wednesday, June 12 | |
---|---|

07:00 - 09:00 | Breakfast (Restaurant - Hotel Tent Granada) |

09:30 - 10:00 |
Jose A. Carrillo: The Ensemble Kalman Filter in the Near-Gaussian Setting
We provide analysis for the accuracy of the ensemble Kalman filter for problems where the filtering distribution is non-Gaussian but can be characterized as close to Gaussian after appropriate lifting to the joint space of state and data. The ensemble Kalman filter is widely used in applications because, for high-dimensional filtering problems, it has a robustness not shared by, for example, the particle filter; in particular, it does not suffer from weight collapse. However, there is no theory that quantifies its accuracy as an approximation of the true filtering distribution, except in the Gaussian setting. We use the mean-field description to address this issue. Our results rely on stability estimates that can be obtained by rewriting the mean-field ensemble Kalman filter in terms of maps on probability measures, and then introducing a weighted total variation metric in which these maps are locally Lipschitz. This is joint work with F. Hoffmann, A. Stuart and U. Vaes. (Main Meeting Room - Calle Rector López Argüeta) |
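
To make the object of study concrete, here is a minimal sketch of one standard EnKF variant: the stochastic (perturbed-observation) analysis step with a linear observation operator. The abstract analyses a mean-field description rather than finite-ensemble code like this; the function name and the toy linear-Gaussian example are assumptions. In that Gaussian case, the analysis ensemble should reproduce the exact Kalman update up to Monte Carlo error.

```python
import numpy as np


def enkf_analysis(X, H, y, R, rng):
    """One stochastic (perturbed-observation) EnKF analysis step.

    X: (J, d) forecast ensemble, H: (k, d) linear observation operator,
    y: (k,) observation, R: (k, k) observation-noise covariance.
    """
    J = X.shape[0]
    C = np.atleast_2d(np.cov(X, rowvar=False))    # empirical forecast covariance
    S = H @ C @ H.T + R                           # innovation covariance
    K = np.linalg.solve(S, H @ C).T               # Kalman gain C H^T S^{-1}
    eta = rng.multivariate_normal(np.zeros(len(y)), R, size=J)
    innovations = (y + eta) - X @ H.T             # perturbed observations minus predictions
    return X + innovations @ K.T


rng = np.random.default_rng(1)
m0 = np.array([0.0, 1.0])
P0 = np.array([[1.0, 0.3], [0.3, 0.5]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.2]])
y = np.array([0.8])
X = rng.multivariate_normal(m0, P0, size=20000)
Xa = enkf_analysis(X, H, y, R, rng)

# Exact Kalman update for comparison (Gaussian prior, linear observation).
K_exact = P0 @ H.T @ np.linalg.inv(H @ P0 @ H.T + R)
m_exact = m0 + K_exact @ (y - H @ m0)
P_exact = (np.eye(2) - K_exact @ H) @ P0
print(Xa.mean(axis=0), m_exact)
```

Each member is moved individually, without importance weights, which is the structural reason the method avoids the weight collapse mentioned in the abstract.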

10:10 - 10:40 |
Stephan Wojtowytsch: Accelerated gradient descent for the Modica-Mortola energy
Accelerated methods have primarily been analyzed for convex optimization problems. We will discuss their application to non-convex optimization, in particular to the Modica-Mortola (or Cahn-Hilliard or Ginzburg-Landau) approximation of the perimeter functional. (Main Meeting Room - Calle Rector López Argüeta) |
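
To illustrate the setting, the sketch below applies a Nesterov-type accelerated gradient method to a one-dimensional finite-difference discretization of the Modica-Mortola energy E(u) = integral of (eps/2)|u'|^2 + W(u)/eps. The double well W(u) = (1 - u^2)^2 / 4, the grid, the Neumann (no-flux) boundary treatment, the momentum schedule k/(k + 3), and the step size are all assumptions made for illustration, not the scheme analysed in the talk. Since the energy is non-convex, classical acceleration guarantees do not apply directly.

```python
import numpy as np


def mm_energy(u, eps, dx):
    """Discrete 1D Modica-Mortola energy with ASSUMED double well W(u) = (1 - u^2)^2 / 4."""
    gradient_part = 0.5 * eps * np.sum((np.diff(u) / dx) ** 2) * dx
    well_part = np.sum(0.25 * (1.0 - u**2) ** 2) * dx / eps
    return gradient_part + well_part


def mm_gradient(u, eps, dx):
    """Exact gradient of mm_energy with respect to the grid values u (Neumann ends)."""
    flux = np.diff(u)
    neg_second_diff = np.concatenate(([0.0], flux)) - np.concatenate((flux, [0.0]))
    return eps / dx * neg_second_diff + dx / eps * (u**3 - u)


def nesterov(u0, eps, dx, step, n_steps):
    """Hypothetical helper: Nesterov-style accelerated gradient with momentum k / (k + 3)."""
    u_prev = u0.copy()
    u = u0.copy()
    for k in range(n_steps):
        y = u + k / (k + 3) * (u - u_prev)            # extrapolation (momentum) step
        u_prev, u = u, y - step * mm_gradient(y, eps, dx)
    return u


n, eps = 100, 0.05
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
u0 = 0.5 * np.sin(3 * np.pi * x)
step = 0.5 / (4 * eps / dx + 2 * dx / eps)   # half of a rough inverse Lipschitz bound
u = nesterov(u0, eps, dx, step, n_steps=3000)
print(mm_energy(u0, eps, dx), mm_energy(u, eps, dx))
```

Starting from a smooth profile with two sign changes, the iterates should relax toward a two-interface state whose energy is roughly twice the interfacial cost, the integral of sqrt(2 W(u)) over [-1, 1], i.e. about 2 * (4/3) / sqrt(2), or 1.89. Setting the momentum coefficient to zero recovers plain gradient descent for comparison.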

10:40 - 11:10 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

11:10 - 11:40 |
Theodor Misiakiewicz: On the complexity of learning under group invariance
In this talk, we are interested in connecting learning with gradient descent to the computational complexity of general classes of learning algorithms. We present some early attempts in that direction. We introduce differentiable learning queries (DLQ) as a subclass of statistical query algorithms, and consider learning the orbit of a distribution under a symmetry group. In this setting, we derive sharp upper and lower bounds on the computational complexity of DLQ in terms of a “leap complexity”. We then illustrate how these results offer some insight into the training dynamics of neural networks in the mean-field regime. (Main Meeting Room - Calle Rector López Argüeta) |

11:50 - 12:20 |
Anna Little: Dimension reduction with path metrics: theory, applications, and challenges
This talk introduces a geometric framework for both unsupervised and supervised dimension reduction. The main tool is the power-weighted path metric, which can simultaneously denoise high-dimensional data and preserve intrinsic geometric structure. In the unsupervised context, a new geometry is obtained by shrinking data in directions of high data density. Theoretical guarantees include convergence of graph Laplacian operators constructed with this random metric, and applications include the analysis of single-cell RNA data and multi-manifold clustering. In the supervised context, a new geometry is obtained by elongating data in the direction of a label gradient. Computing geodesics under this new geometry produces data visualizations that achieve noise reduction while preserving geometric structure. Furthermore, integration into Laplacian learning algorithms enables accurate prediction of complex, nonlinear functions from sparse label information. (Main Meeting Room - Calle Rector López Argüeta) |
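
A minimal sketch of a power-weighted path metric, assuming the common definition in which each edge of a path costs |x_i - x_j|^p and the distance is the p-th root of the smallest total cost over paths. For p > 1, long jumps are penalized, so shortest paths hop through dense regions, which is one way to see the density-dependent "shrinking" described in the abstract. Using the complete graph with a dense Floyd-Warshall update is a simplifying assumption; practical implementations typically restrict to a k-nearest-neighbour graph.

```python
import numpy as np


def power_weighted_path_metric(X, p=2.0):
    """Power-weighted shortest-path distance on the complete graph over the rows of X.

    d_p(x_i, x_j) = min over paths i = k0, ..., kL = j of (sum_l |x_{k_l} - x_{k_{l+1}}|^p)^(1/p).
    Computed with a vectorized Floyd-Warshall pass, O(n^3) time.
    """
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) ** p   # edge costs |x_i - x_j|^p
    for k in range(D.shape[0]):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])               # allow paths through node k
    return D ** (1.0 / p)


X = np.array([[0.0], [1.0], [2.0]])
print(power_weighted_path_metric(X, p=2.0)[0, 2])   # sqrt(1 + 1), about 1.414, shorter than 2
```

With p = 1 the metric reduces to the ordinary Euclidean distance (the direct edge is always a shortest path), while larger p increasingly rewards paths with many short steps.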

12:30 - 13:30 | Open problem session (Main Meeting Room - Calle Rector López Argüeta) |

13:30 - 15:00 | Lunch (Restaurant - Hotel Tent Granada) |

15:00 - 20:00 | Free Afternoon (Other (See Description)) |

20:30 - 22:00 |
Social Dinner
The social dinner will take place in the gardens of Carmen de la Victoria (part of the University of Granada). (Other (See Description)) |

Thursday, June 13 | |
---|---|

07:00 - 09:00 | Breakfast (Restaurant - Hotel Tent Granada) |

09:30 - 10:00 |
Maria Bruna: Macroscopic limits of systems of strongly interacting particles: from simple exclusion processes to interacting Brownian motions
I will address the macroscopic dynamics of strongly interacting particles, which differ significantly from those of weakly (mean-field) or moderately interacting particles, especially in scenarios involving multiple species. These models are essential for understanding many-particle systems in fields such as biology and industrial applications, where considerations like particle size (steric or excluded-volume effects) are paramount. While in simple cases, such as identical particles, the limit equations for the strongly interacting regime coincide with those obtained by localising the mean-field limit, this correspondence breaks down for non-gradient type models, notably multispecies systems. I will demonstrate this divergence through two illustrative models: a discrete simple exclusion process, for which rigorous hydrodynamic limits are available in particular cases, and continuous interacting Brownian motions, for which only approximate results exist. (Main Meeting Room - Calle Rector López Argüeta) |

10:10 - 10:40 |
Matt Jacobs: Lagrangian solutions to PME and score matching
There is a large body of recent work on the approximation of diffusion equations by particle systems. Most of this analysis approaches the problem from a stochastic perspective, owing to the difficulty of studying deterministic particle trajectories. This is largely because, in the continuous setting, it can be extremely hard to solve diffusion equations in Lagrangian coordinates. In fact, the existence of Lagrangian solutions to the porous medium equation (PME) with general initial data was open until 2022. In this talk, I will discuss how to construct Lagrangian solutions to PME. I will then sketch how this analysis can be used to obtain convergence rates for certain deterministic versions of the score matching algorithm. (Main Meeting Room - Calle Rector López Argüeta) |

10:40 - 11:10 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

11:10 - 11:40 |
Zhenjie Ren: Approximations to the long time limit of Mean-field Langevin diffusion
The training of two-layer neural networks can be viewed as a convex mean-field optimization problem with entropy regularization, whose minimizing distribution can be characterized as the invariant measure of the mean-field Langevin diffusion. In this talk, we will explore recent progress on different methods to approximate this target measure. These include uniform-in-time propagation of chaos results for the particle system and the ergodicity of the self-interacting diffusion. (Main Meeting Room - Calle Rector López Argüeta) |

11:45 - 12:15 |
Claudia Totzeck: Consensus-based optimization: multi-objective problems & gradient inference
We discuss how ensemble-based gradient inference can improve the performance of particle methods for global optimization tasks. We will focus especially on consensus-based optimization (CBO) and sampling. Moreover, we discuss an extension of CBO to approximate the Pareto front of multi-objective problems. Joint work with Kathrin Klamroth and Michael Stiglmayr, and with Philipp Wacker and Claudia Schillings. (Main Meeting Room - Calle Rector López Argüeta) |

12:20 - 12:50 |
Yulong Lu: On the generalization of diffusion models in high dimensions
Diffusion models, particularly score-based generative models (SGMs), have emerged as powerful tools in diverse machine learning applications, spanning from computer vision to modern language processing. In this talk, I will discuss the generalization theory of SGMs for learning high-dimensional distributions. Our analysis shows that SGMs achieve a dimension-free generation error bound when applied to a class of sub-Gaussian distributions characterized by certain low-complexity structures. (Main Meeting Room - Calle Rector López Argüeta) |

12:55 - 13:25 |
Lukasz Szpruch: Fisher–Rao gradient flow for entropy regularised MDPs in Polish spaces
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action spaces. The flow is a continuous-time analogue of a policy mirror descent method. We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy. Moreover, we prove that the flow is stable with respect to gradient evaluation, offering insights into the performance of a natural policy gradient flow with log-linear policy parameterisation. To overcome challenges stemming from the lack of convexity of the objective function and the discontinuity arising from the entropy regulariser, we leverage the performance difference lemma and the duality relationship between the gradient and mirror descent flows. (Main Meeting Room - Calle Rector López Argüeta) |

13:30 - 15:00 | Lunch (Restaurant - Hotel Tent Granada) |

15:00 - 16:00 | Open problems working groups (Main Meeting Room - Calle Rector López Argüeta) |

16:00 - 16:30 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

20:00 - 21:30 | Dinner (Restaurant - Hotel Tent Granada) |

Friday, June 14 | |
---|---|

07:00 - 09:00 | Breakfast (Restaurant - Hotel Tent Granada) |

09:30 - 10:00 |
Daniel Sanz Alonso: Structured Covariance Operator Estimation
Covariance operator estimation is a fundamental task underpinning many algorithms for data assimilation, inverse problems, and machine learning. This talk introduces a notion of sparsity for infinite-dimensional covariance operators and a family of thresholded estimators that exploit it. In a small lengthscale regime, we show that thresholded estimators achieve an exponential improvement in sample complexity over the standard sample covariance estimator. Our analysis explains the importance of using covariance localization techniques in ensemble Kalman methods for data assimilation and inverse problems. (Main Meeting Room - Calle Rector López Argüeta) |
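
As a finite-dimensional illustration of a thresholded estimator, the sketch below hard-thresholds the sample covariance of a discretized random field with a short correlation length. The exponential kernel, grid size, sample size, and threshold level sqrt(log d / N) (a conventional choice in the covariance-thresholding literature) are assumptions; the talk's infinite-dimensional estimators and threshold rule may be defined differently.

```python
import numpy as np


def thresholded_covariance(U, rho):
    """Hard-thresholded sample covariance (illustrative sketch).

    U: (N, d) array holding N samples of a random field discretized at d points.
    Entries of the sample covariance with absolute value below rho are set to zero.
    """
    C = np.cov(U, rowvar=False)
    return np.where(np.abs(C) >= rho, C, 0.0)


# Toy field with a short correlation length (assumed exponential kernel).
rng = np.random.default_rng(0)
d, N, ell = 200, 50, 0.02
x = np.linspace(0.0, 1.0, d)
C_true = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)
U = rng.standard_normal((N, d)) @ np.linalg.cholesky(C_true).T   # rows ~ N(0, C_true)
rho = np.sqrt(np.log(d) / N)                                      # assumed threshold level
err_sample = np.linalg.norm(np.cov(U, rowvar=False) - C_true)    # Frobenius norm
err_thresh = np.linalg.norm(thresholded_covariance(U, rho) - C_true)
print(err_sample, err_thresh)
```

With these settings the thresholded estimator should have a noticeably smaller error than the raw sample covariance: most far-off-diagonal entries of C_true are negligible, while their sample estimates fluctuate at the level 1/sqrt(N).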

10:10 - 10:40 |
Cristina Cipriani: An optimal control perspective on Neural ODEs and adversarial training
Neural ODEs are a special type of neural network that allows deep neural networks to be interpreted as discretizations of control systems. This perspective makes it possible to employ powerful tools from control theory to understand and advance machine learning. Specifically, the training of Neural ODEs can be viewed as an optimal control problem, which suggests numerical approaches inspired by this control-oriented viewpoint.
In this talk, we consider the mean-field formulation of the problem and derive first-order optimality conditions in the form of a mean-field Pontryagin Maximum Principle, which we apply to several numerical examples. Moreover, we extend this perspective to the adversarial training of Neural ODEs, which is a way to enforce reliable and stable outcomes in neural networks. We formalize adversarial training with perturbed data as a minimax optimal control problem and derive first-order optimality conditions in the form of Pontryagin’s Maximum Principle. Finally, we provide a novel interpretation of robust training leading to an alternative weighted technique, which we test on a low-dimensional classification task. (Main Meeting Room - Calle Rector López Argüeta) |

10:40 - 11:10 | Coffee Break (Main Meeting Room - Calle Rector López Argüeta) |

10:40 - 11:10 | Checkout by 11AM (Front Desk - Hotel Tent Granada) |

11:10 - 11:40 |
Lisa Kreusser: Fokker-Planck equations for score-based diffusion models
Generative models have become very popular in the machine learning community over the last few years. They are generally likelihood-based models (e.g. variational autoencoders), implicit models (e.g. generative adversarial networks), or score-based models. In this talk, I will provide insights into our recent research in this field, focusing on score-based diffusion models. Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, owing to their state-of-the-art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, link it to an associated Fokker–Planck equation, and provide a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker–Planck residual. We also show numerically that reducing the Fokker–Planck residual, by adding it as an additional regularisation term, narrows the gap between the ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, although this can come at the cost of degraded SDE sample quality. (Main Meeting Room - Calle Rector López Argüeta) |

11:45 - 12:15 |
Jaume de Dios Pont: Complexity lower bounds for log-concave sampling
Given a density rho(x), how does one efficiently generate samples from a random variable with density rho? Variations of this question arise in most computational fields, and significant effort has been devoted to designing ever more efficient algorithms, ranging from relatively simple ones to increasingly sophisticated Langevin-based or diffusion-based methods.
This talk will focus on the model case in which the log-density is a smooth, strongly concave function. We will discuss some of the most widely used algorithms and study the fundamental limitations of the problem by finding universal complexity bounds that no algorithm can beat.
Based on joint work with Sinho Chewi, Jerry Li, Chen Lu and Shyam Narayanan. (Main Meeting Room - Calle Rector López Argüeta) |
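
As a reference point for the simplest Langevin-based method, here is a sketch of the unadjusted Langevin algorithm (ULA) for a target proportional to exp(-U) with U smooth and strongly convex, i.e. a strongly log-concave density. It is a generic textbook scheme, not one of the algorithms or lower-bound constructions from the talk; the function name and parameters are illustrative.

```python
import numpy as np


def ula(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm for a target density proportional to exp(-U(x)).

    x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2 * step) * xi_k,  xi_k ~ N(0, I).
    Returns the whole trajectory, shape (n_steps, d).
    """
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        traj[k] = x
    return traj


rng = np.random.default_rng(0)
h = 0.1
# Standard Gaussian target: U(x) = |x|^2 / 2, so grad_U(x) = x. Discard a burn-in.
samples = ula(lambda x: x, x0=np.zeros(2), step=h, n_steps=50_000, rng=rng)[1_000:]
print(samples.mean(axis=0), samples.var(axis=0))
```

For this Gaussian target each coordinate follows x <- (1 - h) x + sqrt(2h) xi, whose stationary variance solves v = (1 - h)^2 v + 2h, i.e. v = 1 / (1 - h/2), about 1.053 for h = 0.1 rather than 1. This bias shrinks with the step size at the price of more gradient evaluations, the kind of trade-off that complexity bounds quantify.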

12:20 - 12:50 |
Raphaël Barboni: Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
We study the convergence of gradient flow for the training of deep neural networks. While Residual Neural Networks (ResNets) are a popular example of very deep architectures, their training constitutes a challenging optimization problem, notably due to the non-convexity and the non-coercivity of the objective. Yet, in applications, these tasks are successfully solved by simple optimization algorithms such as gradient descent. To better understand this phenomenon, we focus here on a "mean-field" model of infinitely deep and arbitrarily wide ResNets, parameterized by probability measures over the product set of layers and parameters, with constant marginal on the set of layers. Indeed, in the case of shallow neural networks, mean-field models have been shown to benefit from simplified loss landscapes and good theoretical guarantees when trained with gradient flow for the Wasserstein metric on the set of probability measures. Motivated by this approach, we propose to train our model with gradient flow with respect to the conditional Optimal Transport distance: a restriction of the classical Wasserstein distance which enforces our marginal condition. Relying on the theory of gradient flows in metric spaces, we first show the well-posedness of the gradient flow equation and its consistency with the training of ResNets at finite width. Performing a local Polyak-Łojasiewicz analysis, we then show convergence of the gradient flow for well-chosen initializations: if the number of features is finite but sufficiently large and the risk is sufficiently small at initialization, the gradient flow converges towards a global minimizer. This is the first result of this type for infinitely deep and arbitrarily wide ResNets. (Main Meeting Room - Calle Rector López Argüeta) |

12:55 - 13:25 |
Nikola Kovachki: Function Space Diffusion for Video Modeling
We present a generalization of score-based diffusion models to function space by perturbing functional data via a Gaussian process at multiple scales. We obtain an appropriate notion of score by defining densities with respect to Gaussian measures and generalize denoising score matching. We then define the generative process by integrating function-valued Langevin dynamics. We show that the corresponding discretized algorithm generates samples at a fixed cost that is independent of the data discretization. As an application of such a model, we formulate video generation as a sequence of joint inpainting and interpolation problems defined by frame deformations. We train an image diffusion model using Gaussian process inputs and use it to solve the video generation problem by enforcing equivariance with respect to frame deformations. Our results are state-of-the-art for video generation using models trained only on image data. (Main Meeting Room - Calle Rector López Argüeta) |

13:30 - 15:00 | Lunch (Restaurant - Hotel Tent Granada) |