Schedule for: 25w5382 - Efficient and Reliable Deep Learning Methods and their Scientific Applications

Beginning on Sunday, June 22 and ending on Friday, June 27, 2025

All times in Banff, Alberta time, MDT (UTC-6).

Sunday, June 22
16:00 - 17:30 Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
20:00 - 22:00 Informal gathering (TCPL Foyer)
Monday, June 23
07:00 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
08:45 - 09:00 Introduction and Welcome by BIRS Staff
A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions.
(TCPL 201)
09:00 - 09:30 Russel Caflisch: An Adjoint Method for Optimization of the Boltzmann Equation
We present an adjoint method for optimization of the Boltzmann equation for rarefied gas dynamics, both spatially homogeneous and with spatial effects. The adjoint method is derived using a "discretize then optimize" approach. Discretization (in time and velocity) is via the Direct Simulation Monte Carlo (DSMC) method, and adjoint equations are derived from an augmented Lagrangian. After a forward (in time) solution of DSMC, the adjoint variables are found by a backwards solver. They are equal to velocity derivatives of an objective function, and are used for optimization of the Boltzmann equation. For general collision models, DSMC requires the use of a rejection sampling step, which involves discontinuities that lead to a new term, involving a score function. This is joint work with Yunan Yang (Cornell) and Denis Silantyev (U Colorado, Colorado Springs).
(TCPL 201)
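To make the "discretize then optimize" idea above concrete, here is a generic sketch of a discrete adjoint computation for a forward-Euler discretization of a scalar ODE. It is purely illustrative: the model problem, objective, and step sizes are assumptions, and it does not reflect the DSMC/Boltzmann setting, the rejection-sampling score term, or the speakers' implementation.

```python
# Generic "discretize then optimize" adjoint sketch for a scalar ODE u' = f(u, theta),
# discretized by forward Euler; illustrative only, not the DSMC/Boltzmann setting.
import numpy as np

def f(u, theta):          return -theta * u
def df_du(u, theta):      return -theta
def df_dtheta(u, theta):  return -u

def forward_and_adjoint(u0, theta, dt, N):
    u = np.empty(N + 1); u[0] = u0
    for n in range(N):                        # forward sweep
        u[n + 1] = u[n] + dt * f(u[n], theta)
    J = 0.5 * u[N] ** 2                       # objective on the final state
    lam = u[N]                                # adjoint at final time: dJ/du_N
    dJ_dtheta = 0.0
    for n in reversed(range(N)):              # backward (adjoint) sweep
        dJ_dtheta += lam * dt * df_dtheta(u[n], theta)
        lam *= 1.0 + dt * df_du(u[n], theta)
    return J, dJ_dtheta

print(forward_and_adjoint(1.0, 0.7, 0.01, 100))
```

In the DSMC setting described in the abstract, the forward solve is a Monte Carlo particle simulation and the rejection-sampling step contributes the additional score-function term, which this toy sketch omits.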
09:30 - 10:00 Cory Hauck: Data-driven strategies for moment closures in radiation transport
The radiation transport equation (RTE) is a kinetic equation defined on a high-dimensional phase space. The size of this phase space makes direct simulation challenging, even for the largest supercomputers. Moment methods provide surrogate models that reduce computational complexity at the cost of reduced fidelity. To recover information lost in the moment-based approach, a closure is needed. In this talk, I will discuss two well-known and competing closure strategies and our efforts to improve their accuracy and efficiency using data-driven approaches. The first strategy is a linear spectral method that has excellent performance for smooth solutions, but suffers from artifacts when solutions lack regularity. We apply a filtering approach with a filter strength that is tuned using a machine learning algorithm. The second is a nonlinear spectral method, in which the coefficients of the spectral expansion depend on the moments via an expensive optimization problem. In this case, we use convex neural networks to bypass the optimization to yield a method that is computationally tractable, while maintaining important structural properties.
(TCPL 201)
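For reference, a minimal sketch of an input-convex network of the kind mentioned above. The architecture, sizes, and random weights are illustrative assumptions, not the speakers' model: nonnegative weights on the hidden-to-hidden path plus a convex, nondecreasing activation keep the scalar output convex in the input.

```python
# Minimal input-convex neural network (ICNN) forward pass: softplus activations
# and nonnegative weights on the z-path keep the scalar output convex in x.
# Sizes and random weights are placeholders for illustration.
import numpy as np

softplus = lambda z: np.logaddexp(0.0, z)    # convex and nondecreasing
rng = np.random.default_rng(0)
d, h = 4, 16
Wx0, b0 = rng.normal(size=(h, d)), rng.normal(size=h)
Wz1 = np.abs(rng.normal(size=(h, h)))        # nonnegative: preserves convexity
Wx1, b1 = rng.normal(size=(h, d)), rng.normal(size=h)
wz2 = np.abs(rng.normal(size=h))             # nonnegative output weights

def icnn(x):
    z1 = softplus(Wx0 @ x + b0)
    z2 = softplus(Wz1 @ z1 + Wx1 @ x + b1)
    return wz2 @ z2                          # convex function of x

x, y = rng.normal(size=d), rng.normal(size=d)
print(icnn(0.5 * (x + y)) <= 0.5 * (icnn(x) + icnn(y)) + 1e-12)  # midpoint convexity: True
```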
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Kui Ren: Loss functions for training neural-network-based solvers for inverse problems of PDEs
This work studies optimization problems in neural-network-based (mainly PINN-like) solution strategies for inverse problems of partial differential equations. We propose and analyze loss functions that balance contributions from the different components of the data-matching requirement: the equation, the boundary conditions, and the observed data. In a few simplified settings, we provide some theoretical understanding of the impact of the proposed loss functions on the computed solutions to the inverse problems.
(TCPL 201)
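The kind of loss balancing discussed above can be illustrated with a minimal weighted composite loss. The residual arrays and the weights w_pde, w_bc, w_data below are placeholders for illustration, not the authors' proposed weighting.

```python
# A minimal sketch of a weighted composite loss balancing PDE residual,
# boundary conditions, and observed data; weights are placeholder assumptions.
import numpy as np

def composite_loss(pde_residual, bc_residual, data_misfit,
                   w_pde=1.0, w_bc=10.0, w_data=1.0):
    """Weighted sum of mean-square residuals, as in PINN-style inverse solvers."""
    return (w_pde * np.mean(pde_residual**2)
            + w_bc * np.mean(bc_residual**2)
            + w_data * np.mean(data_misfit**2))

# toy usage with random residual samples standing in for network evaluations
rng = np.random.default_rng(0)
print(composite_loss(rng.normal(size=100), rng.normal(size=20), rng.normal(size=30)))
```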
11:00 - 11:30 Zhiqiang Cai: Neural Networks in Scientific Computing (SciML): Basics and Challenging Questions
As a new class of approximating functions, ReLU neural networks can accurately approximate discontinuous and non-smooth functions with degrees of freedom several orders of magnitude lower than those required by existing classes of approximating functions such as finite elements on quasi-uniform meshes. This talk will first use simple examples to demonstrate these properties. For computationally challenging problems such as interface singularities, thin transitional interior or boundary layers, and discontinuities, some existing neural network-based approaches, such as Physics-Informed Neural Networks (PINNs), attempt to incorporate physical principles but often fail to fully preserve the underlying physics. In contrast, this talk will introduce fundamentally different approaches: Physics-Preserved Neural Network (P2NN) methods, which rigorously enforce physical laws at the discrete level. Despite the remarkable approximation properties of ReLU neural networks, a major computational challenge is the inherently non-convex optimization problem they produce. This talk will conclude with a discussion of our latest advances in overcoming this critical issue, paving the way for more efficient, robust, and physically faithful neural network-based simulations.
(TCPL 201)
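A tiny concrete instance of the approximation claim above: a two-neuron ReLU network reproduces the non-smooth function |x| exactly, since |x| = ReLU(x) + ReLU(-x). A quick check (generic example, not taken from the talk):

```python
# |x| = ReLU(x) + ReLU(-x): a two-neuron ReLU network represents this
# non-smooth function exactly (generic illustration, not the speaker's construction).
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
x = np.linspace(-2, 2, 9)
print(np.allclose(relu(x) + relu(-x), np.abs(x)))   # True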
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:00 - 13:30 Stanley Osher: A Novel Approach for Solving Hamilton–Jacobi Equations with Applications to Optimal Transport
This talk presents a novel solution formula for effectively solving initial value problems of Hamilton–Jacobi partial differential equations (HJ PDEs). Although HJ PDEs are fundamental in many applications, the non-uniqueness and non-smoothness of solutions present significant challenges in obtaining viscosity solutions. We introduce an implicit solution formula derived from the method of characteristics and explore its connection with classical Hopf-type formulas. Building on this formulation, we propose a deep learning-based methodology for solving HJ PDEs without relying on supervised data. By leveraging the mesh-free nature of neural networks, the method offers a scalable, efficient, and accurate framework for addressing high-dimensional and even non-convex problems. Furthermore, we demonstrate the broad applicability and flexibility of the proposed formulation by extending it to problems in optimal transport. Joint work with Yesom Park, with whom the talk will be shared.
(Online)
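For background, one classical Hopf-type formula referenced above is the Hopf-Lax (Lax-Oleinik) formula: for $\partial_t u + H(\nabla u) = 0$ with $u(x,0) = g(x)$, $H$ convex and superlinear, and $g$ Lipschitz,
\[
u(x,t) = \min_{y} \left\{ g(y) + t\, H^{*}\!\left(\frac{x-y}{t}\right) \right\},
\]
where $H^{*}$ denotes the Legendre transform of $H$. The implicit formula presented in the talk is connected to, but distinct from, this classical convex setting.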
13:30 - 14:00 Yen-Hsi Tsai: Deep Learning Approaches for Solving Differential Equations by Classical Convergent Numerical Schemes
I will present our approach using classical numerical methods in deep learning models to solve high-dimensional Hamilton-Jacobi equations and multiscale Hamiltonian systems. The approach uses a stochastic gradient descent-based optimization algorithm to minimize the least squares functionals defined by the numerical schemes. In the talk, I will discuss the advantages of using numerical schemes, including improved data and training efficiency, the ability to compute the viscosity solutions, and improved structure preservation compared to other popular approaches. I will also discuss some critical issues related to the critical points of the least squares functionals, the choice of activation functions, and network architecture.
(TCPL 201)
14:00 - 14:30 Yu-Yu Liu: Optimal Control Problems in Level-Set Type Front Propagation Model in Complex Flows
The G-equation is a level-set-type partial differential equation that models the propagation of a turbulent flame in a fluid flow with a given burning speed. In the framework of Hamilton-Jacobi-Bellman equations, the characteristics of its solutions correspond to the optimal trajectories driven by the flow velocity and the laminar velocity. Many analytical works focus on constructing specific trajectories to obtain lower estimates of the propagation speeds. In this talk, I will present recent work on finding the optimal trajectories using batch descent methods. The results reveal the structure of optimal trajectories in complex 2D and 3D flows with high intensity, and the computed turbulent flame speeds are consistent with previous results obtained by direct simulation of the G-equation.
(TCPL 201)
14:30 - 15:00 Wenrui Hao: Homotopy Training Algorithms in Scientific Machine Learning
This talk examines the synergy between machine learning and nonlinear scientific computing, with an emphasis on developing efficient algorithms for training neural networks to tackle complex mathematical problems. We present a novel Homotopy Training Algorithm (HTA) that bridges convex and non-convex optimization landscapes, improving the convergence behavior of deep neural networks. In particular, training neural networks for sharp interface problems poses significant challenges, as certain parameters in the governing PDEs can induce near-singularities in the loss function. To address this, we introduce a homotopy-based approach that dynamically manipulates these parameters, enabling stable and effective training under such conditions.
(TCPL 201)
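A minimal illustration of the homotopy idea described above (a generic sketch, not the HTA itself): a non-convex target loss is blended with a convex surrogate, and the blending parameter is swept from 0 to 1 while gradient descent continues from the previous solution. The toy losses and step sizes below are assumptions for illustration.

```python
# Minimal homotopy-continuation sketch: blend a convex surrogate with the
# non-convex target and sweep the blending parameter lam from 0 to 1.
import numpy as np

def L_easy(w):  return (w - 1.0) ** 2                 # convex surrogate
def L_hard(w):  return (w ** 2 - 1.0) ** 2            # non-convex target
def grad(w, lam, h=1e-6):                             # finite-difference gradient
    f = lambda v: (1 - lam) * L_easy(v) + lam * L_hard(v)
    return (f(w + h) - f(w - h)) / (2 * h)

w = -2.0                                              # start in a "bad" basin
for lam in np.linspace(0.0, 1.0, 21):                 # homotopy sweep
    for _ in range(200):                              # inner gradient descent
        w -= 0.05 * grad(w, lam)
print("final w:", w, "L_hard:", L_hard(w))            # ends near w = 1
```

Running plain gradient descent on L_hard from the same starting point would instead converge to the other local minimizer; the homotopy sweep guides the iterate toward the minimizer shared with the convex surrogate.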
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 16:00 Nhat Thanh Tran: Efficient Local-Global Attention Approximation
The Transformer is the architecture that forms the backbone of large language models such as GPT and Gemini, as well as of computer-vision models such as COSMO. The attention mechanism plays a large role in the success of these models. However, its quadratic computational cost inhibits the ability to handle long-sequence tasks. In this talk, I will discuss some of the current approaches to resolve this issue, and present our two efficient and robust attention models, named FWin and SEMA, in applications to time-series and computer-vision tasks. I will show that both of these approaches are good approximations of the vanilla attention mechanism and will present numerical results that verify the theoretical findings.
(TCPL 201)
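For orientation, here is the vanilla scaled dot-product attention whose quadratic cost is discussed above, written in NumPy; the efficient FWin and SEMA variants from the talk are not reproduced here, and the sizes are illustrative.

```python
# Vanilla scaled dot-product attention: the (n, n) score matrix is the O(n^2) bottleneck.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                         # (n, n) matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V

n, d = 512, 64
rng = np.random.default_rng(0)
out = attention(rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d)))
print(out.shape)   # (512, 64), but time and memory scale as n**2
```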
16:00 - 16:30 Rongjie Lai: Unsupervised Solution Operator Learning for Mean-Field Games
Recent advances in deep learning have introduced numerous innovative frameworks for solving high-dimensional mean-field games (MFGs). However, these methods are often limited to solving single-instance MFGs and require extensive computational time for each instance, presenting challenges for practical applications. In this talk, I will present our recent work on a novel framework for learning the MFG solution operator using transformers. Our model takes MFG instances as input and directly outputs their solutions in a single forward pass, significantly improving computational efficiency. Our method offers two key advantages: (1) it is discretization-free, making it particularly effective for high-dimensional MFGs, and (2) it can be trained without requiring supervised labels, thereby reducing the computational burden of preparing training datasets common in existing operator learning methods. If time permits, I will also discuss a generalization-error analysis on this transformer-based model, which bridges the proposed framework to emerging theory on in-context learning and highlights its broader implications and avenues for further work.
(TCPL 201)
16:30 - 17:00 Yulong Lu: Provable in-context learning of PDEs
Transformer-based foundation models, pre-trained on a wide range of tasks with large datasets, demonstrate remarkable adaptability to diverse downstream applications, even with limited data. One of the most striking features of these models is their in-context learning (ICL) capability: when presented with a prompt containing examples from a new task alongside a query, they can make accurate predictions without requiring parameter updates. This emergent behavior has been recognized as a paradigm shift in transformers, though its theoretical underpinnings remain underexplored. In this talk, I will discuss some recent theoretical understandings of ICL for PDEs, emphasizing its approximation power and generalization capabilities. The theoretical analysis will focus on two scientific problems: elliptic PDEs and stochastic dynamical systems.
(TCPL 201)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
19:30 - 20:00 Zhiwen Zhang: DeepParticle: learning PDE dynamics by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method
High-dimensional PDEs are hard to compute with traditional mesh-based methods, especially with large gradients or unknown concentrations. Mesh-free methods are more appealing, but slow and expensive for long computations. We present DeepParticle, an approach integrating Deep Learning (DL), mini-batch Optimal Transport (OT), and interacting particle (IP) methods, first through a case study of Fisher-Kolmogorov-Petrovsky-Piskunov (FKPP) front speeds in incompressible flows. PDE analysis reduces the problem to computing the principal eigenvalue of an advection-diffusion operator. The Feynman-Kac representation enables a genetic IP algorithm to evolve the particle distribution to a time-invariant measure for front speed extraction. This measure is parameterized by the Peclet number. We learn this family of measures by training a physically parameterized DNN on affordable IP data at moderate Peclet numbers, then predict at larger, more expensive Peclet numbers. Our method extends to learning and generating other PDE dynamics, for which we show a second case study on aggregation patterns in Keller-Segel chemotaxis systems, and compare with diffusion models.
(Online)
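As a toy illustration of the sample-based Wasserstein objectives used in this line of work, the one-dimensional case has a closed-form empirical 2-Wasserstein distance obtained by sorting. This is a generic sketch, not the DeepParticle implementation, which uses mini-batch optimal transport in higher dimensions.

```python
# Empirical 2-Wasserstein distance between two equal-size 1D samples via sorting.
import numpy as np

def w2_1d(x, y):
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

rng = np.random.default_rng(1)
print(w2_1d(rng.normal(0, 1, 1000), rng.normal(2, 1, 1000)))  # close to 2.0
```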
20:00 - 20:30 Yunan Yang: Neural Inverse Operators for Solving PDE Inverse Problems
A large class of inverse problems for PDEs is only well-defined as mappings from operators to functions. Existing operator learning frameworks map functions to functions and need to be modified to learn inverse maps from data. We propose a novel architecture termed Neural Inverse Operators (NIOs) to solve these PDE inverse problems. Motivated by the underlying mathematical structure, NIO is based on a suitable composition of DeepONets and FNOs to approximate mappings from operators to functions. A variety of experiments are presented to demonstrate that NIOs significantly outperform baselines, solve PDE inverse problems robustly and accurately, and are several orders of magnitude faster than existing direct and PDE-constrained optimization methods.
(Online)
20:30 - 21:00 Tingwei Meng: HJ-sampler: a Bayesian sampler for inverse problems of a stochastic process by leveraging Hamilton--Jacobi PDEs and score-based generative models
The interplay between stochastic processes and optimal control has been extensively explored in the literature. With the recent surge in the use of diffusion models, stochastic processes have increasingly been applied to sample generation. This paper builds on the log transform, known as the Cole-Hopf transform in Brownian motion contexts, and extends it within a more abstract framework that includes a linear operator. Within this framework, we found that the well-known relationship between the Cole-Hopf transform and optimal transport is a particular instance where the linear operator acts as the infinitesimal generator of a stochastic process. We also introduce a novel scenario where the linear operator is the adjoint of the generator, linking to Bayesian inference under specific initial and terminal conditions. Leveraging this theoretical foundation, we develop a new algorithm, named the HJ-sampler, for Bayesian inference for the inverse problem of a stochastic differential equation with given terminal observations. The HJ-sampler involves two stages: solving viscous Hamilton-Jacobi (HJ) partial differential equations (PDEs) and sampling from the associated stochastic optimal control problem. Our proposed algorithm naturally allows for flexibility in selecting the numerical solver for viscous HJ PDEs. We introduce two variants of the solver: the Riccati-HJ-sampler, based on the Riccati method, and the SGM-HJ-sampler, which utilizes diffusion models. Numerical examples demonstrate the effectiveness of our proposed methods. This is a joint work with Zongren Zou, Jerome Darbon, and George Em Karniadakis.
(Online)
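For reference, the log (Cole-Hopf) transform mentioned above, in its simplest Brownian-motion form: if $v > 0$ solves the heat equation $\partial_t v = \epsilon \Delta v$, then $u = -2\epsilon \log v$ solves the viscous Hamilton-Jacobi equation
\[
\partial_t u + \tfrac{1}{2}\,\lvert \nabla u \rvert^{2} = \epsilon\, \Delta u .
\]
The talk places this transform in a more abstract framework built around a general linear operator.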
Tuesday, June 24
07:00 - 09:00 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:30 Gitta Kutyniok: Reliable and Sustainable AI: From Mathematical Foundations to the Next Generation AI Computing
Artificial intelligence is currently leading to one breakthrough after the other, in industry, public life, and the sciences. However, major drawbacks are the lack of reliability of such methodologies in particular for critical infrastructure as well as the enormous energy consumption of current AI computing. In this lecture, we will first provide an introduction to the mathematical perspective on these problems. We will then discuss some of our recent advances in reliable AI, specifically, on generalization and explainability. We will next touch upon the topic of sustainable AI in the sense of energy efficiency. Again taking a mathematical perspective will lead us naturally to the world of analog AI systems such as neuromorphic computing and the related model of spiking neural networks. We will finish with our very new results on the expressivity and generalization abilities of spiking neural networks.
(TCPL 201)
09:30 - 10:00 Adam Oberman: AI risks and current approaches to AI safety
This year I took a research leave to work with Yoshua Bengio at his newly formed AI safety institute. I'll talk about current AI risks (misuse, reliability, and systemic; see https://www.gov.uk/government/publications/international-scientific-report-on-the-safety-of-advanced-ai), AI forecasting (https://ai-2027.com/), and approaches to AI safety, including the AI Scientist approach (https://arxiv.org/abs/2502.15657).
(Online)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Guang Lin: Active operator learning with predictive uncertainty quantification for partial differential equations
In this work, we develop a method for uncertainty quantification in deep operator networks (DeepONets) using predictive uncertainty estimates calibrated to model errors observed during training. The uncertainty framework operates using a single network, in contrast to existing ensemble approaches, and introduces minimal overhead during training and inference. We also introduce an optimized implementation for DeepONet inference (reducing evaluation times by a factor of five) to provide models well-suited for real-time applications. We evaluate the uncertainty-equipped models on a series of partial differential equation (PDE) problems, and show that the model predictions are unbiased, non-skewed, and accurately reproduce solutions to the PDEs. To assess how well the models generalize, we evaluate the network predictions and uncertainty estimates on in-distribution and out-of-distribution test datasets. We find the predictive uncertainties accurately reflect the observed model errors over a range of problems with varying complexity; simpler out-of-distribution examples are assigned low uncertainty estimates, consistent with the observed errors, while more complex out-of-distribution examples are properly assigned higher uncertainties. We also provide a statistical analysis of the predictive uncertainties and verify that these estimates are well-aligned with the observed error distributions at the tail-end of training. Finally, we demonstrate how predictive uncertainties can be used within an active learning framework to yield improvements in accuracy and data-efficiency for outer-loop optimization procedures.
(TCPL 201)
11:00 - 11:30 Wei Cai: Martingale deep learning for very high-dimensional quasi-linear partial differential equations and stochastic optimal controls
In this talk, we will present a highly parallel and derivative-free martingale neural network method, based on the probability theory of Varadhan’s martingale formulation of PDEs, to solve Hamilton-Jacobi-Bellman (HJB) equations arising from stochastic optimal control problems (SOCPs), as well as general quasilinear parabolic partial differential equations (PDEs). In both cases, the PDEs are reformulated into a martingale problem such that loss functions will not require the computation of the gradient or Hessian matrix of the PDE solution, and can be computed in parallel in both time and spatial domains. Moreover, the martingale conditions for the PDEs are enforced using a Galerkin method realized with adversarial learning techniques, eliminating the need for direct computation of the conditional expectations associated with the martingale property. For SOCPs, a derivative-free implementation of the maximum principle for optimal controls is also introduced. The numerical results demonstrate the effectiveness and efficiency of the proposed method, which is capable of solving HJB and quasilinear parabolic PDEs accurately and fast in dimensions as high as 10,000.
(Online)
11:30 - 11:40 Group Photo
Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo!
(TCPL Foyer)
11:40 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:00 - 13:30 Patrick Guidotti: Point Clouds Analysis via Kernel Interpolation and Approximate Kernel Interpolation
We describe a method to produce geometric information from a point cloud that has an analytical justification when the point cloud is a sample of a smooth manifold. The method is in the spirit of numerical mesh-free methods and allows for the recovery of the normal and the curvatures when the point cloud is a sample of a hypersurface. Based on this information, one has direct access to the surface gradient (nabla) and the Laplace-Beltrami operator from a set of values on the point cloud. We also make a connection between kernel interpolation and Gaussian process regression that justifies the use of approximate interpolation (a regularized version of interpolation). The latter proves very useful when dealing with noisy data, i.e., when the point cloud or the values associated with it are polluted by noise. Several numerical experiments will be shown that highlight the effectiveness of the method.
(Online)
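A small sketch of exact versus approximate (regularized) kernel interpolation of noisy scattered data, in the spirit of the discussion above; the Gaussian kernel, its width, the regularization weight, and the toy data are illustrative choices, not the speaker's implementation.

```python
# Exact kernel interpolation vs. regularized ("approximate") interpolation of noisy data.
import numpy as np

def gauss_kernel(X, Y, eps=10.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-eps * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
f = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=40)       # noisy samples

K = gauss_kernel(X, X)
coef_exact  = np.linalg.solve(K, f)                         # exact interpolation
coef_approx = np.linalg.solve(K + 1e-2 * np.eye(40), f)     # regularized version

Xt = np.linspace(-1, 1, 200)[:, None]
Kt = gauss_kernel(Xt, X)
err_exact  = np.abs(Kt @ coef_exact  - np.sin(3 * Xt[:, 0])).max()
err_approx = np.abs(Kt @ coef_approx - np.sin(3 * Xt[:, 0])).max()
print(err_exact, err_approx)   # the regularized fit is typically more robust to noise
```

The regularized solve is the same linear system that appears in the Gaussian-process posterior mean with noise variance equal to the regularization weight, which is the connection the abstract alludes to.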
13:30 - 14:00 Haizhao Yang: Modeling and Computation in the Space of Language: Symbolic and LLM-Based Approaches
Scientific modeling and computation traditionally rely on structured mathematics and hand-designed algorithms. In this talk, I propose a new perspective: treating both modeling and computation as processes operating within the space of natural language. I will introduce two complementary approaches that realize this vision. The first uses symbolic learning based on tree structures to generate mathematical expressions, where modeling is performed by constructing symbolic trees and computation is governed by operator rules. The Finite Expression Method (FEX) exemplifies this approach by discovering interpretable, high-accuracy solutions to PDEs and physical systems. The second approach employs large language models (LLMs) for automatic code generation and reasoning to translate scientific problem descriptions into formal mathematical models and executable solvers to solve these problems. As an example, the OptimAI framework demonstrates how multi-agent LLM collaboration enables reliable end-to-end optimization problem modeling and solving. Together, these methods point toward a unified paradigm where symbolic and language models form the foundation for interpretable, scalable scientific discovery and computation.
(TCPL 201)
14:00 - 14:30 Penghang Yin: A Finite Sample Analysis for Learning Binarized Neural Network
In this talk, we will first present a finite sample analysis of the straight-through estimator (STE) for neural network quantization. STE has become the most widely adopted heuristic for optimizing discrete objective functions, as it enables backpropagation through non-differentiable operations by introducing surrogate gradients. However, its theoretical properties remain largely unexplored, and the few existing works in the context of quantization simplify the analysis by assuming an infinite amount of training data. By analyzing a two-layer neural network with binary weights and activations and Gaussian input data, we derive the sample complexity needed for STE training to successfully recover the ground-truth weights. In a separate line of work, we introduce a simple weight magnitude reduction technique based on L-infinity norm regularization. We demonstrate its effectiveness in two applications: post-training quantization and the efficient fine-tuning of large language models.
(TCPL 201)
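A hand-rolled sketch of the straight-through estimator discussed above: the forward pass uses binarized weights, while the backward pass treats the sign function as the identity, so the gradient computed for the binary weights updates the latent real-valued weights. The synthetic data, the logistic model, and the step size are illustrative assumptions, not the setting analyzed in the talk.

```python
# Straight-through estimator (STE) toy example for binary weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
y = (X @ np.ones(8) > 0).astype(float)       # ground-truth binary weights: all +1
w = 0.1 * rng.normal(size=8)                 # latent full-precision weights

for step in range(200):
    wb = np.sign(w)                          # forward: binarized weights
    p = 1.0 / (1.0 + np.exp(-(X @ wb)))      # logistic prediction
    grad_wb = X.T @ (p - y) / len(y)         # gradient w.r.t. the binary weights
    w -= 0.1 * grad_wb                       # STE: apply it directly to w
print(np.sign(w))                            # ideally all +1 for this synthetic target
```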
14:30 - 15:00 Xiaochuan Tian: Solving Nonlinear PDEs with Sparse Radial Basis Function Networks
We propose a novel framework for solving nonlinear PDEs using sparse radial basis function (RBF) networks. Sparsity-promoting regularization is employed to prevent over-parameterization and reduce redundant features. This work is motivated by longstanding challenges in traditional RBF collocation methods, along with the limitations of physics-informed neural networks (PINNs) and Gaussian process (GP) approaches, aiming to blend their respective strengths in a unified framework. The theoretical foundation of our approach lies in the function space of Reproducing Kernel Banach Spaces (RKBS) induced by one-hidden-layer neural networks of possibly infinite width. We prove a representer theorem showing that the solution to the sparse optimization problem in the RKBS admits a finite solution, and we establish error bounds that offer a foundation for generalizing classical numerical analysis. The algorithmic framework is based on a three-phase algorithm that maintains computational efficiency through adaptive feature selection, second-order optimization, and pruning of inactive neurons. Numerical experiments demonstrate the effectiveness of our method and highlight cases where it offers notable advantages over GP approaches. This work opens new directions for adaptive PDE solvers grounded in rigorous analysis with efficient, learning-inspired implementation. This is a joint work with Konstantin Pieper (Oak Ridge National Lab) and Zihan Shao (UC San Diego).
(TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 16:00 Qin Li: Optimization over probability measure space
Define a forward problem as ρ_y = G#ρ_x, where the probability distribution ρ_x is mapped to another distribution ρ_y by the forward operator G. We examine the associated inverse problem: given ρ_y, how do we find ρ_x? It turns out the solution depends heavily on the metric used. In the overdetermined case, we formulate a variational problem min_{ρ_x} D(G#ρ_x, ρ_y). The marginal and conditional distributions are obtained when D is set to be the Wasserstein distance and a ϕ-divergence, respectively. In the underdetermined case, we formulate the constrained optimization min_{G#ρ_x=ρ_y} E[ρ_x]. The classical least-norm solution and the piecewise-constant solution are then obtained when E is set to be the second moment and the entropy, respectively. These results have no Euclidean-space counterpart and are unique to the probability measure space. Joint work with Yunan Yang and Li Wang.
(TCPL 201)
16:00 - 16:30 Zhongjian Wang: Wasserstein bound for generative diffusion model under Gaussian tail assumption
In this talk, I will present some recent results on a sqrt(d)-complexity bound for generative diffusion models under the Wasserstein metric. In the analysis, our main assumption is a Gaussian-type tail of the target distribution. We first show a Hessian bound for the score potential function that is uniform in space and decays exponentially in time. We then derive a uniformly bounded accumulated discretization error over any bounded time interval. Our Gaussian tail assumption covers both the early-stopping case and a Bayesian posterior with a trace-class prior.
(TCPL 201)
16:30 - 17:00 Shihao Zhang: Quantization and Compression of Neural Networks with Theoretical Guarantees
We introduce Qronos, a new state-of-the-art post-training quantization algorithm. It sequentially rounds and updates neural network weights based on an interpretable and disciplined optimization framework that subsumes and surpasses the widely used OPTQ method. We provide the first rigorous theoretical error bound analysis for OPTQ and Qronos with stochastic rounding, using techniques from convex ordering and high-dimensional probability. We also develop a new analytical framework for data-driven post-training low-rank compression. We present three recovery theorems under progressively weaker assumptions about the approximate low-rank structure of activations, modeling deviations via noise. To the best of our knowledge, this represents a first step toward explaining why data-driven low-rank compression methods outperform data-agnostic approaches.
(TCPL 201)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
19:30 - 20:00 Yuan Gao: Self-Test Loss Functions for Data-Driven Modeling of Weak-Form Operators
The construction of loss functions remains a major challenge in data-driven modeling of weak-form operators in PDEs and gradient flows, primarily due to the need for careful selection of test functions. We introduce self-test loss functions, where the test functions depend on the unknown parameters, specifically tailored for operators linear in the unknowns. These loss functions conserve energy in gradient flows and match the expected log-likelihood ratio for stochastic differential equations. Their quadratic structure enables theoretical analysis of identifiability and well-posedness, while supporting efficient regression algorithms. Computationally simple and often derivative-free, the proposed method exhibits strong robustness to noisy and discrete data, as confirmed by numerical experiments.
(Online)
20:00 - 20:30 Yue Yu: Nonlocal Attention Operator: Towards a Foundation Model for Physical Responses
While foundation models have gained considerable attention in core AI fields such as natural language processing (NLP) and computer vision (CV), their application to learning complex responses of physical systems from experimental measurements remains underexplored. In physical systems, learning problems are often characterized as discovering operators that map between function spaces, using only a few samples of corresponding function pairs. For instance, in the automated discovery of heterogeneous material models, the foundation model must be capable of identifying the mapping between applied loading fields and the resulting displacement fields, while also inferring the underlying microstructure that governs this mapping. While the former task can be seen as a PDE forward problem, the latter task frequently constitutes a severely ill-posed PDE inverse problem. In this talk, we will explore the development of a foundation model for physical systems, by learning neural operators for both forward and inverse PDE problems. Specifically, we show that the attention mechanism is mathematically equivalent to a double integral operator, enabling nonlocal interactions among spatial tokens through a data-dependent kernel that characterizes the inverse mapping from data to the hidden PDE parameter field of the underlying operator. Consequently, the attention mechanism captures global prior information from training data generated by multiple systems and suggests an exploratory space in the form of a nonlinear kernel map. Based on this theoretical analysis, we introduce a novel neural operator architecture, the Nonlocal Attention Operator (NAO). By leveraging the attention mechanism, NAO can address ill-posedness and rank deficiency in inverse PDE problems by encoding regularization and enhancing generalizability. To demonstrate the applicability of NAO to material modeling problems, we apply it to the development of a foundation constitutive law across multiple materials, showcasing its generalizability to unseen data resolutions and system states. Our work not only suggests a novel neural operator architecture for learning an interpretable foundation model of physical systems, but also offers a new perspective towards understanding the attention mechanism.
(Online)
20:30 - 21:00 Shuhao Cao: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows
Recent advancements in operator-type neural networks have shown promising results in approximating the solutions of spatiotemporal PDEs. However, these neural networks often entail considerable training expenses, and may not always achieve the desired accuracy required in many scientific and engineering disciplines. In this paper, we propose a new learning framework to address these issues. A new spatiotemporal adaptation is proposed to generalize any Fourier Neural Operator (FNO) variant to learn maps between Bochner spaces, which can perform arbitrary-length temporal super-resolution for the first time. To better exploit this capacity, a new paradigm is proposed to refine the commonly adopted end-to-end neural operator training and evaluations with the help of insights from traditional numerical PDE theory and techniques. Numerical experiments demonstrate significant improvements in both computational efficiency and accuracy, compared to end-to-end evaluation and traditional numerical PDE solvers under certain conditions.
(Online)
Wednesday, June 25
07:00 - 09:00 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:30 Jinchao Xu: Integral Representations of Sobolev Spaces via $ReLU^k$ Activation Functions and Optimal Error Estimates for Linearized Networks
We will present two main theoretical results concerning shallow neural networks with $ReLU^k$ activation functions. We establish a novel integral representation for Sobolev spaces, showing that every function in $H^{\frac{d+2k+1}{2}}(\Omega)$ can be expressed as an $L^2$-weighted integral of $ReLU^k$ ridge functions over the unit sphere. This result mirrors the known representation of Barron spaces and highlights a fundamental connection between Sobolev regularity and neural network representations. Moreover, we prove that linearized shallow networks, constructed by fixing the inner parameters and optimizing only the linear coefficients, achieve optimal approximation rates in Sobolev spaces.
(TCPL 201)
09:30 - 10:00 Justin Sirignano: Convergence Analysis of Neural Network Methods for Solving PDEs
Physics-informed neural networks (PINNs) and Deep Galerkin Methods (DGM) directly solve PDEs with neural networks. For linear elliptic PDEs, we prove that DGM/PINNs -- despite the non-convexity of neural networks -- trained with gradient descent globally converge to the PDE solution as the number of training steps and hidden units go to infinity. A key technical challenge is the lack of a spectral gap for the training dynamics of the neural network. We will also discuss using deep learning to model unknown terms within a PDE. The neural network terms in the PDE are optimized using adjoint PDEs, which again is a highly non-convex objective function. Similar to the result for PINNs, we are able to prove that the trained neural network-PDE converges to a global minimizer.
(Online)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Molei Tao: Where do all the scores come from? – generation accuracy of diffusion model, and multimodal sampling via denoising annealing
Diffusion models are a prevailing generative AI approach. They use a score function to characterize a complex data distribution and its evolution toward an easy distribution. This talk will report progress on two different topics, both closely related to the origins of the score function (if time is limited, only the first will be reported). The first topic, which will take up most of the talk, is a quantification of the generation accuracy of diffusion models. The importance of this problem has already led to a rich and substantial literature; however, most existing theoretical investigations assume that an epsilon-accurate score function is oracle-given and focus only on the inference process of the diffusion model. I will instead describe a first quantitative understanding of the actual generative modeling protocol, including both score training (optimization) and inference (sampling). The resulting full error analysis elucidates (again, but this time theoretically) how to design the training and inference processes for effective generation. The second topic is no longer about generative modeling, but about sampling. The goal is to leverage the fact that diffusion models are very good at handling multimodal distributions and extrapolate this to the holy-grail problem of efficient sampling from multimodal densities. There, one needs to rethink how to obtain the score function, as data samples are no longer available and one instead has an unnormalized density. A new sampler that is insensitive to metastability, comes with performance guarantees, and does not even require a continuous density will be presented.
(TCPL 201)
11:00 - 11:30 Shih-Hsin Wang: On the Connection and Discrepancy Between Diffusion and Flow Matching
Diffusion models and flow matching have been developed independently as methods for learning transport between probability distributions. Recent work has uncovered a deep connection between these frameworks through the lens of ODEs and SDEs. In this talk, I will begin by reviewing this connection and then highlight a key discrepancy: despite their shared foundations, applying the same training strategies to both frameworks can lead to suboptimal learning in flow matching models. I will then introduce our recent theoretical work, which addresses this issue by leveraging Duhamel’s formula to develop a principled correction to the classical flow matching loss.
(TCPL 201)
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:30 - 17:30 Free Afternoon
For remote hikes, visit our offices for supplies (e.g. bear spray). Some options to explore: (1) Lake Louise: scenic lake and hiking, accessible by personal vehicle or public bus (reservations HIGHLY recommended). (2) Sulphur Mountain hike or gondola: a 2.5-hour hike to the summit of Sulphur Mountain. The trailhead is accessible via public bus (route 1); board the bus at the Banff Park Museum (a 10-15 minute walk from the BIRS campus). The automated ticket machine accepts credit cards. The summit has an interpretive centre with a cafe and some restaurant outlets, and a boardwalk to an old weather station. To reach the summit without hiking, take the gondola up. (3) Tunnel Mountain hike: a 1-hour hike. Walk up the hill to the top of campus, turn left when you reach Tunnel Mountain Drive, and you'll eventually come across the trailhead on your right.
(Banff National Park)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Thursday, June 26
07:00 - 09:00 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:30 Hongkai Zhao: Mathematical and Computational Understanding of Neural Networks: From Representation to Learning Dynamics and From Shallow to Deep
In this talk I will present mathematical and numerical analysis, as well as experiments, to understand a few basic computational issues in using neural networks, as a particular form of nonlinear representation determined by the network structure and activation function, to approximate functions. I will start with the frequency bias of shallow networks in terms of both representation and learning. Based on this understanding of shallow networks, we propose a structured and balanced approximation using a multi-component and multi-layer neural network (MMNN) structure. While only a simple modification of fully connected neural networks (FCNNs), or multi-layer perceptrons (MLPs), through the introduction of balanced multi-component structures, MMNNs achieve a significant reduction in training parameters, a much more efficient training process, and much improved accuracy compared to FCNNs or MLPs. Extensive numerical experiments are presented to illustrate the effectiveness of MMNNs in approximating functions with significant high-frequency components and their automatic adaptivity in both the space and frequency domains, a desirable feature for nonlinear representation.
(TCPL 201)
09:30 - 10:00 Haomin Zhou: Parameterized Wasserstein Geometric Flow
I will present a parameterization strategy that can be used to design algorithms for simulating geometric flows on the Wasserstein manifold, the probability density space equipped with the optimal transport metric. The framework leverages the theory of optimal transport and techniques such as push-forward operators and neural networks, leading to a system of ODEs for the parameters of the neural networks. The resulting methods are mesh-less, basis-less, sample-based schemes that scale well to higher-dimensional problems. The strategy works for Wasserstein gradient flows such as the Fokker-Planck equation, and for Wasserstein Hamiltonian flows such as the Schrödinger equation. Theoretical error bounds measured in the Wasserstein metric are established. This presentation is based on joint work with Yijie Jin (Math, GT), Wuchen Li (South Carolina), Shu Liu (UCLA), Hao Wu (Wells Fargo), Xiaojing Ye (Georgia State), and Hongyuan Zha (CUHK-SZ).
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Dario Coscia: A Variational Bayesian Method for Sequence Models Predictions and Uncertainty Quantification
Sequential models are a foundational technology driving many of today's AI advancements, powering applications such as large language models, molecular generation, and neural PDE solvers. Despite their remarkable success, these models face a significant limitation: they struggle to assess the confidence of their predictions, particularly when operating outside their training distribution. This shortcoming raises safety concerns in critical applications, where understanding the uncertainty behind a prediction is as important as the prediction itself. Reliable uncertainty quantification is therefore essential for improving both the safety and performance of these models. To address this challenge, we introduce BARNN, a variational Bayesian method for sequential models that provides a principled approach to transforming any sequential model into its Bayesian counterpart, thereby enabling effective uncertainty quantification. Our method applies the concept of local reparameterization (or variational dropout) to sequential architectures and employs a prior that extends the "VampPrior" framework to sequence models. We demonstrate the effectiveness of BARNN on several AI4Science tasks, including PDE surrogate modeling and molecular generation.
(Online)
11:00 - 11:30 Konstantinos Spiliopoulos: Convergence Analysis of Real-time Recurrent Learning (RTRL) for a class of Recurrent Neural Networks
Recurrent neural networks (RNNs) are commonly trained with the truncated backpropagation-through-time (TBPTT) algorithm. For the purposes of computational tractability, the TBPTT algorithm truncates the chain rule and calculates the gradient on a finite block of the overall data sequence. Such approximation could lead to significant inaccuracies, as the block length for the truncated backpropagation is typically limited to be much smaller than the overall sequence length. In contrast, Real-time recurrent learning (RTRL) is an online optimization algorithm which asymptotically follows the true gradient of the loss on the data sequence as the number of sequence time steps t→∞. RTRL forward propagates the derivatives of the RNN hidden/memory units with respect to the parameters and, using the forward derivatives, performs online updates of the parameters at each time step in the data sequence. RTRL's online forward propagation allows for exact optimization over extremely long data sequences, although it can be computationally costly for models with large numbers of parameters. We prove convergence of the RTRL algorithm for a class of RNNs. The convergence analysis establishes a fixed point for the joint distribution of the data sequence, RNN hidden layer, and the RNN hidden layer forward derivatives as the number of data samples from the sequence and the number of training steps tend to infinity. We prove convergence of the RTRL algorithm to a stationary point of the loss. Numerical studies illustrate our theoretical results. One potential application area for RTRL is the analysis of financial data, which typically involve long time series and models with small to medium numbers of parameters. This makes RTRL computationally tractable and a potentially appealing optimization method for training models. Thus, we include an example of RTRL applied to limit order book data.
(Online)
11:30 - 13:00 Lunch
Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
13:00 - 13:30 Yifei Lou: Graph-Based Active Learning for Nearly Blind Hyperspectral Unmixing
Hyperspectral unmixing (HSU) is an effective tool to ascertain the material composition of each pixel in a hyperspectral image with typically hundreds of spectral channels. In this work, we introduce two graph-based semi-supervised unmixing methods. The first one directly applies graph learning to the unmixing problem. The second one solves an optimization problem that combines the linear unmixing model and a graph-based regularization term. Following a semi-supervised framework, our methods require a very small number of training pixels that can be selected by a graph-based active learning method. We assume that ground-truth information is available at these selected pixels, which can be either the exact (EXT) abundance value or the one-hot (OH) pseudo-label. In practice, the latter is much easier to obtain, which can be achieved by minimally involving a human in the loop. Compared with other popular blind unmixing methods, our methods significantly improve performance with minimal supervision. Specifically, the experiments demonstrate that the proposed methods improve the state-of-the-art blind unmixing approaches by 50% or more using only 0.4% of training pixels. This is a joint work with Bohan Chen (Caltech), Andrea Bertozzi (UCLA), and Jocelyn Chanussot (Grenoble INP).
(TCPL 201)
13:30 - 14:00 Wei Zhu: Structure-preserving machine learning and data-driven structure discovery
Many machine learning and scientific computing tasks, including computer vision and the computational modeling of physical and engineering systems, have intrinsic structures. Empirical studies demonstrate that models incorporating these structures often achieve significantly improved performance. Meanwhile, there is growing interest in discovering structures directly from observational data. In this talk, I will present our recent works on the interplay between structure and data. I will discuss how specific structures can be efficiently embedded into machine learning models and rigorously quantify the resulting performance gains. Furthermore, I will explore techniques for discovering structures, such as conservation laws, integrability, and Lax pairs, from observational physical data.
(TCPL 201)
14:00 - 14:30 Qi Tang: Structure-preserving machine learning for learning dynamical systems
In this talk I will discuss our recent efforts to develop structure-preserving ML for dynamical systems. The first part presents a structure-preserving neural ODE framework that accurately captures chaotic dynamics in dissipative systems. We learn the right-hand side of an ODE by adding the outputs of two networks together, one learning a linear term and the other a nonlinear term. The architecture is inspired by the inertial manifold theorem. We apply this method to chaotic trajectories of the Kuramoto-Sivashinsky equation, where our model keeps long-term trajectories on the attractor and remains robust to noisy initial conditions. The second part explores structure-preserving ML for singularly perturbed dynamical systems. A powerful tool to address these systems is the Fenichel normal form, which significantly simplifies fast dynamics near slow manifolds. I will discuss a novel realization of this concept using ML. Specifically, a fast-slow neural network is proposed, enforcing the existence of a trainable, attractive invariant slow manifold as a hard constraint.
(TCPL 201)
14:30 - 15:00 Nicholas Boffi: Stochastic interpolants: from generative modeling to generative science and engineering
While diffusion-based generative models have achieved state-of-the-art performance across diverse data modalities, their design remains largely empirical, lacking a systematic framework for use in domain-specific applications across science and engineering. In this talk, I will introduce a mathematical framework that unifies flows and diffusions, substantially expanding the design space of flow-based generative models. To this end, I will define a process called a stochastic interpolant that establishes an exact connection between two arbitrary probability measures in finite time. I will then show how this construction enables efficient learning of generative models described by ordinary or stochastic differential equations. Empirically, I will demonstrate that our approach outperforms diffusion models in high-resolution image synthesis at no additional computational cost, and that it enjoys the flexibility to trade computational budget for sample quality at inference time. I will further illustrate how to apply the framework to design tailored generative models for problems in inverse imaging and probabilistic forecasting of turbulent fluids. Building on this foundation, I will conclude by describing our recent extension of the framework to learning the flow map of an ordinary differential equation, which avoids solving a differential equation at inference time and can generate samples in a single neural network evaluation. This dramatically improves the efficiency of modern generative models and makes feasible real-time engineering applications such as robotic planning and control.
(TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 16:00 Jue Yan: Conservative cell-average-based neural network method for nonlinear conservation laws
This talk introduces the recently developed Cell-Average-Based Neural Network (CANN) method, which is inspired by finite volume schemes. It replaces traditional spatial and temporal discretization with a learned explicit one-step method, with the well-trained parameters of the network acting as the coefficients of the scheme. Unlike conventional numerical methods, the CANN approach is not limited by small-time-step CFL conditions, enabling significantly larger time steps and leading to a highly efficient and rapid computational method. We present a conservative version of the CANN method for nonlinear conservation laws. This conservative approach ensures mass conservation and effectively captures relevant physical solutions, including contact discontinuities, shock collisions, and interactions between shocks and rarefaction waves. Additionally, we will discuss recent results related to the bound-preserving neural network method, which maintains stability in the L-infinity sense for the piecewise-constant numerical solution at all time levels.
(TCPL 201)
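To make the conservation point above concrete, here is a structural sketch (not the CANN method itself): if the one-step update on cell averages is written in flux-difference form on a periodic grid, total mass is conserved regardless of what function, learned or hand-crafted, produces the interface fluxes. A simple hand-crafted flux stands in for a trained network below; the grid, time-step ratio, and initial data are illustrative.

```python
# Conservative one-step update on cell averages: flux-difference form conserves
# total mass for ANY interface-flux function (a hand-crafted flux stands in here).
import numpy as np

def flux(ul, ur):
    # placeholder for a learned flux: a simple Lax-Friedrichs-type flux for Burgers
    f = lambda u: 0.5 * u**2
    return 0.5 * (f(ul) + f(ur)) - 0.5 * (ur - ul)

def step(v, dt_dx):
    F = flux(v, np.roll(v, -1))             # F[j]: flux at interface j+1/2
    return v - dt_dx * (F - np.roll(F, 1))  # flux-difference (conservative) form

v = 1.0 + np.sin(2 * np.pi * np.linspace(0, 1, 100, endpoint=False))
print(v.sum())                              # total mass before the step
v = step(v, 0.4)
print(v.sum())                              # unchanged up to round-off
```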
16:00 - 16:30 Jing Qin: Form-Finding and Physical Property Predictions of Tensegrity Structures Using Deep Neural Networks
In the design of tensegrity structures, traditional form-finding methods utilize kinematic and static approaches to identify geometric configurations that achieve equilibrium. However, these methods often fall short when applied to actual physical models due to imperfections in the manufacturing of structural elements, assembly errors, and material nonlinearities. In this work, we introduce a deep neural network (DNN) approach to predict the geometric configurations and physical properties (such as nodal coordinates, member forces, and natural frequencies) of tensegrity structures in equilibrium states. First, we outline the analytical governing equations for tensegrity structures, covering statics involving nodal coordinates and member forces, as well as modal information. Next, we propose a data-driven framework for training an appropriate DNN model capable of simultaneously predicting tensegrity forms and physical properties, thereby circumventing the need to solve equilibrium equations. For validation, we analyze three tensegrity structures, including a tensegrity D-bar, a prism, and a lander, demonstrating that our approach can identify approximating systems with very small output errors. This technique is applicable to a wide range of tensegrity structures, particularly in real-world construction, and can be extended to address additional challenges in identifying structural physics information.
(TCPL 201)
16:30 - 17:00 George Stepaniants: Learning Memory and Material Dependent Constitutive Laws
The simulation of multiscale viscoelastic materials poses a significant challenge in computational materials science, requiring expensive numerical solvers that can resolve the dynamics of material deformations at the microscopic scale. The theory of homogenization offers an alternative approach to modeling, by locally averaging the strains and stresses of multiscale materials. This procedure eliminates the smaller-scale dynamics but introduces a history dependence between strain and stress that proves very challenging to characterize analytically. In the one-dimensional setting, we give the first full characterization of the memory-dependent constitutive laws that arise in multiscale viscoelastic materials. Using this theory, we develop a neural differential equation architecture that, simultaneously across a wide range of material microstructures, accurately predicts their homogenized constitutive laws, thus enabling us to simulate their deformations under forcing. We use the approximation theory of neural operators to provide guarantees on the generalization of our approach to unseen material samples.
(TCPL 201)
17:00 - 17:30 Tao Wang: Deep ReQU Mode-Informed Learning
This paper proposes a novel deep mode-informed learning approach that utilizes rectified quadratic unit (ReQU) activated deep neural networks to establish a robust nonlinear mode regression model tailored for time series data. Unlike traditional mean regression approaches, this method emphasizes the mode, the most frequently occurring value in the data distribution, offering improved robustness against outliers and accommodating heavy-tailed distributions. Leveraging the universal approximation properties of deep neural networks, the proposed approach effectively captures intricate temporal dependencies inherent in time series data. To ensure sparsity and mitigate overfitting, the developed approach integrates a mode-oriented regularization mechanism by applying a least absolute shrinkage and selection operator (LASSO) penalty to the ReQU-activated neural network weights. The kernel-based objective function, combined with the LASSO regularization, is optimized using an iterative Newton-Raphson algorithm designed to account for temporal dependencies in the data. The hyperparameters are selected through a mode-oriented cross-validation procedure that preserves the temporal structure of observations, ensuring the integrity of time-dependent relationships. The theoretical properties of the model, including selection consistency and universal approximation with sparsity, are rigorously analyzed. Monte Carlo simulations are conducted to assess the finite sample properties of the proposed approach, demonstrating its superior performance under various scenarios. Finally, the method is applied to forecasting global temperatures under the intermediate RCP 4.5 emissions scenario, predicting that a temperature increase of +1.5°C is the most probable 10-year-ahead outcome.
(TCPL 201)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building.
(Vistas Dining Room)
Friday, June 27
07:00 - 09:00 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 10:00 Open Discussion
The organizers plan to coordinate an open discussion based on workshop presentations and topics suggested by participants.
(TCPL 201)
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:00 Checkout by 11AM
5-day workshop participants are welcome to use BIRS facilities (TCPL) until 3 pm on Friday, although participants are still required to check out of the guest rooms by 11AM.
(Front Desk - Professional Development Centre)
11:00 - 11:30 Perspectives on Future Research
The organizers plan to coordinate an open discussion on future research directions.
(TCPL 201)
11:30 - 13:30 Lunch from 11:30 to 13:30 (Vistas Dining Room)