# Schedule for: 24w5308 - New Directions in Machine Learning Theory

Beginning on Sunday, October 20 and ending Friday October 25, 2024

All times in Banff, Alberta time, MDT (UTC-6).

Sunday, October 20 | |
---|---|


16:00 - 17:30 | Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre) |

17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |

20:00 - 22:00 |
Informal gathering ↓ Informal Meet and Greet at BIRS Lounge (Professional Development Centre 2nd Floor) (Other (See Description)) |

Monday, October 21 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (TCPL 201) |

09:00 - 09:30 |
Jamie Morgenstern: What governs predictive disparity in modern machine learning applications? ↓ The deployment of statistical models in impactful environments is far from new—simple correlations have been used to guide decisions throughout the sciences, health care, political campaigns, and in pricing financial instruments and other products for decades. Many such models, and the decisions they supported, were known to have different degrees of predictive power for different demographic groups. These differences had numerous sources, including: limited expressiveness of the statistical models; limited availability of data from marginalized populations; noisier measurements of both features and targets from certain populations; and features with less mutual information about the prediction target for some populations than others.
Modern decision systems which use machine learning are more ubiquitous than ever, as are their differences in performance for different populations of people. In this talk, I will discuss some similarities and differences in the sources of differing performance in contemporary ML systems including facial recognition systems and those incorporating generative AI. (TCPL 201) |

09:30 - 10:00 |
Tijana Zrnic: Prediction-Powered Inference ↓ From remote sensing to text analysis, machine learning predictions are beginning to substitute for real data when collection of the latter is difficult, slow, or costly. In this talk I will present recent work that permits the use of predictions without compromising the validity of downstream inferences. I will discuss the use of machine learning predictions as substitutes for high-quality data on one hand, and as a tool for guiding real data collection on the other. In both cases, machine learning allows for a significant boost in accuracy compared to classical baselines that do not leverage predictions. (TCPL 201) |
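The point estimate at the heart of prediction-powered inference is simple enough to sketch. Below is an illustrative toy in Python (my sketch, not the authors' implementation; the data and the predictor's +0.5 bias are hypothetical): the mean of model predictions on a large unlabeled set is debiased by a correction term estimated on a small labeled set.

```python
import numpy as np

rng = np.random.default_rng(0)

def pp_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Prediction-powered estimate of a population mean: the mean of the
    predictions on a large unlabeled set, plus a bias correction (the
    "rectifier") estimated on a small labeled set."""
    rectifier = np.mean(y_labeled - preds_labeled)
    return np.mean(preds_unlabeled) + rectifier

# Toy data: the predictor carries a systematic +0.5 bias.
true_mean = 2.0
y_labeled = rng.normal(true_mean, 1.0, size=200)
preds_labeled = y_labeled + 0.5 + rng.normal(0.0, 0.2, size=200)
preds_unlabeled = rng.normal(true_mean + 0.5, 1.0, size=20000)

naive = np.mean(preds_unlabeled)  # inherits the predictor's bias
pp = pp_mean(y_labeled, preds_labeled, preds_unlabeled)  # debiased
```

The correction uses only 200 labels, yet removes the bias that the 20,000 predictions share; the talk's confidence-interval machinery builds on exactly this kind of estimator.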

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:00 |
Christian Ikeokwu: Bursting the Filter Bubble: Disincentivizing Echo Chambers in Social Networks ↓ On social networks, algorithmic personalization drives users into filter bubbles where they rarely see content that deviates from their interests. We present a model for content curation and personalization that avoids filter bubbles, along with algorithmic guarantees and nearly matching lower bounds. In our model, the platform interacts with $n$ users over $T$ timesteps, choosing content for each user from $k$ categories. The platform receives stochastic rewards as in a multi-armed bandit.
To avoid filter bubbles, we draw on the intuition that if some users are shown some category of content, then all users should see at least a small amount of that content. We first analyze a naive formalization of this intuition and show it has unintended consequences: it leads to ``tyranny of the majority'' with the burden of diversification borne disproportionately by those with minority interests. This leads us to our model which distributes this burden more equitably. We require that the probability any user is shown a particular type of content is at least $\gamma$ times the average probability all users are shown that type of content. Full personalization corresponds to $\gamma = 0$ and complete homogenization corresponds to $\gamma = 1$; hence, $\gamma$ encodes a hard cap on the level of personalization. We also analyze additional formulations where the platform can exceed its cap but pays a penalty proportional to its constraint violation. We provide algorithmic guarantees for optimizing recommendations subject to these constraints. These include nearly matching upper and lower bounds for the entire range of $\gamma \in [0,1]$ showing that the cumulative reward of a multi-agent variant of the Upper-Confidence-Bound algorithm is nearly optimal. Using real-world preference data, we empirically verify that under our model, users share the burden of diversification and experience only minor utility loss when recommended more diversified content.
This is joint work with Christian Borgs, Jennifer Chayes and Ellen Vitercik (TCPL 201) |
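One simple way to satisfy the $\gamma$-cap described above (my illustration of the constraint, not the paper's bandit algorithm; the matrix below is hypothetical) is to mix each user's fully personalized content distribution with the population average at rate $\gamma$: the average is unchanged, so every entry automatically exceeds $\gamma$ times it.

```python
import numpy as np

def diversify(P, gamma):
    """Mix each user's personalized content distribution (rows of P,
    shape n_users x k_categories) with the population average at rate
    gamma. Every entry of the result Q then satisfies the cap:
    Q[i, j] >= gamma * (average over users of Q[:, j])."""
    avg = P.mean(axis=0, keepdims=True)
    return (1.0 - gamma) * P + gamma * avg

# Three users, two content categories (toy numbers).
P = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.5, 0.5]])
for gamma in (0.0, 0.5, 1.0):  # full personalization ... homogenization
    Q = diversify(P, gamma)
    assert np.allclose(Q.sum(axis=1), 1.0)              # still distributions
    assert np.all(Q >= gamma * Q.mean(axis=0) - 1e-12)  # gamma-cap holds
```

As in the abstract, $\gamma = 0$ leaves P untouched and $\gamma = 1$ makes every row equal to the average.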

11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

13:00 - 13:30 |
Luana Ruiz: A Poincaré Inequality and Consistency Results for Signal Sampling on Large Graphs ↓ Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this paper, we introduce a signal sampling theory for a type of graph limit---the graphon. We prove a Poincaré inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs, and demonstrate its good empirical performance on graph machine learning tasks. (TCPL 201) |

13:30 - 14:00 |
Ellen Vitercik: Online matching with graph neural networks ↓ Online Bayesian bipartite matching is a central problem in digital marketplaces and exchanges, including advertising, crowdsourcing, ridesharing, and kidney exchange. We introduce a graph neural network (GNN) approach that emulates the problem's combinatorially complex optimal online algorithm, which selects actions (e.g., which nodes to match) by computing each action's value-to-go (VTG)—the expected weight of the final matching if the algorithm takes that action, then acts optimally in the future. We train a GNN to estimate VTG and show empirically that this GNN returns high-weight matchings across a variety of tasks. Moreover, we identify a common family of graph distributions in spatial crowdsourcing applications, such as rideshare, under which VTG can be efficiently approximated by aggregating information within local neighborhoods in the graphs. This structure matches the local behavior of GNNs, providing theoretical justification for our approach.
This is joint work with Alexandre Hayderi, Amin Saberi, and Anders Wikum. (TCPL 201) |

14:00 - 14:20 |
Group Photo ↓ Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo! (TCPL Foyer) |

14:30 - 15:00 |
Arwen Bradley: Step-by-Step Diffusion ↓ We present an accessible first course on diffusion models for machine learning, aimed at a technical audience with no diffusion experience. We try to simplify the mathematical details as much as possible (sometimes heuristically), while retaining enough precision to derive correct algorithms. (TCPL 201) |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |

Tuesday, October 22 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

09:00 - 09:30 |
Simina Brânzei: Dueling over dessert, mastering the art of repeated cake cutting ↓ We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice’s cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered by Aumann and Maschler [1995].
We observe that if Bob is almost myopic and chooses his favorite piece too often, then he can be systematically exploited by Alice through a strategy akin to a binary search. This strategy allows Alice to approximate Bob’s preferences with increasing precision, thereby securing a disproportionate share of the resource over time.
We analyze the limits of how much a player can exploit the other one and show that fair utility profiles are in fact achievable. Specifically, the players can enforce the equitable utility profile of (1/2, 1/2) in the limit on every trajectory of play, by keeping the other player’s utility to approximately 1/2 on average while guaranteeing they themselves get at least approximately 1/2 on average. We show this theorem using a connection with Blackwell approachability [1956].
Finally, we analyze a natural dynamic known as fictitious play, where players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of (1/2, 1/2) at a rate of O(1/sqrt(T)).
arXiv: https://arxiv.org/abs/2402.08547. (TCPL 201) |
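To make the exploitation concrete, here is a toy Python simulation under a strong simplifying assumption that is mine, not the paper's: Bob's valuation is summarized by a single "half-value point" b, so a myopic left/right choice reveals on which side of Alice's cut b lies, and she can binary-search it.

```python
def myopic_bob(cut, b):
    """Myopic Bob takes whichever piece he values more; with half-value
    point b, the left piece [0, cut] is worth more to him iff cut > b."""
    return "left" if cut > b else "right"

def alice_binary_search(b, rounds=30):
    """Each round, myopic Bob's left/right choice tells Alice on which
    side of her cut his half-value point lies, so she can binary-search it."""
    lo, hi = 0.0, 1.0
    for _ in range(rounds):
        cut = (lo + hi) / 2
        if myopic_bob(cut, b) == "left":  # cut > b
            hi = cut
        else:                             # cut <= b
            lo = cut
    return (lo + hi) / 2

b_hat = alice_binary_search(b=0.37)
# With b pinned down, Alice (say, with uniform valuation) cuts just beside
# it and keeps the larger piece, securing about max(b, 1 - b) >= 1/2.
```

The theorem in the abstract shows precisely how Bob can avoid this fate: by occasionally taking his less-favored piece, he caps Alice's long-run share at approximately 1/2.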

09:30 - 10:00 |
Jon Schneider: Pareto-Optimal Algorithms for Learning in Games ↓ We study the problem of characterizing optimal learning algorithms for playing repeated games against an adversary with unknown payoffs. In this problem, the first player (called the learner) commits to a learning algorithm against a second player (called the optimizer), and the optimizer best-responds by choosing the optimal dynamic strategy for their (unknown but well-defined) payoff. Classic learning algorithms (such as no-regret algorithms) provide some counterfactual guarantees for the learner, but might perform much more poorly than other learning algorithms against particular optimizer payoffs.
In this talk, we will introduce the notion of asymptotically Pareto-optimal learning algorithms. Intuitively, if a learning algorithm is Pareto-optimal, then there is no other algorithm which performs asymptotically at least as well against all optimizers and performs strictly better (by at least Ω(T)) against some optimizer. We show that well-known no-regret algorithms such as Multiplicative Weights and Follow The Regularized Leader are Pareto-dominated. However, while no-regret is not enough to ensure Pareto-optimality, we show that a strictly stronger property, no-swap-regret, is a sufficient condition for Pareto-optimality. Proving these results requires us to address various technical challenges specific to repeated play, including the fact that there is no simple characterization of how optimizers who are rational in the long-term best-respond against a learning algorithm over multiple rounds of play. To address this, we introduce the idea of the asymptotic menu of a learning algorithm: the convex closure of all correlated distributions over strategy profiles that are asymptotically implementable by an adversary. We show that all no-swap-regret algorithms share the same asymptotic menu, implying that all no-swap-regret algorithms are ``strategically equivalent''.
Based on joint work with Eshwar Ram Arunachaleswaran and Natalie Collina. (TCPL 201) |
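For readers who have not seen it, the Multiplicative Weights algorithm discussed above is only a few lines. This generic sketch (payoff matrix and step size are illustrative, not tied to the talk) shows the no-regret property empirically:

```python
import numpy as np

def multiplicative_weights(payoffs, eta=0.1):
    """Run Multiplicative Weights on a T x n matrix of per-round payoffs
    in [0, 1]: play the normalized weight vector, then exponentially
    upweight each action by its realized payoff."""
    T, n = payoffs.shape
    w = np.ones(n)
    total = 0.0
    for g in payoffs:
        p = w / w.sum()
        total += p @ g
        w *= np.exp(eta * g)
    best_fixed = payoffs.sum(axis=0).max()  # best single action in hindsight
    return total, best_fixed - total

rng = np.random.default_rng(1)
payoffs = rng.uniform(size=(2000, 5))
total, regret = multiplicative_weights(payoffs)
# regret is sublinear in T (about O(sqrt(T log n)) with a tuned eta)
```

The talk's point is that this guarantee alone does not make the algorithm Pareto-optimal against strategic optimizers; the stronger no-swap-regret property does.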

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:00 |
Yang Cai: Tractable Phi-equilibria in Non-Concave Games ↓ While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to coarse correlated equilibria in games where each agent’s utility is concave in their own strategy, this is not the case when utilities are non-concave — a common scenario in machine learning applications where strategies are parameterized by deep neural networks, or when agents’ utilities are computed by neural networks, or both. Non-concave games introduce significant game-theoretic and optimization challenges: (i) Nash equilibria may not exist; (ii) local Nash equilibria, though they exist, are intractable; and (iii) mixed Nash, correlated, and coarse correlated equilibria generally have infinite support and are intractable. To address these challenges, we revisit the classical solution concept of $\Phi$-equilibria introduced by Greenwald and Jafari, which is guaranteed to exist for an arbitrary set of strategy modifications $\Phi$, even in non-concave games. However, the tractability of $\Phi$-equilibria in such games remains largely unexplored. We initiate the study of tractable $\Phi$-equilibria in non-concave games. In this talk, we will discuss the tractability of $\Phi$-equilibria for several natural families of strategy modifications, presenting both upper and lower bounds. (TCPL 201) |

11:00 - 11:30 |
Eric Mazumdar: Behavioral Economics-Inspired Multi-Agent Learning ↓ Machine learning algorithms are increasingly being deployed into environments in which they must interact with other strategic agents like algorithms and people with potentially misaligned objectives. While the presence of these strategic interactions creates new challenges for learning algorithms, they also give rise to new opportunities for algorithm design. In this talk I will discuss how ideas from economics can give us new insights into the analysis and design of machine learning algorithms for these real-world environments.
In this talk I will focus on equilibrium computation in Markov games. A significant roadblock to the development of principled multi-agent reinforcement learning algorithms is the fact that the underlying problem of equilibrium computation is computationally intractable. To overcome this obstacle, I will take inspiration from behavioral economics and show that -- by imbuing agents with important features of human decision-making like risk aversion and bounded rationality -- a class of risk-averse quantal response equilibria (RQE) become tractable to compute in all n-player matrix and finite-horizon Markov games. In particular, I will show that they emerge as the endpoint of no-regret learning in suitably adjusted versions of the games. Crucially, the class of computationally tractable RQE is independent of the underlying game structure and only depends on agents' degree of risk-aversion and bounded rationality. To validate the richness of this class of solution concepts, I will show that it captures people's patterns of play in a number of 2-player matrix games previously studied in experimental economics and present a first analysis of the sample complexity of approximating these equilibria using multi-agent reinforcement learning. (TCPL 201) |

11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

13:00 - 13:30 |
Nika Haghtalab: Theory of Multi-objective Machine Learning ↓ In this talk, I will introduce the framework of multi-objective learning as a modern generalization of statistical learning. This framework, which acts as a unifying paradigm for addressing needs such as robustness, collaboration, and fairness, aims to optimize a set of complex and unstructured objectives from only a small amount of sampled data. I will also discuss how the multi-objective learning paradigm relates to classical and modern considerations in machine learning broadly, such as generalization, introduce technical tools with versatile provable guarantees, and present empirical evidence for its performance on existing benchmarks. (TCPL 201) |

13:30 - 14:00 |
Vatsal Sharan: A multigroup perspective to go beyond loss minimization in ML ↓ Minimizing some loss function on average across all datapoints is the dominant paradigm in ML, but applications of ML in societal systems often involve more complex considerations. Different individuals participating in the system may have their own loss functions, it may not be possible to make decisions for these individuals in isolation, and we may care about the model’s behavior on various groups of individuals and not just on average across all of them. I will discuss a multigroup perspective which examines the model's predictions on a large number of groups within the population, and can provide solutions to some of the above problems. (TCPL 201) |

14:00 - 14:30 |
Chara Podimata: Learning in Strategic Environments: from Calibrated Agents to General Information Asymmetry ↓ In this talk, I will discuss learning in principal-agent games where there is information asymmetry between what the principal and the agent know about each other’s chosen actions. I will introduce a generalization of the standard Stackelberg Games (SGs) framework: Calibrated Stackelberg Games (CSGs). In CSGs, a principal repeatedly interacts with an agent who (contrary to standard SGs) does not have direct access to the principal’s action but instead best-responds to calibrated forecasts about it. I will show that in CSGs, the principal can achieve utility that converges to the optimum Stackelberg value of the game (i.e., the value that they could achieve had the agent known the principal’s strategy all along) both in finite and continuous settings, and that no higher utility is achievable. Finally, I will discuss a meta-question: when learning in strategic environments, can agents overcome uncertainty about their preferences to achieve outcomes they could have achieved absent any uncertainty? And can they do this solely through interactions with each other? (TCPL 201) |

14:30 - 15:00 |
Preetum Nakkiran: Mechanisms of LLM Generalization: A Computational Approach ↓ Large language models in the wild exhibit many surprising behaviors — surprising in part because they involve “out-of-distribution” prompts, which are seemingly very far from anything in their train sets.
I will discuss several recent works which study this type of generalization in a restricted setting: length-generalization on algorithmic tasks. E.g: Can a model trained to solve 10 digit addition problems (and nothing more) automatically generalize to 50 digit addition? For which tasks do we expect this to work, and why? It turns out that taking a careful look at the Transformer architecture can help answer this question, and may provide insight about more powerful types of generalization in the wild. (TCPL 201) |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |

Wednesday, October 23 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

09:00 - 09:30 |
Mikhail Khodak: Efficiently learning instance-optimal linear system solvers ↓ Augmenting classical algorithms with learned predictions or configurations has found many successful applications in energy management, database systems, scientific computing, and beyond. At the same time it has been theoretically challenging to go beyond the computationally inefficient learning of static predictors and configurations. We study the problem of sequentially solving linear systems, a fundamental problem in numerical computing, and introduce an algorithm that efficiently learns to do instance-optimal linear solver configuration while using only the number of iterations as feedback. Our approach points towards new directions for analyzing iterative methods, designing surrogate objectives for optimizing algorithmic costs, and incorporating tools from online learning into algorithm design. (TCPL 201) |

09:30 - 10:00 |
Etai Littwin: Should we predict in Latent Space in Self-Supervised Learning? ↓ Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architectures (JEPAs) are a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other. A recent successful approach that falls under the JEPA framework is self-distillation, where an online encoder is trained to predict the output of the target encoder, sometimes with a lightweight predictor network. This is contrasted with the Masked Autoencoder (MAE) paradigm, where an encoder and decoder are trained to reconstruct missing parts of the input in ambient space rather than its latent representation. A common motivation for using the JEPA approach over MAE is that the JEPA objective prioritizes abstract features over fine-grained pixel information (which can be unpredictable and uninformative). In this talk, I will discuss recent theoretical results regarding the pros and cons of the JEPA approach, and the promise it might hold for learning general purpose representations from unlabeled data. (TCPL 201) |

10:00 - 10:15 | Coffee Break (TCPL Foyer) |

10:15 - 10:45 |
Vasilis Syrgkanis: Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals ↓ We extend the idea of automated debiased machine learning to the dynamic treatment regime and, more generally, to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation algorithm that learns de-biasing corrections without the need to characterize the form of the correction terms (for instance, as products of inverse probability weighting terms), as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need to solve auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. We provide further applications of our approach to estimation of dynamic discrete choice models and estimation of long-term effects with surrogates. (TCPL 201) |

10:45 - 11:15 |
Manolis Zampetakis: Towards Theoretical Understanding of Extrapolation in Data Science ↓ In a wide range of data analysis problems, across many scientific disciplines, we only have access to biased or censored data due to imperfections in the data collection procedure. To design methods with good performance in such settings, we need to provide guarantees for their extrapolation properties. In this talk, we present a general formulation of extrapolation and show how it applies to fundamental statistical estimation tasks such as density estimation and regression. Finally, we discuss how this formulation relates to classical problems in econometrics such as truncation and self-selection. (TCPL 201) |

11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |

13:30 - 17:30 | Free Afternoon (Banff National Park) |

17:30 - 19:30 |
Dinner ↓ |

Thursday, October 24 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ |

09:00 - 09:30 |
Sanjoy Dasgupta: Recent progress on interpretable clustering ↓ The widely-used k-means procedure returns k clusters that have arbitrary convex shapes. In high dimension, such a clustering might not be easy to understand. A more interpretable alternative is to constrain the clusters to be the leaves of a decision tree with axis-parallel splits; then each cluster is a hyperrectangle given by a small number of features.
Is it always possible to find clusterings that are interpretable in this sense and yet have k-means cost that is close to the unconstrained optimum? A recent line of work has answered this in the affirmative and moreover shown that these interpretable clusterings are easy to construct.
I will give a survey of these results: algorithms, methods of analysis, and open problems. (TCPL 201) |
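As a flavor of the decision-tree constraint, here is a naive Python sketch (my illustration, not one of the surveyed algorithms, which choose their cuts much more carefully to control the k-means cost): split the set of cluster centers with axis-parallel cuts until each leaf, a hyperrectangle, contains exactly one center.

```python
import numpy as np

def build_tree(centers, ids):
    """Recursively split the (distinct) cluster centers with axis-parallel
    cuts: pick the widest feature, cut at the largest gap between center
    coordinates, and recurse until each leaf holds a single center id."""
    if len(ids) == 1:
        return ids[0]
    C = centers[ids]
    f = int(np.argmax(C.max(axis=0) - C.min(axis=0)))
    coords = np.sort(C[:, f])
    g = int(np.argmax(np.diff(coords)))
    t = (coords[g] + coords[g + 1]) / 2
    left = [i for i in ids if centers[i, f] <= t]
    right = [i for i in ids if centers[i, f] > t]
    return (f, t, build_tree(centers, left), build_tree(centers, right))

def assign(tree, x):
    """Route a point down the tree; each leaf is a hyperrectangle."""
    while isinstance(tree, tuple):
        f, t, L, R = tree
        tree = L if x[f] <= t else R
    return tree

centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
tree = build_tree(centers, list(range(len(centers))))
assert assign(tree, np.array([4.8, 0.3])) == 1
```

Every cluster here is described by at most a few feature thresholds, which is exactly the sense of interpretability in the abstract.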

09:30 - 10:00 |
Eric Balkanski: Fair Secretaries with Unfair Predictions ↓ Algorithms with predictions is a recent framework for decision-making under uncertainty that leverages the power of machine-learned predictions without making any assumption about their quality. The goal in this framework is for algorithms to achieve an improved performance when the predictions are accurate while maintaining acceptable guarantees when the predictions are erroneous. A serious concern with algorithms that use predictions is that these predictions can be biased and, as a result, cause the algorithm to make decisions that are deemed unfair. We show that this concern manifests itself in the classical secretary problem in the learning-augmented setting---the state-of-the-art algorithm can have zero probability of accepting the best candidate, which we deem unfair, despite promising to accept a candidate whose expected value is at least $\max\{\Omega(1), 1 - O(\varepsilon)\}$ times the optimal value, where $\varepsilon$ is the prediction error. We show how to preserve this promise while also guaranteeing to accept the best candidate with probability $\Omega(1)$. Our algorithm and analysis are based on a new ``pegging'' idea that diverges from existing works and simplifies/unifies some of their results. Finally, we extend to the k-secretary problem and complement our theoretical analysis with experiments.
Joint work with Will Ma and Andreas Maggiori (TCPL 201) |
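For context, the classical (prediction-free) secretary rule that the learning-augmented algorithms build on fits in a few lines of Python; it accepts the best candidate with probability about 1/e, independent of n. A quick simulation (toy candidate values, my sketch):

```python
import math
import random

def classical_secretary(values):
    """The classical 1/e rule: reject the first n/e candidates, then accept
    the first one better than everything seen so far (or else the last)."""
    n = len(values)
    cutoff = int(n / math.e)
    best_seen = max(values[:cutoff], default=float("-inf"))
    for v in values[cutoff:]:
        if v > best_seen:
            return v
    return values[-1]

random.seed(0)
trials = 20000
wins = sum(
    classical_secretary(vals) == max(vals)
    for vals in (random.sample(range(1000), 50) for _ in range(trials))
)
# the empirical success rate approaches 1/e ~ 0.368
```

The abstract's fairness concern is about what happens when such a rule is augmented with (possibly biased) predicted values while trying to keep an Omega(1) probability of taking the true best candidate.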

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:00 | Shai Ben-David: The misleading promise of fair representations (TCPL 201) |

11:00 - 11:30 |
Bailey Flanigan: Algorithmic tools for targeting sortition ideals ↓ In the past few years, there has been substantial computer science research on sortition, the task of randomly sampling a representative subset of people from a broader population. Multiple tools emerging from this work have already been applied to select participants of citizens' assemblies, and these tools could in principle be used in many other models of participatory governance as well. This talk will give an overview of existing and emerging algorithmic sortition tools, and then we will briefly delve into one tool that is the subject of ongoing research. (TCPL 201) |

11:30 - 13:00 |
Lunch ↓ |

13:00 - 13:30 |
Kasper Green Larsen: Majority-of-Three: The Simplest Optimal Learner? ↓ Developing an optimal PAC learning algorithm in the realizable setting, where empirical risk minimization (ERM) is suboptimal, was a major open problem in learning theory for decades. The problem was finally resolved by Hanneke a few years ago. Unfortunately, Hanneke's algorithm is quite complex as it returns the majority vote of many ERM classifiers that are trained on carefully selected subsets of the data. It is thus a natural goal to determine the simplest algorithm that is optimal. In this talk we study the arguably simplest algorithm that could be optimal: returning the majority vote of three ERM classifiers. We show that this algorithm achieves the optimal in-expectation bound on its error, which is provably unattainable by a single ERM classifier. We conjecture that a better analysis will prove that this algorithm is in fact optimal in the high-probability regime.
This work was accepted at COLT’24 and is joint with Ishaq Aden-Ali, Mikael Møller Høgsgaard and Nikita Zhivotovskiy (TCPL 201) |
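As a toy rendering of the idea (my own schematic in Python; the subsets in Hanneke's construction and in the Majority-of-Three analysis are chosen more carefully than the random subsamples here), three ERMs for 1-D threshold classifiers are trained on subsamples and combined by majority vote:

```python
import random

def erm_threshold(sample):
    """ERM for 1-D threshold classifiers h_t(x) = [x >= t] on realizable
    data in [0, 1]: any t between the largest negative and the smallest
    positive example is consistent; return the midpoint."""
    lo = max((x for x, y in sample if y == 0), default=0.0)
    hi = min((x for x, y in sample if y == 1), default=1.0)
    return (lo + hi) / 2

def majority_of_three(data, rng):
    """Train three ERMs on independent random subsamples and predict by
    the majority vote of the three resulting classifiers."""
    ts = [erm_threshold(rng.sample(data, len(data) // 3)) for _ in range(3)]
    return lambda x: sum(x >= t for t in ts) >= 2

rng = random.Random(0)
data = [(x, int(x >= 0.6)) for x in (rng.random() for _ in range(300))]
h = majority_of_three(data, rng)
assert h(0.9) and not h(0.1)
```

The appeal of the scheme is exactly its simplicity: three calls to an off-the-shelf ERM plus a vote, versus the many carefully structured ERMs in Hanneke's optimal learner.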

13:30 - 14:00 |
Tosca Lechner: Inherent Limitations for Characterizing Distribution Learning ↓ We consider the long-standing question of finding a parameter of a class of probability distributions that characterizes its PAC learnability. While for many learning tasks (such as binary classification and online learning) there is a notion of dimension whose finiteness is equivalent to learnability within any level of accuracy, we show, rather surprisingly, that no such parameter exists for distribution learning.
Concretely, our results apply for several general notions of characterizing learnability and for several learning tasks. We show that there is no notion of scale-invariant dimension that characterizes the *sample complexity* of learning distribution classes.
We then consider the weaker requirement of only characterizing learnability (rather than the quantitative sample complexity function). We propose some natural requirements for such a characterizing dimension and go on to show that there exists no characterization of learnability that satisfies these requirements for classes of distributions. (TCPL 201) |

14:00 - 14:30 |
Will Ma: VC Theory vs. Empirical DP ↓ We study offline learning of online policies that face a sequence of n numbers, drawn from an unknown distribution over sequences. Two prototypical examples are Prophet Inequality, where the goal is to stop on a large number, and Inventory Replenishment, where the goal is to control a stockpile to approximately match the next number. For Prophet Inequality (resp. Inventory Replenishment), the class of policies defined by n stopping thresholds (resp. n base stocks) is optimal when the entries of the sequence are assumed to be independent.
We consider PAC learning of best-in-class policies for these classes, with and without the independence assumption. For Prophet Inequality, the sample complexity is $\Theta(n/\varepsilon^2)$ without independence, which we improve to $\Theta(1/\varepsilon^2)$ by making the independence assumption. For Inventory Replenishment, the previous sample complexity is $O(n/\varepsilon^2)$ with independence, which we improve to $\Theta(1/\varepsilon^2)$ by relaxing (!) the independence assumption. This is because relaxing the independence assumption forces us to use VC theory instead of analyzing the empirical Dynamic Program for the Inventory Replenishment problem, whose fat-shattering dimension is surprisingly small.
Based on two papers:
VC Theory for Inventory Policies: with Yaqi Xie, Linwei Xin
Sample Complexity of Posted Pricing for a Single Item: with Billy Jin, Thomas Kesselheim, Sahil Singla (TCPL 201) |
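The n stopping thresholds mentioned for the independent Prophet Inequality setting have a classical backward-induction form: accept X_i exactly when it beats the expected value-to-go of the remaining rounds. A Monte Carlo sketch in Python (illustrative background, not code from the papers):

```python
import random

def optimal_thresholds(samplers, n_mc=20000, seed=0):
    """Backward induction for independent entries X_1..X_n: the optimal
    online policy accepts X_i iff X_i >= tau_i, where tau_i is the
    expected value-to-go of rounds i+1..n (estimated by Monte Carlo)."""
    rng = random.Random(seed)
    n = len(samplers)
    taus = [0.0] * n
    v = 0.0  # value-to-go after the final round
    for i in reversed(range(n)):
        taus[i] = v
        v = sum(max(samplers[i](rng), v) for _ in range(n_mc)) / n_mc
    return taus, v  # v = expected value of the optimal online policy

# Five i.i.d. Uniform[0,1] entries: the recursion v' = (1 + v^2) / 2 gives
# thresholds near 0.742, 0.695, 0.625, 0.5, 0 and a value near 0.775.
taus, value = optimal_thresholds([lambda r: r.random()] * 5)
```

The PAC-learning question in the talk is how many sample sequences are needed to learn such thresholds when the distributions themselves are unknown.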

14:30 - 15:00 |
Sivan Sabato: Interactive Learning with Discriminative Feature Feedback ↓ We study a model of learning with feature-based explanations, that we call Discriminative Feature Feedback.
This model formalizes a natural notion of interactive learning with explanations.
I will discuss learning algorithms and mistake bounds for this model, as well as some open questions.
Based on joint work with Sanjoy Dasgupta, Omri Bar-Oz, Nicholas Roberts and Akansha Dey. (TCPL 201) |

15:00 - 15:30 | Coffee Break (TCPL Foyer) |

15:30 - 16:00 | Amin Karbasi: Statistical indistinguishability of learning algorithms (TCPL 201) |

17:30 - 19:30 |
Dinner ↓ |

Friday, October 25 | |
---|---|

07:00 - 08:45 |
Breakfast ↓ |

09:00 - 09:30 |
Sebastian Bordt: Who is allowed to train on the test set? ↓ A core principle of machine learning is that a model should not be trained on the holdout data used for evaluation. With recent models being trained on internet-scale data, both the implementation and the meaning of this paradigm have come into question. In this talk, I give a brief overview of recent developments in "training on the test set". I will present the results of some experiments that I have recently conducted, and argue that new theory and concepts might be required to properly understand the problem. (TCPL 201) |

09:30 - 10:00 |
Vasilis Kontonis: Beyond Worst-Case Models for Learning from Noisy Labels ↓ ML has shown promise towards solving problems previously thought to be far from reach or impossible. However, evaluating ML models and ensuring their reliability is a challenging task. Provable performance guarantees are crucial to reduce human supervision in deployed ML systems, in defence against data-poisoning attacks, for ensuring out-of-distribution generalization, etc. The huge size of modern datasets naturally limits the investigation to efficient algorithms. In this talk we investigate to what extent we can provide provable guarantees for noisy supervised learning problems using tractable resources. We first discuss learning under semi-random label noise – where the true label of each example is flipped with probability at most 50% – and present efficient algorithms. We will then present a new smoothed analysis framework for learning that generalizes previous beyond-worst-case learning models, such as learning with margin, and give new efficient algorithms. In particular, we give an agnostic learner for intersections of $k$ halfspaces with margin that has runtime $2^{polylog k}$. Prior to our work the best known runtime was exponential in $k$. (TCPL 201) |

10:00 - 10:30 | Coffee Break (TCPL Foyer) |

10:30 - 11:00 |
Checkout by 11AM ↓ 5-day workshop participants are welcome to use BIRS facilities (TCPL) until 3 pm on Friday, although participants are still required to checkout of the guest rooms by 11AM. (Front Desk - Professional Development Centre) |

12:00 - 13:30 | Lunch from 11:30 to 13:30 (Vistas Dining Room) |