Schedule for: 25w5389 - Machine Learning and Statistics: From Theory to Practice

Beginning on Sunday, January 12 and ending Friday January 17, 2025

All times in Chennai, India time, IST (UTC+5:30).

Monday, January 13
09:00 - 10:10 Tara Javidi: Quantitative Group Testing: Easy to State and Conjecture, Hard to Close Gaps! (online)
This talk addresses a variant of classical group testing, where an $n$-dimensional binary incidence vector with $k$ non-zero entries represents $n$ items, $k$ of which are defective. Unlike classical group testing, where tests yield a binary yes/no answer, in quantitative group testing (QGT) each test, specified by a group/subset of items inspected, returns the count of defective items in the group, providing richer information. Information theoretically, this promises a $\log(k)$ multiplicative reduction in the number of tests. While this sounds rather rudimentary, closing this performance gap with classical group testing has been full of unexpected technical challenges: most existing efficient non-adaptive algorithms fall short of the information-theoretic bound by a multiplicative gap of $\log(k)$, while adaptive algorithms, despite achieving the $\log(k)$ quantitative gain, still miss a factor-2 multiplicative adaptivity gain over the theoretical lower bound. In this talk, I will discuss the unexpected challenges we have discovered working on this problem and present our recent work and advances across four classes of algorithms (and their complexities): sparse recovery/coding algorithms, a fully adaptive combinatorial approach, multi-stage tests with tunable adaptation, and non-adaptive algorithms with polynomial complexity that reduce the gap (asymptotically and practically) to the information-theoretic bound. This is joint work with Dr. Mahdi Soleymani at UCSD.
(CMI - Lecture Hall 202)
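A minimal simulation sketch (not from the talk) of the difference between the two test models discussed above: the same random pool yields only a yes/no answer under classical group testing but the defective count under QGT. The pool design and parameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10                                    # n items, k defectives
x = np.zeros(n, dtype=int)                         # binary incidence vector
x[rng.choice(n, size=k, replace=False)] = 1

pool = rng.random(n) < 0.5                         # one random test pool (subset of items)
binary_outcome = int(x[pool].sum() > 0)            # classical group testing: yes/no
qgt_outcome = int(x[pool].sum())                   # quantitative group testing: count
print(binary_outcome, qgt_outcome)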
10:30 - 11:10 Subhonmesh Bose (TBA)
11:10 - 11:50 Vivek Borkar: Reinforcement Learning in Non-Markovian Environments (CMI - Lecture Hall 202)
11:50 - 12:30 Avhishek Chatterjee: Learning the Influence Graph of a High-Dimensional Markov Process with Memory
Motivated by applications in social networks, nervous systems and financial risk analysis, we consider the problem of learning the underlying (directed) influence or causal graph of a high-dimensional multivariate discrete-time Markov process with memory. At any discrete time instant, each observed variable of the multivariate process is a random binary string of a random length, whose statistics are parameterized by an unobservable or hidden $[0,1]$-valued scalar. These hidden scalars corresponding to the variables evolve according to a discrete-time linear stochastic dynamics with memory. This evolution is dictated by the edge weights of an underlying directed influence graph, whose nodes correspond to the variables. We extend an existing algorithm for learning i.i.d. graphical models to this setting and prove a logarithmic (in number of variables or nodes) sample complexity result when the edge weights satisfy a regularity condition. This condition is derived by lower bounding the absolute spectral gap of the Markov process (with memory) in terms of the edge weights by using coupling arguments. This is a joint work with Smita Bagewadi (IIT Madras).
(CMI - Lecture Hall 202)
14:00 - 14:40 Rajasekhar Anguluri: Structure Learning in Critical Infrastructure Networks
Adversarial attacks and a rapidly changing climate disrupt the operations of infrastructure networks, including those in energy, water, manufacturing, and transportation sectors. If these disruptions are not promptly addressed, they can lead to system-wide shutdowns, highlighting the critical need for quick and accurate identification methods. A significant form of disruption arises from changes or perturbations to network edges, such as additions or deletions. Accurately identifying these changes with limited data is the theme of this talk. In the first part, I will discuss an optimization-agnostic method for identifying multiple edge changes using the covariance data of nodal measurements in infrastructure networks obeying equilibrium equations. In the second part, I will present a sparsity-promoting optimization approach to address similar edge change problems, accounting for uncertainties in network parameters and data. Both approaches leverage the inherent network structure encoded in the Laplacian matrix, leading to theoretical results with practical engineering applications (for e.g., power grids).
(CMI - Lecture Hall 202)
14:40 - 15:20 Nikhil Karamchandani: Best Arm Identification in Multi-Armed Bandits: Confidence Intervals and Asymptotic Optimality
We address the challenge of identifying the optimal arm in a stochastic K-armed bandit scenario with the minimum number of arm pulls, given a predefined error probability (fixed confidence setting). Our focus is on examining the asymptotic behavior (as the target error probability goes to zero) of the sample complexity and the distribution of arm weights upon termination, under confidence-interval based algorithms. Specifically, we analyze these for the well-known LUCB algorithm, and introduce a new variant, the LUCB Greedy algorithm. We demonstrate that the upper bounds on the sample complexities for both algorithms are asymptotically within a constant factor of the established lower bounds.
(CMI - Lecture Hall 202)
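Since the abstract above centers on the LUCB algorithm, here is a minimal sketch of LUCB for Bernoulli arms. The confidence radius below is one common choice and is an assumption, not necessarily the exact one analyzed in the talk.

import numpy as np

def lucb(means, delta=0.05, rng=np.random.default_rng(1)):
    # Pull the empirical leader and its strongest challenger until their intervals separate.
    K = len(means)
    counts = np.ones(K)
    sums = np.array([float(rng.random() < m) for m in means])   # pull each arm once
    t = K
    while True:
        mu = sums / counts
        rad = np.sqrt(np.log(5 * K * t**4 / (4 * delta)) / (2 * counts))
        leader = np.argmax(mu)
        ucb = mu + rad
        ucb[leader] = -np.inf
        challenger = np.argmax(ucb)
        if mu[leader] - rad[leader] > mu[challenger] + rad[challenger]:
            return leader, t                                    # best arm, total pulls
        for a in (leader, challenger):                          # LUCB pulls both arms
            counts[a] += 1
            sums[a] += float(rng.random() < means[a])
            t += 1

print(lucb([0.3, 0.5, 0.7]))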
15:40 - 16:20 Krishna Pillutla: Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
Differentially private (DP) learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent empirical work has shown that introducing temporal (anti-)correlations in the noise can greatly improve utility. In the first part of the talk, I will provide the first clear theoretical separation between the two classes of algorithms. Our tight matching upper and lower bounds for linear regression problems demonstrate that correlated noise can be (up to) exponentially better than independent noise. In the second part of the talk, I will describe how to attain provably near-optimal (up to log factors) runtime for correlated noise mechanisms. The key step in the algorithm design relies on a rational approximation to the square root function. In both cases, experiments on private deep learning validate the theory.
(CMI - Lecture Hall 202)
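A toy sketch of the contrast described above: noisy gradient descent on a least-squares problem with independent Gaussian noise versus a simple anti-correlated (differenced) noise sequence. This only illustrates the mechanics of temporal anti-correlation; the privacy calibration and the matrix-factorization mechanisms from the talk are omitted, and all names and constants are assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, T, lr, sigma = 20, 500, 0.1, 1.0
A = rng.standard_normal((100, d))
y = A @ rng.standard_normal(d)                      # toy linear regression data

def run(correlated, c=0.9):
    w = np.zeros(d)
    z_prev = np.zeros(d)
    for _ in range(T):
        grad = A.T @ (A @ w - y) / len(y)
        z = sigma * rng.standard_normal(d)
        noise = z - c * z_prev if correlated else z  # anti-correlated vs independent noise
        z_prev = z
        w -= lr * (grad + noise)
    return np.mean((A @ w - y) ** 2)

print("independent:", run(False), "anti-correlated:", run(True))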
16:20 - 17:00 Bhaswar Bhattacharya: Kernel and Graphical Methods for Comparing Conditional Distributions (online)
In this talk we will discuss various nonparametric methods for comparing conditional distributions based on kernels and nearest-neighbor graphs. The methods can be readily applied to a broad range of problems, from classical nonparametric statistics to modern machine learning. Specifically, we will discuss applications in testing model calibration, regression curve evaluation, and validation of emulators in simulation-based inference. (Joint work with Anirban Chatterjee and Ziang Niu)
(CMI - Lecture Hall 202)
Tuesday, January 14
09:00 - 10:10 Praneeth Netrapalli: Second Order Methods for Bandit Optimization and Control
Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high dimensional data. After giving a brief overview of this area, in this talk, we will describe a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex. This class contains a wide range of practically relevant loss functions including linear, quadratic, and generalized linear models. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an algorithm with optimal regret for bandit LQR/LQG problems under a fully adversarial noise model, thereby resolving an open question posed in (Gradu et al. 2020) and (Sun et al. 2023). Finally, we show that the more general problem of BCO with (non-affine) memory is harder. We derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even under the assumption of smooth and quadratic losses. Based on joint works with Arun Suggala, Jennifer Sun and Elad Hazan.
(CMI - Lecture Hall 202)
10:30 - 11:10 Ramya Korlakai Vinayak: Towards Pluralistic Alignment: Foundations for Learning Diverse Human Preferences
Large pre-trained models trained on internet-scale data are often not ready for deployment out-of-the-box. They are heavily fine-tuned or aligned using large quantities of human preference data, usually elicited using pairwise comparisons. While aligning an AI/ML model to human preferences or values, it is important to ask: whose preferences and values are we aligning it to? Current approaches to preference alignment are severely limited by the inherent assumption of uniformity in the preference models. We aim to overcome this limitation by building mathematical foundations for learning diverse human preferences. In this talk, I will present PAL, a personalizable reward modelling framework for pluralistic alignment. PAL has a modular design that leverages commonalities across users while catering to individual personalization, enabling efficient few-shot generalization. PAL is versatile enough to be applied to various domains and matches or outperforms state-of-the-art methods on both text-to-text and text-to-image tasks with 100x fewer parameters in practice. I will also present theoretical results on per-user sample complexity for generalization and fundamental limitations when there are limited pairwise comparisons. Based on works with Daiwei Chen, Yi Chen, Aniket Rege, Zhi Wang, Geelon So, Greg Canal, Blake Mason, Gokcan Tatli, and Rob Nowak. References: 1. PAL: Pluralistic Alignment Framework for learning from heterogeneous preferences (preprint, 2024) 2. One-for-all: Simultaneous metric and preference learning (NeurIPS 2022) 3. Metric learning via limited pairwise comparisons (UAI 2024), and 4. Learning Populations of Preferences via pairwise comparisons (AISTATS 2024).
(CMI - Lecture Hall 202)
11:10 - 11:50 Parthe Pandit: Can kernel machines be a viable alternative to deep neural networks?
Deep learning remains an art with several heuristics that do not always translate across application domains. Kernel machines, a classical model in ML, have received renewed attention following the discovery of the Neural Tangent Kernel and its equivalence to wide neural networks. I will present two results that show the promise of kernel machines for large-scale applications. 1. Data-dependent kernels: https://www.science.org/stoken/author-tokens/ST-1738/full 2. Fast training algorithms: https://arxiv.org/abs/2411.16658
(CMI - Lecture Hall 202)
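For context, a minimal kernel machine: kernel ridge regression with a Laplace kernel on a 1-D toy problem. The data-dependent kernels and fast solvers in the two linked papers go well beyond this baseline; the bandwidth and regularization below are arbitrary assumptions.

import numpy as np

def laplace_kernel(X, Z, bandwidth=1.0):
    # K[i, j] = exp(-||x_i - z_j|| / bandwidth)
    d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=-1)
    return np.exp(-d / bandwidth)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

K = laplace_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)    # ridge-regularized kernel solve

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(laplace_kernel(X_test, X) @ alpha)                  # predictions at the test points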
11:50 - 12:30 Dheeraj Nagaraj: Poisson Midpoint Method: Efficient Discretization for Diffusion Based Models
Diffusion based sampling and generative models have been extremely successful. They are continuous time stochastic processes which are implemented via time discretization. We introduce the Poisson midpoint discretization which trades off bias for variance in the standard Euler-Maruyama discretization to give a quadratic speedup in sampling under general assumptions.
(CMI - Lecture Hall 202)
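For reference, a sketch of the standard Euler-Maruyama discretization of overdamped Langevin dynamics (here targeting a standard Gaussian), i.e., the baseline whose bias-variance trade-off the Poisson midpoint method improves; the midpoint scheme itself is not reproduced here, and the step size and target are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def grad_log_pi(x):
    return -x                        # target pi = N(0, 1), so grad log pi(x) = -x

def euler_maruyama(n_steps=1000, h=0.01, x0=5.0):
    # X_{k+1} = X_k + h * grad_log_pi(X_k) + sqrt(2h) * N(0, 1)
    x = x0
    for _ in range(n_steps):
        x = x + h * grad_log_pi(x) + np.sqrt(2 * h) * rng.standard_normal()
    return x

samples = np.array([euler_maruyama() for _ in range(2000)])
print(samples.mean(), samples.var())  # should be close to 0 and 1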
14:00 - 14:40 Sheetal Kalyani: Flipped Huber: A new additive noise mechanism for differential privacy
The framework of differential privacy protects an individual's privacy while publishing query responses on aggregated data. In this work, a new noise-addition mechanism for differential privacy is introduced, where the added noise is sampled from a hybrid density that resembles the Laplace density in the centre and the Gaussian density in the tail. With a sharper centre and a light, sub-Gaussian tail, this density has the best characteristics of both distributions. We theoretically analyze the proposed mechanism and derive a necessary and sufficient condition in one dimension, and a sufficient condition in higher dimensions, for the mechanism to guarantee approximate differential privacy. Numerical simulations corroborate the efficacy of the proposed mechanism compared to other existing mechanisms in achieving a better trade-off between privacy and accuracy.
(CMI - Lecture Hall 202)
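An illustrative sketch of an additive-noise density that is Laplace-like near the centre and Gaussian-like in the tail, sampled by numerical inversion on a grid. The exponent below is a stand-in chosen only to be continuous and smooth at the switch point tau; it is not the exact Flipped Huber density or its privacy calibration from the paper.

import numpy as np

def hybrid_log_density(x, tau=1.0):
    # |x| (Laplace-like) inside [-tau, tau], x^2 (Gaussian-like) outside,
    # matched so the exponent and its slope are continuous at |x| = tau.
    ax = np.abs(x)
    return np.where(ax <= tau, -ax, -(ax**2 + tau**2) / (2 * tau))

grid = np.linspace(-10, 10, 20001)
pdf = np.exp(hybrid_log_density(grid))
pdf /= np.trapz(pdf, grid)
cdf = np.cumsum(pdf) * (grid[1] - grid[0])
cdf /= cdf[-1]

rng = np.random.default_rng(0)
def sample(size):
    return np.interp(rng.random(size), cdf, grid)     # inverse-CDF sampling on the grid

true_count = 42
print(true_count + sample(1)[0])                       # noisy answer to a counting query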
14:40 - 15:20 Pranay Sharma: Federated Communication-Efficient Multi-Objective Optimization (online)
We study a federated version of multi-objective optimization (MOO), where a single model is trained to optimize multiple objective functions. MOO has been extensively studied in the centralized setting but is less explored in federated or distributed settings. We propose FedCMOO, a novel communication-efficient federated multi-objective optimization (FMOO) algorithm that improves the error convergence performance of the model compared to existing approaches. Unlike prior works, the communication cost of FedCMOO does not scale with the number of objectives, as each client sends a single aggregated gradient, obtained using randomized SVD (singular value decomposition), to the central server. We provide a convergence analysis of the proposed method for smooth non-convex objective functions under milder assumptions than in prior work. In addition, we introduce a variant of FedCMOO that allows users to specify a preference over the objectives in terms of a desired ratio of the final objective values. Through extensive experiments, we demonstrate the superiority of our proposed method over baseline approaches.
(CMI - Lecture Hall 202)
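A toy sketch of the compression step mentioned in the preceding abstract, under one plausible reading: a client stacks its per-objective gradients into a tall matrix and transmits a low-rank factorization computed with a randomized SVD instead of all M gradients. The rank, the oversampling, and the exact role of the SVD inside FedCMOO are assumptions here.

import numpy as np

rng = np.random.default_rng(0)
d, M, r = 10_000, 5, 2                         # d parameters, M objectives, target rank r

G = rng.standard_normal((d, M))                # column j = gradient of objective j (toy values)

# Randomized SVD of the tall gradient matrix.
Omega = rng.standard_normal((M, r + 1))        # slight oversampling
Q, _ = np.linalg.qr(G @ Omega)                 # (d, r+1) basis approximating range(G)
B = Q.T @ G                                    # small (r+1, M) matrix
U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_b

# The client would send the truncated factors (about d*r numbers) instead of G (d*M numbers).
G_approx = (U[:, :r] * s[:r]) @ Vt[:r]
print(np.linalg.norm(G - G_approx) / np.linalg.norm(G))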
15:20 - 16:00 Pradeep Ravikumar: Wood Wide Models
Foundation models are monolithic models that are trained on a broad set of data, and which are then in principle fine-tuned to various specific tasks. But they are ill-suited to many heterogeneous settings, for instance numeric tabular data, or numeric time-series data, where training a single monolithic model over a large collection of such datasets is not meaningful. For instance, why should numeric time series of stock prices have anything to do with time series comprising the vital signs of an ICU patient? For such settings, we propose the class of wood wide models. The wood wide web is often used to describe an underground network of fungal threads that connect many trees and plants together, which stands in contrast to a large concrete foundation on top of which we might build specialized buildings. Analogously, in contrast to a single foundation model upon which one might build specialized models, we have many smaller wood wide models that all borrow subtler ingredients from each other. But to be able to share nutrients from the wood wide web, trees need a special root-based architecture that can connect to these fungal threads. Accordingly, to operationalize wood wide models, we develop a novel neuro-symbolic architecture, which we term "neuro-causal", that uses a synthesis of deep neural models and causal graphical models to automatically infer higher-level symbolic information from lower-level "raw features", while also allowing for rich relationships among the symbolic variables. Neuro-causal models retain the flexibility of modern deep neural network architectures while simultaneously capturing statistical semantics such as identifiability and causality, which are important for discussing ideal, target representations and their tradeoffs. But most interestingly, these can further form a web of wood wide models when they borrow in part from a shared conceptual ontology, as well as causal mechanisms. We provide conditions under which this entire architecture can be recovered uniquely. We also discuss efficient algorithms and provide experiments illustrating the algorithms in practice.
(CMI - Lecture Hall 202)
16:20 - 17:00 Lalitha Sankar: Understanding Last Layer Retraining Methods for Fair Classification: Theory and Algorithms
Last-layer retraining (LLR) methods have emerged as an efficient framework for ensuring fairness and robustness in deep models. In this talk, we present an overview of existing methods and provide theoretical guarantees for several prominent methods. Under the threat of label noise, either in the class or domain annotations, we show that these naive methods fail. To address these issues, we present a new robust LLR method in the framework of two-stage corrections and demonstrate that it achieves SOTA performance under domain label noise with minimal data overhead.
(CMI - Lecture Hall 202)
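A minimal sketch of the last-layer retraining recipe the preceding talk builds on: freeze the deep feature extractor and refit only a linear head on held-out features. The features here are synthetic placeholders and the plain logistic-regression head is an assumption; the talk's two-stage corrections for label noise are not shown.

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for frozen penultimate-layer features extracted on a retraining set.
n, p = 500, 64
features = rng.standard_normal((n, p))
labels = (features[:, 0] + 0.3 * rng.standard_normal(n) > 0).astype(int)

def fit_logistic(X, y, lr=0.1, epochs=200, l2=1e-3):
    # Plain gradient descent on L2-regularized logistic loss (the new linear head).
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        p_hat = 1 / (1 + np.exp(-(X @ w + b)))
        g = p_hat - y
        w -= lr * (X.T @ g / len(y) + l2 * w)
        b -= lr * g.mean()
    return w, b

w, b = fit_logistic(features, labels)
preds = (features @ w + b > 0).astype(int)
print("retrained-head accuracy:", (preds == labels).mean())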
Wednesday, January 15
09:00 - 10:10 Ramji Venkataramanan: Estimation in Generalized Linear Models: From Spectral Methods to Approximate Message Passing (and back)
Generalized linear models (GLMs) are widely used in statistics and machine learning for regression and classification. Important special cases include linear regression, logistic regression and phase retrieval. In this talk, we discuss two classes of estimators for GLMs: spectral estimators and estimators based on Approximate Message Passing (AMP). Spectral methods provide simple yet effective estimators for many GLMs, serving as a warm start for other iterative algorithms. We will review results on the asymptotic performance of spectral estimators for i.i.d. designs, and then show i) how spectral estimators can be combined with AMP, and ii) how AMP can be used to obtain exact asymptotics for spectral estimators in the more challenging (and practical) setting of correlated designs. Time permitting, we will also discuss Bayes-optimal estimation in GLMs and how they can be achieved with efficient algorithms. The talk is based on joint works with Marco Mondelli, Yihan Zhang, Hong Chang Ji, and Pablo Pascual Cobo.
(CMI - Lecture Hall 202)
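A minimal sketch of a spectral estimator for one GLM mentioned above, noiseless phase retrieval with an i.i.d. Gaussian design: take the top eigenvector of a preprocessed weighted covariance matrix. The identity preprocessing T(y) = y used below is the simplest choice and an assumption; the talk concerns optimal preprocessing, combinations with AMP, and correlated designs.

import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 2000
x_star = rng.standard_normal(d); x_star /= np.linalg.norm(x_star)

A = rng.standard_normal((m, d))                 # i.i.d. Gaussian design
y = (A @ x_star) ** 2                            # phase retrieval observations

# Spectral estimator: top eigenvector of (1/m) * sum_i T(y_i) a_i a_i^T with T(y) = y.
D = (A * y[:, None]).T @ A / m
eigvals, eigvecs = np.linalg.eigh(D)
x_hat = eigvecs[:, -1]

print("|<x_hat, x_star>| =", abs(x_hat @ x_star))  # close to 1 up to a global sign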
10:10 - 10:50 Vincent Tan: Best Arm Identification with Minimal Regret
Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This innovative variant of the multi-armed bandit problem elegantly amalgamates two of its most ubiquitous objectives: regret minimization and BAI. More precisely, the agent's goal is to identify the best arm with a prescribed confidence level δ, while minimizing the cumulative regret up to the stopping time. Focusing on single-parameter exponential families of distributions, we leverage information-theoretic techniques to establish an instance-dependent lower bound on the expected cumulative regret. Moreover, we present an intriguing impossibility result that underscores the tension between cumulative regret and sample complexity in fixed-confidence BAI. Complementarily, we design and analyze the Double KL-UCB algorithm, which achieves asymptotic optimality as the confidence level tends to zero. Notably, this algorithm employs two distinct confidence bounds to guide arm selection in a randomized manner. Our findings elucidate a fresh perspective on the inherent connections between regret minimization and BAI. https://arxiv.org/abs/2409.18909
(CMI - Lecture Hall 202)
10:50 - 12:30 Poster session (CMI - Lecture Hall 202)
Thursday, January 16
09:00 - 10:10 Anand Sarwate: Exploring strange neu(ral network) worlds with well-worn tools
Machine learning practice "in the wild" has drifted farther and farther from tractable theory. This talk takes a cue from systems engineering and thinks of ML practices as phenomena to understand rather than design. In this view, even old tools and frameworks can shed some light on what is happening before, during, and after training. I will describe some initial forays within this reframing using random matrices, kernel machines, and even PCA. These new settings still require teaching "old dogs" some "new tricks", but there are many more questions left to explore.
(CMI - Lecture Hall 202)
10:30 - 11:10 Somabha Mukherjee: Least Squares Estimation of a Multivariate Quasiconvex Regression Function
Nonparametric least squares estimation of a multivariate function based on the economic axiom of quasiconvexity is fundamentally different from least squares estimation under the classical shape constraints of monotonicity and convexity, because unlike the latter two shape-constrained problems, the least squares constraint space for the former problem is not convex. In this talk, I will show how to construct a quasiconvex function estimate through a mixed-integer quadratic optimization technique, and discuss the consistency and finite-sample risk bounds of the proposed estimate. Towards the end, I will also illustrate the performance of this method on simulated data and two real-life datasets.
(CMI - Lecture Hall 202)
11:10 - 11:50 Andrew Thangaraj: Investigating the missing parts of distributions from observed samples
Suppose we observe a sequence of samples from a very large alphabet, and the number of samples is much smaller than the alphabet size. Several letters from the alphabet will be missing in the observed samples. What can be inferred about the distribution's mass on the missing letters? The sum of the masses on all missing letters is the classical missing mass. When the samples are i.i.d., the famous Good-Turing estimator is minimax optimal over all distributions and alphabet sizes. In this talk, we will discuss the estimation of missing mass in Markov samples, present a windowed version of the Good-Turing estimator and show that it is near-optimal in the minimax sense. Going beyond missing mass, we will present and discuss the missing g-mass, which can potentially reveal additional structure of the missing part of the distribution from observed samples. We will close with some interesting open problems in the area.
(CMI - Lecture Hall 202)
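A small sketch of the classical i.i.d. setting referenced above: the Good-Turing estimate of the missing mass is the fraction of samples that are singletons. The skewed distribution below is an arbitrary assumption, and the windowed Markov variant from the talk is not shown.

import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

alphabet_size, n_samples = 10_000, 1_000          # far fewer samples than letters
p = rng.dirichlet(np.full(alphabet_size, 0.05))   # a skewed distribution
samples = rng.choice(alphabet_size, size=n_samples, p=p)

counts = Counter(samples)
true_missing_mass = p[[a for a in range(alphabet_size) if a not in counts]].sum()

# Good-Turing estimate of the missing mass: (# letters seen exactly once) / n.
n1 = sum(1 for c in counts.values() if c == 1)
print(true_missing_mass, n1 / n_samples)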
11:50 - 12:30 Lekshmi Ramesh (TBA)
14:00 - 14:40 Gowtham Raghunath Kurri: Fractional Subadditivity of Submodular Functions: Equality Conditions and Their Applications
Submodular functions are known to satisfy various forms of fractional subadditivity. In this work, we investigate the conditions for equality to hold exactly or approximately in fractional subadditivity. We establish that a small inequality gap implies that the function is close to being modular, and that the gap is zero if and only if the function is modular. We also present the natural implications of these results for special cases of submodular functions, such as entropy, relative entropy, and matroid rank. As a consequence, we characterize the necessary and sufficient conditions for equality in Shearer's lemma, recovering a result of Ellis et al. (2016) as a special case. We leverage our results to propose a new multivariate mutual information, which generalizes Watanabe's total correlation (1960) and Han's dual total correlation (1978), and analyze its properties. Among these properties, we extend Watanabe's characterization of total correlation as the maximum correlation over partitions to fractional partitions. When applied to matrix determinantal inequalities for positive definite matrices, our results recover the equality conditions of the classical determinantal inequalities of Hadamard, Szasz, and Fischer as special cases.
(CMI - Lecture Hall 202)
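A numerical sketch of the entropy special case of Shearer's lemma mentioned above: for three variables, each coordinate lies in two of the three pairs, so half the sum of the pairwise entropies bounds the joint entropy. The random joint distribution is an arbitrary assumption; the talk characterizes exactly when such inequalities are tight.

import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

p = rng.random((2, 2, 2)); p /= p.sum()            # random joint pmf of (X1, X2, X3)

H_123 = entropy(p.ravel())
H_12 = entropy(p.sum(axis=2).ravel())
H_13 = entropy(p.sum(axis=1).ravel())
H_23 = entropy(p.sum(axis=0).ravel())

# Shearer with the fractional cover {12, 13, 23}, each pair with weight 1/2.
lhs, rhs = H_123, 0.5 * (H_12 + H_13 + H_23)
print(lhs, "<=", rhs, lhs <= rhs + 1e-12)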
14:40 - 15:20 Lalitha Vadlamani: Codes for Distributed Gradient Descent
In a distributed gradient descent problem, a gradient computation job is divided into multiple parallel tasks, which are computed on different servers, and the job is finished when all the tasks are complete. In this framework, a subset of straggling servers forms a bottleneck to the efficient execution of the gradient descent. Gradient coding ensures efficient distributed gradient computation even in the presence of stragglers by utilizing coding-theoretic techniques. We will introduce two variants of gradient coding and present results in these settings: (i) delayed start of tasks on a subset of servers is allowed, and (ii) a form of approximate gradient coding where only the sum of a fraction of the gradients needs to be recovered.
(CMI - Lecture Hall 202)
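A toy sketch of one classical gradient coding scheme (fractional repetition): data partitions are replicated across groups of s+1 workers, each worker returns the sum of its group's partial gradients, and the master recovers the full gradient even if any s workers straggle. The parameters and this particular scheme are illustrative; the two variants from the talk (delayed start, approximate recovery) are not shown.

import numpy as np

rng = np.random.default_rng(0)

n_workers, s = 4, 1                                    # tolerate any s stragglers
group_size = s + 1
partial_grads = rng.standard_normal((n_workers, 3))    # gradient of each data partition (toy)
full_grad = partial_grads.sum(axis=0)

# Workers in the same group all compute the same sum of that group's partitions.
groups = [list(range(g, g + group_size)) for g in range(0, n_workers, group_size)]
worker_msgs = {w: partial_grads[grp].sum(axis=0) for grp in groups for w in grp}

stragglers = {3}                                       # any s workers may fail to respond
received = {w: m for w, m in worker_msgs.items() if w not in stragglers}
recovered = sum(next(m for w, m in received.items() if w in grp) for grp in groups)

print(np.allclose(recovered, full_grad))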
15:40 - 16:20 Amir R. Asadi: Hierarchical Learning: An Entropy-Based Approach (online)
Machine learning, the predominant approach in the field of artificial intelligence, enables computers to learn from data and experience. In the supervised learning framework, accurate and efficient learning of dependencies between data instances and their corresponding labels requires auxiliary information about the data distribution and the target function. This central concept aligns with the notion of regularization in statistical learning theory. Real-world datasets are often characterized by multiscale data instance distributions and well-behaved, smooth target functions. Scale-invariant probability measures, such as power-law distributions, provide notable examples of multiscale data instance distributions in various contexts. In this talk, we introduce a hierarchical learning model that leverages such a multiscale data structure with a multiscale entropy-based training procedure. The statistical and computational advantages of the model, inspired by the logical progression in human learning from easy to complex tasks, are discussed. The multiscale analysis of the statistical risk yields stronger guarantees compared to conventional uniform convergence bounds.
(CMI - Lecture Hall 202)
16:20 - 17:00 Shahab Asoodeh: Locally Private Samplers: Minimax Optimality for General f-Divergences (online)
The problem of sampling under local differential privacy has recently gained attention due to its potential applications in generative models. However, a thorough understanding of the privacy-utility trade-off in this context is still lacking. In this talk, I’ll discuss the minimax optimality of locally private sampling using f-divergences. I’ll demonstrate that this setup corresponds to the limiting case of distribution estimation under “user-level” local differential privacy, where each user has access to a large amount of data. As the main result of the talk, I’ll present families of optimal sampling mechanisms for both discrete and continuous domains. Remarkably, these samplers are universally optimal across all f-divergences, distinguishing sampling from typical learning problems. If time permits, I’ll also discuss how non-private data can be integrated into private sampling problems. I’ll conclude the talk by highlighting several open questions in the area of private sampling.
(CMI - Lecture Hall 202)
Friday, January 17
09:00 - 10:10 Rajesh Sundaresan: Statistical principles in the design of serosurveys: From practice to theory to practice
SARS-CoV-2 infections were near their first peak in Karnataka during September 2020, when a statewide COVID-19 serosurvey was conducted. For accurate total disease-burden estimation during such periods, both the active infection rate and the seroprevalence of antibodies to the virus must be estimated. This requires the use of multiple tests, e.g., antigen and RT-PCR tests for active infection estimation, and serology for antibody prevalence estimation. We will discuss the challenges in combining data from multiple tests, the science of optimal design, what ought to have been the design, and how this optimal design was used in the second survey in January-February 2021. The talk will be based on joint work with collaborators from the Indian Institute of Public Health, Indian Statistical Institute, Strand Life Sciences, and the Indian Institute of Science.
(CMI - Lecture Hall 202)
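One standard ingredient of the estimation problem described above is correcting a raw test-positive rate for imperfect sensitivity and specificity (the Rogan-Gladen adjustment), sketched below with made-up numbers. The actual survey combined several tests and optimized the design, which this single-test sketch does not capture.

def adjusted_prevalence(raw_positive_rate, sensitivity, specificity):
    # Rogan-Gladen correction for a single imperfect test.
    return (raw_positive_rate + specificity - 1) / (sensitivity + specificity - 1)

# Toy numbers: 12% of serology samples test positive on a test with
# sensitivity 0.85 and specificity 0.98.
print(adjusted_prevalence(0.12, 0.85, 0.98))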
10:30 - 11:10 Debasis Kundu: Some Non-Linear Model: Robust Estimation
In this presentation we consider several non-linear models which have significant applications in statistical signal processing, mathematical finance, optics, biology, etc. For these models, extensive work has been done in developing several efficient procedures and establishing their properties. The least squares estimators (LSEs) are known to be the most efficient estimators in the presence of additive noise. But it is observed that the LSEs are quite sensitive in the presence of outliers. We propose to use the weighted least squares estimators (WLSEs) of the unknown parameters in the presence of additive white noise. It is observed that in the presence of outliers, the WLSEs are more robust than the LSEs and behave very similarly to some of the other well-known robust estimators, for example the least absolute deviation estimators (LADEs) or Huber M-estimators. It is observed that developing the properties of the LADEs or Huber M-estimators is not immediate. We derive the consistency and asymptotic normality of the proposed WLSEs. Extensive simulations have been performed to show the effectiveness of the proposed method. Synthetic data sets have been analyzed to illustrate how the proposed method can be used in practice.
(CMI - Lecture Hall 202)
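A generic illustration of the outlier sensitivity discussed above: fitting the amplitudes of a sinusoidal model (frequency assumed known) by ordinary least squares versus Huber-type iteratively reweighted least squares. This is only a stand-in for the robustness phenomenon; it is not the specific WLSE proposed in the talk, and all constants are assumptions.

import numpy as np

rng = np.random.default_rng(0)

t = np.arange(100)
omega = 0.3
X = np.column_stack([np.cos(omega * t), np.sin(omega * t)])   # known frequency, linear in (A, B)
y = X @ np.array([2.0, -1.0]) + 0.3 * rng.standard_normal(len(t))
y[::10] += 15.0                                               # heavy outliers

theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]               # ordinary least squares

def huber_irls(X, y, c=1.345, iters=50):
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ theta
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12
        w = np.minimum(1.0, c / (np.abs(r) / scale + 1e-12))   # Huber weights
        sw = np.sqrt(w)
        theta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return theta

print("LSE:", theta_ls, "Huber-IRLS:", huber_irls(X, y))       # true (A, B) = (2, -1)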
11:10 - 11:50 Chandra Murthy: A Probably Approximately Correct Analysis of Group Testing Algorithms
The goal of group testing, also called pool testing, is to successfully identify a set of k defectives from a population of n items using only m (< n) group tests. In each group test, a subset of the n items is tested together as dictated by a pooling protocol. The outcome of a group test is negative if and only if none of the defective items participate in that group test; it is positive otherwise. When k << n, group testing can help significantly reduce the number of tests needed. We will present sufficiency bounds on the number of tests for well-known Boolean non-adaptive group testing algorithms with random pooling. We view the group testing problem through the lens of a function learning problem and formulate it in a probably approximately correct (PAC) analysis framework. This enables us to characterize our sufficiency bounds by a confidence parameter and an approximation error tolerance parameter. Our resulting sufficiency bounds provide a finer perspective of the random-pooling-based group testing algorithms by separately accounting for the randomness in the pooling protocol and the defective set identification errors (approximation error tolerance). We show the equivalence between the PAC learning and group testing problems and that our bounds reduce to the existing bounds in the literature, when the approximation error tolerance is set to zero. In the process, we derive the expected stopping time and the tail probability for a variant of the so-called coupon collector problem, where one is interested in collecting only a subset of the coupons, a result that may be of independent interest.
(CMI - Lecture Hall 202)
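A minimal sketch of one well-known Boolean non-adaptive scheme referenced above: random Bernoulli pooling decoded with COMP, where an item is cleared if it appears in any negative test. The PAC framing and the sufficiency bounds from the talk are not reproduced, and the parameters are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 5, 100                           # items, defectives, tests

defectives = set(rng.choice(n, size=k, replace=False))
pools = rng.random((m, n)) < 1.0 / k            # Bernoulli random pooling design
outcomes = np.array([bool(defectives & set(np.flatnonzero(pools[i]))) for i in range(m)])

# COMP decoding: declare an item non-defective if it appears in any negative test.
declared = set(range(n))
for i in np.flatnonzero(~outcomes):
    declared -= set(np.flatnonzero(pools[i]))

print("true:", sorted(defectives), "declared:", sorted(declared))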
11:50 - 12:30 Sourish Das: Jacobi Prior: An Alternative Bayesian Method for Supervised Learning
The 'Jacobi prior' is an alternative Bayesian method for predictive models. It performs better than well-known methods such as Lasso, Ridge, Elastic Net, and MCMC-based Horse-Shoe Prior, particularly in terms of prediction accuracy and run-time. This method is implemented for Gaussian process classification, adeptly handling a nonlinear decision boundary. The Jacobi prior demonstrates its capability to manage partitioned data across global servers, making it highly useful in distributed computing environments. Additionally, we show that the Jacobi prior is more than a hundred times faster than these methods while maintaining similar predictive accuracy. As the method is both fast and accurate, it is advantageous for organisations looking to reduce their environmental impact and meet ESG standards. To demonstrate the effectiveness of the Jacobi prior, we conducted a detailed simulation study with four experiments focusing on statistical consistency, accuracy, and speed. We also present two empirical studies: the first evaluates credit risk by analysing default probability using data from the U.S. Small Business Administration (SBA), and the second uses the Jacobi prior for classifying stars, quasars, and galaxies in a three-class problem using multinomial logit regression on data from the Sloan Digital Sky Survey. Different filters were used as features in this study. All codes and datasets for this paper are available in the following GitHub repository [https://github.com/sourish-cmi/Jacobi-Prior/?tab=readme-ov-file]
(CMI - Lecture Hall 202)