Jonathan Williams

North Carolina State University

Website

jonathanpw.github.io

Bio

Assistant Professor of Statistics

Articles

The inferential model (IM) framework offers alternatives to the familiar probabilistic (e.g., Bayesian and fiducial) uncertainty quantification in statistical inference. Allowing uncertainty quantification to be imprecise makes exact validity/reliability possible. But are imprecision and exact validity compatible with the attainment of statistical efficiency? This paper gives an affirmative answer to this question via a new possibilistic Bernstein–von Mises theorem that parallels a fundamental result in Bayesian inference. Among other things, our result demonstrates that the IM solution is asymptotically efficient in the sense that, asymptotically, its credal set is the smallest that contains the Gaussian distribution with variance equal to the Cramér–Rao lower bound.
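For context, the classical Bayesian Bernstein–von Mises theorem that the possibilistic result parallels can be stated informally as follows (a textbook-style sketch, not the paper's statement):

```latex
% Classical Bernstein--von Mises theorem (informal): under regularity
% conditions, the posterior distribution merges with a Gaussian,
\[
  \bigl\| \Pi_n(\cdot \mid X_{1:n})
    - \mathrm{N}\!\bigl(\hat{\theta}_n,\; n^{-1} I(\theta_0)^{-1}\bigr)
  \bigr\|_{\mathrm{TV}} \;\longrightarrow\; 0
  \quad \text{in probability,}
\]
% where \hat{\theta}_n is an efficient estimator and
% I(\theta_0)^{-1} is the Cram\'er--Rao lower bound.
```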

A common goal in statistics and machine learning is estimation of unknowns. Point estimates alone are of little value without an accompanying measure of uncertainty, but traditional uncertainty quantification methods, such as confidence sets and p-values, often require strong distributional or structural assumptions that may not be justified in modern problems. The present paper considers a very common case in machine learning, where the quantity of interest is the minimizer of a given risk (expected loss) function. For such cases, we propose a generalized universal procedure for inference on risk minimizers that features a finite-sample, frequentist validity property under mild distributional assumptions. One version of the proposed procedure is shown to be anytime-valid in the sense that it maintains validity properties regardless of the stopping rule used for the data collection process. We show how this anytime-validity property offers protection against certain factors contributing to the replication crisis in science.

Motivated by the need for the development of safe and reliable methods for uncertainty quantification in machine learning, I propose and develop ideas for a model-free statistical framework for imprecise probabilistic prediction inference. This framework facilitates uncertainty quantification in the form of prediction sets that offer finite-sample control of type 1 errors, a property shared with conformal prediction sets, but this new approach also offers more versatile tools for imprecise probabilistic reasoning. Furthermore, I propose and consider the theoretical and empirical properties of a precise probabilistic approximation to the model-free imprecise framework. Approximating a belief/plausibility measure pair by a single (in some sense optimal) probability measure in the credal set is a critical step needed for the broader adoption of imprecise probabilistic approaches to inference in the statistical and machine learning communities. More generally, it remains largely unsettled in the statistical and machine learning literatures how to properly quantify uncertainty, in that there is no generally accepted standard of accountability for stated uncertainties. The research I present in this manuscript is aimed at motivating a framework for statistical inference with reliability and accountability as the guiding principles.

The parameters of a machine learning model are typically learned by minimizing a loss function on a set of training data. However, this comes with the risk of overfitting; for the model to generalize well, it is of great importance that we are able to find the optimal parameter for the model on the entire population, not only on the given training sample. In this paper, we construct valid confidence sets for this optimal parameter of a machine learning model, which can be generated using only the training data and without any knowledge of the population. We then show that studying the distribution of this confidence set allows us to assign a notion of confidence to arbitrary regions of the parameter space, and we demonstrate that this distribution can be well approximated using bootstrapping techniques.
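A minimal sketch of the bootstrap idea, under the simplifying assumption of squared-error loss, in which case the population risk minimizer is the population mean; all names and numbers below are illustrative, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training sample; under squared-error loss the population risk
# minimizer is the population mean, estimated by the sample mean.
x = rng.normal(loc=2.0, scale=1.0, size=200)

# Bootstrap the sampling distribution of the estimated risk minimizer.
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean()
    for _ in range(2000)
])

# A 95% bootstrap percentile confidence set for the optimal parameter.
lo, hi = np.quantile(boot_means, [0.025, 0.975])
print(f"95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```

The same recipe applies to any estimator obtained by minimizing an empirical risk, with the sample mean replaced by the minimizer computed on each resample.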

Word embeddings are a fundamental tool in natural language processing. Currently, word embedding methods are evaluated on the basis of empirical performance on benchmark data sets, and there is a lack of rigorous understanding of their theoretical properties. This paper studies word embeddings from a statistical theoretical perspective, which is essential for formal inference and uncertainty quantification. We propose a copula-based statistical model for text data and show that under this model, the now-classical Word2Vec method can be interpreted as a statistical estimation method for estimating the theoretical pointwise mutual information (PMI). Next, by building on the work of Levy & Goldberg (2014), we develop a missing value-based estimator as a statistically tractable and interpretable alternative to the Word2Vec approach. The estimation error of this estimator is comparable to Word2Vec and improves upon the truncation-based method proposed by Levy & Goldberg (2014). The proposed estimator also performs comparably to Word2Vec in a benchmark sentiment analysis task on the IMDb Movie Reviews data set.
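The pointwise mutual information that the abstract refers to has a simple empirical form; a minimal sketch on a toy word-context co-occurrence table (the counts are illustrative, not from the paper):

```python
import numpy as np

# Toy word-context co-occurrence counts (rows: words, cols: contexts).
counts = np.array([[8.0, 2.0],
                   [2.0, 8.0]])

total = counts.sum()
p_wc = counts / total                  # joint probabilities p(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)  # word marginals p(w)
p_c = p_wc.sum(axis=0, keepdims=True)  # context marginals p(c)

# Pointwise mutual information: PMI(w, c) = log[ p(w, c) / (p(w) p(c)) ].
pmi = np.log(p_wc / (p_w * p_c))
print(pmi)
```

Positive entries indicate word-context pairs that co-occur more often than independence would predict; in practice the counts contain zeros, which is where the truncation- and missing-value-based estimators discussed above come in.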

Transfer learning uses a data model, trained to make predictions or inferences on data from one population, to make reliable predictions or inferences on data from another population. Most existing transfer learning approaches are based on fine-tuning pre-trained neural network models, and fail to provide crucial uncertainty quantification. We develop a statistical framework for model predictions based on transfer learning, called RECaST. The primary mechanism is a Cauchy random effect that recalibrates a source model to a target population; we mathematically and empirically demonstrate the validity of our RECaST approach for transfer learning between linear models, in the sense that prediction sets will achieve their nominal stated coverage, and we numerically illustrate the method’s robustness to asymptotic approximations for nonlinear models. Whereas many existing techniques are built on particular source models, RECaST is agnostic to the choice of source model. For example, our RECaST transfer learning approach can be applied to a continuous or discrete data model with linear or logistic regression, deep neural network architectures, etc. Furthermore, RECaST provides uncertainty quantification for predictions, which is mostly absent in the literature. We examine our method’s performance in a simulation study and in an application to real hospital data.

Multistate Markov models are a canonical parametric approach for modeling observed or latent stochastic processes supported on a finite state space. Continuous-time Markov processes describe data that are observed irregularly over time, as is often the case in longitudinal medical and biological data sets, for example. Assuming that a continuous-time Markov process is time-homogeneous, a closed-form likelihood function can be derived from the Kolmogorov forward equations, a system of differential equations with a well-known matrix-exponential solution. Unfortunately, however, the forward equations do not admit an analytical solution for continuous-time, time-inhomogeneous Markov processes, and so researchers and practitioners often make the simplifying assumption that the process is piecewise time-homogeneous. In this paper, we provide intuitions and illustrations of the potential biases in parameter estimation that may ensue in the more realistic scenario that the piecewise-homogeneous assumption is violated, and we advocate for a solution that computes the likelihood in a truly time-inhomogeneous fashion. Particular focus is afforded to multistate Markov models that allow for state label misclassifications, which applies more broadly to hidden Markov models (HMMs), and to Bayesian computations that bypass the need for computationally demanding numerical gradient approximations when obtaining maximum likelihood estimates (MLEs).
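The matrix-exponential solution mentioned above can be computed directly; a minimal sketch for a hypothetical two-state generator matrix, using SciPy's `expm`:

```python
import numpy as np
from scipy.linalg import expm

# Generator (rate) matrix Q of a time-homogeneous two-state CTMC:
# rows sum to zero, off-diagonal entries are transition rates.
Q = np.array([[-0.5, 0.5],
              [0.3, -0.3]])

# The Kolmogorov forward equations dP/dt = P(t) Q, P(0) = I, have the
# closed-form solution P(t) = expm(Q t).
t = 2.0
P = expm(Q * t)
print(P)
```

Each row of `P` is a proper probability distribution over the states after elapsed time `t`; for a time-inhomogeneous process, `Q` varies with time and no such single matrix exponential is available, which is the difficulty the paper addresses.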

Historically, a lack of cross-disciplinary communication has led to the development of statistical methods for detecting exoplanets by astronomers, independent of the contemporary statistical literature. The aim of our paper is to investigate the properties of such methods. Many of these methods (both transit- and radial velocity-based) have not been discussed by statisticians despite their use in thousands of astronomical papers. Transit methods aim to detect a planet by determining whether observations of a star contain a periodic component. These methods tend to be overly rudimentary for starlight data and lack robustness to model misspecification. Conversely, radial velocity methods aim to detect planets by estimating the Doppler shift induced by an orbiting companion on the spectrum of a star. Many such methods are unable to detect Doppler shifts on the order of magnitude consistent with Earth-sized planets around Sun-like stars. Modern radial velocity approaches attempt to address this deficiency by adapting tools from contemporary statistical research in functional data analysis, but more work is needed to develop the statistical theory supporting the use of these models, to expand these models for multiplanet systems, and to develop methods for detecting ever smaller Doppler shifts in the presence of stellar activity.

A key task in the emerging field of materials informatics is to use machine learning to predict a material's properties and functions. A fast and accurate predictive model allows researchers to more efficiently identify or construct a material with desirable properties. As in many fields, deep learning is one of the state-of-the-art approaches, but fully training a deep learning model is not always feasible in materials informatics due to limitations on data availability, computational resources, and time. Accordingly, there is a critical need in the application of deep learning to materials informatics problems to develop efficient transfer learning algorithms. The Bayesian framework is natural for transfer learning because the model trained from the source data can be encoded in the prior distribution for the target task of interest. However, the Bayesian perspective on transfer learning is relatively unexplored in the literature, and is complicated for deep learning because the parameter space is large and the interpretations of individual parameters are unclear. Therefore, rather than subjective prior distributions for individual parameters, we propose a new Bayesian transfer learning approach based on the penalized complexity prior on the Kullback–Leibler divergence between the predictive models of the source and target tasks. We show via simulations that the proposed method outperforms other transfer learning methods across a variety of settings. The new method is then applied to a predictive materials science problem, where we show improved precision for estimating the band gap of a material based on its structural properties.
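A hedged sketch of the penalized complexity (PC) prior construction referenced above, in the generic form introduced by Simpson et al. (2017); the paper's exact specification on the source-target KL divergence may differ:

```latex
% PC prior sketch: measure the target model's flexibility by its
% distance from the source (base) model,
\[
  d(\xi) \;=\; \sqrt{\,2\,\mathrm{KL}\!\left(
    f_{\mathrm{target}}(\cdot \mid \xi) \,\middle\|\, f_{\mathrm{source}}
  \right)},
\]
% and place an exponential prior on that distance scale,
\[
  \pi\bigl(d(\xi)\bigr) \;=\; \lambda\, e^{-\lambda\, d(\xi)},
\]
% so that the target model shrinks toward the source model, with the
% rate \lambda controlling the strength of the penalty on added
% complexity.
```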

  • Neil Dey
  • Jing Ding
  • Jack Ferrell
  • Carolina Kapper
  • Maxwell Lovig
  • Emiliano Planchon
  • Jonathan Williams

Submitted on 2021-11-04

Modern machine learning algorithms are capable of providing remarkably accurate point predictions; however, questions remain about their statistical reliability. Unlike conventional machine learning methods, conformal prediction algorithms return confidence sets (i.e., set-valued predictions) that correspond to a given significance level. Moreover, these confidence sets are valid in the sense that they guarantee finite-sample control over type 1 error probabilities, allowing the practitioner to choose an acceptable error rate. In our paper, we propose inductive conformal prediction (ICP) algorithms for the tasks of text infilling and part-of-speech (POS) prediction for natural language data. We construct new conformal prediction-enhanced bidirectional encoder representations from transformers (BERT) and bidirectional long short-term memory (BiLSTM) algorithms for POS tagging, and a new conformal prediction-enhanced BERT algorithm for text infilling. We analyze the performance of the algorithms in simulations using the Brown Corpus, which contains over 57,000 sentences. Our results demonstrate that the ICP algorithms are able to produce valid set-valued predictions that are small enough to be useful in real-world applications. We also provide a real data example showing how our proposed set-valued predictions can improve machine-generated audio transcriptions.
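A minimal sketch of split (inductive) conformal prediction for a generic multiclass problem; the simulated scores and the nonconformity measure below are illustrative assumptions, not the paper's BERT/BiLSTM construction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical class-probability outputs from any fitted classifier on
# a held-out calibration set (rows: examples, cols: class probabilities).
n_cal, n_classes = 500, 5
probs = rng.dirichlet(np.ones(n_classes), size=n_cal)
labels = np.array([rng.choice(n_classes, p=p) for p in probs])

# Nonconformity score: one minus the probability of the true label.
cal_scores = 1.0 - probs[np.arange(n_cal), labels]

# Conformal quantile at significance level alpha (finite-sample corrected).
alpha = 0.1
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
qhat = np.sort(cal_scores)[k - 1]

# Prediction set for a new example: every label whose score passes qhat.
new_probs = rng.dirichlet(np.ones(n_classes))
pred_set = [c for c in range(n_classes) if 1.0 - new_probs[c] <= qhat]
print(pred_set)
```

By construction, the returned set contains the true label with probability at least 1 - alpha over the randomness in the calibration and test data, regardless of the underlying classifier.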

  • Jonathan Williams
  • Gudmund Hermansen
  • Håvard Nygård
  • Govinda Clayton
  • Siri Rustad
  • Håvard Strand

Submitted on 2021-10-08

A crucial challenge for solving problems in conflict research is in leveraging the semi-supervised nature of the data that arise. Observed response data, such as counts of battle deaths over time, indicate latent processes of interest, such as the intensity and duration of conflicts, but defining and labeling instances of these unobserved processes requires nuance and is inherently imprecise. The availability of such labels, however, would make it possible to study the effect of intervention-related predictors, such as ceasefires, directly on conflict dynamics (e.g., latent intensity) rather than through an intermediate proxy like observed counts of battle deaths. Motivated by this problem and the new availability of the ETH-PRIO Civil Conflict Ceasefires data set, we propose a Bayesian autoregressive (AR) hidden Markov model (HMM) framework as a sufficiently flexible machine learning approach for semi-supervised regime labeling with uncertainty quantification. We motivate our approach by illustrating the way it can be used to study the role that ceasefires play in shaping conflict dynamics. This ceasefires data set is the first systematic and globally comprehensive data on ceasefires, and our work is the first to analyze these new data and to explore the effect of ceasefires on conflict dynamics in a comprehensive and cross-country manner.

In this paper, we extend the epsilon admissible subsets (EAS) model selection approach from its original construction in the high-dimensional linear regression setting to an EAS framework for performing group variable selection in the high-dimensional multivariate regression setting. Assuming a matrix-Normal linear model, we show that the EAS strategy is asymptotically consistent if there exists a sparse, true data-generating set of predictors. Nonetheless, our EAS strategy is designed to estimate a posterior-like, generalized fiducial distribution over a parsimonious class of models in the setting of correlated predictors and/or in the absence of a sparsity assumption. To this end, the effectiveness of our approach is demonstrated empirically in simulation studies and compared to other state-of-the-art model/variable selection procedures.

An exciting new algorithmic breakthrough has been advanced for how to carry out inferences in a Dempster-Shafer (DS) formulation of a categorical data generating model. The developed sampling mechanism, which draws on theory for directed graphs, is a clever and remarkable achievement, as this has been an open problem for many decades. In this discussion, I comment on important contributions, central questions, and prevailing matters of the article.

One formulation of forensic identification of source problems is to determine the source of trace evidence, for instance, glass fragments found on a suspect for a crime. The current state of the science is to compute a Bayes factor (BF) comparing the marginal distribution of measurements of trace evidence under two competing propositions for whether or not the unknown-source evidence originated from a specific source. The obvious problem with such an approach is the ability to tailor the prior distributions (placed on the features/parameters of the statistical model for the measurements of trace evidence) in favor of the defense or prosecution, which is further complicated by the fact that the number of measurements of trace evidence is typically small enough that the prior choice/specification has a strong influence on the value of the BF. To remedy this problem of prior specification and choice, we develop an alternative to the BF, within the framework of generalized fiducial inference (GFI), that we term a generalized fiducial factor (GFF). Furthermore, we demonstrate empirically, on synthetic and real Netherlands Forensic Institute (NFI) casework data, deficiencies in the BF and classical/frequentist likelihood ratio (LR) approaches.
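The Bayes factor described above takes the generic form (standard textbook notation, not the paper's):

```latex
% Bayes factor: ratio of marginal likelihoods of the evidence
% measurements x under the competing source propositions H_1 (specific
% source) and H_0 (alternative source),
\[
  \mathrm{BF}(x) \;=\;
  \frac{\int f(x \mid \theta)\, \pi_1(\theta)\, d\theta}
       {\int f(x \mid \theta)\, \pi_0(\theta)\, d\theta},
\]
% where the sensitivity of the value to the prior choices \pi_1 and
% \pi_0 is the vulnerability the GFF is designed to avoid.
```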


© 2018–2025 Researchers.One