Professor of Statistics at North Carolina State University and co-founder of ResearchersOne.

Submitted on 2020-08-14

C. Cunen, N. Hjort, and T. Schweder published a comment on our paper, Satellite conjunction analysis and the false confidence theorem. Here is our response to their comment.

Submitted on 2020-08-14

The inferential model (IM) framework produces data-dependent, non-additive degrees of belief about the unknown parameter that are provably valid. The validity property guarantees, among other things, that inference procedures derived from the IM control frequentist error rates at the nominal level. A technical complication is that IMs are built on a relatively unfamiliar theory of random sets. Here we develop an alternative---and practically equivalent---formulation, based on a theory of possibility measures, which is simpler in many respects. This new perspective also sheds light on the relationship between IMs and Fisher's fiducial inference, as well as on the construction of optimal IMs.

Predicting the response at an unobserved location is a fundamental problem in spatial statistics. Given the difficulty in modeling spatial dependence, especially in non-stationary cases, model-based prediction intervals are at risk of misspecification bias that can negatively affect their validity. Here we present a new approach for model-free spatial prediction based on the *conformal prediction* machinery. Our key observation is that spatial data can be treated as exactly or approximately exchangeable in a wide range of settings. For example, when the spatial locations are deterministic, we prove that the response values are, in a certain sense, locally approximately exchangeable for a broad class of spatial processes, and we develop a local spatial conformal prediction algorithm that yields valid prediction intervals without model assumptions. Numerical examples with both real and simulated data confirm that the proposed conformal prediction intervals are valid and generally more efficient than existing model-based procedures across a range of non-stationary and non-Gaussian settings.

Submitted on 2020-01-23

A fundamental problem in statistics and machine learning is that of using observed data to predict future observations. This is particularly challenging for model-based approaches because often the goal is to carry out this prediction with no or minimal model assumptions. For example, the inferential model (IM) approach is attractive because it has certain validity guarantees, but requires specification of a parametric model. Here we show that a new perspective on a recently developed generalized IM approach can be applied to construct an IM for prediction that satisfies the desirable validity guarantees without specification of a model. One important special case of this approach corresponds to the powerful conformal prediction framework and, consequently, the desirable properties of conformal prediction follow immediately from the general IM validity theory. Several numerical examples are presented to illustrate the theory and highlight the method's performance and flexibility.

Submitted on 2019-11-29

Inferential challenges that arise when data are censored have been extensively studied under the classical frameworks. In this paper, we provide an alternative generalized inferential model approach whose output is a data-dependent plausibility function. This construction is driven by an association between the distribution of the relative likelihood function at the interest parameter and an unobserved auxiliary variable. The plausibility function emerges from the distribution of a suitably calibrated random set designed to predict that unobserved auxiliary variable. The evaluation of this plausibility function requires a novel use of the classical Kaplan--Meier estimator to estimate the censoring rather than the event distribution. We prove that the proposed method provides valid inference, at least approximately, and our real- and simulated-data examples demonstrate its superior performance compared to existing methods.

Submitted on 2019-10-16

Bias resulting from model misspecification is a concern when predicting insurance claims. Indeed, this bias puts the insurer at risk of making invalid or unreliable predictions. A method that could provide provably valid predictions uniformly across a large class of possible distributions would effectively eliminate the risk of model misspecification bias. Conformal prediction is one such method that can meet this need, and here we tailor that approach to the typical insurance application and show that the predictions are not only valid but also efficient across a wide range of settings.

Submitted on 2019-09-30

Whether the predictions put forth prior to the 2016 U.S. presidential election were right or wrong is a question that led to much debate. But rather than focusing on right or wrong, we analyze the 2016 predictions with respect to a core set of {\em effectiveness principles}, and conclude that they were ineffective in conveying the uncertainty behind their assessments. Along the way, we extract key insights that will help to avoid, in future elections, the systematic errors that lead to overly precise and overconfident predictions in 2016. Specifically, we highlight shortcomings of the classical interpretations of probability and its communication in the form of predictions, and present an alternative approach with two important features. First, our recommended predictions are safer in that they come with certain guarantees on the probability of an erroneous prediction; second, our approach easily and naturally reflects the (possibly substantial) uncertainty about the model by outputting *plausibilities* instead of *probabilities*.

Submitted on 2019-09-30

Meta-analysis based on only a few studies remains a challenging problem, as an accurate estimate of the between-study variance is apparently needed, but hard to attain, within this setting. Here we offer a new approach, based on the *generalized inferential model* framework, whose success lays in marginalizing out the between-study variance, so that an accurate estimate is not essential. We show theoretically that the proposed solution is at least approximately valid, with numerical results suggesting it is, in fact, nearly exact. We also demonstrate that the proposed solution outperforms existing methods across a wide range of scenarios.

Submitted on 2019-07-19

In the context of predicting future claims, a fully Bayesian analysis---one that specifies a statistical model, prior distribution, and updates using Bayes's formula---is often viewed as the gold-standard, while Buhlmann's credibility estimator serves as a simple approximation. But those desirable properties that give the Bayesian solution its elevated status depend critically on the posited model being correctly specified. Here we investigate the asymptotic behavior of Bayesian posterior distributions under a misspecified model, and our conclusion is that misspecification bias generally has damaging effects that can lead to inaccurate inference and prediction. The credibility estimator, on the other hand, is not sensitive at all to model misspecification, giving it an advantage over the Bayesian solution in those practically relevant cases where the model is uncertain. This begs the question: does robustness to model misspecification require that we abandon uncertainty quantification based on a posterior distribution? Our answer to this question is *No*, and we offer an alternative *Gibbs posterior* construction. Furthermore, we argue that this Gibbs perspective provides a new characterization of Buhlmann's credibility estimator.

An inferential model encodes the data analyst's degrees of belief about an unknown quantity of interest based on the observed data, posited statistical model, etc. Inferences drawn based on these degrees of belief should be reliable in a certain sense, so we require the inferential model to be *valid*. The construction of valid inferential models based on individual pieces of data is relatively straightforward, but how to combine these so that the validity property is preserved? In this paper we analyze some common combination rules with respect to this question, and we conclude that the best strategy currently available is one that combines via a certain dimension reduction step before the inferential model construction.

Submitted on 2019-03-03

In this paper we adopt the familiar sparse, high-dimensional linear regression model and focus on the important but often overlooked task of prediction. In particular, we consider a new empirical Bayes framework that incorporates data in the prior in two ways: one is to center the prior for the non-zero regression coefficients and the other is to provide some additional regularization. We show that, in certain settings, the asymptotic concentration of the proposed empirical Bayes posterior predictive distribution is very fast, and we establish a Bernstein--von Mises theorem which ensures that the derived empirical Bayes prediction intervals achieve the targeted frequentist coverage probability. The empirical prior has a convenient conjugate form, so posterior computations are relatively simple and fast. Finally, our numerical results demonstrate the proposed method's strong finite-sample performance in terms of prediction accuracy, uncertainty quantification, and computation time compared to existing Bayesian methods.

Submitted on 2019-02-03

Statistics has made tremendous advances since the times of Fisher, Neyman, Jeffreys, and others, but the fundamental and practically relevant questions about probability and inference that puzzled our founding fathers remain unanswered. To bridge this gap, I propose to look beyond the two dominating schools of thought and ask the following three questions: what do scientists need out of statistics, do the existing frameworks meet these needs, and, if not, how to fill the void? To the first question, I contend that scientists seek to convert their data, posited statistical model, etc., into calibrated degrees of belief about quantities of interest. To the second question, I argue that any framework that returns additive beliefs, i.e., probabilities, necessarily suffers from *false confidence*---certain false hypotheses tend to be assigned high probability---and, therefore, risks systematic bias. This reveals the fundamental importance of *non-additive beliefs* in the context of statistical inference. But non-additivity alone is not enough so, to the third question, I offer a sufficient condition, called *validity*, for avoiding false confidence, and present a framework, based on random sets and belief functions, that provably meets this condition. Finally, I discuss characterizations of p-values and confidence intervals in terms of valid non-additive beliefs, which imply that users of these classical procedures are already following the proposed framework without knowing it.

Submitted on 2018-12-05

Nonparametric estimation of a mixing density based on observations from the corresponding mixture is a challenging statistical problem. This paper surveys the literature on a fast, recursive estimator based on the *predictive recursion* algorithm. After introducing the algorithm and giving a few examples, I summarize the available asymptotic convergence theory, describe an important semiparametric extension, and highlight two interesting applications. I conclude with a discussion of several recent developments in this area and some open problems.

Submitted on 2018-12-05

Bayesian methods provide a natural means for uncertainty quantification, that is, credible sets can be easily obtained from the posterior distribution. But is this uncertainty quantification valid in the sense that the posterior credible sets attain the nominal frequentist coverage probability? This paper investigates the frequentist validity of posterior uncertainty quantification based on a class of empirical priors in the sparse normal mean model. In particular, we show that our marginal posterior credible intervals achieve the nominal frequentist coverage probability under conditions slightly weaker than needed for selection consistency and a Bernstein--von Mises theorem for the full posterior, and numerical investigations suggest that our empirical Bayes method has superior frequentist coverage probability properties compared to other fully Bayes methods.

This article describes how the filtering role played by peer review may actually be harmful rather than helpful to the quality of the scientific literature. We argue that, instead of trying to filter out the low-quality research, as is done by traditional journals, a better strategy is to let everything through but with an acknowledgment of the uncertain quality of what is published, as is done on the RESEARCHERS.ONE platform. We refer to this as "scholarly mithridatism." When researchers approach what they read with doubt rather than blind trust, they are more likely to identify errors, which protects the scientific community from the dangerous effects of error propagation, making the literature stronger rather than more fragile.

Submitted on 2018-09-04

In a Bayesian context, prior specification for inference on monotone densities is conceptually straightforward, but proving posterior convergence theorems is complicated by the fact that desirable prior concentration properties often are not satisfied. In this paper, I first develop a new prior designed specifically to satisfy an empirical version of the prior concentration property, and then I give sufficient conditions on the prior inputs such that the corresponding empirical Bayes posterior concentrates around the true monotone density at nearly the optimal minimax rate. Numerical illustrations also reveal the practical benefits of the proposed empirical Bayes approach compared to Dirichlet process mixtures.

Submitted on 2018-09-04

Accurate estimation of value-at-risk (VaR) and assessment of associated uncertainty is crucial for both insurers and regulators, particularly in Europe. Existing approaches link data and VaR indirectly by first linking data to the parameter of a probability model, and then expressing VaR as a function of that parameter. This indirect approach exposes the insurer to model misspecification bias or estimation inefficiency, depending on whether the parameter is finite- or infinite-dimensional. In this paper, we link data and VaR directly via what we call a discrepancy function, and this leads naturally to a Gibbs posterior distribution for VaR that does not suffer from the aforementioned biases and inefficiencies. Asymptotic consistency and root-*n* concentration rate of the Gibbs posterior are established, and simulations highlight its superior finite-sample performance compared to other approaches.

Submitted on 2018-08-31

Inference on parameters within a given model is familiar, as is ranking different models for the purpose of selection. Less familiar, however, is the quantification of uncertainty about the models themselves. A Bayesian approach provides a posterior distribution for the model but it comes with no validity guarantees, and, therefore, is only suited for ranking and selection. In this paper, I will present an alternative way to view this model uncertainty problem, through the lens of a valid inferential model based on random sets and non-additive beliefs. Specifically, I will show that valid uncertainty quantification about a model is attainable within this framework in general, and highlight the benefits in a classical signal detection problem.

Submitted on 2018-07-24

Confidence is a fundamental concept in statistics, but there is a tendency to misinterpret it as probability. In this paper, I argue that an intuitively and mathematically more appropriate interpretation of confidence is through belief/plausibility functions, in particular, those that satisfy a certain validity property. Given their close connection with confidence, it is natural to ask how a valid belief/plausibility function can be constructed directly. The inferential model (IM) framework provides such a construction, and here I prove a complete-class theorem stating that, for every nominal confidence region, there exists a valid IM whose plausibility regions are contained by the given confidence region. This characterization has implications for statistics understanding and communication, and highlights the importance of belief functions and the IM framework.

© 2018-2020 Researchers.One